This introduction shows how OpenRefine can help any data project, from analytics to migration or reconciliation. OpenRefine's powerful interface helps domain experts explore, transform and enrich their data.
Toronto OpenRefine MeetUp Nov 2015
1. refinepro.com - @RefinePro – martin@refinepro.com 1
Enable domain experts to explore, normalize and enrich their data via a self-service data preparation platform
Toronto OpenRefine Meet Up Nov 17, 2015
Analytics need clean data to be reliable
Legacy data need to be migrated to a new system
Data must be reconciled against a master data set
Data projects need access to reliable data quickly
60 to 80% of data analysis time is spent on cleaning, transformation and integration
Data Processing Pipeline
• Duplicate Values & Typos
• Multi-value cells
• Data in the wrong field
• Missing / Partial Values
• Encoding Errors
• Change format (text, number, date)
• Flat to relational data set
• Schema alignment
• Transpose rows and columns
• Join data sets
• Enrichment from other sources (MDM, API calls)
Data Quality & Integration Is Time Consuming
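Two of the issues listed above, multi-value cells and duplicate values, can be illustrated with a minimal Python sketch. This is not OpenRefine code: the function names, the `;` separator and the sample rows are hypothetical and chosen only for illustration.

```python
def split_multi_value(rows, column, separator=";"):
    """Expand rows whose `column` holds several values into one row per value."""
    expanded = []
    for row in rows:
        for value in row[column].split(separator):
            value = value.strip()
            if value:  # skip empty fragments such as a trailing separator
                expanded.append({**row, column: value})
    return expanded


def drop_duplicates(rows, column):
    """Keep the first row for each case-insensitive value found in `column`."""
    seen = set()
    unique = []
    for row in rows:
        key = row[column].casefold()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: one multi-value cell and one case variant.
rows = [
    {"id": "1", "city": "Toronto; Montreal"},
    {"id": "2", "city": "toronto"},
]
flat = split_multi_value(rows, "city")
unique = drop_duplicates(flat, "city")
```

Even this small case needs a decision about which row survives deduplication, which is part of why the work is time consuming.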
• Which fields should it contain?
• What format should it follow?
• What geographical scope should it support?
• Enforce data integrity rules (e.g. postal code vs. city)?
Individual and business unit needs are unique
What is clean data?
How do you define a clean address?
• New economy of machine learning, predictive and data enrichment services:
• Geocoding and address cleaning
• Name recognition and extraction
• Churn prediction
• …
Those services come with an API-first approach requiring technical skills
Data Services Have an API-First Approach
DBA
ETL
Data Science
Spreadsheet User
Data Visualization / Interpretation
User Base
Understand
the Data
(Business Skills)
Know How To
Transform Data
(Technical Skills)
Today's data environment challenges traditional technologies
Excel doesn't scale or automate well
IT can't keep pace with the volume of requests
[Figure: OpenRefine in the Data Quality & Integration Pipeline. Axis: frequency (number of use cases). Stages 1, 2, 3. Labels: Profiling, Preparation, Discovery, Data Wrangling. Use cases shown: sense making and data exploration ("Is the data useful? What can I do with it?"); personal ETL & analysis, prototype, one-time migration; big data, real-time processing, enterprise ETL.]
[Figure: OpenRefine in the Data Quality & Integration Pipeline, positioned between "Understand the Data (Business Skills)" and "Know How To Transform Data (Technical Skills)". Axis: frequency (number of use cases). Stages 1, 2, 3. Labels: Profiling, Preparation, Discovery, Data Wrangling.]
OpenRefine Functionality
XLS, CSV, JSON, XML Input & Output Support
Point & Click
Cluster & Deduplication
Filter & Sort
Transpose
Custom Query Language
Enrich data via APIs
Join, Merge & Reconcile
Split to rows and columns
Undo / Redo
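To give a sense of what "Cluster & Deduplication" means in practice, here is a minimal Python sketch loosely modeled on the fingerprint key-collision idea described in OpenRefine's documentation. It is an approximation for illustration, not OpenRefine's implementation; the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that near-identical spellings collide."""
    text = value.strip().lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Sort and deduplicate tokens so word order does not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample values with case, punctuation, order and accent variants.
names = ["Toronto, ON", "toronto on", "ON Toronto", "Montréal", "Montreal", "Ottawa"]
clusters = cluster(names)
```

In OpenRefine itself this kind of grouping is done through the point-and-click interface, and the user decides which spelling each cluster should be merged into.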
Training
Cloud & on-premise hosting
Integration & Custom Development
RefinePro helps teams and organizations scale OpenRefine
Contact and Links
OpenRefine
Website: http://openrefine.org
Twitter: @OpenRefine
RefinePro
Website: http://refinepro.com
Twitter: @RefinePro
Martin Magdinier
Twitter: @magdmartin
LinkedIn: https://ca.linkedin.com/in/magdinier/en