1. Tackling Big Data with Hadoop and
Open Source Integration
Ciaran Dynes
Remy Dubois
2. Agenda
1. Talend’s Goal: Democratizing Integration
2. What is Big Data (integration)?
3. Big Data for the Masses: Talend’s strategy and vision
© Talend 2011 2
4. Talend – The Market Leading Unified Integration Platform
Talend Enterprise: Data Quality, Data Integration, MDM, ESB, BPM
• Commercial license
• Subscription model
Studio, Repository, Deployment, Execution, Monitoring
Talend Open Studio for: Data Quality, Data Integration, MDM, ESB
• Open source license
• Free of charge
• Optional support
Recognized as the open source leader in each of its market categories by all industry analysts
5. Who uses Talend?
A high adoption rate
• 20 million downloads
• 950,000 users
• 3,500 customers
• 1 product download every 30 seconds
• 150 new customers per month
6. Trying to get from this…
© Talend 2011 – Strictly Private & Confidential
7. to this…
Why Talend…
ONLY Talend generates code that is executed within MapReduce. This
open approach removes the limitation of a proprietary “engine” to
provide a truly unique and powerful set of tools for big data.
8. Big data is….
Hans Rosling – uses big data to analyze world health trends
Key Takeaway #1
transactions, interactions, observations
9. Big Data = Transactions + Interactions + Observations
[Figure, source: Hortonworks. Data sources arranged by data volume (mega-, giga-, tera-, petabytes) against increasing data variety and complexity.]
ERP: purchase detail, purchase record, payment record
CRM: segmentation, offer details, customer touchpoints, support contacts
Web: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
Big Data: sensors/RFID/devices, user generated content, sentiment, social interactions & feeds, mobile web, spatial & GPS coordinates, user clicks, external demographics, business data feeds, video/audio/images, SMS/MMS
11. Traditional Data Flows
[Diagram labels. Sources: CRM, ERP, Finance. Stages: ETL, Data Quality, Normalized Data, Traditional Data Warehouse. Consumers: Business Analyst, Business User, Warehouse Administrator, Executives.]
• Scheduled daily or weekly, sometimes more frequently.
• Volumes rarely exceed terabytes
12. The new world of big data
Social
Networking
CRM
ERP
Big Data
Finance
13. The new world of big data
Social
Networking
CRM
Mobile Devices
ERP
Big Data
Finance
14. The new world of big data
Social
Networking
CRM
Mobile Devices
ERP
Transactions
Finance
Big Data
15. The new world of big data
Social
Networking
CRM
Mobile Devices
ERP
Transactions
Finance
Network Devices
Sensors
Big Data
16. Key Takeaway #2
Forces us to think differently
17. But for Talend…. Big data is…
…everything that is old, is new again!
18. Data driven business
[Cycle diagram. Nodes: your business, data governance, information, decisions. Arrow labels: enables, supports, drives.]
Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
19. BIG data driven business
[Cycle diagram. Nodes: BIG business, BIG data governance, BIG information, BIG decisions. Arrow labels: enables, supports, drives.]
Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
21. Goal: Democratize Big Data
Talend Open Studio for Big Data
• “Big Data for the Masses”
• Improves efficiency of big data job design with a graphical interface
• Abstracts and generates code
• Run transforms inside Hadoop
• Native support for HDFS, Pig, HBase, Sqoop and Hive
• Apache License 2.0
• Embedded in Hortonworks Data Platform
…an open source ecosystem
24. How is big data integration being used?
Use Cases
• Recommendation Engine
• Sentiment Analysis
• Risk Modeling
• Fraud Detection
• Marketing Campaign Analysis
• Customer Churn Analysis
• Social Graph Analysis
• Customer Experience Analytics
• Network Monitoring
• Research And Development
BUT: what level of data quality (DQ) does your use case require?
25. Poor Data Quality + Big Data = Big Problems
Poor Data Quality * Big Data = Big Problems^2
Key Takeaway #3
In big data…
poor data quality can be magnified at huge scale
26. Two methods for inserting data quality into a big data job
1. Pipelining: as part of the load process
2. Load the cluster, then implement and execute
a data quality MapReduce job
27. E-T-L
Extract – Transform – Load
28. E-DQ-L
Extract – Improve/Cleanse – Load
29. Pipelining: data quality with big data
[Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices, DQ, Big Data]
• Use traditional data quality tools
• No new programming, no PhDs
• Once and done
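The pipelining approach above (E-DQ-L) can be sketched in a few lines. This is only an illustrative sketch and not Talend-generated code: the record layout, the `cleanse` rule (a record must carry something that looks like an email) and the in-memory list standing in for the big data store are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Record = Dict[str, str]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Extract step: yield source records one at a time."""
    for row in rows:
        yield row


def cleanse(record: Record) -> Optional[Record]:
    """DQ step (hypothetical rule): trim fields, normalize the email, reject bad ones."""
    cleaned = {key: value.strip() for key, value in record.items()}
    email = cleaned.get("email", "").lower()
    if "@" not in email:
        return None  # rejected before it ever reaches the cluster
    cleaned["email"] = email
    return cleaned


def pipeline(rows: Iterable[Record]) -> Iterator[Record]:
    """E-DQ-L: only records that pass the DQ step flow on to the load step."""
    for record in extract(rows):
        cleaned = cleanse(record)
        if cleaned is not None:
            yield cleaned


def load(records: Iterable[Record], target: List[Record]) -> int:
    """Load step: append to a list that stands in for the big data store."""
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


crm_rows = [
    {"name": " Ada ", "email": "ADA@Example.com "},
    {"name": "Bob", "email": "not-an-email"},
]
store: List[Record] = []
loaded = load(pipeline(crm_rows), store)
print(loaded, store)
```

Because the DQ step sits between extract and load, a bad record is dropped once, on the way in, which is the "once and done" property the slide refers to.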
30. Big data alternative: Load and improve within the cluster
[Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices, DQ, Big Data]
• Load first, improve later
• Really complex to build, limited tools
• Constantly on, incremental
• Insane performance
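The load-first alternative above runs data quality as a job inside the cluster. As a rough sketch of the idea, the following simulates the map, shuffle and reduce phases of a de-duplication job on a single machine in plain Python. It does not use Hadoop APIs; the record fields and the rule of keeping the longest name per email are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def map_phase(record: Record) -> Iterator[Tuple[str, Record]]:
    """Mapper: emit each record under a normalized email key; skip empty keys."""
    key = record.get("email", "").strip().lower()
    if key:
        yield key, record


def reduce_phase(key: str, records: List[Record]) -> Record:
    """Reducer (hypothetical rule): collapse duplicates, keeping the longest name."""
    names = [r.get("name", "").strip() for r in records]
    return {"email": key, "name": max(names, key=len)}


def run_job(loaded: Iterable[Record]) -> List[Record]:
    """Simulate map -> shuffle (group by key) -> reduce on one machine."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in loaded:
        for key, value in map_phase(record):
            groups[key].append(value)
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


# Data that was loaded into the cluster as-is, duplicates and gaps included.
raw_cluster_data = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Ada Lovelace", "email": " ADA@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
    {"name": "Nobody", "email": ""},
]
clean = run_job(raw_cluster_data)
print(clean)
```

On a real cluster the grouping step is done by the framework across many nodes, which is where the performance comes from, and also why such jobs are harder to build than a pipelined DQ step.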
31. big data
[Roadmap timeline, labels as extracted: 2012, now, Q4, 2013]
Talend Open Studio for Big Data
• Packaged within Hortonworks Data Platform
…Eclipse tools for Hive, HDFS, Pig, Sqoop
…supports Oozie, HCatalog, Kerberos
• Free to download and use under the Apache license
…democratizing big data through intuitive tools