Open Source ETL using Talend Open Studio
1. Open Source ETL using Talend Open Studio
Luís Santos
luis@luissantos.pt
February 14, 2013
2. Overview
1 Who am I?
2 What is ETL?
3 ETL Software Suites
4 Talend Open Studio for Data Integration
5 Hands on
6 Conclusion
3. Warning!!!
This presentation was created using LaTeX
Why?
Because I can!
4. Who am I?
5. Who am I?
Software Engineer and Mathematics Student
Open Source addict
PHP and Java Developer
6. What is ETL?
7. What is ETL?
In computing, Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that involves:
Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database, more specifically, operational data store, data mart or data warehouse)
(2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
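The three steps in the definition above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Talend implements it: the CSV text, the `payments` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the end target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, an API or another database.
SOURCE_CSV = """name,amount
alice,10.50
bob,not-a-number
carol,7.25
"""


def extract(text):
    """Extract: read raw rows from an outside source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean rows and enforce a quality rule (amount must be numeric)."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail the quality check
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the end target (here, SQLite)."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Chain the three steps and return what ended up in the target table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()


if __name__ == "__main__":
    print(run_etl(SOURCE_CSV, sqlite3.connect(":memory:")))
```

Keeping each step in its own function mirrors the way ETL tools separate input, processing and output components, so one step can be swapped without touching the others.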
8. ETL Software Suites
Pentaho Data Integration (Kettle)
SQL Server Integration Services
Talend Open Studio for Data Integration
etc...
9. Talend Open Studio for Data Integration
Talend Open Studio is a set of tools for developing, testing and deploying
application integration projects.
Talend Open Studio for Big Data
Bonita Open Solution (BPM)
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend ESB
Talend Open Studio for MDM
13. Transformers (Transform)
Sort data
Convert data
Cross (join) data between data sources
Filter data
Fuzzy search
Normalize and Denormalize data
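To make the transformations listed above concrete, here is a minimal sketch of each one in plain Python. The sample rows and field names are hypothetical, and the standard library `difflib` module stands in for a fuzzy-matching component; a Talend job would express the same steps with graphical components rather than code like this.

```python
from difflib import get_close_matches

# Hypothetical rows from two different data sources.
customers = [
    {"id": 1, "name": "Alice", "country": "pt"},
    {"id": 2, "name": "Bob", "country": "es"},
    {"id": 3, "name": "Carol", "country": "pt"},
]
orders = [
    {"customer_id": 1, "total": "19.90"},
    {"customer_id": 3, "total": "5.00"},
    {"customer_id": 1, "total": "3.10"},
]

# Convert: cast the text totals to numbers.
converted = [{**order, "total": float(order["total"])} for order in orders]

# Cross (join) and denormalize: copy the customer name onto each order row.
by_id = {customer["id"]: customer for customer in customers}
joined = [
    {"name": by_id[order["customer_id"]]["name"], "total": order["total"]}
    for order in converted
]

# Filter: keep only orders of at least 5.00.
filtered = [row for row in joined if row["total"] >= 5.0]

# Sort: largest total first.
ordered = sorted(filtered, key=lambda row: row["total"], reverse=True)

# Fuzzy search: find the customer name closest to a misspelled query.
names = [customer["name"] for customer in customers]
match = get_close_matches("Carrol", names, n=1, cutoff=0.6)

if __name__ == "__main__":
    print(ordered)
    print(match)
```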
15. Where and how?
Where?
Multi-platform (Linux, macOS, *BSD, even Windows)
You just need a JVM (Java Virtual Machine)
How?
Execute it from your favorite programming language using system calls
Command line
From your JVM-based application (Java, Groovy, JRuby)
Web services running on top of a Java application server (Tomcat, GlassFish)
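As a rough sketch of the "system call" option above, the snippet below starts a job launcher as a child process. It assumes the job has already been exported as a standalone script; the launcher path and the context parameter names are hypothetical, and the `--context_param key=value` form is an assumption that should be checked against the launcher your Talend version generates.

```python
import subprocess


def build_job_command(launcher, context=None):
    """Build the argument list for a job launcher script.

    `launcher` is a hypothetical path such as "./my_job/my_job_run.sh".
    Each context entry is passed as `--context_param key=value` (assumed flag).
    """
    command = [launcher]
    for key, value in (context or {}).items():
        command += ["--context_param", f"{key}={value}"]
    return command


def run_job(launcher, context=None, timeout=600):
    """Run the job as a child process and return (exit_code, stdout)."""
    result = subprocess.run(
        build_job_command(launcher, context),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout
```

Passing the command as a list instead of a single shell string avoids quoting problems when a parameter value contains spaces. The same approach works from any language that can spawn a process, PHP included.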
16. Hands on
17. Hands on
Querying data
Joining data from multiple data sources
Filtering and sorting data
Exporting data
Deploying your job
Calling it from PHP
18. Database Schema
19. Example
20. "With great power comes great responsibility."
(Voltaire)
21. The End
email: luis@luissantos.pt
twitter: @santosluis87
linkedin: https://www.linkedin.com/in/luissantos87