Open Source ETL using Talend Open Studio

                                    Luís Santos
                                luis@luissantos.pt



                                 February 14, 2013




Luís Santos luis@luissantos.pt      Open Source ETL   February 14, 2013
Overview

1    Who am I?

2    What is ETL?

3    ETL Software Suites

4    Talend Open Studio for Data Integration

5    Hands on

6    Conclusion



Warning!!!




This presentation was created using LaTeX
                  Why?
             Because I can!




Who am I?




          Software engineer and mathematics student
          Open source addict
          PHP and Java developer




What is ETL?


     In computing, Extract, Transform and Load (ETL) refers to a
     process in database usage and especially in data warehousing
     that involves:
             Extracting data from outside sources
             Transforming it to fit operational needs (which can include
             quality levels)
             Loading it into the end target (database, more specifically,
             operational data store, data mart or data warehouse)



        (2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
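The three steps in this definition can be sketched in a few lines of Python. This is only an illustration of the idea, not how Talend implements it: the CSV content, the `sales` table and the "amount is mandatory" quality rule are all made up for the example, and an in-memory SQLite database stands in for the end target.

```python
import csv
import io
import sqlite3

# Hypothetical input: a tiny CSV standing in for an "outside source".
RAW_CSV = """id,name,amount
1, alice ,10.50
2,Bob,
3, carol ,7.25
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the names and drop rows that fail a quality rule."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed quality rule: the amount is mandatory
        clean.append(
            (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Chain the three steps and return what ended up in the target."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

An ETL suite gives you the same pipeline as a graph of ready-made components, so you configure each step rather than write it by hand.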




ETL Software Suites




      Pentaho Data Integration (Kettle)
      SQL Server Integration Services
      Talend Open Studio for Data Integration
      etc...




Talend Open Studio for Data Integration


Talend Open Studio is a set of tools for developing, testing and deploying
application integration projects.
      Talend Open Studio for Big Data
      Bonita Open Solution (BPM)
      Talend Open Studio for Data Integration
      Talend Open Studio for Data Quality
      Talend ESB
      Talend Open Studio for MDM




Datasource(rer)s




Datasources (Extract and Load)




  MySQL, MSSQL, Oracle, SQLite, FirebirdSQL, XLS, CSV, XML, SOAP,
                  REST, HTTP, FTP, SSH, IMAP




Transformers




Transformers (Transform)




      Sort data
      Convert data
      Join data across datasources
      Filter data
      Fuzzy search
      Normalize and Denormalize data
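To make the list above concrete, here is a rough Python sketch of what each kind of transformation does to a handful of rows. It is not Talend code: the `customers` and `orders` data are invented, and `difflib` from the standard library stands in for whatever matching algorithm a real fuzzy-search component would use.

```python
import difflib

# Hypothetical rows coming from two different datasources.
customers = [
    {"id": "2", "name": "Bob", "tags": "web,mobile"},
    {"id": "1", "name": "Alice", "tags": "web"},
]
orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 2, "total": 5.0},
    {"customer_id": 1, "total": 12.5},
]

# Convert: cast the string id to an integer.
converted = [{**c, "id": int(c["id"])} for c in customers]

# Sort: order customers by id.
by_id = sorted(converted, key=lambda c: c["id"])

# Filter: keep only orders of at least 10.
big_orders = [o for o in orders if o["total"] >= 10]

# Join data across datasources: look up each order's customer by key.
names = {c["id"]: c["name"] for c in by_id}
joined = [(names[o["customer_id"]], o["total"]) for o in big_orders]

# Fuzzy search: find the closest known name to a misspelled one.
match = difflib.get_close_matches(
    "Alcie", [c["name"] for c in by_id], n=1, cutoff=0.6
)

# Normalize: split the delimited "tags" column into one row per tag.
normalized = [(c["id"], tag) for c in by_id for tag in c["tags"].split(",")]

# Denormalize: collapse the rows back into one delimited value per id.
grouped = {}
for customer_id, tag in normalized:
    grouped.setdefault(customer_id, []).append(tag)
denormalized = {cid: ",".join(tags) for cid, tags in grouped.items()}
```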




Where and how?



     Where?
             Multi-platform (Linux, macOS, *BSD, even Windows)
             You just need a JVM (Java Virtual Machine)
     How?
             Execute it from your favorite programming language using system calls (spawning a process)
             Command line
             From your JVM-based application (Java, Groovy, JRuby)
             Web services running on top of a Java application server (Tomcat, GlassFish)
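The first two options come down to the same thing: start the exported job as a child process and read its exit code. Below is a minimal Python sketch of that. It assumes the job was exported with a shell launcher script and that the launcher accepts `--context_param name=value` arguments; check both against your Talend version. The path and parameter name in the usage comment are hypothetical.

```python
import subprocess


def build_job_command(launcher, context_params):
    """Build the argument list for an exported job launcher script.

    Assumption: the launcher takes repeated
    "--context_param name=value" arguments.
    """
    cmd = [launcher]
    for key, value in sorted(context_params.items()):
        cmd.append("--context_param")
        cmd.append(f"{key}={value}")
    return cmd


def run_job(launcher, context_params, timeout=600):
    """Run the job as a child process and return (exit code, stdout)."""
    result = subprocess.run(
        build_job_command(launcher, context_params),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


# Usage (hypothetical path and parameter name):
# code, output = run_job(
#     "/opt/jobs/export_customers/export_customers_run.sh",
#     {"output_dir": "/tmp/out"},
# )
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when a parameter value contains spaces.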




Hands on




     Querying data
     Joining data from multiple datasources
     Filtering and sorting data
     Exporting data
     Deploying your job
     Calling it from PHP




Database Schema




Example




“With great power comes great responsibility.”
                                         (Voltaire)




The End
    email: luis@luissantos.pt
    twitter: @santosluis87
    linkedin: https://www.linkedin.com/in/luissantos87



