Talend Data Integration and Management
Data Integration



Data Integration involves combining data residing in different sources and providing the user with a unified view of the data.


Data Management combines different disciplines to manage data as a valuable resource.
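The "unified view" idea can be sketched in a few lines of plain Java. This is a minimal illustration, not Talend-generated code. It assumes two hypothetical sources: a CRM list of customers and a billing list of accounts, keyed by the same customer id. It also assumes Java 17 (records, `Stream.toList()`). All record and class names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical source 1: customers as stored in a CRM system.
record CrmCustomer(int id, String name) {}

// Hypothetical source 2: accounts as stored in a billing system (several per customer).
record BillingAccount(int customerId, double balance) {}

// The unified view offered to the user: one row per customer, combining both sources.
record UnifiedCustomer(int id, String name, double balance) {}

class UnifiedView {
    // Joins the two sources on the customer id.
    // Balances of multiple accounts are summed; customers without billing data get 0.0.
    static List<UnifiedCustomer> build(List<CrmCustomer> crm, List<BillingAccount> billing) {
        Map<Integer, Double> balances = new LinkedHashMap<>();
        for (BillingAccount b : billing) {
            balances.merge(b.customerId(), b.balance(), Double::sum);
        }
        return crm.stream()
                .map(c -> new UnifiedCustomer(c.id(), c.name(), balances.getOrDefault(c.id(), 0.0)))
                .toList();
    }
}
```

Treating missing billing data as a zero balance is a design choice made only for the sketch. A real integration might prefer to flag such customers instead.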




                                         www.robertomarchetto.com
Talend


●   Talend is a company focused on Data
    Integration and Data Management solutions
●   Talend was named a "Cool Vendor" by Gartner (2010)
●   Present in more than 12 locations around the
    world
●   Fast-growing company




Talend Open Studio




Talend Open Studio

●   Open-source, professional tool
●   Procedures are drawn by linking components; each
    component performs one operation
●   DB vendor-specific optimized components
●   Produces fully editable Java (or Perl) code
●   Deploys as small, fast compiled Java programs
    or as a web service
●   Eclipse-based IDE, excellent flexibility
●   Independent of both the BI platform and the DB vendor
Automatic code generation, different deployment options




Extraction, Transformation, Loading


●   ETL is a common process in Data Integration
    ●   Extract: reading data from different data sources
        (databases, flat files, spreadsheet files, web
        services, etc.)
    ●   Transform: converting data into a form that can
        be placed in another container (database, web
        services, files, etc.). Cleaning, computations and
        verifications are also performed
    ●   Load: writing the data in the target format
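The three ETL steps can be illustrated with a small plain-Java pipeline. This is a hedged sketch, not what Talend generates. It makes several assumptions:

- The source is a hypothetical flat file of `id;email` lines.
- The transform step trims, lower-cases and verifies each row.
- The target is an in-memory list standing in for a database table.
- Java 17 is used (for the nested record).

`MiniEtl` and `User` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// A minimal, hypothetical ETL pipeline: flat-file lines in, clean rows out.
class MiniEtl {
    // Target row format.
    record User(int id, String email) {}

    // Extract: split raw "id;email" lines from a flat-file source, skipping blank lines.
    static List<String[]> extract(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isBlank()) rows.add(line.split(";", -1));
        }
        return rows;
    }

    // Transform: clean (trim, lower-case) and verify each row; rows that fail are dropped.
    static List<User> transform(List<String[]> rows) {
        List<User> out = new ArrayList<>();
        for (String[] r : rows) {
            if (r.length != 2) continue;                      // wrong number of fields
            String email = r[1].trim().toLowerCase(Locale.ROOT);
            if (!email.contains("@")) continue;               // failed verification
            try {
                out.add(new User(Integer.parseInt(r[0].trim()), email));
            } catch (NumberFormatException e) {
                // non-numeric id: reject the row
            }
        }
        return out;
    }

    // Load: write the rows into the target (here a list standing in for a table).
    static void load(List<User> users, List<User> target) {
        target.addAll(users);
    }
}
```

Usage: `MiniEtl.load(MiniEtl.transform(MiniEtl.extract(lines)), table)`. Rejected rows are silently dropped here. A production job would usually route them to a reject flow instead.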



Tutorial, Source data




Tutorial, Destination data (data warehouse)




Tutorial, Metadata


●   Talend requires the metadata to be defined up front
●   As with strong typing in programming languages, a
    strong metadata definition often leads to fast, robust
    and maintainable applications
●   ..demo..
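To show why up-front metadata helps, here is a hedged plain-Java sketch of a column schema and a row validator. It is not Talend's metadata model. It assumes a hypothetical `Column` record with name, Java type, maximum string length and nullability, and it needs Java 17 (records, pattern-matching `instanceof`). Rows are modelled as maps from column name to value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified column metadata. maxLength is only checked for String values.
record Column(String name, Class<?> type, int maxLength, boolean nullable) {}

class SchemaCheck {
    // Checks one row (column name -> value) against the schema and returns the problems found.
    static List<String> validate(List<Column> schema, Map<String, Object> row) {
        List<String> problems = new ArrayList<>();
        for (Column c : schema) {
            Object v = row.get(c.name());
            if (v == null) {
                if (!c.nullable()) problems.add(c.name() + ": null not allowed");
            } else if (!c.type().isInstance(v)) {
                problems.add(c.name() + ": expected " + c.type().getSimpleName());
            } else if (v instanceof String s && s.length() > c.maxLength()) {
                problems.add(c.name() + ": longer than " + c.maxLength());
            }
        }
        return problems;
    }
}
```

Because the schema is declared once, every row can be checked against it before it reaches the target. This is the robustness the slide attributes to strong metadata.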




Tutorial, Talend jobs basics



●   Place components on the designer
●   Link components to build a transformation
●   Main link type: row flow
●   Schema metadata is propagated along the links and
    must be consistent
●   ..demo..
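The job model (components linked by row flows, with the schema propagated along each link) can be mimicked by a toy Java pipeline. This is an assumption-laden sketch, not how Talend executes jobs. It makes the following simplifications:

- A hypothetical `Component` record declares its input and output schema as column-name lists.
- Each component transforms a `Stream<Object[]>` of rows.
- `ToyJob.run` refuses to start if two linked components disagree on the schema.

It requires Java 17.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

// Toy component: declared input/output schemas plus an operation over a stream of rows.
record Component(String name, List<String> inSchema, List<String> outSchema,
                 Function<Stream<Object[]>, Stream<Object[]>> op) {}

class ToyJob {
    // Links the components in order, after checking that schemas are consistent on every link.
    static Stream<Object[]> run(List<Component> chain, Stream<Object[]> source) {
        for (int i = 1; i < chain.size(); i++) {
            Component prev = chain.get(i - 1);
            Component next = chain.get(i);
            if (!prev.outSchema().equals(next.inSchema())) {
                throw new IllegalStateException(
                        "Schema mismatch between " + prev.name() + " and " + next.name());
            }
        }
        // Each component consumes the row flow produced by the previous one.
        Stream<Object[]> rows = source;
        for (Component c : chain) {
            rows = c.op().apply(rows);
        }
        return rows;
    }
}
```

Checking schemas before any row flows mirrors the idea that inconsistent metadata should be caught at design time rather than halfway through a run.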



Tutorial, users_dimension




Test the job




Tutorial, accounts_dimension




Tutorial, dates_dimension




Tutorial, write a Java library




Tutorial, opportunities_fact




Tutorial, define a root job




Deploy and run




Extensibility, community plugins


●   Many official components
●   Components for every task released by the
    community
●   Geospatial components, log analysis, Google
    Analytics, data encryption, etc.

Scheduler




And now: reports, dashboards, OLAP, geoanalysis, KPIs...




Do you trust your data?




What about data quality?

●   Customer A is present 5 times with different
    names
●   Null values can skew statistical indicators such
    as the mean
●   Duplicate records
●   Blank values
●   Some records can contain errors (e.g. -1 field
    values)
●   Some records can be garbage
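Two of these problems, nulls distorting a mean and the same customer stored under different names, are easy to demonstrate in plain Java. The sketch below is illustrative only and is not how Talend Open Profiler works. The name normalisation (trim, lower-case, collapse whitespace) catches only trivial variants. Real duplicate detection usually needs fuzzy matching. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

class QualityChecks {
    // Mean over the non-null values only.
    static double meanIgnoringNulls(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    // Mean that treats nulls as 0, showing how silently replacing nulls skews the result.
    static double meanNullsAsZero(List<Double> values) {
        return values.stream()
                .mapToDouble(v -> v == null ? 0.0 : v)
                .average()
                .orElse(Double.NaN);
    }

    // Distinct customer names after simple normalisation; null and blank names are skipped.
    static int distinctNormalized(List<String> names) {
        Set<String> seen = new LinkedHashSet<>();
        for (String n : names) {
            if (n == null || n.isBlank()) continue;
            seen.add(n.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " "));
        }
        return seen.size();
    }
}
```

For the values 10, null and 20, ignoring the null gives a mean of 15. Treating it as zero gives 10.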

Talend Open Profiler




What about data storage size?


●   Some fields can be oversized for the data they
    contain
●   Sometimes fields are related, so one can be
    calculated from the others
●   Some keys or values are never used
●   As data grows, garbage grows too
●   Data storage is not free (disks, electricity,
    backups, DB licenses)
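Spotting an oversized field amounts to comparing the declared column size with the longest value actually stored. The sketch assumes the column's values have already been read into a `List<String>`. A hypothetical example would be a column declared as VARCHAR(255) that only ever holds two-letter country codes. Note that `String.length()` counts UTF-16 code units, not bytes, so this is an approximation of storage use. `ColumnSizing` is an illustrative name.

```java
import java.util.List;

class ColumnSizing {
    // Length of the longest non-null value actually stored in the column (0 if there is none).
    static int maxUsedLength(List<String> values) {
        return values.stream()
                .filter(v -> v != null)
                .mapToInt(String::length)
                .max()
                .orElse(0);
    }

    // Percentage of the declared length that the longest value actually uses.
    static double usagePercent(List<String> values, int declaredLength) {
        return 100.0 * maxUsedLength(values) / declaredLength;
    }
}
```

A very low percentage across a large table is a hint, not proof, that the column could be declared smaller. Future data may still need the extra room.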

Data is "the black gold" that can produce knowledge


●   Data is a resource from which knowledge can be
    extracted
●   Large amounts of data can be condensed into concise
    information
●   Data storage is not free, and large volumes of data
    can slow systems down
●   Data cleansing is a central process in statistical
    analysis and Data Mining




Talend Master Data Management




