SlideShare una empresa de Scribd logo
1 de 19
Kettle – ETL Tool
Sreenivas K
Agenda

Introduction
− ETL Process
− Pentaho's Kettle

Data Integration Challenges

Prerequisites and Recent Releases

Pentaho DI Components

Spoon
− Transformations
− Jobs
Introduction – ETL Process

Major Components
− Extracting

Gathering raw data from source systems and storing it in ETL staging
environment

Data Profiling

Identifying data that changed since last load.
− Transforming- Cleaning and Conforming

Processing data to improve its quality, format it, merge from multiple
sources, enforce conformed dimensions

Data cleansing

Recording error events

Audit dimensions

Creating and maintaining conformed dimensions and facts
Introduction – ETL Process
− Loading

Loading data into data warehouse tables

Managing hierarchies in dimensions

Managing special dimensions such as date and time, junk, mini, shrunken,
small static, and user-maintained dimensions

Fact table loading

Building and maintaining bridge dimension tables

Handling late arriving data

Management of conformed dimensions

Administration of fact tables

Building aggregations

Building OLAP cubes

Transferring DW data to other environment for specific purposes
Data Transformation and
Integration Examples

Data filtering
− Is not null, greater than, less than, includes

Field manipulation
− Trimming, padding, upper and lowercase conversion

Data calculations
− + - X / , average, absolute value, arctangent, natural logarithm

Date manipulation
− First day of month, Last day of month, add months, week of year, day of year

Data type conversion
− String to number, number to string, date to number

Merging fields & splitting fields

Looking up date
− Look up in a database, in a text file, an excel sheet, …
Introduction – Pentaho Kettle

Kettle – Kettle Extraction Transformation Transportation &
Loading tool

Its open source business intelligence suite for powerful
data integration by Pentaho. Founded in 2004.

Products of Pentaho
− Mondrain – OLAP server written in Java
− Kettle – ETL tool
Data Integration - Challenges

Data is everywhere

Data is inconsistent
− Records are different in each system

Performance issues
− Running queries to summarize data for stipulated
long period takes operating system for task

Data is never all in Data Warehouse
− Excel sheet, acquisition, new application
Prerequisites Recent Releases

Java Runtime Environment
1.5 and above

Compatible with almost any
platform

Compatible with wide range
of Databases technologies.

4/25 Data Integration 3.0.3 GA

4/18 Data Integration 3.1 Milestone

2/8 Data Integration 3.0.2 GA

12/12 Data Integration 3.0.1 GA

11/15 Data Integration 3.0 GA

10/31 Data Integration 3.0 RC2

10/24 Data Integration 2.5.2 GA

10/08 Data Integration 3.0 RC1

08/24 Data Integration 2.5.1 GA
Pentaho Components

Spoon
− GUI that allows you to design transformations and jobs that can
be run with the Kettle tools — Pan and Kitchen
− Transformations and Jobs can describe themselves using an XML
file or can be put in a Kettle database repository.
− Spoon is available as executable script and batch file to make use
of tool in heterogeneous environment.

Pan
− A program to execute transformations designed by Spoon in XML or
database repository.
− Transformations are scheduled in batch mode to be run automatically at
regular intervals

Kitchen
− Execute jobs designed by Spoon in XML or database repository

Repository Connection establishment

Auto login
− By setting manually KETTLE_REPOSITORY,
KETTLE_USER and KETTLE_PASSWORD
environmental variables.

Login
− By default PDI provides login username and
password ad admin.

Transformation
− Value: Values are part of a row
and can contain any type of data
− Row: a row exists of 0 or more
values
− Output stream: an output
stream is a stack of rows that
leaves a step.
− Input stream: an input stream is
a stack of rows that enters a
step.
− Hop: A hop is a graphical
representation of one or more
data streams between 2 steps.
− Note: A note is a piece of
information that can be added to
a transformation
Engine capable of performing a
multitude of functions such as reading,
manipulating and writing data to and
from various data sources.

Jobs
− Job Entry: A job entry is
one part of a job and
performs a certain
− Hop: A hop is a graphical
representation of one or
more data streams
between 2 steps
− Note: a note is a piece of
information that can be added to
a job
A way of calling transformations and
controlling the sequence of their
execution. Usually jobs are
scheduled in batch mode to be run
automatically at regular intervals.
Input Steps
Output Steps
Lookup Steps
Transformation
Steps
Join Steps
DW Steps
Mapping Steps
Job Steps
Pentaho etl-tool
Pentaho etl-tool
Pentaho etl-tool

Más contenido relacionado

La actualidad más candente

Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
Edureka!
 

La actualidad más candente (20)

Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
 
Talend Open Studio Data Integration
Talend Open Studio Data IntegrationTalend Open Studio Data Integration
Talend Open Studio Data Integration
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
 
Data Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and SparkData Engineer's Lunch #54: dbt and Spark
Data Engineer's Lunch #54: dbt and Spark
 
Talend Open Studio Introduction - OSSCamp 2014
Talend Open Studio Introduction - OSSCamp 2014Talend Open Studio Introduction - OSSCamp 2014
Talend Open Studio Introduction - OSSCamp 2014
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
 

Destacado

Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Channy Yun
 

Destacado (14)

Pentaho PDI
Pentaho PDIPentaho PDI
Pentaho PDI
 
Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
 
vertica_tmp_4.5
vertica_tmp_4.5vertica_tmp_4.5
vertica_tmp_4.5
 
Penatho
PenathoPenatho
Penatho
 
Hire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI ToolsHire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI Tools
 
Hybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHybrid & Logical Data Warehouse
Hybrid & Logical Data Warehouse
 
Pentaho ETL ハンズオン
Pentaho ETL ハンズオンPentaho ETL ハンズオン
Pentaho ETL ハンズオン
 
Open Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonOpen Source Reporting Tool Comparison
Open Source Reporting Tool Comparison
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
TeraStream for ETL
TeraStream for ETLTeraStream for ETL
TeraStream for ETL
 
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
 
빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 

Similar a Pentaho etl-tool

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
jade_22
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
ganblues
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
Jeff McQuigg
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolio
querimit
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
camyla81
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
Amit Kumar
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 

Similar a Pentaho etl-tool (20)

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot faster
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolio
 
ETL
ETL ETL
ETL
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODI
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
 
Data migration
Data migrationData migration
Data migration
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
 
Data automation 101
Data automation 101Data automation 101
Data automation 101
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Pentaho etl-tool

  • 1. Kettle – ETL Tool Sreenivas K
  • 2. Agenda  Introduction − ETL Process − Pentaho's Kettle  Data Integration Challenges  Prerequisites and Recent Releases  Pentaho DI Components  Spoon − Transformations − Jobs
  • 3. Introduction – ETL Process  Major Components − Extracting  Gathering raw data from source systems and storing it in ETL staging environment  Data Profiling  Identifying data that changed since last load. − Transforming- Cleaning and Conforming  Processing data to improve its quality, format it, merge from multiple sources, enforce conformed dimensions  Data cleansing  Recording error events  Audit dimensions  Creating and maintaining conformed dimensions and facts
  • 4. Introduction – ETL Process − Loading  Loading data into data warehouse tables  Managing hierarchies in dimensions  Managing special dimensions such as date and time, junk, mini, shrunken, small static, and user-maintained dimensions  Fact table loading  Building and maintaining bridge dimension tables  Handling late arriving data  Management of conformed dimensions  Administration of fact tables  Building aggregations  Building OLAP cubes  Transferring DW data to other environment for specific purposes
  • 5. Data Transformation and Integration Examples  Data filtering − Is not null, greater than, less than, includes  Field manipulation − Trimming, padding, upper and lowercase conversion  Data calculations − + - X / , average, absolute value, arctangent, natural logarithm  Date manipulation − First day of month, Last day of month, add months, week of year, day of year  Data type conversion − String to number, number to string, date to number  Merging fields & splitting fields  Looking up date − Look up in a database, in a text file, an excel sheet, …
  • 6. Introduction – Pentaho Kettle  Kettle – Kettle Extraction Transformation Transportation & Loading tool  Its open source business intelligence suite for powerful data integration by Pentaho. Founded in 2004.  Products of Pentaho − Mondrain – OLAP server written in Java − Kettle – ETL tool
  • 7. Data Integration - Challenges  Data is everywhere  Data is inconsistent − Records are different in each system  Performance issues − Running queries to summarize data for stipulated long period takes operating system for task  Data is never all in Data Warehouse − Excel sheet, acquisition, new application
  • 8. Prerequisites Recent Releases  Java Runtime Environment 1.5 and above  Compatible with almost any platform  Compatible with wide range of Databases technologies.  4/25 Data Integration 3.0.3 GA  4/18 Data Integration 3.1 Milestone  2/8 Data Integration 3.0.2 GA  12/12 Data Integration 3.0.1 GA  11/15 Data Integration 3.0 GA  10/31 Data Integration 3.0 RC2  10/24 Data Integration 2.5.2 GA  10/08 Data Integration 3.0 RC1  08/24 Data Integration 2.5.1 GA
  • 9. Pentaho Components  Spoon − GUI that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen − Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. − Spoon is available as executable script and batch file to make use of tool in heterogeneous environment.  Pan − A program to execute transformations designed by Spoon in XML or database repository. − Transformations are scheduled in batch mode to be run automatically at regular intervals  Kitchen − Execute jobs designed by Spoon in XML or database repository
  • 10.  Repository Connection establishment  Auto login − By setting manually KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD environmental variables.  Login − By default PDI provides login username and password ad admin.
  • 11.
  • 12.
  • 13.
  • 14.  Transformation − Value: Values are part of a row and can contain any type of data − Row: a row exists of 0 or more values − Output stream: an output stream is a stack of rows that leaves a step. − Input stream: an input stream is a stack of rows that enters a step. − Hop: A hop is a graphical representation of one or more data streams between 2 steps. − Note: A note is a piece of information that can be added to a transformation Engine capable of performing a multitude of functions such as reading, manipulating and writing data to and from various data sources.
  • 15.  Jobs − Job Entry: A job entry is one part of a job and performs a certain − Hop: A hop is a graphical representation of one or more data streams between 2 steps − Note: a note is a piece of information that can be added to a job A way of calling transformations and controlling the sequence of their execution. Usually jobs are scheduled in batch mode to be run automatically at regular intervals.
  • 16. Input Steps Output Steps Lookup Steps Transformation Steps Join Steps DW Steps Mapping Steps Job Steps