IBM Ascential ETL Overview: DataStage and QualityStage
More than ever, businesses today need to understand their operations, customers, suppliers, partners, employees, and stockholders. They need to know what is happening with the business, analyze their operations, react to market conditions, and make the right decisions to drive revenue growth, increase profits, and improve productivity and efficiency.
CIOs are responding to their organizations’ strategic needs by developing IT initiatives that align corporate data with business objectives. These initiatives include business intelligence, master data management, business transformation, infrastructure rationalization, and risk and compliance.
The IBM WebSphere Information Integration platform enables businesses to perform five key integration functions:
Data Analysis: Define, annotate, and report on fields of business data.
Data Transformation & Movement: Move data and transform it to meet the requirements of its target systems.
Software: QualityStage; DataStage; N/A (not used at NCEN); QualityStage; DataStage.
This presentation deals with the ETL tools QualityStage and DataStage.
QualityStage = data cleansing
QualityStage is used to cleanse and enrich data to meet business needs and data quality management standards.
Main QualityStage (QS) stages used in the BRM project: Investigate, Standardize, Match, and Survive.
QualityStage: Investigate
For the United States, the address rule sets would include:
USPREP (parses name, address, and area data that has not previously been formatted)
USNAME (for individual and organization names)
USADDR (for street and mailing addresses)
USAREA (for city, state, ZIP code, and so on)
Field parsing breaks an address into individual tokens. For example, the test field “123 St. Virginia St.” is parsed into the tokens “123”, “St.”, “Virginia”, and “St.”
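A minimal Python sketch of the field-parsing step, assuming tokens are split on whitespace and then each given a one-character class. The class letters (^ for numbers, T for street types, + for any other word) and the STREET_TYPES table are illustrative assumptions loosely modeled on QualityStage pattern classes, not the product's actual classification tables.

```python
# Hypothetical classification table: street-type abbreviations get class "T".
STREET_TYPES = {"ST", "ST.", "AVE", "AVE.", "BLVD", "BLVD."}


def tokenize(field: str) -> list[str]:
    """Split a free-form field into tokens on runs of whitespace."""
    return field.split()


def classify(token: str) -> str:
    """Give a token a one-character class (simplified, assumed scheme)."""
    if token.isdigit():
        return "^"  # numeric token
    if token.upper() in STREET_TYPES:
        return "T"  # token found in the street-type table
    return "+"  # any other word


def pattern(field: str) -> str:
    """Concatenate token classes into a pattern string for the whole field."""
    return "".join(classify(t) for t in tokenize(field))


field = " 123 St. Virginia St. "
print(tokenize(field))  # ['123', 'St.', 'Virginia', 'St.']
print(pattern(field))   # ^T+T
```

Both “St.” tokens receive class T even though the first one most likely means “Saint”; in a real rule set, context-sensitive pattern rules are what resolve that kind of ambiguity.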
QualityStage  Investigate     Standardize     Match    Survive The  Standardize  stage allows you to reformat data from multiple systems to ensure that each data type has the correct and consistent content and format.
Standardization invokes specific standardization Rule Sets and standardizes one or more fields using the selected Rule Set. For example, a Rule Set can ensure that a street type always appears as “Blvd”, never as “Boulevard”, “Blv.”, “Boulev”, or some other variation. The list below shows some of the more commonly used Rule Sets.
QualityStage: Match
Data matching is used to find records, in a single data source or in independent data sources, that refer to the same entity (such as a person, organization, location, product, or material), even when no predetermined key links them.
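QualityStage matching is generally described as probabilistic record linkage: each compared field contributes an agreement or disagreement weight derived from m and u probabilities, and the total weight is compared against cutoffs. The Python sketch below illustrates that idea (the Fellegi–Sunter weighting scheme) under stated assumptions: the fields, the m/u values, and both cutoffs are hypothetical, and only exact equality is compared, whereas real match specifications also support fuzzy comparisons.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | the two records are the same entity)
#   u = P(field agrees | the two records are different entities)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "zip": (0.90, 0.10),
}

# Hypothetical cutoffs on the total weight.
MATCH_CUTOFF = 5.0
CLERICAL_CUTOFF = 0.0


def match_weight(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) == b.get(field):
            total += math.log2(m / u)  # agreement: evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement: evidence against
    return total


def classify_pair(a: dict, b: dict) -> str:
    """Compare the total weight of a record pair against the cutoffs."""
    w = match_weight(a, b)
    if w >= MATCH_CUTOFF:
        return "match"
    if w >= CLERICAL_CUTOFF:
        return "clerical review"
    return "non-match"


# No shared key: the records are compared only on their attributes.
rec1 = {"last_name": "SMITH", "first_name": "JOHN", "zip": "20500"}
rec2 = {"last_name": "SMITH", "first_name": "JON", "zip": "20500"}
rec3 = {"last_name": "JONES", "first_name": "MARY", "zip": "10001"}

print(round(match_weight(rec1, rec2), 2), classify_pair(rec1, rec2))
print(round(match_weight(rec1, rec3), 2), classify_pair(rec1, rec3))
```

Pairs falling between the two cutoffs would typically be routed to a person for review rather than decided automatically.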
The QualityStage Matching stage consists of two steps:
Operations in the Matching module:
1. Unduplication (group records into sets having similar attributes)
2. Processing Files
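A minimal Python sketch of unduplication, assuming records are first grouped into blocks that share a ZIP code (so only plausible pairs are compared) and that matching pairs are then merged into sets with a union-find structure. The is_match rule here is a hypothetical deterministic rule kept short for illustration; a weighted comparison would normally take its place.

```python
from itertools import combinations


def find(parent: list[int], i: int) -> int:
    """Return the representative of i's set, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent: list[int], i: int, j: int) -> None:
    """Merge the sets containing i and j."""
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[rj] = ri


def is_match(a: dict, b: dict) -> bool:
    # Hypothetical rule: same last name, same ZIP, same first initial.
    return (a["last"] == b["last"] and a["zip"] == b["zip"]
            and a["first"][:1] == b["first"][:1])


def unduplicate(records: list[dict]) -> list[list[int]]:
    """Group record indexes into sets of likely duplicates."""
    parent = list(range(len(records)))
    # Blocking: only compare records that share a ZIP code.
    blocks: dict[str, list[int]] = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(rec["zip"], []).append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_match(records[i], records[j]):
                union(parent, i, j)
    groups: dict[int, list[int]] = {}
    for idx in range(len(records)):
        groups.setdefault(find(parent, idx), []).append(idx)
    return sorted(groups.values())


records = [
    {"first": "John", "last": "SMITH", "zip": "20500"},
    {"first": "Jon", "last": "SMITH", "zip": "20500"},
    {"first": "Mary", "last": "JONES", "zip": "10001"},
    {"first": "J.", "last": "SMITH", "zip": "20500"},
]
print(unduplicate(records))  # [[0, 1, 3], [2]]
```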
QualityStage: Survive
Survivorship is used to create a ‘best record’ from all available information about an entity (such as a person, location, or material). Survivorship and formatting ensure that the best available data survives and is correctly prepared for the target destination. Using the rules setup screen, the Survive stage implements business and mapping rules, creates the necessary output structures for the target application, and identifies fields that do not conform to load standards.
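A minimal Python sketch of survivorship for one group of duplicate records. The survive rule used here (per field, keep the most frequent non-empty value and break ties in favor of the most recently updated record) is a hypothetical example of a business rule, not a QualityStage default; the field names and the updated column are also assumptions.

```python
from collections import Counter


def survive(group: list[dict], fields: list[str]) -> dict:
    """Build one 'best record' from a group of duplicate records."""
    # Newest records first, so ties resolve toward the most recent data.
    ordered = sorted(group, key=lambda r: r["updated"], reverse=True)
    best = {}
    for field in fields:
        values = [r[field] for r in ordered if r.get(field)]  # skip empty values
        if not values:
            best[field] = None
            continue
        counts = Counter(values)
        top = max(counts.values())
        # First value, in newest-first order, that reaches the top count.
        best[field] = next(v for v in values if counts[v] == top)
    return best


group = [
    {"name": "John Smith", "phone": "", "email": "js@example.com", "updated": "2007-01-15"},
    {"name": "John Smith", "phone": "555-0100", "email": "", "updated": "2006-03-02"},
    {"name": "Jon Smith", "phone": "555-0199", "email": "john@example.com", "updated": "2008-07-30"},
]
print(survive(group, ["name", "phone", "email"]))
```

The dates are ISO strings, so sorting them as text also sorts them chronologically.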
The Survive stage does the following:
DataStage = data transformation
In its simplest form, DataStage performs data transformation and movement from source systems to target systems, in batch and in real time. The data sources may include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues.
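DataStage jobs are designed graphically rather than hand-coded, but the batch flow they implement can be sketched in a few lines. The Python below uses only the standard library: an in-memory string stands in for a sequential source file, SQLite stands in for the relational target, and the customer table and cleanup rules are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a sequential source file (hypothetical contents).
SOURCE = io.StringIO(
    "id,name,state\n"
    "1, alice ,va\n"
    "2,Bob,md\n"
)


def extract(f):
    """Read source rows as dictionaries keyed by the header line."""
    return csv.DictReader(f)


def transform(rows):
    """Convert types and clean up values for the target table."""
    for row in rows:
        yield (int(row["id"]), row["name"].strip().title(), row["state"].strip().upper())


def load(conn, rows):
    """Write transformed rows to the relational target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE)))
result = conn.execute("SELECT * FROM customer ORDER BY id").fetchall()
print(result)  # [(1, 'Alice', 'VA'), (2, 'Bob', 'MD')]
```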
The DataStage client components are Administrator, Manager, Designer, and Director.
DataStage: Administrator
Use DataStage Administrator to:
DataStage: Manager
DataStage Manager is the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. It also allows you to move or copy ETL jobs from one project to another.
DataStage: Designer
Use DataStage Designer to:
DataStage: Director
Use DataStage Director to run, schedule, and monitor your DataStage jobs. You can also gather statistics as a job runs. Director is also used to view job logs for debugging.
DataStage: Getting Started
DataStage Designer: Developing a job
DataStage Designer: Input Stage
DataStage Designer: Transformer Stage
The Transformer stage performs any data conversion required before the data is output to another stage in the job design. When you are done, compile and run the job.
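A rough Python analogue of the two kinds of Transformer logic named in the editor's notes: a constraint that decides whether a row passes (rows that fail go to a reject list) and per-column derivations. In DataStage these are expressions entered in the Transformer editor; here they are plain Python lambdas, and the transformer() helper, the event rows, and the column names are hypothetical.

```python
def transformer(rows, constraint, derivations):
    """Rough analogue of a Transformer stage with one output link and a reject link."""
    output, rejects = [], []
    for row in rows:
        if not constraint(row):
            rejects.append(row)  # row fails the constraint: send it to rejects
            continue
        # Each output column is computed by its own derivation.
        output.append({col: derive(row) for col, derive in derivations.items()})
    return output, rejects


# Hypothetical input rows, e.g. e-mail campaign events.
events = [
    {"contact_id": "101", "event": "OPEN"},
    {"contact_id": "", "event": "CLICK"},
    {"contact_id": "102", "event": "click"},
]

out, rejected = transformer(
    events,
    constraint=lambda r: r["contact_id"] != "",  # no key: reject the row
    derivations={
        "contact_id": lambda r: int(r["contact_id"]),
        "opened": lambda r: r["event"].upper() == "OPEN",
        "clicked": lambda r: r["event"].upper() == "CLICK",
    },
)
print(out)
print(rejected)
```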
DataStage example: Preventing the header row from being inserted into MDM_Contact and MDM_Broker
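One common way to do this, assuming the file is read by a Sequential File stage, is to set that stage's “First Line is Column Names” property; another is a Transformer constraint that rejects any row whose values simply repeat the column names. A minimal Python sketch of the constraint approach follows; the MDM_Contact column list and the helper names are hypothetical.

```python
import csv
import io

COLUMNS = ["contact_id", "name", "email"]  # hypothetical MDM_Contact layout


def is_header(row: list[str]) -> bool:
    """True if a row just repeats the column names (case-insensitive)."""
    return [v.strip().lower() for v in row] == COLUMNS


def rows_to_load(f) -> list[list[str]]:
    # Read every line as data, then apply a constraint that drops header rows.
    return [row for row in csv.reader(f) if not is_header(row)]


source = io.StringIO(
    "contact_id,name,email\n"
    "101,Alice,alice@example.com\n"
    "102,Bob,bob@example.com\n"
)
print(rows_to_load(source))
```

A constraint like this also catches header lines that appear mid-file, for example when several extract files have been concatenated, which a first-line-only setting would not.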
Questions?
Thank you for attending

Editor's Notes

  1. Master data management – Reliably create and maintain consistent, complete, contextual, and accurate business information about entities such as customers and products across multiple systems.
     Business intelligence – Take the guesswork out of important decisions by gathering, storing, analyzing, and providing access to diverse enterprise information.
     Business transformation – Completely isolate users and applications from the underlying information to enable On Demand Business.
     Infrastructure rationalization – Quickly and accurately streamline corporate information by repurposing and reconciling data whenever it is required.
     Risk and compliance – Deliver a dependable information management foundation for any quality control, corporate reporting visibility, and data audit infrastructure.
  2. DS Administrator is used for administration tasks such as setting up users, logging, creating and moving projects, and setting up purging criteria.
  3. Permissions – Assign user categories to operating system user groups, or enable operators to view all the details of an event in a job log file.
     Tracing – Enable or disable tracing on the server.
     Schedule – Set up a user name and password to use for running scheduled DataStage jobs.
     Mainframe – Set mainframe job properties and the default platform type.
     Tunables – Configure cache settings for Hashed File stages.
     Parallel – Set parallel job properties and defaults for date/time and number formats.
     Sequence – Set compilation defaults for job sequences.
     Remote – If you have specified that parallel jobs in the project are to be deployed on a USS system, this page allows you to specify the deployment mode and USS machine details.
  4. DataStage Designer – used to create DataStage applications (known as jobs). Each job specifies the data sources, the transformations required, and the destination of the data. Jobs are compiled to create executables that are scheduled by the Director and run on the server.
  5. DataStage Director – used to validate, schedule, run, and monitor DataStage job sequences.
  6. Constraint – Prevents rows from passing into the processing part of the ETL job (such rows are rejected).
     Derivation – Logic at the field level (example: is it “open”? (“click through”)).