ETL Overview (Extract, Transform, and Load)

IBM Ascential ETL Overview: DataStage and QualityStage

More than ever, businesses today need to understand their operations, customers, suppliers, partners, employees, and stockholders. They need to know what is happening with the business, analyze their operations, react to market conditions, and make the right decisions to drive revenue growth, increase profits, and improve productivity and efficiency.

CIOs are responding to their organizations’ strategic needs by developing IT initiatives that align corporate data with business objectives. These initiatives include business intelligence, master data management, business transformation, infrastructure rationalization, and risk and compliance.

The IBM WebSphere Information Integration platform enables businesses to perform five key integration functions:

Data Analysis: Define, annotate, and report on fields of business data.
Data Transformation & Movement: Move data and transform it to meet the requirements of its target systems.
Software: QualityStage; Software: DataStage; Software: N/A (not used at NCEN); Software: QualityStage; Software: DataStage
This presentation will deal with the ETL tools QualityStage and DataStage.

QualityStage = data cleansing. QualityStage is used to cleanse and enrich data to meet business needs and data quality management standards.

QualityStage Main QS stages used in the BRM project: Investigate, Standardize, Match, and Survive.

QualityStage Investigate  Standardize  Match  Survive

QualityStage Investigate  Standardize  Match  Survive For the United States, the address data would include:
USPREP (parses name, address, and area if data not previously formatted)
USNAME (for individual and organization names)
USADDR (for street and mailing addresses)
USAREA (for city, state, ZIP code, and so on)

QualityStage Investigate  Standardize  Match  Survive Example: The test field “123 St. Virginia St.” would be analyzed in the following way: Field parsing breaks the address into the individual tokens “123”, “St.”, “Virginia”, and “St.”
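The field-parsing step in the example above can be sketched in a few lines of Python. This is only an illustration, not QualityStage itself: the function names, the tiny `STREET_TYPES` lookup, and the class labels are assumptions made for the sketch, and a real Rule Set is far more extensive.

```python
import re

# Hypothetical lookup of street-type abbreviations (assumption for this sketch).
STREET_TYPES = {"st", "st.", "ave", "ave.", "blvd", "blvd.", "rd", "rd."}


def parse_field(text):
    """Split a free-form field into whitespace-delimited tokens."""
    return text.split()


def classify(token):
    """Assign a coarse class to one token (labels are illustrative only)."""
    if token.isdigit():
        return "number"
    if token.lower() in STREET_TYPES:
        return "street type"
    if re.fullmatch(r"[A-Za-z]+", token):
        return "alpha"
    return "unknown"


tokens = parse_field(" 123 St. Virginia St. ")
labels = [(token, classify(token)) for token in tokens]
for token, label in labels:
    print(f"{token!r}: {label}")
```

Note that a token-by-token lookup labels both occurrences of “St.” the same way; deciding that the first one belongs to the street name “St. Virginia” requires looking at the surrounding tokens, which this sketch does not attempt.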

QualityStage Investigate  Standardize  Match  Survive The Standardize stage allows you to reformat data from multiple systems to ensure that each data type has the correct and consistent content and format.

QualityStage Investigate  Standardize  Match  Survive Standardization is used to invoke specific standardization Rule Sets and standardize one or more fields using that Rule Set. For example, a Rule Set can be used so that “Boulevard” will always be “Blvd”, not “Boulevard”, “Blv.”, “Boulev”, or some other variation. The list below shows some of the more commonly-used Rule Sets.
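The “Boulevard” to “Blvd” idea can be illustrated with a small lookup-based rewrite. This is a minimal sketch, assuming a hand-written variant table; the `VARIANTS` dictionary and the `standardize` function are hypothetical names and do not reflect how a QualityStage Rule Set is actually defined.

```python
import re

# Hypothetical variant table: every known spelling maps to one standard form.
VARIANTS = {
    "boulevard": "Blvd",
    "boulev": "Blvd",
    "blv": "Blvd",
    "blvd": "Blvd",
}


def standardize(address):
    """Replace known variants word by word; leave all other words untouched."""

    def replace(match):
        word = match.group(0)
        key = word.rstrip(".").lower()  # ignore case and a trailing period
        return VARIANTS.get(key, word)

    # A "word" here is a run of letters with an optional trailing period.
    return re.sub(r"[A-Za-z]+\.?", replace, address)


for raw in ["100 Sunset Boulevard", "100 Sunset Blv.", "100 Sunset Boulev"]:
    print(raw, "->", standardize(raw))
```

Keeping the variants in a table rather than in code means new spellings can be added without touching the rewrite logic, which is roughly the role a Rule Set plays.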

QualityStage Investigate  Standardize  Match  Survive Data matching is used to find records in a single data source or independent data sources that refer to the same entity (such as a person, organization, location, product, or material) regardless of the availability of a predetermined key.

QualityStage Investigate  Standardize  Match  Survive The QualityStage Matching Stage basically consists of two steps:

QualityStage Investigate  Standardize  Match  Survive Operations in the Matching module: 1. Unduplication (group records into sets having similar attributes) 2. Processing Files
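Unduplication, in the sense of grouping records into sets with similar attributes, can be sketched as follows. This is a deliberately simplified stand-in: it groups records whose normalized name and ZIP code are identical, whereas a real matching stage scores similarity rather than requiring exact equality. The record layout and the `match_key` and `unduplicate` names are assumptions for the sketch.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a comparison key: lowercase name without punctuation, plus ZIP."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = " ".join(name.split())  # collapse repeated whitespace
    return (name, record["zip"])


def unduplicate(records):
    """Group records that share the same comparison key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


# Hypothetical sample data.
records = [
    {"id": 1, "name": "Acme Corp.", "zip": "22030"},
    {"id": 2, "name": "ACME  corp", "zip": "22030"},
    {"id": 3, "name": "Beta LLC", "zip": "10001"},
]
groups = unduplicate(records)
for group in groups:
    print([record["id"] for record in group])
```

Records 1 and 2 land in the same set even though they share no predetermined key, which is the point of the matching step.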

QualityStage Investigate  Standardize  Match  Survive Survivorship is used to create a ‘best record’ from all available information about an entity (such as a person, location, material, etc.). Survivorship and formatting ensure that the best available data survives and is correctly prepared for the target destination. Using the rules setup screen, it implements business and mapping rules, creating the necessary output structures for the target application and identifying fields that do not conform to load standards.
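Building a “best record” from a set of matched records can be sketched like this. The survivorship rule used here (for each field, keep the most recently updated non-empty value) is only one possible rule and is an assumption, as are the field names and the `survive` function; the rules actually configured in the Survive stage may differ.

```python
def survive(group, fields):
    """Build one record from a group, field by field."""
    # Newest records first, so the most recent non-empty value wins.
    ordered = sorted(group, key=lambda r: r["updated"], reverse=True)
    best = {}
    for field in fields:
        best[field] = next((r[field] for r in ordered if r.get(field)), None)
    return best


# Hypothetical matched group describing the same person.
group = [
    {"name": "J. Smith", "phone": "", "email": "js@example.com", "updated": "2006-01-10"},
    {"name": "John Smith", "phone": "555-0100", "email": "", "updated": "2007-03-02"},
]
best = survive(group, ["name", "phone", "email"])
print(best)
```

The surviving record takes the name and phone from the newer record and falls back to the older record for the email, so no single input record has to be complete.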

QualityStage Investigate  Standardize  Match  Survive The Survive stage does the following:

DataStage = data transformation

DataStage In its simplest form, DataStage performs data transformation and movement from source systems to target systems in batch and in real time. The data sources may include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications and message queues.

DataStage The DataStage client components are: Administrator, Manager, Designer, and Director.

DataStage Administrator  Manager  Designer  Director Use DataStage Administrator to:

DataStage Administrator  Manager  Designer  Director

DataStage Administrator  Manager  Designer  Director DataStage Manager is the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. It also allows us to move or copy ETL jobs from one project to another.

DataStage Administrator  Manager  Designer  Director Use DataStage Designer to:

DataStage Administrator  Manager  Designer  Director Use DataStage Director to run, schedule, and monitor your DataStage jobs. You can also gather statistics as the job runs and review the job logs for debugging.

DataStage: Getting Started

DataStage Designer Developing a job

DataStage Designer Input Stage

DataStage Designer Transformer Stage The Transformer stage performs any data conversion required before the data is output to another stage in the job design. After you are done, compile and run the job.

DataStage An example: Preventing the header row from inserting into MDM_Contact and MDM_Broker
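The steps of this slide's example are not preserved in the transcript, so the following is only a language-neutral illustration of the idea, not the DataStage job itself. It assumes the header row can be recognized because its first field holds the column name; the column names `ContactID` and `Name` and the `data_rows` helper are hypothetical.

```python
import csv
import io


def data_rows(lines, header_marker):
    """Yield parsed rows, skipping any row whose first field is the column name."""
    for row in csv.reader(lines):
        if row and row[0].strip().lower() == header_marker.lower():
            continue  # header row: do not pass it on to the target
        yield row


# Hypothetical source file with a header row.
sample = io.StringIO("ContactID,Name\n1,Ann\n2,Bob\n")
rows = list(data_rows(sample, "ContactID"))
print(rows)
```

Filtering on content rather than on position also catches header rows that reappear in the middle of concatenated files, at the cost of assuming no genuine data row starts with the column name.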

Editor's Notes