Adriano Patrick Cunha 
ETL in DW Real-Time 
Adriano Patrick do N. Cunha
Concepts
Data Warehouse (DW)
“is a prominent approach to materialized data integration. Data of interest, scattered across multiple heterogeneous sources is integrated into a central database system.” (Jörg and Dessloch)
“provides information for analytical processing, decision making and data mining tools. A DW collects data from multiple heterogeneous operational source systems (OLTP) and stores summarized integrated business data in a central repository used by analytical applications (OLAP).” (Santos and Bernardino)
ETL – Extraction, Transformation and Loading
“[ETL] is a process that extracts the data from the source system, transforms the data according to business rules, and loads the results into the target data warehouse.” (Kakish and Kraft)
Actions:
1) The identification of relevant information at the source side.
2) The extraction of this information.
3) The customization and integration of the information coming from multiple sources into a common format.
4) The cleaning of the resulting data set on the basis of database and business rules.
5) The propagation of the data to the DW and the data marts (DM).
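The five actions above can be sketched as a tiny in-memory pipeline. This is only an illustration under stated assumptions: the two sources (`CRM_ROWS`, `ERP_ROWS`), their field names, and the "a customer must have a name" rule are hypothetical and do not come from the slides.

```python
# Hypothetical in-memory sources; real sources would be databases or files.
CRM_ROWS = [
    {"id": 1, "name": " Ana ", "country": "br", "internal_flag": "x"},
    {"id": 2, "name": "", "country": "BR", "internal_flag": "y"},
]
ERP_ROWS = [{"customer_id": 3, "full_name": "Bob", "country_code": "US"}]


def extract_crm():
    # Actions 1-2: identify the relevant fields and extract only those.
    return [{"id": r["id"], "name": r["name"], "country": r["country"]}
            for r in CRM_ROWS]


def extract_erp():
    # Action 3: map the second source onto the same common format.
    return [{"id": r["customer_id"], "name": r["full_name"],
             "country": r["country_code"]} for r in ERP_ROWS]


def clean(rows):
    # Action 4: apply simple (assumed) database and business rules.
    cleaned = []
    for r in rows:
        name = r["name"].strip()
        if not name:
            continue  # assumed rule: a customer must have a name
        cleaned.append({"id": r["id"], "name": name,
                        "country": r["country"].upper()})
    return cleaned


def load(rows, warehouse):
    # Action 5: propagate the cleaned rows to the target store.
    for r in rows:
        warehouse[r["id"]] = r


warehouse = {}
load(clean(extract_crm() + extract_erp()), warehouse)
```

In this sketch the warehouse is a plain dictionary keyed by customer id; the row with the empty name is rejected during cleaning.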
Data Warehouse (DW) – Data Quality Dimensions 
Completeness 
Conformity 
Consistency 
Accuracy 
Duplication 
Integrity
ETL Process 
Extract 
“Taking out the data from a variety of disparate source systems correctly is often the most challenging aspect of ETL ...”
“The goal of the extraction phase is to convert the data into a single format which is appropriate for the transformation process...”
Typical source formats: relational databases, flat files, IMS, VSAM, ISAM, etc.
“Most of the time the data in the source system is very complex, thus determining which data is relevant is very difficult...”
(Kakish and Kraft)
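Logical methods for extraction:
Full extraction: all data is extracted, so there is no need to keep track of changes in the source.
Incremental extraction: only the data changed since the last extraction is extracted, which requires a CDC (change data capture) mechanism.
Staging area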
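The difference between the two logical methods can be illustrated with a small sketch. It assumes a hypothetical `orders` table with an `updated_at` audit column and uses a timestamp watermark as the change-tracking mechanism; this is one possible CDC approach, not necessarily the one the slides had in mind.

```python
import sqlite3

# Hypothetical source table with an audit timestamp (ISO 8601 strings).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T10:00:00"),
        (2, 20.0, "2024-01-02T09:30:00"),
        (3, 30.0, "2024-01-03T14:15:00"),
    ],
)


def full_extract(conn):
    # Full extraction: read everything, no change tracking needed.
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def incremental_extract(conn, last_watermark):
    # Incremental extraction: read only rows changed after the watermark.
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen, if any.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


all_rows = full_extract(conn)
changed, watermark = incremental_extract(conn, "2024-01-01T23:59:59")
```

The watermark returned by one run would be stored and passed to the next run, so each incremental extraction picks up where the previous one stopped.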
Physical methods for extraction:
Online extraction: the ETL process connects directly to the source system and extracts the data in a preconfigured format.
Offline extraction: the extracted data is staged outside the source system.
Transform 
Types of transformation:
1. Selecting only certain columns to load;
2. Translating coded values (e.g., the source stores 1 for male and 2 for female, but the DW stores M and F);
3. Encoding free-form values (e.g., mapping “Male” to “1”);
4. Deriving a new calculated value;
5. Sorting;
6. Joining data from multiple sources and removing duplicate data;
7. Aggregation;
8. Generating surrogate-key values.
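Several of the transformations listed above can be combined in one small function. The sketch below is illustrative only: the input rows, the field names, and the `row_sk` surrogate key are hypothetical, and the gender mapping follows the 1/2 to M/F example from the list.

```python
from itertools import count

# Hypothetical extracted rows; "note" is a column we choose not to load.
source_rows = [
    {"customer": "ana", "gender": 2, "price": 10.0, "qty": 3, "note": "x"},
    {"customer": "bob", "gender": 1, "price": 5.0, "qty": 2, "note": "y"},
    {"customer": "ana", "gender": 2, "price": 10.0, "qty": 3, "note": "dup"},
]

GENDER_CODES = {1: "M", 2: "F"}  # translation of coded values


def transform(rows):
    seen = set()
    next_key = count(1)  # surrogate-key generator
    out = []
    for r in rows:
        natural_key = (r["customer"], r["price"], r["qty"])
        if natural_key in seen:
            continue  # remove duplicate data
        seen.add(natural_key)
        out.append({
            "row_sk": next(next_key),             # surrogate key
            "customer": r["customer"],            # selected column
            "gender": GENDER_CODES[r["gender"]],  # translated code
            "total": r["price"] * r["qty"],       # derived value
        })
    return sorted(out, key=lambda row: row["total"])  # sorting


result = transform(source_rows)
```

Here the third input row is dropped as a duplicate, and the two remaining rows come back ordered by the derived `total`.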
Types of transformation (continued):
1. Transposing or pivoting (turning multiple columns into multiple rows or vice versa);
2. Splitting a column into multiple columns;
3. Disaggregation of repeating columns into a separate detail table;
4. Looking up and validating the relevant data from tables or reference files for slowly changing dimensions; and
5. Applying any form of simple or complex data validation.
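Pivoting, column splitting, and lookup-based validation from the list above can be sketched as follows. The helper names (`unpivot`, `split_name`, `validate_country`), the field names, and the reference set `VALID_COUNTRIES` are assumptions made for the example.

```python
def unpivot(row, id_field, value_fields):
    # Turn multiple columns (q1, q2, ...) into multiple rows.
    return [
        {id_field: row[id_field], "quarter": field, "sales": row[field]}
        for field in value_fields
    ]


def split_name(full_name):
    # Split one column into two at the first space.
    first, _, last = full_name.strip().partition(" ")
    return {"first_name": first, "last_name": last}


VALID_COUNTRIES = {"BR", "US"}  # hypothetical reference (lookup) data


def validate_country(code):
    # Validate a value against the reference data.
    return code in VALID_COUNTRIES


wide = {"store": "S1", "q1": 100, "q2": 120}
long_rows = unpivot(wide, "store", ["q1", "q2"])
name_parts = split_name("Ada Lovelace")
```

A real slowly changing dimension lookup would also compare attribute versions and validity dates; the set membership test above only shows the validation step.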
Load 
Mechanisms to load include:
1. SQL*Loader: used to load flat files into the DW;
2. External tables: expose external data as a virtual table that can be queried and joined;
3. Oracle Call Interface (OCI): an API used when the transformation process is done outside the database;
4. Export/Import.
Types of ETL
CDC - Change Data Capture 
Snapshot sources: the source is extracted to a file, which is then compared with the previous version of the file to find the changes.
Logged sources: changes are recorded in change logs, usually by triggers that store each change. The log can also be written by the business logic of the applications, or obtained with DBMS-specific utilities, such as database log scraping or log sniffing, that read the transaction log.
Timestamped sources: the tables have audit attributes that indicate when a row was created or changed.
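The snapshot approach amounts to comparing two extracts keyed by a primary key. A minimal sketch, assuming each snapshot has already been parsed into a dictionary from key to row (the sample data is hypothetical):

```python
def diff_snapshots(previous, current):
    # Rows present only in the new snapshot were inserted.
    inserted = {k: v for k, v in current.items() if k not in previous}
    # Rows present only in the old snapshot were deleted.
    deleted = {k: v for k, v in previous.items() if k not in current}
    # Rows present in both but with different values were updated.
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserted, updated, deleted


previous = {1: ("Ana", "BR"), 2: ("Bob", "US"), 3: ("Cy", "DE")}
current = {1: ("Ana", "BR"), 2: ("Bob", "CA"), 4: ("Dee", "FR")}

inserted, updated, deleted = diff_snapshots(previous, current)
```

Unlike a timestamp filter, the comparison also detects deletions, at the cost of reading both snapshots in full.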
[Figure: CDC with snapshot sources]
[Figure: CDC with logged sources]
References
Thomas Jörg, Stefan Dessloch (2010). Near real-time data warehousing using state-of-the-art ETL tools. Lecture Notes in Business Information Processing 41 LNBIP.
Ricardo Jorge Santos, Jorge Bernardino (2008). Real-time data warehouse loading methodology. Proceedings of the 2008 international symposium on Database engineering & applications - IDEAS '08. http://portal.acm.org/citation.cfm?doid=1451940.1451949
Janis Zuters (2011). Near real-time data warehousing with multi-stage trickle and flip. Lecture Notes in Business Information Processing 90 LNBIP.
Jie Song, Yubin Bao, Jingang Shi (2010). A triggering and scheduling approach for ETL in a real-time data warehouse. Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010.
Joseph Guerra, David A. Andrews (2011). Creating a Real Time Data Warehouse. Andrews Consulting Group.
Kamal Kakish, Theresa A. Kraft (2012). ETL Evolution for Real-Time Data Warehousing. Proceedings of the Conference on Information Systems Applied Research, p. 1-12. www.aitp-edsig.org
All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.
Thank you … 
adriano.patrick@unifor.br 
adrianopatrickc

Más contenido relacionado

La actualidad más candente

ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita DubeyAnkita Dubey
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?HEXANIKA
 
10 basic terms so you can talk to data engineer
10 basic terms so you can  talk to data engineer10 basic terms so you can  talk to data engineer
10 basic terms so you can talk to data engineerWorapol Alex Pongpech, PhD
 
Etl overview training
Etl overview trainingEtl overview training
Etl overview trainingMondy Holten
 
DATA WAREHOUSE -- ETL testing Plan
DATA WAREHOUSE -- ETL testing PlanDATA WAREHOUSE -- ETL testing Plan
DATA WAREHOUSE -- ETL testing PlanMadhu Nepal
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...rajappaiyer
 
Day 6.4 extraction__lo
Day 6.4 extraction__loDay 6.4 extraction__lo
Day 6.4 extraction__lotovetrivel
 
Day 8.1 system_admin_tasks
Day 8.1 system_admin_tasksDay 8.1 system_admin_tasks
Day 8.1 system_admin_taskstovetrivel
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
Day 6.1 and_6.2__flat_files_and_service_api
Day 6.1 and_6.2__flat_files_and_service_apiDay 6.1 and_6.2__flat_files_and_service_api
Day 6.1 and_6.2__flat_files_and_service_apitovetrivel
 
Data Archiving -Ramesh sap bw
Data Archiving -Ramesh sap bwData Archiving -Ramesh sap bw
Data Archiving -Ramesh sap bwramesh rao
 
Catalogic DPX: Dashboard Reporting with Microsoft Power BI
Catalogic DPX: Dashboard Reporting with Microsoft Power BICatalogic DPX: Dashboard Reporting with Microsoft Power BI
Catalogic DPX: Dashboard Reporting with Microsoft Power BICatalogic Software
 

La actualidad más candente (20)

ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita Dubey
 
ETL Process
ETL ProcessETL Process
ETL Process
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
 
Introduction To Pentaho Kettle
Introduction To Pentaho KettleIntroduction To Pentaho Kettle
Introduction To Pentaho Kettle
 
Etl
EtlEtl
Etl
 
Sap business objects interview questions
Sap business objects interview questionsSap business objects interview questions
Sap business objects interview questions
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
10 basic terms so you can talk to data engineer
10 basic terms so you can  talk to data engineer10 basic terms so you can  talk to data engineer
10 basic terms so you can talk to data engineer
 
Etl overview training
Etl overview trainingEtl overview training
Etl overview training
 
DATA WAREHOUSE -- ETL testing Plan
DATA WAREHOUSE -- ETL testing PlanDATA WAREHOUSE -- ETL testing Plan
DATA WAREHOUSE -- ETL testing Plan
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
ECSA 2013 (Cuesta)
ECSA 2013 (Cuesta)ECSA 2013 (Cuesta)
ECSA 2013 (Cuesta)
 
Hadoop & Data Warehouse
Hadoop & Data Warehouse Hadoop & Data Warehouse
Hadoop & Data Warehouse
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
 
Day 6.4 extraction__lo
Day 6.4 extraction__loDay 6.4 extraction__lo
Day 6.4 extraction__lo
 
Day 8.1 system_admin_tasks
Day 8.1 system_admin_tasksDay 8.1 system_admin_tasks
Day 8.1 system_admin_tasks
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Day 6.1 and_6.2__flat_files_and_service_api
Day 6.1 and_6.2__flat_files_and_service_apiDay 6.1 and_6.2__flat_files_and_service_api
Day 6.1 and_6.2__flat_files_and_service_api
 
Data Archiving -Ramesh sap bw
Data Archiving -Ramesh sap bwData Archiving -Ramesh sap bw
Data Archiving -Ramesh sap bw
 
Catalogic DPX: Dashboard Reporting with Microsoft Power BI
Catalogic DPX: Dashboard Reporting with Microsoft Power BICatalogic DPX: Dashboard Reporting with Microsoft Power BI
Catalogic DPX: Dashboard Reporting with Microsoft Power BI
 

Destacado

Using Continuous Etl With Real Time Queries To Eliminate My Sql Bottlenecks
Using Continuous Etl With Real Time Queries To Eliminate My Sql BottlenecksUsing Continuous Etl With Real Time Queries To Eliminate My Sql Bottlenecks
Using Continuous Etl With Real Time Queries To Eliminate My Sql BottlenecksMySQLConference
 
Processing Near Real-Time Global Vessel Data
Processing Near Real-Time Global Vessel DataProcessing Near Real-Time Global Vessel Data
Processing Near Real-Time Global Vessel DataSafe Software
 
High volume real time contiguous etl and audit
High volume real time contiguous etl and auditHigh volume real time contiguous etl and audit
High volume real time contiguous etl and auditRemus Rusanu
 
Hand Coding ETL Scenarios and Challenges
Hand Coding ETL Scenarios and ChallengesHand Coding ETL Scenarios and Challenges
Hand Coding ETL Scenarios and Challengesmark madsen
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processingYogi Devendra Vyavahare
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 

Destacado (8)

Using Continuous Etl With Real Time Queries To Eliminate My Sql Bottlenecks
Using Continuous Etl With Real Time Queries To Eliminate My Sql BottlenecksUsing Continuous Etl With Real Time Queries To Eliminate My Sql Bottlenecks
Using Continuous Etl With Real Time Queries To Eliminate My Sql Bottlenecks
 
Processing Near Real-Time Global Vessel Data
Processing Near Real-Time Global Vessel DataProcessing Near Real-Time Global Vessel Data
Processing Near Real-Time Global Vessel Data
 
High volume real time contiguous etl and audit
High volume real time contiguous etl and auditHigh volume real time contiguous etl and audit
High volume real time contiguous etl and audit
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Hand Coding ETL Scenarios and Challenges
Hand Coding ETL Scenarios and ChallengesHand Coding ETL Scenarios and Challenges
Hand Coding ETL Scenarios and Challenges
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processing
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 

Similar a ETL DW-RealTime

DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)Johannes Hoppe
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsRhonda Cetnar
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATANEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATAcsandit
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA cscpconf
 
BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessJawaherAlbaddawi
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...Shahzad
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
Designing TCS e-Infrastructure: data, metadata and architecture
Designing TCS e-Infrastructure: data, metadata and architecture Designing TCS e-Infrastructure: data, metadata and architecture
Designing TCS e-Infrastructure: data, metadata and architecture Daniele Bailo
 
An Overview on Data Quality Issues at Data Staging ETL
An Overview on Data Quality Issues at Data Staging ETLAn Overview on Data Quality Issues at Data Staging ETL
An Overview on Data Quality Issues at Data Staging ETLidescitation
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 

Similar a ETL DW-RealTime (20)

DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
 
ETL Process
ETL ProcessETL Process
ETL Process
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL Tools
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATANEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business business
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
DW 101
DW 101DW 101
DW 101
 
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
 
Etl techniques
Etl techniquesEtl techniques
Etl techniques
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Designing TCS e-Infrastructure: data, metadata and architecture
Designing TCS e-Infrastructure: data, metadata and architecture Designing TCS e-Infrastructure: data, metadata and architecture
Designing TCS e-Infrastructure: data, metadata and architecture
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 
DMDW 1st module.pdf
DMDW 1st module.pdfDMDW 1st module.pdf
DMDW 1st module.pdf
 
An Overview on Data Quality Issues at Data Staging ETL
An Overview on Data Quality Issues at Data Staging ETLAn Overview on Data Quality Issues at Data Staging ETL
An Overview on Data Quality Issues at Data Staging ETL
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 

Más de Adriano Patrick Cunha (8)

Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Desenvolvimento web e mobile ifce
Desenvolvimento web e mobile   ifceDesenvolvimento web e mobile   ifce
Desenvolvimento web e mobile ifce
 
Recuperacao Falhas em Sistemas Workflow
Recuperacao Falhas em Sistemas WorkflowRecuperacao Falhas em Sistemas Workflow
Recuperacao Falhas em Sistemas Workflow
 
Congresso TI - Qualidade de Código.
Congresso TI - Qualidade de Código.Congresso TI - Qualidade de Código.
Congresso TI - Qualidade de Código.
 
Concurrencyproblem
ConcurrencyproblemConcurrencyproblem
Concurrencyproblem
 
Article K-OPT in JSSP
Article K-OPT in JSSPArticle K-OPT in JSSP
Article K-OPT in JSSP
 
Natuurweb
NatuurwebNatuurweb
Natuurweb
 
Natuur mobile
Natuur mobileNatuur mobile
Natuur mobile
 

Último

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

ETL DW-RealTime

  • 1. ETL in DW Real-Time, by Adriano Patrick do N. Cunha
  • 2. Concepts: Data Warehouse (DW)
  • 3. Concepts: Data Warehouse (DW)
      “is a prominent approach to materialized data integration. Data of interest, scattered across multiple heterogeneous sources is integrated into a central database system.” (Jörg and Dessloch)
      “provides information for analytical processing, decision making and data mining tools. A DW collects data from multiple heterogeneous operational source systems OLTP and stores summarized integrated business data in a central repository used by analytical applications OLAP” (Santos and Bernardino)
  • 4. Concepts: ETL – Extraction, Transformation and Loading
      “Is a process extract the data from source system, transforms the data according to business rule, and loads results into the target data warehouse.” (Kakish and Kraft)
      Actions:
      1) The identification of relevant information at the source side.
      2) The extraction of this information.
      3) The customization and integration of the information coming from multiple sources into a common format.
      4) The cleaning of the resulting data set on the basis of database and business rules.
      5) The propagation of the data to the DW and data marts (DM).
  • 5. Concepts: Data Warehouse (DW) – Data Quality Dimensions: Completeness, Conformity, Consistency, Accuracy, Duplication, Integrity
  • 6. ETL Process: Extract
      “Taking out the data from a variety of disparate source system correctly is often the most challenging aspect of ETL ...”
      “The goal of the extraction phase is to convert the data into a single format which is appropriate for transformation process...”
      Source types: relational DB, flat files, IMS, VSAM, ISAM, etc.
      “Most of the time the data in source system is very complex, thus determining which data is relevant is very difficult...” (Kakish and Kraft)
  • 7. ETL Process: Extract. Logical methods for extraction:
      Full extraction: no need to keep track of changes.
      Incremental extraction: relies on a CDC mechanism.
      Staging area.
  • 8. ETL Process: Extract. Physical methods for extraction:
      Online extraction: connects to the source system and extracts the data in a preconfigured format.
      Offline extraction: the extracted data is staged outside the source system.
  • 9. ETL Process: Transform. Types of transformation:
      1. Selecting only certain columns to load;
      2. Translating coded values (e.g., the source stores 1 for male and 2 for female, but the DW stores M and F);
      3. Encoding free-form values (mapping “Male” to “1”);
      4. Deriving a new calculated value;
      5. Sorting;
      6. Joining data from multiple sources and removing duplicate data;
      7. Aggregation;
      8. Generating surrogate-key values;
  • 10. ETL Process: Transform. Types of transformation (continued):
      1. Transposing or pivoting (turning multiple columns into multiple rows or vice versa);
      2. Splitting a column into multiple columns;
      3. Disaggregating repeating columns into a separate detail table;
      4. Looking up and validating the relevant data from tables or referential files for slowly changing dimensions; and
      5. Applying any form of simple or complex data validation.
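A few of the transformation types listed on slides 9 and 10 can be sketched in Python. This is only an illustration under stated assumptions: the column names (`customer_id`, `gender`, `quantity`, `unit_price`, `notes`) and the `transform` helper are hypothetical and do not come from the slides. The sketch selects columns, translates coded values, derives a calculated value, removes duplicates and generates surrogate keys.

```python
from itertools import count

# Hypothetical mapping: the source stores 1/2, the DW stores M/F.
GENDER_CODES = {1: "M", 2: "F"}


def transform(rows):
    """Apply a few of the listed transformations to an iterable of source rows."""
    surrogate_keys = count(start=1)  # generates surrogate-key values
    seen = set()                     # natural keys already emitted
    result = []
    for row in rows:
        if row["customer_id"] in seen:
            continue                 # remove duplicate records
        seen.add(row["customer_id"])
        result.append({
            "customer_sk": next(surrogate_keys),
            "customer_id": row["customer_id"],              # selected column
            "gender": GENDER_CODES.get(row["gender"]),      # translated code, None if unknown
            "total": row["quantity"] * row["unit_price"],   # derived value
        })
    return result


source_rows = [
    {"customer_id": 10, "gender": 1, "quantity": 2, "unit_price": 5.0, "notes": "x"},
    {"customer_id": 11, "gender": 2, "quantity": 1, "unit_price": 3.5, "notes": "y"},
    {"customer_id": 10, "gender": 1, "quantity": 2, "unit_price": 5.0, "notes": "x"},
]
warehouse_rows = transform(source_rows)
```

The `notes` column is simply not copied, which is the "selecting only certain columns" case. A real pipeline would normally persist the surrogate-key counter between runs rather than restart it at 1.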
  • 11. ETL Process: Load. Mechanisms to load include:
      1. SQL*Loader: used to load flat files into the DW;
      2. External tables: present data as a virtual table that can be queried and joined;
      3. Oracle Call Interface (OCI): an API used when the transformation process is done outside the database;
      4. Export/Import.
  • 12. Types of ETL
  • 13. CDC - Change Data Capture
      Snapshot sources: the source is extracted to a file, which is then compared with the previous version of the file.
      Logged sources: change logs are used, usually populated by triggers that record the changes. The log can also be written by the applications' business logic or obtained with DBMS-specific utilities, such as database log scraping or log sniffing, which read the logged transactions.
      Timestamped sources: the tables have audit attributes that indicate when a row was created or changed.
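The timestamped-source approach from slide 13 can be sketched with Python's built-in `sqlite3` module. The `customer` table, its `updated_at` audit column and the `extract_changed_since` helper are assumptions made for illustration; the slides do not name a schema. Timestamps are stored as ISO-8601 text, so plain string comparison orders them correctly.

```python
import sqlite3


def extract_changed_since(conn, last_run):
    """Return the rows whose audit column is newer than the previous extraction."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM customer "
        "WHERE updated_at > ? ORDER BY id",
        (last_run,),
    )
    return cur.fetchall()


# Hypothetical source table with an audit column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "Ana", "2024-01-01T08:00:00"),
        (2, "Bruno", "2024-01-02T09:30:00"),
        (3, "Carla", "2024-01-03T10:15:00"),
    ],
)

# Only rows changed after the last extraction are picked up.
changed = extract_changed_since(conn, "2024-01-01T12:00:00")
```

A limitation of this approach is that deleted rows leave no timestamp behind, so deletions are not detected by the query alone.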
  • 14. CDC - Change Data Capture: Snapshot Sources
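The snapshot comparison can be sketched as a diff between two extracts keyed by primary key. This is a minimal in-memory illustration, assuming each snapshot fits in a dictionary; `diff_snapshots` and the sample data are hypothetical, and a real job would read the two extract files instead.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key and classify the changes."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return inserted, updated, deleted


# Hypothetical extracts from two consecutive runs.
yesterday = {1: ("Ana", "Fortaleza"), 2: ("Bruno", "Recife"), 3: ("Carla", "Natal")}
today = {1: ("Ana", "Fortaleza"), 2: ("Bruno", "Salvador"), 4: ("Davi", "Natal")}

inserted, updated, deleted = diff_snapshots(yesterday, today)
```

Unlike the timestamp-based approach, the comparison also reveals deletions, at the cost of extracting and scanning the full source on every run.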
  • 15. CDC - Change Data Capture: Logged Sources
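A trigger-based change log, one of the logged-source variants, can be sketched with SQLite through Python's `sqlite3` module. The `customer` and `customer_log` tables and the trigger names are assumptions chosen for the example, not taken from the slides. Each trigger appends one row to the log, which the ETL job can later read in sequence order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change log filled by the triggers below.
CREATE TABLE customer_log (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT,
    id   INTEGER,
    name TEXT
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_log (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_log (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;

CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_log (op, id, name) VALUES ('D', OLD.id, OLD.name);
END;
""")

# Ordinary application statements; the triggers record each change.
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("UPDATE customer SET name = 'Ana Maria' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

# The ETL job reads the log in the order the changes happened.
changes = conn.execute(
    "SELECT op, id, name FROM customer_log ORDER BY seq"
).fetchall()
```

The trade-off is that triggers add write overhead to the operational system, which is one reason log scraping is sometimes preferred.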
  • 16. Bibliography
      Thomas Jörg, Stefan Dessloch (2010). Near real-time data warehousing using state-of-the-art ETL tools. Lecture Notes in Business Information Processing 41 LNBIP.
      Ricardo Jorge Santos, Jorge Bernardino (2008). Real-time data warehouse loading methodology. Proceedings of the 2008 international symposium on Database engineering & applications - IDEAS '08. http://portal.acm.org/citation.cfm?doid=1451940.1451949
      Janis Zuters (2011). Near real-time data warehousing with multi-stage trickle and flip. Lecture Notes in Business Information Processing 90 LNBIP.
      Jie Song, Yubin Bao, Jingang Shi (2010). A triggering and scheduling approach for ETL in a real-time data warehouse. Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010.
      Joseph Guerra, David A. Andrews (2011). Creating a Real Time Data Warehouse. Andrews Consulting Group.
      Kamal Kakish, Theresa A. Kraft (2012). ETL Evolution for Real-Time Data Warehousing. Proceedings of the Conference on Information Systems Applied Research, p. 1-12. www.aitp-edsig.org
  • 17. Thank you … Adriano Patrick Cunha, adriano.patrick@unifor.br, adrianopatrickc
      All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.