Capacity Model of an ETL system
Ashok Bhatla
Email – ASHOK.BHATLA.WRITER@GMAIL.COM
What is Business Intelligence?
Business Intelligence (BI) is a combination of tools, processes and software that helps a company transform data into actionable knowledge, allowing it to make faster, better-informed decisions in order to achieve its strategic goals.
It is all about providing the right information to management at the right time at the lowest possible cost.

Because organizations are drowning in data but starving for knowledge, Business Intelligence has become a top priority for IT managers today.
What is ETL?
ETL stands for Extract, Transform and Load. A transactional (OLTP) system is meant to be a high-performance system so that users can complete their work quickly. Running reports directly against a transactional system slows it down. ETL gained popularity because it copies the data into a separate store where reports can run without affecting the transactional system.

In computing, Extract, Transform and Load (ETL) refers to a database process that involves the following steps:
o Extract data from outside sources.
o Transform it to fit operational needs, which can include joining or reformatting tables.
o Load it into the end target (a database, more specifically an operational data store, data mart or data warehouse).
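The three steps can be illustrated with a minimal Python sketch. It assumes two small CSV extracts (hypothetical payroll and department data embedded as strings), joins them and aggregates salary per department in the transform step, and loads the result into an in-memory SQLite database standing in for the data warehouse. All table and column names are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts; in practice these would come from the OLTP systems.
PAYROLL_CSV = """emp_id,name,dept_id,salary
1,Asha,10,5000
2,Ravi,20,6200
3,Mei,10,5800
"""
DEPT_CSV = """dept_id,dept_name
10,Finance
20,Sales
"""


def extract(text):
    """Extract: read rows from a CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(payroll, depts):
    """Transform: join payroll to departments and total salary per department."""
    dept_names = {d["dept_id"]: d["dept_name"] for d in depts}
    totals = {}
    for row in payroll:
        name = dept_names.get(row["dept_id"], "Unknown")
        totals[name] = totals.get(name, 0) + int(row["salary"])
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the transformed rows into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dept_salary "
        "(dept_name TEXT PRIMARY KEY, total_salary INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO dept_salary VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    rows = transform(extract(PAYROLL_CSV), extract(DEPT_CSV))
    load(conn, rows)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
    print(run_etl(conn))
```

Keeping each step in its own function mirrors the staged flow of a real ETL tool, where extract, transform and load are scheduled and monitored separately.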
Example of ETL
Figure: data from OLTP systems (Cost Accounting System, Payroll Data, Sales Data, Purchasing Data) passes through ETL (joins, transforms, deletes, etc.) into Staged Data, which is then loaded into the EDW / Reporting Data.
What is Capacity Planning?
o Capacity planning is the process of identifying the current computing needs of a business application and forecasting its future computing needs based on business plans.
o In other words, it determines what computing resources are needed to meet an application’s service level objectives over a period of time.
o In today’s economic climate, business requirements can change rapidly depending on an organization’s strategy and goals.
o Therefore, a properly managed capacity plan should be able to take unforeseen requirements into account.
o Capacity planning can be done either casually or with organized, disciplined methodologies.
o The more data-driven the capacity planning, the more accurate the results.
Capacity Planning of an IT System
Capacity planning needs to ensure that all hardware (disks, memory, CPU and network), software resources (user licenses) and facilities are optimally used.

Figure: capacity planning covers software licenses and no. of users; servers, storage, networking and CPU; and data center space, power and cooling.
Capacity Planning
We cannot manage something we cannot measure. If no corrective action is taken based on the measured data, capacity planning is of no use.
Proactive capacity planning aims to:
o Avoid downtime by reducing the number of incidents
o Achieve the performance objectives established by the business
o Reduce the TCO (total cost of ownership) of the ETL system
o Achieve optimal utilization of computing resources
Capacity Planning Steps
o Identify service level objectives: know the requirements in business terms.
o Analyze current capacity: gather data about resource consumption, idle times and peak usage.
o Know the future business needs and plan for future capacity: determine how the IT systems will handle the increased load.
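As a sketch of the "analyze current capacity" step, the following Python summarizes a series of CPU utilization samples into average, 95th-percentile and peak usage, plus the fraction of idle intervals. The 10% idle threshold, the nearest-rank percentile method and the sample values are assumptions chosen for illustration, not part of the original methodology.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of all samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_utilization(samples, idle_threshold=10.0):
    """Summarize CPU utilization samples given as percent busy per interval."""
    return {
        "average": mean(samples),
        "p95": nearest_rank_percentile(samples, 95),
        "peak": max(samples),
        # Intervals below the (assumed) idle threshold count as idle.
        "idle_fraction": sum(s < idle_threshold for s in samples) / len(samples),
    }


# Hypothetical hourly CPU utilization (%) for one ETL server.
cpu = [5, 8, 20, 35, 40, 55, 60, 70, 85, 95]
print(summarize_utilization(cpu))
```

Reporting a high percentile alongside the peak helps separate sustained demand from one-off spikes when sizing hardware.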
Strike a Balance
Figure: a balance between the supply of and demand for resources, weighing performance, utilization and cost.
Moore’s Law is popularly summarized as IT getting faster and cheaper every 18 months or so (strictly, it observes that transistor counts double roughly every two years). But organizations cannot wait for the next generation of technology to become available, because they need to take care of business today.
A corollary of Parkinson’s Law (work expands to fill the time available) is that if you give users more resources, they will find ways to use them. IT managers cannot keep giving unlimited resources to users.
Capacity Challenges for ETL Systems
ETL jobs are of different types (full refresh and delta refresh), process varying amounts of data and are scheduled at different frequencies. As a result, the workload always has spikes and valleys.

In a transactional system, SQL queries are typically simple and do not require parallelism. In an ETL system, on the other hand, very large datasets are processed, and workloads are irregular and hard to predict. This makes it difficult to predict resource requirements.

An enterprise ETL system processes thousands of batch jobs every day. These systems connect to a large number of data sources that reside on different platforms and may be on different networks across the WAN.

Different types of users have different peak usage requirements and different needs for transaction times, elapsed times and response times.
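The spikes and valleys described above can be made visible by summing the resources each scheduled job demands per hour. The sketch below assumes a hypothetical schedule in which each job has a start hour, a duration and a fixed number of CPU cores; real jobs rarely use a constant amount of CPU, so treat this as a first approximation.

```python
from collections import defaultdict

# Hypothetical nightly schedule: (job name, start hour, duration in hours, CPU cores).
# Full refreshes are long and heavy; delta refreshes are short and light.
JOBS = [
    ("sales_full_refresh", 1, 3, 16),
    ("payroll_full_refresh", 2, 2, 8),
    ("sales_delta", 9, 1, 4),
    ("sales_delta", 13, 1, 4),
    ("purchasing_delta", 13, 1, 2),
]


def hourly_profile(jobs, hours=24):
    """Total CPU cores demanded in each hour of the day; jobs wrap past midnight."""
    demand = defaultdict(int)
    for _name, start, duration, cores in jobs:
        for h in range(start, start + duration):
            demand[h % hours] += cores
    return [demand[h] for h in range(hours)]


profile = hourly_profile(JOBS)
peak_hour = max(range(len(profile)), key=profile.__getitem__)
valleys = [h for h, cores in enumerate(profile) if cores == 0]
print(f"peak at {peak_hour:02d}:00 with {profile[peak_hour]} cores")
print(f"{len(valleys)} idle hours: {valleys}")
```

A profile like this shows where jobs could be rescheduled from a peak into a valley instead of buying capacity for the peak.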
o Disk capacity issues: engineers spend a lot of time cleaning up old, stale data.
o Over capacity: paying for extra compute capacity that is not being utilized.
o Network slowness: batch jobs sometimes run slowly.
o The number of user licenses is reaching its limits.
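Symptoms like these can be flagged early by comparing usage against capacity for each resource. In this sketch the 30% and 85% watermarks and the sample readings are hypothetical; appropriate thresholds depend on the service level objectives agreed with the business.

```python
# Hypothetical thresholds; real values depend on the organization's service level objectives.
LOW_WATERMARK = 0.30   # below this, capacity is paid for but largely unused
HIGH_WATERMARK = 0.85  # at or above this, the resource is close to its limit


def classify(used, capacity, low=LOW_WATERMARK, high=HIGH_WATERMARK):
    """Classify a resource by its utilization ratio used / capacity."""
    ratio = used / capacity
    if ratio >= high:
        return "near limit"
    if ratio < low:
        return "over capacity"
    return "ok"


# Hypothetical readings: (resource, amount used, amount available).
readings = [
    ("disk_gb", 9200, 10000),
    ("avg_cpu_cores", 12, 64),
    ("user_licenses", 47, 50),
    ("network_mbps", 400, 1000),
]
for name, used, cap in readings:
    print(f"{name}: {used}/{cap} -> {classify(used, cap)}")
```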
Analyze the Complete Picture
Business Terms
o User Needs (Transaction Time, Response Time, Elapsed Time, Throughput Time)
o Data Usage Patterns (Type of SQL Queries or ETL Transformations)
o Data Complexity (Financial, Marketing or Factory Data)
o Volume and Frequency of Data Loads (No. of Batch Jobs and GB of Data Processed)
o User Profile (Simple User or Advanced Data Miner)
Technical Terms
o Storage (SAN / NAS / Local Disks)
o Processing Power (CPU, No. of Cores)
o Network Bandwidth (Transfer Rate, Bytes Tx/Rx)
o Memory (Physical, Cache, Swap)
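Business terms have to be translated into technical terms. For example, the volume of a data load and its batch window together determine the network bandwidth needed. The sketch below performs that conversion; it assumes decimal gigabytes and an illustrative 70% link efficiency to allow for protocol overhead and contention.

```python
def required_bandwidth_mbps(gb_per_load, window_hours, efficiency=0.7):
    """Sustained bandwidth (Mbit/s) needed to move gb_per_load GB within window_hours.

    efficiency is the assumed fraction of nominal link speed that is actually
    achievable after protocol overhead and contention (0.7 is illustrative).
    """
    megabits = gb_per_load * 1000 * 8   # decimal GB -> megabits
    seconds = window_hours * 3600
    return megabits / seconds / efficiency


raw = required_bandwidth_mbps(500, 4, efficiency=1.0)
planned = required_bandwidth_mbps(500, 4)
print(f"{raw:.0f} Mbit/s raw, {planned:.0f} Mbit/s with headroom")
```

For instance, moving 500 GB within a 4-hour window needs about 278 Mbit/s of sustained throughput before the efficiency allowance, or about 397 Mbit/s after it.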
Capacity Planning Tools
Vectors of measurement: availability, performance, throughput, utilization, quality and efficiency.

Simulation: accurate, but needs a lot of time to set up.

Testing: costly, as another environment similar to production is needed.

Trending: can be done using Excel. Simple, but does not take nonlinear behavior into account.
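Trending amounts to fitting a straight line to historical measurements, as Excel's TREND or FORECAST.LINEAR functions do. A minimal least-squares version in Python follows; the weekly storage figures and the 20 TB capacity are hypothetical. Because the model is linear, it will miss nonlinear effects.

```python
def fit_trend(periods, values):
    """Ordinary least-squares fit of values = slope * period + intercept."""
    n = len(periods)
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def period_reaching(capacity, slope, intercept):
    """Period at which the fitted line reaches capacity (None if usage is not growing)."""
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical storage consumption (TB) for work weeks 1 to 6.
weeks = [1, 2, 3, 4, 5, 6]
storage_tb = [10.0, 10.5, 11.1, 11.4, 12.0, 12.6]

slope, intercept = fit_trend(weeks, storage_tb)
print(f"growth: {slope:.2f} TB/week")
print(f"forecast for week 12: {slope * 12 + intercept:.1f} TB")
print(f"20 TB reached around week {period_reaching(20.0, slope, intercept):.0f}")
```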

Analytical Modeling: more advanced, faster and accurate.
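The slides do not name a specific analytical model. One common choice is queueing theory; the sketch below uses the textbook M/M/1 formula R = S / (1 - U), where S is the service time and U the utilization. It assumes Poisson arrivals, exponentially distributed service times and a single server, which real ETL workloads only approximate.

```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue: R = S / (1 - U), with U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("utilization >= 100%: the queue grows without bound")
    return service_time / (1 - utilization)


# A job step needing 2 minutes of service on one resource, at rising arrival rates.
for jobs_per_minute in (0.10, 0.25, 0.40, 0.45):
    r = mm1_response_time(2.0, jobs_per_minute)
    print(f"U={jobs_per_minute * 2.0:.0%}  response={r:.1f} min")
```

Response time goes from 2.5 minutes at 20% utilization to 10 minutes at 80% and 20 minutes at 90%: this is the nonlinear behavior that a straight-line trend cannot capture.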
Data Collection
For each period (work week (WW) or month), collect:
o No. of Subject Areas
o No. of Projects
o No. of ETL Batch Jobs
o Storage Consumption
o CPU
o Network (Tx/Rx Bytes)
o Disk I/O

How do we collect Performance / Capacity Data?
o OS monitoring tools, including free tools such as Nagios, kSar and SQLMon, and PerfMon on Windows
o Data collected in SQL tables
o Data collected by the software used by the storage frames, which reports utilization, capacity and performance data
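As a minimal illustration of storing capacity data in SQL tables, the Python sketch below takes a storage and CPU snapshot with standard-library calls and appends it to a SQLite table. The table name capacity_samples and its columns are hypothetical; os.getloadavg is only available on Unix-like systems, so the sketch records NULL elsewhere. In practice a scheduler such as cron would run it at a fixed interval, and a dedicated monitoring tool would capture far richer data.

```python
import os
import shutil
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS capacity_samples (
    ts REAL,                -- Unix timestamp of the sample
    path TEXT,              -- file system that was measured
    disk_used_bytes INTEGER,
    disk_total_bytes INTEGER,
    cpu_count INTEGER,
    load_avg_1m REAL        -- NULL where the load average is unavailable
)"""


def _load_average():
    """1-minute load average, or None on platforms that do not provide it."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None


def collect_sample(conn, path="/"):
    """Append one storage/CPU snapshot to the capacity_samples table."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO capacity_samples VALUES (?, ?, ?, ?, ?, ?)",
        (time.time(), path, usage.used, usage.total, os.cpu_count(), _load_average()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file such as capacity.db to keep history
    collect_sample(conn)
    print(conn.execute("SELECT COUNT(*) FROM capacity_samples").fetchone()[0])
```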
Capacity Model for an ETL System
Examples of metrics that can be developed:
o Average run time per batch job
o Average CPU per batch job
o CPU utilization / subject area / week
o CPU utilization / project / week
o No. of batch jobs / GB of storage
o No. of batch jobs / X amount of CPU
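Several of these metrics can be computed from a job-run log. The sketch below assumes a hypothetical log with one record per batch job run (work week, project, subject area, run time and CPU seconds); real ETL schedulers expose similar data in their own formats. CPU seconds are used as the CPU measure here; turning them into a utilization percentage would require dividing by the CPU time available in the week.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class JobRun(NamedTuple):
    week: int            # work week (WW)
    project: str
    subject_area: str
    runtime_min: float   # elapsed run time in minutes
    cpu_seconds: float   # CPU time consumed by the run


# Hypothetical job-run log.
RUNS = [
    JobRun(1, "Finance", "GL", 30, 1200),
    JobRun(1, "Finance", "AP", 20, 600),
    JobRun(1, "Sales", "Orders", 50, 3000),
    JobRun(2, "Finance", "GL", 34, 1300),
    JobRun(2, "Sales", "Orders", 46, 2800),
]


def average_runtime(runs):
    """Average run time per batch job (minutes)."""
    return mean(r.runtime_min for r in runs)


def average_cpu(runs):
    """Average CPU seconds per batch job."""
    return mean(r.cpu_seconds for r in runs)


def cpu_per_week(runs, dimension):
    """Total CPU seconds per (week, value); dimension is 'project' or 'subject_area'."""
    totals = defaultdict(float)
    for r in runs:
        totals[(r.week, getattr(r, dimension))] += r.cpu_seconds
    return dict(totals)


def jobs_per_gb(job_count, storage_gb):
    """No. of batch jobs per GB of storage."""
    return job_count / storage_gb


print(average_runtime(RUNS), average_cpu(RUNS))
print(cpu_per_week(RUNS, "project"))
```

The per-week totals by project or subject area are the same aggregation an Excel pivot table would produce, so the output can feed the dashboards described next.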
Dashboard / Indicators
Phase I
Develop a trending model first.

Dashboards can be developed using SharePoint BI if the capacity data is captured in an Excel pivot table or a SQL database.

Phase II
Can we develop a predictive model?
Capacity Management of an ETL System

More Related Content

What's hot

Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaValery Tkachenko
 
Phương pháp luận triển khai phần mềm DMS
Phương pháp luận triển khai phần mềm DMSPhương pháp luận triển khai phần mềm DMS
Phương pháp luận triển khai phần mềm DMSctydms
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsYingjun Wu
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 
From MSSQL to MariaDB
From MSSQL to MariaDBFrom MSSQL to MariaDB
From MSSQL to MariaDBI Goo Lee
 
Consumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxConsumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxRebekah Rodriguez
 
Proxysql use case scenarios fosdem17
Proxysql use case scenarios    fosdem17Proxysql use case scenarios    fosdem17
Proxysql use case scenarios fosdem17Alkin Tezuysal
 
오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝철민 권
 
Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartIMC Institute
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기NeoClova
 
MongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDBMongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDBMongoDB
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례I Goo Lee
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...Databricks
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsJignesh Shah
 

What's hot (20)

Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Phương pháp luận triển khai phần mềm DMS
Phương pháp luận triển khai phần mềm DMSPhương pháp luận triển khai phần mềm DMS
Phương pháp luận triển khai phần mềm DMS
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 
From MSSQL to MariaDB
From MSSQL to MariaDBFrom MSSQL to MariaDB
From MSSQL to MariaDB
 
Consumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxConsumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a Box
 
Proxysql use case scenarios fosdem17
Proxysql use case scenarios    fosdem17Proxysql use case scenarios    fosdem17
Proxysql use case scenarios fosdem17
 
오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝
 
Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera Quickstart
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기
 
MongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDBMongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDB
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Data and AI reference architecture
Data and AI reference architectureData and AI reference architecture
Data and AI reference architecture
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 

Viewers also liked

Capacity management for ETL System
Capacity management for ETL SystemCapacity management for ETL System
Capacity management for ETL SystemASHOK BHATLA
 
Multiple resources for multiple intelligences
Multiple resources for multiple intelligencesMultiple resources for multiple intelligences
Multiple resources for multiple intelligencesXavier Pradheep Singh
 
Data flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingData flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingDr. Dipti Patil
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...rajappaiyer
 
ETL Validator: Flat File Validation
ETL Validator: Flat File ValidationETL Validator: Flat File Validation
ETL Validator: Flat File ValidationDatagaps Inc
 
Managing users & tables using Oracle Enterprise Manage
Managing users & tables using Oracle Enterprise ManageManaging users & tables using Oracle Enterprise Manage
Managing users & tables using Oracle Enterprise ManageNR Computer Learning Center
 
Capacity planning ppt @ bec doms
Capacity planning ppt @ bec domsCapacity planning ppt @ bec doms
Capacity planning ppt @ bec domsBabasab Patil
 
ETL Validator: Creating Data Model
ETL Validator: Creating Data ModelETL Validator: Creating Data Model
ETL Validator: Creating Data ModelDatagaps Inc
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLJonathan Levin
 
Crossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref
 
Supply and demand management in services
Supply and demand management in servicesSupply and demand management in services
Supply and demand management in servicesShwetanshu Gupta
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...Amazon Web Services
 
Overview sap bo girona nib efimatica
Overview sap bo girona nib efimaticaOverview sap bo girona nib efimatica
Overview sap bo girona nib efimaticaEfimatica
 
Strategic capacity planning for products and services
Strategic capacity planning for products and servicesStrategic capacity planning for products and services
Strategic capacity planning for products and servicesgerlyn bonus
 
Seven building blocks for MDM
Seven building blocks for MDMSeven building blocks for MDM
Seven building blocks for MDMKousik Mukherjee
 
Waiting Line Management
Waiting Line Management Waiting Line Management
Waiting Line Management Joshua Miranda
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 

Viewers also liked (20)

Capacity management for ETL System
Capacity management for ETL SystemCapacity management for ETL System
Capacity management for ETL System
 
Multiple resources for multiple intelligences
Multiple resources for multiple intelligencesMultiple resources for multiple intelligences
Multiple resources for multiple intelligences
 
Manage users & tables in Oracle Database
Manage users & tables in Oracle DatabaseManage users & tables in Oracle Database
Manage users & tables in Oracle Database
 
Data flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingData flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousing
 
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
Taming the ETL beast: How LinkedIn uses metadata to run complex ETL flows rel...
 
ETL Validator: Flat File Validation
ETL Validator: Flat File ValidationETL Validator: Flat File Validation
ETL Validator: Flat File Validation
 
Managing users & tables using Oracle Enterprise Manage
Managing users & tables using Oracle Enterprise ManageManaging users & tables using Oracle Enterprise Manage
Managing users & tables using Oracle Enterprise Manage
 
Capacity planning ppt @ bec doms
Capacity planning ppt @ bec domsCapacity planning ppt @ bec doms
Capacity planning ppt @ bec doms
 
Oracle Tablespace - Basic
Oracle Tablespace - BasicOracle Tablespace - Basic
Oracle Tablespace - Basic
 
ETL Validator: Creating Data Model
ETL Validator: Creating Data ModelETL Validator: Creating Data Model
ETL Validator: Creating Data Model
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETL
 
Crossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latest
 
Supply and demand management in services
Supply and demand management in servicesSupply and demand management in services
Supply and demand management in services
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
Overview sap bo girona nib efimatica
Overview sap bo girona nib efimaticaOverview sap bo girona nib efimatica
Overview sap bo girona nib efimatica
 
Strategic capacity planning for products and services
Strategic capacity planning for products and servicesStrategic capacity planning for products and services
Strategic capacity planning for products and services
 
Seven building blocks for MDM
Seven building blocks for MDMSeven building blocks for MDM
Seven building blocks for MDM
 
Waiting Line Management
Waiting Line Management Waiting Line Management
Waiting Line Management
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Capacity planning
Capacity planning Capacity planning
Capacity planning
 

Similar to Capacity Management of an ETL System

Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
Should ETL Become Obsolete
Should ETL Become ObsoleteShould ETL Become Obsolete
Should ETL Become ObsoleteJerald Burget
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesFinalyear Projects
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...Finalyear Projects
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docx
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docxReal-Time Data Warehouse Loading Methodology Ricardo Jorge S.docx
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docxsodhi3
 
What are the benefits of learning ETL Development and where to start learning...
What are the benefits of learning ETL Development and where to start learning...What are the benefits of learning ETL Development and where to start learning...
What are the benefits of learning ETL Development and where to start learning...kzayra69
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Andrey Akulov
 
An Integrated ERP With Web Portal
An Integrated ERP With Web PortalAn Integrated ERP With Web Portal
An Integrated ERP With Web PortalTracy Morgan
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
An Integrated ERP with Web Portal
An Integrated ERP with Web Portal An Integrated ERP with Web Portal
An Integrated ERP with Web Portal acijjournal
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 

Similar to Capacity Management of an ETL System (20)

Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Should ETL Become Obsolete
Should ETL Become ObsoleteShould ETL Become Obsolete
Should ETL Become Obsolete
 
Gowthami_Resume
Gowthami_ResumeGowthami_Resume
Gowthami_Resume
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docx
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docxReal-Time Data Warehouse Loading Methodology Ricardo Jorge S.docx
Real-Time Data Warehouse Loading Methodology Ricardo Jorge S.docx
 
DW 101
DW 101DW 101
DW 101
 
What are the benefits of learning ETL Development and where to start learning...
What are the benefits of learning ETL Development and where to start learning...What are the benefits of learning ETL Development and where to start learning...
What are the benefits of learning ETL Development and where to start learning...
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.
 
An Integrated ERP With Web Portal
An Integrated ERP With Web PortalAn Integrated ERP With Web Portal
An Integrated ERP With Web Portal
 
Copy of sec d (2)
Copy of sec d (2)Copy of sec d (2)
Copy of sec d (2)
 
Copy of sec d (2)
Copy of sec d (2)Copy of sec d (2)
Copy of sec d (2)
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
An Integrated ERP with Web Portal
An Integrated ERP with Web Portal An Integrated ERP with Web Portal
An Integrated ERP with Web Portal
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
E05WAREH1.PPT
E05WAREH1.PPTE05WAREH1.PPT
E05WAREH1.PPT
 

More from ASHOK BHATLA

Smart Electric Meters - Role of Govt. in Technology Management
Smart Electric Meters - Role of Govt. in Technology ManagementSmart Electric Meters - Role of Govt. in Technology Management
Smart Electric Meters - Role of Govt. in Technology ManagementASHOK BHATLA
 
World innovation - Knowledge Competitiveness Index
World innovation - Knowledge Competitiveness IndexWorld innovation - Knowledge Competitiveness Index
World innovation - Knowledge Competitiveness IndexASHOK BHATLA
 
R&d management trending between india, china and us
R&d management   trending between india, china and usR&d management   trending between india, china and us
R&d management trending between india, china and usASHOK BHATLA
 
Data centers site selection mathematical model - may 2012
Data centers site selection   mathematical model - may 2012Data centers site selection   mathematical model - may 2012
Data centers site selection mathematical model - may 2012ASHOK BHATLA
 
Dc energy efficiency presentation for psu lecture - ashok bhatla - final
Dc energy efficiency presentation   for psu lecture - ashok bhatla - finalDc energy efficiency presentation   for psu lecture - ashok bhatla - final
Dc energy efficiency presentation for psu lecture - ashok bhatla - finalASHOK BHATLA
 
Solar lantern technology adoption model for indian villages - final
Solar lantern   technology adoption model for indian villages - finalSolar lantern   technology adoption model for indian villages - final
Solar lantern technology adoption model for indian villages - finalASHOK BHATLA
 
Emerging Technology Products for Indian Villages
Emerging Technology Products for Indian VillagesEmerging Technology Products for Indian Villages
Emerging Technology Products for Indian VillagesASHOK BHATLA
 

More from ASHOK BHATLA (8)

Smart Electric Meters - Role of Govt. in Technology Management
Smart Electric Meters - Role of Govt. in Technology ManagementSmart Electric Meters - Role of Govt. in Technology Management
Smart Electric Meters - Role of Govt. in Technology Management
 
World innovation - Knowledge Competitiveness Index
World innovation - Knowledge Competitiveness IndexWorld innovation - Knowledge Competitiveness Index
World innovation - Knowledge Competitiveness Index
 
R&d management trending between india, china and us
R&d management   trending between india, china and usR&d management   trending between india, china and us
R&d management trending between india, china and us
 
Ashok career map
Ashok career mapAshok career map
Ashok career map
 
Data centers site selection mathematical model - may 2012
Data centers site selection   mathematical model - may 2012Data centers site selection   mathematical model - may 2012
Data centers site selection mathematical model - may 2012
 
Capacity Management of an ETL System

  • 1. Capacity Model of an ETL System. Ashok Bhatla. Email: ASHOK.BHATLA.WRITER@GMAIL.COM
  • 2. What is Business Intelligence? Business Intelligence (BI) is a combination of tools, processes and software that helps a company transform data into actionable knowledge, allowing it to make faster, better-informed decisions in pursuit of its strategic goals. It is all about providing the right information to management at the right time at the lowest possible cost. Because we are drowning in data but starving for knowledge, Business Intelligence has become the No. 1 priority for IT managers today.
  • 3. What is ETL? ETL stands for Extract, Transform and Load. A transactional (OLTP) system is designed for high performance so that users can get their work done quickly, and running reports directly against it slows it down. This is why ETL gained popularity: data is copied out of the transactional systems into a separate store for reporting. In computing, ETL refers to a database process with three steps: extract data from outside sources; transform it to fit operational needs, which can include joining or reformatting tables; and load it into the end target (a database, or more specifically an operational data store, data mart or data warehouse).
  • 4. Example of ETL. OLTP systems (cost accounting system, payroll data, sales data, purchasing data) → staged data → ETL (joins, transforms, deletes, etc.) → load data → EDW / reporting data.
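The flow on this slide (OLTP sources, staging, transformations such as joins and deletes, then a load into the EDW) can be sketched in a few lines. This is a minimal illustration only. It assumes hypothetical `payroll` and `sales` source tables held in an in-memory SQLite database, and every table and column name is invented for the example rather than taken from the deck.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> int:
    """Toy full-refresh ETL over hypothetical OLTP tables `payroll` and `sales`.

    Returns the number of rows loaded into the reporting table.
    """
    cur = conn.cursor()
    # Extract: copy source rows into staging tables so the OLTP tables are read only once.
    cur.execute("DROP TABLE IF EXISTS stg_payroll")
    cur.execute("DROP TABLE IF EXISTS stg_sales")
    cur.execute("CREATE TABLE stg_payroll AS SELECT emp_id, dept FROM payroll")
    cur.execute("CREATE TABLE stg_sales AS SELECT emp_id, amount FROM sales")
    # Transform (delete): remove invalid rows from staging.
    cur.execute("DELETE FROM stg_sales WHERE amount IS NULL OR amount < 0")
    # Transform (join) and Load: total sales per department into the reporting table.
    cur.execute("DROP TABLE IF EXISTS edw_sales_by_dept")
    cur.execute(
        """
        CREATE TABLE edw_sales_by_dept AS
        SELECT p.dept AS dept, SUM(s.amount) AS total_sales
        FROM stg_sales AS s
        JOIN stg_payroll AS p ON p.emp_id = s.emp_id
        GROUP BY p.dept
        """
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM edw_sales_by_dept").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payroll (emp_id INTEGER, dept TEXT);
        CREATE TABLE sales (emp_id INTEGER, amount REAL);
        INSERT INTO payroll VALUES (1, 'East'), (2, 'East'), (3, 'West');
        INSERT INTO sales VALUES (1, 100.0), (2, 50.0), (3, 70.0), (3, -5.0);
        """
    )
    print(run_etl(conn))  # → 2
    print(conn.execute("SELECT * FROM edw_sales_by_dept ORDER BY dept").fetchall())
    # → [('East', 150.0), ('West', 70.0)]
```

The job drops and recreates its staging and target tables. Running it twice therefore gives the same result, which matches the "full refresh" job type mentioned later in the deck.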
  • 5. What is Capacity Planning? Capacity planning is the process of identifying the current computing needs of a business application and forecasting its future needs based on business plans. In other words, it determines what computing resources are needed to meet an application's service level objectives over a period of time. In today's economic climate, business requirements can change rapidly depending on an organization's strategy and goals, so a well-managed capacity plan should be able to accommodate unforeseen requirements. Capacity planning can be done casually or with organized, disciplined methodologies; the more data-driven it is, the more accurate the results.
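Forecasting future needs from business plans often comes down to a growth calculation. Here is a worked sketch. It assumes, hypothetically, that a resource's utilization U grows by a fixed percentage g each month, so U(t) = U0 · (1 + g)^t. The number of months until utilization reaches a planning threshold Umax is then t = ln(Umax / U0) / ln(1 + g). The threshold and growth rate are planning inputs you would choose; the deck does not give them.

```python
import math


def months_until_threshold(current: float, threshold: float, monthly_growth: float) -> float:
    """Months until utilization reaches `threshold`, assuming compound monthly growth.

    current, threshold: utilization as a fraction of capacity (0.55 means 55 %).
    monthly_growth: assumed growth rate per month (0.03 means 3 %).
    """
    if current <= 0 or monthly_growth <= 0:
        raise ValueError("current utilization and growth rate must be positive")
    if current >= threshold:
        return 0.0  # already at or past the threshold
    # Solve current * (1 + g) ** t == threshold for t.
    return math.log(threshold / current) / math.log(1 + monthly_growth)


if __name__ == "__main__":
    # Hypothetical: storage 50 % full, growing 5 % per month, 80 % planning threshold.
    t = months_until_threshold(0.50, 0.80, 0.05)
    print(f"{t:.1f} months")  # → 9.6 months
```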
  • 6. Capacity Planning of an IT System. Capacity planning needs to ensure that all hardware (disks, memory, CPU and network), software resources (user licenses) and facilities are used optimally: software (licenses, number of users), hardware (servers, storage, networking, CPU) and facilities (data center space, power, cooling).
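One simple way to check whether hardware, software and facility resources are "optimally used" is to compare usage with capacity for each resource. Anything outside a target band gets flagged. The sketch below is illustrative only: the resource names, the numbers and the 40 to 80 % band are all hypothetical, and the deck does not specify thresholds.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str        # e.g. "SAN storage (TB)", "User licenses", "Rack power (kW)"
    category: str    # "Hardware", "Software" or "Facilities", following the slide
    used: float
    capacity: float

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def classify(resource: Resource, low: float = 0.40, high: float = 0.80) -> str:
    """Label a resource as under-utilized, ok, or near its limit (thresholds are assumptions)."""
    u = resource.utilization
    if u < low:
        return "under-utilized"
    if u > high:
        return "near limit"
    return "ok"


if __name__ == "__main__":
    inventory = [
        Resource("SAN storage (TB)", "Hardware", 42.0, 50.0),
        Resource("ETL server cores", "Hardware", 20.0, 64.0),
        Resource("User licenses", "Software", 180.0, 200.0),
        Resource("Rack power (kW)", "Facilities", 12.0, 20.0),
    ]
    for r in inventory:
        print(f"{r.category:<10} {r.name:<18} {r.utilization:6.0%}  {classify(r)}")
```

Flagging both ends of the band reflects the deck's concern with over-capacity (paying for unused compute) as well as with running out of resources.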
  • 7. Capacity Planning. We cannot manage what we cannot measure. Proactive capacity planning aims to avoid downtime by reducing the number of incidents, achieve the performance objectives established by the business, reduce the TCO (total cost of ownership) of the ETL system and achieve optimal utilization of computing resources. If no corrective action is taken based on the measured data, capacity planning is of no use.
  • 8. Capacity Planning Steps. (1) Identify service level objectives: know the requirements in business terms. (2) Analyze current capacity: gather data about resource consumption, idle times and peak usage. (3) Know the future business needs and plan for future capacity: determine how the IT systems will handle the increased load.
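Step 2, analyzing current capacity, usually starts from periodic utilization samples exported by a monitoring tool. The minimal sketch below makes two assumptions. First, the CPU samples are percentages taken at a fixed interval. Second, a hypothetical 10 % cut-off decides which samples count as idle time.

```python
from statistics import mean


def summarize_usage(samples: list[float], idle_below: float = 10.0) -> dict[str, float]:
    """Average, peak and idle share of utilization samples (percent values, fixed interval).

    `idle_below` is an assumed cut-off: samples under it count as idle time.
    """
    if not samples:
        raise ValueError("no samples")
    idle = sum(1 for s in samples if s < idle_below)
    return {
        "average": mean(samples),
        "peak": max(samples),
        "idle_share": idle / len(samples),
    }


if __name__ == "__main__":
    # Hypothetical hourly CPU % for one night of batch processing.
    cpu = [5, 8, 35, 90, 95, 60, 20, 4]
    print(summarize_usage(cpu))
```

Reporting average and peak separately matters for ETL systems. A moderate average (about 40 % here) can hide a batch window that runs close to saturation.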
  • 9. Strike a Balance. Capacity planning must balance the supply of and demand for resources across performance, utilization and cost. As per Moore's Law, IT keeps getting cheaper and faster (in its popular form, performance roughly doubles every 18 months). But organizations cannot wait for the next generation of technology to become available, because they have a business to run. As per Parkinson's Law, if you give users more resources, they will find ways to use them, so IT managers cannot keep giving users unlimited resources.
  • 10. Capacity Challenges for ETL Systems. ETL jobs are of different types (full refresh and delta refresh), process varying amounts of data and are scheduled at different frequencies, so the workload always has spikes and valleys. In a transactional system, SQL queries are simple and do not require parallelism. In an ETL system, by contrast, very large datasets are processed and workloads are random in nature and hard to predict, which makes resource requirements difficult to forecast. An enterprise ETL system processes thousands of batch jobs every day. These systems connect to a large number of data sources that reside on different platforms and may be on different networks across the WAN. Different types of users have different peak usage requirements and different needs for transaction times, elapsed times and response times.
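To see why mixing job types and frequencies produces spikes and valleys, lay a job schedule over a 24-hour clock and count how many jobs are running in each hour. The job list below is hypothetical. Each entry is (name, start hour, duration in hours, repeat interval in hours).

```python
def hourly_concurrency(jobs: list[tuple[str, int, int, int]]) -> list[int]:
    """Number of jobs running in each hour of one day.

    Each job is (name, start_hour, duration_hours, every_n_hours): it starts at
    start_hour and again every every_n_hours within the day. Runs that extend
    past midnight wrap around to the start of the day.
    """
    load = [0] * 24
    for _name, start, duration, every in jobs:
        for run_start in range(start, 24, every):
            for h in range(run_start, run_start + duration):
                load[h % 24] += 1
    return load


if __name__ == "__main__":
    jobs = [
        ("full_refresh_sales", 1, 4, 24),    # nightly full refresh, 01:00-05:00
        ("full_refresh_payroll", 2, 3, 24),  # nightly full refresh, 02:00-05:00
        ("delta_orders", 0, 1, 2),           # delta refresh every 2 hours
        ("delta_inventory", 0, 1, 6),        # delta refresh every 6 hours
    ]
    load = hourly_concurrency(jobs)
    peak = max(load)
    print("load per hour:", load)
    print("peak", peak, "at hours", [h for h, n in enumerate(load) if n == peak])
```

Even with four jobs, the nightly full refreshes overlap the two-hourly deltas. The result is a peak of three concurrent jobs at 02:00 and 04:00, while many other hours sit empty.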
  • 11. Disk capacity issues: engineers spend a lot of time cleaning old, stale data. Over-capacity: the organization has paid for extra compute capacity but is not using it. Network slowness: batch jobs sometimes run slowly. User licenses: the number of user licenses is reaching its limit.
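For the disk-capacity issue, a useful first step is to list tables that nobody has read for a while and total the space they occupy. This sketch rests on two assumptions. The last-access dates and sizes would have to come from your database's own catalog or audit views, which differ by platform. The 90-day cut-off is arbitrary.

```python
from datetime import date, timedelta


def stale_tables(tables: dict[str, tuple[date, float]], today: date, max_age_days: int = 90):
    """Return (stale table names sorted by size, largest first; total reclaimable GB).

    `tables` maps a table name to (last_access_date, size_gb). Both values are
    assumed to come from platform-specific catalog or audit data.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = {name: size for name, (last, size) in tables.items() if last < cutoff}
    names = sorted(stale, key=stale.get, reverse=True)
    return names, sum(stale.values())


if __name__ == "__main__":
    catalog = {  # hypothetical catalog extract
        "stg_sales_2019": (date(2019, 12, 31), 120.0),
        "edw_sales_by_dept": (date(2024, 5, 30), 2.5),
        "tmp_payroll_backup": (date(2023, 1, 15), 40.0),
    }
    names, gb = stale_tables(catalog, today=date(2024, 6, 1))
    print(names, gb)
```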
  • 12. Analyse the Complete Picture. In business terms: user needs (transaction time, response time, elapsed time, throughput time); data usage patterns (financial, marketing or factory data); data complexity (type of SQL queries or ETL transformations); volume and frequency of data loads (number of batch jobs and GB of data processed); and user profile (simple user or advanced data miner). In technical terms: storage (SAN, NAS, local disks); processing power (CPU, number of cores); network bandwidth (transfer rate, bytes Tx/Rx); and memory (physical, cache, swap).
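Two of the user-facing measures on this slide, elapsed time and throughput, are easy to derive from batch-job logs. Elapsed time is end minus start, and throughput is data processed divided by elapsed time. This is a minimal sketch. It assumes each job record carries ISO 8601 start and end timestamps plus the GB processed; the function name is hypothetical.

```python
from datetime import datetime


def job_metrics(start: str, end: str, gb_processed: float) -> tuple[float, float]:
    """Return (elapsed_hours, throughput_gb_per_hour) for one batch job.

    `start` and `end` are ISO 8601 timestamps, e.g. "2024-06-01T01:00:00".
    """
    elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    hours = elapsed.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be after start")
    return hours, gb_processed / hours


if __name__ == "__main__":
    hours, rate = job_metrics("2024-06-01T01:00:00", "2024-06-01T02:30:00", 45.0)
    print(f"elapsed {hours:.2f} h, throughput {rate:.1f} GB/h")  # → elapsed 1.50 h, throughput 30.0 GB/h
```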
  • 13. Capacity Planning Tools. Vectors of measurement: availability, performance, throughput, utilization, quality and efficiency. Approaches: simulation is accurate but needs a lot of time to set up; testing is costly, because it needs another environment similar to production; trending can be done in Excel and is simple, but does not take non-linear behavior into account; analytical modeling is more advanced, faster and accurate.
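A simple analytical model makes it easy to see why trending misses non-linear behavior. In an M/M/1 queue (one server, random arrivals and random service times), mean response time is R = S / (1 - U), where S is the service time and U is the utilization. Response time therefore rises slowly at low load and then sharply as U approaches 100 %. The sketch uses this textbook formula as a stand-in for analytical modeling, with a hypothetical 2-second service time. Real ETL servers are multi-core and do not behave exactly like M/M/1, so treat the numbers as illustrative.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U), valid for 0 <= U < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


if __name__ == "__main__":
    s = 2.0  # hypothetical seconds of work per request
    for u in (0.10, 0.50, 0.80, 0.90, 0.95):
        print(f"U={u:.0%}  R={mm1_response_time(s, u):.1f} s")
```

Raising utilization from 50 % to 90 % increases R from 4 s to 20 s, a five-fold jump. A straight-line trend of utilization alone would not reveal this.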
  • 14. Data Collection. Dimensions: period (work week (WW) or month), number of subject areas, number of projects and number of ETL batch jobs. Measures: storage consumption, CPU, disk I/O and network Tx/Rx bytes. How do we collect performance and capacity data? With OS monitoring tools, including freeware such as Nagios, kSar and SQLMon; with PerfMon data collected in SQL tables; and with the software that manages the storage frames, which reports utilization, capacity and performance data.
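Once monitoring data lands in SQL tables, the dimensions on this slide (period, subject area, project, batch job) and the measures (CPU, disk I/O, network bytes, storage) fit naturally into a single fact table. That table can then be rolled up by week. The example uses Python's built-in sqlite3 module, and the schema, column names and sample rows are hypothetical. SQLite's `strftime('%W', ...)` week number stands in for the organization's own work-week calendar, which may number weeks differently.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job_run (
    run_date      TEXT,     -- ISO date, e.g. '2024-06-03'
    subject_area  TEXT,
    project       TEXT,
    job_name      TEXT,
    cpu_seconds   REAL,
    disk_io_gb    REAL,
    net_tx_rx_gb  REAL,
    storage_gb    REAL
);
"""

WEEKLY_ROLLUP = """
SELECT strftime('%Y-W%W', run_date) AS week,
       subject_area,
       COUNT(*)          AS batch_jobs,
       SUM(cpu_seconds)  AS cpu_seconds,
       SUM(disk_io_gb)   AS disk_io_gb,
       SUM(net_tx_rx_gb) AS net_gb
FROM job_run
GROUP BY week, subject_area
ORDER BY week, subject_area;
"""


def weekly_rollup(conn: sqlite3.Connection) -> list[tuple]:
    """Batch-job count and resource totals per week and subject area."""
    return conn.execute(WEEKLY_ROLLUP).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO job_run VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("2024-06-03", "Finance", "P1", "load_gl", 1200.0, 30.0, 5.0, 400.0),
            ("2024-06-04", "Finance", "P1", "load_ap", 800.0, 10.0, 2.0, 150.0),
            ("2024-06-04", "Sales", "P2", "load_orders", 600.0, 25.0, 8.0, 300.0),
        ],
    )
    for row in weekly_rollup(conn):
        print(row)
```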
  • 15. Capacity Model for an ETL System. Examples of metrics that can be developed: average run time per batch job; average CPU per batch job; CPU utilization per subject area per week; CPU utilization per project per week; number of batch jobs per GB of storage; number of batch jobs per X amount of CPU.
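Several of the metrics on this slide are simple aggregations over per-run job records. This is a minimal sketch, assuming each record carries a job name, project, subject area, work-week label, run time and CPU seconds (all hypothetical field names). "CPU utilization" is approximated here as CPU seconds consumed, because converting it to a percentage requires knowing the capacity available in that week.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRun:
    job: str
    project: str
    subject_area: str
    week: str            # work-week label, e.g. "2024-WW23"
    run_minutes: float
    cpu_seconds: float


def average_run_time(runs: list[JobRun]) -> float:
    """Average run time per batch-job run, in minutes."""
    return sum(r.run_minutes for r in runs) / len(runs)


def average_cpu_per_job(runs: list[JobRun]) -> float:
    """Average CPU seconds per batch-job run."""
    return sum(r.cpu_seconds for r in runs) / len(runs)


def cpu_by(runs: list[JobRun], key: str) -> dict[tuple[str, str], float]:
    """CPU seconds per (key value, week); key is "project" or "subject_area"."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for r in runs:
        totals[(getattr(r, key), r.week)] += r.cpu_seconds
    return dict(totals)


def jobs_per_gb(batch_jobs: int, storage_gb: float) -> float:
    """Number of batch jobs per GB of storage."""
    return batch_jobs / storage_gb


if __name__ == "__main__":
    runs = [
        JobRun("load_gl", "P1", "Finance", "2024-WW23", 30.0, 1200.0),
        JobRun("load_ap", "P1", "Finance", "2024-WW23", 10.0, 800.0),
        JobRun("load_orders", "P2", "Sales", "2024-WW23", 20.0, 600.0),
    ]
    print(average_run_time(runs))       # minutes per run
    print(average_cpu_per_job(runs))    # CPU seconds per run
    print(cpu_by(runs, "subject_area"))
    print(cpu_by(runs, "project"))
```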
  • 16. Dashboard / Indicators. Phase I: develop a trending model first. Dashboards can be built with SharePoint BI if the capacity data is captured in an Excel pivot table or a SQL database. Phase II: can we develop a predictive model?