Data Platform
Architecture Principles
and Evaluation Criteria
Pooja Kelgaonkar,
Senior Data Architect at Rackspace Technology
Pooja Kelgaonkar
■ Senior Data Architect - GCP, Snowflake
■ Specialist in Data Modernization Implementations
■ Expertise in “Data” domain
■ Learner, Tech Blogger, Tech evangelist
■ Reading, listening to Indian classical music
Agenda
■ Architecture Principles
■ Data Modernization
■ Data Platform Offerings
■ Evaluation Criteria
■ Sample Use Case - Evaluation Comparison
Data Architecture
Principles
Framework Pillars
01 Security
● Access Management & Controls
● Data Protection - Encryption, Data Masking
● Compliance - ISO, HIPAA, PCI DSS
● Data Governance
02 Scalability
● Horizontal Scaling
● Vertical Scaling
● Auto Scaling
03 Availability
● Reliability
● Resiliency of the System
● Availability - System Uptime
04 Efficiency
● Performance Efficiency
● Cost Efficiency - Cost Optimized
05 Operational Excellence
● Serviceability - Easy Operations & Maintenance
● Maintenance - Data Pipeline Maintenance
● Reduced Ops Activities & Cost
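The "Availability - System Uptime" item is easier to agree on as an SLA when the uptime target is expressed as a downtime budget. The following is a minimal Python sketch, assuming a 30-day period by default; `downtime_budget_minutes` is an illustrative name, not part of any tool.

```python
# Sketch: convert an uptime target into a downtime budget for a given period.
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_pct: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed over `period_days` at `uptime_pct`."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% uptime -> {downtime_budget_minutes(target):.1f} min per 30 days")
```

For example, 99.9% over 30 days leaves roughly 43 minutes of downtime, while 99.99% leaves about 4 minutes, which is a useful sanity check before committing to an availability target.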
■ Cloud Migration / Adoption - 5Rs of Transformation: Rehost, Refactor, Revise, Rebuild and Replace
Data Modernization Journey
01 Data Discovery
Analysis of the existing data architecture and system design, and evaluation of the needs and requirements of the new data system.
02 Data Architecture & Assessment
Design of the new data platform and assessment of data modelling, data governance and security.
03 Data Architecture & Engineering
Data platform implementation, data pipeline development and enhancement POCs, and design of the end-to-end cycle.
04 Data Migration & Pipeline Development/Conversion
Actual pipeline development and conversion on the new platform: implementing, testing and validating pipelines and data on the new platform.
05 Go Live & DataOps
Soft launch / early cutover to integrate with other systems and obtain sign-off from business users. Implementing operations for the new platform and the modified pipelines.
Design Framework Pillars & Considerations
■ Who? Teams - Architects, Engineering, Operations
■ When? How? Business Drive, Technology Drive, Management & Engagement Drive
■ What?
● Business Drive - End User SLAs Assessment & SLA Setting, Business Assessment, Business Teams
● Technology Drive - System Assessment & Technology Evaluations, Technology Evangelist
● Management & Engagement Drive - Signed-Up Services vs Open Source vs Hybrid Evaluations, Sign Up for Services
Data Platform
Offerings
Public cloud providers offer a range of services for implementing a data platform: DB / DW / Data Lake / Data Mesh / SQL / NoSQL, etc.
Cloud Native
AWS
● AWS Glue
● EMR
● Kinesis, OpenSearch
● RDS, Aurora
● Redshift
● DynamoDB, DocumentDB
Azure
● Azure Data Factory
● HDInsight
● Azure Stream Analytics
● Azure SQL, Managed SQL
● SQL Server, PostgreSQL
● MariaDB, Cosmos DB, Managed Cassandra
GCP
● Dataflow, Data Fusion
● Dataproc
● Pub/Sub, Stream
● Cloud SQL, Cloud Spanner
● BigQuery, BigLake
● Bigtable, Firestore, Memorystore
Data on Cloud - Common Offerings
A variety of native and open-source services are available for implementing a data platform and designing data pipelines.
Data Platform -
Evaluation Criteria
(Assessment Phase)
Evaluation - Pre-Requisites
■ Existing/Cross-Application Platform to be Evaluated
■ New Platforms to be Explored
■ Platform Offerings - Managed/Native Services / BYOL Services
■ Existing Support Tier / Billing Plans
■ Probable Platform to be Evaluated - Cost Comparisons Done?
■ Existing System Licenses, Integrators - BI, Ops Tools
■ Specific Evaluation or Open Evaluation to Select the Best Fit for the Given Use Case
Evaluation - Inputs
■ Capex vs Opex
■ % Storage vs Scan vs Processed
■ Compute vs Storage Utilization
■ Data Challenges
■ System Challenges
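The scan/processed and compute/storage inputs are ratios, so they can be derived from a handful of raw usage figures. The following is a minimal Python sketch; `UsageSnapshot`, its field names and the sample numbers are hypothetical, and collecting the figures (billing exports, query logs) is left to the team.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Hypothetical monthly usage figures gathered during the assessment."""
    stored_tb: float     # data at rest
    scanned_tb: float    # data read by queries and jobs
    processed_tb: float  # data transformed or written by pipelines
    compute_cost: float  # spend attributed to compute
    storage_cost: float  # spend attributed to storage


def evaluation_inputs(u: UsageSnapshot) -> dict[str, float]:
    """Express the raw figures as the ratios used to compare platforms."""
    total_cost = u.compute_cost + u.storage_cost
    if u.stored_tb <= 0 or total_cost <= 0:
        raise ValueError("stored_tb and total cost must be positive")
    return {
        "scan_pct_of_stored": 100 * u.scanned_tb / u.stored_tb,
        "processed_pct_of_stored": 100 * u.processed_tb / u.stored_tb,
        "compute_cost_share_pct": 100 * u.compute_cost / total_cost,
        "storage_cost_share_pct": 100 * u.storage_cost / total_cost,
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    snapshot = UsageSnapshot(stored_tb=120, scanned_tb=30, processed_tb=12,
                             compute_cost=8000, storage_cost=2000)
    print(evaluation_inputs(snapshot))
```

A workload that scans only a small share of what it stores tends to benefit more from tiered storage, while a compute-heavy cost share puts more weight on how a platform prices queries.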
Evaluation - Checkpoints
1 Data Operations & Business-Critical Requirements
● Data Pipeline Management - Monitoring & Operations
● Business Requirements - 24x7 Monitoring vs SLAs
● Critical Applications - Availability & SLAs
2 Data Analytics Checkpoint
● Data Analytics - BI Tooling
● Predictive Analytics - Algorithms, Tools, Libraries Used
● AI/ML Use Cases - Customer-Facing vs In-House
● Enterprise vs Cloud Native
3 Business Checkpoint
● Data Availability - SLAs
● End User Agreements
● Business Requirements - Specific to Tooling
● Existing Cost Utilization
● Performance Ratio - Current vs Expected
● Modernization Drive
4 Data Processing Checkpoint
● Target Systems Integrations
● Data Usage - Hot Data vs Cold Data
● Data Stored vs Data Processed vs Reads
● Data Pipelines - Batch vs Streaming
● Data Pipeline Complexity - S/M/C/VC (Simple/Medium/Complex/Very Complex)
● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event-Based
● ETL vs ELT Requirements
5 Data Platform Checkpoint
● Type of Data - Structured, Semi-Structured, Unstructured
● Sources of Data - Files, DBs, IoT, Devices, APIs
● Consumers of Data - Users vs Systems
● Frequency of Data - Batch, Real-Time
● Data Storage - Active vs Passive
● Data Modelling - Schema, Tables, DB Objects
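The S/M/C/VC pipeline-complexity bucket is usually assigned with a simple rule of thumb during inventory. The following is a minimal Python sketch of one such rule. The additive score, its weights and the cut-offs are hypothetical and would need calibrating against pipelines a team has already sized. The resulting counts feed a "No. of S/M/C Jobs" style metric.

```python
from collections import Counter
from enum import Enum


class Complexity(Enum):
    SIMPLE = "S"
    MEDIUM = "M"
    COMPLEX = "C"
    VERY_COMPLEX = "VC"


def classify_pipeline(num_sources: int, num_transform_steps: int,
                      uses_external_functions: bool) -> Complexity:
    """Bucket a pipeline into S/M/C/VC using a simple additive score.

    The weights and cut-offs are illustrative assumptions; calibrate them
    against pipelines your team has already sized.
    """
    score = num_sources + 2 * num_transform_steps
    if uses_external_functions:
        score += 5  # external Java/Python/SQL code usually needs separate conversion
    if score <= 5:
        return Complexity.SIMPLE
    if score <= 12:
        return Complexity.MEDIUM
    if score <= 25:
        return Complexity.COMPLEX
    return Complexity.VERY_COMPLEX


if __name__ == "__main__":
    # Hypothetical inventory: (sources, transform steps, external functions?)
    inventory = [(1, 2, False), (3, 4, False), (3, 4, True), (10, 10, True)]
    counts = Counter(classify_pipeline(*job).value for job in inventory)
    print(counts)  # per-bucket job counts for the evaluation metrics
```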
Evaluation - Metrics
Data Checkpoint
■ Data Integrators: No. of Sources, No. of Targets, No. of Specific Systems, Total Storage Volume, Daily Delta Volume
■ Data Modelling: Frequency of Schema Evolution, No. of Objects, % of NoSQL Objects, % of PL/SQL Objects
Data Processing Checkpoint
■ Data Pipelines: No. of S/M/C Jobs, No. of External Functions Integrated (Java/Python/SQL), No. of ETL Jobs (Tool-Based), No. of Compute-Intensive Jobs, No. of Storage-Intensive Jobs
Business Checkpoint
■ Operations: No. of Times SLA Challenged, No. of End Users Affected
■ Reliability: No. of Times Data Compromised, No. of DR Activities, No. of End Users Impacted
■ Performance Efficiency: Total Batch Time, No. of Times Batch SLA Impacted, No. of End User Reports, No. of End Users/Consumers, No. of Poorly Performing Reports/Queries
■ Cost Utilization: Overall Billing (Capex), Total Operations & Maintenance Cost
Data Operations Checkpoint
■ Monitoring: No. of Support Team Members, No. of Monitoring Dashboards
Data Analytics Checkpoint
■ Analytics: No. of ML Jobs/Algorithms, ML Integrators
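One way to turn these metrics into a platform comparison is to rate each candidate per checkpoint (for example on a 1 to 5 scale) and combine the ratings with weights that reflect the organization's priorities. The following Python sketch shows a weighted scorecard. The weights, the candidate names `platform_a`/`platform_b`, and their ratings are all hypothetical and are not results from this evaluation.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-checkpoint ratings (e.g. on a 1 to 5 scale)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


# Hypothetical weights: how much each checkpoint matters to this organization.
WEIGHTS = {"data_platform": 3, "data_processing": 3, "data_analytics": 2,
           "business": 3, "data_operations": 1}

# Hypothetical ratings per candidate, derived from the metrics gathered above.
CANDIDATES = {
    "platform_a": {"data_platform": 4, "data_processing": 5, "data_analytics": 4,
                   "business": 4, "data_operations": 3},
    "platform_b": {"data_platform": 4, "data_processing": 4, "data_analytics": 4,
                   "business": 5, "data_operations": 4},
}

if __name__ == "__main__":
    ranking = sorted(CANDIDATES, key=lambda p: weighted_score(CANDIDATES[p], WEIGHTS),
                     reverse=True)
    for name in ranking:
        print(name, round(weighted_score(CANDIDATES[name], WEIGHTS), 2))
```

Keeping the weights explicit makes it easy to show stakeholders how a business-driven priority, such as an existing cloud sign-up, shifts the outcome.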
Data Platform -
Evaluation Use Case
Evaluation - Inputs
■ Domain - Retail, DW - Teradata, ETL - DataStage
■ Platform - Recently Signed Up for Google Cloud Platform
■ Data Platform - Evaluate GCP Services to Set Up a Data Warehouse Platform
■ DW Size - 120 TB (70 TB Active + 50 TB Passive)
■ Daily Volume - 1 TB (80% Batch + 20% Streaming)
■ Data - Structured & Semi-Structured (JSON, XML)
■ Data Pipelines - Mostly ELT: DataStage to Teradata (landing layer), Teradata SQL to Transform Data
■ Data Analytics - Tableau Reports - Customer Reports
■ Enterprise Scheduler - Control-M, Ticketing Tool - JIRA, Alerting via Slack, Email
■ Monitoring Dashboards, 24x7 Support Team
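The use-case inputs translate directly into a few planning figures. In this case, about 58% of the warehouse is active, with 0.8 TB/day batch and 0.2 TB/day streaming. The following Python sketch derives them. It assumes decimal units (1 TB = 10^6 MB), assumes streaming data arrives evenly across the day, and assumes that every daily load is retained when estimating annual growth.

```python
def sizing(active_tb: float, passive_tb: float, daily_tb: float,
           batch_share: float) -> dict[str, float]:
    """Derive planning figures from the use-case inputs (decimal units: 1 TB = 10**6 MB)."""
    total_tb = active_tb + passive_tb
    stream_tb_per_day = daily_tb * (1 - batch_share)
    return {
        "total_tb": total_tb,
        "active_pct": 100 * active_tb / total_tb,
        "batch_tb_per_day": daily_tb * batch_share,
        "stream_tb_per_day": stream_tb_per_day,
        # Assumes the streaming share arrives evenly over 24 hours.
        "avg_stream_mb_per_sec": stream_tb_per_day * 10**6 / 86_400,
        # Assumes every daily load is retained (no deletes or compaction).
        "ingest_tb_per_year": daily_tb * 365,
    }


if __name__ == "__main__":
    for key, value in sizing(active_tb=70, passive_tb=50, daily_tb=1.0, batch_share=0.8).items():
        print(f"{key}: {value:.2f}")
```

An average of roughly 2.3 MB/s of streaming data is modest, but peaks can be much higher than the average, so the streaming path should still be sized for burst rates.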
DW - Google BigQuery vs Azure Synapse
Observations by checkpoint (BigQuery vs Synapse):
1 Data Platform Checkpoint
● Supports More Than 90% of Requirements
● SaaS Offering, Cloud Managed
● Very Well Integrated
2 Data Processing Checkpoint
● Native Drivers to Support Batch & Stream
● Highest Data Processing Speed
● Storage vs Compute - Scaling In and Out
● Automatic Scaling, Performance Efficient
3 Data Analytics Checkpoint
● Can Be Integrated With Any BI Tool
● Supports AI/ML Libraries and Jobs
● Performance Efficient - Data Processing, Scanning
4 Business Checkpoint
● High Availability
● Automatic Failover, No DR Required
● Performance & Cost Efficient
● Pay as You Go vs Commitment Comparison Based on Overall Usage
5 Data Operations Checkpoint
● Customized Logging & Monitoring
● Native vs Customized Dashboards
● Integration With Various Alerting, Messaging Tools
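The business checkpoint's "pay as you go vs commitment" comparison reduces to a break-even calculation on monthly usage. The following Python sketch assumes a simplified model: on-demand billing per TB scanned versus a flat monthly commitment that covers the whole workload, ignoring storage, bursts and provider-specific pricing rules. The price and commitment values are placeholders, not quoted list prices.

```python
def on_demand_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Monthly cost when billed per TB scanned."""
    return tb_scanned * price_per_tb


def break_even_tb(commitment_per_month: float, price_per_tb: float) -> float:
    """TB scanned per month above which the flat commitment becomes cheaper."""
    if price_per_tb <= 0:
        raise ValueError("price_per_tb must be positive")
    return commitment_per_month / price_per_tb


def cheaper_option(tb_scanned: float, price_per_tb: float,
                   commitment_per_month: float) -> str:
    """Pick the cheaper billing model for one month's expected scan volume."""
    if on_demand_cost(tb_scanned, price_per_tb) > commitment_per_month:
        return "commitment"
    return "pay_as_you_go"


if __name__ == "__main__":
    PRICE_PER_TB = 5.0       # placeholder, not a quoted list price
    COMMITMENT = 10_000.0    # placeholder monthly commitment
    print("break-even TB/month:", break_even_tb(COMMITMENT, PRICE_PER_TB))
    for tb in (1_500, 2_500):
        print(tb, "TB ->", cheaper_option(tb, PRICE_PER_TB, COMMITMENT))
```

Running this against several months of historical scan volumes, rather than a single average, shows whether usage stays consistently above or below the break-even point.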
Evaluation - Final Report
DW
■ Approach 1: BigQuery
■ Approach 2: BigQuery
ETL + ELT Pipelines
■ Approach 1: Modify DataStage (DS) jobs to use the BigQuery (BQ) connector to load data into the BQ landing layer.
■ Approach 2: Convert DS load jobs to BQ load jobs that pull data from the source and load it into BQ (this depends on the types of source systems and the integration complexity).
Data Storage (same for both approaches)
■ Store active data in BQ native tables with roll-up policies, and store passive datasets in the GCS layer depending on table usage. External tables can be built on the GCS datasets.
Data Analytics (same for both approaches)
■ Tableau connections can be replaced with BQ connections.
Data Pipeline Scheduler & Maintenance (same for both approaches)
■ Control-M can be used to trigger the pipelines, with orchestration done in Composer. Existing ticketing and alerting tools can be used as is.
BigQuery was chosen after the evaluation, based entirely on the existing GCP sign-up and on the ratio of active to passive storage. Azure Synapse can offer the same capabilities; the choice here is driven by business and enterprise considerations.
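The data-storage recommendation (active data in BigQuery native tables, passive datasets on GCS exposed as external tables) can be planned table by table from usage data. The following is a minimal Python sketch with several assumptions:

- Last-query dates are available from the team's own usage logs.
- Passive tables have already been exported to GCS as Parquet.
- The 90-day threshold, the `dw` dataset and the `gs://example-archive-bucket` URI are hypothetical placeholders.

The generated statement follows BigQuery's `CREATE EXTERNAL TABLE ... OPTIONS (format, uris)` DDL form.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableUsage:
    name: str
    last_queried: date


def plan_storage(tables: list[TableUsage], today: date,
                 passive_after_days: int = 90,
                 dataset: str = "dw",
                 archive_uri: str = "gs://example-archive-bucket") -> dict[str, str]:
    """Map each table to 'native' or to BigQuery DDL for a GCS-backed external table.

    The threshold, dataset name and bucket URI are hypothetical placeholders.
    """
    plan: dict[str, str] = {}
    cutoff = timedelta(days=passive_after_days)
    for t in tables:
        if today - t.last_queried > cutoff:
            # Passive: expose the exported Parquet files as an external table.
            plan[t.name] = (
                f"CREATE EXTERNAL TABLE `{dataset}.{t.name}` "
                f"OPTIONS (format = 'PARQUET', "
                f"uris = ['{archive_uri}/{t.name}/*.parquet'])"
            )
        else:
            # Active: keep in BigQuery native storage.
            plan[t.name] = "native"
    return plan


if __name__ == "__main__":
    usage = [TableUsage("sales_current", date(2025, 1, 10)),
             TableUsage("sales_2019", date(2023, 6, 1))]
    for table, action in plan_storage(usage, today=date(2025, 3, 1)).items():
        print(table, "->", action)
```

Generating the DDL rather than executing it keeps the sketch free of client-library dependencies and lets the plan be reviewed before any table is moved.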
Thank You
Stay in Touch
Pooja Kelgaonkar
poojakelgaonkar@gmail.com & pooja.kelgaonkar@rackspace.com
www.linkedin.com/in/poojakelgaonkar
poojakelgaonkar.medium.com

Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Recently uploaded (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Data Platform Architecture Principles and Evaluation Criteria

  • 1. Data Platform Architecture Principles and Evaluation Criteria
    Pooja Kelgaonkar, Senior Data Architect at Rackspace Technology
  • 2. Pooja Kelgaonkar
    ■ Senior Data Architect - GCP, Snowflake
    ■ Specialist in Data Modernization Implementations
    ■ Expertise in the “Data” Domain
    ■ Learner, Tech Blogger, Tech Evangelist
    ■ Interests: Reading, Listening to Indian Classical Music
  • 3. Agenda
    ■ Architecture Principles
    ■ Data Modernization
    ■ Data Platform Offerings
    ■ Evaluation Criteria
    ■ Sample Use Case - Evaluation Comparison
  • 5. Framework Pillars
    01 Security
      ● Access Management & Controls
      ● Data Protection - Encryption, Data Masking
      ● Compliance - ISO, HIPAA, PCI DSS
      ● Data Governance
    02 Scalability
      ● Horizontal Scaling
      ● Vertical Scaling
      ● Auto Scaling
    03 Availability
      ● Reliability
      ● Resiliency of the System
      ● Availability - System Uptime
    04 Efficiency
      ● Performance Efficiency
      ● Cost Efficiency - Cost Optimized
    05 Operational Excellence
      ● Serviceability - Easy Operations & Maintenance
      ● Maintenance - Data Pipeline Maintenance
      ● Reduced Ops Activities & Cost
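The Availability pillar's uptime bullet is usually expressed as a percentage target. A minimal Python sketch (the function name and the 30-day default period are assumptions, not taken from the slides) converts such a target into the downtime budget it allows:

```python
# Sketch: translate an availability target (the "System Uptime" bullet)
# into the maximum downtime it permits over a period.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the maximum downtime, in minutes, allowed by an availability
    target given as a percentage (e.g. 99.9) over `period_days` days."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% over 30 days -> {budget:.1f} minutes of downtime")
```

For example, a 99.9% target over 30 days leaves about 43.2 minutes of downtime, and each additional "9" shrinks that budget tenfold.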
  • 6. Data Modernization Journey
    ■ Cloud Migration / Adoption - 5Rs of Transformation: Rehost, Refactor, Revise, Rebuild and Replace
    01 Data Discovery - Analysis of the existing data architecture and system design; evaluation of the needs and requirements of the new data system.
    02 Data Architecture & Assessment - Designing the new data platform; assessment of data modelling, data governance and security.
    03 Data Architecture & Engineering - Data platform implementation, data pipeline development and enhancement POCs; designing the end-to-end cycle.
    04 Data Migration & Pipeline Development/Conversion - Actual pipeline development and conversion on the new platform; implementing, testing and validating pipelines/data on the new platform.
    05 Go Live & DataOps - Soft launch / early cut-off to integrate with other systems and obtain sign-off from business users; implementing operations for the new platform and modified pipelines.
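The journey on the slide above is sequential, with business sign-off gating go-live. A hypothetical Python tracker can make that gating explicit, assuming the five stages run in the order discovery, architecture and assessment, architecture and engineering, migration and pipeline conversion, then go-live and DataOps. The class and method names are illustrative, not part of any tool:

```python
from enum import IntEnum


class Phase(IntEnum):
    """Modernization stages, assumed to run in this order."""
    DATA_DISCOVERY = 1
    ARCHITECTURE_AND_ASSESSMENT = 2
    ARCHITECTURE_AND_ENGINEERING = 3
    MIGRATION_AND_PIPELINE_CONVERSION = 4
    GO_LIVE_AND_DATAOPS = 5


class ModernizationTracker:
    """Hypothetical tracker: a phase can only be left once it is signed off."""

    def __init__(self) -> None:
        self.current = Phase.DATA_DISCOVERY
        self.signed_off: set[Phase] = set()

    def sign_off(self) -> None:
        """Record sign-off for the current phase."""
        self.signed_off.add(self.current)

    def advance(self) -> Phase:
        """Move to the next phase, refusing if the current one lacks sign-off."""
        if self.current not in self.signed_off:
            raise RuntimeError(f"{self.current.name} has not been signed off")
        if self.current == Phase.GO_LIVE_AND_DATAOPS:
            raise RuntimeError("already in the final phase")
        self.current = Phase(self.current + 1)
        return self.current
```

Real programmes often overlap phases (for example, running POCs during assessment); this sketch deliberately models only the strict sequence.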
  • 7. Design Framework Pillars & Considerations
    Who? When? How? What?
    ■ Teams: Business Teams, Architects, Engineering, Operations
    ■ Drivers: Business Drive, Technology Drive, Management & Engagement Drive
    ■ Considerations: End User SLAs; Assessment & SLA Setting; System Assessment & Technology Evaluations; Signed-Up Services vs Open Source vs Hybrid Evaluations; Business Assessment; Technology Evangelist; Sign Up for Services
  • 9. Cloud Native
    Public cloud providers offer various services for implementing a data platform (DB / DW / Data Lake / Data Mesh / SQL / NoSQL, etc.).
    ■ AWS: AWS Glue; EMR; Kinesis, OpenSearch; RDS, Aurora; Redshift; DynamoDB, DocumentDB
    ■ Azure: Azure Data Factory; HDInsight; Azure Stream Analytics; Azure SQL, Managed SQL; SQL Server, PostgreSQL; MariaDB, Cosmos DB, Managed Cassandra
    ■ GCP: Dataflow, Data Fusion; Dataproc; Pub/Sub, Stream; Cloud SQL, Cloud Spanner; BigQuery, BigLake; Bigtable, Firestore, Memorystore
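When shortlisting, it can help to line up roughly comparable services across the three clouds. The grouping below is a hypothetical one, covering only a subset of the services listed on the slide, and the cross-cloud equivalences are approximate (for instance, Pub/Sub is a messaging service while Azure Stream Analytics is a processing engine):

```python
# Hypothetical grouping of some slide-listed services by rough workload
# category; equivalences across clouds are approximate, not exact.
SERVICE_MAP: dict[str, dict[str, list[str]]] = {
    "data_integration": {
        "AWS": ["AWS Glue"],
        "Azure": ["Azure Data Factory"],
        "GCP": ["Dataflow", "Data Fusion"],
    },
    "hadoop_spark": {
        "AWS": ["EMR"],
        "Azure": ["HDInsight"],
        "GCP": ["Dataproc"],
    },
    "streaming": {
        "AWS": ["Kinesis"],
        "Azure": ["Azure Stream Analytics"],
        "GCP": ["Pub/Sub"],
    },
    "relational": {
        "AWS": ["RDS", "Aurora"],
        "Azure": ["Azure SQL", "Managed SQL"],
        "GCP": ["Cloud SQL", "Cloud Spanner"],
    },
    "nosql": {
        "AWS": ["DynamoDB", "DocumentDB"],
        "Azure": ["Cosmos DB", "Managed Cassandra"],
        "GCP": ["Bigtable", "Firestore"],
    },
}


def candidates(category: str, cloud: str) -> list[str]:
    """Return the listed services for a category on one cloud ([] if none)."""
    return SERVICE_MAP.get(category, {}).get(cloud, [])


if __name__ == "__main__":
    for cloud in ("AWS", "Azure", "GCP"):
        print(cloud, candidates("hadoop_spark", cloud))
```

Such a table is only a starting point; the evaluation criteria decide between candidates.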
  • 10. Data on Cloud - Common Offerings
    A variety of offerings are available to implement a data platform and design data pipelines using native and open-source services.
  • 11. Data Platform - Evaluation Criteria (Assessment Phase)
  • 12. Evaluation - Pre-Requisites
    Evaluation Criteria: Specific Evaluation or Open Evaluation to Select the Best Fit for the Given Use Case
    Existing / Cross-Application Platform to be Evaluated
      ● Platform Offerings
      ● Existing Support Tier / Billing Plans
      ● Managed/Native Services / BYOL Services
      ● Existing System Licenses, Integrators - BI, Ops Tools
    New Platforms to be Explored
      ● Platform Offerings
      ● Probable Platform to be Evaluated - Cost Comparisons Done?
      ● Managed/Native Services / BYOL Services
      ● Existing System Licenses, Integrators - BI, Ops Tools
  • 13. Evaluation - Inputs
    ■ Capex vs Opex %
    ■ Storage vs Scan vs Processed %
    ■ Compute vs Storage Utilization %
    ■ Data Challenges
    ■ System Challenges
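The evaluation inputs are mostly ratios. A hypothetical Python helper computes them from raw monthly figures; the field names, the use of stored volume as the denominator for scanned and processed data, and reading "Compute vs Storage Utilization" as a split of spend are all assumptions:

```python
from dataclasses import dataclass


@dataclass
class AssessmentInputs:
    """Hypothetical monthly figures collected during the assessment phase."""
    capex: float          # amortised upfront spend (licences, hardware)
    opex: float           # recurring operational spend
    stored_tb: float      # data at rest
    scanned_tb: float     # data read by queries and reports
    processed_tb: float   # data transformed by pipelines
    compute_cost: float   # spend attributed to compute
    storage_cost: float   # spend attributed to storage


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole (0.0 when whole is zero)."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def summarize(x: AssessmentInputs) -> dict[str, float]:
    """Compute the ratio-style inputs from raw figures."""
    spend = x.capex + x.opex
    infra = x.compute_cost + x.storage_cost
    return {
        "capex_pct": pct(x.capex, spend),
        "opex_pct": pct(x.opex, spend),
        "scanned_pct_of_stored": pct(x.scanned_tb, x.stored_tb),
        "processed_pct_of_stored": pct(x.processed_tb, x.stored_tb),
        "compute_pct_of_spend": pct(x.compute_cost, infra),
        "storage_pct_of_spend": pct(x.storage_cost, infra),
    }
```

A scanned-to-stored ratio well above 100% suggests that query compute, rather than storage, drives cost, which matters when comparing platforms that bill per data scanned.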
  • 14. Evaluation - Checkpoints
    1 Data Operations & Business Critical Requirements
      ● Data Pipeline Management - Monitoring & Operations
      ● Business Requirements - 24x7 Monitoring vs SLAs
      ● Critical Applications - Availability & SLAs
    2 Data Analytics Checkpoint
      ● Data Analytics - BI Tooling
      ● Predictive Analytics - Algorithms, Tools, Libraries Used
      ● AI/ML Use Cases - Customer Facing vs In-House
      ● Enterprise vs Cloud Native
    3 Business Checkpoint
      ● Data Availability - SLAs
      ● End User Agreements
      ● Business Requirements - Specific to Tooling
      ● Existing Cost Utilization
      ● Performance Ratio - Current vs Expected
      ● Modernization Drive
    4 Data Processing Checkpoint
      ● Target Systems Integrations
      ● Data Usage - Hot Data vs Cold Data
      ● Data Stored vs Data Processed vs Reads
      ● Data Pipelines - Batch vs Streaming
      ● Data Pipeline Complexity - S/M/C/VC (Simple/Medium/Complex/Very Complex)
      ● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based
      ● ETL vs ELT Requirements
    5 Data Platform Checkpoint
      ● Type of Data - Structured, Semi-Structured, Unstructured
      ● Sources of Data - Files, DBs, IoT, Devices, APIs
      ● Consumers of Data - Users vs Systems
      ● Frequency of Data - Batch, Real Time
      ● Data Storage - Active vs Passive
      ● Data Modelling - Schema, Tables, DB Objects
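One common way to turn checkpoints like these into a decision is a weighted scoring matrix. This is an assumption, not the author's stated method: each candidate platform gets a score per checkpoint (say 1 to 5), each checkpoint gets a weight, and candidates are ranked by weighted average. The checkpoint keys follow the slide; all weights, scores and platform names are illustrative:

```python
# Checkpoint keys follow the slide; weights and scores are illustrative only.
CHECKPOINTS = (
    "data_operations",
    "data_analytics",
    "business",
    "data_processing",
    "data_platform",
)


def weighted_score(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Weighted average of per-checkpoint scores; KeyError if one is missing."""
    total_weight = sum(weights[c] for c in CHECKPOINTS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] * scores[c] for c in CHECKPOINTS) / total_weight


def rank(weights: dict[str, float],
         candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (platform, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(weights, s)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    weights = {c: 1.0 for c in CHECKPOINTS}
    weights["business"] = 2.0  # e.g. business fit counts double
    candidates = {
        "platform_a": {c: 4.0 for c in CHECKPOINTS},
        "platform_b": {**{c: 3.0 for c in CHECKPOINTS}, "business": 5.0},
    }
    for name, score in rank(weights, candidates):
        print(f"{name}: {score:.2f}")
```

Making weights explicit keeps the trade-off visible: raising the business weight far enough would flip the ranking in this toy example.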
  • 15. Evaluation - Metrics
    Data Checkpoint
      ● Data Integrators: No. of Sources; No. of Targets; No. of Specific Systems; Total Storage Volume; Daily Delta Volume
      ● Data Modelling: Frequency of Schema Evolution; No. of Objects; % of NoSQL Objects; % of PL/SQL Objects
    Data Processing
      ● Data Pipelines: No. of S/M/C Jobs; No. of External Functions Integrated (Java/Python/SQL); No. of ETL Jobs (Tool Based); No. of Compute-Intensive Jobs; No. of Storage-Intensive Jobs
    Business Checkpoint
      ● Operations: No. of Times SLA Challenged; No. of End Users Affected
      ● Reliability: No. of Times Data Compromised; No. of DR Activities; No. of End Users Impacted
      ● Performance Efficiency: Total Batch Time; No. of Times Batch SLA Impacted; No. of End User Reports; No. of End Users/Consumers; No. of Poor-Performing Reports/Queries
      ● Cost Utilization: Overall Billing (Capex); Total Operations & Maintenance Cost
    Data Operations
      ● Monitoring: No. of Support Team Members; No. of Monitoring Dashboards
    Data Analytics
      ● Analytics: No. of ML Jobs/Algorithms; ML Integrators
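Several of these metrics are counts that feed effort and risk estimates. A minimal sketch, with hypothetical per-job effort figures and an assumed total-runs denominator that the slide does not list, converts the S/M/C job counts into a conversion effort and turns the batch-SLA count into a rate:

```python
from dataclasses import dataclass

# Hypothetical person-days to convert one job of each complexity class.
EFFORT_DAYS = {"simple": 1.0, "medium": 3.0, "complex": 8.0}


@dataclass
class PipelineInventory:
    simple_jobs: int
    medium_jobs: int
    complex_jobs: int

    def total_jobs(self) -> int:
        return self.simple_jobs + self.medium_jobs + self.complex_jobs

    def conversion_effort_days(self, effort: dict[str, float] = EFFORT_DAYS) -> float:
        """Sum of job counts multiplied by the per-class effort estimate."""
        return (self.simple_jobs * effort["simple"]
                + self.medium_jobs * effort["medium"]
                + self.complex_jobs * effort["complex"])


def sla_breach_rate_pct(times_batch_sla_impacted: int, total_batch_runs: int) -> float:
    """Percentage of batch runs that missed their SLA (total runs is an assumed input)."""
    if total_batch_runs <= 0:
        raise ValueError("total_batch_runs must be positive")
    return 100.0 * times_batch_sla_impacted / total_batch_runs
```

With 10 simple, 5 medium and 2 complex jobs and the placeholder efforts of 1, 3 and 8 person-days, the estimate is 41 person-days; the effort table should be calibrated against a few converted pipelines before it is trusted.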
  • 17. Evaluation - Pre-Requisites (same content as slide 12)
  • 18. Evaluation - Inputs
    ■ Domain - Retail; DW - Teradata; ETL - DataStage
    ■ Platform - Recently Signed Up for Google Cloud Platform
    ■ Data Platform - Evaluate GCP Services to Set Up a Data Warehouse Platform
    ■ DW Size - 120 TB (70 TB Active + 50 TB Passive)
    ■ Daily Volume - 1 TB (80% Batch + 20% Streaming)
    ■ Data - Structured & Semi-Structured (JSON, XML)
    ■ Data Pipelines - Mostly ELT: DataStage loads data into Teradata (landing layer), and Teradata SQL transforms it
    ■ Data Analytics - Tableau Reports - Customer Reports
    ■ Enterprise Scheduler - Control-M; Ticketing Tool - JIRA; Alerting via Slack, Email
    ■ Monitoring Dashboards, 24x7 Support Team
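A quick worked check of the use-case figures (the 30-day month is an assumption): 70 TB of 120 TB is about 58% active data and 50 TB about 42% passive, while 1 TB/day splits into 0.8 TB batch and 0.2 TB streaming, or roughly 30 TB of new data per month. The same arithmetic in Python:

```python
# Figures from the use case; the 30-day month is an assumption.
DW_ACTIVE_TB = 70
DW_PASSIVE_TB = 50
DAILY_VOLUME_TB = 1.0
BATCH_SHARE = 0.80
STREAMING_SHARE = 0.20


def storage_split(active_tb: float, passive_tb: float) -> tuple[float, float]:
    """Return (active %, passive %) of total warehouse storage."""
    total = active_tb + passive_tb
    return 100.0 * active_tb / total, 100.0 * passive_tb / total


def daily_ingest(volume_tb: float, batch_share: float,
                 streaming_share: float) -> tuple[float, float]:
    """Split the daily volume into (batch TB, streaming TB)."""
    if abs(batch_share + streaming_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return volume_tb * batch_share, volume_tb * streaming_share


if __name__ == "__main__":
    active_pct, passive_pct = storage_split(DW_ACTIVE_TB, DW_PASSIVE_TB)
    batch_tb, stream_tb = daily_ingest(DAILY_VOLUME_TB, BATCH_SHARE, STREAMING_SHARE)
    print(f"Active {active_pct:.1f}% / passive {passive_pct:.1f}% of 120 TB")
    print(f"Daily: {batch_tb:.1f} TB batch + {stream_tb:.1f} TB streaming")
    print(f"~{DAILY_VOLUME_TB * 30:.0f} TB of new data per 30-day month")
```

The sizeable passive share is the kind of ratio that makes cheaper tiered storage for cold data worth evaluating.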
  • 19. DW - Google BigQuery vs Azure Synapse
    Observations
    1 Data Platform Checkpoint
      ● Supports More Than 90% of Requirements
      ● SaaS Offering, Cloud Managed
      ● Very Well Integrated
    2 Data Processing Checkpoint
      ● Native Drivers to Support Batch & Stream
      ● Highest Data Processing Speed
      ● Storage vs Compute - Scaling In and Out
      ● Automatic Scaling, Performance Efficient
    3 Data Analytics Checkpoint
      ● Can Be Integrated With Any BI Tools
      ● Supports AI/ML Libraries and Jobs
      ● Performance Efficient - Data Processing, Scanning
    4 Business Checkpoint
      ● High Availability
      ● Automatic Failover, No DR Required
      ● Performance & Cost Efficient
      ● Pay as You Go vs Commitment Comparison Based on Overall Usage
    5 Data Operations
      ● Customized Logging & Monitoring
      ● Native vs Customized Dashboards
      ● Integration With Various Alerting, Messaging Tools
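The "Pay as You Go vs Commitment" comparison reduces to a break-even calculation. In this sketch the usage unit (for example, TB scanned per month), the unit price and the flat commitment fee are hypothetical inputs; real figures should come from the provider's current price list:

```python
def on_demand_cost(usage_units: float, price_per_unit: float) -> float:
    """Pay-as-you-go cost: usage multiplied by a unit price."""
    return usage_units * price_per_unit


def breakeven_usage(commitment_fee: float, price_per_unit: float) -> float:
    """Usage at which a flat commitment costs the same as pay-as-you-go."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return commitment_fee / price_per_unit


def cheaper_option(usage_units: float, price_per_unit: float,
                   commitment_fee: float) -> str:
    """Return which pricing model is cheaper for the given monthly usage."""
    if commitment_fee < on_demand_cost(usage_units, price_per_unit):
        return "commitment"
    return "pay_as_you_go"
```

Above the break-even usage the commitment wins; the comparison ignores negotiated discounts and month-to-month variance in usage, which is why the slide bases it on overall usage.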
  • 20. Evaluation - Final Report
    DW
      ● Approach 1: BigQuery
      ● Approach 2: BigQuery
    ETL + ELT Pipelines
      ● Approach 1: Modify DataStage (DS) jobs to use the BigQuery (BQ) connector to load data into the BQ landing layer.
      ● Approach 2: Convert DS load jobs to BQ load jobs that pull data from the source and load it into BQ (this depends on the types of source systems and the integration complexity).
    Data Storage (both approaches)
      ● Store active data in BQ native tables with roll-up policies, and store passive datasets on the GCS layer depending on table usage. External tables can be built on the GCS datasets.
    Data Analytics (both approaches)
      ● Tableau connections can be replaced with BQ connections.
    Data Pipeline Scheduler & Maintenance (both approaches)
      ● Control-M can be used to trigger the pipelines, and orchestration can be done using Cloud Composer. Existing ticketing and alerting tools can be used as is.
    BigQuery was chosen after the evaluation, based entirely on the initial sign-up to GCP and on the storage ratio between active and passive data. Azure Synapse can offer the same capabilities; however, the choice here is business- and enterprise-driven.
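The storage decision in the final report (active data in BigQuery native tables, passive data on GCS behind external tables) can be sketched as a placement rule plus a DDL generator. The 90-day threshold, the Parquet format, and all project, dataset, bucket and table names are assumptions for illustration; the generated statement follows BigQuery's `CREATE EXTERNAL TABLE ... OPTIONS (format, uris)` DDL. The 90-day default loosely mirrors the window after which BigQuery bills unmodified tables at its lower long-term storage rate, but any threshold derived from actual table usage would work:

```python
from dataclasses import dataclass

# Hypothetical threshold: a table not queried for this many days is "passive".
PASSIVE_AFTER_DAYS = 90


@dataclass
class TableUsage:
    name: str
    days_since_last_query: int


def placement(table: TableUsage, threshold_days: int = PASSIVE_AFTER_DAYS) -> str:
    """Return 'bq_native' for active tables and 'gcs_external' for passive ones."""
    if table.days_since_last_query >= threshold_days:
        return "gcs_external"
    return "bq_native"


def external_table_ddl(project: str, dataset: str, table: str, gcs_prefix: str) -> str:
    """Build BigQuery DDL for an external table over Parquet files under a GCS prefix."""
    prefix = gcs_prefix.rstrip("/")
    return (
        f"CREATE EXTERNAL TABLE `{project}.{dataset}.{table}`\n"
        f"OPTIONS (format = 'PARQUET', uris = ['{prefix}/*']);"
    )


if __name__ == "__main__":
    tables = [TableUsage("orders_current", 2), TableUsage("sales_2019", 400)]
    for t in tables:
        print(t.name, "->", placement(t))
    print(external_table_ddl("my-project", "archive", "sales_2019",
                             "gs://my-archive-bucket/sales_2019/"))
```

Parquet is self-describing, so BigQuery can infer the external table's schema without an explicit column list; CSV or JSON exports would need more options.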
  • 21. Thank You - Stay in Touch
    Pooja Kelgaonkar
    poojakelgaonkar@gmail.com & pooja.kelgaonkar@rackspace.com
    www.linkedin.com/in/poojakelgaonkar
    poojakelgaonkar.medium.com