Cloudera, Data Warehouse Optimisation 
Jérôme Campo, Systems Engineering 
MAY 2014
The Enterprise Data Warehouse 
[Diagram: servers, marts, DW, documents, storage, search and archive systems, fed by ERP, CRM, RDBMS and machines; files, images, videos, logs and clickstreams; and external data sources]
1. Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage
2. Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
3. Cost of Analytics
• Existing systems strained
• No agility
• BI backlog
4. Complex Architecture
• Many special-purpose systems, silos of data
• Moving data around
• No complete views
Cloudera for the Enterprise Data Hub 
1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Data management, transforms
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility
[Diagram: servers, marts, DW, documents, storage, search and archive systems, fed by ERP, CRM, RDBMS and machines; files, images, videos, logs and clickstreams; and external data sources]
Cloudera for the Enterprise Data Hub
Cloudera for Data Warehouse optimisation
EDW optimisation: Active Archive
What to Migrate
• Archive datasets
• Infrequently accessed tables
• Large corpus of data
Influencing Factors
• Frequency of data access
• Changing regulatory compliance requirements
• Data volume growth
Better in Cloudera
• Data remains accessible
• Data is not lost
• 1/10th the cost
Only Available with Cloudera
• Reliability for mission-critical workloads: high availability, disaster recovery, downtime-less upgrades
• Low-latency SQL processing, ability to absorb short-cycle ELT
• Broad support of leading data integration tools
Key Partners
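As a rough illustration of the Active Archive criteria above (infrequently accessed tables, large volumes), the following minimal Python sketch picks archive candidates from table statistics. Everything in it is an assumption made for illustration: the `TableStats` fields, the table names, and the thresholds (`max_reads`, `min_size_gb`) do not come from the deck or from any Cloudera tool.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    # Hypothetical per-table statistics, e.g. gathered from warehouse query logs.
    name: str
    size_gb: float
    reads_last_90_days: int


def archive_candidates(tables, max_reads=5, min_size_gb=100.0):
    """Return large, rarely read tables, biggest first.

    The thresholds are illustrative defaults, not recommendations.
    """
    picked = [
        t for t in tables
        if t.reads_last_90_days <= max_reads and t.size_gb >= min_size_gb
    ]
    return sorted(picked, key=lambda t: t.size_gb, reverse=True)


# Made-up sample data for the sketch.
tables = [
    TableStats("orders_2009", 800.0, 1),
    TableStats("orders_current", 1200.0, 4500),
    TableStats("clickstream_2011", 2500.0, 0),
    TableStats("lookup_codes", 0.5, 2),
]

for t in archive_candidates(tables):
    print(t.name, t.size_gb)
```

In practice the inputs would come from the warehouse's own usage statistics, and compliance requirements may force retention regardless of access frequency.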
EDW optimisation: Transformation
What to Migrate
• High-scale batch data processing
• Implemented as SQL + scripting or ETL running on expensive HW infrastructure
• Staging data stored across diverse temp tables
Influencing Factors
• High fraction of overall EDW utilization (25–80%)
• Difficult to store, manage staging data in relational form
• Limited user adoption risk to migrate
• ETL tools to simplify migration
Better in Cloudera
• Over 2X the performance
• 1/10th the cost
• Persistent staging, tracked lineage
Only Available with Cloudera
• Reliability for mission-critical workloads: high availability, disaster recovery, downtime-less upgrades
• Low-latency SQL processing, ability to absorb short-cycle ELT
• Broad support of leading data integration tools
Key Partners
EDW optimisation: Self Service BI
Workload
• Self-Service BI, Exploratory BI, Data Discovery
• Uncertain business questions and uncertain data
Migration Priority
• Fastest growing workload for many warehouses
• Comparable support for end user tools between Cloudera and DBMS products
Better in Cloudera
• Schema flexibility
• End user self-service on full fidelity data
• 1/10th the cost
Only Available with Cloudera
• Open source parallel interactive SQL engine: Cloudera Impala
• Integration and certification of every leading SSBI vendor
Key Partners
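The schema flexibility listed above can be pictured with a small schema-on-read sketch in plain Python: raw records are kept at full fidelity, and each consumer applies its own schema only when reading. This is a conceptual illustration, not how Impala or any Cloudera component is implemented; `RAW_EVENTS`, `read_with_schema` and the two views are hypothetical names with made-up data.

```python
import json

# Raw events stored exactly as they arrived; fields vary from record to record.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": "120"}',
    '{"user": "b2", "page": "/cart", "ms": "340", "coupon": "SPRING"}',
    '{"user": "a1", "page": "/cart"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps a field name to a (converter, default) pair; fields missing
    from a record take the default instead of failing the load.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two different "schemas" over the same stored data, chosen by the reader.
latency_view = {"page": (str, None), "ms": (int, 0)}
coupon_view = {"user": (str, None), "coupon": (str, "")}

rows = list(read_with_schema(RAW_EVENTS, latency_view))
print(rows)
```

Because nothing is dropped at load time, a new question (such as coupon usage) only needs a new view over the same raw records rather than a reload.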
EDW optimisation: Multi-workload
Workload and Data
• Training & scoring predictive models
• Deep and broad data sets, within and beyond the warehouse
Influencing Factors
• Statisticians want unconstrained analysis; limited DW compute resources
• Paying top dollar for warehouse data storage only to load into ML tools
• Inability to analyze data beyond the warehouse
Better in Cloudera
• Greater user productivity (pre-packaged ML libraries, no more down-sampling)
• Support for 3rd party ML tools
• Greater flexibility (SQL + MR + Search + Spark + SAS procs)
• 1/10th the cost
Only Available with Cloudera
• Ability to run SAS, R natively on the same cluster
• Interactive search and SQL experience for data exploration
• Built-in analytics libraries (Mahout, DataFu, ClouderaML)
• Support from Cloudera’s Data Science team
Key Partners
Why EDW optimisation?
1. Lower costs of data management, allow growth
2. Improve quality of service
• Shorten ETL windows
• Faster BI queries
3. Extend existing warehouse capacity
• Increase ROI from current investments
• More operational data: volume and schemas
• More business intelligence and analytics workloads
4. Retain all data for more varied analysis
5. Deliver a foundation for innovation
• Bring more applications to Hadoop data for low incremental cost
Customers agree, Cloudera delivers
Leading Payments Company
• Workload: Analytics, ETL Processing, DR
• Results: Largest fraud discovery in firm history; time to report collapsed from 2 days to 2 hours; save $30M on DR
Global Money Center Bank
• Workload: Data Processing (ELT)
• Results: Avoided tens of millions in expansion purchases; 42% faster processing
Mobile Device Manufacturer
• Workload: Data Processing (ELT)
• Results: Offloaded 90% of data volume; keep all data
Fortune 500 Retailer
• Workload: Analytics
• Results: More insights by supporting more exploration of more extensive & granular data
Leading Financial Regulator
• Workload: Data Processing (ELT) and DR
• Results: Shrank EDW footprint by 4PB, 20X perf. boost
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation

Más contenido relacionado

La actualidad más candente

Full stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorFull stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorSquared Up
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Microsoft Tech Community
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerMicrosoft Tech Community
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Sql pass summit
Sql pass summitSql pass summit
Sql pass summitDon Severs
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure DatabricksDustin Vannoy
 
Azure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseAzure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseMohamed Tawfik
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksDatabricks
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Webinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeWebinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeJosh Lane
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesCCG
 
Point of View to Accelerate with dev ops
Point of View to Accelerate with dev opsPoint of View to Accelerate with dev ops
Point of View to Accelerate with dev opsSanjay B. Bhakta
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 

La actualidad más candente (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Azure synapse by usama whaba khan
Azure synapse by usama whaba khanAzure synapse by usama whaba khan
Azure synapse by usama whaba khan
 
Full stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorFull stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure Monitor
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layer
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Sql pass summit
Sql pass summitSql pass summit
Sql pass summit
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure Databricks
 
Azure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseAzure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data Warehouse
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with Databricks
 
Synapse for mere mortals
Synapse for mere mortalsSynapse for mere mortals
Synapse for mere mortals
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Webinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeWebinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data Lake
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
 
Point of View to Accelerate with dev ops
Point of View to Accelerate with dev opsPoint of View to Accelerate with dev ops
Point of View to Accelerate with dev ops
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 

Destacado

La plateforme OpenData 3.0 pour libérer et valoriser les données
La plateforme OpenData 3.0 pour libérer et valoriser les données  La plateforme OpenData 3.0 pour libérer et valoriser les données
La plateforme OpenData 3.0 pour libérer et valoriser les données Excelerate Systems
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Cloudera, Inc.
 
A poster version of HadoopXML
A poster version of HadoopXMLA poster version of HadoopXML
A poster version of HadoopXMLKyong-Ha Lee
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoopskaluska
 
Efficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in HadoopEfficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in HadoopDataWorks Summit
 

Destacado (6)

La plateforme OpenData 3.0 pour libérer et valoriser les données
La plateforme OpenData 3.0 pour libérer et valoriser les données  La plateforme OpenData 3.0 pour libérer et valoriser les données
La plateforme OpenData 3.0 pour libérer et valoriser les données
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
 
A poster version of HadoopXML
A poster version of HadoopXMLA poster version of HadoopXML
A poster version of HadoopXML
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoop
 
Efficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in HadoopEfficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in Hadoop
 

Similar a BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsjdijcks
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?Vincent Terrasi
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS2nd Watch
 

Similar a BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation (20)

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS
 

Más de Excelerate Systems

Sécurité Zéro Confiance - La Fin du Périmètre de Sécurité
Sécurité Zéro Confiance - La Fin du Périmètre de SécuritéSécurité Zéro Confiance - La Fin du Périmètre de Sécurité
Sécurité Zéro Confiance - La Fin du Périmètre de SécuritéExcelerate Systems
 
Zero Trust Security / Sécurité Zéro Confiance
Zero Trust Security / Sécurité Zéro ConfianceZero Trust Security / Sécurité Zéro Confiance
Zero Trust Security / Sécurité Zéro ConfianceExcelerate Systems
 
Vision-AI | the Next AI | the Next Disruption in Data Accuracy
Vision-AI | the Next AI | the Next Disruption in Data AccuracyVision-AI | the Next AI | the Next Disruption in Data Accuracy
Vision-AI | the Next AI | the Next Disruption in Data AccuracyExcelerate Systems
 
PECTORIS -|- LA CLINIQUE VIRTUELLE
PECTORIS -|- LA CLINIQUE VIRTUELLEPECTORIS -|- LA CLINIQUE VIRTUELLE
PECTORIS -|- LA CLINIQUE VIRTUELLEExcelerate Systems
 
E santé - Entrez dans l'ère du BigData
E santé - Entrez dans l'ère du BigDataE santé - Entrez dans l'ère du BigData
E santé - Entrez dans l'ère du BigDataExcelerate Systems
 
OpenData - BigData - OpenSource : l'inévitable convergence
OpenData - BigData - OpenSource : l'inévitable convergenceOpenData - BigData - OpenSource : l'inévitable convergence
OpenData - BigData - OpenSource : l'inévitable convergenceExcelerate Systems
 
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numérique
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numériqueBigDataBx #1 - Data Marketing, l'ère de l'intelligence numérique
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numériqueExcelerate Systems
 
BigDataBx #1 - BigData et Protection de Données Privées
BigDataBx #1 - BigData et Protection de Données PrivéesBigDataBx #1 - BigData et Protection de Données Privées
BigDataBx #1 - BigData et Protection de Données PrivéesExcelerate Systems
 
#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group
#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group
#BigDataBx 1 - Présentation de la BI au BigData - Solocal GroupExcelerate Systems
 
BigDataBx #1 - Journée BigData à la CCI de Bordeaux
BigDataBx #1 - Journée BigData à la CCI de BordeauxBigDataBx #1 - Journée BigData à la CCI de Bordeaux
BigDataBx #1 - Journée BigData à la CCI de BordeauxExcelerate Systems
 
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...Excelerate Systems
 
BigData & Cloud @ Excelerate Systems France
BigData & Cloud @ Excelerate Systems FranceBigData & Cloud @ Excelerate Systems France
BigData & Cloud @ Excelerate Systems FranceExcelerate Systems
 
BigData en France par Excelerate Systems
BigData en France par Excelerate Systems BigData en France par Excelerate Systems
BigData en France par Excelerate Systems Excelerate Systems
 

Más de Excelerate Systems (18)

Sécurité Zéro Confiance
Sécurité Zéro ConfianceSécurité Zéro Confiance
Sécurité Zéro Confiance
 
Sécurité Zéro Confiance - La Fin du Périmètre de Sécurité
Sécurité Zéro Confiance - La Fin du Périmètre de SécuritéSécurité Zéro Confiance - La Fin du Périmètre de Sécurité
Sécurité Zéro Confiance - La Fin du Périmètre de Sécurité
 
Zero Trust Security / Sécurité Zéro Confiance
Zero Trust Security / Sécurité Zéro ConfianceZero Trust Security / Sécurité Zéro Confiance
Zero Trust Security / Sécurité Zéro Confiance
 
Vision-AI | the Next AI | the Next Disruption in Data Accuracy
Vision-AI | the Next AI | the Next Disruption in Data AccuracyVision-AI | the Next AI | the Next Disruption in Data Accuracy
Vision-AI | the Next AI | the Next Disruption in Data Accuracy
 
Plateforme DATA HUB / API
Plateforme DATA HUB / APIPlateforme DATA HUB / API
Plateforme DATA HUB / API
 
PECTORIS -|- LA CLINIQUE VIRTUELLE
PECTORIS -|- LA CLINIQUE VIRTUELLEPECTORIS -|- LA CLINIQUE VIRTUELLE
PECTORIS -|- LA CLINIQUE VIRTUELLE
 
Le Net pour Tou(te)s
Le Net pour Tou(te)sLe Net pour Tou(te)s
Le Net pour Tou(te)s
 
E santé - Entrez dans l'ère du BigData
E santé - Entrez dans l'ère du BigDataE santé - Entrez dans l'ère du BigData
E santé - Entrez dans l'ère du BigData
 
OpenData - BigData - OpenSource : l'inévitable convergence
OpenData - BigData - OpenSource : l'inévitable convergenceOpenData - BigData - OpenSource : l'inévitable convergence
OpenData - BigData - OpenSource : l'inévitable convergence
 
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numérique
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numériqueBigDataBx #1 - Data Marketing, l'ère de l'intelligence numérique
BigDataBx #1 - Data Marketing, l'ère de l'intelligence numérique
 
BigDataBx #1 - BigData et Protection de Données Privées
BigDataBx #1 - BigData et Protection de Données PrivéesBigDataBx #1 - BigData et Protection de Données Privées
BigDataBx #1 - BigData et Protection de Données Privées
 
#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group
#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group
#BigDataBx 1 - Présentation de la BI au BigData - Solocal Group
 
BigDataBx #1 - Journée BigData à la CCI de Bordeaux
BigDataBx #1 - Journée BigData à la CCI de BordeauxBigDataBx #1 - Journée BigData à la CCI de Bordeaux
BigDataBx #1 - Journée BigData à la CCI de Bordeaux
 
BigData on change d'ère !
BigData on change d'ère ! BigData on change d'ère !
BigData on change d'ère !
 
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...
Enterprise Data Hub - La Clé de la Transformation de la Gestion de Données d'...
 
BigData BigBuzz @ Le Node
BigData BigBuzz @ Le Node BigData BigBuzz @ Le Node
BigData BigBuzz @ Le Node
 
BigData & Cloud @ Excelerate Systems France
BigData & Cloud @ Excelerate Systems FranceBigData & Cloud @ Excelerate Systems France
BigData & Cloud @ Excelerate Systems France
 
BigData en France par Excelerate Systems
BigData en France par Excelerate Systems BigData en France par Excelerate Systems
BigData en France par Excelerate Systems
 



BigDataBx #1 - Workshop 1: Cloudera Data Warehouse Optimisation

  • 1. Cloudera, Data Warehouse Optimisation. Jérôme Campo, Systems Engineering, May 2014
  • 2. The Enterprise Data Warehouse
      Diagram labels: Servers; Marts; DW; Documents; Storage; Search; Archive; ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources
      Challenges:
      • 1. Visibility: leaving data behind; risk and compliance; high cost of storage
      • 2. Time to Data: up-front modeling; transforms slow; transforms lose data
      • 3. Cost of Analytics: existing systems strained; no agility; BI backlog
      • 4. Complex Architecture: many special-purpose systems, silos of data; moving data around; no complete views
  • 3. Cloudera for the Enterprise Data Hub
      • 1. Active archive: full fidelity original data; indefinite time, any source; lowest cost storage
      • 2. Data management, transforms: one source of data for all analytics; persist state of transformed data; significantly faster & cheaper
      • 3. Self-service exploratory BI: simple search + BI tools; "schema on read" agility; reduce BI user backlog requests
      • 4. Multi-workload analytic platform: bring applications to data; combine different workloads on common data (e.g. SQL + Search); true BI agility
      Diagram labels: Servers; Marts; DW; Documents; Storage; Search; Archive; ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources
  • 4. Cloudera for the Enterprise Data Hub
  • 5. Cloudera for Data Warehouse optimisation
  • 6. EDW optimisation: Active Archive
      • What to Migrate: archive datasets; infrequently accessed tables; large corpus of data
      • Influencing Factors: frequency of data access; changing regulatory compliance requirements; data volume growth
      • Better in Cloudera: data remains accessible; data is not lost; 1/10th the cost
      • Only Available with Cloudera: reliability for mission-critical workloads (high availability, disaster recovery, downtime-less upgrades); low-latency SQL processing, ability to absorb short-cycle ELT; broad support of leading data integration tools
      • Key Partners
  • 7. EDW optimisation: Transformation
      • What to Migrate: high-scale batch data processing; implemented as SQL + scripting or ETL running on expensive HW infrastructure; staging data stored across diverse, temp tables
      • Influencing Factors: high fraction of overall EDW utilization (25–80%); difficult to store, manage staging data in relational form; limited user adoption risk to migrate; ETL tools to simplify migration
      • Better in Cloudera: over 2X the performance; 1/10th the cost; persistent staging, tracked lineage
      • Only Available with Cloudera: reliability for mission-critical workloads (high availability, disaster recovery, downtime-less upgrades); low-latency SQL processing, ability to absorb short-cycle ELT; broad support of leading data integration tools
      • Key Partners
  • 8. EDW optimisation: Self Service BI
      • Workload Migration Priority: Self-Service BI, Exploratory BI, Data Discovery; uncertain business questions and uncertain data; fastest growing workload for many warehouses; comparable support for end user tools between Cloudera and DBMS products
      • Better in Cloudera: schema flexibility; end user self-service on full fidelity data; 1/10th the cost
      • Only Available with Cloudera: open source parallel interactive SQL engine (Cloudera Impala); integration and certification of every leading SSBI vendor
      • Key Partners
  • 9. EDW optimisation: Multi-workload
      • Workload and Data: training & scoring predictive models; deep and broad data sets, within and beyond the warehouse
      • Influencing Factors: statisticians want unconstrained analysis; limited DW compute resources; paying top dollar for warehouse data storage only to load into ML tools; inability to analyze data beyond the warehouse
      • Better in Cloudera: greater user productivity (pre-packaged ML libraries, no more down-sampling); support for 3rd-party ML tools; greater flexibility (SQL + MR + Search + Spark + SAS procs); 1/10th the cost
      • Only Available with Cloudera: ability to run SAS, R natively on the same cluster; interactive search and SQL experience for data exploration; built-in analytics libraries (Mahout, DataFu, ClouderaML); support from Cloudera's Data Science team
      • Key Partners
  • 10. Why EDW optimisation?
      1. Lower costs of data management, allow growth
      2. Improve quality of service
         • Shorten ETL windows
         • Faster BI queries
      3. Extend existing warehouse capacity
         • Increase ROI from current investments
         • More operational data: volume and schemas
         • More business intelligence and analytics workloads
      4. Retain all data for more varied analysis
      5. Deliver a foundation for innovation
         • Bring more applications to Hadoop data for low incremental cost
  • 11. Customers agree, Cloudera delivers
      • Leading Payments Company. Workload: Analytics, ETL Processing, DR. Results: largest fraud discovery in firm history; time to report collapsed from 2 days to 2 hours; saved $30M on DR
      • Global Money Center Bank. Workload: Data Processing (ELT). Results: avoided tens of millions in expansion purchases; 42% faster processing
      • Mobile Device Manufacturer. Workload: Data Processing (ELT). Results: offloaded 90% of data volume; keep all data
      • Fortune 500 Retailer. Workload: Analytics. Results: more insights by supporting more exploration of more extensive & granular data
      • Leading Financial Regulator. Workload: Data Processing (ELT) and DR. Results: shrank EDW footprint by 4 PB; 20X performance boost