SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
Bramhope
a modern data
architecture
for BI
© the DataShed Limited 2015
a (not so) long time ago in a galaxy far, far away…
The most complex system you had to handle was an old AS/400
Publishing data weekly was adequate for most users
Analysts didn’t really exist
Most queries took hours to run – but that was ok
© the DataShed Limited 2015
what the hell happened?
© the DataShed Limited 2015
data explosion
Source(s): CSC: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode Gartner: http://www.gartner.com/technology/research/it-spending-forecast/
Growth between
2010 and 2020:
Data: 500%
Budget: 16%
$3.0
$4.0
$5.0
0
5
10
15
20
25
30
35
40
2010 2012 2013 2014 2015 2016 2017 2018 2020
GlobalITBudget($Trillion)
GlobalDataVolume(Zettabytes)
Data Growth IT Budget Growth Expon. (Data Growth) Expon. (IT Budget Growth)
© the DataShed Limited 2015
help is at hand…
© the DataShed Limited 2015
Hadoop & Big Data tools
First incarnation in 2005
Highly-scalable data processing, based on a distributed file system (HDFS)
Ability to handle PB size workloads
Becoming more mature – including:
ANSI-SQL compliant data warehousing tools (Hive & Stinger.next)
Batch processing (Map Reduce/Tez, Pig)
Operations management (Ambari)
Security (Knox)
Governance (HCatalog)
© the DataShed Limited 2015
…or is it?
© the DataShed Limited 2015
too small?
Small cluster: 5 – 50 nodes
Assuming a single node:
24GB RAM
Single socket quad core
4-6 2 TB SATA drives
Total storage ≈ 10 TB
How many of us need to process 10TB of data?
Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html
© the DataShed Limited 2015
if not Hadoop, then what?
Big Data has driven innovation in both technology and tools.
If you can’t adopt the tools, you can still adopt some of the principles:
Design for scale-out, rather than up.
ELT vs ETL
Lambda data architecture
… amongst other things!
© the DataShed Limited 2015
prepare to scale out
Data Storage
Data Integration
Data Marts & Cubes
Business Intelligence Apps
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
Specific, small data
marts
CRM, ERP, Transactional System
ETL Tools (SSIS,
Informatica, Scripting,
Oracle Data Integrator)
Data Integration
Data Marts & Cubes
ETL Tools (SSIS,
Informatica, Scripting,
Oracle Data Integrator)
Data Integration
Data Marts & Cubes
ETL Tools (SSIS,
Informatica, Scripting,
Oracle Data Integrator)
Specific, small data
marts
Specific, small data
marts
Data Storage
Data Integration
Data Marts & Cubes
Business Intelligence Apps
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
Data marts constructed on top of an EDW
Cubes present views of this data to business users
ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator)
CRM, ERP, Transactional System
© the DataShed Limited 2015
ELT vs ETL
Key considerations:
Metadata & data lineage
How real-time is real-time?
How long does it take you to get data to analysts?
How powerful is your presentation server?
Can you use both?
vs
Schema on read? Or schema on write?
© the DataShed Limited 2015
lambda data architecture
© the DataShed Limited 2015
…most importantly, think
differently
© the DataShed Limited 2015
You don’t need big data to use Big Data tools
Example: Prediction.io (http://prediction.io/)
Open Source Machine Learning Server, utilizes Hadoop, HBase, Spark and ElasticSearch
© the DataShed Limited 2015
any questions?
© the DataShed Limited 2015
ed thewlis
tech director – the DataShed
@edthewlis
ed@thedatashed.co.uk
© the DataShed Limited 2015

Más contenido relacionado

La actualidad más candente

Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics Datavail
 
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Denodo
 
Solution Centric Architectural Presentation - Implementing a Logical Data War...
Solution Centric Architectural Presentation - Implementing a Logical Data War...Solution Centric Architectural Presentation - Implementing a Logical Data War...
Solution Centric Architectural Presentation - Implementing a Logical Data War...Denodo
 
Dell hans timmerman v1.1
Dell hans timmerman v1.1Dell hans timmerman v1.1
Dell hans timmerman v1.1BigDataExpo
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)Denodo
 
Data Services and the Modern Data Ecosystem (Middle East)
Data Services and the Modern Data Ecosystem (Middle East)Data Services and the Modern Data Ecosystem (Middle East)
Data Services and the Modern Data Ecosystem (Middle East)Denodo
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence DevelopmentManojKumarR41
 
Data Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #WinningData Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #WinningDenodo
 
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Denodo
 
Data Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsData Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsDenodo
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Denodo
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey ResultsAtScale
 
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to WorkDenodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to WorkDenodo
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analyticsThe Marketing Distillery
 
Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Denodo
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo
 
Pieter den Hamer Alliander
Pieter den Hamer Alliander Pieter den Hamer Alliander
Pieter den Hamer Alliander BigDataExpo
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 

La actualidad más candente (20)

Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics
 
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
 
Solution Centric Architectural Presentation - Implementing a Logical Data War...
Solution Centric Architectural Presentation - Implementing a Logical Data War...Solution Centric Architectural Presentation - Implementing a Logical Data War...
Solution Centric Architectural Presentation - Implementing a Logical Data War...
 
[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh
 
Dell hans timmerman v1.1
Dell hans timmerman v1.1Dell hans timmerman v1.1
Dell hans timmerman v1.1
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
 
Data Services and the Modern Data Ecosystem (Middle East)
Data Services and the Modern Data Ecosystem (Middle East)Data Services and the Modern Data Ecosystem (Middle East)
Data Services and the Modern Data Ecosystem (Middle East)
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence Development
 
Data Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #WinningData Virtualization at Logitech = #Winning
Data Virtualization at Logitech = #Winning
 
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
 
Data Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsData Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation Analytics
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to WorkDenodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
Denodo DataFest 2016: The Governed Data Lake – Putting Big Data to Work
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analytics
 
Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
 
Pieter den Hamer Alliander
Pieter den Hamer Alliander Pieter den Hamer Alliander
Pieter den Hamer Alliander
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 

Destacado

Why Big Data Analytics Needs Business Intelligence Too
Why Big Data Analytics Needs Business Intelligence Too Why Big Data Analytics Needs Business Intelligence Too
Why Big Data Analytics Needs Business Intelligence Too Barry Devlin
 
Business unIntelligence, Chapter 5
Business unIntelligence, Chapter 5Business unIntelligence, Chapter 5
Business unIntelligence, Chapter 5Barry Devlin
 
Business unIntelligence - a Whistle Stop Tour
Business unIntelligence - a Whistle Stop TourBusiness unIntelligence - a Whistle Stop Tour
Business unIntelligence - a Whistle Stop TourBarry Devlin
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BIDeZyre
 
Three signs your architecture is too small for big data. Camp IT December 2014
Three signs your architecture is too small for big data.  Camp IT December 2014Three signs your architecture is too small for big data.  Camp IT December 2014
Three signs your architecture is too small for big data. Camp IT December 2014Craig Jordan
 

Destacado (6)

Why Big Data Analytics Needs Business Intelligence Too
Why Big Data Analytics Needs Business Intelligence Too Why Big Data Analytics Needs Business Intelligence Too
Why Big Data Analytics Needs Business Intelligence Too
 
Business unIntelligence, Chapter 5
Business unIntelligence, Chapter 5Business unIntelligence, Chapter 5
Business unIntelligence, Chapter 5
 
Business unIntelligence - a Whistle Stop Tour
Business unIntelligence - a Whistle Stop TourBusiness unIntelligence - a Whistle Stop Tour
Business unIntelligence - a Whistle Stop Tour
 
Etl elt simplified
Etl elt simplifiedEtl elt simplified
Etl elt simplified
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
 
Three signs your architecture is too small for big data. Camp IT December 2014
Three signs your architecture is too small for big data.  Camp IT December 2014Three signs your architecture is too small for big data.  Camp IT December 2014
Three signs your architecture is too small for big data. Camp IT December 2014
 

Similar a Modern Data Architecture

How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...
How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...
How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...Enterprise Management Associates
 
Exploiting Data Lakes: Architecture, Capabilities & Future
Exploiting Data Lakes: Architecture, Capabilities & FutureExploiting Data Lakes: Architecture, Capabilities & Future
Exploiting Data Lakes: Architecture, Capabilities & FutureAgilisium Consulting
 
Building a marketing data lake
Building a marketing data lakeBuilding a marketing data lake
Building a marketing data lakeSumit Sarkar
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019mark madsen
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsInside Analysis
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Harald Erb
 
Visualisation and forecasting on IT capacity planning data
Visualisation and forecasting on IT capacity planning dataVisualisation and forecasting on IT capacity planning data
Visualisation and forecasting on IT capacity planning dataAndrew Gadsby
 
The State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleThe State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleVoltDB
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingAmazon Web Services
 
Modernize your Infrastructure and Mobilize Your Data
Modernize your Infrastructure and Mobilize Your DataModernize your Infrastructure and Mobilize Your Data
Modernize your Infrastructure and Mobilize Your DataPrecisely
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunitiesBigdata Meetup Kochi
 
Delivering Analytics at The Speed of Transactions with Data Fabric
Delivering Analytics at The Speed of Transactions with Data FabricDelivering Analytics at The Speed of Transactions with Data Fabric
Delivering Analytics at The Speed of Transactions with Data FabricDenodo
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Succeeding with Analytics: Mastering People, Process, and Technology
Succeeding with Analytics: Mastering People, Process, and TechnologySucceeding with Analytics: Mastering People, Process, and Technology
Succeeding with Analytics: Mastering People, Process, and Technologyibi
 

Similar a Modern Data Architecture (20)

How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...
How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...
How to Merge the Data Lake and the Data Warehouse: The Power of a Unified Ana...
 
Exploiting Data Lakes: Architecture, Capabilities & Future
Exploiting Data Lakes: Architecture, Capabilities & FutureExploiting Data Lakes: Architecture, Capabilities & Future
Exploiting Data Lakes: Architecture, Capabilities & Future
 
Building a marketing data lake
Building a marketing data lakeBuilding a marketing data lake
Building a marketing data lake
 
Business Insights ICT
Business Insights ICTBusiness Insights ICT
Business Insights ICT
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of Things
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020
 
Visualisation and forecasting on IT capacity planning data
Visualisation and forecasting on IT capacity planning dataVisualisation and forecasting on IT capacity planning data
Visualisation and forecasting on IT capacity planning data
 
The State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleThe State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and Scale
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data Warehousing
 
Modernize your Infrastructure and Mobilize Your Data
Modernize your Infrastructure and Mobilize Your DataModernize your Infrastructure and Mobilize Your Data
Modernize your Infrastructure and Mobilize Your Data
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 
Delivering Analytics at The Speed of Transactions with Data Fabric
Delivering Analytics at The Speed of Transactions with Data FabricDelivering Analytics at The Speed of Transactions with Data Fabric
Delivering Analytics at The Speed of Transactions with Data Fabric
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Succeeding with Analytics: Mastering People, Process, and Technology
Succeeding with Analytics: Mastering People, Process, and TechnologySucceeding with Analytics: Mastering People, Process, and Technology
Succeeding with Analytics: Mastering People, Process, and Technology
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Modern Data Architecture

  • 1. Bramhope a modern data architecture for BI © the DataShed Limited 2015
  • 2. a (not so) long time ago in a galaxy far, far away… The most complex system you had to handle was an old AS/400 Publishing data weekly was adequate for most users Analysts didn’t really exist Most queries took hours to run – but that was ok © the DataShed Limited 2015
  • 3. what the hell happened? © the DataShed Limited 2015
  • 4. data explosion Source(s): CSC: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode Gartner: http://www.gartner.com/technology/research/it-spending-forecast/ Growth between 2010 and 2020: Data: 500% Budget: 16% $3.0 $4.0 $5.0 0 5 10 15 20 25 30 35 40 2010 2012 2013 2014 2015 2016 2017 2018 2020 GlobalITBudget($Trillion) GlobalDataVolume(Zettabytes) Data Growth IT Budget Growth Expon. (Data Growth) Expon. (IT Budget Growth) © the DataShed Limited 2015
  • 5. help is at hand… © the DataShed Limited 2015
  • 6. Hadoop & Big Data tools First incarnation in 2005 Highly-scalable data processing, based on a distributed file system (HDFS) Ability to handle PB size workloads Becoming more mature – including: ANSI-SQL compliant data warehousing tools (Hive & Stinger.next) Batch processing (Map Reduce/Tez, Pig) Operations management (Ambari) Security (Knox) Governance (HCatalog) © the DataShed Limited 2015
  • 7. …or is it? © the DataShed Limited 2015
  • 8. too small? Small cluster: 5 – 50 nodes Assuming a single node: 24GB RAM Single socket quad core 4-6 2 TB SATA drives Total storage ≈ 10 TB How many of us need to process 10TB of data? Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html © the DataShed Limited 2015
  • 9. if not Hadoop, then what? Big Data has driven innovation in both technology and tools. If you can’t adopt the tools, you can still adopt some of the principles: Design for scale-out, rather than up. ELT vs ETL Lambda data architecture … amongst other things! © the DataShed Limited 2015
  • 10. prepare to scale out Data Storage Data Integration Data Marts & Cubes Business Intelligence Apps Executives: Dashboards Managers & Stakeholders: Reports Business/Data Analysts: Cubes & Direct Access Specific, small data marts CRM, ERP, Transactional System ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator) Data Integration Data Marts & Cubes ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator) Data Integration Data Marts & Cubes ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator) Specific, small data marts Specific, small data marts Data Storage Data Integration Data Marts & Cubes Business Intelligence Apps Executives: Dashboards Managers & Stakeholders: Reports Business/Data Analysts: Cubes & Direct Access Data marts constructed on top of an EDW Cubes present views of this data to business users ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator) CRM, ERP, Transactional System © the DataShed Limited 2015
  • 11. ELT vs ETL Key considerations: Metadata & data lineage How real-time is real-time? How long does it take you to get data to analysts? How powerful is your presentation server? Can you use both? vs Schema on read? Or schema on write? © the DataShed Limited 2015
  • 12. lambda data architecture © the DataShed Limited 2015
  • 13. …most importantly, think differently © the DataShed Limited 2015
  • 14. You don’t need big data to use Big Data tools Example: Prediction.io (http://prediction.io/) Open Source Machine Learning Server, utilizes Hadoop, HBase, Spark and ElasticSearch © the DataShed Limited 2015
  • 15. any questions? © the DataShed Limited 2015
  • 16. ed thewlis tech director – the DataShed @edthewlis ed@thedatashed.co.uk © the DataShed Limited 2015