Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
“Think Big: How to Design a Big Data Information Architecture” 
Exploratory Webcast | January 22, 2014
Guests 
Robin Bloor 
Chief Analyst, The Bloor Group 
@robinbloor robin.bloor@bloorgroup.com 
Eric Kavanagh 
CEO, The Bloor Group 
@eric_kavanagh eric.kavanagh@bloorgroup.com
Big Data Information Architecture 
Exploratory Webcast: January 22, 2014
Roundtable Webcast: April 9, 2014
Findings Webcast: June 25, 2014
#BigDataArch
Big Data Information Architecture
In Three Segments 
PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow
Part 1: The Big Data Curve
The Visible “Big Data” Trend 
u Corporate data volumes 
grow at about 55% per 
annum - exponentially 
u Data has been growing 
at this rate for, maybe, 
40 years 
u There is nothing new 
about big data. It clings 
to an established 
exponential trend
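To make the 55% figure concrete, the short Python sketch below compounds it over time. The 55% rate comes from the slide; the function names are illustrative, and a constant annual rate is an assumption.

```python
import math

ANNUAL_GROWTH = 0.55  # growth rate quoted on the slide


def growth_factor(years: float, rate: float = ANNUAL_GROWTH) -> float:
    """Cumulative multiplier after `years` of compound growth at `rate`."""
    return (1.0 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


print(f"doubling time: {doubling_time():.2f} years")
print(f"growth over 6 years: {growth_factor(6):.1f}x")
```

At this rate volumes double roughly every 1.6 years and grow by about 14x over six years.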
The Invisible Trend: Moore’s Law Cubed 
u The biggest databases are new 
databases 
u They grow at the cube of 
Moore’s Law 
u Moore’s Law = 10x every 6 years 
u VLDB: 1000x every 6 years 
– 1991/2 megabytes 
– 1997/8 gigabytes 
– 2003/4 terabytes 
– 2009/10 petabytes 
– 2015/16 exabytes
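The arithmetic behind "Moore's Law cubed" can be checked with a few lines of Python. This is a sketch that takes the slide's own figures (10x and 1000x every 6 years, starting from megabytes in 1991) at face value; the helper names `annual_rate` and `vldb_scale` are hypothetical.

```python
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]


def annual_rate(factor: float, years: float) -> float:
    """Constant annual growth rate equivalent to `factor` growth over `years`."""
    return factor ** (1.0 / years) - 1.0


def vldb_scale(year: int, base_year: int = 1991) -> str:
    """Size class of the largest databases, one 1000x step per 6 years.

    Years before `base_year` map to the first unit; years beyond the
    slide's table stay at the last unit, since the slide stops there.
    """
    steps = max(0, (year - base_year) // 6)
    return UNITS[min(steps, len(UNITS) - 1)]


moore = annual_rate(10, 6)      # slide's statement of Moore's Law
vldb = annual_rate(10 ** 3, 6)  # the cube: 1000x every 6 years
print(f"Moore's Law: {moore:.0%} per year, VLDB: {vldb:.0%} per year")
print(vldb_scale(2003))
```

Under the slide's figures this works out to roughly 47% a year for Moore's Law and roughly 216% a year for the largest databases.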
Technology Evolution (Bloor Curve) 
[Figure: the Bloor Curve, showing application migration and the area of as-yet-unrealized applications. Source: The Bloor Group]
The Traditional Force of Disruption 
u Software architectures 
change: centralized, C/S, 
3 tier/web, SOA, etc. 
u Applications migrate 
according to latencies 
u Dominant applications 
and software brands can 
die via “The innovator’s 
dilemma” 
u Wholly new applications 
appear because of lower 
latencies, e.g., VMs, CEP 
Application 
Migration 
The Area Of 
As-Yet-Unrealized 
Applications 
Source: The Bloor Group
This Curve is Compromised 
[Figure: the Bloor Curve (application migration; the area of as-yet-unrealized applications). Source: The Bloor Group]
Two DISRUPTIVE forces have changed the curve: PARALLELISM and The CLOUD
Big Data??? 
It’s not really about 
It’s about
Part 2: Technology Disruption
It’s Over for Spinning Disk 
u SSD is now on the 
Moore’s Law curve 
u Disk is not and never 
was (in respect of seek 
time) 
u All traditional databases 
were engineered for 
spinning disk and not 
for scale-out 
u This explains the new 
DBMS products…
In-Memory Disruption 
u Memory may gradually 
become the primary 
store for data (this 
impacts data flows) 
u Almost all applications 
are poorly built for 
this 
u Memory is an 
accelerator – as is CPU 
cache. This is 
becoming a factor
The Memory Cascade 
u On chip speed v RAM 
• L1(32K) = 100x 
• L2(246K) = 30x 
• L3(8-20Mb) = 8.6x 
u RAM v SSD 
• RAM = 300x 
u SSD v Disk 
• SSD = 10x 
Note: Vector instructions 
and data compression
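Chaining the ratios on this slide gives a rough sense of how far apart the ends of the cascade are. The sketch below simply multiplies the slide's figures together; treating them as composable is an assumption, and the name `speed_vs_disk` is illustrative.

```python
# Speed-up ratios as quoted on the slide.
SSD_VS_DISK = 10
RAM_VS_SSD = 300
CACHE_VS_RAM = {"L1": 100, "L2": 30, "L3": 8.6}


def speed_vs_disk() -> dict:
    """Relative speed of each tier, with spinning disk as the baseline (1x)."""
    ram = RAM_VS_SSD * SSD_VS_DISK
    tiers = {"Disk": 1.0, "SSD": float(SSD_VS_DISK), "RAM": float(ram)}
    for name, ratio in CACHE_VS_RAM.items():
        tiers[name] = ram * ratio
    return tiers


for tier, speed in sorted(speed_vs_disk().items(), key=lambda kv: kv[1]):
    print(f"{tier:>4}: {speed:>12,.0f}x disk")
```

On these figures RAM is about 3,000 times faster than disk and L1 cache about 300,000 times faster, which is why data placement matters so much for data flow design.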
Tech Revolutions 
TECH REVOLUTION        ARCHITECTURE
Computer               Batch
On-line                Centralized
PC                     Client/server
Internet               Multi-tier
Mobile                 Service Orientation
Internet of things     Event Driven/Big Data
Event Driven/Big Data Architecture?
The Open Source Picture 
u The R Language 
• Over 1 million 
users 
u Hadoop and its 
Ecosystem 
• Reduced latency 
for analytics 
u Machine Learning 
Algorithms 
• Raw power 
None of these are engineered for performance
Part 3: Data Flow
What Is A Data Scientist? 
u Project manager 
u Qualified statistician 
u Domain Business 
expert 
u Experienced data 
architect 
u Software engineer 
(IT’S A TEAM)
A Process, Not an Activity 
u Data Analytics is a multi-disciplinary 
end-to-end 
process 
u Until recently it was a 
walled-garden. But 
recently the walls were 
torn down by… 
• Data availability 
• Scalable technology 
• Open source tools
The CRITICAL Workload Issue 
u Previously, we 
viewed database 
workloads as an i/o 
optimization problem 
u With analytics the 
workload is a very 
variable mix of i/o 
and calculation 
u No databases were 
built precisely for 
this – not even Big 
Data databases
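A toy model illustrates why a mixed workload is harder to optimize than a purely I/O-bound one. The sketch below estimates whether a job is limited by I/O or by calculation; the `Workload` class, the throughput numbers, and the example jobs are all hypothetical and only show the shape of the problem.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    bytes_read: float   # data that must be read, in bytes
    operations: float   # arithmetic operations to perform


def bottleneck(w: Workload, io_bytes_per_s: float, ops_per_s: float) -> str:
    """Return which resource dominates the estimated run time."""
    io_time = w.bytes_read / io_bytes_per_s
    cpu_time = w.operations / ops_per_s
    return "i/o" if io_time >= cpu_time else "calculation"


# Hypothetical hardware: 100 MB/s of I/O, 1e9 operations per second.
lookup = Workload("record lookup", bytes_read=1e9, operations=1e6)
model_fit = Workload("model fit", bytes_read=1e8, operations=1e12)
for job in (lookup, model_fit):
    print(job.name, "->", bottleneck(job, io_bytes_per_s=1e8, ops_per_s=1e9))
```

The same hardware is I/O-bound for one job and calculation-bound for the other, so tuning for I/O alone no longer covers the workload.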
Take Note 
You can know more about a BUSINESS from its data than by any other means
The Biological System 
u Our human control system 
works at different speeds: 
• Almost instant reflex 
• Swift response 
• Considered response 
u Organizations will 
gradually implement 
similar control systems 
u This suggests a data-flow-based 
architecture
The Corporate Biological System 
u Right now this division 
into two different data 
flows is already occurring 
u Currently we can 
distinguish between: 
• Real-time/Business time 
applications 
• Analytical applications 
u We should build specific 
architectures for this
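One way to picture the two data flows is a router that reacts to an event immediately when a "reflex" rule fires and also keeps every event for later analysis. This is a minimal sketch, not an architecture from the webcast; the `EventRouter` class, the rule, and the event fields are hypothetical.

```python
from collections import deque
from typing import Callable


class EventRouter:
    """Sends each event down a real-time path and an analytical path."""

    def __init__(self, reflex_rule: Callable[[dict], bool]):
        self.reflex_rule = reflex_rule
        self.actions = []               # immediate, real-time responses
        self.analytical_log = deque()   # events retained for later analysis

    def handle(self, event: dict) -> None:
        # Real-time flow: act at once if the reflex rule fires.
        if self.reflex_rule(event):
            self.actions.append(("alert", event["id"]))
        # Analytical flow: every event is kept, whether or not it fired.
        self.analytical_log.append(event)


router = EventRouter(lambda e: e["amount"] > 1000)
router.handle({"id": 1, "amount": 250})
router.handle({"id": 2, "amount": 5000})
print(router.actions, len(router.analytical_log))
```

The design choice shown is that the fast path never waits on the analytical store; the two flows share the event but nothing else.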
Some Architectural Principles 
u The new atom of data 
is the event 
u SUSO, scale up before 
scale out 
u Take the processing 
to the data, if you 
can 
u Hadoop is a 
component not a 
solution
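The first and third principles can be sketched together: model each record as an event, compute a small summary where each partition of events lives, and move only the summaries. The `Event` fields and function names below are assumptions made for illustration, and plain lists stand in for data partitions on separate nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float
    source: str
    value: float


def local_summary(partition):
    """Runs next to the data: reduce a partition to (count, sum)."""
    return (len(partition), sum(e.value for e in partition))


def global_mean(partitions):
    """Combine the per-partition summaries; raw events never move."""
    summaries = [local_summary(p) for p in partitions]
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    return total / count if count else 0.0


partitions = [
    [Event(0.0, "sensor-a", 1.0), Event(1.0, "sensor-a", 3.0)],
    [Event(2.0, "sensor-b", 5.0)],
]
print(global_mean(partitions))
```

Only two numbers per partition cross the network here, however many events each partition holds.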
In Conclusion 
PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow
Questions? 
#BigDataArch 
or 
USE THE Q&A
THANK YOU!
REGISTER FOR BDIA WEBCASTS AT: 
http://insideanalysis.com/research/big-data-information-architecture

