BIG DATA SCIENCE 
Chandan Rajah [ @ChandanRajah ] 
“The price of light is far less than the cost of darkness”
COST SPEED 
BENEFITS OF BIG DATA 
AGILITY CAPABILITY
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
What is Big Data ? 
Big Data ≠ Data Volume 
Big Data = Crude Oil 
Think of data like ‘Crude Oil’ 
Big Data is about extracting ‘crude oil’; transporting it in ‘pipelines’; storing it in ‘mega tanks’
What is Data Science ? 
Data Science ≠ Statistical Analysis 
Data Science = Oil Refinery 
Data science is about ‘treating’ data; applying ‘science’ to the data; 
refining the data ‘results’; and combining them to form ‘insight’
Knowns, Unknowns & DIKUW FTW! 
known knowns 
we know we know 
known unknowns 
we know we don’t know 
unknown unknowns 
we don’t know we don’t know 
D  DATA: raw; numbers; letters; symbols; signals
I  INFORMATION: what; description; context; relationship; reports
K  KNOWLEDGE: how to; experience; tested; instruction; programs
U  UNDERSTANDING: why; cause & effect; proven; models
W  WISDOM: when; prediction; what’s best
PAST to FUTURE
Data Engineer, Data Analyst, Data Miner, Data Scientist
known knowns, known unknowns, unknown unknowns
Data Analytics to Data Discovery ? 
data you know 
data you don’t know 
questions you’re asking 
questions you’re not asking 
Data Analyst 
Data Scientist 
Data Analytics 
Data Discovery 
DATA MODELLING 
Y ← F( X, random noise, parameters ) 
ALGORITHMIC MODELLING 
Y ← [ BLACK BOX ] ← X
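The contrast between the two modelling approaches can be sketched in a few lines. This is a minimal illustration and not taken from the talk: the toy data, the function names `fit_line` and `knn_predict`, and the choice of least squares versus nearest neighbours are all assumptions made for the example.

```python
# Toy data (hypothetical): y is roughly 2*x + 1 plus a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def fit_line(xs, ys):
    """Data modelling: assume Y = a*X + b + noise and estimate parameters a, b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=2):
    """Algorithmic modelling: no assumed formula, predict from the k nearest examples."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


a, b = fit_line(xs, ys)
line_prediction = a * 2.5 + b               # uses the fitted parameters
box_prediction = knn_predict(xs, ys, 2.5)   # uses only the stored examples
```

The first function yields parameters that can be read and interpreted; the second only yields predictions, which is the sense in which it behaves like a black box.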
DIVIDE 
SCATTER 
Split Data into Blocks 
Replicate and Store 
Petabytes of Resilience 
CONQUER 
EXPLORE 
1000s of Parallel Threads 
Explore Every Path 
Machine Learning 
INSIGHT 
GATHER 
Real Time Action 
Periodic Dashboards 
Iterative Evolution 
What is the Big Idea ?
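The divide, conquer and gather steps can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the helper names (`split_into_blocks`, `explore`, `gather`), the block size and the use of a thread pool are hypothetical stand-ins for a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """DIVIDE: cut the data set into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def explore(block):
    """CONQUER: compute a partial result (sum, count) for one block."""
    return sum(block), len(block)


def gather(partials):
    """INSIGHT: combine the partial results into one answer (the mean)."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


records = list(range(1, 101))            # hypothetical data set: 1..100
blocks = split_into_blocks(records, 25)  # four blocks of 25 records

# Each block is processed independently, so the work can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(explore, blocks))

mean = gather(partials)
```

Because each block is explored without reference to the others, adding more workers (or machines) does not change the logic, only how quickly the partial results arrive.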
Divide = HDFS 
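WRITE 
Client: 1. Create Metadata (Name Node); 2. Put Blocks (Data Nodes) 
Name Node: Control / Monitoring of Data Nodes 
[Diagram: blocks 1, 2 and 3 replicated across the Data Nodes] 
READ 
Client: 1. Get Metadata (Name Node); 2. Fetch Blocks (Data Nodes) 
Name Node: Control / Monitoring of Data Nodes 
[Diagram: blocks 1 to 4 replicated across the Data Nodes]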
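The write and read paths above can be mimicked with a toy in-memory model. To be clear, this is not the HDFS API: the class `ToyCluster`, the tiny block size, the replication factor of 2 and the round-robin placement are all assumptions chosen to keep the sketch short.

```python
import itertools

BLOCK_SIZE = 4    # bytes per block in this toy; real HDFS blocks are far larger
REPLICATION = 2   # copies per block (assumed for the example)


class ToyCluster:
    """Toy stand-in for a name node plus data nodes, held in memory."""

    def __init__(self, node_names):
        self.metadata = {}  # name node: filename -> [(block_id, [node, ...])]
        self.data_nodes = {name: {} for name in node_names}
        self._next_node = itertools.cycle(node_names)
        self._block_ids = itertools.count(1)

    def write(self, filename, data):
        """Split data into blocks, store each on REPLICATION nodes, record metadata."""
        placements = []
        for start in range(0, len(data), BLOCK_SIZE):
            block_id = next(self._block_ids)
            nodes = [next(self._next_node) for _ in range(REPLICATION)]
            for node in nodes:
                self.data_nodes[node][block_id] = data[start:start + BLOCK_SIZE]
            placements.append((block_id, nodes))
        self.metadata[filename] = placements

    def read(self, filename, failed=()):
        """Look up metadata, then fetch each block from a node that is still alive."""
        parts = []
        for block_id, nodes in self.metadata[filename]:
            live = [node for node in nodes if node not in failed]
            parts.append(self.data_nodes[live[0]][block_id])
        return b"".join(parts)


cluster = ToyCluster(["dn1", "dn2", "dn3"])
cluster.write("demo.txt", b"big data science")
restored = cluster.read("demo.txt", failed={"dn1"})  # survives losing one node
```

With every block held on two different nodes, losing any single node still leaves one live copy of each block, which is the resilience the replication step is meant to provide.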
Conquer = MapReduce
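A word count is the usual way to show the MapReduce pattern. The sketch below runs the three phases in one process; the function names and the sample lines are hypothetical, and a real framework would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """MAP: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """SHUFFLE: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: collapse the values for one key into a single count."""
    return key, sum(values)


lines = [
    "big data science",
    "big data is crude oil",
    "data science is the refinery",
]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Each `map_phase` call sees only its own line and each `reduce_phase` call sees only its own key, which is what lets the work be spread over many nodes.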
Insight = Functional Paradigm
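One reading of the functional paradigm here is that results are built by composing pure functions over the data rather than by updating shared state. A minimal sketch, with hypothetical event records and function names:

```python
from functools import reduce

# Hypothetical event records; none of the functions below modifies them.
events = [
    {"user": "a", "seconds": 30},
    {"user": "b", "seconds": 0},
    {"user": "a", "seconds": 90},
]


def is_active(event):
    """Keep only events with some viewing time."""
    return event["seconds"] > 0


def to_minutes(event):
    """Transform one event into a number of minutes."""
    return event["seconds"] / 60


def add(x, y):
    """Combine two partial totals."""
    return x + y


# filter -> map -> reduce: each step is a pure function of its input.
total_minutes = reduce(add, map(to_minutes, filter(is_active, events)), 0.0)
```

Since no step mutates its input, the same pipeline can be applied to separate blocks of data independently and the partial totals combined afterwards.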
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
Why is Big Data needed ? 
VOLUME VELOCITY VARIETY 
Exponential growth; 2x in 2 yrs 
PB (1000 TB) is now common 
Event streams; never at rest 
640k GB per internet minute 
100s of data sources 
85% not in a table
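The volume and velocity figures above can be put on a common scale with some quick arithmetic. The sketch assumes decimal units (1 TB = 1000 GB, alongside the slide's 1 PB = 1000 TB) and takes the slide's own figures as given; the variable and function names are illustrative.

```python
GB_PER_TB = 1000   # assumption: decimal units
TB_PER_PB = 1000   # from the slide: PB = 1000 TB

gb_per_minute = 640_000                      # slide: 640k GB per internet minute
gb_per_day = gb_per_minute * 60 * 24
pb_per_day = gb_per_day / (GB_PER_TB * TB_PER_PB)


def growth_factor(years, doubling_period_years=2):
    """Growth multiple after `years` if volume doubles every two years."""
    return 2 ** (years / doubling_period_years)


ten_year_growth = growth_factor(10)
```

Under these assumptions the per-minute figure works out to roughly 920 PB per day, and doubling every two years compounds to a 32-fold increase over a decade.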
Where in the Value Chain ? 
Generation Transport Knowledge Output Value 
BIG DATA SCIENCE 
Straddles all four Challenge Areas
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
Big Data Heat Map – Gartner 2012
Big Data Potential by Sector – McKinsey for USBLS, 2011
Big Data Investment by Industry – Gartner, 2012
Top Big Data Challenges – Gartner, 2012
Survey on Big Data Investments – IDG Survey, 2013
Survey on Main Drivers to Invest – IDG Survey, 2014
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
DEMO
COST SPEED 
RECAP OF BENEFITS 
AGILITY CAPABILITY
TIME VALUE OF DATA KNOWLEDGE IS POWER 
LAST WORDS OF WISDOM 
NOT ALL ROADS LEAD TO ROME 
I AM AN INDIVIDUAL
“The price of light is far less than the cost of darkness”

Big Data Science at the Digital Catapult


Editor's notes

  1. COST – 20x less per TB vs. Teradata, Netezza, Oracle – 75% less average marginal cost per capacity SPEED – 10x faster than Teradata, Netezza AGILITY – 115% lower average cost per data source vs. Oracle SCIENCE – Machine learning, prediction
  2. WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  3. WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  4. WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  5. WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  6. COST – 20x less per TB vs. Teradata, Netezza, Oracle – 75% less average marginal cost per capacity SPEED – 10x faster than Teradata, Netezza AGILITY – 115% lower average cost per data source vs. Oracle SCIENCE – Machine learning, prediction
  7. TIME VALUE - Yesterday’s data is less valuable than today’s data, yet historical data is more valuable than the present moment alone. POWER - Getting from unknown unknowns to known unknowns or known knowns is powerful. LEAD TO ROME - Exploring with no direct business impact is not a bad thing. INDIVIDUAL - Treat and analyse every customer as an individual, not an aggregate; aggregate only individual insights.