Making Sense out of Big Data
Peter Morgan - July 2013
Table of Contents
1. Definition and Overview
2. Data Sources
3. Databases
4. Big Data Analytics
Glossary
References
2
1. Definition and Overview
3
What is big data?
More and more data is being collected and stored each day
4
Four main components
• Data
– Structured and unstructured
• Databases
– Proprietary and open source
• Query language
– Querying the database
• Analytics
– Analysing the data
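A minimal sketch of how the four components fit together, assuming a small made-up purchases table and using Python's built-in sqlite3 module as a stand-in for a real big data store. The table name, column names and sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

# Data: hypothetical sample records (user, country, purchase amount).
rows = [
    ("alice", "UK", 30.0),
    ("bob", "UK", 50.0),
    ("carol", "US", 20.0),
]

# Database: an in-memory SQLite database stands in for a real system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# Query language: SQL selects and groups the stored data.
query = """
    SELECT country, COUNT(*), AVG(amount)
    FROM purchases
    GROUP BY country
    ORDER BY country
"""
summary = conn.execute(query).fetchall()

# Analytics: interpret the aggregated result.
for country, n, avg_amount in summary:
    print(f"{country}: {n} purchases, average {avg_amount:.2f}")

conn.close()
```

At real scale the same pattern holds, but the database and query layer would typically be distributed across many machines rather than a single in-memory file.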
5
How big is big?
• Large data sets
– Greater than 1,000 Terabytes? (1 Petabyte)
– 1,000,000 Terabytes? (1 Exabyte)
• Excel 2013 can have 1,048,576 rows by 16,384 columns
– About 10 Gigabytes of data
• Only going to get bigger
– 90% of all data was produced in the past two years!
– Rate is increasing
• Recall
– Giga = 10⁹
– Tera = 10¹²
– Peta = 10¹⁵
– Exa = 10¹⁸
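The sizes on this slide can be checked with a few lines of arithmetic. The sketch below uses the decimal prefixes listed above and the Excel 2013 worksheet limits. The figure of one byte per cell is an assumption made only for illustration, since real storage per cell depends on the content.

```python
# Decimal (SI) prefixes as listed on the slide.
GIGA = 10**9
TERA = 10**12
PETA = 10**15
EXA = 10**18

# Excel 2013 worksheet limits quoted on the slide.
rows, columns = 1_048_576, 16_384
cells = rows * columns

# Assumption for illustration only: one byte stored per cell.
BYTES_PER_CELL = 1
sheet_bytes = cells * BYTES_PER_CELL

print(f"cells in a full worksheet: {cells:,}")
print(f"size at {BYTES_PER_CELL} byte/cell: {sheet_bytes / GIGA:.1f} GB")
print(f"terabytes in a petabyte: {PETA // TERA:,}")
print(f"terabytes in an exabyte: {EXA // TERA:,}")
```

Under that assumption a completely full worksheet lands in the tens of gigabytes, several orders of magnitude below the petabyte range.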
6
Big Data Evolution
7
2. Data Sources
8
Where does the data come from?
• Science – particle, astrophysics
• Industry – oil, finance, telecom
– Actually all verticals
• Social – Facebook, LinkedIn, Twitter
• Medicine – genome, neuroscience
• Government – census, education, police
• Sports – statistics
• Environment – weather, sensors
9
Unstructured Data
• 80% of data is unstructured
• NoSQL
• Document based
– Documents
– Texts, tweets
– Emails
– Machine logs
– Blogs
– Web pages
– Photos
– Videos (YouTube)
• Graph based
– Social media sites
– Facebook has 1.1 billion users (MicroStrategy, 27 July 2013)
10
Why do we need to use big data?
Use in public and private sector to:
• Make faster and more accurate business decisions
• Make accurate predictions
• Gain competitive advantage
• Implement smarter marketing – CRM
• Discover new opportunities
• Enhance Business Intelligence
• Enable fraud detection
• Reduce crime
• Improve scientific research
• Speed up analysis (up to real time)
– Weeks, days → minutes, seconds
11
Big Data Startup - Case Study
• Rocket Fuel
• No. 4 on Forbes' 2013 Most Promising Companies in America list
• Digital advertising startup
• Screens over 26 billion ads per day
• “Advertising that learns” big data platform
• Distributed planet-scale computing engine
• Hadoop implementation
• Founders from Yahoo!, Salesforce.com, DoubleClick
• Targeting algorithms use lifestyle, purchase intent and social data
12
Some big statistics
13
3. Databases
14
Database Timeline
15
Relational databases – SQL
Proprietary
• Oracle DB
• IBM DB2
• Microsoft SQL Server
• SAP
• EMC
Open Source
• MySQL
• PostgreSQL
• Drizzle
• Firebird
16
Non-relational databases – NoSQL
• BigTable – Google
• Cassandra – Facebook
• DynamoDB – Amazon
• HBase – Hadoop
• MongoDB – 10gen
• Neo4j – Neo Technology
• CouchDB – Apache
• Couchbase
• Riak – Basho
• Redis – Pivotal
17
4. Big Data Analytics
18
Big Data Analytics - Incumbents
• Oracle – Exadata, Exalytics
• Microsoft – HDInsight, xVelocity
• IBM – Netezza, Cognos, BigInsights
• SAP – HANA, Business Objects
• EMC – Pivotal (Greenplum)
• HP – Vertica, HAVEn
• All offer Hadoop-based products or Hadoop integration
19
Big Data Analytics – Pure Plays
• Pure plays – definition:
– Been around more than 20 years
– Purely data analytics companies
• Teradata – Aster
• SAS
• MicroStrategy
20
Big Data Analytics – New Entrants
• Hortonworks
• Cloudera
• MapR
• Acunu
• Pentaho
• Tableau
• Talend
• Splunk
21
(Some of) IBM’s Big Data Acquisitions
• Algorithmics
– Oct 2011, $400 million
• OpenPages
– Oct 2010, ?
• Netezza
– Sept 2010, $1.7 billion
• SPSS
– Jan 2010, $1.2 billion
• Cognos
– Jan 2008, $4.9 billion
• About $10 billion in four years
http://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_IBM
22
Big Data Science Tools
• Hadoop
• NoSQL
• MapReduce
• R
• Matlab
• Python
• Statistics
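To make the MapReduce entry concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using the classic word-count task. It is plain Python, not Hadoop's API. The function names and the two sample documents are hypothetical, and on a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input: each string stands in for a file split on a cluster.
documents = ["big data is big", "data is everywhere"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without the steps needing to share state.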
23
Big Data Hadoop Stack
• Hadoop is the de facto big data operating system
• Based on ideas published by Google and developed largely at Yahoo! (2005)
• It is distributed, open source and managed by the Apache Software Foundation
24
Analytic Technologies
• A/B testing
• Genetic algorithms
• Machine learning
• Natural language processing
• Neural networks
• Pattern recognition
• Anomaly detection
• Decision trees
• Predictive modeling
• Regression analysis
• Sentiment analysis
• Signal processing
• Simulations
• Time series analysis
• Visualization
• Multivariate analysis
• Text analytics
25
Glossary
• OLTP = Online Transaction Processing
• OLAP = Online Analytical Processing
• ODBC = Open Database Connectivity
• IMDB = In-Memory Database
• CRUD = Create, Read, Update, Delete
• ETL = Extract, Transform and Load
• CDO = Chief Data Officer
• NLP = Natural Language Processing
• GQL = Graph Query Language
• AaaS = Analytics as a Service
• EDW = Enterprise Data Warehouse
26
References
• MicroStrategy website, Michael Saylor presentation at MicroStrategy World 2013, 27 July 2013, http://www.microstrategy.com/
• Teradata website, www.teradata.com
• Wikipedia, http://en.wikipedia.org/wiki/
• Google Images, www.google.co.uk
• IBM website, www.ibm.com
• YouTube, www.youtube.com
• Hadoop, www.hortonworks.com
27
Any Questions?
28
