By Thanuja Seneviratne
 What is Big Data
› Top Down Approach to the Topic of Big Data
› Data Science and Data Scientists
› Days of Data Past – Part I
› Days of Data Past – Part II
› Days of Data Past – Part III
› Three V’s
› “Data Lake” Architecture
 Who can use Big Data
› Individual Experience vs Collective Experience
› Business Cases
› Use Cases
 How to Use Big Data
› Coming of Hadoop
› Evolution of Hadoop
› “Other than” HDFS
 Data Science and Data Scientists
› Science of mining, extracting, analyzing,
modeling, visualizing large data sets from
multiple sources
› “Data analyst, data artist”
› Knowledge of math, statistics, predictive
modeling, pattern recognition and learning, data
visualization, data warehousing, etc
› From C.F. Jeff Wu to William S. Cleveland to
“Data Science Journal” and beyond
 Days of Data Past - Part I
› Relational databases and their impact
› Schema-on-write (the schema is defined before data is written)
› ACID compliant
› Row-store technology
› Relationally structured data for smaller data sets
› Relatively cheaper products
 SQL Server, Oracle, etc
› Highly available skill-set
› SQL languages
 Data manipulation – Insert, Select, Update and Delete
 Data definition – Create, Alter, Truncate and Drop
› Influenced LINQ (in .NET), JPQL (in Java) and similar query APIs in
application programming
› Enterprise ready
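The data definition and data manipulation statements listed above can be tried out with a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for a server product such as SQL Server or Oracle; the `customer` table and its rows are made up for illustration. SQLite has no `TRUNCATE` statement, so that one is omitted.

```python
import sqlite3

# In-memory database; table, columns and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: the schema exists before any row is written.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# Data manipulation: insert, update, delete, select.
cur.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ann', 'Colombo')")
cur.execute("INSERT INTO customer (id, name, city) VALUES (2, 'Raj', 'Kandy')")
cur.execute("UPDATE customer SET city = 'Galle' WHERE id = 2")
cur.execute("DELETE FROM customer WHERE id = 1")
rows = cur.execute("SELECT id, name, city FROM customer").fetchall()
conn.commit()  # make the changes above durable before the next transaction

# Atomicity in miniature: the second insert violates the primary key,
# so the whole transaction (including the first insert) is rolled back.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO customer (id, name) VALUES (3, 'Mei')")
        conn.execute("INSERT INTO customer (id, name) VALUES (3, 'Dup')")
except sqlite3.IntegrityError:
    pass
count_after_rollback = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

cur.execute("DROP TABLE customer")
conn.close()
```

After the failed transaction only the row for id 2 remains, which is the "all or nothing" behavior that ACID compliance refers to.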
 Days of Data Past - Part II
› Enterprise Data Warehouses (EDW) and their impact
› Massively Parallel Processing (MPP) appliances – not all EDWs are
packaged as MPP appliances
› Column-store technology, faster and easier for BI – not all EDWs use
column-store
› Dimensionally structured data for large data sets
› Enterprise storage not commodity storage
› Expensive premium products
 Teradata, Vertica, SQL Server PDW, etc
 Some major companies offer commodity hardware for lower-budget
customers
 Some major companies offer services in addition to products
› Demanding skill-set
› Enterprise ready
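The difference between the row-store layout of the previous slide and the column-store layout mentioned here can be illustrated with a small sketch. It uses plain Python lists and dicts rather than any real warehouse engine, and the sales records and the `to_columns` helper are hypothetical.

```python
# Hypothetical sales facts, used only for illustration.
rows = [
    {"region": "east", "product": "tea", "amount": 120},
    {"region": "west", "product": "tea", "amount": 80},
    {"region": "east", "product": "rice", "amount": 200},
]

def to_columns(records):
    """Pivot a row-oriented list of dicts into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every whole record is visited to aggregate a single field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the 'amount' column is scanned.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the point is that a BI-style aggregate over one field touches far less data when each column is stored contiguously.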
 Days of Data Past - Part III
› NoSQL data stores and their impact
› Non-relational and typically not ACID compliant
› Four types
 Key-value stores (KV)
 Document stores
 Graph database stores
 Wide column stores
› Relatively cheaper products
› Commodity storage, not enterprise storage
› Demanding and scarce skill-set
› Not Enterprise ready
› NewSQL data stores as an alternative to NoSQL
 Relational and ACID compliant
 SQL driven so that existing SQL investments are intact
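The four NoSQL types listed above differ mainly in the shape of the data they hold. The sketch below models each shape with ordinary Python structures; it is not the API of any particular product, and all keys, names and the `friends_of_friends` helper are assumptions made for illustration.

```python
# Key-value store: an opaque value looked up by its key.
kv_store = {"session:42": b"\x01\x02"}

# Document store: self-describing nested documents with no fixed schema.
documents = [
    {"_id": 1, "name": "Ann", "tags": ["vip"]},
    {"_id": 2, "name": "Raj", "address": {"city": "Kandy"}},
]

# Graph store: nodes and the edges between them (here, "knows" relations).
graph = {"Ann": {"Raj"}, "Raj": {"Mei"}, "Mei": set()}

# Wide-column store: row key -> column family -> sparse set of columns.
wide_rows = {
    "user:1": {"profile": {"name": "Ann"}, "clicks": {"2015-01-01": 3}},
}

def friends_of_friends(g, start):
    """Two-hop traversal, the kind of query graph stores are built for."""
    direct = g.get(start, set())
    reached = set()
    for friend in direct:
        reached |= g.get(friend, set())
    return reached - direct - {start}

# Documents need not share fields, so queries filter on what is present.
with_address = [d["name"] for d in documents if "address" in d]
second_degree = friends_of_friends(graph, "Ann")
```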
 Three V’s
› Volume
 Large volumes, currently 100 TB or more
 This benchmark is expected to rise in future
› Velocity
 How quickly data accumulates
 How quickly your data makes sense
 Batch, near-real-time, real-time
 Batch vs Interactive
› Variety
 Various data sources
 Structured data – relational, ERP, CRM
 Semi-structured data – click streams, weblogs, geographical,
social
 Unstructured data – sensor, textual, machine generated
 “Data Lake” Architecture
› Modern Data Architecture
 Provides a shared service for broad insight across a
large, diverse data set at efficient scale, according to
Hortonworks
 A unified data architecture integrated with
enterprise end-to-end solutions, according to Teradata
› Caters to 3V-driven big data opportunities
› Raw data of unrecognized value
› Schema-on-read (a schema is applied only when the data is read)
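Applying the schema at read time, as opposed to the relational approach of fixing it before writing, can be sketched as follows. The raw event lines, the field names and the `read_with_schema` helper are hypothetical; a real data lake would read files from distributed storage rather than a Python list.

```python
import json

# Raw data lands in the lake as-is, including records that do not parse.
raw_events = [
    '{"user": "ann", "action": "play", "song": "x"}',
    '{"user": "raj", "action": "call", "duration_s": 95}',
    'not json at all',
]

def read_with_schema(lines, fields):
    """Project each parsable record onto the fields the reader cares about."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparsable raw data is skipped, not rejected at write time
        if not isinstance(record, dict):
            continue
        yield {field: record.get(field) for field in fields}

# Two readers can impose two different schemas on the same raw data.
plays = list(read_with_schema(raw_events, ["user", "song"]))
calls = list(read_with_schema(raw_events, ["user", "duration_s"]))
```

Nothing was validated when the events were stored; each consumer decides which fields matter, and missing fields simply come back as `None`.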
 Individual Experience vs Collective Experience
› Need to treat customers as individuals instead of as a mass collective
› Predictive modeling to recommend the best fit for an individual’s
“intent”
› Implementing Process Communication Models (PCM) to
give better individualized customer service
 Listening to a particular song by a particular artist via mobile
 Calling a call center
› Privacy concerns – main obstacle in current big data trend
 Business Cases
› Medical or Healthcare
› Entertainment
› Forensics
› Financial
› Retail
 Use Cases
› Medical or Healthcare
 Find a cure to a disease based on individual’s medical history,
behavior patterns, food and drug consumption, and similar
patients’ data
› Entertainment
 Provide a recommendation engine for IMDb or Netflix based on an
individual’s viewing patterns
› Forensics
 Identify a serial killer from historical murder data, CSI-style.
Similarly, prevent further incidents that follow the same pattern
› Financial
 Provide a predictive financial model for Wall Street stock market
fluctuations based on historical shareholder patterns
› Retail
 Coming of Hadoop
› GFS and Google’s MapReduce engine, and the
publishing of white papers by Google
› The Yahoo team was the first to decode the white papers and
create HDFS and an MR engine to scale out Yahoo
search
› Creation of Hadoop (Generation 1) in 2006 and Yahoo’s
commitment to production-level Hadoop (the 1.0 release
itself followed in late 2011)
› Spawning of the Hortonworks company in 2011 from a
set of Yahoo employees and the move towards
enterprise hardening
› Spawning of multiple Hadoop distros as products
 Evolution of Hadoop
› Hadoop 1.x (Generation 1)
 Data Management – HDFS for redundant storage of data from various sources and MapReduce
to process the data
 Data Access Layer (batch, near-real-time, real-time) – to access data simultaneously in multiple
ways
› Hadoop 2.x (Generation 2)
 Introducing YARN for the Data Management layer
 Governance and Integration at enterprise level – data loading, executing data policies, data
management – introducing Apache Falcon
 Security – layered authentication and authorization – introducing Apache Knox
 Operations – deploy, monitor and manage the platform as a whole – introducing Apache Ambari
› Enterprise Hadoop
 Deployment choice – physical, virtual or cloud; Windows or Linux; distro product from
Hortonworks, Cloudera or others
 Presentation and Applications – Enable existing and new applications to generate value from
Hadoop
 Enterprise management and security – empower existing proven enterprise tools to integrate
with Hadoop
 Services or Product choice – YARN-enabling always-on, long-running services with Apache
Slider
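The MapReduce processing model named under Hadoop 1.x can be shown in miniature with the classic word count. This is a single-process sketch of the map, shuffle and reduce phases, not Hadoop's Java API; the function names and input strings are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big Hadoop data", "data"])
```

In a real cluster the map calls run in parallel on the nodes that hold each HDFS block, and the shuffle moves data across the network; here all three phases run in one process only to show the data flow.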
 Hadoop 2.7 Stack (Hortonworks view)
 “Other than” Hadoop, HDFS
› HDFS-like storage systems with similar
MapReduce engines
› MapR (exposes its storage over NFS)
 Has cloud support too
› EMC, NetApp, Cleversafe, Symantec
› IBM’s BigInsights (a kind of distro of Cloudera, which
is in turn a distro of Hadoop)
› SAP’s HANA suite
› And of course the proprietary GFS, on which HDFS was
originally based
Big Data - Part I


Editor's notes

  1. In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?". In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics". In April 2002, the International Council for Science: Committee on Data for Science and Technology (CODATA) started the Data Science Journal.
  2. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.