Introduction to Big Data
                   An analogy between Sugar Cane & Big Data




Image Source: alternative-energy-fuels.com                                    Image Source: MicFarris.com




                                             Jean-Marc Desvaux – March 2012
Session Abstract:

What is Big Data? Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …
Big Data: it’s all Silicon Valley is talking about. It’s
the new buzzword after ‘cloud’.


“Everybody is speaking of it and many are
convinced it is the only way forward. As always,
such dramatic statements are not only dangerous
but serve to put some people off the concept.”
Source: Tom Kyte’s “Big Data Are you ready?” presentation
What is Big Data?
Big Data is data that exceeds the processing
capacity of conventional database systems.

It’s too big, too fast or does not fit the
structures of database architectures.
To gain value from this type of data you need
an alternative way to process it.

Why is this happening?
Data is growing faster than computers are
getting bigger.
Big Data is a catch-all term.
It includes social network data, Web logs, MP3s,
unstructured Web page content, XML, GPS
tracking data, vehicle telemetry, financial market
data and many more…

It can be characterized by the 3 Vs:

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
Volume
Data is growing faster than machines are getting
bigger.
Data sources keep adding up.

Velocity
Rate of acquisition and desired rate of
consumption.


Variety
Extends beyond structured data, includes
unstructured data of all varieties.

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
Where does Big Data apply?
The value of Big Data to an organisation falls into two
main categories:

Analytical use

Enabling new products and services
Analytical Use
It reveals insights that were previously hidden because
the data was hard to record and exploit.
It gives an edge over classic analytics based on
sampling and on more “static”, predetermined
reports.
It promotes an investigative approach to
data and puts the data scientist and the analyst
in the spotlight.

Hal Varian, chief economist at Google
“I keep saying that the sexy job in the next 10 years
will be statisticians”
Some terms linked to the Analytical Use of Big Data


Sentiment analysis:
Mining the Web in real time and getting a quick read of what people are thinking.

Named-entity recognition (NER), also known as entity identification and entity
extraction, is a subtask of information extraction that seeks to locate and
classify atomic elements in text into predefined categories such as the names of
persons, organizations, locations, expressions of time, quantities, monetary
values, percentages, etc. (Example: “Big B” in a tweet may stand for Big Brother
or Amitabh Bachchan.)
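As a rough illustration of the sentiment-analysis idea above, here is a minimal Python sketch of lexicon-based scoring. The word lists, the function names `sentiment_score` and `quick_read`, and the sample messages are all hypothetical and chosen only for illustration; production systems rely on far richer models and on entity recognition to resolve ambiguous names.

```python
import re

# Hypothetical, tiny sentiment lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def quick_read(messages):
    """Summarise a batch of messages as positive/negative/neutral counts."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for message in messages:
        score = sentiment_score(message)
        if score > 0:
            summary["positive"] += 1
        elif score < 0:
            summary["negative"] += 1
        else:
            summary["neutral"] += 1
    return summary


if __name__ == "__main__":
    sample = [
        "I love the new service, great support",
        "Terrible coverage and slow data",
        "Switching plans next month",
    ]
    print(quick_read(sample))
```

The "quick read" is simply the aggregate over many messages; the Big Data part lies in doing this continuously over a very large stream rather than in the scoring itself.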
Product/Service Enabler

Some products and services cannot exist unless they are
backed by Big Data technologies:
-Need to scale
-Need a fast feedback loop on complex
analytics.

Highly successful Web startups that pioneered Big
Data technologies through R&D to enable new
types of products are a good example:
Google, Yahoo, Amazon, Facebook.
Sectors with Fast Adoption and High Potential

              Financial Sector
            Telecommunications
                Government
                  Health
                   Retail
Big Data Sources:
Internal &
Data Marketplaces.
Internal sources

             Time Attendance logs
                RFID sensors logs
                  Security Logs
             Vehicles GPS tracking
           Machinery/Telemetry Logs
                Pictures & videos
           Enterprise Social Networks
           Service Forum/Discussions
                       ….

Mostly anything unstructured or simply structured
External Sources (feeders/data marketplaces)
Examples: Infochimps.com, DataSift.com, datamarket.azure.com




                                                Source: DataSift.com
An Enterprise Architecture for Big Data
An analogy with a Sugar Cane Factory
A Sugar Factory

SUGAR CANE FIELDS
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE/SUGAR
BOTTOM LINE = VALUE
An Enterprise Big Data Factory

DATA SOURCES (RDBMS & Data Marketplaces)
ACQUIRE (HARVEST): HDFS (Hadoop Distributed FS) / NoSQL Database / RDBMS (Enterprise Applications)
ORGANIZE (EXTRACT): MapReduce (Hadoop) / Big Data Connectors / RDBMS Connectors
ANALYSE (SHRED/DISTILL/BOIL): Data Warehousing / RDBMS stores
BUSINESS INTELLIGENCE (DECIDE): Analytic Applications, the sweet part (sugar/rum)
BOTTOM LINE = VALUE
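To make the MapReduce step of the factory more concrete, the following Python sketch mimics the map, shuffle and reduce phases on a word-count problem. It is a single-process illustration of the programming model only, not the Hadoop API; the function names and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big sugar"]))
```

In a real cluster the map and reduce calls run in parallel on many machines, close to where the data blocks are stored, which is what lets the approach scale.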
Some Factories & architectures from vendors
Greenplum (EMC2)
An Example of a Turnkey Factory Solution
Another “Turnkey Factory” Example from Oracle
Targeting high-end Analytics

Stages shown: ACQUIRE (HARVEST), ORGANIZE (EXTRACT), ANALYSE (SHRED/DISTILL/BOIL), BUSINESS INTELLIGENCE (DECIDE)

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
The Microsoft way




Of course, you can also build your own factory using the
widely available open-source software on which most
turnkey factories are built.
Technologies behind Big Data
Factory blocks & screws used for engineering solutions
Will NoSQL kill SQL?!
Will it turn the RDBMS into a legacy data store?

Not at all.

We need the RDBMS to store high-value data and for its
feature-rich approach (feature first).

NoSQL (scale first) is not a superset of RDBMS
technologies (a bit like Einstein’s relativity to Newtonian
physics).

Remember: NoSQL is not “No SQL” but “Not Only SQL”.
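The "feature first" versus "scale first" contrast can be sketched in a few lines of Python. The relational side uses the standard `sqlite3` module; the key-value side uses a plain `dict` as a stand-in for a NoSQL store, which is an assumption made purely for illustration. Table, key and field names are hypothetical.

```python
import json
import sqlite3

# Relational side: a fixed schema, constraints and ad hoc SQL queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)
# Aggregation is expressed declaratively and executed by the database engine.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()

# Key-value side: a dict stands in for a NoSQL store (assumption).
# Values are schemaless documents and access goes through the key only.
kv_store = {}
kv_store["event:1"] = json.dumps({"type": "gps", "lat": -20.16, "lon": 57.5})
kv_store["event:2"] = json.dumps({"type": "weblog", "url": "/home", "ms": 42})
event = json.loads(kv_store["event:2"])

if __name__ == "__main__":
    print(totals)
    print(event)
```

The two records in the key-value store share no schema, which makes them easy to partition across machines, while the SQL side offers constraints and aggregation out of the box. Neither replaces the other.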
Big Data future
Rise of data marketplaces
Data science tools development:
More powerful & expressive toolsets for analysis
Emerging tools for streaming data processing
(Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI

Further cloud enablement
Ease of integration with enterprise sources
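To give a feel for what streaming data processing means in practice, here is a minimal Python sketch of a sliding-window counter, a building block often found behind "live" dashboards. It does not use the API of any of the tools named above; the class name `SlidingWindowCounter` and its methods are hypothetical, and the sketch assumes events arrive with non-decreasing timestamps.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = Counter()

    def add(self, timestamp, key):
        """Record one event, then evict everything older than the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=3):
        """Return the n most frequent keys currently inside the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=10)
    for ts, tag in [(0, "a"), (5, "b"), (8, "a"), (12, "b")]:
        window.add(ts, tag)
    print(window.top(1))
```

Results are updated as each event arrives instead of after a batch job completes, which is the essence of the real-time enablement mentioned above.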
Conclusion
To leverage Big Data you need something like a sugar
factory.
It can be a very entry-level factory (Excel with an Azure
data source) or a more complex one.
The more complex and complete the factory, the more value
at the end of the processing chain.

To turn Big Data technologies from developer-centric
solutions into enterprise solutions, they must be
combined with SQL solutions into a single proven
infrastructure that meets the manageability and security
requirements of enterprises.
The challenge for enterprises is to simplify Big Data
integration/engineering and leverage it where possible
to improve their processes at tactical and strategic
levels.

Architects & DBAs will be able to choose between
datastore technologies and will need to understand
where one is better than the other.

Big Data has to be part of the enterprise applications
ecosystem, where it will be turned into value.
Thank you.

Introduction to Big Data - An Overview of Big Data Concepts and Technologies

  • 1. Introduction to Big Data: An analogy between Sugar Cane & Big Data. Image Source: alternative-energy-fuels.com. Image Source: MicFarris.com. Jean-Marc Desvaux – March 2012
  • 2. Session Abstract: What is Big Data? Where does it apply? What are the technologies behind it? Is it going to replace your RDBMS? …
  • 3. Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud’. “Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”
  • 4. Source: Tom Kyte’s Big Data Are you ready? presentation
  • 5. What is Big Data?
  • 6. Big Data is data that exceeds the processing capacity of conventional database systems. It’s too big, too fast or does not fit the structures of database architectures. To gain value from this type of data you need an alternative way to process it. Why is this happening? Data is growing faster than computers are getting bigger.
  • 7. A catch-all term. Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more… Can be characterized by the 3 Vs. Image Source: Tom Kyte’s Big Data Are you ready? presentation
  • 8. Volume: Data growing faster than machines are getting bigger; data sources adding up. Velocity: Rate of acquisition and desired rate of consumption. Variety: Extends beyond structured data; includes unstructured data of all varieties. Image Source: Tom Kyte’s Big Data Are you ready? presentation
  • 9. Where does Big Data apply?
  • 10. Big Data’s value to an organisation falls into two main categories: analytical use, and enabling new products and services.
  • 11. Analytical Use: To reveal insights that were previously hidden because they were hard to record and exploit. An edge over classic analytics, which is based on sampling and on more “static”, predetermined reports. It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight. Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians.”
  • 12. Some terms linked to the analytical use of Big Data. Sentiment analysis: mining the Web in real time to get a quick read of what people are thinking. Named-entity recognition (NER), also known as entity identification and entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values, percentages, etc. (e.g. does “Big B” in a tweet stand for Big Brother or Amitabh Bachchan?)
  • 13. Product/Service Enabler: Some products and services cannot exist unless they are backed by Big Data technologies, because they need to scale and need a fast feedback loop on complex analytics. Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are a good example: Google, Yahoo, Amazon, Facebook.
  • 14. Sectors with Fast Adoption and High Potential: Financial Sector, Telecommunications, Government, Health, Retail.
  • 15. Big Data Sources: Internal & Data Marketplaces.
  • 16. Internal sources: time attendance logs, RFID sensor logs, security logs, vehicle GPS tracking, machinery/telemetry logs, pictures & videos, enterprise social networks, service forums/discussions, … Mostly anything unstructured or simply structured.
  • 17. External Sources (feeders/data marketplaces). Examples: Infochimps.com, DataSift.com, datamarket.azure.com. Source: DataSift.com
  • 18. An Enterprise Architecture for Big Data: An analogy with a Sugar Cane Factory
  • 19. A Sugar Factory: SUGAR CANE FIELDS → ACQUIRE (HARVEST) → EXTRACT/SHRED → EVAPORATE/DISTILL/BOIL → DRY/STORE/SUGAR → BOTTOM LINE = VALUE
  • 20. An Enterprise Big Data Factory: DATA SOURCES (RDBMS & Data Marketplaces) → ACQUIRE (HARVEST): HDFS (Hadoop Distributed FS), NoSQL Database, RDBMS (Enterprise Applications) → ORGANIZE (EXTRACT): Map Reduce (Hadoop), Big Data Connectors, RDBMS Connectors → ANALYSE (SHRED/DISTILL/BOIL): Data Warehousing / RDBMS stores, Analytic Applications → BUSINESS INTELLIGENCE (DECIDE): the sweet part (sugar/rum) → BOTTOM LINE = VALUE
  • 21. Some Factories & architectures from vendors
  • 22. Greenplum (EMC2): An Example of a Turnkey Factory Solution
  • 23. Another “Turnkey Factory” example from Oracle, targeting high-end analytics: ACQUIRE (HARVEST) → ORGANIZE (EXTRACT) → ANALYSE (SHRED/DISTILL/BOIL) → BUSINESS INTELLIGENCE (DECIDE). Image Source: Tom Kyte’s Big Data Are you ready? presentation
  • 24. The Microsoft way. Of course, you can also build your own factory using widely available open-source software, on which most turnkey factories are built.
  • 26. Factory blocks & screws used for engineering solutions
  • 27. Will NoSQL kill SQL?!
  • 28. Is the RDBMS turning into a legacy data store? Not at all. We need the RDBMS to store high-value data and for its feature-rich approach (feature first). NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity in relation to Newtonian physics). Remember, NoSQL is not “No SQL” but “Not Only SQL”.
  • 30. Rise of data marketplaces. Data science tools development: more powerful & expressive toolsets for analysis. Emerging streaming data processing tools (Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI. Further cloud enablement. Ease of integration with enterprise sources.
  • 32. To leverage Big Data you need something like a Sugar Factory. It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain. To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure that meets the manageability and security requirements of enterprises.
  • 33. The challenge for enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels. Architects & DBAs will be able to make choices between datastore technologies and will need to understand where one is better than the other. Big Data has to be part of the Enterprise Applications ecosystem, where it will be turned into value.
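
The ORGANIZE (EXTRACT) stage in slide 20 names Map Reduce (Hadoop) without showing what such a job does. The following is a minimal, single-process sketch of the map, shuffle and reduce steps in plain Python, not Hadoop code and not taken from the presentation. The sample log lines, their four-field format, and the function names (`map_record`, `shuffle`, `reduce_group`, `run_job`) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw web log records standing in for the "acquire" stage.
# Assumed format: date, HTTP method, page, status code.
RAW_LOGS = [
    "2012-03-01 GET /home 200",
    "2012-03-01 GET /products 200",
    "2012-03-02 GET /home 404",
    "malformed line",
    "2012-03-02 GET /home 200",
]


def map_record(line):
    """Map step: emit (page, 1) for each well-formed, successful request."""
    parts = line.split()
    if len(parts) == 4 and parts[3] == "200":
        yield parts[2], 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Successful hits per page, ready to load into an RDBMS or warehouse table.
    print(run_job(RAW_LOGS))
```

In a real cluster the map and reduce functions would run in parallel on many machines over data held in HDFS; the small per-page totals they produce are the kind of organized output that the ANALYSE and BUSINESS INTELLIGENCE stages would then consume.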