SlideShare una empresa de Scribd logo
1 de 29
Big Data Analytics
Quite a few data analytics and visualization
tools are available in the market today from
leading vendors such as IBM, tableau, SAS,
analytics, statistics, world programming
system (WPS), etc. to help process and
analyze your big data
The Scope of Business
Intelligence
Smaller organizations:
Excel spreadsheets
Larger organizations:
Data mining, predictive
analytics, dashboards
What can you do with Business
Intelligence?
Business Intelligence Applications
• Multidimensional Analysis or Online Analytical
Processing (OLAP)
• Data Mining
• Decision Support Systems
• Big Data means different things to people with different
backgrounds and interests
• Traditionally, “Big Data” = massive volumes of data
– Example, volume of data at CERN, NASA, Google, …
• Where does the Big Data come from?
– Everywhere! Web logs, RFID, GPS systems, sensor networks,
social networks, Internet-based text documents, Internet search
indexes, detail call records, astronomy, atmospheric science,
biology, genomics, nuclear physics, biochemical experiments,
medical records, scientific research, military surveillance,
multimedia archives, …
Big Data - Definition and Concepts
Big Data - Definition and Concepts
• Big Data is a misnomer!
• Big Data is more than just “big”
• The Vs that define Big Data
– Volume
– Variety
– Velocity
– Veracity
– Variability
– Value
A High-Level Conceptual Architecture
for Big Data Solutions (by AsterData /
Teradata)
Math
and Stats
Data
Mining
Business
Intelligence
Applications
Languages
Marketing
ANALYTIC
TOOLS & APPS USERS
DISCOVERY PLATFORM
INTEGRATED
DATA WAREHOUSE
DATA
PLATFORM
ACCESSMANAGEMOVE
UNIFIED DATA ARCHITECTURE
System Conceptual View
Marketing
Executives
Operational
Systems
Frontline
Workers
Customers
Partners
Engineers
Data
Scientists
Business
Analysts
EVENT
PROCESSING
ERPERP
SCM
CRM
Images
Audio
and Video
Machine
Logs
Text
Web and
Social
BIG DATA
SOURCES
ERP
Fundamentals of Big Data
Analytics
• Big Data by itself, regardless of the size, type, or
speed, is worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also brought
about big challenges
– Effectively and efficiently capturing, storing, and
analyzing Big Data
– New breed of technologies needed (developed or
purchased or hired or outsourced …)
Big Data Considerations
• You can’t process the amount of data that you want to because of the
limitations of your current platform.
• You can’t include new/contemporary data sources (example, social media,
RFID, Sensory, Web, GPS, textual data) because it does not comply with
the data storage schema
• You need to (or want to) integrate data as quickly as possible to be current
on your analysis.
• You want to work with a schema-on-demand data storage paradigm
because the variety of data types involved.
• The data is arriving so fast at your organization’s doorstep that your
traditional analytics platform cannot handle it.
Critical Success Factors for Big
Data Analytics
• A clear business need (alignment with the vision and the
strategy)
• Strong, committed sponsorship (executive champion)
• Alignment between the business and IT strategy
• A fact-based decision-making culture
• A strong data infrastructure
• The right analytics tools
• Right people with right skills
Critical Success Factors for
Big Data Analytics
Keys to Success
with Big Data
Analytics
A Clear
business need
Strong,
committed
sponsorship
Alignment
between the
business and IT
strategy
A fact-based
decision-making
culture
A strong data
infrastructure
The right
analytics tools
Personnel with
advanced
analytical skills
Enablers of Big Data Analytics
• In-memory analytics
– Storing and processing the complete data set in RAM
• In-database analytics
– Placing analytic procedures close to where data is stored
• Grid computing & MPP
– Use of many machines and processors in parallel (MPP -
massively parallel processing)
• Appliances
– Combining hardware, software, and storage in a single unit for
performance and scalability
Challenges of Big Data Analytics
• Data volume
– The ability to capture, store, and process the huge volume of data in a
timely manner
• Data integration
– The ability to combine data quickly and at reasonable cost
• Processing capabilities
– The ability to process the data quickly, as it is captured (i.e., stream
analytics)
• Data governance (… security, privacy, access)
• Skill availability (… data scientist)
• Solution cost (ROI)
Business Problems Addressed by
Big Data Analytics
• Process efficiency and cost reduction
• Brand management
• Revenue maximization, cross-selling/up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
Business Problems
Addressed by Big Data
Analytics
• Risk management
• Regulatory compliance
• Enhanced security capabilities
Big Data Technologies--MapReduce
• MapReduce distributes the processing of very large multi-
structured data files across a large cluster of ordinary
machines/processors
• Goal - achieving high performance with “simple” computers
• Developed and popularized by Google
• Good at processing and analyzing large volumes of multi-
structured data in a timely manner
• Example tasks: indexing the Web for search, graph analysis,
text analysis, machine learning, …
Big Data Technologies--
MapReduce
• How does MapReduce work?
Big Data Technologies--Hadoop
• Hadoop is an open source framework for storing and analyzing
massive amounts of distributed, unstructured data
– Originally created by Doug Cutting at Yahoo!
• Hadoop clusters run on inexpensive commodity hardware so
projects can scale-out inexpensively
– Hadoop is now part of Apache Software Foundation
– Open source - hundreds of contributors continuously
improve the core technology
Big Data Technologies--
Hadoop
• Hadoop Technical Components
– Hadoop Distributed File System (HDFS)
– Name Node (primary facilitator)
– Secondary Node (backup to Name Node)
– Job Tracker
– Slave Nodes (the grunts of any Hadoop cluster)
– Additionally, Hadoop ecosystem is made up of a
number of complementary sub-projects: NoSQL
(Cassandra, Hbase), DW (Hive), …
• NoSQL = not only SQL
Big Data and Data Warehousing
• What is the impact of Big Data on DW?
– Big Data and RDBMS do not go nicely together
– Will Hadoop replace data warehousing/RDBMS?
• Use Cases for Hadoop
– Hadoop as the repository and refinery
– Hadoop as the active archive
• Use Cases for Data Warehousing
– Data warehouse performance
– Integrating data that provides business value
– Interactive BI tools
Hadoop Versus Data Warehouse
When to Use Which Platform
Requirement
Data
Warehouse
Hadoop
Low latency, interactive reports, and OLAP Checkmark Blank
ANSI 2003 SQL compliance is required Checkmark Checkmark
Preprocessing or exploration of raw
unstructured data
Blank Checkmark
Online archives alternative to tape Blank Checkmark
High-quality cleansed and consistent data Checkmark Checkmark
100s to 1,000s of concurrent users Checkmark Checkmark
Discover unknown relationships in the data Blank Checkmark
Definition of big data
• Anything beyond the human and technical
infrastructure needed to support storage,
processing and analysis.
• Today’s BIG may be tomorrow’s NORMAL.
• Terabytes or petabytes or zettabytes of data.
• I think it is about 3 vs.
Definition of big data
Cost-effective ,
innovation
forms of
information
processing
Enhanced
insight&decisi
on making
High-volume
High-velocity
High –variety
CHALLENGES WITH BIG DATA
• Data today is growing at an exponential rate. Most
of the data that we have today has been generated
in the last 2-3 years.
• Cloud computing and virtualization are here to
stay. cloud computing is the answer to managing
infrastructure for big data as far as cost-efficiency,
elasticity, and easy upgrading/downgrading is
concerned. This further complicates the decision to
host big data solution outsides the enterprise.
CHARACTERISTICS OF DATA
• Composition:
the composition of data deals with the
structure of data, that is , the sources of
data, the granularity, the types and the
nature of data as to whether it is static or
real- time streaming
• Condition:
the condition of data deals with the state
of data, that is “can one use this data as is
for analysis:” or “does it require cleansing
for further enhancement and enrichment.
Classification of analytics
More data
produced
More data
stored
More data
analyzed
Better
predictions
Steady
growth of
analysis
SYMMETRIC MULTIPROCESSOR
SYSTEM(SMP)
• Symmetric multiprocessor system
(SMP) has shared between or more
identical processor. The controlled by
a single operating system instance.
• Each processor has its own high-
speed memory called cache memory
and are connected using a application
Stream Analytics
A Use Case in Energy Industry
Sensor Data
(Energy Production
System Status)
Meteorological Data
(Wind, Light,
Temperature, etc.)
Usage Data
(Smart Meters,
Smart Grid Devises)
Permanent
Storage Area
Streaming Analytics
(Predicting Usage,
Production and
Anomalies)
Energy Production System
(Traditional and Renewable)
Energy Consumption System
(Residential and Commercial)
Data Integration
and Temporary
Staging
Capacity Decisions
Pricing Decisions
Big Data and Stream
Analytics
• Data-in-motion analytics and real-time data
analytics
– One of the Vs in Big Data = Velocity
• Analytic process of extracting actionable
information from continuously flowing data
• Why Stream Analytics?
– It may not be feasible to store the data, or lose
its value
• Stream Analytics Versus Perpetual Analytics
• Critical Event Processing

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
 
Kdd process
Kdd processKdd process
Kdd process
 
Big data mining
Big data miningBig data mining
Big data mining
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
I. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHMI. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHM
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Data mining
Data miningData mining
Data mining
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 

Similar a Big data unit 2

big data processing.pptx
big data processing.pptxbig data processing.pptx
big data processing.pptxssuser96aab9
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptxAlbert Alex
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Introduction to BIG DATA
Introduction to BIG DATA Introduction to BIG DATA
Introduction to BIG DATA Zeeshan Khan
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatiaSatish Bhatia
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfahmedibrahimghnnam01
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 

Similar a Big data unit 2 (20)

big data processing.pptx
big data processing.pptxbig data processing.pptx
big data processing.pptx
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptx
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Big Data
Big DataBig Data
Big Data
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Introduction to BIG DATA
Introduction to BIG DATA Introduction to BIG DATA
Introduction to BIG DATA
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big Data - Module 1
Big Data - Module 1Big Data - Module 1
Big Data - Module 1
 
Thilga
ThilgaThilga
Thilga
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatia
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdf
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 

Último

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 

Último (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 

Big data unit 2

  • 2. Quite a few data analytics and visualization tools are available in the market today from leading vendors such as IBM, tableau, SAS, analytics, statistics, world programming system (WPS), etc. to help process and analyze your big data
  • 3. The Scope of Business Intelligence Smaller organizations: Excel spreadsheets Larger organizations: Data mining, predictive analytics, dashboards
  • 4. What can you do with Business Intelligence? Business Intelligence Applications • Multidimensional Analysis or Online Analytical Processing (OLAP) • Data Mining • Decision Support Systems
  • 5. • Big Data means different things to people with different backgrounds and interests • Traditionally, “Big Data” = massive volumes of data – Example, volume of data at CERN, NASA, Google, … • Where does the Big Data come from? – Everywhere! Web logs, RFID, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, detail call records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, … Big Data - Definition and Concepts
  • 6. Big Data - Definition and Concepts • Big Data is a misnomer! • Big Data is more than just “big” • The Vs that define Big Data – Volume – Variety – Velocity – Veracity – Variability – Value
  • 7. A High-Level Conceptual Architecture for Big Data Solutions (by AsterData / Teradata) Math and Stats Data Mining Business Intelligence Applications Languages Marketing ANALYTIC TOOLS & APPS USERS DISCOVERY PLATFORM INTEGRATED DATA WAREHOUSE DATA PLATFORM ACCESSMANAGEMOVE UNIFIED DATA ARCHITECTURE System Conceptual View Marketing Executives Operational Systems Frontline Workers Customers Partners Engineers Data Scientists Business Analysts EVENT PROCESSING ERPERP SCM CRM Images Audio and Video Machine Logs Text Web and Social BIG DATA SOURCES ERP
  • 8. Fundamentals of Big Data Analytics • Big Data by itself, regardless of the size, type, or speed, is worthless • Big Data + “big” analytics = value • With the value proposition, Big Data also brought about big challenges – Effectively and efficiently capturing, storing, and analyzing Big Data – New breed of technologies needed (developed or purchased or hired or outsourced …)
  • 9. Big Data Considerations • You can’t process the amount of data that you want to because of the limitations of your current platform. • You can’t include new/contemporary data sources (example, social media, RFID, Sensory, Web, GPS, textual data) because it does not comply with the data storage schema • You need to (or want to) integrate data as quickly as possible to be current on your analysis. • You want to work with a schema-on-demand data storage paradigm because the variety of data types involved. • The data is arriving so fast at your organization’s doorstep that your traditional analytics platform cannot handle it.
  • 10. Critical Success Factors for Big Data Analytics • A clear business need (alignment with the vision and the strategy) • Strong, committed sponsorship (executive champion) • Alignment between the business and IT strategy • A fact-based decision-making culture • A strong data infrastructure • The right analytics tools • Right people with right skills
  • 11. Critical Success Factors for Big Data Analytics Keys to Success with Big Data Analytics A Clear business need Strong, committed sponsorship Alignment between the business and IT strategy A fact-based decision-making culture A strong data infrastructure The right analytics tools Personnel with advanced analytical skills
  • 12. Enablers of Big Data Analytics • In-memory analytics – Storing and processing the complete data set in RAM • In-database analytics – Placing analytic procedures close to where data is stored • Grid computing & MPP – Use of many machines and processors in parallel (MPP - massively parallel processing) • Appliances – Combining hardware, software, and storage in a single unit for performance and scalability
  • 13. Challenges of Big Data Analytics • Data volume – The ability to capture, store, and process the huge volume of data in a timely manner • Data integration – The ability to combine data quickly and at reasonable cost • Processing capabilities – The ability to process the data quickly, as it is captured (i.e., stream analytics) • Data governance (… security, privacy, access) • Skill availability (… data scientist) • Solution cost (ROI)
  • 14. Business Problems Addressed by Big Data Analytics • Process efficiency and cost reduction • Brand management • Revenue maximization, cross-selling/up-selling • Enhanced customer experience • Churn identification, customer recruiting • Improved customer service • Identifying new products and market opportunities
  • 15. Business Problems Addressed by Big Data Analytics • Risk management • Regulatory compliance • Enhanced security capabilities
  • 16. Big Data Technologies--MapReduce • MapReduce distributes the processing of very large multi- structured data files across a large cluster of ordinary machines/processors • Goal - achieving high performance with “simple” computers • Developed and popularized by Google • Good at processing and analyzing large volumes of multi- structured data in a timely manner • Example tasks: indexing the Web for search, graph analysis, text analysis, machine learning, …
  • 17. Big Data Technologies-- MapReduce • How does MapReduce work?
  • 18. Big Data Technologies--Hadoop • Hadoop is an open source framework for storing and analyzing massive amounts of distributed, unstructured data – Originally created by Doug Cutting at Yahoo! • Hadoop clusters run on inexpensive commodity hardware so projects can scale-out inexpensively – Hadoop is now part of Apache Software Foundation – Open source - hundreds of contributors continuously improve the core technology
  • 19. Big Data Technologies-- Hadoop • Hadoop Technical Components – Hadoop Distributed File System (HDFS) – Name Node (primary facilitator) – Secondary Node (backup to Name Node) – Job Tracker – Slave Nodes (the grunts of any Hadoop cluster) – Additionally, Hadoop ecosystem is made up of a number of complementary sub-projects: NoSQL (Cassandra, Hbase), DW (Hive), … • NoSQL = not only SQL
  • 20. Big Data and Data Warehousing • What is the impact of Big Data on DW? – Big Data and RDBMS do not go nicely together – Will Hadoop replace data warehousing/RDBMS? • Use Cases for Hadoop – Hadoop as the repository and refinery – Hadoop as the active archive • Use Cases for Data Warehousing – Data warehouse performance – Integrating data that provides business value – Interactive BI tools
  • 21. Hadoop Versus Data Warehouse When to Use Which Platform Requirement Data Warehouse Hadoop Low latency, interactive reports, and OLAP Checkmark Blank ANSI 2003 SQL compliance is required Checkmark Checkmark Preprocessing or exploration of raw unstructured data Blank Checkmark Online archives alternative to tape Blank Checkmark High-quality cleansed and consistent data Checkmark Checkmark 100s to 1,000s of concurrent users Checkmark Checkmark Discover unknown relationships in the data Blank Checkmark
  • 22. Definition of big data • Anything beyond the human and technical infrastructure needed to support storage, processing and analysis. • Today’s BIG may be tomorrow’s NORMAL. • Terabytes or petabytes or zettabytes of data. • I think it is about 3 vs.
  • 23. Definition of big data Cost-effective , innovation forms of information processing Enhanced insight&decisi on making High-volume High-velocity High –variety
  • 24. CHALLENGES WITH BIG DATA • Data today is growing at an exponential rate. Most of the data that we have today has been generated in the last 2-3 years. • Cloud computing and virtualization are here to stay. cloud computing is the answer to managing infrastructure for big data as far as cost-efficiency, elasticity, and easy upgrading/downgrading is concerned. This further complicates the decision to host big data solution outsides the enterprise.
  • 25. CHARACTERISTICS OF DATA • Composition: the composition of data deals with the structure of data, that is , the sources of data, the granularity, the types and the nature of data as to whether it is static or real- time streaming • Condition: the condition of data deals with the state of data, that is “can one use this data as is for analysis:” or “does it require cleansing for further enhancement and enrichment.
  • 26. Classification of analytics More data produced More data stored More data analyzed Better predictions Steady growth of analysis
  • 27. SYMMETRIC MULTIPROCESSOR SYSTEM(SMP) • Symmetric multiprocessor system (SMP) has shared between or more identical processor. The controlled by a single operating system instance. • Each processor has its own high- speed memory called cache memory and are connected using a application
  • 28. Stream Analytics A Use Case in Energy Industry Sensor Data (Energy Production System Status) Meteorological Data (Wind, Light, Temperature, etc.) Usage Data (Smart Meters, Smart Grid Devises) Permanent Storage Area Streaming Analytics (Predicting Usage, Production and Anomalies) Energy Production System (Traditional and Renewable) Energy Consumption System (Residential and Commercial) Data Integration and Temporary Staging Capacity Decisions Pricing Decisions
  • 29. Big Data and Stream Analytics • Data-in-motion analytics and real-time data analytics – One of the Vs in Big Data = Velocity • Analytic process of extracting actionable information from continuously flowing data • Why Stream Analytics? – It may not be feasible to store the data, or lose its value • Stream Analytics Versus Perpetual Analytics • Critical Event Processing