March 26, 2015
Von McConnell
5/5/2015 Big Data Workshop 2
What is Big Data*?
The collection & analysis of data sets so large, complex & rapidly changing
that it is difficult to process & understand using traditional data processing tools &
applications
• Coined by either Professor Francis Diebold, University of Pennsylvania, or John Mashey, Chief Scientist of Silicon Graphics, around 1999
Traditional Analytics vs. Big Data Analytics
Dimensions compared: Velocity, Volume, Variety
Traditional Analytics
• Report Past Events
• Processing Times 1-2 days
• Batch file oriented
• Traditional big DB Relational Data
• Self generated
• Defined meta-data structures
• Batch file oriented
• Linear Growth
• Mostly Sampled
• Gigabytes (10^9), Terabytes (10^12)
• Sustained relevance of data series
Big Data Analytics
• Responds NOW
• Processing Times <1-5 seconds
• Near Time oriented
• Real-time data + warehouse
• Everyone creates data – e.g., Industry, Cross-Ind., Gov., etc.
• All forms, images, videos, texts
• Near real time
• Exponential Growth
• All the Data
• Petabytes (10^15), Exabytes (10^18), Zettabytes (10^21), Yottabytes, etc.
• Short term relevance of data snippets
More Big Data “facts” – Appendix pages 24 - 26
Drivers of Big Data
(not all-inclusive)
Historical
• Cost of Data Storage
• Cost of Computing
• Mobile Phones and tablets
• Increased access to the Internet
• Social Media and eCommerce
• Web Search
• Etc.
Future
• IoT (Internet of Things)
• Internet Cloud
• Analog to Digital Conversions
• Data Driven Decision-making
• Enterprise Applications
• Fraud, Security, CRM, ERP, etc.
[Chart: Data Consumption Over the Years, 2010 to 2020, in Zettabytes (axis 0 to 40); series: Total Data, Enterprise Managed, Enterprise Created]
- In 2007, it was estimated that all human knowledge was 295 Exabytes
- In 2015, 1 Exabyte is created each day on the internet = 250 million DVDs worth of information
- By 2020, there will be 5.2 Terabytes per person on earth
- 70% of all data is generated by individuals, but 80% is stored & managed by enterprises
*The Rapid Growth of Big Data - CSC
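The DVD comparison above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the decimal byte units named on the earlier slide, and it assumes a single-layer DVD capacity of 4.7 GB, which the slides do not state. The helper names `dvds_per` and `implied_dvd_size_gb` are hypothetical.

```python
# Decimal (SI) byte units, matching the powers of ten named on the slides.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}

# Assumption (not from the slides): a single-layer DVD holds 4.7 GB.
DVD_BYTES = 4.7 * UNITS["gigabyte"]


def dvds_per(unit: str, count: float = 1.0) -> float:
    """Return how many DVDs of DVD_BYTES capacity hold `count` of `unit`."""
    return count * UNITS[unit] / DVD_BYTES


def implied_dvd_size_gb(unit: str, dvd_count: float) -> float:
    """Return the DVD capacity in GB implied by a '1 <unit> = N DVDs' claim."""
    return UNITS[unit] / dvd_count / UNITS["gigabyte"]


if __name__ == "__main__":
    print(f"{dvds_per('exabyte') / 1e6:.0f} million DVDs per exabyte at 4.7 GB each")
    print(f"{implied_dvd_size_gb('exabyte', 250e6):.1f} GB per DVD implied by the slide")
```

Under the 4.7 GB assumption an exabyte is roughly 213 million DVDs; the slide's 250 million figure corresponds to about 4 GB per disc, so the two are of the same order.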
People of Big Data
[Diagram labels: OPS; Big Data; System Administration; Sys Design/Engineering; Apps, Process, Business]
Data Scientist
• Recent role, emerging with Big Data around 2010
• Uses internal, external & 3rd-party data sets to answer questions
• Knowledge of Big Data systems and tools such as Hadoop, Spark, Python/R, statistics, etc.
• Looks for hidden insights to solve business problems
• Usually holds a PhD or MS degree
Data Engineer
• Traditional engineer who knows DB systems, Excel, Access, etc.
• Compiles and installs DB systems, writes queries, etc.
• Knows DB software such as SQL/NoSQL
• Usually CS or Info Sys degree
Data Analyst
• Compiles & analyzes information – generally not Big Data
• Draws analytical insights from available data and makes business reports to aid decision making – e.g., sales analyst, operations analyst, etc.
• Usually has CS or Business degree
Data Architect
• Manages lots of data
• Translates data into usable info
• Designs DBs and manages data sets
• Usually has a Com Sci, Com Eng, Info Sys, etc. degree
Generalized Big Data Education Curricula*
Usually Interdisciplinary
PhD Programs
• Statistics
• Computer Science
• Elected Discipline**
Master's Programs
• Elected Discipline**
• Statistics
• Computer Science
* Based on experience with Carnegie Mellon, Stanford and UC Berkeley
** e.g., Engineering Programs, Biology, Economics, Psychology, etc.
Big Data Challenges
1. General Business Topics
a. Value vs. Cost vs. Business Case
b. Understanding how to apply Big Data
• Focus on use cases, not technology
c. Corporate Commitment & Leadership
• Hesitation usually exists
d. Common Taxonomy/Definitions
e. HR/Personnel Issues
• Finding skilled personnel
– everyone thinks they know Big Data ;>
• Educating the masses
• Vendor vs. Employee-led
f. Privacy, Security & Policy/Regulations
• New/increased costs
g. Etc.
Big Data Capability Blocks (axes: Complexity; Value & Cost)
• Blocks: Reporting, Monitoring, Data Mining, Evaluation, Prediction
• Questions addressed: What is happening? What is happening now? Why did it happen? (correlations only) Why did it happen? (hypothesis based) What will happen?
Privacy vs. Value Breakdown (axes: $Value; Privacy)
• Usage, Preferences, Status, Location, Personal Identifiers, Intent, Relationships, Demographics, Interactions
3 Steps to Sustained Big Data Analytics Evolution – Appendix page 27
Big Data Challenges (cont’d)
2) Data Management Topics
a. Data Governance – e.g., efficiencies of
standardization & data sharing
b. Collecting the “right” data – not everything
c. Lack of common data dictionary across
enterprise
d. How much data to collect – e.g., storage costs
e. Amount of data integrity – e.g., redundant data systems vs. how they handle data
f. Data retention – e.g., how long to store
g. Etc.
3) Data / BI Architectures Topics
a. Redundant & customized data systems – e.g., siloed data
b. Greenfield vs. Metadata & Federation Usage
c. Where analytics is performed – e.g., transport costs
d. Etc.
[Diagram labels: Reality; Goal]
Charge to Participants
What are the essential elements needed to create a Big Data professional education program?
This could include:
• What are the challenge areas in the Big Data environment?
• What skills and knowledge are needed to meet those challenges?
• How do we address these educational needs or gaps through training?
Example Use Cases by Industry
• Educators
• Health Care
• HR Administrators
• Language/Linguistics
• Mobile Carriers
• Railroad
• Sales - Retail & Wholesale
• Water Utilities
• Web Search
Educational Sector Use Cases
1) Student Performance Management & Intervention
– What sequence of topics/subjects is most effective for a specific student?
– Predict student academic and behavior issues using social media, web semantics, grade interpretation, etc.
– Develop student- and class-specific recommendations, such as individual or small group tutoring,
supplemental learning materials in “problem” subject areas, or even changes in classes or majors
2) Teaching Effectiveness Analytics
– More data allows for more robust comparison of teachers across years and schools/districts
– Allows for a common baseline for comparisons when trying new curricula
3) Academic Research
– Wide open at all scholastic levels and subjects
Health Care Sector Use Cases
1) Predict sickness/disease outbreaks using reliable information on geographical
movement
– Reliable information from patients, doctors submitting reports, mobility records
2) Use pattern matching for predictive health: Correlate patient visits, diagnostics,
and hospital/provider interactions across years of multiple visits
– Find repeatable patterns in patient data & long term illness diagnosis (hypertension, diabetes, cancer,
etc.)
– Predict retreatment risk & address it proactively, to avoid readmission within Medicare’s 30-day window
3) Identify best care approach via clinical analysis
– Longitudinal analysis of care across patients and diagnoses
– Cluster analysis around influencers on treatment: physicians, therapists, patient social relationships, mobility, income, etc.
4) Perform fraud analysis and identification via pattern analysis
– Understand relationships among parties (physicians, consumers, organizations), locations, time of
filing, frequency and circumstances
– Detect potential for computer generated claims, graph analysis of cohort networks
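As an illustration of the readmission use case above, the following minimal sketch labels admissions that fall within 30 days of a patient's previous discharge. Such labels are the kind of historical outcome a predictive model could be trained on; the sketch does not itself predict anything. The record layout `(patient_id, admit_date, discharge_date)` and the function name `flag_readmissions` are assumptions made for illustration and are not taken from the slides.

```python
from datetime import date, timedelta

# The 30-day window mentioned on the slide.
READMISSION_WINDOW = timedelta(days=30)


def flag_readmissions(stays):
    """Return (patient_id, admit_date) for each readmission within the window.

    `stays` is an iterable of (patient_id, admit_date, discharge_date) tuples;
    this layout is hypothetical.
    """
    flagged = []
    last_discharge = {}
    # Process each patient's stays in chronological order of admission.
    for patient_id, admitted, discharged in sorted(stays, key=lambda s: (s[0], s[1])):
        previous = last_discharge.get(patient_id)
        if previous is not None:
            gap = admitted - previous
            if timedelta(0) <= gap <= READMISSION_WINDOW:
                flagged.append((patient_id, admitted))
        last_discharge[patient_id] = discharged
    return flagged


if __name__ == "__main__":
    # Made-up sample records, for demonstration only.
    sample = [
        ("p1", date(2015, 1, 1), date(2015, 1, 5)),
        ("p1", date(2015, 1, 20), date(2015, 1, 22)),
        ("p2", date(2015, 1, 1), date(2015, 1, 3)),
        ("p2", date(2015, 3, 1), date(2015, 3, 2)),
    ]
    print(flag_readmissions(sample))
```

In the sample, only the second stay of `p1` is flagged, because it begins 15 days after the earlier discharge, while the second stay of `p2` begins 57 days later.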
Human Resources Sector Use Cases
1) Hiring New Employees
– Profile candidates on various data points (e.g., situational/problem solving, social media interactions, etc.) to estimate how candidates are likely to perform in specific positions, reduce employee turnover, impact employee happiness, etc.
– Minimize risk of negative results and missed budgets by guiding managers to make better, more
informed decisions on which employees to select
2) Labor Force Cost Controls
– Use holistic industry and cross-industry analytics to control labor costs by recommending the right
level of labor (Mgr. vs. VP) and overall scope of position responsibilities
– Use holistic industry and cross-industry analytics to reveal the optimal organizational size and shape
3) Productivity Improvements
– Improve workforce productivity with quick, timely adjustments of labor levels to fluctuating workload volume
Language/Linguistics Sector Use Cases
1) Identify Author of Anonymous Text
– Discover who wrote anonymous text, where it comes from, who claims what, who refutes whom, and
how many people claim this and how many people claim that position, etc.
2) Language Translation
– Better & faster real-time translations of words and names as well as their cultural/societal meanings between various languages and locations, even slang and dialects (almost 7,000 primary languages and dialects w/ another 39k sublanguages & dialects)
– Significant improvements for verbal & textual web search capabilities across the world
3) Computational Linguistics
– Allows for a baseline understanding of the various cultural, social, political, religious, etc. biases of individuals based upon the information they read/obtain. Once an individual baseline is established, information from web searches can be summarized and delivered to match that individual's biases, e.g., read all info with a “Republican” bias, etc.
Mobile Carrier Sector Use Cases
1) Improved Network Operations
– Better network coverage by understanding customer movement patterns based on weather, social
events, etc.
– Identifying and resolving network bottlenecks in minutes
– Proactively managing customer experience and churn
– Managing and planning for capacity requirements to maintain and improve the quality of service
2) Proactive Call Centers
– Identifying and resolving service issues in minutes
– Proactively managing customer experience and churn
– Maximizing revenue and margins from existing subscriber base
– Decreasing average call handling times and network operating costs
3) Movement Analyses
– What is the movement between geographical markets/stores?
– What % of people visit more than 1 location within a specific timeframe?
– What locations share the same traffic patterns?
– Associate movement w/activities – e.g., purchasing trends, diseases, energy consumption
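One of the movement questions above, the share of people who visit more than one location within a timeframe, reduces to a simple grouping operation. The sketch below is one possible way to compute it, assuming visit records of the form `(person_id, location_id, timestamp)`; that layout and the function name `multi_location_share` are hypothetical and not part of the original deck.

```python
from collections import defaultdict
from datetime import datetime


def multi_location_share(visits, start, end):
    """Return the percentage of people seen in [start, end) with >1 distinct location.

    `visits` is an iterable of (person_id, location_id, timestamp) tuples;
    this record layout is an assumption.
    """
    locations_by_person = defaultdict(set)
    for person_id, location_id, seen_at in visits:
        if start <= seen_at < end:
            locations_by_person[person_id].add(location_id)
    if not locations_by_person:
        return 0.0
    multi = sum(1 for locs in locations_by_person.values() if len(locs) > 1)
    return 100.0 * multi / len(locations_by_person)


if __name__ == "__main__":
    # Made-up sample visits, for demonstration only.
    sample = [
        ("a", "store1", datetime(2015, 5, 1, 9)),
        ("a", "store2", datetime(2015, 5, 1, 15)),
        ("b", "store1", datetime(2015, 5, 1, 10)),
        ("c", "store1", datetime(2015, 5, 1, 11)),
        ("c", "store1", datetime(2015, 5, 1, 12)),
        ("d", "store2", datetime(2015, 5, 2, 9)),
    ]
    share = multi_location_share(sample, datetime(2015, 5, 1), datetime(2015, 5, 2))
    print(f"{share:.1f}% of visitors went to more than one location")
```

Repeat visits to the same location do not count as a second location, which is why the set of distinct locations per person is used rather than a visit count.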
Railroad Sector Use Cases
1) Improve Fuel Efficiency (annual consumption = 1.7 trillion ton-miles of freight w/ 3.6 billion tons of fuel)
– Holistically determine shortest route, congestion, carriage determinations, etc.
– Predict & track conditions based on usage, weather, by train type, etc.
2) Predict train maintenance schedules based on type of train, mileage, parts usage
and shelf-life expectations, etc.
– Use augmented reality handsets to identify in real time the specific engine parts needed, where they are located, and how best to get them to the nearest local maintenance facility
3) Improve Safety
– Use various sensors, thermometers, and trends to detect problems with railway tracks in order to
predict and prevent potential derailments
Retail Sales Sector Use Cases
1) Store Traverse Pattern Analysis
– Use sensors to understand each unique shopper’s movements and purchases throughout the store. How long are they standing in one place vs. another? How many items are reviewed before one is selected (if any at all)?
– Understand shopper demographics, interests and possible desired brands as they walk in the store
and then “help” them make selections via electronic means
2) Voice of Customer
– Use social sentiment analytics (web semantics) to assess overall store preferences, brand status and
the launch of new products, services, or offers
– By combining social and mobile analytics with loyalty information, retailers can create personalized,
more relevant engagements with shoppers
3) Encourage Store Visits
– Using presence and location-based mobility analytics, retailers pinpoint the location of opt-in
shoppers when they are close to a store location
Water Utilities Sector Use Cases
1) Better Understanding of Customers Improves Service & Reduces Costs
– Understanding customer profiles, weather patterns, population growth & dynamics, and customer mobility helps to understand and predict usage patterns
– Knowing the size & condition of the distribution network and better understanding individual residential or business consumption patterns allow for much easier customized pricing packages to meet needs
2) Improving Operations
– Pinpointing or forecasting the location of outages for workforce deployment, scanning for potential fraud, theft or security breaches, and identifying trends or patterns for unbilled accounts all help net income
3) Predictive Asset Management
– Forecast potential performance or equipment failures in the distribution network as well as rate capacity of the network
– Leaks can also help predict potential distribution plant failures
Web Search Sector Use Cases
1) Customer Analysis
– Advertising effectiveness – How many viewers are there, who are they, and how long do they look at an advertisement?
– Where are they from?
– Apply demographic, lifestyle, interest, and socioeconomic profiling
– Understand advertising campaign effectiveness
– Identify repeat viewer patterns
2) Economics of Social Networks
• Who are the Influencers in specific social network groups?
• How do Influencers impact purchases by others?
• Understand impacts on churn
• Understand impacts of acquisitions
3) Customer Behavior
– Do various channel solicitations influence viewing, purchasing or no action?
– How often do they shop/stay on your web store?
– What is the “Next Best Offer”?
Appendix
More Big Data “Facts”?
(many attributed to Bernard Marr)
• With better big data integration, various industries could have significant savings:
– Healthcare – could save $300b per year (this is almost $1k for every man, woman and child)
– Mobile Carriers – could save 50% of annual budgets by better understanding and forecasting usage patterns
• Decoding the human genome originally took 10 years to complete; now it can be achieved in one week
• Wal-Mart has more than 1 million customer transactions every hour
• Retail sales revenue would increase by 60% if big data analytics were used optimally
• Over 90% of data was created in the last 2 years
• The NSA is thought to analyze 1.6% of all global internet traffic (~ 30 petabytes)
• An average mobile phone user “looks” at their phone 45 times per hour. Each transaction generates an average of over 1,200 data points, or about 800k data points per user per day
• There is more data generated on mobile phones than on desktop and laptop computers combined
• One Boeing jet creates 50 terabytes of data per hour (can’t download data until plane lands)
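Two of the figures in this list can be cross-checked against each other with simple arithmetic. The sketch below is only a consistency check on the numbers as quoted (45 looks per hour, 1,200 data points per look, about 800k data points per day, and $300b in savings at almost $1k per person); it derives what those figures jointly imply and does not verify any of them. The variable names are illustrative.

```python
# Figures as quoted in the list above.
LOOKS_PER_HOUR = 45
POINTS_PER_LOOK = 1200
CLAIMED_POINTS_PER_DAY = 800_000

# Data points generated during one hour of typical use.
points_per_hour = LOOKS_PER_HOUR * POINTS_PER_LOOK

# Hours of phone use per day that the daily figure would imply.
implied_active_hours = CLAIMED_POINTS_PER_DAY / points_per_hour

# Healthcare savings as quoted: $300b per year, almost $1k per person.
HEALTHCARE_SAVINGS_USD = 300e9
CLAIMED_SAVINGS_PER_PERSON_USD = 1000

# Population size implied by dividing total savings by per-person savings.
implied_population = HEALTHCARE_SAVINGS_USD / CLAIMED_SAVINGS_PER_PERSON_USD

if __name__ == "__main__":
    print(f"{points_per_hour:,} data points per hour")
    print(f"{implied_active_hours:.1f} active hours per day implied")
    print(f"{implied_population / 1e6:.0f} million people implied")
```

The daily figure is consistent with roughly 15 active hours per day, and the healthcare figure with a population of about 300 million.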
3 Steps to Sustained Big Data Analytics Evolution
1) Specialized path
• Functionally/issues-based
• Solution-oriented
• Vertically specific
2) Collaborative path
[Diagram axis labels: Business issue and vertical process-driven; Data-driven; LOW; HIGH; Transformation]
• Individual Approaches
• Custom/Point Solutions
• Redundant Systems
• Increase Costs
• Functional/Dept. Leaders
• Data Analytics as Value
• Single Approach Focused
• Enterprise wide
• Program Oriented
• Focused Approach
• Reduce Costs Focused
• Led by CIOs, COOs & CFOs
3) Economies of Scale path
• Taxonomy/Definitions
• Architectures
• Analytic Methods
• Education
• Regulatory Issues
• Standards

KU_Big_Data_3_25_2015a

  • 1. March 26, 2015 Von McConnell 5/5/2015 Big Data Workshop 2
  • 2. What is Big Data*? The collection & analysis of data sets so large, complex & rapidly changing that it is difficult to process & understand using traditional data processing tools & applications • Coined by either Professor Francis Diebold, University Pennsylvania or John Mashely, Chief Scientist of Silicon Graphics around 1999 5/5/2015 Big Data Workshop 3 Page 3
  • 3. Traditional Analytics vs. Big Data Analytics Traditional Analytics Big Data Analytics  Report Past Events  Processing Times 1-2 days  Batch file oriented  Responds NOW  Processing Times <1-5 seconds  Near Time oriented  Traditional big DB Relational Data  Self generated  Defined meta-data structures  Batch file oriented  Real-time data + warehouse  Everyone creates data – e.g, Industry, Cross-Ind. Gov. etc.  All forms, images, videos, texts  Near real time  Linear Growth  Mostly Sampled  Gigabytes (109), Terabytes (1012)  Exponential Growth  All the Data  Petabytes (1015), Exabytes (1018), Zettabytes (1021) , Yotabytes, etc.  Sustained relevance of data series  Short term relevance of data snippetsVelocity Volume Variety 4 Big Data Workshop 5/5/2015 Page 4
  • 4. More Big Data “facts” – Appendix pages 24 - 26 Big Data Workshop 55/5/2015 Page 5
  • 5. Drivers of Big Data (not inclusive) Historical • Cost of Data Storage • Cost of Computing • Mobile Phones and tablets • Increase access to Internet • Social Media and eCommerce • Web Search • Etc. Future • IoT (Internet of Things) • Internet Cloud • Analog to Digital Conversions • Data Driven Decision-making • Enterprise Applications • Fraud, Security, CRM, ERP, etc. 0 10 20 30 40 2010 2015 2020 Zettabytes Data Consumption Over the Years Total Data Enterprise Managed Enterprise Created - In 2007 was estimated that all human knowledge was 295 Exabytes - In 2015, 1 Exabyte created d each day on internet = 250 million DVDs worth of information - By 2020, there will be 5.2 Terabytes per person on earth - 70% of all data generated by individuals but 80% is stored & managed by enterprises *The Rapid Growth of Big Data - CSC *The Rapid Growth of Big Data - CSC More Big Data “Facts Pages 5/5/2015 Big Data Workshop 6 Page 6
  • 6. People of Big Data OPSBig DataSystem AdministrationSys Design/Engineering Apps, Process, Business Data Scientist • Recent due to Big Data ~ 2010 • Uses internal, external & 3rd Party Data sets to answer questions • Knowledge of BD systems like Hadoop, Spark, Python/R, Statistics, etc. • Looks for hidden insights to solve business problems • Usually PhD or MS degree Data Engineer • Traditional Engineer who knows DB systems, Excel, Access, etc. • Compiles, installs DB systems and writes queries, etc. • Knows DB software such as SQL/NoSQL • Usually CS or Info Sys degree Data Analyst • Compiles & analyzes information – generally not Big Data • Draws analytical insights from available data and makes business reports to aid decision making –e.g., sales analyst, operations analyst, etc. • Usually has CS or Business degree Data Architect • Manages lots of data • Translates data into usable info • Designs DBs, and manages data sets • Com Sci, Com Eng, Info Sys, etc. 7 Big Data Workshop Data Architect 5/5/2015 Page 7
  • 7. Generalized Big Data Education Curriculums* Usually Interdisciplinary PhD Programs Statistics Computer Science Elected Discipline** Masters Programs * Based on experience with Carnegie Mellon, Stanford and UC Berkley ** e.g., Engineering Programs, Biology, Economics, Psychology, etc. Elected Discipline** Statistics Computer Science 8 Big Data Workshop 5/5/2015 Page 8
  • 8. Big Data Challenges 1. General Business Topics a. Value vs. Cost vs. Business Case b. Understanding how to apply Big Data • Focus on use cases, not technology c. Corporate Commitment & Leadership • Hesitation usually exist d. Common Taxonomy/Definitions e. HR/Personnel Issues • Finding skilled personnel – everyone thinks they know Big Data ;> • Educating the masses • Vendor vs. Employee-lead f. Privacy, Security & Policy/Regulations • New/increased costs g. Etc. 9 Reporting Monitoring Data Mining Evaluation Prediction Why did it happen? (hypothesis based) What will happen? Why did it happen? (correlations only) What is happening now? What is happening? Complexity Value & Cost Big Data Capability Blocks Privacy vs. Value Breakdown $Value Privacy Usage Preferences Status Location Personal Identifiers Intent Relationships Demographics Interactions 3 Steps to Sustained Big Data Analytics Evolution – Appendix page 27 Big Data Workshop 5/5/2015 Page 9
  • 9. Big Data Challenges (cont’d) 2) Data Management Topics a. Data Governance – e.g., efficiencies of standardization & data sharing b. Collecting the “right” data – not everything c. Lack of common data dictionary across enterprise d. How much data to collect – e.g., storage costs e. Amount of data integrity – e.g., redundant data system vs. how they handle data f. Data retention – e.g., how long to store g. Etc. 3) Data / BI Architectures Topics a. Redundant & customized data systems e.g., silo’d data b. Greenfield vs. Metadata & Federation Usage c. Where analytics accomplished – e.g. transport costs d. Etc. 10 RealityGoal Big Data Workshop 5/5/2015 Page 10
  • 10. What are the essential elements needed to create a Big Data professional education program? Charge to Participants This could include: • What are the challenge areas in Big Data environment? • What are the skills and knowledge need to meet those challenges? • How do we address these educational needs or gaps through training? 5/5/2015 Big Data Workshop 11 Page 11
  • 11. Example Use Case by Industries • Educators • Health Care • HR Administrators • Language/Linguistics • Mobile Carriers • Railroad • Sales - Retail & Wholesale • Water Utilities • Web Search Page 14 15 16 17 18 19 20 21 22 Big Data Workshop 125/5/2015 Page 12
  • 12. Educational Sector Use Cases 1) Student Performance Management & Intervention – What sequence of topics/subjects are most effective for a specific student? – Predict student academic and behavior issues using social media, web semantics, interpret grades, etc. – Develop student- and class-specific recommendations, such as individual or small group tutoring, supplemental learning materials in “problem” subject areas, or even changes in classes or majors 1) Teaching Effectiveness Analytics – More data allows for more robust comparison of teachers across years and schools/districts – Allows for a common base comparisons for trying new curricula 2) Academic Research – Wide open at all scholastic levels and subjects 13 Big Data Workshop 5/5/2015 Page 13
  • 13. Health Care Sector Use Cases 1) Predict sickness/disease outbreaks using reliable information on geographical movement – Reliable information from patients, doctors submitting reports, mobility records 2) Use pattern matching for predictive health: correlate patient visits, diagnostics, and hospital/provider interactions across years of multiple visits – Find repeatable patterns in patient data & long-term illness diagnoses (hypertension, diabetes, cancer, etc.) – Predict retreatment risk & proactively address it, to avoid readmission within Medicare’s 30-day window 3) Identify best care approach via clinical analysis – Longitudinal analysis of care across patients and diagnoses – Cluster analysis around influencers on treatment: physicians, therapists, patient social relationships, mobility, income, etc. 4) Perform fraud analysis and identification via pattern analysis – Understand relationships among parties (physicians, consumers, organizations), locations, time of filing, frequency and circumstances – Detect potential computer-generated claims; graph analysis of cohort networks
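The 30-day readmission window mentioned in the Health Care slide can be illustrated with a small sketch. This is not from the deck: the record layout (patient id, admit date, discharge date) and the function name `readmissions_within_window` are hypothetical, and the code only labels past readmissions, which is one possible input to a risk model rather than a prediction in itself.

```python
from datetime import date

def readmissions_within_window(visits, window_days=30):
    """Return (patient_id, prior_discharge, readmit_date) for each stay that
    begins within `window_days` of the same patient's previous discharge.

    `visits` is an iterable of (patient_id, admit_date, discharge_date)
    tuples; this layout is an assumption made for illustration.
    """
    # Group stays by patient.
    stays_by_patient = {}
    for patient_id, admit, discharge in visits:
        stays_by_patient.setdefault(patient_id, []).append((admit, discharge))

    flagged = []
    for patient_id, stays in stays_by_patient.items():
        stays.sort()  # chronological order by admit date
        # Compare each stay with the one that follows it.
        for (_, prev_discharge), (next_admit, _) in zip(stays, stays[1:]):
            gap_days = (next_admit - prev_discharge).days
            if 0 <= gap_days <= window_days:
                flagged.append((patient_id, prev_discharge, next_admit))
    return flagged

# Hypothetical sample records.
sample_visits = [
    ("p1", date(2015, 1, 1), date(2015, 1, 5)),
    ("p1", date(2015, 1, 20), date(2015, 1, 22)),  # 15 days after discharge
    ("p2", date(2015, 1, 1), date(2015, 1, 3)),
    ("p2", date(2015, 3, 1), date(2015, 3, 2)),    # 57 days after discharge
]
flagged_sample = readmissions_within_window(sample_visits)
```

With the sample records only patient `p1` is flagged. In practice such labels would be joined with diagnosis and visit history before any predictive modeling.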
  • 14. Human Resources Sector Use Cases 1) Hiring New Employees – Profile candidates on various data points (e.g., situational/problem solving, social media interactions, etc.) to estimate how candidates are likely to perform in specific positions, reduce employee turnover, improve employee happiness, etc. – Minimize risk of negative results and missed budgets by guiding managers to make better, more informed decisions on which employees to select 2) Labor Force Cost Controls – Use holistic industry and cross-industry analytics to control labor costs by recommending the right level of labor (Mgr. vs. VP) and overall scope of position responsibilities – Use holistic industry and cross-industry analytics to reveal the optimal organizational size and shape 3) Productivity Improvements – Improve workforce productivity with quick, timely adjustments of labor levels to fluctuating workload volume
  • 15. Language/Linguistics Sector Use Cases 1) Identify Author of Anonymous Text – Discover who wrote anonymous text, where it comes from, who claims what, who refutes whom, how many people claim one position versus another, etc. 2) Language Translation – Better & faster real-time translations of words and names, as well as their cultural/societal meanings, between various languages and locations, even slang and dialects (almost 7000 primary languages and dialects w/another 39k sublanguages & dialects) – Significant improvements for verbal & textual web search capabilities across the world 3) Computational Linguistics – Allows for a baseline understanding of the various cultural, social, political, religious, etc. biases of individuals based upon the information read/obtained. Once an individual baseline is established, information from web searches can be summarized and delivered with a bias tailored to that individual, e.g., read all info with a “Republican” bias, etc.
  • 16. Mobile Carrier Sector Use Cases 1) Improved Network Operations – Better network coverage by understanding customer movement patterns based on weather, social events, etc. – Identifying and resolving network bottlenecks in minutes – Proactively managing customer experience and churn – Managing and planning for capacity requirements to maintain and improve the quality of service 2) Proactive Call Centers – Identifying and resolving service issues in minutes – Proactively managing customer experience and churn – Maximizing revenue and margins from existing subscriber base – Decreasing average call handling times and network operating costs 3) Movement Analyses – What is the movement between geographical markets/stores? – What % of people visit more than 1 location within a specific timeframe? – What locations share the same traffic patterns? – Associate movement w/activities – e.g., purchasing trends, diseases, energy consumption
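One of the movement-analysis questions above, what % of people visit more than one location within a specific timeframe, can be sketched in a few lines. The data layout (subscriber id, location, timestamp) and the function name `share_visiting_multiple_locations` are assumptions made for illustration; real carrier data would need anonymization and far larger-scale tooling.

```python
from datetime import datetime

def share_visiting_multiple_locations(sightings, start, end):
    """Fraction of distinct subscribers seen at more than one location
    during the half-open interval [start, end).

    `sightings` is an iterable of (subscriber_id, location, seen_at) tuples;
    this layout is hypothetical.
    """
    locations_by_subscriber = {}
    for subscriber_id, location, seen_at in sightings:
        if start <= seen_at < end:
            locations_by_subscriber.setdefault(subscriber_id, set()).add(location)

    if not locations_by_subscriber:
        return 0.0  # nobody observed in the timeframe

    multi = sum(1 for locs in locations_by_subscriber.values() if len(locs) > 1)
    return multi / len(locations_by_subscriber)

# Hypothetical sample sightings for one day.
sample_sightings = [
    ("u1", "store_a", datetime(2015, 5, 5, 9, 0)),
    ("u1", "store_b", datetime(2015, 5, 5, 11, 0)),
    ("u2", "store_a", datetime(2015, 5, 5, 9, 30)),
    ("u2", "store_a", datetime(2015, 5, 5, 15, 0)),  # same location twice
    ("u3", "store_b", datetime(2015, 5, 6, 10, 0)),  # outside the timeframe
    ("u4", "store_a", datetime(2015, 5, 5, 8, 0)),
    ("u4", "store_c", datetime(2015, 5, 5, 18, 0)),
]
sample_share = share_visiting_multiple_locations(
    sample_sightings, datetime(2015, 5, 5), datetime(2015, 5, 6)
)
```

In the sample, two of the three subscribers observed on 5/5/2015 visit more than one location. The same grouping step could feed the "shared traffic patterns" question by comparing location sets across subscribers.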
  • 17. Railroad Sector Use Cases 1) Improve Fuel Efficiency (Annual consumption = 1.7 trillion ton miles of freight w/3.6 billion tons of fuels) – Holistically determine shortest route, congestion, carriage determinations, etc. – Predict & track conditions based on usage, weather, train type, etc. 2) Predict train maintenance schedules based on type of train, mileage, parts usage and shelf-life expectations, etc. – Overlay augmented reality handsets to identify specific engine parts in real time, where they are located, and how best to get them to the nearest local maintenance facility 3) Improve Safety – Use various sensors, thermometers, and trends to detect problems with railway tracks in order to predict and prevent potential derailments
  • 18. Retail Sales Sector Use Cases 1) Store Traverse Pattern Analysis – Use sensors to understand each shopper’s movements and purchases throughout the store. How long do they stand in one place vs. another? How many items are reviewed before one is selected (if any at all)? – Understand shopper demographics, interests and possible desired brands as they walk in the store and then “help” them make selections via electronic means 2) Voice of Customer – Use social sentiment analytics (web semantics) to assess overall store preferences, brand status and the launch of new products, services, or offers – By combining social and mobile analytics with loyalty information, retailers can create personalized, more relevant engagements with shoppers 3) Encourage Store Visits – Using presence and location-based mobility analytics, retailers pinpoint the location of opt-in shoppers when they are close to a store location
  • 19. Water Utilities Sector Use Cases 1) Better Understanding of Customers Improves Service & Reduces Costs – Understanding customer profiles, weather patterns, population growth & dynamics, and customer mobility helps to understand and predict usage patterns – Knowing the size & condition of the distribution network and better understanding individual residential or business consumption patterns make it much easier to offer customized pricing packages that meet needs 2) Improving Operations – Pinpointing or forecasting the location of outages for workforce deployment, scanning for potential fraud, theft or security breaches, and identifying trends or patterns in unbilled accounts all improve net income 3) Predictive Asset Management – Forecast potential performance or equipment failures in the distribution network as well as rate capacity of the network – Leaks can also help predict potential distribution plant failures
  • 20. Web Search Sector Use Cases 1) Customer Analysis – Advertising effectiveness: how many viewers, who are they, and how long do they look at an advertisement? – Where are they from? – Apply demographic, lifestyle, interest – Socioeconomic profiling – Understand advertising campaign effectiveness – Identify repeat viewer patterns 2) Economics of Social Networks – Who are the Influencers in specific social network groups? – How do Influencers impact purchases by others? – Understand impacts on churn – Understand impacts of acquisitions 3) Customer Behavior – Do various channel solicitations influence viewing, purchasing or no action? – How often do they shop/stay on your web store? – What is the “Next Best Offer”?
  • 22. More Big Data “Facts”? (many attributed to Bernard Marr) • With better big data integration, various industries could have significant savings: • Healthcare – could save $300b per year (this is almost $1k for every man, woman and child) • Mobile Carriers – could save 50% of annual budgets by better understanding and forecasting usage patterns • Decoding the human genome originally took 10 years to complete: now it can be achieved in one week • Wal-Mart has more than 1 million customer transactions every hour • Retail sales revenue would increase by 60% if retailers made optimal use of big data analytics • Over 90% of data was created in the last 2 years • The NSA is thought to analyze 1.6% of all global internet traffic (~ 30 petabytes) • An average mobile phone user “looks” at their phone 45 times per hour. Each transaction generates an average of over 1200 data points, or about 800k data points per user per day • There is more data generated on mobile phones than on desktop and laptop computers combined • One Boeing jet creates 50 terabytes of data per hour (can’t download data until the plane lands)
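Two of the figures above rest on simple arithmetic that can be checked. The sketch below is illustrative only. The slide does not say which population the "$1k per person" refers to, so a United States population of roughly 320 million is assumed here. It also assumes that each "look" at the phone counts as one transaction and that a user is active about 15 hours per day; none of these assumptions come from the deck.

```python
# Assumptions (not stated on the slide).
US_POPULATION_ASSUMED = 320_000_000
ACTIVE_HOURS_ASSUMED = 15

def savings_per_person(total_savings_usd, population):
    """Spread a yearly savings total evenly across a population."""
    return total_savings_usd / population

def data_points_per_day(looks_per_hour, points_per_look, active_hours):
    """Data points generated per user per day if every look is one transaction."""
    return looks_per_hour * points_per_look * active_hours

# $300b per year across the assumed population.
healthcare_per_person = savings_per_person(300e9, US_POPULATION_ASSUMED)

# 45 looks per hour, 1200 data points each, over the assumed active hours.
points_active_day = data_points_per_day(45, 1200, ACTIVE_HOURS_ASSUMED)

# Same rate over a full 24 hours, for comparison.
points_full_day = data_points_per_day(45, 1200, 24)

print(f"Healthcare savings per person: ${healthcare_per_person:,.2f}")
print(f"Data points per {ACTIVE_HOURS_ASSUMED}-hour day: {points_active_day:,}")
print(f"Data points per 24-hour day: {points_full_day:,}")
```

Under these assumptions the healthcare figure works out to $937.50 per person, consistent with "almost $1k". The phone figure gives 810,000 points over a 15-hour day, close to the quoted 800k, while a full 24 hours at the same rate would give 1,296,000.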
  • 25. 3 Steps to Sustained Big Data Analytics Evolution 2) Collaborative path 1) Specialized path • Functionally/issues-based • Solution-oriented • Vertically specific [Diagram axes: Business issue and vertical process-driven; Data-driven; HIGH / LOW / HIGH; Transformation] • Individual Approaches • Custom/Point Solutions • Redundant Systems • Increase Costs • Functional/Dept. Leaders • Data Analytics as Value • Single Approach Focused • Enterprise wide • Program Oriented • Focused Approach • Reduce Costs Focused • Led by CIOs, COO & CFOs 3) Economies of Scale path • Taxonomy/Definitions • Architectures • Analytic Methods • Education • Regulatory Issues • Standards