BIG DATA…A BIG DEAL?

Organized by: Andrew Waitman
Big Data, Small Sound Bytes




2              © 2009/2012 Pythian © All Rights Reserved
Why Big Data Now?
         VOLUME

1. All on-line digital activity creates artifacts or metadata,
   which at terabyte-to-petabyte volumes or more is being called
   BIG DATA

2. Unstructured metadata collection occurs whenever digital
   activity occurs

3. Digital metadata volume has exploded with growing internet
   usage and has accelerated with recent smartphone & iPad
   usage driving global mobile and social activity




Why Big Data Now?
     HUMAN VOLUME
1.   In 1998 Google served 3.6 Million searches in the year

2.   In 2011 Google served 1,722,071,000,000 searches in the year

3.   In August 2008 there were 100 Million Facebook users

4.   In December 2012 there will be over 1 Billion Facebook users

5.   In August 2012 Twitter reached over 500,000,000 users



Digital volume of user on-line metadata has exploded with growing
internet, mobile and social use.
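
To put the figures on this slide in perspective, a quick back-of-the-envelope sketch can turn them into growth multiples and average yearly growth. The function names are illustrative only, and the yearly figure assumes smooth compound growth between the two data points quoted above, which is a simplification.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_factor(start: float, end: float, years: float) -> float:
    """Average yearly multiplier, assuming smooth compound growth (a simplification)."""
    return (end / start) ** (1.0 / years)


# Figures quoted on the slide.
google_1998 = 3.6e6                # searches in 1998
google_2011 = 1_722_071_000_000    # searches in 2011
facebook_2008 = 100e6              # users, August 2008
facebook_2012 = 1e9                # users, projected for December 2012

if __name__ == "__main__":
    print(f"Google searches grew {growth_multiple(google_1998, google_2011):,.0f}x")
    print(f"  roughly {annual_growth_factor(google_1998, google_2011, 13):.2f}x per year")
    print(f"Facebook users grew {growth_multiple(facebook_2008, facebook_2012):,.0f}x")
```

Expressing the raw counts as multiples makes the slide's point easier to compare across services that started at very different sizes.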


Why Big Data Now?
     DEVICE VOLUME

1. In 2005 there were 1.5 Billion RFID tags

2. In 2012 there are 30 Billion RFID tags

3. 350 Billion Smart Meter Transactions per year

4. 1 Billion smart phones by 2015 with location
   sensors



Digital sensor data volume has exploded with growing
machine usage of sensor and measurement reporting

Why Big Data Now?
            ZEITGEIST
1. Data-driven decision making is mainstream
   thinking: think Moneyball by Michael Lewis

2. Google demonstrated the value and importance
   of mining “Big Data” for Search, Ad Placement,
   Language Translation and a myriad of other
   computing challenges with economic benefit.

3. Data trumps smarter algorithms. It is
    the dawning of the Age of Real Time & Near Real
    Time BIG Impact Analytics.




Why Big Data Now?
         ECONOMICS
1. Collection & analysis of large volumes of metadata is now
   relatively simple, low cost and potentially highly valuable

2. Storage & computing power are relatively low cost, enabling the
   mining of massive metadata volumes in real time, near real
   time or later

3. The economic benefit or value of the insights can
   far exceed the costs of acquiring & storing the data

4. Big Data infrastructure tools have become simpler and more accessible




Purpose of Data Analysis
     Analysis of data is required to understand
     (a) why consumers purchase a particular product,
     (b) how consumers purchase the product,
     (c) the demographics and psychographics of the purchaser of the
         product and
     (d) the ultimate user of the product.




An Alternative Perspective
     “Big Data is just the new rallying cry
     for the same old stuff BI companies
     have been producing all along”
                            - Stephen Few, Perceptual Edge

     AVOID CONFUSING ABUNDANCE WITH INSIGHT

     This seems obvious, but almost no attention is being given
     to building the skills and technologies that help us glean
     insights from data more effectively. As Richards J. Heuer,
     Jr. argued in the Psychology of Intelligence Analysis
     (1999), the primary failures of analysis are less due to
     insufficient data than to flawed thinking. To succeed
     analytically, we must invest a great deal more of our
     resources in training people to think effectively and we
     must equip them with tools that support that effort.
     Heuer spent 45 years supporting the work of the CIA.
     Identifying a potential terrorist plot requires an analyst
     to sift through a lot of data (perhaps Big Data), but more
     importantly, it relies on their ability to connect the dots.
     Contrary to Heuer’s emphasis on thinking skills, big data
     is merely about more, more, more; not smarter or
     better.

Is Big Data really new?
                                  NO
What is new is that access to insights now comes at a cost,
and with tools, available to almost anyone

Saving all data is now economically viable for everyone.

Large public and private sector (Global 2000) enterprises
have always generated, stored, processed and analyzed
large volumes and a wide variety of structured and
unstructured data:
1.    Particle Physics Research - Large Hadron Collider generates 1 Petabyte per second.

2.    Oil Exploration - Seismic sensor data

3.    Bioinformatics - Human Genome Project




BIG DATA VS TRADITIONAL DATA

BIG DATA
     • Petabytes at 1/10th Cost of Pre-Engineered Storage
     • Semi-structured
     • Variety of Sources
     • Store Everything
     • Raw Data
     • No Data Model/Schema
     • Parallelize to handle volume
     • Simplicity at Design/Architecture stage
     • Complexity at Insight stage

TRADITIONAL DATA
     • Gigabytes to Terabytes
     • SQL Structured
     • Engineered Systems
     • Data Model/Schema
     • Selected Data Stored
     • Complexity at Design/Architecture stage
     • Simplicity at Usage stage
     • Majority of $$ Investment up front




Big Data is BI at Scale
PHASE 1: Capture & Store
      • Petabyte scale
      • 300 Terabytes/Rack

PHASE 2: Speculate and Investigate
      • Data Science
      • Analytics
      • MAP-R

PHASE 3: Exploit Insights
      • Real Time
      • NRT (Near Real Time) Decisions




Big Data Phase 1 - Capture & Store
     Is the value of potential insights much greater
     than the cost of searching for them?
BUSINESS QUESTIONS
     • What types of semi-structured data do you plan to store, and how?
     • What questions are you attempting to answer?
     • What data analysis is currently being done?
     • What are people asking questions about?
     • What disaster recovery (DR)? What compression? What storage is
       possible? Flash vs. disk? What capacity, and how fast must access be?
     • How many people can access the data simultaneously?
     • KNOW THE DATA? SOURCE? RATE OF GENERATION?
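
The storage questions above (rate of generation, compression, capacity) lend themselves to a rough sizing calculation. The sketch below is illustrative only: the 300 terabytes per rack comes from the "Big Data is BI at Scale" slide, while the ingest rate, retention period, compression ratio and number of copies are hypothetical inputs to be replaced with your own.

```python
import math

TB_PER_RACK = 300  # per-rack capacity quoted earlier in the deck


def racks_needed(daily_ingest_tb: float,
                 retention_days: int,
                 compression_ratio: float = 1.0,
                 replication: int = 1) -> int:
    """Estimate the racks required to hold the retained data.

    compression_ratio is raw size divided by stored size (2.0 halves the data).
    replication is the number of copies kept for availability.
    """
    raw_tb = daily_ingest_tb * retention_days
    stored_tb = raw_tb / compression_ratio * replication
    return math.ceil(stored_tb / TB_PER_RACK)


if __name__ == "__main__":
    # Hypothetical workload: 2 TB/day kept for one year,
    # compressed 2:1 and stored as three copies.
    print(racks_needed(2, 365, compression_ratio=2.0, replication=3))
```

Even a crude estimate like this shows how strongly the answers to "what compression?" and "what DR?" drive the hardware footprint.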


Big Data Phase 1 - Capture & Store
     Is the value of potential insights much greater
     than the cost of searching for them?
STORAGE REQUIREMENTS
     •   Be scalable
     •   Provide tiered storage
     •   Be self-managing
     •   Ensure content is highly available
     •   Ensure content is widely accessible
     •   Support both analytical and content applications
     •   Support workflow automation
     •   Integrate with legacy applications
     •   Enable integration with public, private and hybrid cloud ecosystems
     •   Be self-healing

Big Data Phase 2 - Speculate and Investigate
     Is the value of potential insights much greater
     than the cost of searching for them?
BUSINESS QUESTIONS
     • What type of semi-structured data do I have?
     • What type of questions am I trying to answer?
     • Statistical? Correlation? Causal? Patterns?
     • How do I need to manipulate, translate, transform, cleanse,
       organize, visualize the data?
     • How much time do I have for analysis?
     • What tools do I have to perform transformation and analysis?




Big Data Phase 3 - Exploit Insights
     Is the value of potential insights much greater
     than the cost of searching for them?
BUSINESS QUESTIONS
     • Are discovered patterns/insights available in real time, near real
       time or further out?
     • How do we systematically find patterns/insights going forward?
     • How do I integrate them into business-impacting decision processes?




Top 10 Reasons Why all the Hype around Big Data now?
1. At terabytes and petabytes it really does get interesting.
2. All the Cool Kids are doing it.
     Once the Four Digerati Horsemen (Google, Facebook, Twitter, Amazon) say it’s important, then it really is.

3. BI Folks needed a new marketing moniker.
4. ‘CLOUD’ hype was already annoying and slowing.
5. Gartner says it’s near its peak!
6. The term went viral!
7. People thought you said Big Deal!
8. Voluminous data could not be pronounced
9. User Data mining is next to Voyeurism
10. It’s Google’s Vault!



What is considered Big Data?
            VOLUME & VARIETY
 1. Any data stored digitally and at scale (terabytes
    and up) with potential for providing practical, useful
    insights, potentially with economic benefits

 2. Very large volume of unstructured
    information/data

 3. Big Data is characterized by the volume, velocity
    and variety of large data sets



 Every “connected” person or “connected”
 device is potentially a data generator


What is considered Big Data?
            DIFFICULT & TIMELY
 1. Big Data, by the nature of its volume, hides or
    obscures valuable insights: a lot of noise, but with
    critical and potentially valuable signals buried
    within

 2. Often the signal value perishes rapidly, requiring
    real time or near real time analysis and action



 Big Data is the quintessential signal vs noise
 problem
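
One simple way to picture the signal-versus-noise problem is outlier detection: flag the few readings that stand far apart from the bulk of the data. The sketch below uses the median absolute deviation (MAD), a robust statistic chosen here purely for illustration and not a method named in this deck; the threshold and the sample readings are hypothetical.

```python
from statistics import median


def find_signals(values, threshold=3.5):
    """Return (index, value) pairs that sit far from the median.

    Uses a modified z-score based on the median absolute deviation (MAD),
    which is less distorted by the outliers themselves than mean/stdev.
    The 0.6745 scaling constant and 3.5 cutoff are conventional choices.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing stands out by this measure
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical sensor readings: mostly noise around 10, one spike.
    readings = [10.1, 9.8, 10.0, 10.3, 9.9, 42.0, 10.2, 9.7]
    print(find_signals(readings))
```

At real Big Data volumes the same idea would have to run continuously and close to the data, since the signal's value may perish quickly.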




Examples of Big Data?
• Local/regional weather information
• Web traffic information
• User search behavior

• Social information – who connected to whom, who
  poked whom etc.
• Mobile User information – preferences, likes,
  habits
• Application usage information
• E-commerce transaction information
• Physical retail customer transaction data



Who are the Top 15 ‘Big Data’ ‘Players’?
     1. Google
     2. Amazon
     3. Apple
     4. Yahoo
     5. Facebook
     6. Salesforce
     7. Twitter
     8. Cloudera
     9. LinkedIn
     10. Netflix
     11. Microsoft
     12. IBM
     13. Hortonworks
     14. Zynga
     15. eBay

1. www.kaggle.com
     2. www.indeed.com
     3. www.recordedfuture.com
     4. www.datamarket.com
     5. www.climate.com
     6. www.manybills.com
     7. www.electrion.twitter.com
     8. www.consensu.gov
     9. www.coursera.com
     10. www.data.gov


What is the size of the BIG DATA Market?

     Deloitte pegs the size of the big data market at
     about $1.3-$1.5 billion in 2012

     In March, IDC predicted that the worldwide big data
     technology and services market would reach
     $16.9 billion in 2015.

     The 2012 global BI software market is $35 billion




Where do BI and Big Data co-exist?
                           PREDICTIVE ANALYTICS




How do Machine Learning and Big Data relate?

                              PREDICTIVE ANALYTICS




When is Big Data valuable?

 1. When better business decisions result from practical
    insights provided by data that were unavailable to, or
    unnoticed by, expert judgment

 2. When time-to-insight results in big returns or benefits,
    e.g. real-time book recommendations

 3. Where precision of analysis results in specific
    alternative decisions

 4. Where patterns from heterogeneous or seemingly
    disparate data sources provide material competitive
    insights/advantage versus competition



What is unique about Big Data Technology?
         MASSIVE PARALLELISM
        AFFORDABLE HARDWARE
          LOCAL PROCESSING
 1. The tools do not require the data to be first
    structured in a particular schema as is required in
    relational databases
 2. Data is analyzed in native format closest to where
    it is stored, dramatically reducing the time and
    effort for retrieval and restore.
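
The two points above can be illustrated with a toy map-and-reduce pass over raw records. In this sketch each partition stands in for data held on a separate node: records are parsed only at analysis time (no schema is imposed up front), each partition is summarized where it "lives", and only the small summaries are combined. The record format, the `event` field and the use of a thread pool are assumptions for illustration, not a description of any particular Big Data product.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def summarize_partition(raw_lines):
    """Map step: count events by type within one partition of raw JSON lines."""
    counts = Counter()
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data is messy; skip lines that do not parse
        if isinstance(record, dict) and "event" in record:
            counts[record["event"]] += 1
    return counts


def analyze(partitions):
    """Reduce step: merge the per-partition summaries into one result."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(summarize_partition, partitions):
            total += partial
    return total


if __name__ == "__main__":
    # Hypothetical raw logs; the fields vary from record to record.
    partitions = [
        ['{"event": "search", "q": "big data"}', '{"event": "click"}', "garbage"],
        ['{"event": "search"}', '{"user": 7}'],
    ]
    print(analyze(partitions))
```

The thread pool here only mimics the shape of the computation; on a real cluster the map step would run on the machines that hold each partition.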




Visualization may be the key to unlocking Big Data insights




What skills do I need in my organization for Big Data?
      1. Data scientists –
        •   Identify what analysis makes sense in context. Typical background in math and
            statistics, as well as artificial intelligence and natural language processing.

      2. Data architects –
        •   Create the data model and identify required data sources and analytical tools

      3. Data visualizers –
        •   Use visualizations to explore what the data means and present how it will
            impact the company

      4. Data change agents –
        •   Good communicators with a Six Sigma background who understand how to apply
            statistics.




What skills do I need in my organization for Big Data?
      5. Data engineers/operators –
        •   Run Big Data infrastructure operations. Develop architecture that helps analyze and
            supply data in the way the business needs, and make sure systems are
            performing smoothly

      6. Data stewards –
        •   Ensure that data sources are properly accounted for, and may also maintain a
            centralized repository as part of a Master Data Management approach, in which
            there is one “gold copy” of enterprise data to be referenced.

      7. Data virtualization/cloud specialists –
        •   Build and maintain a virtualized data service layer that can draw data from any
            source and make it available across organizations in a consistent, easy-to-access
            manner

      8. Systems administrators



Six Steps to Big Data alchemy?
      1.       Select the right data sets
           •     Identify rich data sources which may contain insights into a particular problem you are
                 trying to solve or an insight you are trying to gain. Social media data is providing
                 incredible insights into changes in brand positioning and new product introductions

      2.       Join the various sets of data
           •     Combine rich, unstructured and sometimes incomplete data into a new set for manipulation and analysis

      3.       Clean the new large data set
           •     Begin to discover important and relevant patterns, signatures, anomalies, correlations and outliers
                 using advanced analytic models

      4.       Create models
           •     These models predict outcomes using the data. Iterate on your hypothesis and keep experimenting

      5.       Use visualization tools
           •     Visualization may assist in discovery or presentation of key insights from the data

      6.       Iterate
           •     Keep varying your models and data sets to assist future planning or decision making
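
The six steps can be sketched end to end on a toy example. Everything here is hypothetical: the two small data sets, the field names, and the choice of a least-squares line as the "model". The text bar chart merely stands in for a real visualization tool.

```python
def join_on(key, left, right):
    """Step 2: combine two lists of dicts that share a key field."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def clean(rows, required):
    """Step 3: drop records that are missing any required field."""
    return [r for r in rows if all(r.get(f) is not None for f in required)]


def fit_line(xs, ys):
    """Step 4: ordinary least-squares fit, returning (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Step 1: select data sets (hypothetical social mentions and weekly sales).
    mentions = [{"week": 1, "mentions": 10}, {"week": 2, "mentions": 20},
                {"week": 3, "mentions": None}, {"week": 4, "mentions": 40}]
    sales = [{"week": 1, "sales": 25}, {"week": 2, "sales": 45},
             {"week": 3, "sales": 50}, {"week": 4, "sales": 85}]

    rows = clean(join_on("week", mentions, sales), ["mentions", "sales"])
    slope, intercept = fit_line([r["mentions"] for r in rows],
                                [r["sales"] for r in rows])

    # Step 5: a crude text "visualization" of the cleaned data.
    for r in rows:
        print(f"week {r['week']}: {'#' * (r['sales'] // 5)}")

    # Step 6: iterate by changing the data sets or the model and rerunning.
    print(f"sales ~= {slope:.2f} * mentions + {intercept:.2f}")
```

In practice each step would be far larger and messier, but the loop of joining, cleaning, modelling and revisiting the inputs keeps the same shape.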




How is Big Data providing Value today?
     • Online media and social sites mine user-behavior Big Data for what
       interests whom, when, why and how. Big WEB SURF Data gives
       sites insight into what people are interested in, whom they
       share that information with, and how long they stay engaged
       online.
     • Online retailers mine Big Data to predict consumers’ buying
       behavior and purchase preferences, and to identify high-impact
       offers that drive up total spend per session.
     • Insurance companies mining Big Data can improve their overall
       performance by facilitating greater pricing accuracy, deeper
       relationships with customers, and more effective and efficient loss
       prevention.




How can Pythian help you with Big Data?
     1. First, get informed.
     2. Second, get started.
           Recognize an opportunity for competitive advantage within your company.

     3. Third, get the right team of people involved.
           Organize an internal task force to drive the Big Data initiative. Don’t forget
           to find the critical Data Scientist: the person who will understand the data
           sources and know what questions to pose.

     4. Fourth, identify the key sources of Big Data,
        both external and internal.
     5. Fifth, with Pythian’s assistance, evaluate the
        tools and technology that will help your Big
        Data program.




Key Questions for Executives

     • What does the data say?
     • Where did the data come from?
     • Has the data been sufficiently cleaned?
     • How was the data analyzed?
     • How confident can we be in our analysis?
     • Can we distinguish correlation from causality?
     • How much will the data influence the key
       decision makers?




A compelling balanced perspective on Big Data

                  Stephen Few - Perceptual Edge




Archive Slides




Big Data Start-ups
     • WeatherBill (which compiles large amounts of weather data from a
       variety of sources, then sells insurance based on statistical
       analysis),
     • Klout (a controversial startup that processes large amounts of data
       to create every user’s social influence score) or
     • Wonga (which crunches data to grant financial loans) are some
       early examples of startups with big data as their core DNA.
     • Tokutek Inc., led by president and CEO John Partridge, is a
       Lexington company founded in 2006 that makes databases run
       faster.
     • Trifacta raised $4.3 million from Accel’s Big Data fund for a
       solution that doesn’t just visualize insight, but also the analytics
       tools that produce it.
     • Platfora is a software company based in San Mateo, California, building a
       revolutionary BI and analytics platform that democratizes and simplifies use of
       big data and Hadoop. The company was founded by Ben Werther, former product
       head of Greenplum, an analytical database company acquired by EMC. Platfora is
       assembling a superb team of data and distributed systems architects/engineers,
       UI and UX developers, and data scientists.




Big Data Start-ups
     • About MapR Technologies
        MapR delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for
        more business users. MapR enables customers to harness the power of Big Data analytics.
        Leading companies including Amazon, Cisco, EMC and Google partner with MapR to deliver an
        enterprise-grade Hadoop solution. Investors include Lightspeed Venture Partners, NEA and
        Redpoint Ventures.

     • Alteryx provides indispensable analytic solutions for enterprise and SMB companies making
        critical decisions about how to expand and grow. Our product, Alteryx Strategic Analytics, is a
        desktop-to-cloud Agile BI and analytics solution designed for data artisans and business leaders
        that brings together the market knowledge, location insight, and business intelligence today’s
        organizations require. For more than a decade, Alteryx has enabled strategic planning
        executives to identify and seize market opportunities, outsmart their competitors, and drive
        more revenue.




40                                     © 2009/2012 Pythian © All Rights Reserved

More Related Content

What's hot

Why KM Programs Fail
Why KM Programs FailWhy KM Programs Fail
Why KM Programs FailSIKM
 
How to Accelerate Growth, Innovation, and High Performance for CPAs, Account...
How to  Accelerate Growth, Innovation, and High Performance for CPAs, Account...How to  Accelerate Growth, Innovation, and High Performance for CPAs, Account...
How to Accelerate Growth, Innovation, and High Performance for CPAs, Account...Tom Hood, CPA,CITP,CGMA
 
The 2016 Top 50 Tech Pioneers, Australia and New Zealand
The 2016 Top 50 Tech Pioneers, Australia and New ZealandThe 2016 Top 50 Tech Pioneers, Australia and New Zealand
The 2016 Top 50 Tech Pioneers, Australia and New ZealandH2 Ventures
 
Interconnect2017completewatsoniotjourneymap0216 170220225328
Interconnect2017completewatsoniotjourneymap0216 170220225328Interconnect2017completewatsoniotjourneymap0216 170220225328
Interconnect2017completewatsoniotjourneymap0216 170220225328Krystel Hery
 
Be a Great Product Leader (Zynga 2016)
Be a Great Product Leader (Zynga 2016)Be a Great Product Leader (Zynga 2016)
Be a Great Product Leader (Zynga 2016)Adam Nash
 
Univ Mich Ross - Innovation Strategy
Univ Mich Ross - Innovation StrategyUniv Mich Ross - Innovation Strategy
Univ Mich Ross - Innovation StrategyAki Balogh
 
The Future Ready CPA - Moving from Compliance to Reliance
The Future Ready CPA - Moving from Compliance to RelianceThe Future Ready CPA - Moving from Compliance to Reliance
The Future Ready CPA - Moving from Compliance to RelianceTom Hood, CPA,CITP,CGMA
 
CIO Role - Challenges in Management and Leadership
CIO Role - Challenges in Management and LeadershipCIO Role - Challenges in Management and Leadership
CIO Role - Challenges in Management and LeadershipCIO Vietnam
 
An Accounting Career: Big Waves of Change and Oceans of Opportunity
An Accounting Career: Big Waves of Change and Oceans of OpportunityAn Accounting Career: Big Waves of Change and Oceans of Opportunity
An Accounting Career: Big Waves of Change and Oceans of OpportunityTom Hood, CPA,CITP,CGMA
 
Future work skills - HR
Future work skills - HR Future work skills - HR
Future work skills - HR Atul K. Shukla
 
MACPA Fall Town Hall / Professional Issues Update - 2015
MACPA Fall Town Hall / Professional Issues Update - 2015MACPA Fall Town Hall / Professional Issues Update - 2015
MACPA Fall Town Hall / Professional Issues Update - 2015Tom Hood, CPA,CITP,CGMA
 
The Firm of the Future - 2016 AAM InNEWvation Summit
The Firm of the Future - 2016 AAM InNEWvation Summit The Firm of the Future - 2016 AAM InNEWvation Summit
The Firm of the Future - 2016 AAM InNEWvation Summit Tom Hood, CPA,CITP,CGMA
 
Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Tim O'Reilly
 
The Changing Structure of the Venture Capital Industry
The Changing Structure of the Venture Capital IndustryThe Changing Structure of the Venture Capital Industry
The Changing Structure of the Venture Capital IndustryMark Suster
 

What's hot (17)

Why KM Programs Fail
Why KM Programs FailWhy KM Programs Fail
Why KM Programs Fail
 
State of My Industry: Accounting 2016
State of My Industry: Accounting 2016State of My Industry: Accounting 2016
State of My Industry: Accounting 2016
 
How to Accelerate Growth, Innovation, and High Performance for CPAs, Account...
How to  Accelerate Growth, Innovation, and High Performance for CPAs, Account...How to  Accelerate Growth, Innovation, and High Performance for CPAs, Account...
How to Accelerate Growth, Innovation, and High Performance for CPAs, Account...
 
The 2016 Top 50 Tech Pioneers, Australia and New Zealand
The 2016 Top 50 Tech Pioneers, Australia and New ZealandThe 2016 Top 50 Tech Pioneers, Australia and New Zealand
The 2016 Top 50 Tech Pioneers, Australia and New Zealand
 
Interconnect2017completewatsoniotjourneymap0216 170220225328
Interconnect2017completewatsoniotjourneymap0216 170220225328Interconnect2017completewatsoniotjourneymap0216 170220225328
Interconnect2017completewatsoniotjourneymap0216 170220225328
 
Be a Great Product Leader (Zynga 2016)
Be a Great Product Leader (Zynga 2016)Be a Great Product Leader (Zynga 2016)
Be a Great Product Leader (Zynga 2016)
 
Univ Mich Ross - Innovation Strategy
Univ Mich Ross - Innovation StrategyUniv Mich Ross - Innovation Strategy
Univ Mich Ross - Innovation Strategy
 
The Future Ready CPA - Moving from Compliance to Reliance
The Future Ready CPA - Moving from Compliance to RelianceThe Future Ready CPA - Moving from Compliance to Reliance
The Future Ready CPA - Moving from Compliance to Reliance
 
CIO Role - Challenges in Management and Leadership
CIO Role - Challenges in Management and LeadershipCIO Role - Challenges in Management and Leadership
CIO Role - Challenges in Management and Leadership
 
An Accounting Career: Big Waves of Change and Oceans of Opportunity
An Accounting Career: Big Waves of Change and Oceans of OpportunityAn Accounting Career: Big Waves of Change and Oceans of Opportunity
An Accounting Career: Big Waves of Change and Oceans of Opportunity
 
Future work skills - HR
Future work skills - HR Future work skills - HR
Future work skills - HR
 
MACPA Fall Town Hall / Professional Issues Update - 2015
MACPA Fall Town Hall / Professional Issues Update - 2015MACPA Fall Town Hall / Professional Issues Update - 2015
MACPA Fall Town Hall / Professional Issues Update - 2015
 
The Future Ready CPA Firm
The Future Ready CPA FirmThe Future Ready CPA Firm
The Future Ready CPA Firm
 
The Firm of the Future - 2016 AAM InNEWvation Summit
The Firm of the Future - 2016 AAM InNEWvation Summit The Firm of the Future - 2016 AAM InNEWvation Summit
The Firm of the Future - 2016 AAM InNEWvation Summit
 
The Future Workforce
The Future WorkforceThe Future Workforce
The Future Workforce
 
Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)
 
The Changing Structure of the Venture Capital Industry
The Changing Structure of the Venture Capital IndustryThe Changing Structure of the Venture Capital Industry
The Changing Structure of the Venture Capital Industry
 

Viewers also liked

2013 march madness_mythsmysteriesandmoneymakingmethods v2.1
2013 march madness_mythsmysteriesandmoneymakingmethods  v2.12013 march madness_mythsmysteriesandmoneymakingmethods  v2.1
2013 march madness_mythsmysteriesandmoneymakingmethods v2.1Andrew Waitman
 
MARKET MULTIPLES AW OCT 2015
MARKET MULTIPLES AW OCT 2015MARKET MULTIPLES AW OCT 2015
MARKET MULTIPLES AW OCT 2015Andrew Waitman
 
Why companies succeed
Why companies succeed Why companies succeed
Why companies succeed Andrew Waitman
 
Market multiples aw oct 2015
Market multiples aw oct 2015Market multiples aw oct 2015
Market multiples aw oct 2015Andrew Waitman
 
Venture and Leadership Insights in 1 Place.
Venture and Leadership  Insights in 1 Place. Venture and Leadership  Insights in 1 Place.
Venture and Leadership Insights in 1 Place. Andrew Waitman
 
Deep insights from steve jobs biography
Deep insights from steve jobs biographyDeep insights from steve jobs biography
Deep insights from steve jobs biographyAndrew Waitman
 
Copius cash crushes creativity
Copius cash crushes creativityCopius cash crushes creativity
Copius cash crushes creativityAndrew Waitman
 
Ver 1.10 the venture capital ecosystem feb 2015
Ver 1.10   the venture capital ecosystem feb 2015Ver 1.10   the venture capital ecosystem feb 2015
Ver 1.10 the venture capital ecosystem feb 2015Andrew Waitman
 

Viewers also liked (8)

2013 march madness_mythsmysteriesandmoneymakingmethods v2.1
2013 march madness_mythsmysteriesandmoneymakingmethods  v2.12013 march madness_mythsmysteriesandmoneymakingmethods  v2.1
2013 march madness_mythsmysteriesandmoneymakingmethods v2.1
 
MARKET MULTIPLES AW OCT 2015
MARKET MULTIPLES AW OCT 2015MARKET MULTIPLES AW OCT 2015
MARKET MULTIPLES AW OCT 2015
 
Why companies succeed
Why companies succeed Why companies succeed
Why companies succeed
 
Market multiples aw oct 2015
Market multiples aw oct 2015Market multiples aw oct 2015
Market multiples aw oct 2015
 
Venture and Leadership Insights in 1 Place.
Venture and Leadership  Insights in 1 Place. Venture and Leadership  Insights in 1 Place.
Venture and Leadership Insights in 1 Place.
 
Deep insights from steve jobs biography
Deep insights from steve jobs biographyDeep insights from steve jobs biography
Deep insights from steve jobs biography
 
Copius cash crushes creativity
Copius cash crushes creativityCopius cash crushes creativity
Copius cash crushes creativity
 
Ver 1.10 the venture capital ecosystem feb 2015
Ver 1.10   the venture capital ecosystem feb 2015Ver 1.10   the venture capital ecosystem feb 2015
Ver 1.10 the venture capital ecosystem feb 2015
 

Big Data a big deal?

  • 1. BIG DATA…A BIG DEAL? Organized by: Andrew Waitman
  • 2. Big Data, Small Sound Bytes 2 © 2009/2012 Pythian © All Rights Reserved
  • 3. Big Data, Small Sound Bytes
  • 4. Big Data, Small Sound Bytes
  • 5. Why Big Data Now? VOLUME 1. All online digital activity creates artifacts or metadata; at terabyte-to-petabyte volumes or more, this is being called BIG DATA 2. Unstructured metadata collection occurs whenever digital activity occurs 3. Digital metadata volume has exploded with growing internet usage and has accelerated with recent smartphone & iPad usage driving global mobile and social activity
  • 6. Why Big Data Now? HUMAN VOLUME 1. In 1998 Google provided 3.6 Million searches in the year 2. In 2011 Google ran 1,722,071,000,000 searches per year 3. In August 2008 there were 100 Million Facebook users 4. In December 2012 there will be over 1 Billion Facebook users 5. In August 2012 Twitter reached over 500,000,000 users Digital volume of user on-line metadata has exploded with growing internet, mobile and social use.
  • 7. Why Big Data Now? DEVICE VOLUME 1. In 2005 there were 1.5 Billion RFID Tags 2. In 2012 there are 30 Billion RFID Tags 3. 350 Billion Smart Meter Transactions per year 4. 1 Billion smartphones by 2015 with location sensors Digital sensor data volume has exploded with growing machine usage of sensor and measurement reporting
  • 8. Why Big Data Now? ZEITGEIST 1. Data Driven Decision Making is mainstream thinking. Think Moneyball by Michael Lewis 2. Google demonstrated the value and importance of mining "Big Data" for Search, Ad Placement, Language Translation and a myriad of other computing challenges with economic benefit. 3. Data trumps smarter algorithms. It is the dawning of the Age of Real Time & Near Real Time BIG Impact Analytics.
  • 9. Why Big Data Now? ECONOMICS 1. Collection & analysis of large volumes of metadata is now relatively simple, low cost and potentially highly valuable 2. Storage & computing power are relatively low cost, enabling the mining of massive metadata volumes in real time, near real time or later 3. The economic benefit or value of the insights can far exceed the costs of acquiring & storing the data 4. Big Data infrastructure tools have become simpler and more accessible
  • 10. Purpose of Data Analysis The analysis of data is required to understand (a) why consumers purchase a particular product, (b) how consumers purchase the product, (c) the demographics and psychographics of the purchaser of the product and (d) the ultimate user of the product.
  • 11. An Alternative Perspective: AVOID CONFUSING ABUNDANCE WITH INSIGHT "Big Data is just the new rallying cry for the same old stuff BI companies have been producing all along" - Stephen Few, Perceptual Edge. This seems obvious, but almost no attention is being given to building the skills and technologies that help us glean insights from data more effectively. As Richards J. Heuer, Jr. argued in the Psychology of Intelligence Analysis (1999), the primary failures of analysis are less due to insufficient data than to flawed thinking. To succeed analytically, we must invest a great deal more of our resources in training people to think effectively and we must equip them with tools that support that effort. Heuer spent 45 years supporting the work of the CIA. Identifying a potential terrorist plot requires an analyst to sift through a lot of data (perhaps Big Data), but more importantly, it relies on their ability to connect the dots. Contrary to Heuer's emphasis on thinking skills, big data is merely about more, more, more; not smarter or better.
  • 12. Is Big Data really new? NO What is new is that access to insights now comes with economics and tools available to almost anyone. Saving all data is now economically viable for everyone. Large public and private sector (Global 2000) enterprises have always generated, stored, processed and analyzed large volumes and a variety of structured and unstructured data: 1. Particle Physics Research - Large Hadron Collider generates 1 Petabyte per second. 2. Oil Exploration - Seismic sensor data 3. Bioinformatics - Human Genome Project
  • 13. BIG DATA VS TRADITIONAL DATA. BIG DATA: Petabytes at 1/10th cost of pre-engineered storage; Semi-structured; Variety of sources; Store everything; Raw data; No data model/schema; Parallelize to handle volume; Simplicity at design/architecture stage; Complexity at insight stage. TRADITIONAL DATA: Gigabytes to terabytes; SQL structured; Engineered systems; Data model/schema; Selected data stored; Complexity at design/architecture stage; Simplicity at usage stage; Majority of $$ investment up front.
  • 14. Big Data is BI at Scale. PHASE 1: Capture & Store (Petabyte scale; 300 Terabytes/Rack). PHASE 2: Speculate and Investigate (Data Science; Analytics; MAP-R). PHASE 3: Exploit Insights (Real Time; NRT Decisions).
  • 15. Big Data Phase 1- Capture & Store Is the value of potential insights much greater than the cost of searching for them? BUSINESS QUESTIONS • How do you plan to store what types of semi-structured data? • What questions are you attempting to answer? • What data analysis is currently being done? • What are people asking questions about? • What DR? What compression? What storage is possible? Flash vs disk? Capacity and how fast to access? • How many people can access simultaneously? • KNOW THE DATA? SOURCE? RATE OF GENERATION?
  • 16. Big Data Phase 1- Capture & Store Is the value of potential insights much greater than the cost of searching for them? STORAGE REQUIREMENTS • Be scalable • Provide tiered storage • Be self managing • Ensure content is highly available • Ensure content is widely accessible • Support both analytical and content applications • Support workflow automation • Integrate with legacy applications • Enable integration with public, private and hybrid cloud ecosystems • Be self healing
  • 17. Big Data Phase 2- Speculate and Investigate Is the value of potential insights much greater than the cost of searching for them? BUSINESS QUESTIONS • What type of semi-structured data do I have? • What type of questions am I trying to answer? • Statistical? Correlation? Causal? Patterns? • How do I need to manipulate, translate, transform, cleanse, organize, visualize the data? • How much time do I have for analysis? • What tools do I have to perform transformation and analysis?
  • 18. Big Data Phase 3- Exploit Insights Is the value of potential insights much greater than the cost of searching for them? BUSINESS QUESTIONS • Are discovered patterns/insights available in real time, near real time or further out? • How do I systematically find patterns/insights going forward? • How do I integrate them into business-impacting decision processes?
  • 19. Top 10 Reasons Why all the Hype around Big Data now? 1. At Tera & Peta bytes it really does get interesting. 2. All the Cool Kids are doing it. Once the Four Digerati Horsemen (Google, Facebook, Twitter, Amazon) say it's important, then it really is. 3. BI Folks needed a new marketing moniker. 4. 'CLOUD' hype was already annoying and slowing. 5. Gartner says it's near its peak! 6. The term went viral! 7. People thought you said Big Deal! 8. Voluminous data could not be pronounced 9. User Data mining is next to Voyeurism 10. It's Google's Vault!
  • 20. What is considered Big Data? VOLUME & VARIETY 1. Any data stored digitally and at scale (Tera bytes +) with potential for providing practical, useful insights, potentially with economic benefits 2. Very large volume of unstructured information/data 3. Big Data is characterized by the volume, velocity and variety of large data sets Every “connected” person or “connected” device is potentially a data generator
  • 21. What is considered Big Data? DIFFICULT & TIMELY 1. Big Data, by the nature of its volume, hides or obscures valuable insights: a lot of noise, but with critical and potentially valuable signals buried within 2. Often the signal value perishes rapidly, requiring real time or near real time analysis and action Big Data is the quintessential signal vs noise problem
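
The signal vs noise point on slide 21 can be illustrated with a small sketch. It is not from the deck: the function name `flag_signals`, the window size, the threshold and the sample readings are all assumptions chosen for illustration. It flags a reading as a possible signal when it deviates sharply from the recent trailing window, which is one simple way to act on a perishable signal as data arrives.

```python
from statistics import mean, stdev

def flag_signals(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing-window
    mean by more than `threshold` standard deviations (hypothetical rule)."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]   # only past values, so this works on a stream
        mu = mean(recent)
        sigma = stdev(recent)
        if sigma == 0:
            # A perfectly flat window: any change at all counts as a deviation.
            if readings[i] != mu:
                flagged.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up sensor readings: mostly noise around 10, one spike at index 7.
readings = [10, 11, 10, 9, 10, 11, 10, 50, 10, 11]
print(flag_signals(readings))
```

A fixed threshold is a deliberate simplification; real deployments would tune the window and threshold to the data source.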
  • 22. Examples of Big Data? • Local/regional weather information • WEB Traffic information • User search behavior • Social information – who connected to whom, who poked who etc. • Mobile User information – preferences, likes, habits • Application usage information • E-commerce transaction information • Physical retail customer transaction data
  • 23. Who are the Top 15 ‘Big Data’ ‘Players’? 1. Google 2. Amazon 3. Apple 4. Yahoo 5. Facebook 6. Salesforce 7. Twitter 8. Cloudera 9. LinkedIn 10. Netflix 11. Microsoft 12. IBM 13. Hortonworks 14. Zynga 15. eBay
  • 24. 1. www.kaggle.com 2. www.indeed.com 3. www.recordedfuture.com 4. www.datamarket.com 5. www.climate.com 6. www.manybills.com 7. www.electrion.twitter.com 8. www.consensu.gov 9. www.coursera.com 10. www.data.gov
  • 25. What is the size of the BIG DATA Market? Deloitte pegs the size of the big data market at about $1.3-$1.5 billion in 2012 In March, the IDC released a statement that predicted the worldwide big data technology services market to reach $16.9 billion in 2015. The 2012 Global BI SW Market is $35 Billion
  • 26. Where do BI and Big Data co-exist? PREDICTIVE ANALYTICS
  • 27. How do Machine Learning and Big Data relate? PREDICTIVE ANALYTICS
  • 28. When is Big Data valuable? 1. When better business decisions result from practical insights in the data that were unavailable to, or unnoticed by, expert judgment 2. When time-to-insight results in big returns or benefit, e.g. real time book recommendation 3. Where precision of analysis results in specific alternative decisions 4. Where patterns from heterogeneous or seemingly disparate data sources provide material competitive insights/advantage versus competition
  • 29. What is unique about Big Data Technology? MASSIVE PARALLELISM, AFFORDABLE HARDWARE, LOCAL PROCESSING 1. The tools do not require the data to be first structured in a particular schema as is required in relational databases 2. Data is analyzed in native format closest to where it is stored, dramatically reducing the time and effort for retrieval and restore.
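
A minimal sketch of the two points on slide 29, under stated assumptions: the partitions, field names and the helpers `map_partition` and `count_actions` are hypothetical and not from the deck, and a thread pool stands in for the separate cluster nodes a real system would use. Each partition of raw text is parsed where it sits, with no schema declared up front, and only the small partial counts are combined afterwards.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw log partitions, as they might sit on different storage nodes.
# Records are semi-structured: fields vary and some lines are not valid JSON.
PARTITIONS = [
    ['{"user": "a", "action": "search"}',
     '{"user": "b", "action": "click"}',
     'not json at all'],
    ['{"user": "a", "action": "click", "device": "phone"}',
     '{"action": "search"}'],
]

def map_partition(lines):
    """Parse raw lines in their native format and count actions locally."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)      # structure is discovered at read time
        except json.JSONDecodeError:
            continue                       # skip malformed input instead of failing
        if isinstance(record, dict) and "action" in record:
            counts[record["action"]] += 1
    return counts

def count_actions(partitions):
    """Run the local step on every partition in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)              # only small summaries are moved and combined
    return total

print(count_actions(PARTITIONS))
```

The design choice to merge only per-partition summaries mirrors the "local processing" idea: the raw data never has to be moved or restructured before analysis.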
  • 30. Visualization may unlock the key to Big Data Insights
  • 31. What skills do I need in my organization for Big Data? 1. Data scientists – • Identify what analysis makes sense in context. Typical background in math and statistics, as well as artificial intelligence and natural language processing. 2. Data architects – • Create the data model and identify required data sources and analytical tools 3. Data visualizers – • Use visualizations to explore what the data means and present how it will impact the company 4. Data change agents – • Good communicators with a Six Sigma background who understand how to apply statistics.
  • 32. What skills do I need in my organization for Big Data? 5. Data engineer/operators – • Big Data infrastructure operations. Develop architecture that helps analyze and supply data in the way the business needs, and make sure systems are performing smoothly 6. Data stewards – • Ensure that data sources are properly accounted for, and may also maintain a centralized repository as part of a Master Data Management approach, in which there is one "gold copy" of enterprise data to be referenced. 7. Data virtualization/cloud specialists – • Build and maintain a virtualized data service layer that can draw data from any source and make it available across organizations in a consistent, easy-to-access manner 8. Systems Administrators
  • 33. Six Steps to Big Data alchemy? 1. Select the right data sets • Identify rich data sources which may contain insights to a particular problem you are trying to solve or insight you are trying to gain. Social media data is providing incredible insights to changes in Brand positioning and new product introductions 2. Join the various sets of data • Rich unstructured and sometimes incomplete data into a new set for manipulation and analysis 3. Clean the new large data set • Begin to discover important and relevant patterns, signatures, anomalies, correlations, outliers using advanced analytic models 4. Create models • These models predict outcomes using the data. Iterate your hypothesis and keep experimenting 5. Use visualization tools • Visualization may assist in discovery or presentation of key insights from the data 6. Iterate • Keep varying your various models and data sets to assist future planning or decision making
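
The select, join, clean and model steps on slide 33 can be sketched on a toy scale. Everything here is an assumption made for illustration: the two data sets, their values, and the helper names `join`, `clean` and `fit_line`. A plain least-squares line stands in for the "advanced analytic models" the slide mentions.

```python
# Step 1: select data sets (made-up daily brand mentions and sales figures).
mentions = {"mon": 120, "tue": 150, "wed": None, "thu": 200, "fri": 230}
sales = {"mon": 12.0, "tue": 15.0, "wed": 17.0, "thu": 20.0, "sat": 9.0}

def join(left, right):
    """Step 2: join the sets on their shared keys."""
    return {key: (left[key], right[key]) for key in left if key in right}

def clean(joined):
    """Step 3: drop incomplete records (any missing value)."""
    return {key: pair for key, pair in joined.items() if None not in pair}

def fit_line(pairs):
    """Step 4: fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

cleaned = clean(join(mentions, sales))
slope, intercept = fit_line(list(cleaned.values()))

# Step 5 (presentation) is reduced to a printed summary in this sketch.
print(f"days used: {sorted(cleaned)}")
print(f"predicted sales at 250 mentions: {slope * 250 + intercept:.1f}")
```

Step 6 would repeat the same loop with other data sets or models; a fitted line like this shows correlation only and says nothing about causality.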
  • 34. How is Big Data providing Value today? • Online media and social sites mine user behavior Big Data for what interests whom, when, why and how. Big Data gives sites insight into what people are interested in, with whom they share that information, and how long they stay engaged online. • Online retailers mine Big Data to predict consumers' buying behavior, purchase preferences and high impact offers to drive up total spend per session. • Insurance companies mining Big Data can improve their overall performance by facilitating greater pricing accuracy, deeper relationships with customers, and more effective and efficient loss prevention.
  • 35. How can Pythian help you with Big Data? 1. First, get informed. 2. Second, get started. Recognize an opportunity for competitive advantage within your company. 3. Third, get the right team of people involved. Organize an internal task force to drive the Big Data initiative. Don't forget to find the critical Data Scientist: the person who will understand the data sources and know what questions to pose. 4. Fourth, identify the key sources of Big Data, both external and internal. 5. Fifth, with Pythian's assistance, evaluate the tools and technology that will help your Big Data program.
  • 36. Key Questions for Executives • What does the data say? • Where did the data come from? • Has the data been sufficiently cleaned? • How was the data analyzed? • How confident can we be in our analysis? • Can we distinguish correlation from causality? • How much will the data influence the key decision makers?
  • 37. A compelling balanced perspective on Big Data Stephen Few- Perceptual Edge
  • 38. Archive Slides
  • 39. Big Data Start-ups • WeatherBill (which compiles large amounts of weather data from a variety of sources, then sells insurance based on statistical analysis), • Klout (a controversial startup that processes large amounts of data to create every user's social influence score) or • Wonga (which crunches data to grant financial loans) are some early examples of startups with big data as their core DNA. • Tokutek Inc., a Lexington company founded in 2006 that makes databases run faster, led by president and CEO John Partridge. • Trifacta raised $4.3 million from Accel's Big Data fund for a solution that doesn't just visualize insight, but also the analytics tools that produce it. • Platfora is a software company based in San Mateo, California, building a revolutionary BI and analytics platform that democratizes and simplifies use of big data and Hadoop. The company was founded by Ben Werther, former product head of Greenplum, an analytical database company acquired by EMC. Platfora is assembling a superb team of data and distributed systems architects/engineers, UI and UX developers, and data scientists.
  • 40. Big Data Start-ups • About MapR Technologies MapR delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for more business users. MapR enables customers to harness the power of Big Data analytics. Leading companies including Amazon, Cisco, EMC and Google partner with MapR to deliver an enterprise-grade Hadoop solution. Investors include Lightspeed Venture Partners, NEA and Redpoint Ventures. • Alteryx provides indispensable analytic solutions for enterprise and SMB companies making critical decisions about how to expand and grow. Our product, Alteryx Strategic Analytics, is a desktop-to-cloud Agile BI and analytics solution designed for data artisans and business leaders that brings together the market knowledge, location insight, and business intelligence today's organizations require. For more than a decade, Alteryx has enabled strategic planning executives to identify and seize market opportunities, outsmart their competitors, and drive more revenue.