How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights and Streams
Vladimir Bacvanski, PhD
1.
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights and Streams
Tom Deutsch, IBM
Vladimir Bacvanski, Founder, SciSpike, vladimir.bacvanski@scispike.com
Stephen Brodsky, Technical Executive and Distinguished Engineer, IBM, sbrodsky@us.ibm.com
August 24, 2011
© 2011 IBM Corporation & SciSpike
2.
Who are we?
Dr. Vladimir Bacvanski
– Consultant, trainer, and mentor focusing on making clients successful in adopting new data and software approaches
– Over 20 years of experience
– Founder of SciSpike, a training and consulting firm specializing in advanced software and data technologies
Stephen Brodsky, Ph.D.
– Distinguished Engineer and Technical Executive for IBM Big Data initiatives at the IBM Silicon Valley Laboratory
– Previously led the architecture for the Optim Data Studio product line and pureQuery, and was a member of the architecture team for DB2 pureXML, Rational Application Developer (RAD), and WebSphere
3.
Agenda
The “Big Data” challenge: smarter analytics for a smarter planet
How to do it?
– The big data challenge
– Foundations of Big Data approaches
– MapReduce and Hadoop
– Real-time data and stream processing
– Integration with existing systems
4.
The “Big Data” Challenge
5.
The World is Changing and Becoming More…
INSTRUMENTED
INTERCONNECTED
INTELLIGENT
The resulting explosion of information creates a need for a new kind of intelligence … to help build a Smarter Planet
6.
Information is Growing at a Phenomenal Rate . . .
• 44x as much data and content over the coming decade
• 80% of the world’s data is unstructured
• 2009: 800,000 petabytes
• 2020: 35 zettabytes (35 billion terabytes)
Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
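The two volume figures and the 44x multiplier agree with each other, which a quick unit conversion shows. The sketch below assumes decimal (SI) units, where 1 zettabyte equals 1,000,000 petabytes or 1,000,000,000 terabytes; the variable names are illustrative and not taken from the slides.

```python
# Decimal (SI) storage units, assumed here: 1 ZB = 10**6 PB = 10**9 TB.
PB_PER_ZB = 10**6
TB_PER_ZB = 10**9

size_2009_pb = 800_000   # 2009 figure from the slide, in petabytes
size_2020_zb = 35        # 2020 figure from the slide, in zettabytes

size_2009_zb = size_2009_pb / PB_PER_ZB       # 0.8 ZB
growth_factor = size_2020_zb / size_2009_zb   # about 43.75, quoted as 44x
size_2020_tb = size_2020_zb * TB_PER_ZB       # 35 billion TB

print(f"2009: {size_2009_zb} ZB")
print(f"2020: {size_2020_tb:,} TB")
print(f"growth: {growth_factor:.2f}x")
```

The ratio comes out slightly under 44, so the slide's multiplier appears to be a rounded value.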
7.
The BIG Data Challenge
• Manage and benefit from massive and growing amounts of data
• Handle varied data formats (structured, unstructured, semi-structured) and increased data velocity
• Exploit BIG Data in a timely and cost-effective fashion
(Diagram: Collect, Manage, Integrate, Analyze)
8.
What clients are saying . . .
• Lots of potentially valuable data is dormant or discarded due to size/performance considerations
• Large volume of unstructured or semi-structured data is not worth integrating fully (e.g. Tweets, logs, . . .)
• Not clear what should be analyzed (exploratory, iterative)
• Information distributed across multiple systems and/or Internet
• Some information has a short useful lifespan
• Volumes can be extremely high
• Analysis needed in the context of existing information (not stand-alone)
9.
Big Data Presents Big Opportunities
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
• Variety: Manage and benefit from diverse data types and data structures
• Velocity: Analyze streaming data and large volumes of persistent data
• Volume: Scale from terabytes to zettabytes
10.
Streams and Oceans of Information . . .
Information oceans
• Information stored outside conventional systems. Data may originate from the Web or different internal systems
• Collection of what has streamed
• Information from social media, logs, click streams, emails, etc.
• Unstructured or mixed schema documents like claims, forms, desktop applications, etc.
• Structured data from disparate systems
Information streams
• High speed information flowing in real-time, often transient
• Information from sensors, instruments, etc.
• Information flowing from real-time logs and activity monitors
• Streaming content like audio and video
• High speed transactions like tickers, trades, or traffic systems
11.
Applications for Big Data Analytics
• Smarter Healthcare
• Multi-channel sales
• Finance
• Homeland security
• Traffic Control
• Telecom
• Manufacturing
• Trading Analytics
• Many more!
12.
Use Case Example: Energy Company
Business scenario
• Analyze large volumes of public and private weather data for alternative energy business
• Existing high-performance computing hardware, limited staff
Technical challenges
• High data volume: 2+ PB
• Range of query types
– Avg temp in given location? (Small result)
– Geo pts where ice may form on wind turbines? (Large result, derived values: icing determined by humidity + temp.)
• Run on system with non-Hadoop apps
13.
Use Case Example: Global Media Firm
Business scenario
• Identify unauthorized content streaming in digital media (piracy)
– Quantify annual revenue loss
– Analyze trends
• Monitor social media sites to identify dissemination of pirated content. Time sensitive!
Technical challenges
• High variety of unstructured and semi-structured data
• Initial focus: text analytics over 1 year’s worth of social media data. Look for live streaming URLs, sentiment, event info, etc.
• Complex rules to qualify & classify info
• Future potential for video analysis
14.
IBM Watson
IBM Watson is a breakthrough in analytic innovation, but it is only successful because of the quality of the information from which it is working.
15.
Big Data and Watson
Big Data technology is used to build Watson’s knowledge base
• Watson uses the Apache Hadoop open framework to distribute the workload for loading information into memory.
• (Diagram: approx. 200M pages of text (to compete on Jeopardy!), InfoSphere BigInsights, Watson’s Memory)
Watson technology offers great potential for advanced business analytics
• (Diagram: CRM Data, POS Data, Social Media; InfoSphere BigInsights, advanced search and analysis; Distilled Insight: spending habits, social relationships, buying trends)
16.
Customer Engagements
Use patterns
• Customer sentiment analysis (cross-sell, up-sell, campaign management)
• Integrated retail and web customer behavior modeling
• Predictive modeling (credit card fraud)
• System log analytics (reduce operational risk)
Common requirements
• Extract business insight from large volumes of raw data (often outside operational systems)
• Integrate with other existing software
• Ready for enterprise use
(Diagram labels: Consumer Insight; Text, Blog, Weblog; Click streams; Multi-channel sales; Log & transactions; Next Gen Text Analytics; Biological Sequences; Fraud Models; Operational system & streams data sources; New Business Development; Statistical Model Building)
17.
The approach to crunching big data
18.
How to approach Big Data analytics?
InfoSphere BigInsights and InfoSphere Streams
• Analytics for data in-motion and at-rest
• Platform for processing large volumes of diverse data
• Complements and integrates with existing software solutions
19.
Addressing the Key Requirements
1. Platform for V3 – Variety, Velocity, Volume
• Variety: manage data & content “As Is”
• Handle any velocity: low-latency streams and large volume batch
• Volume: huge volumes of at-rest or streaming data
2. Analytics for V3
• Analyze sources in their native format: text, data, rich content
• Analyze all of the data, not just a subset
• Dynamic analytics: automatic adjustments and actions
3. Ease of Use for Developers and Users
• Developer UIs, common languages & automatic optimization
• End-user UIs & visualization
4. Enterprise Class
• Failure tolerance, Security and Privacy
• Scale Economically
5. Extensive Integration Capabilities
• Integrate wide variety of sources
• Leverage enterprise integration technologies
20.
Big Data Initiative
• InfoSphere BigInsights: volumes of diverse, persistent data; analytic applications for “Big Data”
• Warehouse: traditional warehouse applications
• InfoSphere Streams: real-time streaming data
21.
BigInsights Summary
BigInsights = analytical platform for persistent “Big Data”
– Based on open source & IBM technologies
Distinguishing characteristics
– Built-in analytics . . . Enhances business knowledge
– Enterprise software integration . . . Complements and extends existing capabilities
– Production-ready platform . . . Speeds time-to-value; simplifies development and maintenance
22.
Big Data Platform Vision
Bringing Big Data to the Enterprise
(Diagram labels: Big Data Solutions; Big Data User Environments: Developers, End Users, Administrators; Big Data Enterprise Engines: Streaming Analytics, Internet Scale Analytics; Open Source Foundational Components; Integration; Agents; connected systems: Data Warehouse, Information Integration, Master Data Mgmt, Database, Content Analytics, Business Analytics, Marketing, Data Growth Management)
23.
InfoSphere BigInsights v1.1
• Platform for volume, variety, velocity (V3): Hadoop foundation
• Analytics for V3: text analytics & tooling
• Usability: Web administrative console, integrated install, spreadsheet-style analytic tool
• Enterprise Class: storage, security, cluster management
• Integration: connectivity to DB2, Netezza
(Diagram: Basic Edition (free download) and Enterprise Edition (licensed), plotted by breadth of capabilities and enterprise class. Labels: Apache Hadoop, integrated install, 24 x 7 Web support, Web admin console, LDAP authentication, RDBMS and warehouse connectivity, text analytics, spreadsheet-style analytic tool, flexible job scheduler)
24.
BigInsights Platform: Key Ideas
Flexible, enterprise-class support for processing large volumes of data
– Based on Google’s MapReduce technology
– Inspired by Apache Hadoop; compatible with its ecosystem and distribution
– Well-suited to batch-oriented, read-intensive applications
– Supports wide variety of data
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
– CPU + disks = “node”
– Nodes can be combined into clusters
– New nodes can be added as needed without changing
• Data formats
• How data is loaded
• How jobs are written
25.
The MapReduce Programming Model
“Map” step:
– Input split into pieces
– Worker nodes process individual pieces in parallel (under global control of the Job Tracker node)
– Each worker node stores its result in its local file system where a reducer is able to access it
“Reduce” step:
– Data is aggregated (“reduced” from the map steps) by worker nodes (under control of the Job Tracker)
– Multiple reduce tasks can parallelize the aggregation
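The two steps can be sketched in a single process. This is a toy illustration under stated assumptions, not how Hadoop is implemented: threads stand in for worker nodes, the `run_mapreduce` function plays the coordinating role of the Job Tracker, intermediate results stay in memory rather than in each worker's local file system, and the station/temperature records are made-up sample data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(splits, mapper, reducer, workers=4):
    """Toy single-process MapReduce: map splits in parallel, group by key, reduce."""
    # Map step: each input split is processed independently by a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        map_outputs = list(pool.map(lambda split: list(mapper(split)), splits))

    # Group every intermediate value under its key so a reducer can access it.
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce step: aggregate the values of each key. Keys are independent,
    # so separate reduce tasks could handle them in parallel.
    return {key: reducer(key, values) for key, values in grouped.items()}


def temperature_mapper(split):
    # Hypothetical record format: "station,temperature" per line.
    for line in split:
        station, temp = line.split(",")
        yield station, float(temp)


def max_reducer(station, temps):
    # Aggregate all readings of one station into a single value.
    return max(temps)


# Two input splits, as if the input had been divided into two pieces.
splits = [
    ["A,10.5", "B,3.0"],
    ["A,12.0", "B,-1.5", "C,7.25"],
]
result = run_mapreduce(splits, temperature_mapper, max_reducer)
print(result)
```

Only the mapper and reducer change from job to job. Splitting, grouping, and scheduling stay in the driver, and that division of labor is the point of the model.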
26.
What is Hadoop?
Apache Hadoop = free, open source framework for data-intensive applications
– Inspired by Google technologies (MapReduce, GFS)
– Well-suited to batch-oriented, read-intensive applications
– Originally built to address scalability problems of Nutch, an open source Web search technology
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner
– CPU + disks of commodity box = Hadoop “node”
– Boxes can be combined into clusters
– New nodes can be added as needed without changing
• Data formats
• How data is loaded
• How jobs are written
27.
Two Key Aspects of Hadoop
MapReduce framework
– How Hadoop understands and assigns work to the nodes (machines)
Hadoop Distributed File System = HDFS
– Where Hadoop stores data
– A file system that spans all the nodes in a Hadoop cluster
– It links together the file systems on many local nodes to make them into one big file system
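To make the idea of one file system spanning many nodes concrete, here is a toy sketch of how a file could be cut into blocks and each block assigned to several nodes. It is an illustration only: the block size, replication factor, node names and the round-robin placement are assumptions for the example, while real HDFS placement is decided by the NameNode with awareness of racks and node load.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i in range(num_blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


data = b"x" * 25  # stand-in for file contents
blocks = split_into_blocks(data, block_size=10)
placement = place_blocks(
    len(blocks), ["node1", "node2", "node3", "node4"], replication=2
)
for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Because every block lives on more than one node, losing a single node leaves all data reachable, and a client can read the file by asking for its blocks in order.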
28.
Logical MapReduce Example: Word Count
Content of input documents:
  Hello World Bye World
  Hello IBM
Map function:
  map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
      EmitIntermediate(w, "1");
Reduce function:
  reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
      result += ParseInt(v);
    Emit(AsString(result));
Map 1 emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1>
Map 2 emits: <Hello, 1> <IBM, 1>
Reduce (final output): <Bye, 1> <IBM, 1> <Hello, 2> <World, 2>
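The word count pseudocode can be turned into a small runnable Python version. This is a sketch that follows the slide's map and reduce signatures; the driver function word_count and the document names doc1 and doc2 are assumptions added so the example runs on a single machine without Hadoop.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for w in value.split():
        yield w, "1"


def reduce_fn(key, values):
    # key: a word, values: an iterable of counts (as strings)
    result = 0
    for v in values:
        result += int(v)
    return key, result


def word_count(documents):
    # Group intermediate pairs by word, as the framework's shuffle would.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return dict(reduce_fn(w, counts) for w, counts in intermediate.items())


documents = {"doc1": "Hello World Bye World", "doc2": "Hello IBM"}
counts = word_count(documents)
print(counts)
```

The counts agree with the final reduce output listed on the slide: Hello 2, World 2, Bye 1, IBM 1.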
29.
How To Create MapReduce Jobs
MapReduce development in Java
– Low level, very flexible
– Time-consuming development
Hive
– Open source language / Apache sub-project
– Provides a SQL-like interface to Hadoop
Pig
– Data flow language / Apache sub-project
Jaql
– A query language for JSON
– Useful for loosely structured data
30.
Management Tools: Web Console
Graphically manage cluster, jobs, HDFS
Sample administration tasks
– Start/Stop Servers
– Add/Remove Servers
– Server Status Details (Log)
31.
Spreadsheet-like Analysis Tool
BigSheets: Web-based analysis and visualization tool
Spreadsheet-like interface
– Define and manage long-running data collection jobs
– Analyze content of the text on the pages that have been retrieved
32.
Text Analytics
• Distill structured info from unstructured data
  – Sentiment analysis
  – Consumer behavior
  – Illegal or suspicious activities
  – ...
• Pre-built library of text annotators for common business entities
• Rich language and tooling to build custom annotators
• Support for Western languages (English, Dutch/Flemish, French, German, Italian, Portuguese, or Spanish) plus select Asian languages (Japanese, Chinese)
Annotator types shown: "Acquisition", "Address", "Alliance", "AnalystEarningsEstimate", "City", "CompanyEarningsAnnouncement", "CompanyEarningsGuidance", "Continent", "Country", "County", "DateTime", "EmailAddress", "JointVenture", "Location", "Merger", "NotesEmailAddress", "Organization", "Person", "PhoneNumber", "StateOrProvince", "URL", "ZipCode"
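As a rough illustration of what an annotator does, the sketch below pulls three of the listed entity types out of free text using regular expressions. It is not the BigInsights annotator library or its rule language: the patterns are deliberately simplistic assumptions (for example, phone numbers only in the form 555-123-4567), and the output field names are invented for the example.

```python
import re

# Hypothetical, simplified patterns for three of the entity types on the slide.
ANNOTATORS = {
    "EmailAddress": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s,;]+"),
    "PhoneNumber": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def annotate(text):
    """Return structured spans (type, text, offsets) found in unstructured text."""
    spans = []
    for label, pattern in ANNOTATORS.items():
        for m in pattern.finditer(text):
            spans.append(
                {"type": label, "text": m.group(), "begin": m.start(), "end": m.end()}
            )
    return sorted(spans, key=lambda s: s["begin"])


text = "Contact jane@example.com or call 555-123-4567; details at http://example.com/info"
annotations = annotate(text)
for a in annotations:
    print(a["type"], a["text"])
```

Each span carries its type and character offsets, which is the kind of structured record that can then be loaded into a table or joined with other data.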
33.
Eclipse-based Text Analytics Development
34.
So What Does This Result In?
– Easy To Scale
– Fault Tolerant and Self-Healing
– Data Agnostic
– Extremely Flexible
35.
Working with streaming data: a new paradigm
Conventional processing (static data): queries are run against stored data to produce results
Real-time processing (streaming data): data flows through standing queries, which produce results continuously
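The contrast between the two paradigms can be sketched in Python. This is a minimal illustration under assumed names and data: query_static runs once over data at rest, while continuous_query is a standing query that emits a result whenever the moving average of the most recent readings exceeds a threshold.

```python
from collections import deque


def query_static(data, threshold):
    """Conventional processing: the data is at rest and the query runs once."""
    return [x for x in data if x > threshold]


def continuous_query(stream, threshold, window=3):
    """Streaming: the query stays in place and results appear as data arrives."""
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    for x in stream:
        recent.append(x)
        avg = sum(recent) / len(recent)
        if avg > threshold:
            yield x, avg


readings = [1, 5, 9, 2, 8, 7]

static_result = query_static(readings, threshold=6)
print(static_result)

# iter(readings) stands in for an unbounded feed such as a socket or sensor.
alerts = list(continuous_query(iter(readings), threshold=5))
for value, avg in alerts:
    print(value, round(avg, 2))
```

The streaming version never holds more than the window in memory, which is what lets the same query run over a feed that never ends.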
36.
Real-Time Data with InfoSphere Streams
Streaming analytic applications
– Multiple input streams
– Advanced streaming analytics
Eclipse-based IDE: InfoSphere Streams Studio (IDE for Streams Processing Language)
– Define sources, apply operators, define intermediary and final output sinks
– User-defined operators in Java or C++
Automated, optimized deployment and management (scheduler)
– Optimizing compiler automates deployment and connections
– Extremely low latency
– Cluster of up to 125 nodes
[Diagram: source adapters, operator repository, sink adapters]
37.
Scalable stream processing
InfoSphere Streams provides
– A programming model and IDE for defining data sources and software analytic modules called operators that are fused into process execution units (PEs)
– Infrastructure to support the composition of scalable stream processing applications from these components
– Deployment and operation of these applications across distributed x86 processing nodes, when scaled processing is required
– Stream connectivity between data sources and PEs of a stream processing application
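The idea of operators fused into larger execution units can be sketched with Python generators. This is not Streams Processing Language and says nothing about how InfoSphere Streams actually fuses or deploys operators: the operator names, the fuse helper and the stock-tick records are all assumptions for illustration.

```python
from functools import reduce


def parse(stream):
    # Operator: turn "symbol,price" text lines into (symbol, price) tuples.
    for line in stream:
        symbol, price = line.split(",")
        yield symbol, float(price)


def only_symbol(symbol):
    # Operator factory: keep only tuples for one symbol.
    def op(stream):
        for s, p in stream:
            if s == symbol:
                yield s, p
    return op


def running_max(stream):
    # Operator: emit a tuple each time a new maximum price is seen.
    best = None
    for s, p in stream:
        if best is None or p > best:
            best = p
            yield s, best


def fuse(*operators):
    """Compose operators into one unit that passes tuples along in-process."""
    def pe(stream):
        return reduce(lambda s, op: op(s), operators, stream)
    return pe


pe1 = fuse(parse, only_symbol("IBM"))  # two operators fused into one unit
pe2 = fuse(running_max)                # a second unit fed by the first

source = ["IBM,100.0", "XYZ,50.0", "IBM,99.0", "IBM,101.5"]
sink = list(pe2(pe1(iter(source))))
print(sink)
```

Operators inside one fused unit hand tuples to each other directly; the connection between pe1 and pe2 is where a distributed runtime would place a network stream so the units can run on different nodes.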
38.
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
– Business Users: determine what question to ask
– IT: structures the data to answer that question
– Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
– IT: delivers a platform to enable creative discovery
– Business: explores what questions could be asked
– Examples: brand sentiment, product strategy, maximum asset utilization
39.
BigInsights and the data warehouse: filtering and summarizing “Big Data”
[Diagram: BigInsights feeding the data warehouse]
• Broader analytic coverage
• Exploits IT investments while minimizing burden
40.
BigInsights as a “queryable archive” for growing data warehouses
[Diagram: data warehouse offloading to BigInsights]
• Offload “cold” or dated warehouse info but maintain access for further exploration
• Keep warehouse size manageable and focused on well-known business analytic needs
41.
Trends and directions
Enterprise software integration
– Data warehouses, RDBMSs
– ETL platforms
– Business intelligence tools
– Applications
– ...
Diverse range of analytics
– Text
– Image / video (e.g., content-based user profiling)
– Predictive modeling (e.g., ranking and classification based on machine learning)
– ...
Sophisticated, scalable infrastructure for processing massive data volumes
– High-performance file system with full POSIX compliance, granular security
– Fully recoverable and restartable workflows
– Parallel, distributed indexing for text (“BigIndex”)
– Read-optimized column store
– Tooling for administrators, programmers, analysts
– ...
42.
Integrating Relational, Streams, and BigInsights
[Diagram with three paths]
– Traditional / relational data sources feed the database & warehouse (at-rest data analytics), producing traditional warehouse results
– Non-traditional / non-relational data sources feed Streams (in-motion analytics), producing ultra low latency results
– Varied data formats (semi-structured, unstructured...) feed InfoSphere BigInsights (massive scale, batch-oriented data analytics), producing Big Data results
43.
Typical Strategy for Analytics
[Diagram: sources are extracted, transformed, and a subset loaded (ETL) into the data warehouse / marts, which serve SQL analytics and mining]
44.
Emerging requirements for analytics
[Diagram: structured sources are extracted, transformed, and a subset loaded (ETL, ELT) into warehouses / marts, which serve SQL analytics and mining; other sources feed a BigInsights source repository for transform and analyze steps (MR BI, mining)]
Explore large volumes of “raw” or diverse data. Discover, analyze new insights with BigInsights
45.
Conclusions
– Scale out to crunch petabytes
– We need a mix of technologies
• Data at rest: MapReduce, Hadoop and beyond
• Data in motion: stream processing
– To be successful, integrate with conventional technologies
46.
Getting in touch
Stephen Brodsky – IBM
– Email: sbrodsky@us.ibm.com
InfoSphere BigInsights
– http://www-01.ibm.com/software/data/infosphere/biginsights/
InfoSphere Streams
– http://www-01.ibm.com/software/data/infosphere/streams/
Vladimir Bacvanski – SciSpike
– Email: vladimir.bacvanski@scispike.com
– Blog: http://www.OnBuildingSoftware.com/
– Twitter: http://twitter.com/OnSoftware
– LinkedIn: http://www.linkedin.com/in/VladimirBacvanski