IBM Stream au Hadoop User Group
Modern Data Stack France • 2 likes • 2,827 views
IBM Stream au Hadoop User Group
1. Big Data
Jerome Chailloux, Big Data Specialist (jerome.chailloux@fr.ibm.com)
© 2011 IBM Corporation
2. Imagine the Possibilities of Analyzing All Available Data
Faster, more comprehensive, less expensive:
- Real-time fraud & risk detection
- Understand and act on customer sentiment
- Traffic flow optimization
- Accurate and timely threat detection
- Predict and act on intent to purchase
- Low-latency network analysis
3. Where is this data coming from?
- Every day, the New York Stock Exchange captures 1 TB of trade information.
- 12 TB of tweets are created each day.
- 5 billion mobile phones were in use in 2010; only 12% were smartphones.
- Every second of HD video generates more than 2,000 times as many bytes as required to store a single page of text.
- More than 30M networked sensors, growing at a rate of more than 30% per year.
What is your business doing with it?
(Source: McKinsey & Company, May 2011)
4. Why is Big Data important?
The gap between the data AVAILABLE to an organization and the data an organization can PROCESS is a missed opportunity. Organizations are able to process less and less of the available data, and enterprises are "more blind" to new opportunities.
5. What does a Big Data platform do?
- Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before.
- Analyze information in motion: streaming data analysis; large-volume data bursts and ad hoc analysis.
- Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information; manage and analyze high volumes of structured, relational data.
- Discover & experiment: ad hoc analytics, data discovery, and experimentation.
- Manage & plan: enforce data structure, integrity, and control to ensure consistency for repeatable queries.
6. Complementary Approaches for Different Use Cases
Traditional approach (structured, analytical, logical):
- Data Warehouse: structured, repeatable, linear
- Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data
- Examples: monthly sales reports, profitability analysis, customer surveys
New approach (creative, holistic thought, intuition):
- Hadoop / Streams: unstructured, exploratory, iterative
- New sources: web logs, social data, text data (emails), sensor data (images), RFID
- Examples: brand sentiment, product strategy, maximum asset utilization
The two sides are connected through enterprise integration.
7. IBM Big Data Strategy: Move the Analytics Closer to the Data
New analytic applications (BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics) drive the requirements for a big data platform. The IBM Big Data Platform comprises visualization & discovery, application development, systems management, accelerators, a Hadoop system, stream computing, a data warehouse, and information integration & governance, in order to:
- Integrate and manage the full variety, velocity, and volume of data
- Apply advanced analytics to information in its native form
- Visualize all available data for ad hoc analysis
- Provide a development environment for building new analytic applications
- Optimize and schedule workloads
- Provide security and governance
8. Most Client Use Cases Combine Multiple Technologies
- Pre-processing: ingest and analyze unstructured data types and convert them to structured data.
- Combine structured and unstructured analysis: augment the data warehouse with additional external sources, such as social media.
- Combine high-velocity and historical analysis: analyze and react to data in motion; adjust models with deep historical analysis.
- Reuse structured data for exploratory analysis: experimentation and ad hoc analysis with structured data.
9. IBM is in a lead position to exploit the Big Data opportunity
February 2012: "The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012."
IBM differentiation:
- Embracing open source
- Data in motion (Streams) and data at rest (Hadoop/BigInsights)
- Tight integration with other Information Management products
- Bundled, scalable analytics technology
- Hardened Apache Hadoop for enterprise readiness
10. IBM's unique strengths in Big Data
- Big Data in real time: ingest, analyze, and act on massive volumes of streaming data. Faster AND more cost-effective for specific use cases (10x the volume of data on the same hardware).
- Fit-for-purpose analytics: analyzes a variety of data types in their native format: text, geospatial, time series, video, audio, and more.
- Enterprise class: open source enhanced for reliability, performance, and security; high-performance warehouse software and appliances.
- Ease of use: end-user, admin, and development UIs.
- Integration: integration into your IM architecture; pre-integrated analytic applications.
11. Stream Computing: What is it good for?
Analyze all your data, all the time, just in time.
- What if you could get IMMEDIATE insight?
- What if you could analyze MORE kinds of data?
- What if you could do it with exceptional performance?
Inputs (traditional data, sensor events, signals) flow through streaming analysis to produce analytic results: alerts, threat prevention, more context, logging, active response, and feeds into storage and warehousing.
12. What is Stream Processing?
Relational databases and warehouses find information stored on disk; stream computing analyzes data before you store it. Databases find the needle in the haystack; Streams finds the needle as it's blowing by.
13. Without Streams / With Streams
Without Streams, developers hand-build everything:
- Intensive scripting and embedded SQL
- File / storage management by hand; record management embedded in application code
- Data buffering and locality, security, high availability
- Dynamic application composition
- Application management (checkpointing, performance optimization, monitoring, workload management, error and event handling)
- Applications tied to specific hardware and infrastructure
- Multithreading / multiprocessing, debugging
- Migration from development to production
- Integration of best-of-breed commercial tools, code reusability, source/target interfaces
With Streams, the Streams Processing Language provides a productive and reusable development environment, and the Streams runtime provides the application infrastructure.
"TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language." – Alex Philp, TerraEchos
14.
Streams
15. How Streams Works
Continuous ingestion, continuous analysis: operators filter/sample, transform, annotate, correlate, and classify data as it flows.
The infrastructure provides services for scheduling analytics across hardware hosts and for establishing streaming connectivity.
Scale is achieved by partitioning applications into software components and by distributing them across stream-connected hardware hosts. Where appropriate, elements can be fused together for lower communication latency.
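The continuous filter/transform/annotate flow described above can be sketched as a chain of lazy operators. This is a minimal Python illustration (not SPL, and not the IBM runtime); the operator and attribute names are hypothetical:

```python
# Minimal Python sketch (not SPL) of a continuous pipeline of stream
# operators: each operator consumes tuples lazily and yields results
# downstream, so data is analyzed as it flows, before it is stored.

def filter_op(stream, predicate):
    """Drop tuples that do not satisfy the predicate."""
    for t in stream:
        if predicate(t):
            yield t

def transform_op(stream, fn):
    """Apply a function to every tuple."""
    for t in stream:
        yield fn(t)

def annotate_op(stream, key, fn):
    """Add a derived attribute to each tuple."""
    for t in stream:
        t = dict(t)
        t[key] = fn(t)
        yield t

# Source: an (in principle unbounded) sequence of sensor readings.
readings = ({"id": i, "temp": 15 + i} for i in range(10))

pipeline = annotate_op(
    transform_op(
        filter_op(readings, lambda t: t["temp"] > 20),
        lambda t: {**t, "temp_f": t["temp"] * 9 / 5 + 32},
    ),
    "alert", lambda t: t["temp"] > 23,
)

results = list(pipeline)  # only materialized here, for demonstration
```

In a real Streams deployment the equivalent operators would be deployed as processing elements across hosts; here the chaining of generators only illustrates the dataflow idea.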
16. Scalable Stream Processing
Streams programming model: construct a graph.
- A mathematical concept (not a line, bar, or pie chart!), also called a network; familiar: for example, a tree structure is a graph.
- Consisting of operators and the streams that connect them: the vertices (or nodes) and edges of the mathematical graph.
- A directed graph: the edges have a direction (arrows).
Streams runtime model: distributed processes.
- A single operator or multiple operators form a Processing Element (PE).
- Compiler and runtime services make it easy to deploy PEs on one machine, or across multiple hosts in a cluster when scaled-up processing is required.
- All links and data transport are handled by runtime services: automatically, with manual placement directives where required.
17. InfoSphere Streams Objects: Runtime View
- Instance: runtime instantiation of InfoSphere Streams executing across one or more hosts; a collection of components and services.
- Processing Element (PE): the fundamental execution unit that is run by the Streams instance; can encapsulate a single operator or many "fused" operators.
- Job: a deployed Streams application executing in an instance; consists of one or more PEs.
18. InfoSphere Streams Objects: Development View
- Operator: the fundamental building block of the Streams Processing Language; operators process data from streams and may produce new streams.
- Stream: an infinite sequence of structured tuples; can be consumed by operators on a tuple-by-tuple basis or through the definition of a window.
- Tuple: a structured list of attributes and their types (e.g., height: 640, width: 480, data: ...); each tuple on a stream has the form dictated by its stream type.
- Stream type: specification of the name and data type of each attribute in the tuple.
- Window: a finite, sequential group of tuples, based on count, time, attribute value, or punctuation marks.
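The count-based windows mentioned above can be sketched in plain Python (not SPL; sizes and stream contents here are illustrative). A sliding count window emits the last N tuples each time a new tuple arrives once it is full; a tumbling window emits N tuples and then starts over:

```python
from collections import deque

def sliding_count_window(stream, size):
    """Yield the current window (the last `size` tuples) once full,
    each time a new tuple arrives: a count-based sliding window."""
    window = deque(maxlen=size)
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield list(window)

def tumbling_count_window(stream, size):
    """Collect `size` tuples, emit them as one batch, then start over."""
    batch = []
    for t in stream:
        batch.append(t)
        if len(batch) == size:
            yield batch
            batch = []

prices = [10, 11, 12, 13, 14]
sliding = list(sliding_count_window(prices, 3))
tumbling = list(tumbling_count_window(prices, 3))
```

With the five prices above, the sliding window emits three overlapping windows, while the tumbling window emits a single complete batch (the last two tuples never fill a batch).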
19. What is Streams Processing Language?
- Designed for stream computing: define a streaming-data flow graph; rich set of data types to define tuple attributes.
- Declarative: operator invocations name the input and output streams; referring to streams by name is enough to connect the graph.
- Procedural support: full-featured C++/Java-like language; custom logic in operator invocations; expressions in attribute assignments and parameter definitions.
- Extensible: user-defined data types; custom functions written in SPL or a native language (C++ or Java); custom operators written in SPL; user-defined operators written in C++ or Java.
20. Some SPL Terms
- An operator represents a class of manipulations of tuples from one or more input streams to produce tuples on one or more output streams (e.g., Aggregate).
- A stream connects to an operator on a port; an operator defines input and output ports (e.g., an Employee Salary Info stream feeding an Aggregate that produces Statistics).
- An operator invocation is a specific use of an operator, with specific assigned input and output streams and locally specified parameters, logic, etc.
- Many operators have one input port and one output port; others have zero input ports (source adapters, e.g., TCPSource), zero output ports (sink adapters, e.g., FileSink), multiple output ports (e.g., Split), or multiple input ports (e.g., Join).
- A composite operator is a collection of operators: an encapsulation of a subgraph of primitive (non-composite) operators and composite operators (nested); similar to a macro in a procedural language.
21.
Composite Operators
Every graph is encoded as a composite
– A composite is a graph of one or more operators
– A composite may have input and output ports
– Source code construct only
• Nothing to do with operator fusion (PEs)

    composite Main {
        graph
            stream … { }
            stream … { }
            . . .
    }

Each stream declaration in the composite
– Invokes a primitive operator or
– another composite operator
An application is a main composite
– No input or output ports
– Data flows in and out, but not on streams within a graph
– Streams may be exported to and imported from other applications running in the same instance
[Diagram: application (logical view) — operators connected by Streams 1–5]
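A composite with input and output ports can be sketched as below (operator and attribute names are hypothetical); once defined, it is invoked like any other operator.

```spl
// Sketch of a composite operator with one typed input and one typed
// output port. Names and the tagging logic are illustrative only.
composite CleanAndTag(input stream<rstring msg> In;
                      output stream<rstring msg, rstring tag> Out) {
    graph
        // The stream declared with the output port's name feeds that port
        stream<rstring msg, rstring tag> Out = Functor(In) {
            output Out : tag = "clean";
        }
}

// Usage inside another graph (Raw is an existing stream of the right type):
//   stream<rstring msg, rstring tag> Tagged = CleanAndTag(Raw);
```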
22.
Anatomy of an Operator Invocation
Operators share a common structure
– <> are sections to fill in

Syntax:
    stream<stream-type> stream-name = MyOperator(input-stream; …) {
        logic   logic ;
        window  windowspec ;
        param   parameters ;
        output  output ;
        config  configuration ;
    }

Reading an operator invocation
– Declare a stream stream-name
– with attributes from stream-type
– that is produced by MyOperator
– from the input(s) input-stream
– MyOperator behavior defined by logic, parameters, windowspec, and configuration; output attribute assignments are specified in output

Example:
    stream<rstring item> Sale = Join(Bid; Ask) {
        window Bid : sliding, time(30);
               Ask : sliding, count(50);
        param  match : Bid.item == Ask.item && Bid.price >= Ask.price;
        output Sale  : item = Bid.item;
    }

For the example:
– Declare the stream Sale with the attribute item, which is a raw string
– Join Bid and Ask streams with
– sliding windows of 30 seconds on Bid, and 50 tuples of Ask
– When items are equal, and Bid price is greater than or equal to Ask price
– Output the item value on the Sale stream
23.
Streams V2.0 Data Types
(any)
– (primitive)
• boolean, enum, timestamp, blob
• (numeric)
  – (integral): (signed) int8, int16, int32, int64; (unsigned) uint8, uint16, uint32, uint64
  – (floatingpoint): (float) float32, float64, float128; (decimal) decimal32, decimal64, decimal128
  – (complex): complex32, complex64, complex128
• (string): rstring, ustring
– (composite)
• tuple
• (collection): list, set, map
24.
Stream and Tuple Types
Stream type (often called “schema”)
– Definition of the structure of the data flowing through the stream
Tuple type definition
– tuple<sequence of attributes>, e.g., tuple<uint16 id, rstring name>
• Attribute: a type and a name
• Nesting: any attribute may be another tuple type
Stream type is a tuple type
– stream<sequence of attributes>, e.g., stream<uint16 id, rstring name>
Indirect stream type definitions
– Fully defined within the output stream declaration
    stream<uint32 callerNum, … rstring endTime, list<uint32> mastIDs> Calls = Op(…) {…}
– Reference a tuple type
    CallInfo = tuple<uint32 callerNum, … rstring endTime, list<uint32> mastIDs>;
    stream<CallInfo> InternationalCalls = Op(…) {…}
– Reference another stream
    stream<uint32 callerNum, … rstring endTime, list<uint32> mastIDs> Calls = Op(…) {…}
    stream<Calls> RoamingCalls = Op(…) {…}
25.
Collection Types
list: array with bounds-checking, e.g., [0, 17, age-1, 99]
– Random access: can access any element at any time
– Ordered, zero-based indexing: first element is someList[0]
set: unordered collection, e.g., {"cats", "yeasts", "plankton"}
– No duplicate element values
map: key-to-value mappings, e.g., {"Mon":0, "Sat":99, "Sun":-1}
– Unordered
Use type constructors to specify element type
– list<type>, set<type>, e.g., list<uint16>, set<rstring>
– map<key-type, value-type>, e.g., map<rstring[3], int8>
Can be nested to any number of levels
– map<int32, list<tuple<ustring name, int64 value>>>
– {1 : [{"Joe", 117885}, {"Fred", 923416}], 2 : [{"Max", 117885}], -1 : []}
Bounded collections optimize performance
– list<int32>[5]: at most 5 (32-bit) integer elements
– Bounds also apply to strings: rstring[3] has at most 3 (8-bit) characters
26.
The Functor Operator
Transforms input tuples into output tuples
– One input port
– One or more output ports
May filter tuples
– Parameter filter
– A boolean expression
– If true, emit output tuple; if false, do not
Arbitrary attribute assignments
– Full-blown expressions
– Including function calls
– Drop, add, transform attributes
– Omitted attributes auto-assigned
Custom logic supported
– logic clause
– May include state
– Applies to filter and assignments

Example:
    stream<rstring name, uint32 age, uint64 salary> Person = Op(…) {}

    stream<rstring name, uint32 age, rstring login,
           tuple<boolean young, boolean rich> info> Adult = Functor(Person) {
        param  filter : age >= 21u;
        output Adult  : login = lower(name),
                        info = {young = (age < 30u), rich = (salary > 100000ul)};
    }

[Diagram: Person (name, age, salary) → Functor → Adult (name, age, login, info)]
27.
The FileSink Operator
Writes tuples to a file
Has a single input port
– No output port: data goes to a file, not a Streams stream
Selected Parameters
– file
• Mandatory
• Base for relative paths is the data subdirectory
• Directories must already exist
– flush
• Flush the output buffer after a given number of tuples
– format
• csv: comma-separated values
• txt, line, binary, block

Example:
    () as Sink = FileSink(StreamIn) {
        param
            file   : "/tmp/people.dat";
            format : csv;
            flush  : 20u;
    }
28.
Communication Between Streams Applications
Streams jobs exchange data with the outside world
– Source- and Sink-type operators
– Can also be used between Streams jobs (e.g., TCPSource/Sink)
Streams jobs can exchange data with each other
– Within one Streams instance
– Supports Dynamic Application Composition
– By name or based on properties (tags)
– One job exports a stream; another imports it
– Implemented using two new pseudo-operators: Export and Import
[Diagram: Job 1 (source → operator → sink) exports a stream that Job 2 (operator → sink) imports]
29.
Application Design – Dynamic Stream Properties
API available for toolkit development
Can add/modify/delete
– Exported stream properties
– Imported stream subscription expression
Dynamic Job Flow Control Bus Pattern
– Operators within jobs interpret control stream tuples
– Rewire the flow of data from job to job
[Diagram: an exported control stream carries flow-control tuples such as [A,B,C] that route the data stream across Jobs A, B, C, and D]
30.
Application Design – Dynamic Stream Properties (continued)
[Diagram: same flow-control bus; a second control tuple [A,C,D] reroutes the data stream from Job A to Jobs C and D, bypassing Job B]
31.
Application Design – Multi-job Design
Streams Instance: stream1
Job: imagefeeder
– DirectoryScan (filename) → ImageSource (timestamp + file metadata) → Functor (file metadata) → Export
– Exported stream properties: name = "Feed", type = "Image", write = "ok"
Job: imagewriter
– Import → ImageSink → FileSink
– Import subscription: type == "Image" && write == "ok"
Application / Job Decomposition
– Dynamic Job Submission + Stream Import / Export
32.
Application Design – Multi-job Design (continued)
[Diagram: same two jobs; the exported stream now carries Image + File metadata from the Functor to the imagewriter job]
Application / Job Decomposition
– Dynamic Job Submission + Stream Import / Export
33.
Application Design – Multi-job Design (continued)
Job: greyscaler
– Imports with subscription name == "Feed", applies a Greyscale operator, and re-exports with properties name = "Grey", type = "Image", write = "ok"
Application / Job Decomposition
– Dynamic Job Submission + Stream Import / Export
34.
Application Design – Multi-job Design (continued)
Further jobs join the same instance in the same way: Job resizer, Job facial scan, Job Alerter
Application / Job Decomposition
– Dynamic Job Submission + Stream Import / Export
35.
Application Design – Multi-job Design (continued)
[Diagram: the complete multi-job graph — imagefeeder, greyscaler, resizer, facial scan, Alerter, and imagewriter — wired together entirely through exported stream properties and import subscriptions]
Application / Job Decomposition
– Dynamic Job Submission + Stream Import / Export
36.
Two Styles of Export/Import
Publish and subscribe (recommended approach):
– The exporting application publishes a stream with certain properties
– The importing application subscribes to an exported stream with properties satisfying a specified condition
Point to point:
– The importing application names a specific stream of a specific exporting application
Dynamic publish and subscribe:
– Export properties and Import subscription expressions can be altered during the execution of a job
– Allows dynamic data flows
– Alter the flow of data based on the data (history, trends, etc.)

Example (exporting application):
    () as ImageStream = Export(ImagesIn) {
        param properties : {
            streamName = "ImageFeed",
            dataType   = "IplImage",
            writeImage = "true"};
    }

Example (importing application):
    stream<IplImage image, rstring filename, rstring directory> ImagesIn = Import() {
        param subscription :
            dataType == "IplImage" && writeImage == "true";
    }
37.
Parallelization Patterns – Introduction
Problem Statement
– A series of operations is to be performed on a piece of data (a tuple)
– How to improve the performance of these operations?
Key Question
– Reduce latency?
• For a single piece of data
– Increase throughput?
• For the entire data flow
Three possible design patterns
– Serial Path (Pipeline)
– Parallel Operators (Task Parallelization)
– Parallel Paths (Data Parallelization)
38.
Parallelization Patterns – Pipeline, Task
Pipeline (serial path): A → B → C → D
– Base pattern: inherent in the graph paradigm
– Results arrive at D in time T(A) + T(B) + T(C)
Parallel operators (task parallelization): A, B, and C run side by side, merged by M before D
– Process the tuple in operators A, B, and C at the same time
– Requires a merger (e.g., Barrier) before operator D
– Results arrive at D in time Max(T(A), T(B), T(C)) + T(M)
– Use when the tuple latency requirement < T(A) + T(B) + T(C)
– Complexity of the merger depends on the behavior of operators A, B, and C
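The task-parallel pattern can be sketched in SPL as below. Beacon, Functor, and Barrier are Standard Toolkit operators; the schema and the three computations are illustrative assumptions, not from the deck.

```spl
// Sketch of task parallelization: three independent computations on the
// same tuple, merged by a Barrier before the downstream sink.
composite TaskParallel {
    graph
        stream<uint64 id, float64 x> In = Beacon() {
            param iterations : 100u;
            output In : id = IterationCount(), x = 1.0;
        }
        // A, B, and C each consume every tuple from In concurrently
        stream<uint64 id, float64 a> A = Functor(In) { output A : a = x * 2.0; }
        stream<uint64 id, float64 b> B = Functor(In) { output B : b = x + 1.0; }
        stream<uint64 id, float64 c> C = Functor(In) { output C : c = x * x; }
        // Barrier waits for one tuple on each port, then emits a merged tuple
        stream<uint64 id, float64 a, float64 b, float64 c> M = Barrier(A; B; C) { }
        () as D = FileSink(M) {
            param file : "merged.csv"; format : csv;
        }
}
```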
39.
Parallelization Patterns – Parallel Pipelines
Parallel pipelines (data parallelization): several A → B → C pipelines feeding D
– Migration step from the pipeline pattern
– Can improve throughput
• Especially good for variable-size data / processing time
Design Decisions
– Are there latency and/or throughput requirements?
– Do the operators perform filtering, feature extraction, transformation?
– Is there an execution order requirement?
– Is there a tuple order requirement?
Recommended: Pipeline, and Parallel Pipelines when possible
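A data-parallel sketch in SPL: ThreadedSplit distributes tuples across replicated branches and Union merges them. ThreadedSplit and Union are Standard Toolkit operators; the schema and branch work are hypothetical, and note that Union does not preserve tuple order across branches.

```spl
// Sketch of data parallelization (parallel paths) with two replicas.
composite ParallelPaths {
    graph
        stream<uint64 id> In = Beacon() {
            param iterations : 1000u;
            output In : id = IterationCount();
        }
        // Distribute tuples across two replicated pipelines
        (stream<uint64 id> P0; stream<uint64 id> P1) = ThreadedSplit(In) {
            param bufferSize : 100u;
        }
        stream<uint64 id> R0 = Functor(P0) { }  // replicated work, branch 0
        stream<uint64 id> R1 = Functor(P1) { }  // replicated work, branch 1
        // Merge the branches; arrival order is not guaranteed
        stream<uint64 id> Out = Union(R0; R1) { }
        () as Sink = FileSink(Out) {
            param file : "out.csv"; format : csv;
        }
}
```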
40.
Application Design – Multi-tier Design
N-tier design
– Number and purpose of tiers result from the application design
– Create well-defined interfaces between the tiers
Supports several overarching concepts
– Incremental development / testing
– Application / Job / Operator reuse
– Modular programming practices
Each tier in these examples may be made up of one or more jobs (programs)
[Diagram: example tiers — Ingestion → Transport Adaptation → Reduction / Transformation → Processing / Analytics → Transport Adaptation]
41.
Application Design – High Availability
HA application design pattern
– Source job exports its stream, enriched with a tuple ID
– Jobs 1 & 2 process in parallel, and export final streams
– Sink job imports the streams, discards duplicates, alerts on missing tuples
[Diagram: a Source feeds redundant copies of Job 1 and Job 2 spread across host pools of x86 hosts; a single Sink merges the results]
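The sink side of this pattern can be sketched in SPL. The subscription property, stream names, and schema are hypothetical; DeDuplicate is a Standard Toolkit operator, though its exact parameter set should be checked against the product documentation.

```spl
// Sketch of the HA sink job: import the redundant result streams and
// discard tuples whose ID has already been seen.
composite HASink {
    graph
        // Both redundant jobs export with the same (assumed) property,
        // so one Import subscription receives both copies
        stream<uint64 tupleId, rstring result> Merged = Import() {
            param subscription : kind == "haResult";
        }
        // Drop the second copy of each tuple, keyed on the tuple ID
        stream<uint64 tupleId, rstring result> Unique = DeDuplicate(Merged) {
            param key : tupleId;
        }
        () as Out = FileSink(Unique) {
            param file : "results.csv"; format : csv;
        }
}
```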
42.
Application Design – High Availability (continued)
[Diagram: the same HA pattern, showing an alternative placement of the redundant Job 1 and Job 2 copies across the host pools of x86 hosts]
43.
IBM InfoSphere Streams 3.0
Agile Development Environment
– Eclipse IDE
– Streams Live Graph
– Streams Debugger
– Over 50 samples
Distributed Runtime Environment
– Clustered runtime for massive scalability
– RHEL v5.x and v6.x, CentOS v6.x
– x86 & Power multicore hardware
– Ethernet & InfiniBand
Sophisticated Analytics with Toolkits & Adapters
– Toolkits: Front Office, Database, Text, Financial, Standard, Internet, BigData (HDFS, DataExplorer), User-defined, …
– Advanced analytics: Mining, Geospatial, Timeseries, Messaging
44.
Toolkits and Operators to Speed and Simplify Development
Standard Toolkit
– Relational operators: Filter, Sort, Functor, Join, Punctor, Aggregate
– Adapter operators: FileSource, FileSink, DirectoryScan, TCPSource, TCPSink, UDPSource, UDPSink, Export, Import, MetricsSink
– Utility operators: Custom, Beacon, Throttle, Delay, Barrier, Split, DeDuplicate, Union, ThreadedSplit, DynamicFilter, Pair, Gate, JavaOp, …
– Contains the default operators shipped with the product
Internet Toolkit
– InetSource: HTTP, HTTPS, FTP, FTPS, RSS, file
Database Toolkit
– ODBCAppend, ODBCEnrich, ODBCSource, SolidDBEnrich, DB2SplitDB, DB2PartitionedAppend
– Supports: DB2 LUW, IDS, solidDB, Netezza, Oracle, SQL Server, MySQL
Other toolkits: Financial, Data Mining, Big Data, Text
User-Defined Toolkits
– Extend the language by adding user-defined operators and functions
45.
User-Defined Toolkits
Streams supports toolkits
– Reusable sets of operators and functions
– What can be included in a toolkit?
• Primitive and composite operators
• Native and SPL functions
• Types
• Tools, documentation, samples, data, etc.
– Versioning is supported
– Define dependencies on other versioned assets (toolkits, Streams)
– Create cross-domain and domain-specific accelerators
46.
47.
A quick peek inside …
InfoSphere Streams Instance – Single Host
Management Services & Applications on one host:
– Streams Web Service (SWS)
– Streams Application Manager (SAM)
– Streams Resource Manager (SRM)
– Authorization and Authentication Service (AAS)
– Scheduler
– Recovery DB
– Name Server
– Host Controller
– Processing Element Container
– File System
48.
A quick peek inside …
InfoSphere Streams Instance – Multi-host, Management Services on a separate node
Management host:
– Streams Web Service (SWS), Streams Application Manager (SAM), Streams Resource Manager (SRM), Authorization and Authentication Service (AAS), Scheduler, Recovery DB, Name Server
Shared File System
Application Hosts (one or more), each running:
– Host Controller
– Processing Element Container
49.
A quick peek inside …
InfoSphere Streams Instance – Multi-host, Management Services on multiple hosts
Management hosts (services spread across several nodes):
– Streams Web Service (SWS), AAS, Recovery DB
– Streams Application Manager (SAM), Scheduler
– Streams Resource Manager (SRM), Name Server
Mixed Management / Application host:
– Host Controller, Processing Element Container
Shared File System
Application Hosts, each running:
– Host Controller, Processing Element Container