Big data presentation, explanations and use cases in industrial sector
1. Big Data: explanations & use cases in industrial sector
September 2015
Nicolas SARRAMAGNA
https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587
2. CONTENTS
What’s Big Data ?
1. Definition, 3 V
2. General use cases
3. Technologies used
4. Market Overview
Big Data in Industrial sector
1. What for ?
2. Vision
3. Demo PoC / PoV
3. COMPAGNIE PLASTIC OMNIUM
CONFIDENTIAL
What’s Big Data – 3V
BIG DATA :
New contexts on data -> 3V
New business ambitions, new technologies
VOLUME : MASSIFICATION AND AUTOMATION OF DATA EXCHANGES
80% of data was created in the last 12 months
30 billion pieces of content on Facebook each month, 5 billion pages on Flickr, 2 billion videos watched on YouTube each day
VARIETY : MULTIPLICATION OF SOURCES AND TYPES
Mails, documents, logs (applications, networks, systems), databases, sensor data, open data, social networks,
blogs, forums, articles, browsing history, geolocation data, …
Structured data (DB), semi-structured (html page, tweet, xml), unstructured (mail content, excel, ppt, video, audio)
VELOCITY : NEED TO COLLECT AND PROCESS DATA IN REAL TIME
Risk management (fraud, security of the information system – SIEM)
Real time route optimization
Personalized advertising
4. What’s Big Data – new technologies
BIG DATA :
More efficient components, but I/O throughput does not keep pace -> grid architecture
New technological know-how : low-cost storage of large volumes of data in a cluster, distributed computing,
industrialized data mining, on-demand IT architecture with the cloud
ORIGIN OF BIG DATA
Web indexing and search engines at Google and Yahoo, around 2006
5. What’s Big Data - general use cases IT
COMPLETE THE ARCHITECTURE OF THE DATA
Vision of a Data lake / Enterprise data hub
Bring applications closer to the data instead of duplicating data for each application
"Deliver" managed data
REDUCE STORAGE COSTS AND COMPUTING COSTS
Big Data technologies use commodity hardware and / or cloud and parallel computing
STRONG TECHNICAL CONSTRAINTS
Handle more than 1,000 transactions / second
Collect a flow of more than 1,000 events / second
Compute with more than 10 threads / CPU core
Store data sets of more than 10 TB for processing
Without Big Data technologies, these require major hardware and software adaptations
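To give a sense of scale for the thresholds above, here is a rough back-of-the-envelope calculation. The 1 KB average event size is an assumption chosen for illustration; the slides do not state one.

```python
# Rough sizing for a stream of 1,000 events per second.
# ASSUMPTION: average event size of 1 KB (not stated in the slides).
EVENTS_PER_SECOND = 1_000
EVENT_SIZE_BYTES = 1_000  # hypothetical average size
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(rate_per_second: int) -> int:
    """Number of events collected in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def days_to_reach(target_bytes: int, rate_per_second: int, event_size: int) -> float:
    """Days needed for the raw stream alone to reach a given volume."""
    bytes_per_day = events_per_day(rate_per_second) * event_size
    return target_bytes / bytes_per_day


TEN_TB = 10 * 10**12
daily_events = events_per_day(EVENTS_PER_SECOND)
days_to_10_tb = days_to_reach(TEN_TB, EVENTS_PER_SECOND, EVENT_SIZE_BYTES)
```

Under that assumption the stream produces tens of millions of events per day and reaches the 10 TB mark within a few months, before any replication or derived data is counted.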
6. What’s Big Data - general use cases business
END-USER CENTRIC
Product recommendations
Optimization of ads
PROCESS CENTRIC
Detection of unexpected events : fraud, network, predictive maintenance
Path optimization
DIVERSIFICATION OF THE BUSINESS MODEL
Orange : resale of geolocation data
7. What’s Big Data – misconceptions
MISCONCEPTIONS
Only used for unstructured data
Only needed for massive data sets
Only available from open-source
Replaces my current BI platform
REALITY
Used with structured and unstructured data
To store and analyse data of any size
All major vendors are on board
It is complementary to our existing BI strategy and investments
Big Data will become essential for Business Intelligence
9. What’s Big Data – BI opportunities
THE PAST - BI
BIG DATA ANALYTICS
10. What’s Big Data - technologies under the hood - standard Hadoop
HADOOP PLATFORM
11. What’s Big Data - technologies under the hood
COLLECT
Spark, Flume, Sqoop
Ingest data into HDFS and NoSQL databases : command line, REST API, Java API, streaming ingestion, bulk ingestion,
ingestion from an RDBMS
STORAGE
Cloud, Hadoop -> distributed file system HDFS (large and small data set)
NoSQL (not only SQL) : distributed, schema-less databases; CAP theorem; key-value, column, document and graph-oriented stores
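As an illustration of "distributed, schema-less" storage, here is a minimal sketch of a key-value store that spreads keys over several in-memory nodes. The class name, the hash-modulo placement and the sample records are all hypothetical; real NoSQL products use more elaborate partitioning and replication.

```python
import hashlib


class TinyKeyValueCluster:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count: int) -> None:
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key: str, value: dict) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str):
        return self.nodes[self._node_for(key)].get(key)


cluster = TinyKeyValueCluster(node_count=3)
# Schema-less: the two values do not share the same fields.
cluster.put("sensor:1", {"plant": "A", "temperature": 71.5})
cluster.put("tweet:42", {"text": "hello", "lang": "en", "retweets": 3})
```

A client only needs the key to find the right node, which is what lets such a store grow by adding machines.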
12. What’s Big Data - technologies under the hood
ANALYSIS
Data Science, Map / Reduce, Spark
Analyze and clean the data
Goal : build a model
Machine Learning : 1 data set to train the model (67% of the data set), 1 data set to evaluate the model (33%)
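The 67% / 33% split mentioned above can be sketched as follows. The helper name, the fixed seed and the stand-in data are assumptions made for the example; libraries such as scikit-learn provide their own equivalent.

```python
import random


def train_test_split(rows, train_ratio=0.67, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))  # stand-in for 100 labelled observations
train, test = train_test_split(data)
```

The model is fitted on `train` only; `test` is kept aside so that the evaluation measures behaviour on data the model has never seen.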
VISUALIZATION
DataViz : all the visual representation techniques used for data mining.
Build indicators that make decisions easier
Provide indicators whatever the size or type of data
Innovate : give new perspectives to discover new opportunities
Tableau, QlikView, Power Pivot
Fetch data through an ODBC connector, a JDBC connector, a REST API, or the native connector of the DataViz tool
13. What’s Big Data - technologies under the hood
CONCEPTS OF A BIG DATA ARCHITECTURE
Distributed data and processing : the file system, jobs (Map/Reduce, Spark, …), databases (NoSQL)
Co-location of data and processing : replication, processing strategy in Hadoop
Horizontal elasticity : master / nodes architecture
Shared nothing : when a node breaks down, no data is lost. Each node is independent.
Design for failure : when a node breaks down, the cluster continues to work.
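The Map/Reduce job model named above can be illustrated with a word count written in plain Python. This is only a sketch of the three phases on one machine, not Hadoop's actual Java API; the function names and sample lines are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values of each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands for a block that would live on a different node.
splits = [["big data big cluster"], ["data lake", "big lake"]]
mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
word_counts = reduce_phase(shuffle_phase(mapped))
```

Because each map call only needs its own line, the map phase can run on the node that already holds the data, which is the co-location idea listed on this slide.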
14. What’s Big Data - technologies under the hood
HDFS : HADOOP DISTRIBUTED FILE SYSTEM
NameNode : master of the system. Maintains and manages the blocks present on the DataNodes
DataNodes : slaves deployed on each machine that provide the actual storage. They serve read and write requests from
clients
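The division of roles between the NameNode and the DataNodes can be sketched with a toy model. Everything here is a simplification: the 4-character block size and round-robin placement are assumptions for readability (real HDFS uses much larger blocks and rack-aware placement), and in real HDFS the client streams data to the DataNodes itself, whereas this sketch folds that into one method.

```python
class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Keeps only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes, block_size=4, replication=3):
        self.datanodes = datanodes
        self.block_size = block_size      # toy value, in characters
        self.replication = replication    # 3 is the usual HDFS default
        self.files = {}                   # file name -> [(block id, [DataNode])]

    def write(self, filename, data):
        entries = []
        for offset in range(0, len(data), self.block_size):
            number = offset // self.block_size
            block_id = f"{filename}#{number}"
            # Round-robin placement of the replicas (hypothetical policy).
            targets = [self.datanodes[(number + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = data[offset:offset + self.block_size]
            entries.append((block_id, targets))
        self.files[filename] = entries

    def read(self, filename, failed=()):
        parts = []
        for block_id, targets in self.files[filename]:
            # Read from the first replica whose node is still alive.
            node = next(n for n in targets if n.name not in failed)
            parts.append(node.blocks[block_id])
        return "".join(parts)


nodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(nodes)
namenode.write("log.txt", "abcdefghij")
```

Calling `namenode.read("log.txt", failed={"dn0", "dn1"})` still returns the full content, because every block has a replica on a surviving node.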
15. What’s Big Data – technologies under the hood - storage costs
USE COMMODITY HARDWARE
In Big Data, the data center is not a collection of servers but a collection of co-located CPUs, RAM and local disks
1 MILLION $ GETS -> [comparison figure not recoverable]
16. What’s Big Data - market overview
COTS DISTRIBUTIONS (LEADERS)
Cloudera, no. 1
Hortonworks, no. 2
MapR, no. 3
CLOUD (BASED ON A DISTRIBUTION)
Microsoft – Azure
Amazon – AWS
APPLIANCE VENDORS, COSTS++
Teradata
Oracle
17. Big Data – positioning of the distributions
CLOUDERA
Software vendor business model, 5-6 k€ / year / node
Amazon deploys Cloudera
More mature than the other distributions
HORTONWORKS
Free; business model based on support : 15 k€ / year per slot of 4 nodes or per slot of 50 TB
Azure and Amazon deploy Hortonworks
Less mature than Cloudera on security and administration
MAPR
Software vendor business model
Diverges from standard Hadoop
[Chart: Cloudera, Hortonworks and MapR on a 0-100 scale; values not recoverable]
Between distributions, ratio of 1 to 4
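The price points quoted on this slide can be turned into a small annual-cost comparison. The 20-node cluster size is hypothetical, and billing a partially filled Hortonworks slot as a full slot is an assumption; the slide does not say how partial slots are priced.

```python
import math


def cloudera_annual_cost_keur(nodes, per_node_keur=(5, 6)):
    """Low and high estimate, in k€ per year, at 5-6 k€ per node."""
    low, high = per_node_keur
    return nodes * low, nodes * high


def hortonworks_annual_cost_keur(nodes, per_slot_keur=15, nodes_per_slot=4):
    """Support cost in k€ per year, at 15 k€ per slot of 4 nodes."""
    # ASSUMPTION: a partially filled slot is billed as a full slot.
    return math.ceil(nodes / nodes_per_slot) * per_slot_keur


CLUSTER_NODES = 20  # hypothetical cluster size
cloudera_range = cloudera_annual_cost_keur(CLUSTER_NODES)
hortonworks_cost = hortonworks_annual_cost_keur(CLUSTER_NODES)
```

For this hypothetical 20-node cluster the sketch yields 100 to 120 k€ per year for Cloudera and 75 k€ per year for Hortonworks support, node-based pricing only.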
18. CONTENTS
What’s Big Data ?
1. Definition, 3 V
2. Use cases
3. Technologies under the hood
4. Market Overview
Big Data in Industrial sector
1. What for ?
2. Vision
3. Demo PoC / PoV
19. Big Data in Industrial sector – What for ? - use cases IT
BUILD A DATA LAKE
Reduce costs : move cold data out of the data warehouse
Break down data silos
Store raw data and work (data mining) with all of the data
Open the data, enrich them with metadata
LOG ANALYSIS AND MONITORING - SIEM
Monitoring of applications, networks, systems logs -> Splunk
PREDICTIVE MAINTENANCE
Monitor sensor data, predict breakdowns across plants
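As a first step toward the predictive maintenance use case, sensor monitoring can start with simple anomaly flagging. The sketch below marks readings that sit far from the mean of a trailing window; the window size, threshold and vibration values are all hypothetical, and a real deployment would rely on a trained model rather than this rule.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indexes whose value is far from the trailing window's mean."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Flag values more than `threshold` standard deviations away.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings from one machine; the spike is at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 2.5, 1.0, 1.1]
anomalies = flag_anomalies(vibration)
```

Flagged indexes can then feed an alert or be used as labels when building a breakdown prediction model.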
20. Big Data in Industrial sector – What for ? - use cases HR
SKILLS VISION AND MANAGEMENT
Cross-reference information from professional networks (Viadeo, LinkedIn) with internal HR information : build a map of the
skills in PO
Build and manage groups of skills, enrich internal HR tools
E-REPUTATION
Follow in real time the data about your brand, the competitors, the customers
Monitoring of social networks (Twitter, Facebook), press news, financial news, forums, blogs, …
React quickly according to the results if necessary
21. Big Data in Industrial sector – What for ? - use cases Marketing
360° VIEW OF CUSTOMERS, SUPPLIERS, COMPETITORS
Gather as much information as possible about a company : social, legal, financial, competitive position.
Evaluate the risks and opportunities of working together
VIEW OF THE ROI OF PLANTS
Real-time indicators from plants : investment, number of bumpers, tanks
Rank the plants, predict gain
22. Big Data in Industrial sector – Vision & Roadmap
2016 : BEGIN TO BUILD A DATA LAKE
Make the data directly available for BI and Data Science, and / or transfer it to a data warehouse
Collect data and manage it (who has access, metadata)
Infrastructure : hybrid with cloud / on premise / appliance ?
2016 : CREATE A NEW CROSS-DIVISION SERVICE AROUND THE DATA
DataViz : creates reporting with the current DataViz tools -> current BI analyst, no change
Data IS : knows its data and can provide metadata to classify it -> current IS, no change
Data engineer : uses collection tools, codes jobs, transforms data -> new skills
Data Administrator IT : Big Data architecture integration and monitoring -> new skills
Data Analysis & data mining : cross-analyzes the data, applies models, designs indicators for the DataViz -> new skills
2016+ : IMPLEMENT OTHER USE CASES
Start small and accelerate
23. Big Data in Industrial sector – Data Lake
DATA LAKE / ENTERPRISE DATA HUB / DATA RESERVOIR
Low cost storage of heterogeneous data (structured, semi-structured and unstructured data)
Raw data storage but data enriched and classified by metadata – a data reservoir, not a SWAMP
Used for data exploration, analysis and data mining
Schema on read : the old ETL gives way to the new ELT
Can be directly used for BI (ELT mode)
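Schema on read can be illustrated with raw JSON lines that are stored untouched and only given a structure when a consumer reads them. The sample records, the field names and the `read_with_schema` helper are hypothetical, chosen only to show the idea.

```python
import json

# Raw events are loaded as-is; no transformation happens before storage (ELT).
RAW_LINES = [
    '{"plant": "A", "sensor": "t1", "value": "71.5", "unit": "C"}',
    '{"plant": "B", "sensor": "t9", "value": "68"}',
    '{"plant": "A", "comment": "maintenance stop"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time and keep only the records that fit it."""
    for line in lines:
        record = json.loads(line)
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError):
            continue  # the record does not fit this reader's schema


temperature_schema = {"plant": str, "value": float}
temperatures = list(read_with_schema(RAW_LINES, temperature_schema))
```

Another consumer could read the same raw lines with a different schema, for instance to extract the comments, without any change to what is stored.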
DATA LAKE AND DATA WAREHOUSE
Complete the sources of the data warehouse
Can store cold data from the data warehouse
Feed the Data Warehouse
DATA LAKE VISION
Stores aggregated data, can store all the data
Data Lake centric vision : bring applications to Data and not copy Data to Applications
24. Big Data in Industrial sector – Data Lake - infrastructure
BIG DATA INFRASTRUCTURE
Hybrid with cloud : NO if you want to keep your data in-house (security); requires network effort and cloud skills
Appliance : infrastructure, license, deployment -> TCO ++
On-premise : best compromise between cost, convenience of deployment and usage.
CHOICE : ON-PREMISE INFRASTRUCTURE
Go for Cloudera (better administration and security functionalities, ‘real-time’ module : Impala) or Hortonworks
Send your IT staff on training : development, administration, data mining
25. Big Data in Industrial sector – Proof of Concept – Proof of Value
SUBJECT : E-REPUTATION
GOALS
Set up e-reputation indicators for your enterprise, competitors, suppliers and customers
from various sources : news, social networks
Experiment with Big Data tools
INDICATORS
Who is talking about you ? How (positive, negative, neutral) ? What’s the content ? Where in the world ? From what
source ?
Different views of e-Reputation : financial, HR, societal, commercial
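The "how" and "from what source" indicators can be sketched as a count of mentions per source and tone. The word lists and the sample mentions below are invented for the example; an actual PoC would likely use a trained sentiment model rather than keyword matching.

```python
from collections import Counter

# ASSUMPTION: tiny hand-made word lists, for illustration only.
POSITIVE = {"great", "reliable", "innovative"}
NEGATIVE = {"recall", "delay", "defect"}


def tone(text):
    """Classify a mention as positive, negative or neutral by keyword count."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def indicators(mentions):
    """Count mentions per (source, tone) pair."""
    return Counter((m["source"], tone(m["text"])) for m in mentions)


# Hypothetical mentions, invented for the example.
mentions = [
    {"source": "twitter", "country": "FR", "text": "Great and reliable supplier"},
    {"source": "news", "country": "DE", "text": "Production delay announced"},
    {"source": "twitter", "country": "US", "text": "New plant opens"},
]
summary = indicators(mentions)
```

Grouping on `country` instead of `source` would give the "where in the world" view in the same way.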
DEMO
"Big Data" : a term denoting a break with traditional data processing
Big Data makes it possible to solve new problems, or old ones in a better way
Bottleneck on disk read/write access : disk throughput does not keep up with the growth of storage capacity
Big Data does not replace the existing BI architecture but completes and reorients it : applications go to the data, rather than data (and its duplicates) going to the applications
Descriptive, Diagnostic : look at the past and find the reasons for a success or a failure -> BI
Predictive : derive a model that gives future trends -> BIG DATA
Prescriptive : under various constraints, determine the best way to reach the goal -> BIG DATA
Tell the life cycle of the data in chronological order, from the data source to the reporting.
ODS : operational data. EDW : data warehouse, aggregated data.
Datamart : subset of a warehouse. HDFS : distributed file system.
Event -> Kafka (distributed messaging system) -> Storm (real-time processing of the message, optional) -> NoSQL
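The event chain in the last note can be mimicked with standard-library structures to show the flow of one message. This is only a stand-in: the deque plays the role of a Kafka topic, the `process` function the role of the optional Storm step, and the dict the role of a NoSQL table; none of the real client APIs are used, and the alert threshold is hypothetical.

```python
from collections import deque

message_log = deque()   # stands in for a Kafka topic
store = {}              # stands in for a NoSQL key-value table


def publish(event):
    """Producer side: append the event to the log."""
    message_log.append(event)


def process(event):
    """Optional real-time step: enrich the message with an alert flag."""
    return {**event, "alert": event["value"] > 100}  # hypothetical threshold


def consume_all():
    """Consumer side: drain the log, process each message, persist it."""
    while message_log:
        event = process(message_log.popleft())
        store[event["id"]] = event


publish({"id": "e1", "value": 42})
publish({"id": "e2", "value": 130})
consume_all()
```

Decoupling the producer from the consumer through the log is what lets the collection rate and the processing rate vary independently.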