DLD Summer Workshop Big Data
1. Big Data Workshop - DLD Summer 15
Big Data – Workshop
DLD Summer 15
21/06/15, DLD Summer 15, @rjudas
2. Big Data Workshop - DLD Summer 15
Understanding Big Data
And getting the right mindset
3. Big Data Workshop - DLD Summer 15
Agenda
Syncing
Defining Big Data
Hype or Evolution
Tech Drivers
Big Data – Big Business?
What's it all about?
How do we get there?
4. Big Data Workshop - DLD Summer 15
Syncing
5. Big Data Workshop - DLD Summer 15
Syncing
Please tell us your opinion about Big Data
Please tell us about your Big Data projects
6. Big Data Workshop - DLD Summer 15
Defining Big Data
7. Big Data Workshop - DLD Summer 15
Definition(s)
“Big Data describes datasets so large they become very
difficult to manage with traditional database tools.”
“Big data is ‘data that exceeds the processing capacity
of conventional database systems. The data is too big,
moves too fast, or doesn’t fit the strictures of your
database architectures’.”
“Very pragmatically, it's about building net-new analytic
applications based on new types of data that (an
organization) wasn't previously tracking.”
8. Big Data Workshop - DLD Summer 15
The 3 V's
Variety: Tables, Images, Videos, XML, Logs
Velocity: Batch, Streams, Real-Time
Volume: Lots of xBytes
9. Big Data Workshop - DLD Summer 15
Variety
Mix of Data types
BLOBs and CLOBs
Images, Audio, Videos, Log Files
Semi-Structured, Unstructured
Email, EDI Messages, Transaction Logs, Sensor Data
10. Big Data Workshop - DLD Summer 15
Velocity
Crucial – Speed of “Feedback Loop”
Streaming Data
Complex Event Processing
From Batch to (Near) Real-Time
Different Lifetime
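The step from batch to (near) real-time can be sketched in a few lines: a batch job needs the whole dataset before it answers, while a streaming job updates its answer with every event. The sensor readings and function names are made up for illustration.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch style: needs the complete dataset before answering."""
    return sum(values) / len(values)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: yields an updated mean after every event."""
    count = 0
    mean = 0.0
    for value in events:
        count += 1
        mean += (value - mean) / count  # incremental update, no history kept
        yield mean


readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sensor readings
running = list(streaming_mean(readings))
print(batch_mean(readings))  # one answer, after all data has arrived
print(running)               # one answer per event
```

The streaming version keeps only a counter and the current mean, which is what makes a short feedback loop possible on unbounded data.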
12. Big Data Workshop - DLD Summer 15
Figures
“Digital Universe” according to the EMC/IDC study (2014):
4.4 zettabytes in 2013, 44 zettabytes in 2020
All human speech ever spoken: 42 zettabytes (16 kHz, 16 bit)
2013: speculation that the NSA data center holds 1 YB;
realistic estimates are 3-12 EB
CERN / LHC data center passes 100 PB
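As a rough check of what the two "Digital Universe" figures imply, the sketch below derives the compound annual growth rate and the doubling time. Only the two zettabyte values come from the slide; the variable names are illustrative.

```python
import math

zb_2013 = 4.4   # zettabytes, figure from the slide
zb_2020 = 44.0  # zettabytes, figure from the slide
years = 2020 - 2013

growth_factor = zb_2020 / zb_2013                  # total growth over the period
annual_growth = growth_factor ** (1 / years) - 1   # compound annual growth rate
doubling_time = math.log(2) / math.log(1 + annual_growth)  # in years

print(f"factor: {growth_factor:.1f}x")
print(f"annual growth: {annual_growth:.1%}")
print(f"doubling time: {doubling_time:.1f} years")
```

Under these numbers the data volume grows by roughly 39% per year, which means it doubles about every two years.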
13. Big Data Workshop - DLD Summer 15
Volume – Most famous quote
2.5 Exabytes of Data Created each Day
(2,500,000,000,000,000,000 bytes) ≈ 1 ZB/Year
(with 90% of World Data created in the last two
years)
Source IBM CMO Study 2011
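The conversion from 2.5 exabytes per day to roughly 1 zettabyte per year can be checked with a few lines of arithmetic, assuming decimal (SI) prefixes and a 365-day year:

```python
BYTES_PER_EB = 10**18  # exabyte, decimal SI prefix
BYTES_PER_ZB = 10**21  # zettabyte, decimal SI prefix

bytes_per_day = 2.5 * BYTES_PER_EB
bytes_per_year = bytes_per_day * 365
zb_per_year = bytes_per_year / BYTES_PER_ZB

print(f"{bytes_per_day:,.0f} bytes/day")
print(f"{zb_per_year:.2f} ZB/year")
```

The result is a little over 0.9 ZB per year, so "≈ 1 ZB/Year" is a fair rounding.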
14. Big Data Workshop - DLD Summer 15
Even more V's
Veracity
Uncertainty of Data, Trustworthiness, Accountability
Value
Big Data only if it generates value
Visibility
Security, stitching together data from various
sources
Validity
Logic inference, Correlation vs. Causation
15. Big Data Workshop - DLD Summer 15
Hype or Evolution?
16. Big Data Workshop - DLD Summer 15
Old wine?
OLTP, OLAP, Data Warehouse
- Around since the 1970s
- ACID (Atomicity, Consistency, Isolation, Durability)
- Based on SQL
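A minimal sketch of what ACID buys in practice, using Python's built-in sqlite3 module. The accounts table, the names and the amounts are hypothetical; the point is that a failed transfer leaves no half-applied update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdraft
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to bob has already been executed when the debit violates the CHECK constraint; the rollback undoes it, so the balances stay as they were after the first transfer.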
17. Big Data Workshop - DLD Summer 15
Big Data 15 years ago
OLTP (Orders, Articles, Receiving, etc.)
→ Data Warehouse
→ Decision Support Systems (OLAP)
18. Big Data Workshop - DLD Summer 15
Business Intelligence
20. Big Data Workshop - DLD Summer 15
Enter Big Data
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
http://www.gartner.com/newsroom/id/1731916
http://chucksblog.emc.com/chucks_blog/2011/06/2011-idc-digital-universe-study-big-data-is-here-now-what.html
21. Big Data Workshop - DLD Summer 15
“New” Big Data
New Paradigm
BASE (Basically Available, Soft state, Eventual consistency)
New Data Model
Data Lifecycle and Variability
Data Linking and referential integrity
New Analytics
Real-time/streaming analysis, interactive
Machine-learning
New Infrastructure and Tools
High Performance Computing, Storage, Network
Multi-Provider Services Integration
New Data Centric service models and security models
23. Big Data Workshop - DLD Summer 15
[Figure: Big Data landscape diagram. Recoverable labels: Hadoop on premise, cluster management/monitoring, NoSQL databases, NewSQL databases, MPP databases, graph databases, crowdsourcing, transformation, security, storage, app development, cross-infrastructure/cloud services, analytics platforms, BI platforms, tools for business analysts, data science platforms, data visualization, unstructured data, AI, social analytics, analytic services, machine learning, location/people/events, search, statistical computing, log analytics, real-time, SMB, frameworks, query/data access, collaboration/workflow, statistical tools, data sources, sensors, data markets, incubators, cloud deployment, government/regulation, education/learning, health, finance, human capital, legal, marketing, publisher tools, ad optimization]
24. Big Data Workshop - DLD Summer 15
Big Data
Hype AND Evolution
Some Vendors use it to remarket “old” stuff
Many “new” products/services
25. Big Data Workshop - DLD Summer 15
Tech Drivers
26. Big Data Workshop - DLD Summer 15
Drivers
Vendors
Hardware, Storage, Network, Software
Business
Mobile
Social
Customer Insights
Technology
Open Source Technology, Cloud Computing
27. Big Data Workshop - DLD Summer 15
The Elephant in the Room
28. Big Data Workshop - DLD Summer 15
Hadoop
- Hadoop is an open-source “Big Data” framework
- Distributed storage (HDFS) and processing (MapReduce)
- Reliable, fault tolerant
- Horizontal scalability from a single node to thousands of cluster nodes
- Cost: $2,500/TB vs. $250,000/TB in data warehouses
29. Big Data Workshop - DLD Summer 15
MapReduce
Programming Model/Framework for processing
large Data Sets
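The programming model can be illustrated with the classic word count, here as a single-process sketch in plain Python. It is not Hadoop code: the function names and the input are made up, and a real cluster would run the map and reduce tasks on different nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


documents = ["big data big business", "big hype"]  # made-up input
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at one record only and each reduce call at one key only, both phases can be spread across many machines without coordination.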
30. Big Data Workshop - DLD Summer 15
NoSQL Databases
Traditional RDBMS outdated for modern paradigms
- Big Data
- Connectivity
- Concurrency
- Diversity
- Cloud
31. Big Data Workshop - DLD Summer 15
The difference – SQL / Tables
33. Big Data Workshop - DLD Summer 15
Pros/Cons Hadoop / NoSQL
Pro
Highly flexible, agile, available, performant
Scalable
Modern, open technology with Commercial Support
Support for very large datasets on commodity
hardware
Cons
Immature
No Standardization - Schema-free means
Application needs to know how to retrieve data
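The last point can be made concrete with a small sketch. The two records below stand for documents in one schema-free collection; the field names are hypothetical and plain Python dictionaries stand in for the database.

```python
# Two documents in the same hypothetical "customers" collection.
# Nothing enforces a common schema, so the fields differ.
customers = [
    {"name": "Roland", "email": "roland@example.org"},
    {"name": "Anna", "contact": {"emails": ["anna@example.org", "a@example.org"]}},
]


def primary_email(doc):
    """Application code has to know every shape a document can take."""
    if "email" in doc:
        return doc["email"]
    emails = doc.get("contact", {}).get("emails", [])
    return emails[0] if emails else None


emails = [primary_email(doc) for doc in customers]
print(emails)
```

In a relational database the table definition would carry this knowledge; here it moves into every application that reads the data.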
34. Big Data Workshop - DLD Summer 15
Even more tools
Search/Index
Business Intelligence
Analytical Programming
Visualisation
35. Big Data Workshop - DLD Summer 15
Machine Learning
36. Big Data Workshop - DLD Summer 15
Big Data – Big Business?
37. Big Data Workshop - DLD Summer 15
Big Data Market
Big Data Market projected in 2015 – $125bn*
(in comparison Public Cloud - $95bn**)
Big Funding
Cloudera – $1.2bn
MongoDB – $300m
HortonWorks – $250m
DataStax – $190m
BIRST – $130m
* According to Forbes.com / 2014/12/11 / 6 Predictions for Big Data / IDC Research
** According to Forrester Research
38. Big Data Workshop - DLD Summer 15
Shares of Big Data Market
39. Big Data Workshop - DLD Summer 15
Vendors love Big Data
40. Big Data Workshop - DLD Summer 15
Vendors REALLY love Big Data!
Latest in Corporate Tech: In-Memory
Oracle Exalytics
SAP HANA
“Has SAP Bet The House With The Biggest
Update to its ERP in Two Decades?”
http://www.forbes.com/sites/greatspeculations/2015/03/04/has-sap-bet-the-house-with-the-biggest-update-to-its-erp-in-two-decades/
41. Big Data Workshop - DLD Summer 15
Even more Sales!!!
42. Big Data Workshop - DLD Summer 15
Best Practices DWH / BI / Big Data
Analyze problem / data / quality
Data Cleaning
Data quality initiatives
Sync Business / IT
Buy stuff
Implement stuff
Train users
Use governance / strategic approaches
43. Big Data Workshop - DLD Summer 15
And the success?
Through 2017, 60% of big data projects will fail to go
beyond piloting and experimentation and will be
abandoned.
Through 2017, fewer than half of lagging
organizations will have made cultural or business
model adjustments sufficient to benefit from big
data.
Through 2018, 90% of deployed data lakes will be
useless as they are overwhelmed with information
assets captured for uncertain use cases.
Gartner: Predicts 2015: Big Data Challenges Move From Technology to the Organization
44. Big Data Workshop - DLD Summer 15
Challenges
Usage Scenarios
Goals
Skills
Missing Data Scientists
Need to understand the Math
Technical
Data Integration
Privacy
Main discussion in Germany
45. Big Data Workshop - DLD Summer 15
Syncing
What's your opinion?
Do you have experience with the big vendors' offerings?
46. Big Data Workshop - DLD Summer 15
What's it all about?
48. Big Data Workshop - DLD Summer 15
What's it all about?
Data contains information of great business
value
If you can extract those insights you can make
far better decisions
Ultimately - Predicting the future
49. Big Data Workshop - DLD Summer 15
Common Use Cases
Customer Insights
Market Basket/Pricing optimization
Fraud Detection / Security Analytics
(Proactive) Monitoring
Sensor Data (IoT)
Data Warehouse Optimization
51. Big Data Workshop - DLD Summer 15
Understanding is important
[Figure: DIKW diagram with the axes “Connectedness” and “Understanding”: Data → (understanding relations) → Information → (understanding patterns) → Knowledge → (understanding principles) → Intelligence/Wisdom]
52. Big Data Workshop - DLD Summer 15
How do we get there?
53. Big Data Workshop - DLD Summer 15
Syncing
Has anyone heard of the “Semantic Web” or “ontologies”?
Does anyone have experience with, or projects around,
ontologies?
54. Big Data Workshop - DLD Summer 15
Mapping the territory
Enterprise Architecture (traditional)
“Holistic” Approach
Many “Best practices” and patterns
Big Data Discovery
Kind of Self-Service for Big Data
Next Big Thing?
Semantic Layer
Should exist from BI implementation (proprietary)
Or use modern approach “Linked Data”
61. Big Data Workshop - DLD Summer 15
Ontologies
“A Data Model that represents Knowledge
as a set of concepts within a domain and
the relationships between these concepts”
FOAF
Schema.org
DBpedia Ontology
GoodRelations
http://www.w3.org/wiki/Good_Ontologies
62. Big Data Workshop - DLD Summer 15
Triples
Representation of facts
Subject                  Predicate          Object
Roland                   is a (has type)    Person
http://about.me/rjudas   rdf:type           foaf:Person
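A minimal sketch of how such facts can be stored and queried, using plain Python tuples instead of an RDF library. Only the first triple is from the slide; the other two facts and the ex: prefix are illustrative assumptions.

```python
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("http://about.me/rjudas", "rdf:type", "foaf:Person"),  # from the slide
    ("http://about.me/rjudas", "foaf:name", "Roland"),      # assumed example
    ("http://about.me/rjudas", "ex:likes", "ex:DLD"),       # assumed example
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


persons = [s for s, _, _ in match(triples, predicate="rdf:type", obj="foaf:Person")]
print(persons)
```

Pattern matching with wildcards is the basic idea behind triple-store queries: every question is a triple with one or more positions left open.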
65. Big Data Workshop - DLD Summer 15
From Triples to Graphs
Roland --is a--> Person
Roland --likes--> DLD
Roland --plays--> Songs
Vertex / Node: Roland, Person, DLD, Songs
Edge: is a, likes, plays
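Turning triples into a graph is mechanical: subjects and objects become nodes, predicates become labelled edges. The sketch below uses the facts read off the slide's diagram (the exact edges are a best guess) and a plain adjacency list rather than a graph database.

```python
from collections import defaultdict

# Facts read off the slide's diagram; the exact edges are a best guess.
triples = [
    ("Roland", "is a", "Person"),
    ("Roland", "likes", "DLD"),
    ("Roland", "plays", "Songs"),
]

# Subjects and objects become nodes, predicates become labelled edges.
graph = defaultdict(list)
nodes = set()
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))
    nodes.update([subject, obj])

print(sorted(nodes))
for predicate, target in graph["Roland"]:
    print(f"Roland --{predicate}--> {target}")
```

Adding a new fact only appends an edge; no schema has to change, which is the main appeal of the graph model for linking data from different sources.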
66. Big Data Workshop - DLD Summer 15
Famous Examples
67. Big Data Workshop - DLD Summer 15
A pragmatic Approach
From the Basement
68. Big Data Workshop - DLD Summer 15
Bringing Pieces together
Semantic Graphs
Big Data
APIs
69. Big Data Workshop - DLD Summer 15
http://github.com/arago/ogit
71. Big Data Workshop - DLD Summer 15
Semantic Data Platform
72. Big Data Workshop - DLD Summer 15
Visualization
73. Big Data Workshop - DLD Summer 15
Use Cases from/beyond the IT Department
Ticket Statistics
Provider Management
Network Planning
Comparing Architectures
Forecasting Technological Trends
Data Center Planning
Application Migration
Technical Analysis for Business Processes
IT Organisation Insights
User Ranking
74. Big Data Workshop - DLD Summer 15
The right Mindset
Semantics
Graphs
APIs
“New” Big Data Tools
75. Big Data Workshop - DLD Summer 15
www.autopilot.co www.graphit.co www.tabtab.co
77. Big Data Workshop - DLD Summer 15
Image References and Licenses
Facebook Datacenter https://www.flickr.com/photos/intelfreepress/ License CC BY 2.0
Winery https://www.flickr.com/photos/joceykinghorn/ License CC BY-SA 2.0
BI Dashboard https://www.flickr.com/photos/ctsi-global/ License CC BY-SA 2.0
Dollars https://www.flickr.com/photos/amagill/ License CC BY 2.0
Old Timer Truck: https://www.flickr.com/photos/ell-r-brown/ License CC BY 2.0
SQL Designer https://www.flickr.com/photos/ejk/ License CC BY-SA 2.0
Crystal Ball https://www.flickr.com/photos/frogman2212/ License CC BY 2.0
MapReduce https://www.flickr.com/photos/lkaestner/ License CC BY-SA 2.0
Foaf https://www.flickr.com/photos/dullhunk/ License CC BY 2.0
Linked Open Data Richard Cyganiak and Anja Jentzsch License CC BY-SA 3.0
Rear-View Mirror https://www.flickr.com/photos/labyrinthx-2/ License CC BY-SA 2.0
Servers-8055_13.jpg https://commons.wikimedia.org/wiki/User:Victorgrigas License CC BY-SA 3.0
Watson https://commons.wikimedia.org/wiki/User:Clockready License CC BY-SA 3.0
Wolfram Alpha https://www.flickr.com/photos/morville/ License CC BY 2.0
Social_Network_Visualization MartinGrandjean http://www.martingrandjean.ch/wp-content/
Top Players
Commercial
Microsoft, Hyperion (Oracle), Cognos (IBM), Business Objects (SAP)
Open Source
Pentaho, Jedox
ACID: computing principle from the 70s. Transaction safety; isolation means concurrency control.
BI covers reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.
Dashboards
Drill-down
Data mining
Also Predictive
Challenge: unstructured data
Google published the GFS paper in 2003 and the MapReduce paper at the end of 2004
Doug Cutting, an engineer at Yahoo, implemented these ideas there as Hadoop
Top-level Apache Software Foundation project since 2008
Cassandra, MongoDB, HBase
Key / Value: e.g. Redis, MemcacheDB, etc.
Column: e.g. Cassandra, HBase, etc.
Document: e.g. MongoDB, Couchbase, etc.
Graph: e.g. OrientDB, Neo4j, etc.
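To make the four families tangible, the sketch below shapes one hypothetical order for each of them. Plain Python structures stand in for the databases; no client library is used and all names and values are invented.

```python
# One hypothetical order, shaped for each of the four NoSQL families.

# Key/value: an opaque value looked up by a single key.
key_value = {"order:1001": '{"customer": "Roland", "total": 42.0}'}

# Column family: row key -> column family -> column -> value.
column = {"1001": {"customer": {"name": "Roland"}, "billing": {"total": 42.0}}}

# Document: a nested, self-describing record.
document = {
    "_id": 1001,
    "customer": {"name": "Roland"},
    "items": [{"article": "Book", "qty": 2, "price": 21.0}],
}

# Graph: nodes plus labelled edges between them.
graph_nodes = {"Roland", "Order 1001", "Book"}
graph_edges = [
    ("Roland", "placed", "Order 1001"),
    ("Order 1001", "contains", "Book"),
]

document_total = sum(i["qty"] * i["price"] for i in document["items"])
print(document_total)
```

The data is the same in all four cases; what differs is which access path is cheap: lookup by key, scan by column, retrieval of a whole nested record, or traversal along relationships.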
Solr: enterprise-grade search/index server
Elasticsearch: search/index server
Pentaho: data integration / business / Big Data analytics; Jaspersoft: reporting/analytics
R: statistical programming language; Revolution Analytics was acquired by Microsoft in 2015
Python: programming language; pandas: data analysis library
Gephi: graph visualization tool
D3: JavaScript library for visualization
More Tools at
http://www.datamation.com/data-center/50-top-open-source-tools-for-big-data-1.html
Apache Mahout: Highly Scalable Machine Learning Framework based on Hadoop
Apache Spark: Cluster-Computing Framework adding SQL, R-Query and ML to Big Data Stores/Databases
Azure ML: Microsoft Machine Learning as a Service
Resources:
https://www.udacity.com/course/intro-to-machine-learning--ud120
http://alex.smola.org/drafts/thebook.pdf
http://www.forbes.com/sites/gilpress/2014/12/11/6-predictions-for-the-125-billion-big-data-analytics-market-in-2015/
For comparison: public cloud $95bn in 2015 according to Forrester Research
http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017
Limited functionality, expensive: 1 TB / $1m/yr / Cloud $0.6m
So far, it appears to be limited to 1 TB in size and to analytic workloads, and doesn’t support mission-critical scenarios, but it’s fair to assume that SAP is working on extending this.
https://blogs.saphana.com/2014/03/06/a-no-brainer-the-tco-of-hana-cloud-platform-vs-on-premise/
Data: symbols
Information: data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions
Knowledge: application of data and information; answers "how" questions
Understanding: appreciation of "why"
Wisdom: evaluated understanding.
http://www.systems-thinking.org/dikw/dikw.htm
Gartner Says Big Data Disruptions Can Be Tamed With Enterprise Architecture
http://www.gartner.com/newsroom/id/1986015
Data wrangling
Analytical latencies:
1. Data access
2. Data preparation
3. Model development
4. Execution
5. Implementation
6. Model audit & update
This is where the rubber meets the road: Speed = Value