The presentation compares Data Lakes with classical DWHs. Topics like schema-on-read, schema-on-write, security, JSON, data modeling, data integration are covered.
Are Data Lakes the new Core DWHs?
1. A company of Daimler AG
ARE DATA LAKES THE NEW CORE DWHS?
ANDREAS BUCKENHOFER, DAIMLER TSS
ORACLE DATA VISION - NEUSS 2017
DOAG BIG DATA, REPORTING, GEODATA DAYS - KASSEL 2017
3. DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE.
We're a specialist and strategic business partner for innovative IT Solutions within Daimler –
not just another supplier!
As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an
innovative and technological lead.
With our outstanding technological and methodical know-how we are a competent provider of
services that help those who benefit from them to stand out from the competition. When it
comes to demanding IT questions we create impetus, especially in the core fields car IT and
mobility, information security, analytics, shared services and Digital Customer Experience.
Are Data Lakes the new Core DWHs?Daimler TSS GmbH 3
TSS 2020: ALWAYS ON THE MOVE.
4. LOCATIONS
Daimler TSS China
Hub Beijing
6 Employees
Daimler TSS Malaysia
Hub Kuala Lumpur
38 Employees
Daimler TSS India
Hub Bangalore
16 Employees
Daimler TSS Germany
More than 1000 Employees
Ulm (Headquarters)
Stuttgart Area
Böblingen, Echterdingen,
Leinfelden, Möhringen
Berlin
Karlsruhe
6. DIGITIZATION – DATA AS AN ASSET FOR ANALYTICAL DECISIONS
• Software is becoming more and more important
  • 100 million lines of code
• Physical products
  • are significantly enhanced with digital service capabilities, e.g. the value of the car comes increasingly from digital assets
  • become digital services, e.g. car2go
• IoT, robotics, etc.
Source image: https://www.linkedin.com/pulse/20140626152045-3625632-car-software-100m-lines-of-code-and-counting
7. DWH AS INTEGRATION SYSTEM FOR DIGITAL ASSETS: SOME OF TODAY’S MAIN CHALLENGES
Agility
• Is the organization ready? IT (Dev + Ops) and Business
Flexibility
• Data modeling under pressure, model as you go
• New data formats coming from logs, sensors, etc.
Performance
• Right time
• Scale to high volumes
• Integrate data arriving at high speed
8. IS THE DATA WAREHOUSE DEAD? AND ETL, TOO?
Sources: https://www.linkedin.com/groups/45685/45685-6224210695295168512?trk=hp-feed-group-discussion&_mSplash=1
https://speakerdeck.com/nehanarkhede/etl-is-dead-long-live-streams
https://gcn.com/blogs/reality-check/2014/01/hadoop-vs-data-warehousing.aspx
10. REFERENCE DATA WAREHOUSE ARCHITECTURE
[Diagram: reference data warehouse architecture]
Sources: internal data sources (OLTP), external data sources
Areas: Backend, Frontend
Layers: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer, Reporting Layer)
Cross-cutting: Metadata Management, Security, DWH Manager
Data Warehouse: subject-oriented, integrated, time-variant, non-volatile
12. (Image slide; terms shown:)
Data Lake on Hadoop
Data Swamp
Data Reservoir
Landing Zone
Data Library
Data Repository
Data Archive
Data Lake on Spark
Data Lake 3.0
13. DATA LAKE REFERENCE ARCHITECTURE
DATA LAKE OVERALL ARCHITECTURE VS DATA LAKE LAYER
[Diagram: data lake reference architecture]
Layers: Landing Zone, Data Lake, Data Reservoir / Presentation
Cross-cutting: Data Governance, Metadata Management, Data Archival, Data Security
14. DATA LAKE REFERENCE ARCHITECTURE
[Diagram: data lake reference architecture with example components]
Layers: Landing Zone, Data Lake, Data Reservoir / Presentation
Cross-cutting: Data Governance, Metadata Management, Data Archival, Data Security
Components shown: Sources, Firewall (twice), Sqoop, Kafka, Knox, REST API, ODBC/JDBC, RESTful client
15. DATA LAKE VS HADOOP
• Data Lake: architecture, concept
• Hadoop, Spark, Elastic Stack: tools (that can be used to implement a Lake)
16. HOW TO STRUCTURE THE DATA LAKE? SCHEMA-LESS REVOLUTION?
• Data has a structure: schema-less does not exist
• You apply
  • schema-on-read, e.g. copy files (csv, json, html, …) into HDFS
  • schema-on-write, e.g. create table on data files in HDFS
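The difference between the two approaches can be sketched in a few lines of plain Python. This is only an illustration, not the deck's tooling: the CSV content, the SCHEMA dictionary and all function names are hypothetical, and in-memory lists stand in for HDFS files and tables.

```python
import csv
import io

RAW_CSV = "vin,speed_kmh\nWDB111,87\nWDB222,fast\n"  # hypothetical sensor export

# Hypothetical schema: column name -> converter applied to each value.
SCHEMA = {"vin": str, "speed_kmh": int}


def write_raw(store, raw_text):
    """Schema-on-read, write side: keep the file content unchanged."""
    store.append(raw_text)


def read_with_schema(store, schema):
    """Schema-on-read, read side: every reader applies the schema itself."""
    rows, problems = [], []
    for raw_text in store:
        for record in csv.DictReader(io.StringIO(raw_text)):
            try:
                rows.append({col: conv(record[col]) for col, conv in schema.items()})
            except (KeyError, ValueError):
                problems.append(record)
    return rows, problems


def write_with_schema(table, raw_text, schema):
    """Schema-on-write: convert and check before storing; reject bad records."""
    rejected = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        try:
            table.append({col: conv(record[col]) for col, conv in schema.items()})
        except (KeyError, ValueError):
            rejected.append(record)
    return rejected


raw_store, typed_table = [], []
write_raw(raw_store, RAW_CSV)                               # always succeeds
rows, problems = read_with_schema(raw_store, SCHEMA)        # errors surface at read time
rejected = write_with_schema(typed_table, RAW_CSV, SCHEMA)  # errors surface at load time
```

In both cases the record with the non-numeric speed is caught; what differs is when it is caught and who has to deal with it.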
17. SCHEMA-ON-READ
Flexibility
• For whom? Writing the data vs reading the data
Simplicity
• For whom? Writing the data vs reading the data
• Human mistakes while trying to read the data
Agility / Model as you go
• Just copy files into the directory
18. LAMBDA ARCHITECTURE
AN EARLY COMPREHENSIVE BIG DATA ARCHITECTURE
Source image: Nathan Marz, James Warren: Big Data: Principles and best practices of scalable realtime data systems, Manning Publications 2015
• The complexity of the Lambda architecture is debatable
• More interesting is the authors’ view on data
  • Rawness: store the data as it is. No transformations.
  • Immutability: don’t update or delete data, just add more.
  • Graph-like schema recommended
19. LAMBDA ARCHITECTURE
“Many developers go down the path of writing their raw data in a schemaless format like JSON. This is appealing because of how easy it is to get started, but this approach quickly leads to problems. Whether due to bugs or misunderstandings between different developers, data corruption inevitably occurs”
(see page 103, Nathan Marz, “Big Data: Principles and best practices of scalable realtime data systems”, Manning Publications)
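A minimal sketch of what immutable raw data with an enforced schema could look like. The Reading and AppendOnlyLog names are hypothetical, and the frozen dataclass merely stands in for a serialization format with an enforceable schema; it is not the implementation described in the book.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical raw fact; frozen so a stored record cannot be modified."""
    sensor_id: str
    recorded_at: str  # ISO 8601 timestamp, kept as delivered
    value: float

    def __post_init__(self):
        # Enforce the schema when the record is created, not when it is read.
        if not isinstance(self.sensor_id, str) or not self.sensor_id:
            raise ValueError("sensor_id must be a non-empty string")
        if isinstance(self.value, bool) or not isinstance(self.value, (int, float)):
            raise ValueError("value must be numeric")


class AppendOnlyLog:
    """Data set that only grows: there is no update and no delete operation."""

    def __init__(self):
        self._records = []

    def append(self, reading: Reading) -> None:
        self._records.append(reading)

    def snapshot(self) -> Tuple[Reading, ...]:
        return tuple(self._records)


log = AppendOnlyLog()
log.append(Reading("s1", "2017-03-09T10:00:00", 3.7))
log.append(Reading("s1", "2017-03-09T10:05:00", 3.6))  # a newer value is a new fact
```

A malformed record is rejected at creation time, so it never reaches the log, and existing records cannot be altered afterwards.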
20. STRUCTURING THE DATA LAKE: DATA SECURITY
Just dumping data into the Lake?
• General Data Protection Regulation, e.g. Privacy by Design
• The vehicle identifier (VIN) is already sensitive data that needs to be protected (anonymized) depending on usage
• Earmarked use of data
Schema-on-read: How do you protect data assets if you are not aware that the data exists or where it exists?
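One possible way to protect a VIN before data is made available for analysis is a keyed hash. The sketch below is an assumption for illustration only: function names, field names, the key and the sample VIN are hypothetical, and a keyed hash is pseudonymization rather than full anonymization, so whether it is sufficient depends on the usage.

```python
import hashlib
import hmac


def pseudonymize_vin(vin: str, secret_key: bytes) -> str:
    """Replace a VIN by a keyed hash so records stay joinable without exposing the VIN."""
    normalized = vin.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def protect_record(record: dict, secret_key: bytes, sensitive_fields=("vin",)) -> dict:
    """Return a copy of the record with the listed fields pseudonymized."""
    protected = dict(record)
    for field in sensitive_fields:
        if field in protected:
            protected[field] = pseudonymize_vin(str(protected[field]), secret_key)
    return protected


key = b"demo-key-do-not-use"  # hypothetical; a real key belongs in a secret store
raw = {"vin": "WDB1234567890ABCD", "battery_capacity_ah": 58.2}
safe = protect_record(raw, key)
```

This only works if the sensitive fields are known in advance, which is exactly the difficulty with schema-on-read raised on the slide.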
21. DATA LAKE REFERENCE ARCHITECTURE
[Diagram: data flow through the data lake reference architecture]
Layers: Landing Zone, Data Lake, Data Presentation
Cross-cutting: Data Governance, Metadata Management, Data Archival, Data Security
Flow labels: load, structure, transform, access; archive (three times)
Annotations: raw data; temporary storage; immutable, modeled data; tool neutral; structured data for fast access
22. DATA LAKE HAS LAYERS (1): DATA LAKE AS CONCEPT VS DATA LAKE AS LAYER
Distinguish Data Lake as overall concept vs Data Lake as a layer
• Landing Zone
  • Source data programmatically loaded
  • Data is partitioned for processing
  • Governance includes catalog and ILM (Security, Retention)
• Data Lake
  • Lightly integrated by keys
  • Data accessible via SQL-on-Hadoop or using SerDes on raw data
  • Data is partitioned for access
  • Governance includes catalog, ILM, lightweight model
23. DATA LAKE HAS LAYERS (2)
• Presentation Zone
  • Data is structured and partitioned/tuned for data access
  • Full governance including e.g. catalog, ILM, model
  • Known schema including metadata about tables and columns
  • Lineage
  • Documented quality
24. GOVERNANCE BY DAIMLER AG / COE
E.G. SAMPLE HDFS LAYOUT
/
scripts
data
Source_system
Landing_zone
scripts
data
Source_system
Data_archive
scripts
data
Source_system_object
Data_lake
model
data
Data_science_results
scripts
data
Use_case
Data_reservoir
scripts
data
Data_science_sandbox
26. USE CASES
WHAT IS THE BUSINESS PROBLEM TO SOLVE?
Source:http://www.azquotes.com/
27. USE CASE: ANALYSIS BATTERY AGING
[Chart: max capacity vs current capacity]
• CSV data ingested into HDFS, Hive tables on files
• Identify breaks (“> 8h”) and compute current drain
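The break detection can be sketched as follows, assuming a "break" means a gap of more than 8 hours between two consecutive readings and the drain is the capacity difference across that gap. The function name, the tuple layout and the sample values are hypothetical; the deck itself works with Hive tables on CSV files.

```python
from datetime import datetime, timedelta


def find_breaks(readings, min_break=timedelta(hours=8)):
    """Return (start, end, drain) for every gap between readings longer than min_break.

    readings: list of (timestamp, capacity) tuples sorted by timestamp.
    drain: capacity before the break minus capacity after it.
    """
    breaks = []
    for (t_prev, cap_prev), (t_next, cap_next) in zip(readings, readings[1:]):
        if t_next - t_prev > min_break:
            breaks.append((t_prev, t_next, round(cap_prev - cap_next, 3)))
    return breaks


readings = [
    (datetime(2017, 3, 1, 16, 0), 60.0),
    (datetime(2017, 3, 1, 18, 0), 52.5),  # last reading before the break
    (datetime(2017, 3, 2, 7, 30), 52.1),  # first reading after a 13.5 h break
    (datetime(2017, 3, 2, 9, 0), 50.0),
]
breaks = find_breaks(readings)
```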
28. STRUCTURING THE DATA LAKE: NEW DATA SOURCES – SENSOR DATA
• Sensor data formats change without notice
  • Sensors are regularly updated with new versions
  • Names of metrics may change
  • Sensors with various versions are in the field
  • Sensors come from different suppliers
  • Often many fields (>> 100), increasing with new sensor versions
• Easy storing of data in HDFS and applying schema later
• Data from robots, vehicles, …
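Applying the schema later usually means mapping the field names of each sensor version onto one canonical set. A minimal sketch under assumed names: the suppliers, versions, field names and the FIELD_MAPPINGS table are all hypothetical.

```python
# Hypothetical mapping: (supplier, firmware version) -> {source field: canonical field}
FIELD_MAPPINGS = {
    ("supplier_a", "1.0"): {"temp": "temperature_c", "ts": "recorded_at"},
    ("supplier_a", "2.0"): {"temperature": "temperature_c", "timestamp": "recorded_at"},
    ("supplier_b", "1.0"): {"tmp_c": "temperature_c", "time": "recorded_at"},
}


def normalize(record: dict) -> dict:
    """Rename known fields to canonical names; keep unknown fields under 'unmapped'."""
    key = (record.get("supplier"), record.get("version"))
    mapping = FIELD_MAPPINGS.get(key)
    if mapping is None:
        raise KeyError(f"no field mapping registered for {key}")
    normalized, unmapped = {}, {}
    for name, value in record.items():
        if name in ("supplier", "version"):
            continue
        if name in mapping:
            normalized[mapping[name]] = value
        else:
            unmapped[name] = value  # new fields show up here instead of being lost
    normalized["unmapped"] = unmapped
    return normalized


old = normalize({"supplier": "supplier_a", "version": "1.0",
                 "temp": 21.5, "ts": "2017-03-01T08:00:00"})
new = normalize({"supplier": "supplier_a", "version": "2.0", "temperature": 21.7,
                 "timestamp": "2017-03-01T08:05:00", "humidity": 40})
```

An unregistered version fails loudly rather than being silently misread, which makes a format change without notice visible at the first load.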
29. STRUCTURING THE DATA LAKE: “SCHEMA-ON-READ”
• Sensor data formats change without notice
• Time-consuming and error-prone data integration into the Data Lake
• Therefore, preparation of data for usage in the Data Reservoir is required: “Data Engineer”
[Diagram: Landing Zone, Data Lake and Data Reservoir with Data Governance, Metadata Management, Data Archival and Data Security; labels: csv, sampling / filter, Hive tables, structure, Hive tables, R, Python]
30. USE CASE: OPTIMIZE CYCLE TIME FOR LIGHTWEIGHT ROBOTS
• JSON data from Orient NoSQL-DB ingested into HDFS, Hive tables on files
• Partly automate the diagnosis of anomalies (e.g. the identification of reasons for idle times)
31. USE CASE: BOM EXPLOSION
HADOOP COMPUTING POWER
32. USE CASE: BOM EXPLOSION, HADOOP COMPUTING POWER
• PLMXML files supplied by source systems
• Compute changes by comparing last BOM with current BOM
• Data Lake contains data across all tiers
• Data Reservoir contains “dedicated, secured” views for tiers
• Transfer changes to local relational DBs
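Comparing the last BOM with the current one amounts to a keyed set difference. A minimal sketch, assuming each BOM snapshot has already been parsed into a dictionary keyed by (parent part, child part) with the quantity as value; the function name and sample parts are hypothetical, and parsing PLMXML is out of scope here.

```python
def diff_bom(previous: dict, current: dict) -> dict:
    """Compare two BOM snapshots keyed by (parent part, child part) -> quantity."""
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: (previous[k], current[k])
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


last_bom = {("truck", "axle"): 2, ("axle", "wheel"): 2, ("truck", "cabin"): 1}
current_bom = {("truck", "axle"): 3, ("axle", "wheel"): 2, ("truck", "battery"): 1}
changes = diff_bom(last_bom, current_bom)
```

Only the resulting changes would then need to be transferred to the local relational databases.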
33. STRUCTURING THE DATA LAKE LAYER: EXISTING INTERNAL DATA FOR ANALYTICS
• Several stakeholders, e.g. different (independent) truck units
• Dumping existing systems (or new data sources like logs) into the Data Lake
• Data is available fast, but
  • Different data models
  • No integration: if ETL is reduced to EL, then the T is performed by Data Scientists many times
• Some lightweight data integration required: Data Vault
34. STRUCTURING THE DATA LAKE LAYER: DATA VAULT CHALLENGES WITH HADOOP
• Hub and Link tables: how to ensure uniqueness?
  • No unique constraints or indexes as in an RDBMS
  • Use a view with DISTINCT or GROUP BY on the Hub or Link table
  • Don’t create a Hub or Link table. Create a view with DISTINCT or GROUP BY on the original persisted incoming files
  • Use the HBase NoSQL wide-column store for Hub, Link (+ Sat) and Phoenix for SQL access via Hive
  • Hub and Link in RDBMS only
• Data Reservoir needs a different structure, or export data into a Data Mart in an RDBMS for faster access
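The view-based option can be illustrated with a small sketch. SQLite (via Python's sqlite3 module) is used here only as a stand-in for a SQL-on-Hadoop engine such as Hive; the table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the persisted incoming files.
conn.execute("CREATE TABLE incoming_vehicle (vin TEXT, load_ts TEXT, record_source TEXT)")
conn.executemany(
    "INSERT INTO incoming_vehicle VALUES (?, ?, ?)",
    [
        ("VIN001", "2017-03-01", "plant_a"),
        ("VIN001", "2017-03-02", "plant_a"),  # same business key delivered twice
        ("VIN002", "2017-03-02", "plant_b"),
    ],
)
# No unique constraint available: the Hub is a view that returns one row per
# business key, with the earliest load timestamp as first-seen date.
conn.execute(
    """
    CREATE VIEW hub_vehicle AS
    SELECT vin, MIN(load_ts) AS load_ts
    FROM incoming_vehicle
    GROUP BY vin
    """
)
hub_rows = conn.execute("SELECT vin, load_ts FROM hub_vehicle ORDER BY vin").fetchall()
```

The trade-off is that uniqueness is computed on every read instead of being enforced once on write.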
35. DATA LAKE IN ANALOGY TO AN ENTERPRISE DWH?
• Vision: One central Enterprise DWH
• Reality for many organizations: Many DWHs
  • more flexible
  • acquisition of companies. Merge of systems?
  • units with different (innovation) speeds and different interests, e.g. trucks (Mercedes-Benz Trucks, Freightliner, Fuso, BharatBenz, Western Star, Fleetboard)
  • legal requirements (e.g. data export)
• Vision: One central Data Lake
• Reality: ?
36. “The long-term vision was clear –
the data warehouse should not be confined physically to a single
database or machine” (09-MAR-2017)
BARRY DEVLIN – LOGICAL DATA WAREHOUSE
Source: https://upside.tdwi.org/articles/2017/03/09/making-the-most-of-a-logical-data-warehouse.aspx
Barry Devlin wrote the first published article describing a data warehouse
architecture in 1988 ( http://www.9sight.com/1988/02/art-ibmsj-ebis/ )
38. WHY DATA MODELING?
“Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”
• Learn about the data and promote collective data understanding
• Derive security classification and measures
• Design for performance
• Accelerate development
• Improve software quality
• Reduce maintenance costs
• Generate code
• NoSQL schema-on-read: understand model versions after years
Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014
39. DWH AND DATA LAKE
DWH on RDBMS: methods, concepts, techniques
• Slowly Changing Dimension
• ELT vs ETL
• 3-Layer vs 2-Layer
• Kimball Approach
• Inmon Definition
• Star Schema
• Data Vault
• Anchor Modeling
• etc.
Data Lake on Hadoop: tools, tools, tools
• Schema-on-Read
• Agility
• Parquet
• Hive
• HBase
• SQL-on-Hadoop
• Impala
• Oozie
• ZooKeeper
40. NO DATA INTEGRATION - IS ETL DEAD? DATA SCIENCE REQUIRES PROPER DATA ENGINEERING
Many ETL problems are home-made, e.g.
• Inefficient: ETL vs ELT / row-based vs set-based
• Expensive: repetitive tasks should be accomplished with generators
“Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms – it’s the data collection and labeling.”
Source: https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.ywjvuca6z (Luke de Oliveira)
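The row-based versus set-based point can be made concrete with a small sketch. SQLite stands in for the warehouse database, and the table and column names are hypothetical. Both functions produce the same result; the row-based variant issues one statement per row from the client, while the set-based variant lets the database transform the whole set in a single statement.

```python
import sqlite3


def setup():
    """Create a hypothetical staging table with 1000 orders and an empty core table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_orders (order_id INTEGER, amount_cents INTEGER)")
    conn.execute("CREATE TABLE core_orders (order_id INTEGER, amount_eur REAL)")
    conn.executemany(
        "INSERT INTO staging_orders VALUES (?, ?)",
        [(i, i * 150) for i in range(1, 1001)],
    )
    return conn


def load_row_based(conn):
    """Fetch all rows, transform each in the client, and insert them one by one."""
    rows = conn.execute("SELECT order_id, amount_cents FROM staging_orders").fetchall()
    for order_id, cents in rows:
        conn.execute("INSERT INTO core_orders VALUES (?, ?)", (order_id, cents / 100.0))


def load_set_based(conn):
    """One INSERT ... SELECT: the database transforms the whole set."""
    conn.execute(
        "INSERT INTO core_orders "
        "SELECT order_id, amount_cents / 100.0 FROM staging_orders"
    )


row_conn, set_conn = setup(), setup()
load_row_based(row_conn)
load_set_based(set_conn)
query = "SELECT COUNT(*), SUM(amount_eur) FROM core_orders"
row_result = row_conn.execute(query).fetchone()
set_result = set_conn.execute(query).fetchone()
```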
41. IS THE CLASSICAL DWH DEAD? ARE DATA LAKES THE NEW CORE DWHS?
Data Lakes currently focus too much on tools instead of concepts and methods
• Tools come and go
• Flexibility / schema-on-read: integration is just postponed to the Data Reservoir or, in the worst case, even later to the end user
PoCs vs production-ready implementation
• Many tools, but still low-productivity tools (Oozie, etc.)
• Error handling is a coding nightmare across tools
Data Lakes and Core DWHs will coexist
• Another choice that makes sense for many use cases
• DWH: e.g. the Data Vault 2.0 architecture, which stores raw data and postpones data cleansing / harmonization for lightweight data integration, follows similar ideas
42. Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
tss@daimler.com / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
THANK YOU
43. GARTNER DATA LAKE ARCHITECTURE STYLES
Source: http://blogs.gartner.com/nick-heudecker/data-lake-webinar-recap/
44. GARTNER DATA LAKE ARCHITECTURE STYLES
• Inflow Lake: accommodates a collection of data ingested from many different sources that are disconnected outside the lake but can be used together by being colocated within a single place
• Outflow Lake: a landing area for freshly arrived data available for immediate access or via streaming. It employs schema-on-read for the downstream data interpretation and refinement.
• Data Science Lab: most suitable for data discovery and for developing new advanced analytics models
Source: http://blogs.gartner.com/nick-heudecker/data-lake-webinar-recap/ and https://www.asug.com/news/gartner-separate-data-lakes-myths-from-facts-before-you-dive-in
45. Slide 12: Creative Commons Licence, Hernán Piñera
https://www.flickr.com/photos/hernanpc/7175577368/in/photolist-bW5Hab-JF9HNW-a2LHAF-pwWNjx-oC1Jq8-noeV4d-oLsHUa-gUjhFx-qNB2Sw-jKLDCR-DB3B8-pRUpx2-crB6A7-nTUuNp-cXdPgN-
bX7mA4-7oHeKJ-arQCtK-njdhWh-nSadX3-dykooG-sjSZHV-eq69Ux-oW44NF-i2eUbE-5AyaGL-QkmoFh-nU7KcU-QEG6Nf-oziZ4t-oUbQi4-e2NWAT-i3Yna1-eJchKZ-pGC8eC-GDux8r-5FQt95-cWdzfh-ciwtqL-
jQg8BL-4X83Uc-nBZXBA-nogVER-oekb6A-9F7w4M-jKPnYQ-bAGrjd-qNB4Hq-8gJRqp-ahC2fg
Slide 47: Creative Commons Licence, James Loesch
https://www.flickr.com/photos/jal33/5182574275/in/photolist-8TY3LT-7M8Fb9-4jWYv1-hrdbHV-4jSWSn-6cHmvc-m4NnDV-s9Efoy-ccFCcW-5t3Csw-8R87fq-mT6WNq-89mMuL-pzzDjq-2iq7ti-bBA7PT-
rjPdnX-buU2V9-aottwt-4zHTZv-mT6gA6-5hLzzx-9aWGiZ-s9DJRY-jwfgr3-7WZA75-bVmho1-bXkF7U-9aWGba-3mJSwv-sa4Esa-4jWZaA-aottqr-8bj7rS-5NiZbm-oowJXV-3vp25c-5t3EkQ-NnLMaJ-naLPJm-
m78nWk-nqnUYk-mT7Wso-o54T1J-bVmgA9-emeyU1-5hQFV5-akhQQL-naLDim-pPeh93
IMAGE ATTRIBUTION
46. DWH = inflexible development, bad performance, complex architecture with 3 layers
47. SCHEMA-ON-READ OR WHY MODELING CAN STILL BE USEFUL
• Failure to talk to business to obtain proper requirements
• Ingestion of wrong data
• Storage of data with errors
• Business Keys (independent object) nested into document
• Read performance
48. SCHEMA-ON-READ OR WHICH BUSINESS PROBLEMS ARE SOLVED
Business problem | Schema-on-read | Remark
Data storage | Yes, flexible | Store data from various systems
Data integration | No | Integrate data from various systems. Has to be done during each access by each user
Data historization | Yes, auditable | Stamp data with timestamp
Information delivery | No | Turn data into valuable information. Has to be done during each access by each user
49. DATA MODELS IN THE DWH
Layer | Characteristics | Data Model
Staging Layer | Temporary storage; ingest of source data | Normally 1:1 copy of source table structure, usually without constraints and indexes
Core Warehouse Layer | Historization / bitemporal data; integration; tool-independent; non-redundant data storage | 3NF with historization; Head and Version modelling; Data Vault; Anchor modeling; Dimensional model with historization (possible)
Data Mart Layer | Performance for end user queries required; tool-dependent; lots of joins necessary to answer complex questions | Flat structures, esp. dimensional model (ROLAP / MOLAP / HOLAP)
50. WHY MODEL?
• Understand business requirements
• Understand problem space
• Design solution space
• Think ideas (incl. alternatives) through
51. MAKE SQL GREAT AGAIN OR WHY SQL ON BIG DATA?
• SQL is the universal language to access and manipulate data in an RDBMS
• SQL is a language not only for DBAs or developers
• SQL is the standard for OLTP and OLAP, especially for BI tools
52. STRATA 2012 VS 2016
Source: http://www.cazena.com/blog/strata-word-cloud-2012-vs-2016-data-lakes-spark-real-time-and-other-trends
53. ATLAS FOR METADATA MANAGEMENT
• Architecture with Atlas
• Supports the classical tools:
  • Hive
  • Sqoop
• HDFS?
• Schema-on-read?
54. NO DATA INTEGRATION NECESSARY OR WHO REALLY UNDERSTANDS DATA MODELS?
• 3NF is inefficient for query processing
• 3NF models are difficult to understand
• 3NF gets even more complicated with history added
• Many ways from person to order
Source: Corr / Stagnitto: Agile Data Warehouse Design, DecisionOne Press, 2011, page 5
55. WHY DATA MODELING?
“Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”
• Learn about the data and promote collective data understanding
• Derive security classification and measures
• Design for performance
• Accelerate development
• Improve software quality
• Reduce maintenance costs
• Generate code
• NoSQL schema-on-read: understand model versions after years
Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014
“Expanding your modeling skills enables you to reduce documentation.” (Scott Ambler)
56. DIMENSIONAL MODELING
• Standard approach in Data Marts in DWH
• Not just for performance reasons
  • Performance is also an issue on Hadoop-based systems, e.g. Hive, Spark
  • Joins!
• But also due to understandability for end users
  • Understandability is also an issue on Hadoop-based systems
57. IMPORTANCE OF STRONG SCHEMA @GOOGLE
“A prime motivation for this evolution towards a more “database-like” system was driven by the experiences of Google developers trying to build on previous “key-value” storage systems. The prototypical example of such a key-value system is Bigtable, which continues to see massive usage at Google for a variety of applications. However, developers of many OLTP applications found it difficult to build these applications without a strong schema system, cross-row transactions, consistent replication and a powerful query language.”
Source: https://research.google.com/pubs/pub46103.html
58. HADOOP VS CLASSIC DWH: SQL APPROACH
Aspect | Classic DWH | Hadoop
Tables | Yes | Yes
SQL language | Yes | Yes, SQL-on-Hadoop
Query Optimizer | Yes | Yes
Indexes, PKs | Yes | No
Data “Owner” | Proprietary RDBMS | Open data format; access by many engines like Spark, Hive; many open formats like Parquet, Avro
Metadata dictionary | User data + dictionary in RDBMS | User data and dictionary (“Hive Metastore”) separate
59. STRUCTURING THE DATA LAKE
New data sources
• Sensors, logs, NoSQL, etc. as data source
• Schema-on-read is useful as sensor data formats change frequently
Existing internal data
• Dump RDBMS exports into the Data Lake for data analytics
• Schema-on-read does not make any sense as the data is already in a documented data model