Copyright 2021 Hired Brains Research and Neil Raden
Data Lake/Lakehouse/Cloud Data
Warehouse: Which is Real?
By Neil Raden, Founder Hired Brains Research
Data Lake:
Hadoop (and, later, other cloud storage options) was indifferent to the size
and type of files it could process, in contrast to the rigid and far less
scalable nature of relational data warehouses. From that came the idea of a single place for
everything – the data lake. In truth, it was a concept hatched by the Hadoop
distributors to sell more licenses. Though it did simplify searching for and locating files,
it provided no analytical processing tools at all. Moving a JSON file from
Paris, France, to a Paris, Texas cloud location adds no value except for some economies
of scale in storage and processing.
The data lake collects raw data, thousands, perhaps millions of files. This is
posited as a benefit. But is it really? At a certain level, raw data is an
oxymoron. We can't triangulate data to see if it's consistent with other
instances of the same phenomenon or event. "Raw data" typically implies it is
to be used for a particular purpose, and it is the beginning point for drawing
inferences and conclusions. The context of data (why, how, and
when it was recorded, and by what method it was collected and then transformed)
is essential. Context-free data simply does not exist. The perfect objectivity we
assign to "raw data" is a myth. That's why in data warehousing, we attempted
to integrate and rationalize things.
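To make the point about context concrete, here is a minimal sketch in Python. Every name in it (Observation, comparable, consistent, and the sample readings) is hypothetical and invented for illustration; none comes from a product or from the quoted sources. It assumes that two readings can be triangulated only when they share a unit and a collection method.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """A recorded value together with the context that gives it meaning."""
    value: float
    unit: str              # what the number measures
    recorded_at: datetime  # when it was recorded
    method: str            # how it was collected
    purpose: str           # why it was collected


def comparable(a: Observation, b: Observation) -> bool:
    """Assumption: readings are comparable only if unit and method match."""
    return a.unit == b.unit and a.method == b.method


def consistent(a: Observation, b: Observation, tolerance: float) -> bool:
    """Triangulate two readings; refuse when the shared context is missing."""
    if not comparable(a, b):
        raise ValueError("observations lack a shared context and cannot be compared")
    return abs(a.value - b.value) <= tolerance


# Hypothetical sample readings, made up for this sketch.
now = datetime(2021, 6, 1, 9, 0, tzinfo=timezone.utc)
paris_morning = Observation(21.5, "celsius", now, "roof sensor", "climate control")
paris_recheck = Observation(21.9, "celsius", now, "roof sensor", "climate control")
texas_reading = Observation(70.7, "fahrenheit", now, "handheld probe", "audit")
```

Without the unit and method fields, 21.5 and 70.7 are just two numbers in two files, and nothing in the files says whether they agree.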
Industry analyst Andrew Brust, in "Big on Data," quotes George Fraser, CEO of
Fivetran:
"I think 2021 will reveal the need for data lakes in the modern data stack is
shrinking...there are no longer new technical reasons for adopting data lakes because
data warehouses that separate compute from storage have emerged." If that's not
categorical enough for you, Fraser sums things up thus: "In the world of the modern
data stack, data lakes are not the optimal solution. They are becoming legacy
technology."
For organizations that lack cloud-native data warehouses that separate compute from
storage or even lack a cloud strategy, that is something of an oversimplification. The
calculation of costs of hybrid-cloud, multi-cloud, separation of storage from
compute... borders on alchemy. And even a good approximation is only as good as when
you make it because things change so quickly. There is one secret, though: you
will do worse without a model, no matter what approach you take.
Another thing to consider is that "organization" is often an oxymoron. While there may
be a single "strategy" for data architecture in most organizations, as a result of
acquisitions, legacies, geography, and just the usual punctuated progress, there may in
practice be a collection of them, distributed physically and architecturally. The best advice is:
Pay more attention to what your data means than where you put it.
To patch some of the data lake idea's manifest deficiencies, cloud providers have
regularly added processing capabilities that mimic early data warehousing features –
comically calling it the "Data Lakehouse" (or the Databricks variant, the Delta Lake).
Data Lakehouse:
According to Databricks, "A data lakehouse is a new, open data management paradigm
that combines the capabilities of data lakes and data warehouses, enabling BI and ML
on all data. ... Merging them into a single system means that data teams can move
faster as they can use data without accessing multiple systems." This statement is more
aspiration than fact. Data warehouses represent forty years of continuous (though not
always smooth) progress and provide all of the services that are needed, such as:
• AI-driven query optimizer
• Complex query formation
• Massively parallel operation based on the model, not just sharding
• Workload Management
• Load Balancing
• Scaling to thousands of simultaneous queries
• Full ANSI SQL and beyond
• In-database Advanced Analytics and support for ML
• Ability to handle native data types such as spatial and time-series
The fact is that some data warehouse platforms do perform all of these functions and
more and are very central to the operations of businesses.
In the early seventies, the world was beset with an energy crisis. Some executives in
Detroit decided that the US needed small cars, with which they had little experience,
but they came up with a platform anyway. But Americans loved their pickup trucks,
which accounted for a substantial share of the automakers' revenue, Ford and Chevy
especially. When you have a terrible solution, the worst thing you can do is pile on
more terrible decisions, such as the 1973 Ford Courier mini pickup truck, one of the most
poorly designed, ill-conceived vehicles in history.
If you can query a JSON file in the Data Lakehouse with SQL transparently, you have
accomplished something. But not enough. What troubles me the most is that the data
lakehouse's excuse is that it's a data lake with some analytical capabilities. What I
haven't heard mentioned are understandability and usability. Those analytical capabilities are mostly
inherited from the expanding capabilities of cloud services themselves.
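As a rough illustration of what "querying a JSON file with SQL" amounts to, here is a minimal sketch using only Python's standard json and sqlite3 modules. It is not how any lakehouse product works internally. The sample records, the table name readings, and the helper names load_json_records and average_temp_by_country are all assumptions made up for this example.

```python
import json
import sqlite3

# Hypothetical sample data standing in for a JSON file in the lake.
RAW_JSON = """
[
  {"sensor": "a1", "city": "Paris", "country": "FR", "temp_c": 21.5},
  {"sensor": "b7", "city": "Paris", "country": "US", "temp_c": 33.0},
  {"sensor": "a2", "city": "Paris", "country": "FR", "temp_c": 19.5}
]
"""


def load_json_records(conn, table, records):
    """Create a table from the union of keys and insert one row per record.

    The table name is interpolated directly, so pass trusted names only.
    """
    columns = sorted({key for rec in records for key in rec})
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [tuple(rec.get(col) for col in columns) for rec in records],
    )


def average_temp_by_country(conn):
    """Run plain SQL over the loaded JSON records."""
    rows = conn.execute(
        "SELECT country, AVG(temp_c) FROM readings "
        "GROUP BY country ORDER BY country"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
load_json_records(conn, "readings", json.loads(RAW_JSON))
averages = average_temp_by_country(conn)
```

The query runs, but nothing in it says what temp_c means, how it was measured, or whether the two Paris locations should be averaged together at all. That is the gap between SQL access and understandability.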
Cloud Data Warehouse:
There are principally three cloud data warehouses: AWS Redshift, Snowflake, and
Google BigQuery. Many other relational data warehouse technologies have acceptable
cloud versions, but the cloud-natives claim the high ground for now. At a certain
maturity, they provide all of the functions listed above, rather than offering them as bolt-on
capabilities to generic cloud features. However, it does get a little blurry because the
CDWs provide more than a traditional data warehouse. One, for example, provides a
public data exchange market. I've noticed the word "warehouse" starting to disappear
from their content.
Would you rather have a cloud-native data warehouse that can handle the most
challenging data warehouse tasks but can also provide most of the functionality of a
data lake (or, to put it another way, to eliminate the need for a data lake), or would
you prefer a data lake with partial data warehouse capabilities slapped on?
To sum up:
1. The concept of a data lake is flawed. In an age of multi-cloud and hybrid-cloud
distributed data, not to mention sprawling sensor farms of IoT, there is no
advantage to pulling it all together. AI-driven knowledge graphs are a far better
alternative for locating and tagging data where it is.
2. If you dismiss the data lake, you must of necessity dismiss the lakehouse.
3. Pay more attention to what your data means than where you put it.
A data lake looks to me to be static "dumb" data neatly arranged. A data lakehouse, if
you must use that term, is fundamentally different from a data warehouse. It is a
comprehensive set of capabilities that provides a graph-based linked and
contextualized information fabric (semantic metadata and linked datasets) where NLP
(Natural Language Processing), Sentiment Analysis, Rules Engines, Connectors, and
Canonical Models for common domains are available. Add to that cognitive tools that can be
plugged in to turn "dumb" data into information assets with speed, agility, reuse, and
value. I haven't seen one yet.
Neil Raden founded Hired Brains Research in 1985 to provide thought leadership, context, and advisory
consulting and implementation services in Data Architecture, AI, Analytics/Data Science, and organizational
change for analytics for clients worldwide across many industries. Neil is a recognized authority on AI Ethics,
the co-author of the first book on Decision Management, "Smart (Enough) Systems," and the foundational
report for the Society of Actuaries, "Ethical Use of Artificial Intelligence for Actuaries." He welcomes your
comments at nraden@hiredbrains.com

Data lakehouse fallacies

  • 1. Copyright 2021 Hired Brains Research and Neil Raden Data Lake/Lakehouse/Cloud Data Warehouse: Which is Real? By Neil Raden, Founder Hired Brains Research Data Lake: From Hadoop (and later, other cloud storage options), which was indifferent to the size and type of files that could be processed, as opposed to the rigid and not nearly as scalable nature of relational data warehouses, hatched the idea of the single place for everything – the data lake. In truth, it was a concept hatched by the Hadoop distributors to sell more licenses. Though it did simplify searching for and locating files, it provided no analytical processing tools at all. The logic of moving a JSON file from Paris, France, to a Paris, Texas cloud location adds no value except for some economies of scale in storage and processing. The data lake collects raw data, thousands, perhaps millions of files. This is posited as a benefit. But is it really? At a certain level, raw data is an oxymoron. We can't triangulate data to see if it's consistent with other instances of the same phenomenon or event. "Raw data" typically implies it is to be used for a particular purpose, and it is the beginning point for drawing inferences and drawing conclusions. The context of data — why, how, and when it was recorded, and what method it was collected and then transformed is essential. Context-free data simply does not exist. The perfect objectivity we assign to "raw data" is a myth. That's why in data warehousing, we attempted to integrate and rationalize things. Industry analyst Andrew Brust, in "Big on Data," quotes George Fraser, CEO of Fivetran: "I think 2021 will reveal the need for data lakes in the modern data stack is shrinking...there are no longer new technical reasons for adopting data lakes because data warehouses that separate compute from storage have emerged." 
If that's not categorical enough for you, Fraser sums things up thus: "In the world of the modern data stack, data lakes are not the optimal solution. They are becoming legacy technology." For organizations that lack cloud-native data warehouses that separate compute from storage or even lack a cloud strategy, that is something of an oversimplification. The calculation of costs of hybrid-cloud, multi-cloud, separation of storage from compute...border on alchemy. And even a good approximation is only as good as when
  • 2. Copyright 2021 Hired Brains Research and Neil Raden you make it because things change so quickly. There is one secret, though, that you will do worse without a model no matter what approach you take. Another thing to consider is that "organization" is often an oxymoron. While there may be a single "strategy" for data architecture in most organizations, the result of acquisitions, legacies, geography, and just the usual punctuated progress, there may be a collection of them, distributed physically and architecturally. The best advice is: Pay more attention to what your data means than where you put it. To patch some of the data lake idea's manifest deficiencies, cloud providers have regularly added processing capabilities that mimic early data warehousing features – comically calling it the "Data Lakehouse" (or the Databricks variant, the Delta Lake) Data Lakehouse: According to Databricks, "A data lakehouse is a new, open data management paradigm that combines the capabilities of data lakes and data warehouses, enabling BI and ML on all data. ... Merging them into a single system means that data teams can move faster as they can use data without accessing multiple systems." This statement is more aspirational than fact. Data warehouses represent forty years of continuous (though not always smooth) progress and provide all of the services that are needed, such as: • AI-driven query optimizer • Complex query formation • Massively parallel operation based on the model, not just sharding • Workload Management • Load Balancing • Scaling to thousands of simultaneous queries • Full ANSI SQL and beyond • In-database Advanced Analytics and support for ML • Ability to handle native data types such as spatial and time-series The fact is that some data warehouse platforms do perform all of these functions and more and are very central to the operations of businesses. In the early seventies, the world was beset with an energy crisis. 
Some executives in Detroit decided that the US needed small cars, with which they had little experience, but they came up with a platform anyway. But Americans loved their pickup trucks, which accounted for a substantial share of the automakers' revenue, Ford and Chevy especially. When you have a terrible solution, the worst thing you can do is pile on more terrible decisions: witness the 1973 Ford Courier mini pickup truck, one of the most poorly designed, ill-conceived vehicles in history.

If you can query a JSON file in the Data Lakehouse with SQL transparently, you have accomplished something. But not enough. What troubles me the most is that the data lakehouse's excuse is that it's a data lake with some analytical capabilities. What I haven't heard about are understandability and usability. The analytical capabilities it does have are mostly inherited from the expanding capabilities of cloud services themselves.

Cloud Data Warehouse:
There are principally three cloud data warehouses: AWS Redshift, Snowflake, and Google BigQuery. Many other relational data warehouse technologies have acceptable cloud versions, but the cloud-natives claim the high ground for now. At a certain maturity, they provide all of the functions listed above natively, rather than as bolt-on capabilities to generic cloud features. However, it does get a little blurry because the CDWs provide more than a traditional data warehouse. One, for example, provides a public data exchange market. I've noticed the word "warehouse" starting to disappear from their content.

Would you rather have a cloud-native data warehouse that can handle the most challenging data warehouse tasks but can also provide most of the functionality of a data lake (or, to put it another way, eliminate the need for a data lake), or would you prefer a data lake with partial data warehouse capabilities slapped on?

To sum up:
1. The concept of a data lake is flawed. In an age of multi-cloud and hybrid-cloud distributed data, not to mention sprawling sensor farms of IoT, there is no advantage to pulling it all together. AI-driven knowledge graphs are a far better alternative for locating and tagging data where it is.
2. If you dismiss the data lake, you must of necessity dismiss the lakehouse.
3. Pay more attention to what your data means than where you put it.

A data lake looks to me to be static "dumb" data neatly arranged.
A data lakehouse, if you must use that term, is fundamentally different from a data warehouse. It is a comprehensive set of capabilities that provides a graph-based, linked, and contextualized information fabric (semantic metadata and linked datasets) with NLP (Natural Language Processing), Sentiment Analysis, Rules Engines, Connectors, and Canonical Models for common domains. Add to that cognitive tools that can be plugged in to turn "dumb" data into information assets with speed, agility, reuse, and value. I haven't seen one yet.

Neil Raden founded Hired Brains Research in 1985 to provide thought leadership, context, and advisory consulting and implementation services in Data Architecture, AI, Analytics/Data Science, and organizational change for analytics for clients worldwide across many industries. Neil is a recognized authority on AI Ethics, the co-author of the first book on Decision Management, "Smart (Enough) Systems," and the author of the foundational report for the Society of Actuaries, "Ethical Use of Artificial Intelligence for Actuaries." He welcomes your comments at nraden@hiredbrains.com