Today’s data-driven companies have a choice to make: where do we store our data? As the move to the cloud continues to be a driving factor, the choice comes down to either the data warehouse (Snowflake et al.) or the data lake (AWS S3 et al.). There are pros and cons to each approach. Data warehouses give you strong data management and analytics, but they don’t handle semi-structured and unstructured data well, they tightly couple storage and compute, and they bring expensive vendor lock-in. Data lakes, on the other hand, let you store all kinds of data and are extremely affordable, but they are meant only for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack. It gives you the openness and flexibility of the data lake combined with key data warehouse capabilities such as data management and transaction support.
In this webinar, you’ll hear from Ali LeClerc, who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will also share perspective on how to think about which option fits best for your use cases and workloads, and how some real-world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Data Warehouse or Data Lake: Which Do I Choose?
1. Data Warehouse or Data Lake: which do I choose?
Ali LeClerc
Head of Community, Ahana
Chairperson, Community Team, Presto Foundation
2.
Today’s Agenda
Quick introduction: Data Warehouses, Data Lakes
Enter the Data Lakehouse
Presto for the Data Lakehouse
Real world use cases
Q&A
3.
The Traditional Data Warehouse
• Relational Database
• Columnar Structure
• In-Database Analytics
• Structured Data
• Modeled Data
• Extract, Transform, Load
• SQL Access
Challenges
• Expensive
• Difficult to Manage
• Costly to Maintain
• Limited Data
• Limited Access
4.
The Data Lake
• File System Data Store
• Semi-Structured Data
• Ingestion
• Discovery
• Data Science
• Notebook and Python Access
5.
The Drivers Behind Modernization
• Digital Transformation
• Real-Time Events
• Modern Processing Techniques
• More Data
• Fast Data
• Smart Data
The Deconstructed Database
6.
Data Warehouse vs. Data Lake
Data Warehouse
● Cloud-First
● In-Memory Capabilities
● Complex Data Types
● Storage & Compute still loosely coupled
● High Performance
● SQL Access
● Expensive
Data Lake
● Cloud-First
● In-Memory Capabilities
● Open Formats
● Columnar Data Types
● Separate Storage & Compute
● Expanded Analytics
● Improved Performance
● SQL Access
● Cheaper
8.
Merging the Data Warehouse and the Data Lake with a Distributed Query Engine
1. SQL Access
2. Data Lake and Data Warehouse Access
3. Unified Analytics
4. Distributed Queries
5. Limitless Scale
6. Complex Data Types
● Leverage Resources
● Better Insight
● More Use Cases
● Leverage Platforms
● Remove Limits
● Amplified Insight
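The “Distributed Queries” point is easiest to see through the partial/final aggregation pattern that distributed SQL engines such as Presto rely on: each worker aggregates only its own split of the data, and a coordinator merges the much smaller partial results. The Python below is a minimal sketch of that pattern under stated assumptions: the in-memory splits, the function names, and the thread pool standing in for worker nodes are all hypothetical and are not Presto's actual code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical splits: in a real engine these would be files or file ranges
# in the data lake, each read by a different worker node.
SPLITS = [
    [("north", 120.0), ("south", 80.0)],
    [("north", 30.0), ("east", 55.5)],
    [("south", 20.0), ("east", 4.5), ("north", 50.0)],
]


def partial_sum(split):
    """Worker-side step: aggregate only the rows in one split."""
    totals = Counter()
    for region, amount in split:
        totals[region] += amount
    return totals


def final_sum(partials):
    """Coordinator-side step: merge the per-split partial aggregates."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # adds values key by key
    return dict(merged)


def distributed_sum(splits, workers=3):
    # A thread pool stands in (hypothetically) for a cluster of worker nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, splits))
    return final_sum(partials)


if __name__ == "__main__":
    print(distributed_sum(SPLITS))
```

The same split-then-merge shape works for SUM, COUNT, MIN, and MAX. AVG needs a little more care: each partial must carry a (sum, count) pair, because averaging per-split averages gives the wrong answer when splits differ in size.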
9. Open Data Lakehouse
The Next EDW is the Open Data Lakehouse
Diagram: the data warehouse stack (Proprietary SQL Query Processing over Proprietary Storage) beside the open data lakehouse stack (SQL Query Processing and ML and AI Frameworks over a Cloud Data Lake with Open Formats, Storage, and Governance, Discovery, Quality & Security). Workloads shown on top: Data Science, ML, & AI; Reporting and Dashboarding.
20.
Open Source Presto Overview
SQL Query Processing
What is Presto?
• Open source, distributed SQL query engine for the data lake & lakehouse
• Designed from the ground up for fast analytic queries against data of any size
• Query in place: no need to move (ETL) data
• Federated querying: join data across different sources and formats
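The “query in place” and “federated querying” bullets can be illustrated with a toy Python sketch. One hypothetical source is CSV, the other is JSON Lines; each is read where it lives through its own reader (a rough stand-in for Presto's per-source connectors), and the rows are joined on a shared key without first copying either source into a common store. The data, field names, and reader functions are assumptions for illustration only and do not reflect how Presto is implemented; in Presto the same join would be a single SQL query across catalogs.

```python
import csv
import io
import json

# Hypothetical source 1: order events stored as CSV (e.g. files in a data lake).
ORDERS_CSV = """order_id,customer_id,amount
1,c1,25.00
2,c2,40.50
3,c1,12.25
"""

# Hypothetical source 2: customer records stored as JSON Lines elsewhere.
CUSTOMERS_JSONL = """{"customer_id": "c1", "city": "Mumbai"}
{"customer_id": "c2", "city": "Delhi"}
"""


def read_csv_source(text):
    """Connector-like reader: yields typed dict rows from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "order_id": int(row["order_id"]),
            "customer_id": row["customer_id"],
            "amount": float(row["amount"]),
        }


def read_jsonl_source(text):
    """Connector-like reader: yields dict rows from JSON Lines text."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def federated_join(orders, customers):
    """Hash join: build a lookup on customers, then stream orders past it."""
    by_id = {c["customer_id"]: c for c in customers}
    for o in orders:
        c = by_id.get(o["customer_id"])
        if c is not None:  # inner-join semantics: unmatched orders are dropped
            yield {**o, "city": c["city"]}


if __name__ == "__main__":
    rows = federated_join(read_csv_source(ORDERS_CSV),
                          read_jsonl_source(CUSTOMERS_JSONL))
    for r in rows:
        print(r)
```

The hash join builds its lookup table from the smaller input and streams the larger one past it, which is the usual choice when one side fits comfortably in memory.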
22.
Ahana Cloud For
1. Zero to Presto in 30 Minutes. Managed cloud service: no installation or configuration required.
2. Built for data teams of all experience levels.
3. A moderate level of control over deployment, without the complexity.
4. Dedicated support from Presto experts.
24.
Blinkit
● India’s instant delivery service
● Moved from the Data Warehouse to the Open Data Lakehouse, powered by Presto & Ahana, to support 200K orders/day
● “Everything delivered in 10 minutes”
“Ahana is providing Blinkit with a SaaS managed service for Presto, providing the company with the advanced data management capabilities it needs to meet its instant delivery promise.”
Satyam Krishna, Engineering Manager at Blinkit
25.
Securonix
Diagram: Securonix NextGen SIEM, Cluster, AWS S3 Data Lake, Glue Metastore
▪ Securonix is a security information and event management (SIEM) software vendor
▪ They use Ahana for in-app SQL analytics on data in AWS S3 for threat hunting
▪ They ingest billions of events per day, which are stored in S3
▪ With Ahana Cloud, they saw 3x better price-performance compared with Presto on AWS