Rajesh Dadhia. This session introduces the newest services in the Cortana Analytics family. Azure Data Lake is a hyper-scale data repository designed for big data analytics workloads. It provides a single place to store any type of data in its native format. In this session, we will show how Azure Data Lake's compatibility with the Hadoop File System (HDFS) enables all Hadoop workloads, including Azure HDInsight, Hortonworks, and Cloudera. We will then focus on the key capabilities of Azure Data Lake that make it an ideal choice for storing, accessing, and sharing data across a wide range of analytics applications. Go to https://channel9.msdn.com/ to find the recording of this session.
6. • Descriptive Analytics – What happened?
• Diagnostic Analytics – Why did it happen?
• Predictive Analytics – What will happen?
• Prescriptive Analytics – How can we make it happen?
Discovery: Observation → Pattern → Theory → Hypothesis
Top-down confirmation: Theory → Hypothesis → Observation
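The four categories above pair each type of analytics with the question it answers. A minimal sketch of that mapping (the lookup and function names are illustrative, not from the session):

```python
# Map each analytics category to the question it answers,
# following the quadrant on this slide.
ANALYTICS_QUESTIONS = {
    "descriptive": "What happened?",
    "diagnostic": "Why did it happen?",
    "predictive": "What will happen?",
    "prescriptive": "How can we make it happen?",
}

def question_for(category: str) -> str:
    """Return the question a given analytics category answers."""
    return ANALYTICS_QUESTIONS[category.lower()]

print(question_for("Predictive"))  # What will happen?
```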
7. Ingest data, including from devices, regardless of requirements
Store in native format without schema definition
Analyze using analytic engines like Hadoop:
• Interactive queries
• Batch queries
• Machine learning
• Data warehouse
• Real-time analytics
8. Introducing Azure Data Lake
A hyper-scale repository for big data analytics workloads
• Store any data in its native format
• Hadoop File System (HDFS) for the cloud
• Enterprise grade
• No limits to scale
• Optimized for analytic workload performance
12. Ingress
Sources: Azure Storage Blobs, client machines, Azure SQL DB, Azure SQL DW, Azure Tables
Tools: Azure Web Portal via browser, Azure PowerShell, .NET SDK, JavaScript CLI, ADL built-in Copy Service, Azure Data Factory, Sqoop, third-party tools, Azure Stream Analytics (ASA)
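Tools like the SDKs and Sqoop address data in the store through its HDFS-style `adl://` URI. A small helper sketching how such a path is composed (the account name and file path in the usage line are hypothetical):

```python
# Compose an adl:// URI for an Azure Data Lake store, following the
# adl://<account>.azuredatalakestore.net/<path> form used by
# HDFS-compatible clients.
def adl_uri(account: str, path: str) -> str:
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"

print(adl_uri("contosostore", "/clickstream/2015/10/events.csv"))
# adl://contosostore.azuredatalakestore.net/clickstream/2015/10/events.csv
```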
13. Egress
Destinations: Azure Storage Blobs, client machines, Azure SQL DB, Azure SQL DW, Azure Tables
Tools: Azure Web Portal via browser, Azure PowerShell, .NET SDK, JavaScript CLI, ADL built-in Copy Service, Azure Data Factory, Sqoop
15. • Built from the ground up as a Hadoop File System
• Support for file/folder objects and operations
• Integrated with HDInsight, Hortonworks, Cloudera
• Accessible to all HDFS-compliant projects (Spark, Storm, Flume, Sqoop, Kafka, R, etc.)
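An HDFS-compliant store exposes the familiar file/folder surface (put, list, rename, and so on) that these projects already speak. A toy in-memory sketch of that surface, purely illustrative and not the actual client API:

```python
# Toy illustration of the file/folder operations an HDFS-compliant
# store exposes. Purely in-memory; real clients issue the same
# operations against the remote store.
class ToyFs:
    def __init__(self):
        self.files = {}  # path -> contents

    def put(self, path, data):
        """Create or overwrite a file at `path`."""
        self.files[path] = data

    def ls(self, folder):
        """List all paths under `folder`."""
        folder = folder.rstrip("/") + "/"
        return sorted(p for p in self.files if p.startswith(folder))

    def mv(self, src, dst):
        """Rename a file from `src` to `dst`."""
        self.files[dst] = self.files.pop(src)

fs = ToyFs()
fs.put("/logs/2015/10/a.log", b"x")
fs.put("/logs/2015/10/b.log", b"y")
print(fs.ls("/logs/2015/10"))  # both log files
```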
17. • Azure Active Directory integration
• File and folder level access control
• Audit data access
• Encryption of data-at-rest
18. Access Control
• Secure files and folders
• POSIX-compliant ACLs
• Minimal (octal) and enhanced ACLs
• Based on Azure AD principals
Auditing
• Audit logs for all operations
• Consumable via big data analytics
Encryption at Rest
• Transparent server-side encryption
• Azure-managed and customer-managed keys
• Azure Key Vault integration
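Minimal ACLs correspond to the classic owner/group/other permission bits, while enhanced ACLs add entries for named users and groups. A short sketch rendering an octal mode in its rwx form (illustrative only, not the service's API):

```python
# Render a 3-digit octal mode (owner/group/other) as rwx triples --
# the "minimal ACL" view. Enhanced ACLs extend this with entries
# for named users and groups.
def mode_to_rwx(mode: int) -> str:
    bits = ""
    for shift in (6, 3, 0):          # owner, group, other
        part = (mode >> shift) & 0b111
        bits += "r" if part & 4 else "-"
        bits += "w" if part & 2 else "-"
        bits += "x" if part & 1 else "-"
    return bits

print(mode_to_rwx(0o750))  # rwxr-x---
```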
21. • Unlimited account sizes
• Individual file sizes from GBs to PBs
• No limits to scale
22. • Built for running large analytic systems that require massive throughput
• Optimized for parallel computation over PBs of data
• Automatically optimizes for any throughput
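Parallel engines get that throughput by splitting a large file into byte ranges that many readers scan concurrently. A sketch of such a partitioning (the chunking scheme is a generic illustration, not ADL's actual layout):

```python
# Split a file of `size` bytes into `n` contiguous ranges so that
# n readers can scan it in parallel; ranges cover every byte exactly
# once, with any remainder spread over the first ranges.
def split_ranges(size: int, n: int) -> list:
    base, extra = divmod(size, n)
    ranges, start = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges

print(split_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```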
25. • Can store structured, semi-structured, unstructured data
• Can support all Hadoop applications
• Is built for the enterprise
• Can meet performance needs of big data applications