2. About me
Kenneth M. Nielsen
Worked with SQL Server since 1999
Data Solution Architect at Microsoft
Kenneth.Nielsen@microsoft.com
@doktorkermit
Linkedin.com/in/KennethMNielsen
www.funkylab.com
3. Agenda
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Q & A
5. Azure Data Lake Store
A hyperscale repository for
big data analytics workloads
No limits to SCALE
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
ENTERPRISE READY access control,
encryption at rest
Optimized for analytic workload
PERFORMANCE
6. Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured
8. Azure Data Lake Store
HDFS for the cloud
New file system built from the
ground up, based on the Hadoop
file system
• Integrates with HDInsight,
Hortonworks and Cloudera
• Supports file and folder
objects and operations
9. Azure Data Lake Store
Unlimited storage
• File sizes can range from gigabytes to petabytes
• No limits to scale
10. Azure Data Lake Store
Security
• Integrates with Azure Active Directory
• Audit logs for all operations*
• Server side Encryption*
• ACL on files and folders*
• Enterprise ready security
when in GA
12. Azure Data Lake Analytics
An elastic analytics service
built on Apache YARN that
processes all data, at any
size
• No limits to SCALE
• Includes U-SQL, a language that unifies the
benefits of SQL with the expressive power
of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data
sources
• ENTERPRISE READY Role based access
control & Auditing
• Pay PER JOB & Scale PER JOB
13. U-SQL
A new language for
Big Data
• Familiar syntax to millions of SQL & .NET
developers
• Unifies declarative nature of SQL with the
imperative power of C#
• Unifies structured, semi-structured and
unstructured data
• Distributed query support over all data
14. Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses
SELECT | FROM | WHERE
GROUP BY | JOIN | OVER
• Operate on unstructured and
structured data
• Relational metadata objects
.NET integration and
extensibility
• U-SQL expressions are full C#
expressions
• Reuse .NET code in your own
assemblies
• Use C# to define your own:
Types | Functions | Joins | Aggregators | I/O
(Extractors, Outputters)
16. U-SQL Distributed Query
Azure Storage Blobs
Azure Data Lake Store
Azure SQL Database
Azure SQL Data Warehouse
Azure SQL DB in Azure VM
[Diagram: READ and WRITE labels, one of each per data source listed above]
17.
@orders =
    EXTRACT
        OrderId int,
        Customer string,
        Date DateTime,
        Amount float
    FROM "/input/orders.txt"
    USING Extractors.Tsv();

OUTPUT @orders
    TO "/output/orders_copy.txt"
    USING Outputters.Tsv();
Apply Schema on read
From a file in a Data Lake
Easy delimited text handling
Write out
Read the input, write it directly to output (just a simple copy)
Rowset
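For readers who do not have an ADLA account at hand, the schema-on-read idea in the script above can be sketched locally in Python. This is only an analogy, not how U-SQL executes: the helper names `extract_tsv` and `output_tsv` are hypothetical, the column list simply mirrors the EXTRACT clause, and dates are assumed to be in ISO format.

```python
import csv
import io
from datetime import datetime

# Column names and converters mirror the EXTRACT clause on the slide.
# Assumption: dates in the file are ISO formatted (e.g. 2016-02-01).
SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),
    ("Amount", float),
]


def extract_tsv(text, schema):
    """Parse tab-separated text into typed rows; the schema is applied at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(fields)}")
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def output_tsv(rows, schema):
    """Write the rowset back out as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        values = []
        for name, _ in schema:
            value = row[name]
            # Dates are written in ISO format so they can be read back in.
            values.append(value.isoformat() if isinstance(value, datetime) else str(value))
        writer.writerow(values)
    return out.getvalue()
```

Usage: `output_tsv(extract_tsv(text, SCHEMA), SCHEMA)` is the "read the input, write it to output" copy. The file itself carries no schema; the types exist only in the reading code.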
18. Azure Data Lake Pattern
[Diagram] Components: ADL Storage, Visual Studio, ADL, ADL Analytics, AML Experiment, Power BI Desktop (Get Data from CSV), Upload Dataset
Note: Where CAQS files are stored, but would load into ADLS directly if ingesting from scratch
Roles: Data Analyst, Data Scientist, Data Engineer
19. Execution with Requested Parallelism
Requested Parallelism = 1
(reserve enough to do 1 vertex at a
time)
Requested Parallelism = 4
(reserve enough to do 4 vertices at
a time)
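A rough way to reason about the effect of requested parallelism is to simulate it. The Python sketch below is a simplification under stated assumptions: vertices are treated as independent (real jobs have dependencies between stages), durations are known up front, and the function name `estimate_runtime` is hypothetical, not part of any ADLA tooling.

```python
import heapq


def estimate_runtime(vertex_seconds, parallelism):
    """Estimate job runtime when at most `parallelism` vertices run at once.

    Greedy model: each vertex starts on whichever slot frees up first.
    Assumes independent vertices, which real jobs do not guarantee.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    slots = [0.0] * parallelism  # time at which each reserved slot becomes free
    heapq.heapify(slots)
    finish = 0.0
    for duration in vertex_seconds:
        start = heapq.heappop(slots)  # earliest available slot
        end = start + duration
        heapq.heappush(slots, end)
        finish = max(finish, end)
    return finish
```

Usage: for eight vertices of 10 seconds each, `estimate_runtime([10] * 8, 1)` models the parallelism = 1 case and `estimate_runtime([10] * 8, 4)` the parallelism = 4 case. Reserving more than the job can use in any one stage does not shorten the estimate further.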
36. ADLA: List and submit jobs
• $adla = "sqlkonferenz"
• Get-AzureRmDataLakeAnalyticsJob -Account $adla
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Script "…" -Name myjob   # -Script takes the U-SQL text
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -ScriptPath D:\test.script -Name myjob
37. ADL Store (ADLS) feature set
Account Management
Create new account
List accounts
Update account properties
Delete account
Transferring Data
Upload into store from local
disk
Download from store to local
disk
Files and Folders
List contents of folder
Create
Move
Delete
Does file exist
Security
Get ACLs
Update ACLs
Get Owner
Set Owner
File Content
Set file content
Append file content
Get file content
Merge files
38. ADL Analytics (ADLA) feature set
Account Management
Create new account
List accounts
Update account properties
Delete account
Data Sources
Add a data source
List data sources
Update data source
Delete data source
Compute
List jobs
Submit job
Cancel job
Catalog Items
List items in U-SQL
catalog
Update item
Catalog Secrets
Create catalog secret
List catalog secrets
Delete catalog secrets
Data Lake Store is your new friend for storing data, in practice almost unlimited amounts of it, and storing data on Azure costs next to nothing.
Any file format is supported. Data is stored in its native format, meaning that you can store images, JSON, tables, CSV, TSV, blobs, etc.
It is built on HDFS; think of it as HDFS for the cloud.
Supports renaming, creating, and deleting files and folders.
File system built from scratch, based on the Hadoop file system.
Microsoft Azure Data Lake Store is a Hadoop file system that’s compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. Data Lake Store is integrated with Azure Data Lake Analytics and Azure HDInsight and will be integrated with Microsoft offerings like Revolution-R Enterprise; industry-standard distributions like Hortonworks, Cloudera, and MapR; and individual Hadoop projects like Spark, Storm, Flume, Sqoop, and Kafka.
Data Lake Store has no fixed limits on account size or file size. While other cloud storage offerings might restrict individual file sizes to a few terabytes, Data Lake Store can store very large files that are hundreds of times larger. At the same time, it provides very low latency read/write access and high throughput for scenarios like high-resolution video, scientific, medical, large backup data, event streams, web logs, and Internet of Things (IoT). Collect and store everything in Data Lake Store without restriction or prior understanding of business requirements.
Access control lists apply only at the root level at the moment: a user is granted access to a root folder and then has access to everything under that root.
This will change when the service goes into GA.
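To make the consequence of root-level access concrete, here is a toy Python model. It is an illustration only and does not call any Azure API: the `grants` mapping, the user name, and the function names are all hypothetical.

```python
from pathlib import PurePosixPath


def root_folder(path):
    """Return the top-level folder of an absolute store path like /sales/2016/a.txt."""
    parts = PurePosixPath(path).parts
    # parts[0] is "/" for absolute paths; parts[1] is the root folder.
    if len(parts) < 2 or parts[0] != "/" or ".." in parts:
        raise ValueError(f"expected an absolute path without '..': {path!r}")
    return parts[1]


def has_access(user, path, grants):
    """Toy check: access is decided by the root folder alone, never by subfolders."""
    return root_folder(path) in grants.get(user, set())
```

Usage: with `grants = {"anna": {"sales"}}`, the call `has_access("anna", "/sales/2016/orders.txt", grants)` succeeds for any depth under `/sales`, which is exactly the limitation described: there is no way to grant `/sales/2016` without granting all of `/sales`.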
U-SQL project, where you write your statements
U-SQL sample project: a really extensive project that you can work with on your own account; it will give you a head start in getting up to speed on the topic
U-SQL unit testing project,