Azure Data Lake
Kenneth M. Nielsen
About me
Kenneth M. Nielsen
Worked with SQL Server since 1999
Data Solution Architect at Microsoft
Kenneth.Nielsen@microsoft.com
@doktorkermit
Linkedin.com/in/KennethMNielsen
www.funkylab.com
Agenda
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Q & A
Data Lake Store
Azure Data Lake Store
A hyperscale repository for big data analytics workloads
• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• ENTERPRISE READY access control and encryption at rest
• Optimized for analytic workload PERFORMANCE
Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured
Azure Data Lake Store
Azure Data Lake Store
HDFS for the cloud
A new file system built from the ground up, based on the Hadoop file system
• Integrates with HDInsight, Hortonworks and Cloudera
• Supports file and folder objects and operations
Azure Data Lake Store
Unlimited storage
• File sizes can range from gigabytes to petabytes
• No limits to scale
Azure Data Lake Store
Security
• Integrates with Azure Active Directory
• Audit logs for all operations*
• Server-side encryption*
• ACLs on files and folders*
• Enterprise-ready security when in GA
Data Lake Analytics
Azure Data Lake Analytics
An elastic analytics service built on Apache YARN that processes all kinds of data, at any size
• No limits to SCALE
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY role-based access control & auditing
• Pay PER JOB & scale PER JOB
U-SQL
A new language for Big Data
• Syntax familiar to millions of SQL & .NET developers
• Unifies the declarative nature of SQL with the imperative power of C#
• Unifies structured, semi-structured and unstructured data
• Distributed query support over all data
Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses: SELECT | FROM | WHERE | GROUP BY | JOIN | OVER
• Operate on unstructured and structured data
• Relational metadata objects
.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
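This is a rough analogy only. The Python sketch below spells out, step by step, what a declarative SELECT … WHERE … GROUP BY query over a rowset computes. The Order rowset, its columns and the threshold are hypothetical, and this is not how the U-SQL engine evaluates queries. In U-SQL itself, the WHERE predicate and the SELECT expressions would be C# expressions.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Order(NamedTuple):
    """One row of a hypothetical @orders rowset."""
    order_id: int
    customer: str
    amount: float


def total_by_customer(rows: Iterable[Order], min_amount: float) -> Dict[str, float]:
    """Imperative spelling of a declarative query along the lines of:

        SELECT Customer, SUM(Amount) AS Total
        FROM @orders
        WHERE Amount >= 100
        GROUP BY Customer;
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if row.amount >= min_amount:              # WHERE: filter rows
            totals[row.customer] += row.amount    # GROUP BY Customer + SUM(Amount)
    return dict(totals)
```

With the declarative form, the engine decides how to partition and distribute this loop. That is what lets the same query scale out.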
U-SQL Capabilities
• Interactive: IN PROGRESS
• Batch: AVAILABLE NOW
• Streaming: FUTURE
• Machine Learning: FUTURE
U-SQL Distributed Query
[Diagram: U-SQL connected to the following data sources, with the connections labeled READ and WRITE]
• Azure Storage Blobs
• Azure Data Lake Store
• Azure SQL Database
• Azure SQL Data Warehouse
• Azure SQL DB in Azure VM
// Read the input and write it directly to the output (just a simple copy).
// @orders is a rowset. EXTRACT applies the schema on read to a file in the
// Data Lake, and Extractors.Tsv() handles the delimited (tab-separated) text.
@orders =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"
    USING Extractors.Tsv();

// Write out the rowset unchanged, again as tab-separated text.
OUTPUT @orders
TO "/output/orders_copy.txt"
USING Outputters.Tsv();
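To make "schema on read" concrete, here is a minimal Python sketch (not U-SQL) of what the script above does. The input is untyped tab-separated text. Column names and types are applied only while reading, and the rowset is then written straight back out. The schema, the ISO 8601 date format and the function names are assumptions for illustration. U-SQL's built-in Extractors.Tsv() and Outputters.Tsv() have their own parsing and formatting rules.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema mirroring the EXTRACT column list on the slide:
# (column name, function that parses the raw text into a typed value).
# ISO 8601 dates are an assumption made for this sketch.
ORDER_SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),
    ("Amount", float),
]


def extract_tsv(text, schema):
    """Schema on read: the file is plain text; types are applied while reading."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(fields)}")
        rows.append({name: parse(value) for (name, parse), value in zip(schema, fields)})
    return rows


def _to_text(value):
    """Render a typed value back to text (dates in ISO 8601 form)."""
    return value.isoformat() if isinstance(value, datetime) else str(value)


def output_tsv(rows, schema):
    """Write the rowset back out as tab-separated text, in schema column order."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([_to_text(row[name]) for name, _ in schema])
    return buf.getvalue()
```

The schema lives in the reading code rather than in the file. The same text could therefore be extracted later with a different column list without rewriting the data.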
Azure Data Lake Pattern
[Diagram: Data Engineer, Data Scientist and Data Analyst roles working with ADL Storage, Visual Studio, ADL Analytics, an AML experiment and Power BI Desktop (Get Data). A dataset is uploaded from CSV; a note on the diagram reads: "Where CAQS files are stored, but would load into ADLS directly if ingesting from scratch."]
Execution with Requested Parallelism
• Requested Parallelism = 1 (reserve enough to do 1 vertex at a time)
• Requested Parallelism = 4 (reserve enough to do 4 vertices at a time)
Stage Details
• 252 pieces of work
• Average vertex execution time
• 4.3 billion rows
• Data read & written
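Simple arithmetic can approximate how requested parallelism affects a stage. The Python sketch below makes an illustrative assumption: the 252 pieces of work from the Stage Details slide are independent vertices of equal duration. Under that assumption, a stage needs ceil(vertices / parallelism) rounds. The real ADLA scheduler is not this simple. The function names and the average vertex time passed in are hypothetical.

```python
def scheduling_waves(vertex_count: int, parallelism: int) -> int:
    """Rounds needed if at most `parallelism` vertices run at the same time.

    Simplifying assumption (not a description of the ADLA scheduler):
    every vertex is independent and takes the same amount of time.
    """
    if vertex_count < 0 or parallelism < 1:
        raise ValueError("need vertex_count >= 0 and parallelism >= 1")
    # Integer ceiling division avoids floating-point rounding.
    return (vertex_count + parallelism - 1) // parallelism


def estimated_stage_seconds(vertex_count: int, parallelism: int,
                            avg_vertex_seconds: float) -> float:
    """Rough stage duration: number of rounds times the average vertex time."""
    return scheduling_waves(vertex_count, parallelism) * avg_vertex_seconds
```

Under these assumptions, parallelism 1 needs 252 rounds and parallelism 4 needs 63. Any value at or above 252 needs a single round. Requesting more parallelism than a stage has vertices therefore cannot make that stage faster.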
ADLAUs
• ADLAU = Azure Data Lake Analytics Unit
• Parallelism N = N ADLAUs
• 1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
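The ADLAU rule of thumb scales linearly: parallelism N reserves N ADLAUs, each roughly a 2-core, 6 GB VM. The Python helper below just multiplies the slide's per-unit figures. The names are hypothetical, and the figures are the slide's approximation rather than a current specification.

```python
from typing import NamedTuple


class ApproxResources(NamedTuple):
    cores: int
    memory_gb: int


# Per-unit figures from the slide (1 ADLAU ~ a VM with 2 cores and 6 GB of memory).
# Treat them as an approximation, not a guaranteed allocation.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


def approx_resources(parallelism: int) -> ApproxResources:
    """Parallelism N reserves N ADLAUs, so scale the per-unit figures by N."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return ApproxResources(cores=parallelism * CORES_PER_ADLAU,
                           memory_gb=parallelism * MEMORY_GB_PER_ADLAU)
```

For example, parallelism 4 corresponds to roughly 8 cores and 24 GB of memory.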
Data Lake Analytics
Visual Studio
Azure Data Lake – Visual Studio
Available project types
Azure Data Lake – Visual Studio
Fully integrates with Solution Explorer
Azure Data Lake – Visual Studio
• Monitor and manage jobs
• Browse and manage storage
• Browse U-SQL catalog
Creating U-SQL
Creating U-SQL
• IntelliSense supported
Creating U-SQL
• Code-behind to enhance your code
Installing Azure PowerShell
• PowerShell Gallery
  • Recommended approach
  • PowerShell 5.0 supports the PowerShell Gallery
  • Windows 10 ships with PowerShell 5.0
• Web Platform Installer (WebPI)
Installing from the PowerShell Gallery
• Launch Windows PowerShell ISE as Administrator
• Install-Module AzureRM
• Install-AzureRM
Finding the ADL cmdlets
• Option 1
  • Get-Command -Module AzureRM.DataLakeStore
  • Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
  • Get-Command *DataLake*
Logging in to Azure
$subname = "BDHadoopTeamPMTestDemo"
Login-AzureRmAccount -SubscriptionName $subname
ADLS: Listing files in a store
• $adls = "sqlkonferenz"
• Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /
ADLS: Upload and download
• $adls = "sqlkonferenz"
• Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt
• Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt
ADLA: List and submit jobs
• $adla = "sqlkonferenz"
• Get-AzureRmDataLakeAnalyticsJob -Account $adla
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Script "…" -Name myjob   # -Script takes the U-SQL text inline
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -ScriptPath D:\test.script -Name myjob
ADL Store (ADLS) feature set
• Account Management: Create new account | List accounts | Update account properties | Delete account
• Transferring Data: Upload into store from local disk | Download from store to local disk
• Files and Folders: List contents of folder | Create | Move | Delete | Does file exist
• Security: Get ACLs | Update ACLs | Get owner | Set owner
• File Content: Set file content | Append file content | Get file content | Merge files
ADL Analytics (ADLA) feature set
• Account Management: Create new account | List accounts | Update account properties | Delete account
• Data Sources: Add a data source | List data sources | Update data source | Delete data source
• Compute: List jobs | Submit job | Cancel job
• Catalog Items: List items in U-SQL catalog | Update item
• Catalog Secrets: Create catalog secret | List catalog secrets | Delete catalog secret
Questions
Azure data lake   sql konf 2016

Más contenido relacionado

La actualidad más candente

Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Michael Rys
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksLace Lofranco
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Eric Bragas
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsWaqas Idrees
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Jason L Brugger
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure DatabricksSascha Dittmann
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)Michael Rys
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceThomas Sykes
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with linksChris Testa-O'Neill
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMark Kromer
 

La actualidad más candente (20)

Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure Databricks
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 
Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed Instance
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with links
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 

Similar a Azure data lake sql konf 2016

Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
Tokyo azure meetup #2 big data made easy
Tokyo azure meetup #2   big data made easyTokyo azure meetup #2   big data made easy
Tokyo azure meetup #2 big data made easyTokyo Azure Meetup
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure passJason Strate
 
Geek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure EnvironmentsGeek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure EnvironmentsIDERA Software
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Oracle database connection with the .net developers
Oracle database connection with the .net developersOracle database connection with the .net developers
Oracle database connection with the .net developersveerendramb3
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL AzureShy Engelberg
 
Rajnish singh(presentation on oracle )
Rajnish singh(presentation on  oracle )Rajnish singh(presentation on  oracle )
Rajnish singh(presentation on oracle )Rajput Rajnish
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsMichaela Murray
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365Marco Parenzan
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platformgiventocode
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the CloudRoss McNeely
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 

Similar a Azure data lake sql konf 2016 (20)

Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
Tokyo azure meetup #2 big data made easy
Tokyo azure meetup #2   big data made easyTokyo azure meetup #2   big data made easy
Tokyo azure meetup #2 big data made easy
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
 
Geek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure EnvironmentsGeek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure Environments
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Oracle database connection with the .net developers
Oracle database connection with the .net developersOracle database connection with the .net developers
Oracle database connection with the .net developers
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 
ow.ppt
ow.pptow.ppt
ow.ppt
 
ow.ppt
ow.pptow.ppt
ow.ppt
 
Ow
OwOw
Ow
 
Plantilla oracle
Plantilla oraclePlantilla oracle
Plantilla oracle
 
Rajnish singh(presentation on oracle )
Rajnish singh(presentation on  oracle )Rajnish singh(presentation on  oracle )
Rajnish singh(presentation on oracle )
 
Campus days Azure HDInsight automation
Campus days Azure HDInsight automationCampus days Azure HDInsight automation
Campus days Azure HDInsight automation
 
SQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT SolutionsSQL Server 2019 hotlap - WARDY IT Solutions
SQL Server 2019 hotlap - WARDY IT Solutions
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 

Último

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Último (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Azure data lake sql konf 2016

  • 2. About me Kenneth M. Nielsen Worked with SQL Server since 1999 Data Solution Architect at Microsoft Kenneth.Nielsen@microsoft.com @doktorkermit Linkedin.com/in/KennethMNielsen www.funkylab.com
  • 3. Agenda • Azure Data Lake Store • Azure Data Lake Analytics • Azure Data Lake Analytics – Using Visual Studio • Azure Data Lake Analytics – Using PowerShell • Q & A
  • 5. Azure Data Lake Store A hyper scale repository for big data analytics workloads No limits to SCALE Store ANY DATA in its native format HADOOP FILE SYSTEM (HDFS) for the cloud ENTERPRISE READY access control, encryption at rest Optimized for analytic workload PERFORMANCE
  • 6. Azure Data Lake Store Any Data • Unstructured • Semi-structured • Structured
  • 8. Azure Data Lake Store HDFS for the cloud New filesystem build from the ground up, based on HADOOP file system • Integrates with HDInsight, Hortonworks and Cloudera • Supports Files and Folder objects and operations
  • 9. Azure Data Lake Store Unlimited storage • Files sizes can be from Gigabytes to Petabytes • No limits to scale
  • 10. Azure Data Lake Store Security • Integrates with Azure Active Directory • Audit logs for all operations* • Server side Encryption* • ACL on files and folders* • Enterprise ready security when in GA
  • 12. Azure Data Lake Analytics A elastic analytics service built on Apache YARN that processes all data, at any size • No limits to SCALE • Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C# • Optimized to work with ADL STORE • FEDERATED QUERY across Azure data sources • ENTERPRISE READY Role based access control & Auditing • Pay PER JOB & Scale PER JOB
  • 13. U-SQL A new language for Big Data • Familiar syntax to millions of SQL & .NET developers • Unifies declarative nature of SQL with the imperative power of C# • Unifies structured, semi-structured and unstructured data • Distributed query support over all data
  • 14. Language Overview U-SQL Fundamentals • All the familiar SQL clauses SELECT | FROM | WHERE GROUP BY | JOIN | OVER • Operate on unstructured and structured data • Relational metadata objects .NET integration and extensibility • U-SQL expressions are full C# expressions • Reuse .NET code in your own assemblies • Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
  • 16. U-SQL Distributed Query Azure Storage Blobs Azure Data Lake Store Azure SQL Database Azure SQL Data Warehouse Azure SQL DB in Azure VM READ READ READ READ READ WRITE WRITE WRITE WRITE WRITE
  • 17. @orders = EXTRACT OrderId int, Customer string, Date DateTime, Amount float FROM "/input/orders.txt" USING Extractors.Tsv(); OUTPUT @orders TO "/output/orders_copy.txt" USING Outputters.Tsv(); Apply Schema on read From a file in a Data Lake Easy delimited text handling Write out Read the input, write it directly to output (just a simple copy) Rowset
  • 18. Azure Data Lake Pattern (diagram). Components: ADL Storage, Visual Studio, ADL Analytics, an AML experiment, and Power BI Desktop (Get Data from CSV, Upload Dataset). Roles: data analyst, data scientist, data engineer. Diagram note: CSV is where the CAQS files are stored, but the data would be loaded into ADLS directly if ingesting from scratch.
  • 19. Execution with Requested Parallelism • Requested Parallelism = 1: reserve enough to run 1 vertex at a time • Requested Parallelism = 4: reserve enough to run 4 vertices at a time
  • 20. Stage Details • 252 pieces of work • Average vertex execution time • 4.3 billion rows • Data read and written
  • 21. ADLAUs (Azure Data Lake Analytics Units) • Parallelism N = N ADLAUs • 1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
  • 23. Azure Data Lake – Visual Studio Available project types
  • 24. Azure Data Lake – Visual Studio: Fully integrates with Solution Explorer
  • 25. Azure Data Lake – Visual Studio • Monitor and manage jobs • Browse and manage storage • Browse U-SQL catalog
  • 30. Installing Azure PowerShell • PowerShell Gallery • Recommended approach • PowerShell 5.0 supports the PowerShell Gallery • Windows 10 ships with PowerShell 5.0 • Web Platform Installer (WebPI)
  • 31. Installing from the PowerShell Gallery • Launch Windows PowerShell ISE as Administrator • Install-Module AzureRM • Install-AzureRM
  • 32. Finding the ADL cmdlets • Option 1 • Get-Command -Module AzureRM.DataLakeStore • Get-Command -Module AzureRM.DataLakeAnalytics • Option 2 • Get-Command *DataLake*
  • 33. Logging in to Azure • $subname = "BDHadoopTeamPMTestDemo" • Login-AzureRmAccount -SubscriptionName $subname
  • 34. ADLS: Listing files in a store • $adls = "sqlkonferenz" • Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /
  • 35. ADLS: Upload and download • $adls = "sqlkonferenz" • Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt • Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt
  • 36. ADLA: List and submit jobs • $adla = "sqlkonferenz" • Get-AzureRmDataLakeAnalyticsJob -Account $adla • Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -Script "…" (U-SQL text passed inline) • Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -ScriptPath D:\test.script (U-SQL read from a script file)
  • 37. ADL Store (ADLS) feature set
      Account management: create new account, list accounts, update account properties, delete account
      Transferring data: upload into the store from local disk, download from the store to local disk
      Files and folders: list contents of a folder, create, move, delete, check whether a file exists
      Security: get ACLs, update ACLs, get owner, set owner
      File content: set file content, append file content, get file content, merge files
  • 38. ADL Analytics (ADLA) feature set
      Account management: create new account, list accounts, update account properties, delete account
      Data sources: add a data source, list data sources, update a data source, delete a data source
      Compute: list jobs, submit job, cancel job
      Catalog items: list items in the U-SQL catalog, update item
      Catalog secrets: create catalog secret, list catalog secrets, delete catalog secrets
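Slide 17's copy job can be mimicked locally to show what "schema on read" means: the file stays plain tab-separated text, and the column types (int, string, DateTime, float) are applied only when it is read. This Python sketch is an analogy, not U-SQL and not the implementation of Extractors.Tsv() or Outputters.Tsv(); the names extract_tsv, output_tsv and format_value, the sample rows, and the assumption that dates are ISO 8601 strings are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema mirroring the EXTRACT clause on slide 17:
# OrderId int, Customer string, Date DateTime, Amount float.
SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),  # assumption: dates are ISO 8601 strings
    ("Amount", float),
]


def extract_tsv(text, schema):
    """Apply the schema on read: parse each tab-separated line into typed fields."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(fields)}")
        rows.append({name: convert(value) for (name, convert), value in zip(schema, fields)})
    return rows


def format_value(value):
    # datetime values are written back in ISO 8601; everything else via str().
    return value.isoformat() if isinstance(value, datetime) else str(value)


def output_tsv(rows, schema):
    """Write the rowset back out as tab-separated text (a plain copy)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(format_value(row[name]) for name, _ in schema)
    return out.getvalue()


# Usage with hypothetical sample data standing in for /input/orders.txt.
sample = "1\tContoso\t2016-02-24\t99.5\n2\tFabrikam\t2016-02-25\t12.0\n"
orders = extract_tsv(sample, SCHEMA)  # plays the role of the @orders rowset
copy = output_tsv(orders, SCHEMA)     # plays the role of /output/orders_copy.txt
print(orders[0]["Amount"])  # 99.5
```

The copy is logically identical rather than byte-identical: a date read as 2016-02-24 is written back as 2016-02-24T00:00:00, which reads back to the same value.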
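The parallelism and ADLAU slides (19 to 21) lend themselves to some rough arithmetic. Slide 21 gives the constants: parallelism N reserves N ADLAUs, and one ADLAU is roughly 2 cores and 6 GB of memory. The wave calculation is an illustrative assumption, not how the ADLA scheduler actually works. It assumes that each reserved ADLAU runs one vertex at a time and that every vertex takes the same average time. Real stages have uneven vertex durations and dependencies, so treat the results as idealized estimates. The average vertex time used here is hypothetical, since slide 20 does not give its value.

```python
import math

# Approximate size of one ADLAU, per slide 21: roughly a VM with 2 cores and 6 GB.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


def reserved_resources(parallelism):
    """Parallelism N reserves N ADLAUs; return (ADLAUs, cores, memory in GB)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return parallelism, parallelism * CORES_PER_ADLAU, parallelism * MEMORY_GB_PER_ADLAU


def waves_needed(vertices, parallelism):
    """Idealized: with one vertex per ADLAU at a time, the vertices run in
    ceil(vertices / parallelism) back-to-back waves."""
    return math.ceil(vertices / parallelism)


def idealized_stage_time(vertices, parallelism, avg_vertex_seconds):
    # Assumes every vertex takes exactly the average time; real jobs have skew.
    return waves_needed(vertices, parallelism) * avg_vertex_seconds


# Usage: the 252-vertex stage from slide 20 at the two parallelism levels on slide 19.
for n in (1, 4):
    print(n, reserved_resources(n), waves_needed(252, n))
```

Under these assumptions, raising the requested parallelism from 1 to 4 cuts the stage from 252 waves to 63, while reserving four times the cores and memory (8 cores, 24 GB).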

Editor's Notes

  1. Data Lake Store is your new friend for storing data, almost unlimited amounts of it, and storing data on Azure costs next to nothing. Any file format is supported, and data is stored in its native format, meaning that you can store images, JSON, tables, CSV, TSV, blobs, etc. It is built on HDFS; this is HDFS for the cloud.
  2. Supports creating, renaming and deleting files and folders. The file system was built from scratch, based on the Hadoop file system. Microsoft Azure Data Lake Store is a Hadoop file system that’s compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. Data Lake Store is integrated with Azure Data Lake Analytics and Azure HDInsight and will be integrated with Microsoft offerings like Revolution-R Enterprise; industry-standard distributions like Hortonworks, Cloudera, and MapR; and individual Hadoop projects like Spark, Storm, Flume, Sqoop, and Kafka.
  3. Data Lake Store has no fixed limits on account size or file size. While other cloud storage offerings might restrict individual file sizes to a few terabytes, Data Lake Store can store very large files that are hundreds of times larger. At the same time, it provides very low latency read/write access and high throughput for scenarios like high-resolution video, scientific, medical, large backup data, event streams, web logs, and Internet of Things (IoT). Collect and store everything in Data Lake Store without restriction or prior understanding of business requirements.
  4. At the moment, access control lists apply only at the root level: a user who is granted access to a root folder has access to everything under that root. This will change when the service reaches general availability (GA).
  5. U-SQL project: where you write your statements. U-SQL sample project: a really extensive project that you can work with on your own account; it gives you a head start in getting up to speed on the topic. U-SQL unit testing project.
  6. Integrates seamlessly with Server Explorer.
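Note 4 describes the preview access rule: a grant on a root folder covers everything beneath it. A minimal Python model of that rule follows. It assumes a simple in-memory map from hypothetical users to the root folders they were granted, and it is not the Data Lake Store ACL API. The helper names root_folder and can_access are also hypothetical.

```python
from pathlib import PurePosixPath


def root_folder(path):
    """Return the top-level folder of an absolute store path, e.g. '/sales'."""
    parts = PurePosixPath(path).parts  # '/sales/2016/x.txt' -> ('/', 'sales', '2016', 'x.txt')
    if len(parts) < 2 or parts[0] != "/":
        raise ValueError(f"expected an absolute path below the root, got {path!r}")
    return "/" + parts[1]


def can_access(grants, user, path):
    """Root-level model: access to a root folder implies access to everything in it."""
    return root_folder(path) in grants.get(user, set())


# Usage with hypothetical users and folders.
grants = {"alice": {"/sales"}}
print(can_access(grants, "alice", "/sales/2016/orders.txt"))  # True
print(can_access(grants, "alice", "/hr/salaries.csv"))        # False
```

Comparing whole path components rather than string prefixes keeps a grant on /sales from leaking into a sibling folder such as /salesforce.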