Azure Data Lake
Athens May 26, 2017
1982 I started working with computers
1988 I started my professional career in the computer industry
1996 I started working with SQL Server 6.0
1998 I earned my first Microsoft certification as a
Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as a Microsoft Certified Trainer (MCT), with
more than 30,000 hours of training to date!
2010 I became a Microsoft MVP on the Data Platform for the first time
I created SQL School Greece, www.sqlschool.gr
2012 I became an MCT Regional Lead in the Microsoft Learning Program
2013 I was certified as MCSE: Data Platform
I was certified as MCSE: Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA,
MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
A source of information about Microsoft SQL Server for Greek
IT professionals, DBAs, developers, and information workers, as well as
hobbyists who simply like SQL Server.
Help line : email@example.com
• Articles about SQL Server
• SQL Server News
• SQL Nights
What we are doing here
Follow us on social media
SQL School Greece group
FROM SQL SERVER
▪ Sign up for a free membership today at sqlpass.org.
▪ LinkedIn: http://www.sqlpass.org/linkedin
▪ Facebook: http://www.sqlpass.org/facebook
▪ Twitter: @SQLPASS
▪ PASS: http://www.sqlpass.org
What is Azure Data Lake?
“A single store of all data… ranging from
raw data (which implies an exact copy of
source system data) to transformed data,
which is used for various tasks including
reporting, visualization, analytics, and
machine learning.”
• Data Lake Analytics
• Data Lake Store
• Develop, debug, and optimize big data programs with ease
• Integrates seamlessly with your existing IT investments
• Store and analyze petabyte-size files and trillions of objects
• Affordable and cost effective
• Enterprise grade security, auditing, and support
What Does Azure Data Lake Offer?
Data Lakes vs Data Warehouses
DATA WAREHOUSE vs. DATA LAKE
PROCESSING: Schema-on-Write | Schema-on-Read
STORAGE: Expensive for large data volumes | Designed for low-cost storage
AGILITY: — | Configure and reconfigure as needed
SECURITY: Mature | Maturing
USERS: Business professionals | Data scientists et al.
• Enterprise-wide hyper-scale repository for big data analytic workloads.
- Azure Data Lake enables you to capture data of any size, type, and ingestion speed in a single
place for operational and exploratory analytics.
• Can be accessed from Hadoop (available with HDInsight cluster) using
the WebHDFS-compatible REST APIs.
• Specifically designed to enable analytics on the stored data and is tuned
for performance for data analytics scenarios.
• It includes, out of the box, all the enterprise-grade capabilities
- security, manageability, scalability, reliability, and availability
• Essential for real-world enterprise use cases.
What is Azure Data Lake Store?
Azure Data Lake Store vs Azure Blob Storage
AZURE DATA LAKE STORE vs. AZURE BLOB STORAGE
PURPOSE: Optimized storage for big data | General-purpose object store for a
wide variety of storage scenarios
USE CASES: Batch, interactive, and streaming analytics and machine-learning
data such as log files, IoT data, click streams, and large datasets | Any type
of text or binary data, such as application back ends, backup data, and media
storage for streaming and general-purpose use
STRUCTURE: Data Lake Store account contains folders, which in turn contain
data stored as files | Storage account has containers, which in turn hold
data in the form of blobs
Hierarchical file system | Object store with flat namespace
AUTHENTICATION: Based on Azure Active Directory | Based on shared secrets:
Account Access Keys and Shared Access Signature keys
• Is an on-demand analytics job service to simplify big data analytics.
• Focus on writing, running, and managing jobs rather than on
operating distributed infrastructure.
• Can handle jobs of any scale instantly by setting the dial for how much
power you need.
• You only pay for your job when it is running, making it cost-effective.
• The analytics service supports Azure Active Directory, letting you
manage access and roles, integrated with your on-premises identity system.
What is Azure Data Lake Analytics?
• Dynamic scaling
• Develop faster, debug, and optimize smarter using familiar tools
• U-SQL: simple and familiar, powerful, and extensible
• Integrates seamlessly with your IT investments
• Affordable and cost effective
• Works with all your Azure data
Azure Data Lake Analytics Key Capabilities
- The only fully managed cloud Apache Hadoop offering
- Provides optimized open-source analytic clusters, including
- Microsoft R Server
- Provides a 99.9% SLA
- Deploy these big data technologies and ISV applications
as managed clusters with enterprise-level security and monitoring
What is Azure HDInsight?
It is the new big data query language of
the Azure Data Lake Analytics service.
It evolved out of Microsoft's internal big
data language called SCOPE:
"SCOPE: Easy and Efficient Parallel
Processing of Massive Data Sets"
by Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren
Shakib, Simon Weaver, Jingren Zhou
What is U-SQL?
– a familiar SQL-like declarative
language
– with the extensibility and
programmability provided by C# types
and the C# expression language
– and big data processing concepts such
as “schema on read”, custom
processors, and reducers.
– the ability to query data from a
variety of data sources:
– Azure Data Lake Storage,
– Azure Blob Storage,
– Azure SQL DB, Azure SQL Data Warehouse,
– SQL Server instances running in Azure VMs
– Its keywords, such as SELECT, have to be
written in UPPERCASE.
– Its expression language inside SELECT
clauses, WHERE predicates, etc. is C#.
– This means, for example, that the
comparison operations inside a predicate
follow C# syntax (e.g., a == "foo"),
– and that the language uses C# null
semantics, which is two-valued, not
three-valued as in ANSI SQL.
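A minimal sketch illustrating these rules (the input path and columns here are hypothetical, not from the original deck): the keywords are uppercase, while everything inside the expressions (the comparison, the method call, and the null handling) is plain C#.

```sql
// Hypothetical input: a TSV file with three columns.
@log =
    EXTRACT Region string,
            Query string,
            Duration int?              // C# nullable value type
    FROM "/data/log.tsv"               // hypothetical path
    USING Extractors.Tsv();

@rs =
    SELECT Region.ToUpper() AS Region, // C# method call in the select list
           Duration ?? 0 AS Duration   // C# null-coalescing operator
    FROM @log
    WHERE Region == "en-gb" && Duration != null;  // C# ==, &&, two-valued null test
```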
• Azure Data Lake Analytics provides U-SQL for batch processing.
• U-SQL is written and executed in form of a batch script.
• U-SQL also supports data definition statements such as CREATE
TABLE to create metadata artifacts either in separate scripts or
sometimes even in combination with the transformation scripts.
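As a sketch of that DDL side (the database, table, columns, and path here are hypothetical): a U-SQL CREATE TABLE declares a clustered index and a distribution scheme, and the table can then be populated from a rowset with INSERT.

```sql
// Hypothetical example of U-SQL metadata DDL plus a load.
CREATE DATABASE IF NOT EXISTS SearchDB;
USE DATABASE SearchDB;

CREATE TABLE IF NOT EXISTS SearchLog
(
    UserId int,
    Region string,
    Query string,
    INDEX idx CLUSTERED (UserId ASC)   // U-SQL tables require a clustered index
    DISTRIBUTED BY HASH (UserId)       // ...and a distribution scheme
);

@rows =
    EXTRACT UserId int, Region string, Query string
    FROM "/data/log.tsv"               // hypothetical input path
    USING Extractors.Tsv();

INSERT INTO SearchLog
SELECT UserId, Region, Query FROM @rows;
```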
• U-SQL scripts can be submitted in a variety of ways:
- Directly from within the Azure Data Lake Tools for Visual Studio,
- From the Azure Portal,
- Programmatically via the Azure Data Lake SDK job submission API,
- Via the Azure PowerShell job submission command
How does a U-SQL Script process Data?
It follows the following general processing pattern:
• Retrieve data from stored locations in rowset format
- Stored locations can be files that will be schematized on read with EXTRACT expressions
- Stored locations can be U-SQL tables that are stored in a schematized format
- Or can be tables provided by other data sources such as an Azure SQL database.
• Transform the rowset(s)
- Several transformations over the rowsets can be composed in a data flow format
• Store the transformed rowset data
- Store it in a file with an OUTPUT statement, or
- Store it in a U-SQL table with an INSERT statement
How does a U-SQL Script process Data?
DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/result.tsv";

@searchlog =
    EXTRACT UserId int, Start DateTime, Region string, Query string,
            Duration int?, Urls string, ClickedUrls string
    FROM @in USING Extractors.Tsv();

@rs1 = SELECT Start, Region, Duration FROM @searchlog WHERE Region == "en-gb";

@rs1 = SELECT Start, Region, Duration FROM @rs1
       WHERE Start >= DateTime.Parse("2012/02/16");

OUTPUT @rs1 TO @out USING Outputters.Tsv();
– Create Data Lake Stores
– Create Data Lake Analytics accounts and
connect them to Data Lake Stores
– Import data into Azure Data Lake Stores
– Run U-SQL jobs in Azure Data Lake