U-SQL Intro (SQLBits 2016)

•Descargar como PPTX, PDF•

0 recomendaciones•468 vistas

Michael Rys

U-SQL Intro (SQLBits 2016 ADL/USQL Pre-Conference) Why U-SQL? Language philosophy, U-SQL Extensibility

Datos y análisis

SQLBits 2016
Azure Data Lake &
U-SQL
Michael Rys, @MikeDoesBigData
http://www.azure.com/datalake
{mrys, usql}@microsoft.com

 hard to work with anything other than
structured data
 difficult to extend with custom code

 User often has to
care about scale and performance
 SQL is 2nd class within string
 Often no code reuse/
sharing across queries

Get benefits of both!
Makes it easy for you by unifying:
• Unstructured and structured data processing
• Declarative SQL and custom imperative Code
• Local and remote Queries
• Increase productivity and agility from Day 1 and
at Day 100 for YOU!

Extend U-SQL with C#/.NET
Built-in operators,
function, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)

DECLARE Constants/U-SQL Types
DECLARE @text1 string = "Hello World";
DECLARE @text2 string = @"Hello World";
DECLARE @text3 char = 'a';
DECLARE @text4 string = "BEGIN" + @text1 + "END";
DECLARE @text5 string = string.Format("BEGIN{0}END", @text1);
DECLARE @numeric1 sbyte = 0;
DECLARE @numeric2 short = 1;
DECLARE @numeric3 int = 2;
DECLARE @numeric4 long = 3L;
DECLARE @numeric5 float = 4.0f;
DECLARE @numeric6 double = 5.0;
DECLARE @d1 DateTime = System.DateTime.Parse("1979/03/31");
DECLARE @d2 DateTime = DateTime.Now;
DECLARE @misc1 bool = true;
DECLARE @misc2 Guid = System.Guid.Parse("BEF7A4E8-F583-4804-9711-7E608215EBA6");
DECLARE @misc4 byte [] = new byte[] { 0, 1, 2, 3, 4};
DECLARE @map SqlMap<string,string> =
new SqlMap<string,string> {{"key","value"}, {"key","value"});
DECLARE @arr SqlArray<string> = new SqlArray<string> {"val1", "val2"});
DECLARE @nullableint int? = null;
text
numeric
Date/time
Other

U-SQL Language Philosophy
Declarative Query and Transformation Language:
• Uses SQL’s SELECT FROM WHERE with GROUP
BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy to use functional lambda composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out
Framework for Usercode
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER,
COMBINER, APPLIER
Federated query across distributed data sources
REFERENCE MyDB.MyAssembly;
CREATE TABLE T( cid int, first_order DateTime
, last_order DateTime, order_count int
, order_amount float );
@o = EXTRACT oid int, cid int, odate DateTime, amount float
FROM "/input/orders.txt"
USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
FROM "/input/customers.txt"
USING Extractors.Csv();
@j = SELECT c.cid, MIN(o.odate) AS firstorder
, MAX(o.date) AS lastorder, COUNT(o.oid) AS ordercnt
, AGG<MyAgg.MySum>(c.amount) AS totalamount
FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
WHERE c.city.StartsWith("New")
&& MyNamespace.MyFunction(o.odate) > 10
GROUP BY c.cid;
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;

2
 Automatic "in-lining"
 optimized out-of-
the-box
 Per job
parallelization
 visibility into execution
 Heatmap to identify
bottlenecks

Más contenido relacionado

La actualidad más candente

Killer Scenarios with Data Lake in Azure with U-SQLMichael Rys

Using C# with U-SQL (SQLBits 2016)Michael Rys

U-SQL Learning Resources (SQLBits 2016)Michael Rys

Microsoft's Hadoop StoryMichael Rys

U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...Michael Rys

U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...Michael Rys

Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Michael Rys

U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys

Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Michael Rys

Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys

U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys

U-SQL Does SQL (SQLBits 2016)Michael Rys

U-SQL Query Execution and Performance TuningMichael Rys

Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Michael Rys

Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys

SqliteKumar

DMann-SQLDeveloper4ReportingDavid Mann

RdbmsParthiv Prem

Introduction to SQLite: The Most Popular Database in the Worldjkreibich

La actualidad más candente (20)

Killer Scenarios with Data Lake in Azure with U-SQL

Using C# with U-SQL (SQLBits 2016)

U-SQL Learning Resources (SQLBits 2016)

Microsoft's Hadoop Story

U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...

U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...

Tuning and Optimizing U-SQL Queries (SQLPASS 2016)

U-SQL - Azure Data Lake Analytics for Developers

Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...

Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...

U-SQL Partitioned Data and Tables (SQLBits 2016)

U-SQL Does SQL (SQLBits 2016)

U-SQL Query Execution and Performance Tuning

Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...

Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...

Sqlite

DMann-SQLDeveloper4Reporting

Rdbms

Introduction to SQLite: The Most Popular Database in the World

Similar a U-SQL Intro (SQLBits 2016)

Azure Data Lake and U-SQLMichael Rys

3 CityNetConf - sql+c#=u-sqlŁukasz Grala

Rapid SQL Datasheet - The Intelligent IDE for SQL DevelopmentEmbarcadero Technologies

Resume_of_sayeedKhaled Khaled

Инструменты программистаAndrew Fadeev

Steve Molzen Resume 2016Steven Molzen

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenMS Cloud Summit

SQL 3.pptxRishab Saini

20180929 jssug 10_a5_sql_mk2Kunihisa Abukawa

Sql server difference faqs- 6Umar Ali

Introduction to Azure Data LakeAntonios Chatzipavlis

SQL Training courses.pptxirfanakram32

Query editor for multi databasesAarthi Raghavendra

SQLEr. Nawaraj Bhandari

SQL Bhandari Nawaraj

SQL EXCLUSIVE NOTES .pdfNiravPanchal50

MysqlRaghu nath

SQL PPT.pptxKulbir4

SQLCLR For DBAs and Developerswebhostingguy

Dr. Jekyll and Mr. Hydewebhostingguy

Similar a U-SQL Intro (SQLBits 2016) (20)

Azure Data Lake and U-SQL

3 CityNetConf - sql+c#=u-sql

Rapid SQL Datasheet - The Intelligent IDE for SQL Development

Resume_of_sayeed

Инструменты программиста

Steve Molzen Resume 2016

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen

SQL 3.pptx

20180929 jssug 10_a5_sql_mk2

Sql server difference faqs- 6

Introduction to Azure Data Lake

SQL Training courses.pptx

Query editor for multi databases

SQL

SQL EXCLUSIVE NOTES .pdf

Mysql

SQL PPT.pptx

SQLCLR For DBAs and Developers

Dr. Jekyll and Mr. Hyde

Más de Michael Rys

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys

Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys

Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys

Big Data Processing with Spark and .NET - Microsoft Ignite 2019Michael Rys

Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys

Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys

U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys

Azure Data Lake Intro (SQLBits 2016)Michael Rys

Más de Michael Rys (9)

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...

Big Data Processing with .NET and Spark (SQLBits 2020)

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...

Running cost effective big data workloads with Azure Synapse and Azure Data L...

Big Data Processing with Spark and .NET - Microsoft Ignite 2019

Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...

Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...

U-SQL Query Execution and Performance Basics (SQLBits 2016)

Azure Data Lake Intro (SQLBits 2016)

Último

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823

Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Riyadh +966572737505 get cytotec

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service9953056974 Low Rate Call Girls In Saket, Delhi NCR

BabyOno dropshipping via API with DroFx.pptxolyaivanovalion

April 2024 - Crypto Market Report's Analysismanisha194592

Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823

Edukaciniai dropshipping via API with DroFxolyaivanovalion

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823

CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion

Anomaly detection and data imputation within time seriesParis Women in Machine Learning and Data Science

ELKO dropshipping via API with DroFx.pptxolyaivanovalion

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823

Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra

Mature dropshipping via API with DroFx.pptxolyaivanovalion

Smarteg dropshipping via API with DroFx.pptxolyaivanovalion

U-SQL Intro (SQLBits 2016)

1. SQLBits 2016 Azure Data Lake & U-SQL Michael Rys, @MikeDoesBigData http://www.azure.com/datalake {mrys, usql}@microsoft.com

2. Azure Data Lake U-SQL

4. • • •

5.  hard to work with anything other than structured data  difficult to extend with custom code

6.  User often has to care about scale and performance  SQL is 2nd class within string  Often no code reuse/ sharing across queries

7. Get benefits of both! Makes it easy for you by unifying: • Unstructured and structured data processing • Declarative SQL and custom imperative Code • Local and remote Queries • Increase productivity and agility from Day 1 and at Day 100 for YOU!

9. Extend U-SQL with C#/.NET Built-in operators, function, aggregates C# expressions (in SELECT expressions) User-defined aggregates (UDAGGs) User-defined functions (UDFs) User-defined operators (UDOs)

10.

11. DECLARE Constants/U-SQL Types DECLARE @text1 string = "Hello World"; DECLARE @text2 string = @"Hello World"; DECLARE @text3 char = 'a'; DECLARE @text4 string = "BEGIN" + @text1 + "END"; DECLARE @text5 string = string.Format("BEGIN{0}END", @text1); DECLARE @numeric1 sbyte = 0; DECLARE @numeric2 short = 1; DECLARE @numeric3 int = 2; DECLARE @numeric4 long = 3L; DECLARE @numeric5 float = 4.0f; DECLARE @numeric6 double = 5.0; DECLARE @d1 DateTime = System.DateTime.Parse("1979/03/31"); DECLARE @d2 DateTime = DateTime.Now; DECLARE @misc1 bool = true; DECLARE @misc2 Guid = System.Guid.Parse("BEF7A4E8-F583-4804-9711-7E608215EBA6"); DECLARE @misc4 byte [] = new byte[] { 0, 1, 2, 3, 4}; DECLARE @map SqlMap<string,string> = new SqlMap<string,string> {{"key","value"}, {"key","value"}); DECLARE @arr SqlArray<string> = new SqlArray<string> {"val1", "val2"}); DECLARE @nullableint int? = null; text numeric Date/time Other

12. U-SQL Language Philosophy Declarative Query and Transformation Language: • Uses SQL’s SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions • Optimizable, Scalable Expression-flow programming style: • Easy to use functional lambda composition • Composable, globally optimizable Operates on Unstructured & Structured Data • Schema on read over files • Relational metadata objects (e.g. database, table) Extensible from ground up: • Type system is based on C# • Expression language IS C# • User-defined functions (U-SQL and C#) • User-defined Aggregators (C#) • User-defined Operators (UDO) (C#) U-SQL provides the Parallelization and Scale-out Framework for Usercode • EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER Federated query across distributed data sources REFERENCE MyDB.MyAssembly; CREATE TABLE T( cid int, first_order DateTime , last_order DateTime, order_count int , order_amount float ); @o = EXTRACT oid int, cid int, odate DateTime, amount float FROM "/input/orders.txt" USING Extractors.Csv(); @c = EXTRACT cid int, name string, city string FROM "/input/customers.txt" USING Extractors.Csv(); @j = SELECT c.cid, MIN(o.odate) AS firstorder , MAX(o.date) AS lastorder, COUNT(o.oid) AS ordercnt , AGG<MyAgg.MySum>(c.amount) AS totalamount FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid WHERE c.city.StartsWith("New") && MyNamespace.MyFunction(o.odate) > 10 GROUP BY c.cid; OUTPUT @j TO "/output/result.txt" USING new MyData.Write(); INSERT INTO T SELECT * FROM @j;

13. 2  Automatic "in-lining"  optimized out-of- the-box  Per job parallelization  visibility into execution  Heatmap to identify bottlenecks

Notas del editor

Add velocity?
Hard to operate on unstructured data: Even Hive requires meta data to be created to operate on unstructured data. Adding Custom Java functions, aggregators and SerDes is involving a lot of steps and often access to server’s head node and differs based on type of operation. Requires many tools and steps. Some examples: Hive UDAgg Code and compile .java into .jar Extend AbstractGenericUDAFResolver class: Does type checking, argument checking and overloading Extend GenericUDAFEvaluator class: implements logic in 8 methods. - Deploy: Deploy jar into class path on server Edit FunctionRegistry.java to register as built-in Update the content of show functions with ant Hive UDF (as of v0.13) Code Load JAR into head node or at URI CREATE FUNCTION USING JAR to register and load jar into classpath for every function (instead of registering jar and just use the functions)
Spark supports Custom “inputters and outputters” for defining custom RDDs No UDAGGs Simple integration of UDFs but only for duration of program. No reuse/sharing. Cloud dataflow? Requires has to care about scale and perf Spark UDAgg Is not yet supported ( SPARK-3947) Spark UDF Write inline functiondef westernState(state: String) = Seq("CA", "OR", "WA", "AK").contains(state) for SQL usage need to register the tablecustomerTable.registerTempTable("customerTable") Register each UDFsqlContext.udf.register("westernState", westernState _) Call itval westernStates =sqlContext.sql("SELECT * FROM customerTable WHERE westernState(state)")
Offers Auto-scaling and performance Operates on unstructured data without tables needed Easy to extend declaratively with custom code: consistent model for UDO, UDF and UDAgg. Easy to query remote sources even without external tables U-SQL UDAgg Code and compile .cs file: Implement IAggregate’s 3 methods :Init(), Accumulate(), Terminate() C# takes case of type checking, generics etc. Deploy: Tooling: one click registration in user db of assembly By Hand: Copy file to ADL CREATE ASSEMBLY to register assembly Use via AGG<MyNamespace.MyAggregate<T>>(a) U-SQL UDF Code in C#, register assembly once, call by C# name.
Remove SCOPE for external customers?
Extensions require .NET assemblies to be registered with a database
Use for language experts

U-SQL Intro (SQLBits 2016)

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a U-SQL Intro (SQLBits 2016)

Similar a U-SQL Intro (SQLBits 2016) (20)

Más de Michael Rys

Más de Michael Rys (9)

Último

Último (20)

U-SQL Intro (SQLBits 2016)

Notas del editor