Introduction to Azure Data Lake

  1. 1. Introduction to Azure Data Lake, Athens, May 26, 2017
  2. 2. Presenter Info
    1982: I started working with computers.
    1988: I started my professional career in the computer industry.
    1996: I started working with SQL Server 6.0.
    1998: I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece).
    1999: I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training delivered since then!
    2010: I became a Microsoft MVP on the Data Platform for the first time. I created SQL School Greece, www.sqlschool.gr.
    2012: I became an MCT Regional Lead in the Microsoft Learning Program.
    2013: I was certified as MCSE: Data Platform and MCSE: Business Intelligence.
    2016: I was certified as MCSE: Data Management & Analytics.
    Antonios Chatzipavlis, SQL Server Expert and Evangelist, Data Platform MVP
    MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
  3. 3. SQLschool.gr A source of information about Microsoft SQL Server for Greek IT professionals, DBAs, developers, and information workers, as well as hobbyists who simply like SQL Server. Help line: help@sqlschool.gr What we are doing here: • Articles about SQL Server • SQL Server news • SQL Nights • Webcasts • Downloads • Resources Follow us on social media: fb/sqlschoolgr fb/groups/sqlschool @antoniosch @sqlschool yt/c/SqlschoolGr SQL School Greece group SELECT KNOWLEDGE FROM SQL SERVER
  4. 4. ▪ Sign up for a free membership today at sqlpass.org. ▪ Linked In: http://www.sqlpass.org/linkedin ▪ Facebook: http://www.sqlpass.org/facebook ▪ Twitter: @SQLPASS ▪ PASS: http://www.sqlpass.org
  5. 5. PASS Virtual Chapters
  6. 6. Data Lake Overview
  7. 7. What is Azure Data Lake? “A single store of all data… ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various forms including reporting, visualization, analytics, and machine learning”
  8. 8. Built on Open-Source
  9. 9. Azure Ecosystem Integration Azure Data Lake
  10. 10. • Data Lake Analytics • HDInsight • Data Lake Store • Develop, debug, and optimize big data programs with ease • Integrates seamlessly with your existing IT investments • Store and analyze petabyte-size files and trillions of objects • Affordable and cost-effective • Enterprise-grade security, auditing, and support What Does Azure Data Lake Offer?
  11. 11. Data Lakes vs Data Warehouses (DATA WAREHOUSE vs. DATA LAKE)
    DATA: Structured, processed vs. structured, semi-structured, unstructured, raw
    PROCESSING: Schema-on-write vs. schema-on-read
    STORAGE: Expensive for large data volumes vs. designed for low-cost storage
    AGILITY: Less agile, fixed configuration vs. highly agile, configure and reconfigure as needed
    SECURITY: Mature vs. maturing
    USERS: Business professionals vs. data scientists et al.
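To make the schema-on-read contrast above concrete, here is a minimal U-SQL sketch (the file path and column names are made up for illustration): the raw TSV file carries no stored schema, and the schema is applied only at the moment the file is read.

    // Hypothetical raw file in the lake; no schema is stored with it
    @clicks =
        EXTRACT UserId int,
                Region string,
                Duration int?
        FROM "/samples/clickstream.tsv"
        USING Extractors.Tsv();   // schema-on-read: types are applied here, at query time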
  12. 12. Data Lake Store
  13. 13. • Enterprise-wide hyper-scale repository for big data analytic workloads. - Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics. • Can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs. • Specifically designed to enable analytics on the stored data and tuned for performance in data analytics scenarios. • Includes, out of the box, all the enterprise-grade capabilities essential for real-world enterprise use cases: security, manageability, scalability, reliability, and availability. What is Azure Data Lake Store?
  14. 14. Azure Data Lake Store vs Azure Blob Storage (AZURE DATA LAKE STORE vs. AZURE BLOB STORAGE)
    PURPOSE: Optimized storage for big data analytics workloads vs. general-purpose object store for a wide variety of storage scenarios
    USE CASES: Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, and large datasets vs. any type of text or binary data, such as application back ends, backup data, media storage for streaming, and general-purpose data
    KEY CONCEPTS: A Data Lake Store account contains folders, which in turn contain data stored as files vs. a storage account has containers, which in turn hold data in the form of blobs
    STRUCTURE: Hierarchical file system vs. object store with a flat namespace
    SECURITY: Based on Azure Active Directory identities vs. based on shared secrets (Account Access Keys and Shared Access Signature Keys)
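As a small illustration of the hierarchical file system noted above, a U-SQL script can address a file in the Data Lake Store either by a relative path (resolved against the account's folder tree) or by a fully qualified adl:// URI; the account and folder names below are hypothetical.

    // Relative path inside the default Data Lake Store account (hypothetical folders)
    DECLARE @logRelative string = "/weblogs/2017/05/26/access.log";
    // Fully qualified URI naming a specific Data Lake Store account (hypothetical name)
    DECLARE @logAbsolute string = "adl://myadlsaccount.azuredatalakestore.net/weblogs/2017/05/26/access.log";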
  15. 15. Data Lake Analytics
  16. 16. • An on-demand analytics job service that simplifies big data analytics. • Lets you focus on writing, running, and managing jobs rather than on operating distributed infrastructure. • Can handle jobs of any scale instantly by setting the dial for how much power you need. • You only pay for your job while it is running, making it cost-effective. • The analytics service supports Azure Active Directory, letting you manage access and roles, integrated with your on-premises identity system. What is Azure Data Lake Analytics?
  17. 17. • Dynamic scaling • Develop faster, debug, and optimize smarter using familiar tools • U-SQL: simple and familiar, powerful, and extensible • Integrates seamlessly with your IT investments • Affordable and cost-effective • Works with all your Azure data Azure Data Lake Analytics Key Capabilities
  18. 18. HDInsight
  19. 19. - The only fully managed cloud Apache Hadoop offering - Provides optimized open-source analytic clusters for - Spark, - Hive, - MapReduce, - HBase, - Storm, - Kafka, - Microsoft R Server - Provides a 99.9% SLA - Deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring. What is Azure HDInsight?
  20. 20. U-SQL
  21. 21. The new big data query language of the Azure Data Lake Analytics service. It evolved out of Microsoft's internal big data language called SCOPE ("SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets" by Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou, http://www.vldb.org/pvldb/1/1454166.pdf). What is U-SQL?
  22. 22. – a familiar SQL-like declarative language – with the extensibility and programmability provided by C# types and the C# expression language – and big data processing concepts such as "schema on read", custom processors, and reducers. U-SQL combines
  23. 23. – Azure Data Lake Storage, – Azure Blob Storage, – Azure SQL DB, Azure SQL Data Warehouse, – SQL Server instances running in Azure VMs. Provides the ability to query and combine data from a variety of data sources
  24. 24. – Its keywords, such as SELECT, have to be in UPPERCASE. – Its expression language inside SELECT clauses, WHERE predicates, etc., is C#. – This means, for example, that comparison operations inside a predicate follow C# syntax (e.g., a == "foo"), – and that the language uses C# null semantics, which is 2-valued and not 3-valued as in ANSI SQL. It's NOT ANSI SQL
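As a rough sketch of the point above, the predicate below is plain C# over the @searchlog rowset defined in the demo script on slide 27: SQL-style keywords stay uppercase, equality is ==, boolean operators are && and !, and the nullable column Duration (int?) follows C# null semantics.

    @gb =
        SELECT Region, Duration
        FROM @searchlog
        WHERE Region == "en-gb"                 // C# equality, not SQL '='
              && !string.IsNullOrEmpty(Urls)    // C# method call inside the predicate
              && Duration != null;              // 2-valued C# null check on an int? column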
  25. 25. • Azure Data Lake Analytics provides U-SQL for batch processing. • U-SQL is written and executed in the form of a batch script. • U-SQL also supports data definition statements such as CREATE TABLE to create metadata artifacts, either in separate scripts or sometimes even in combination with the transformation scripts. • U-SQL scripts can be submitted in a variety of ways: - Directly from within the Azure Data Lake Tools for Visual Studio - From the Azure Portal - Programmatically via the Azure Data Lake SDK job submission API - Via the Azure PowerShell extension's job submission command How does a U-SQL Script process Data?
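As a hedged sketch of the CREATE TABLE capability mentioned above (the database, schema, table, and column names are assumptions, not part of the deck): a U-SQL managed table declares its columns together with a clustered index and a distribution scheme.

    CREATE DATABASE IF NOT EXISTS SampleDB;

    CREATE TABLE IF NOT EXISTS SampleDB.dbo.SearchLog
    (
        UserId int,
        Region string,
        Duration int?,
        INDEX idx CLUSTERED (UserId ASC)     // U-SQL tables require a clustered index
            DISTRIBUTED BY HASH (UserId)     // and a distribution scheme
    );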
  26. 26. A U-SQL script follows this general processing pattern: • Retrieve data from stored locations in rowset format - Stored locations can be files that will be schematized on read with EXTRACT expressions - Stored locations can be U-SQL tables that are stored in a schematized format - Or they can be tables provided by other data sources such as an Azure SQL database. • Transform the rowset(s) - Several transformations over the rowsets can be composed in a data flow format • Store the transformed rowset data - Store it in a file with an OUTPUT statement, or - Store it in a U-SQL table with an INSERT statement How does a U-SQL Script process Data?
  27. 27. U-SQL Scripts
    DECLARE @in string = "/Samples/Data/SearchLog.tsv";
    DECLARE @out string = "/output/result.tsv";

    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @in
        USING Extractors.Tsv();

    @rs1 =
        SELECT Start, Region, Duration
        FROM @searchlog
        WHERE Region == "en-gb";

    @rs1 =
        SELECT Start, Region, Duration
        FROM @rs1
        WHERE Start >= DateTime.Parse("2012/02/16");

    OUTPUT @rs1
    TO @out
    USING Outputters.Tsv();
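The script above persists its result with OUTPUT to a file; as a complementary sketch of the INSERT path from slide 26, the same rowset could instead be written into a U-SQL table (this assumes the hypothetical SampleDB.dbo.SearchLog table from the earlier CREATE TABLE sketch already exists).

    INSERT INTO SampleDB.dbo.SearchLog
    SELECT UserId, Region, Duration
    FROM @searchlog;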
  28. 28. DEMO – Create Data Lake Stores – Create Data Lake Analytics accounts and connect them to Data Lake Stores – Import data into Azure Data Lake Stores – Run U-SQL jobs in Azure Data Lake Analytics
  29. 29. Ask your Questions
  30. 30. ☺ Thank you
  31. 31. SELECT KNOWLEDGE FROM SQL SERVER Copyright © 2017 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
