
Microsoft cloud big data strategy


Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together, so you can be the hero who builds your company’s big data solution.

Microsoft cloud big data strategy

  1. 1. About Me  Microsoft, Big Data Evangelist  In IT for 30 years, worked on many BI and DW projects  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer  Been perm employee, contractor, consultant, business owner  Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference  Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions  Blog at JamesSerra.com  Former SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  2. 2. Agenda  Big data defined  Microsoft big data solution  Azure data lake
  3. 3. Big Data is changing traditional data warehousing … data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing. – Gartner, “The State of Data Warehousing”* * Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012) Data sources ETL Data warehouse BI and analytics
  4. 4. Big Data has new data characteristics Data complexity: variety and velocity Petabytes
  5. 5. Big Data is driving transformative changes Traditional Big Data Relational data with highly modeled schema All data with schema agility Specialized HW Commodity HW Data characteristics Costs Culture Operational reporting Focus on rear-view analysis Experimentation leading to intelligent action With machine learning, graph, a/b testing
  6. 6. Big Data introduces new culture of experimentation Understand customer patterns to uncover cross-sell opportunities Historical campaign effectiveness Generate year-end financial reports Financial monitoring with real-time recommendations to increase revenue Generate year-end financial reports Real-time product offers and promotions based on behavior Collect historical data on equipment performance Real-time monitoring to identify proactive maintenance Shipping features without understanding success Building successful features correlating user action with product experience
  7. 7. Action Value From data to decisions and actions
  8. 8. However, there are challenges to Big Data… Obtaining skills and capabilities Determining how to get value Integrating with existing IT investments *Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
  9. 9. But, Microsoft has done it before We needed to better leverage data and analytics to do more experimentation So we: • Designed a data lake for everyone to put their data into • Built tools approachable by any developer • Created machine learning tools for collaborating across large experiment models Result: • Across Microsoft, ten thousand developers doing experimentation leading to better insights • Leading to growth in our Microsoft businesses: • Office productivity revenue (45% YoY)* • Intelligent Cloud (100% YoY)* • Bing search share doubles 2010 2011 2012 2013 2014 2015 Growth of data @ Microsoft Windows SMSG Live Bing CRM/Dynamics Xbox Live Office365 Malware Protection Microsoft Stores Commerce Risk Skype LCA Exchange Yammer Petabytes Exabytes * Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
  10. 10. Microsoft is now taking everything we’ve learned on this journey and bringing it to our customers Technology. Cost. Culture.
  11. 11. Big Data as a cornerstone of Cortana Intelligence Action People Automated Systems Apps Web Mobile Bots Intelligence Dashboards & Visualizations Cortana Bot Framework Cognitive Services Power BI Information Management Event Hubs Data Catalog Data Factory Machine Learning and Analytics HDInsight (Hadoop and Spark) Stream Analytics Intelligence Data Lake Analytics Machine Learning Big Data Stores Data Lake Store Data Sources Apps Sensors and devices Data SQL Data Warehouse
  12. 12. CONTROL EASE OF USE Azure Data Lake Analytics Azure Data Lake Store Azure Storage Any Hadoop technology Workload optimized, managed clusters Specific apps in a multi-tenant form factor Azure Marketplace HDP | CDH | MapR Azure Data Lake Analytics IaaS Hadoop Managed Hadoop Big Data as-a-service Azure HDInsight BIG DATA STORAGE BIG DATA ANALYTICS Bringing Big Data to everybody Accelerate the pace of innovation through a state-of-the-art cloud platform User Adoption
  13. 13. Microsoft Big Data Portfolio SQL Server Stretch Business intelligence Machine learning analytics Insights Azure SQL Database SQL Server 2016 SQL Server 2016 Fast Track Azure SQL DW Azure Data Lake DocumentDB HDInsight Hadoop Analytics Platform System Sequential Scale Out + Across Scale Up Key Relational Non-relational On-premises Cloud Microsoft has solutions covering and connecting all four quadrants – that’s why SQL Server is one of the most utilized databases in the world
  14. 14. Azure HDInsight A Cloud Spark and Hadoop service for the Enterprise Reliable with an industry leading SLA Enterprise-grade security and monitoring Productive platform for developers and scientists Cost effective cloud scale Integration with leading ISV applications Easy for administrators to manage 63% lower TCO than deploying your own Hadoop on-premises* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  15. 15. Hortonworks Data Platform (HDP) 2.5 Simply put, Hortonworks ties all the open source products together (22) (under the covers of HDInsight)
  16. 16. Azure Data Lake Store A No limits Data Lake that powers Big Data Analytics Petabyte size files and Trillions of objects Scalable throughput for massively parallel analytics HDFS for the cloud Always encrypted, role-based security & auditing Enterprise-grade support
  17. 17. Azure Data Lake Analytics A No limits Analytics Job Service to power intelligent action Start in seconds, scale instantly, pay per job Develop massively parallel programs with simplicity Debug and optimize your big data programs with ease Virtualize your analytics Enterprise-grade security, auditing and support
  18. 18. Azure Data Lake YARN U-SQL Analytics HDInsight Hive R Server HDFS Store Store and analyze data of any kind and size Develop faster, debug and optimize smarter Interactively explore patterns in your data No learning curve Managed and supported Dynamically scales to match your business priorities Enterprise-grade security Built on YARN, designed for the cloud
  19. 19. Azure SQL Data Warehouse A relational data warehouse-as-a-service, fully managed by Microsoft. Industry’s first elastic cloud data warehouse with enterprise-grade capabilities. Integrated with on-premises and cloud assets. Simple compute & storage billing Pay for what you need High performance without rewriting applications Low cost for latent data Infrastructure, management and support provided Scales to petabytes of data with MPP processing Resize compute nodes < 1 minute Faster time to insight than other SMP offerings Designed for “on-demand” workload Integrated with Azure platform and other Microsoft services Enables hybrid solutions Built on SQL Server experience & technology
  20. 20. PolyBase Query relational and non-relational data with T-SQL In the preview planned for early this year, PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage
  21. 21. Publish-subscribe data distribution Managed PaaS (Platform as a Service) solution Scales with your needs to millions of events per second Provides a durable buffer between event publishers and event consumers Azure Event Hubs
  22. 22. Azure Stream Analytics Process real-time data in Azure Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data Outputs to persistent stores, dashboards or back to devices Point of Service Devices Self Checkout Stations Kiosks Smart Phones Slates/Tablets PCs/Laptops Servers Digital Signs Diagnostic Equipment Remote Medical Monitors Logic Controllers Specialized Devices Thin Clients Handhelds Security POS Terminals Automation Devices Vending Machines Kinect ATM
  23. 23. Azure Machine Learning Get started with just a browser Requires no provisioning; simply log on to your Azure subscription or try it for free off azure.com/ml Experience the power of choice Choose from hundreds of algorithms and packages from R and Python or drop in your own custom code Take advantage of business-tested algorithms from Xbox and Bing Deploy solutions in minutes With the click of a button, deploy the finished model as a web service that can connect to any data, anywhere Connect to the world Brand and monetize solutions on our global Machine Learning Marketplace https://datamarket.azure.com/ Beyond business intelligence – machine intelligence Microsoft Azure Machine Learning Studio Modeling environment (shown) Microsoft Azure Machine Learning API service Model in production as a web service Microsoft Azure Machine Learning Marketplace APIs and solutions for broad use
  24. 24. Enable enterprise-wide self-service data source registration and discovery A metadata repository that allows users to register, enrich, understand, discover, and consume data sources Delivers differentiated value through ‒ Data source discovery, rather than data discovery ‒ Support for data from any source: structured and unstructured, on premises and in the cloud ‒ Publishing, discovery and consumption through any tool ‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge, while allowing IT to maintain control and oversight
  25. 25. Azure Data Factory Connect to relational or non-relational data that is on-premises or in the cloud Orchestrate data movement & data processing Publish to Power BI users as a searchable data view Operationalize (schedule, manage, debug) workflows Lifecycle management, monitoring Orchestrate trusted information production in Azure C# MapReduce Hive Pig Stored Procedures Azure Machine Learning
  26. 26. Discovery & exploration – Custom visualizations— R integration -
  27. 27. [Chart; only data labels survive: 146.03K, 145.84K, 145.96K, 146.06K and 40.08K, 38.84K, 39.99K, 40.33K]
  28. 28. www.botframework.com
  29. 29. Microsoft Cognitive Services Give your apps a human side Cognitive Services API Collection
  30. 30. Azure Analysis Services Azure Analysis Services is based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years. Built for hybrid data Access and model data on-premises, in the cloud, or both Interactive visualization Quick, highly interactive self-service data discovery with support of major data visualization tools Proven technology Powerful, proven tabular models built from SQL Server 2016 Analysis Services Cloud powered Easy to deploy, scale, and manage as a platform-as-a-service solution
  31. 31. SQL Server R Services Linux Hadoop Teradata Windows Commercial Community R Server R Open
  32. 32. Fully managed database service built on a native JSON data model Application controlled schema with massive scale-out enables iterative development and evolving data models Automatic indexing enables robust querying over schema-free data Integrated transactional JavaScript processing + tunable consistency enable high performance application experiences Azure DocumentDB
  33. 33. SQL Server on Linux (Preview today, GA in mid-2017) Red Hat - Microsoft Partnership (Nov 2015) Microsoft joins Eclipse Foundation (Mar 2016). HDInsight PaaS on Linux GA (Sep 2015) C:\Users\markhill> root@localhost: # bash Azure Marketplace 60% of all images in Azure Marketplace are based on Linux/OSS In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification. 493,141,677 ?????? Microsoft Open Source Hub Ross Gardler: President Apache Software Foundation Wim Coekaerts: Oracle’s Mr Linux 1 out of 4 VMs on Azure runs Linux, and getting larger every day • 28.9% of All VMs are Linux • >50% of new VMs
  34. 34. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  35. 35. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  36. 36. Petabyte size files and Trillions of objects • Store data in its native format • PB sized files, 200x larger than anyone else • Scalable throughput for massively parallel analytics • No need to redesign application or repartition data at higher scale TBs EBs Store
  37. 37. Any type of analytics • Batch, interactive, streaming, machine learning • Allows for exploratory analytics over data • Analyze with Hadoop and Microsoft solutions Cortana Intelligence Suite YARN U-SQL Analytics HDInsight HDFS Store Hive R Server
  38. 38. Start in seconds, Scale instantly, Pay per job with Analytics • Process big data jobs in 30 seconds • No infrastructure to worry about (no servers, no VMs, no clusters) • Instantly scale analytic units up or down (processing power) • Architected for cloud scale and performance • Frees you up to focus only on your business logic
  39. 39. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  40. 40. Easy for administrators to spin up quickly • Deploy big data projects in minutes • No hardware to install, tune, configure or deploy • No infrastructure or software to manage • Scale from tens to thousands of machines instantly
  41. 41. Debug and Optimize your Big Data programs with ease • Deep integration with Visual Studio, Visual Studio Code, Eclipse, & IntelliJ • Easy for novices to write simple queries • Integrated with U-SQL, Hive, Storm, and Spark • Actively offers recommendations to improve performance and reduce cost • Playback visually displays job run
  42. 42. Develop massively parallel programs with simplicity • U-SQL: a simple and powerful language that’s familiar and easily extensible • Unifies the declarative nature of SQL with expressive power of C# • Leverage existing libraries in .NET languages, R and Python • Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
  43. 43. Query data where it lives Easily query data in multiple Azure data stores without moving it to a single store Benefits • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse) • Single view of data irrespective of physical location • Minimize data proliferation issues caused by maintaining multiple copies • Single query language for all data • Each data store maintains its own sovereignty • Design choices based on the need • Push SQL expressions to remote SQL sources • Filters • Joins U-SQL Query Query Azure Storage Blobs Azure SQL in VMs Azure SQL DB Azure Data Lake Analytics Azure SQL Data Warehouse Azure Data Lake Storage
  44. 44. Easy for data scientists with familiar R language R Server for HDInsight • Largest portable R parallel analytics library • Terabyte-scale machine learning—1,000x larger than in open source R • Up to 100x faster performance using Spark and optimized vector/math libraries • Enterprise-grade security and support *Applies to HDInsight only
  45. 45. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  46. 46. Highest availability guarantee in the industry for peace of mind • Managed, monitored and supported by Microsoft • Enterprise-leading SLA— 99.9% uptime • No IT resources needed for upgrades and patching • Microsoft monitors your deployment so you don’t have to 99.9% SLA
  47. 47. Azure Regions 38 Regions Worldwide, 32 Generally Available  100+ datacenters  Top 3 networks in the world  2.5x AWS, 7x Google DC Regions  G Series – Largest VM in World, 32 cores, 448 GB RAM, SSD…
  48. 48. Always encrypted, Role-based security & Auditing • Always encrypted; in motion using SSL, and at rest using keys in Azure Key Vault • Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory • Fine-grained POSIX-based ACLs for role-based access controls • Auditing every access / configuration change
  49. 49. Lower total cost of ownership • No hardware • Hadoop support included with Azure support • Pay only for what you use • Independently scale storage and compute • No need to hire specialized operations team • 63% lower total cost of ownership than on-premises* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  50. 50. Recognized by top analysts Forrester Wave for Big Data Hadoop Cloud • Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms* • Recognized for its cloud-first strategy that is paying off* *The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
  51. 51. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck is posted under the “Presentations” tab)

Editor's notes

  • Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together, so you can be the hero who builds your company’s big data solution.
  • Fluff, but point is I bring real work experience to the session
  • Key goal of slide: To convey what every IT person knows: what the data warehouse is and what it is for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?
     
    Slide talk track:
    What is the “traditional” data warehouse?
    IT professionals know this well. A data warehouse, or enterprise data warehouse, is a database designed specifically for data analysis. It is the single source of truth, the central repository for all data in the company. Disparate data from your transactional systems, ERP, CRM, or line-of-business applications is extracted, transformed, cleansed, and loaded into the warehouse. It was built so that people accessing the warehouse through BI tools work with data that has been provisioned by IT and represents accurate data sanctioned by the company.

    However, this traditional data warehouse is reaching an inflection point. Gartner, in its analysis of the state of data warehousing, noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
  • Data is now the key strategic business asset. Every device, every customer, every activity – everything that’s happening in the world around us - is producing incredibly rich data that can help us create new experiences, new efficiencies, new business models and even new inventions. Leveraging this data can be the differentiator for your business. For example, IDC estimates companies that are leaders in using data assets to their advantage will capture $1.6 trillion more in business value than those that lag behind.
     
    While data is pervasive, actionable intelligence from data is elusive. Our customers want to transform data to intelligent action and reinvent their business processes. To do this they need to more easily analyze massive amounts of data – so they can move from seeing “what happened” and understanding “why it happened” to predicting “what will happen” and ultimately, knowing “what should I do”. Only then can they create the intelligent enterprise.
  • Result:
    Used across Microsoft in Office, Xbox Live, Azure, Windows, Bing and Skype
    Supports ten thousand developers running experimentations
    Manages exabytes of data

    https://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
  • Everything: technology, cost, culture
  • Our portfolio of products provides customers with the power to deploy the solution that suits their business needs.

    Your choice of platform, whether on-premises, hybrid or private or public cloud, doesn’t limit you now or in the future. Migrating or expanding becomes an easy process and doesn’t require excessive downtime or introduce potential threats to your business success.

    With Microsoft, you can seamlessly scale up to larger processing and storage capabilities, or scale out by adding additional servers in parallel arrangement.

    T: SQL Server is a trusted market leader, and it’s the cornerstone of our data warehouse offering.
  • Reliable Open Source analytics with an Industry leading SLA
    HDInsight allows you to easily spin up enterprise-grade open source cluster types guaranteed with the industry’s best 99.9% SLA and 24/7 support. We guarantee this SLA for the entire big data solution, not just the VM instances. HDInsight is architected for full redundancy and high availability including head node replication, data geo-replication, and built-in standby NameNode making HDInsight resilient to critical failures not addressed in standard Hadoop implementations. Azure also offers cluster monitoring and 24x7 enterprise support backed by Microsoft and Hortonworks with 37 combined committers for Hadoop core, more than all other managed cloud providers combined to support your deployment and the ability to fix and commit code back to Hadoop.

    Enterprise Grade Security & Monitoring
    HDInsight protects your data assets and easily extends your on-premises security and governance controls to the cloud. We feature single sign-on (SSO), multi-factor authentication and seamless management of millions of identities through Azure Active Directory. You can authorize users and groups with fine-grained access control policies over all your enterprise data with Apache Ranger. HDInsight meets HIPAA, PCI, and SOC compliance requirements, ensuring your enterprise data assets are always protected with the highest security and regulatory compliance. To ensure the highest level of business continuity, HDInsight extends capabilities for alerting, monitoring, defining pre-emptive actions, and enhanced workload protection through native integration with Azure Operations Management Suite (OMS).
    Most Productive platform for developers and scientists
    HDInsight offers developers tailored experiences through rich productivity suites for Hadoop & Spark with integrated development environments using Visual Studio, Eclipse, and IntelliJ supporting Scala, Python, R, Java, and .Net. HDInsight gives data scientists the ability to create narratives that combine code, statistical equations, and visualizations that tell a story about the data through integration to the two most popular notebooks: Jupyter and Zeppelin. HDInsight is also the only managed cloud Hadoop solution with integration to Microsoft R Server. Multi-threaded math libraries and transparent parallelization in R Server means handling up to 1000x more data and up to 50x faster speeds than open source R—helping you train more accurate models for better predictions than previously possible.

    Cost effective cloud scale
    HDInsight has decoupled compute and storage, enabling you to cost-effectively scale workloads up or down, independent of storage. Local storage can still be used for caching and fast I/O. Spark and interactive Hive users can choose SSD memory for interactive performance; while Kafka users can retain all streaming data in premium managed disks. You only pay for the compute and storage you use and are given the ability to choose any Azure VM types that enables the best utilization of resources. A recent study showed HDInsight delivering 63% lower TCO than deploying Hadoop on premises over 5 years.*
    Integration with leading Productivity Applications
    In the broader ecosystem for Hadoop, there is a thriving market of independent software vendors (ISVs) who provide value-added solutions. Through a unique design where every cluster is extended with edge nodes and script actions, HDInsight lets customers spin up Hadoop and Spark clusters pre-integrated and pre-tuned with any ISV application out of the box. Datameer, Cask, AtScale, and StreamSets are a few such applications that are very popular on the HDInsight platform today.

    Easy for administrators to manage
    With HDInsight, administrators can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or set up. There is also no need to patch the operating system or upgrade the Hadoop versions. Azure does it for you. Launch your first cluster in minutes.
  • Petabyte size files and Trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files where a single file can be greater than a petabyte in size which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.
    Scalable throughput  for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes of data efficiently.
    HDFS for the Cloud: Microsoft Azure Data Lake Store supports any application that uses the open Apache Hadoop Distributed File System (HDFS) standard. By supporting HDFS, you can easily migrate your existing Hadoop and Spark data to the cloud without recreating your HDFS directory structure.
    Always encrypted, Role-based security & Auditing: Data Lake Store protects your data assets and extends your on-premises security and governance controls to the cloud easily.  Data is always encrypted; in motion using SSL, and at rest using service or user managed HSM-backed keys in Azure Key Vault.  Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities is built-in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
    Enterprise-grade Support: We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
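As a rough illustration of the fine-grained POSIX-based ACLs mentioned in the note above, the sketch below checks a read, write, or execute permission for a user against a list of ACL entries. The `AclEntry` type, the `is_allowed` function, and the sample names are hypothetical; they are not part of any Azure SDK, and real POSIX ACL evaluation also involves masks and default ACLs, which are left out here.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of POSIX-style ACL evaluation.
# This is NOT the Azure Data Lake Store API; masks and default ACLs are omitted.


@dataclass(frozen=True)
class AclEntry:
    kind: str         # "user", "group", or "other"
    name: str         # user or group name; empty string for "other"
    permissions: str  # three characters such as "rwx" or "r-x"


def is_allowed(acl, user, groups, wanted):
    """Return True if `user` (a member of `groups`) holds permission `wanted`.

    `wanted` is one of "r", "w", or "x".
    """
    # A named user entry takes precedence over any group entry.
    for entry in acl:
        if entry.kind == "user" and entry.name == user:
            return wanted in entry.permissions
    # Otherwise, any matching group entry may grant the permission.
    group_entries = [e for e in acl if e.kind == "group" and e.name in groups]
    if group_entries:
        return any(wanted in e.permissions for e in group_entries)
    # Fall back to the "other" entry, if one exists.
    for entry in acl:
        if entry.kind == "other":
            return wanted in entry.permissions
    return False


# Illustrative ACL for a single folder.
acl = [
    AclEntry("user", "alice", "rwx"),
    AclEntry("group", "analysts", "r-x"),
    AclEntry("other", "", "---"),
]
```

With this sample ACL, `alice` can write, a member of `analysts` can read but not write, and anyone else is denied.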
  • Start in seconds, Scale instantly, Pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
    Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that allows you to write code once and automatically have it be parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
    Debug and Optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you only use 50 AUs resulting in a 20x cost savings.
    Virtualize your analytics: The power to act on all your data with optimized data virtualization of your relational sources such as Azure SQL Server on VMs, Azure SQL Database, and Azure SQL Data Warehouse. Queries are automatically optimized by moving processing close to the source data, without data movement, thereby maximizing performance and minimizing latency.
    Enterprise-grade Security, Auditing and Support: Extend your on-premises security and governance controls to the cloud for meeting your security and regulatory compliance needs. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities is built-in through Azure Active Directory. Role Based Access control, and the ability to audit all processing and management operations are on by default. We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
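The 20x figure in the analytics-unit (AU) example above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes cost is proportional to the number of AUs and that the job runs for the same time either way; the function name `au_savings_factor` is made up for illustration and is not a service API.

```python
def au_savings_factor(requested_aus: int, needed_aus: int) -> float:
    """Ratio of requested to needed analytics units (AUs).

    Assumption: cost scales linearly with AUs and job duration is unchanged,
    so this ratio is also the cost savings factor.
    """
    if requested_aus <= 0 or needed_aus <= 0:
        raise ValueError("AU counts must be positive")
    return requested_aus / needed_aus


# The example from the note: 1000 AUs requested, 50 AUs actually needed.
print(au_savings_factor(1000, 50))  # 20.0
```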
  • We are planning to release a preview of this functionality early next year as part of SQL Server V.Next CTPs, exact release dates are still in flux.
    In the preview planned for early next year, PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage (not MySQL!). We will continue to add more sources until GA.

    http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016

    Among our key BI investments, we are making it much easier to manage relational and non-relational data with PolyBase, a technology that lets you query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is that not enough people have Hadoop and MapReduce skills; PolyBase reduces the skills needed to work with Hadoop data. It also works across your on-premises environment or SQL Server running in Azure.
  • Comparison of IoT Hub and Event Hubs: https://azure.microsoft.com/en-us/documentation/articles/iot-hub-compare-event-hubs/
  • Azure Stream Analytics is a cost-effective event processing engine that helps uncover real-time insights from devices, sensors, infrastructure, applications, and data. It enables Internet of Things (IoT) scenarios such as real-time fleet management or gaining insights from devices like mobile phones and connected cars. Deployed in the Azure cloud, Stream Analytics scales elastically, with resources efficiently allocated and paid for as requested. Developers get a rapid development experience in which they describe their desired transformations in SQL, and the system hides the complexities of parallelization, distributed computing, and error handling from them.

    Looking forward into H2 FY15, Stream Analytics will become generally available after previewing at TechEd EMEA 2014.
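The kind of transformation a developer would describe in SQL can be pictured in plain Python. The sketch below is not Stream Analytics code; it is a minimal illustration of a tumbling-window count, one common streaming aggregation, chosen here as an example. The event shape (a timestamp in seconds plus a device id) and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, device id)
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per device in fixed, non-overlapping time windows.

    Each result key is (window start time, device id).
    """
    counts: Counter = Counter()
    for timestamp, device_id in events:
        # Align the timestamp to the start of the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0, "truck-1"), (4, "truck-1"), (9, "truck-2"), (12, "truck-1")]
    for key, count in sorted(tumbling_window_counts(sample, 10).items()):
        print(key, count)
```

In a managed streaming service the same logic would be expressed declaratively and run in parallel across partitions; the loop above only shows what the aggregation computes.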
  • Microsoft’s Big Data vision in the cloud is to enable organizations to solve large, complex problems end-to-end, from storing and managing TBs of data without investing in hardware and software, to seamless integration with the 1 billion users of Excel. As part of this vision, Microsoft offers Azure Machine Learning, designed to democratize the complex task of advanced analytics.

    Advanced analytics is using products like Azure Machine Learning to find new and actionable insights that traditional approaches to business intelligence are unlikely to discover. An easy way to think about this is thinking about a dashboard. Today when confined by only BI tools without a connection to machine learning, it is solely the job of the human looking at the spreadsheet to gain insights and react to the data. But a human can only consume so many variables. A computer, on the other hand, can consume a great deal more variables to provide much deeper insight on the data. Humans can then react to the data to make decisions that drive competitive advantage, as well as program the computer further to recognize important patterns in the future. This is why we say beyond business intelligence – machine intelligence.

    The accessibility of our solution starts with setup. Previously you needed to provision your machine learning workspace on-premises, while also thinking about server space and a host of other considerations. Today you can get started with just a browser. With only an Azure subscription, you can take advantage of the full functionality of Azure Machine Learning within minutes. Taking a test drive is even easier: click Get Started at azure.com/ml, and with just a Microsoft ID you’re off to the races.

    Another limitation of other machine learning solutions is siloed environments that allow only one programming language or make changing from one algorithm to another time-consuming and complex. With Azure ML, you can experience the power of choice. That choice extends to language, with both Python and R being first-class citizens of Azure ML, and to algorithm. You can choose from hundreds of algorithms, including business-tested ones running our Microsoft businesses today. Swapping out algorithms to land on the right one for you is done with a click. Additionally, you can drop in custom R and Python code – your “special sauce” – and mix and match that with the other options in the tool.
    Most revolutionary of all, you can deploy solutions in minutes as a web service, which is simply a URL that can connect to any data, anywhere – including on-premises or in another cloud environment. The ability to put a model into production almost immediately, as well as revise it easily, is unique to Microsoft and allows companies to stay on top of the changing business landscape more effectively than any other provider offers today.

    We even take that a step further, allowing model developers to connect to the world with our Machine Learning Marketplace, where they can publish finished solutions and APIs with their own brand and business model. Developers can also discover machine learning solutions there without any machine learning skills needed – the data science is inside. Check it out at https://datamarket.azure.com/.

  • Azure Data Factory is a cloud service for creating, managing, and monitoring the production of trusted information from on-premises and cloud data sources using transformative analytics at scale. Data Factory can be used in solutions to gain insights from operational and service health telemetry data, analyze customer actions to determine an optimal targeted marketing strategy, or predict customer churn from customer profile and service log data. Instead of writing hard-to-manage custom code to wire together a data warehouse with Hadoop, NoSQL, and SaaS, use Data Factory to quickly create and deploy highly available data processing pipelines, significantly cutting your time to solution and your operational costs. Get a single monitoring view of all of your data processing pipelines along with data lineage and service health. Bring together on-premises data like SQL Server and cloud data like Azure SQL Database, Blobs, and Tables with the transformative analytics of HDInsight (Hive, Pig, MapReduce, custom .NET code), and even Azure Machine Learning, to produce trusted information that is easily consumed by BI tools or applications.

    Looking forward into H2 FY15, Data Factory will become generally available after previewing at TechEd EMEA 2014.
  • Power BI Desktop is a self-service BI tool that lets users pull data together from multiple data sources, transform and clean that data, model it and add custom calculations, and then visually explore it and create interactive reports that can be easily published and shared through the Power BI service.

    In addition, you can now create your own custom visualizations through our open source visualization framework. More information is available at powerbi.com/visuals
  • Power BI dashboards
    With updates to Power BI, customers can now see all their data through a single pane of glass. Live Power BI dashboards show visualizations and KPIs from data that resides both on-premises and in the cloud, providing a consolidated view across the business regardless of where the data lives.

    Customers can then explore their data further by drilling through the dashboard into the underlying reports, discovering new insights that they can pin back to the dashboard to monitor performance going forward.
  • Natural Language Interface - With Power BI we continue to find new ways to simplify how people analyze and gain insight from data, providing industry-leading features such as natural language query. Natural language query gives users an easier way to interact with their data, allowing them to type questions about their data and receive answers in the form of live visualizations. Power BI integration with Cortana now lets you ask these questions directly from Cortana and have answers from your Power BI data surfaced to you by Cortana. These data-driven answers can range from simple numeric values (“revenue for the last quarter”) to charts (“revenue over time”), maps (“revenue by region”), or data represented through any of the other Power BI data visualizations. Combined with the Cortana Analytics suite, this opens up amazing new opportunities to use Cortana to enable your business, and your customers' businesses, to get things done in more helpful, proactive, and natural ways.
     
    Quick Insights - a new way to help users find hidden insights in their data. The new Quick Insights feature automatically scans the data that users publish to Power BI and detects patterns and trends in it. Through a partnership with Microsoft Research, the Quick Insights feature uses a growing list of algorithms to automatically discover and visualize correlations, outliers, trends, seasonality, change points in trends, and other factors in your data in seconds.
     
  • Bot Framework provides everything you need to build and connect intelligent bots that interact naturally wherever your users are talking, from text/SMS to Skype, Slack, Office 365 mail and other popular services.

    Bot Framework consists of three main components: Bot Connector, Bot Builder, and Bot Directory
  • At Microsoft, we’ve been offering APIs for a very long time across the company.  In delivering Microsoft Cognitive Services API, we started with 4 last year at /build (2015); added 7 more last December, and today (May 2016) we have 22 APIs in our collection.
        
    Cognitive Services are available individually or as a part of the Cortana Intelligence Suite, formerly known as Cortana Analytics, which provides a comprehensive collection of services powered by cutting-edge research into machine learning, perception, analytics and social bots.

    These APIs are powered by Microsoft Azure.

    Developers and businesses can use this suite of services and tools to create apps that learn about our world and interact with people and customers in personalized, intelligent ways.  
  • Key points: Summarize key benefits for Azure Analysis Services

    Talk track:
    As already mentioned, Azure Analysis Services is based on the proven analytics engine in SQL Server 2016 Analysis Services, which has helped organizations turn complex data into a trusted, single source of truth for years.
    This means that BI professionals who are familiar with SQL Server Analysis Services tabular models can get started quickly and do not need to learn new tools or skills.
    And with the power of the cloud, BI professionals do not need to manage infrastructure on-premises. They can easily deploy the BI solution and benefit from the scalability of the cloud.
    Organizations store data in the cloud and on-premises. Azure Analysis Services is built for hybrid data. Data can be accessed in the cloud, on-premises, or a combination of both, enabling a hybrid solution. So customers do not have to move on-premises data to the cloud.
    And last but not least, Azure Analysis Services enables interactive data visualization over billions of rows of data, and because it supports BI industry standards such as XMLA and MDX, business users can access data using their preferred data visualization tool, whether it is Power BI, Excel, or another major data visualization tool.
    To summarize, Azure Analysis Services is simple to use: it is easy to get started, and you can use your existing skills to create BI semantic models and your favorite data visualization tools to analyze your data.
  • Slide objective
    Show broad commitment to R by preserving freely available, enhanced editions, Windows and SQL Server editions and R Server editions for leading EDWs, Linux and Hadoop platforms.
    Differentiate free, open editions from commercial by mentioning availability of commercial 24x7 support, and enhancements to support very large scale data analytics at speed.

  • Microsoft Azure DocumentDB is the highly-scalable NoSQL document database-as-a-service that
    enables query over schema-free data and multi-document transaction processing
    helps deliver configurable and reliable performance
    and enables rapid development

    DocumentDB is the right solution for applications that run in the cloud when predictable throughput, low latency, and flexible query are key.
    Fully managed PaaS database service backed by the power of Microsoft Azure. Unlike many other NoSQL offerings, DocumentDB was built for the cloud to perform and scale in a multi-tenant environment. Cluster administration, replication, and other management functions are handled for the customer automatically. DocumentDB is backed by a 99.95% availability SLA (at GA) to provide consistent, reliable performance.
    Application controlled schema with massive scale-out enables iterative development and evolving data models. DocumentDB supports a schema-free data model where the application defines the data model. This supports modern application development scenarios where applications are developed iteratively with many versions supported concurrently and data models continuously evolve.
    Automatic indexing enables robust querying over schema-free data. DocumentDB is the first of its kind to offer SQL over schema-free JSON data and multi-document transactional processing.
    Integrated transactional JavaScript processing and tunable consistency enable high-performance application experiences. DocumentDB supports stored procedures, triggers, and user-defined functions. It also supports tunable consistency with well-defined levels, so developers can tune database performance based on the application’s needs.

    The key scenarios for DocumentDB are the following:
    Emitting telemetry and logging data
    Storing/querying event and workflow data
    Persisting device and app configuration data
    User generated content
    Scalable, iterative app development
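What "query over schema-free data" means can be shown without any database at all. The sketch below is plain Python, not the DocumentDB SDK: it filters a list of JSON-like documents that do not share a schema, tolerating documents that lack the queried fields. The document shapes, field names, and the function `hot_telemetry` are hypothetical, invented for illustration.

```python
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


def hot_telemetry(documents: Iterable[Document], threshold: float) -> List[Document]:
    """Return telemetry documents whose temperature exceeds the threshold.

    Documents that lack a "type" or "temperature" field are simply skipped,
    which is the behavior a schema-free store has to provide.
    """
    return [
        doc
        for doc in documents
        if doc.get("type") == "telemetry"
        and isinstance(doc.get("temperature"), (int, float))
        and doc["temperature"] > threshold
    ]


if __name__ == "__main__":
    # Three documents with three different shapes in the same collection.
    docs = [
        {"id": "1", "type": "telemetry", "deviceId": "d1", "temperature": 71},
        {"id": "2", "type": "config", "deviceId": "d1", "settings": {"mode": "eco"}},
        {"id": "3", "type": "telemetry", "deviceId": "d2", "humidity": 40},
    ]
    for doc in hot_telemetry(docs, 70):
        print(doc["id"])
```

In a document database the same filter would be written as a declarative query and answered from an index rather than by scanning every document; the list comprehension only shows the semantics.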
  • Petabyte size files and Trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files where a single file can be greater than a petabyte in size which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.

    Scalable throughput  for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes
  • Start in seconds, Scale instantly, Pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
  • Debug and Optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you use only 50 AUs, resulting in a 20x cost savings.
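The arithmetic behind the 20x figure is simple enough to sketch. The snippet below is only an illustration of the calculation, not the service's recommendation logic; it assumes cost scales linearly with the number of AUs allocated for the same run time, and the function name `au_recommendation` is hypothetical.

```python
from typing import Dict


def au_recommendation(requested_aus: int, needed_aus: int) -> Dict[str, float]:
    """Suggest an AU allocation and the resulting cost-savings factor.

    Assumption: cost is proportional to allocated AUs for a fixed run time.
    """
    if requested_aus <= 0 or needed_aus <= 0:
        raise ValueError("AU counts must be positive")
    # Never recommend more than was requested.
    recommended = min(requested_aus, needed_aus)
    return {
        "recommended_aus": recommended,
        "savings_factor": requested_aus / recommended,
    }


if __name__ == "__main__":
    print(au_recommendation(requested_aus=1000, needed_aus=50))
```

With the numbers from the example above (1000 requested, 50 needed), the savings factor is 1000 / 50 = 20.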
  • Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that lets you write code once and have it automatically parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
  • With Microsoft Azure HDInsight, Microsoft R Server is now available as an option when you create HDInsight clusters in Azure. This new capability provides data scientists, statisticians, and R programmers with on-demand access to scalable, distributed methods of analytics on HDInsight.

    Clusters can be sized to the projects and tasks at hand and torn down when they're no longer needed. Since they're part of Azure HDInsight, these clusters come with enterprise-level 24/7 support, an SLA of 99.9% uptime, and the flexibility to integrate with other components in the Azure ecosystem.

    R Server on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size loaded to either Azure Blob or Data Lake storage. Since R Server is built on open source R, the R-based applications you build can leverage any of the 8000+ open source R packages, as well as the routines in ScaleR, Microsoft’s big data analytics package that's included with R Server.

    The edge node of a cluster provides a convenient place to connect to the cluster and to run your R scripts. With an edge node, you have the option of running ScaleR’s parallelized distributed functions across the cores of the edge node server. You also have the option to run them across the nodes of the cluster by using ScaleR’s Hadoop MapReduce or Spark compute contexts.

    The models or predictions that result from analyses can be downloaded for use on-premises. They can also be operationalized elsewhere in Azure, such as through an Azure Machine Learning Studio web service.
  • https://azure.microsoft.com/en-us/regions/
  • Always encrypted, Role-based security & Auditing: Data Lake Store protects your data assets and easily extends your on-premises security and governance controls to the cloud. Data is always encrypted: in motion using SSL, and at rest using service-managed or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
  • 1) Copy source data into the Azure Data Lake Store (Twitter data example)
    2) Massage/filter the data using Hadoop (or skip using Hadoop and use stored procedures in SQL DW/DB to massage data after step #5)
    3) Pass data into Azure ML to build models using Hive query (or pass in directly from Azure Data Lake Store)
    4) Azure ML feeds prediction results into the data warehouse
    5) Non-relational data in Azure Data Lake Store copied to data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data)
    6) Power BI pulls data from data warehouse to build dashboards and reports
    7) Azure Data Catalog captures metadata from Azure Data Lake Store and SQL DW/DB
    8) Power BI and Excel can pull data from the Azure Data Lake Store via HDInsight
    9) To support high concurrency if using SQL DW, or for easier end-user data layer, create an SSAS cube
  • Get it at //aka.ms/forresterwave
