Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data and transforming it to storing and visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
2. About Me
Microsoft, Big Data Evangelist
In IT for 30 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Been perm employee, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
3. Agenda
Big data defined
Microsoft big data solution
Azure data lake
5. Big Data is changing
traditional data
warehousing
… data warehousing has reached the
most significant tipping point since
its inception. The biggest, possibly
most elaborate data management
system in IT is changing.
– Gartner, “The State of Data Warehousing”*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012)
Data sources
ETL
Data warehouse
BI and analytics
6. Big Data has new data characteristics
Data complexity: variety and velocity
Petabytes
7. Big Data is driving transformative changes
Data characteristics
Traditional: Relational data with highly modeled schema
Big Data: All data with schema agility
Costs
Traditional: Specialized HW
Big Data: Commodity HW
Culture
Traditional: Operational reporting; focus on rear-view analysis
Big Data: Experimentation leading to intelligent action, with machine learning, graph, a/b testing
8. Big Data introduces new culture of experimentation
Understand customer patterns to
uncover cross-sell opportunities
Historical campaign
effectiveness
Generate year-end financial
reports
Financial monitoring with real-time
recommendations to increase revenue
Generate year-end financial
reports
Real-time product offers and
promotions based on behavior
Collect historical data on
equipment performance
Real-time monitoring to
identify proactive maintenance
Shipping features without
understanding success
Building successful features
correlating user action with
product experience
10. However, there are challenges to Big Data…
Obtaining skills
and capabilities
Determining how
to get value
Integrating with
existing IT investments
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
11. But, Microsoft has done it before
We needed to better leverage data and analytics to do
more experimentation
So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating
across large experiment models
Result:
• Across Microsoft, ten thousand developers doing
experimentation leading to better insights
• Leading to growth in our Microsoft businesses:
• Office productivity revenue (45% YoY)*
• Intelligent Cloud (100% YoY)*
• Bing search share doubles
[Chart: Growth of data @ Microsoft, 2010 to 2015; scale: Petabytes to Exabytes. Sources shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
12. Microsoft is now taking
everything we’ve
learned on this journey
and bringing it to our
customers
Technology. Cost. Culture.
14. Big Data as a cornerstone of Cortana Intelligence
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Intelligence
Dashboards &
Visualizations
Cortana
Bot
Framework
Cognitive
Services
Power BI
Information
Management
Event Hubs
Data Catalog
Data Factory
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream Analytics
Intelligence
Data Lake
Analytics
Machine
Learning
Big Data Stores
Data Lake Store
Data
Sources
Apps
Sensors
and
devices
Data
SQL Data
Warehouse
15. CONTROL EASE OF USE
Azure Data Lake
Analytics
Azure Data Lake Store
Azure Storage
Any Hadoop technology
Workload optimized,
managed clusters
Specific apps in a multi-
tenant form factor
Azure Marketplace
HDP | CDH | MapR
Azure Data Lake
Analytics
IaaS Hadoop Managed Hadoop Big Data as-a-service
Azure HDInsight
BIG DATA STORAGE
BIG DATA ANALYTICS
Bringing Big Data to everybody
Accelerate the pace of innovation through a state-of-the-art cloud platform
User Adoption
16. Microsoft Big Data Portfolio
SQL Server Stretch
Business intelligence
Machine learning analytics
Insights
Azure SQL Database
SQL Server 2016
SQL Server 2016 Fast Track
Azure SQL DW
Azure Data Lake
DocumentDB
HDInsight
Hadoop
Analytics Platform System
Sequential Scale Out + AcrossScale Up
Key
Relational Non-relational
On-premises Cloud
Microsoft has solutions covering
and connecting all four
quadrants – that’s why SQL
Server is one of the most utilized
databases in the world
17. Azure
HDInsight
A Cloud Spark and
Hadoop service for the
Enterprise
Reliable with an industry leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and
scientists
Cost effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your own
Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
18. Hortonworks Data Platform (HDP) 2.5
Simply put, Hortonworks ties all the open source products together (22)
(under the covers of HDInsight)
19. Azure
Data Lake Store
A No limits Data Lake that
powers Big Data Analytics
Petabyte size files and Trillions of objects
Scalable throughput for massively parallel
analytics
HDFS for the cloud
Always encrypted, role-based security &
auditing
Enterprise-grade support
20. Azure
Data Lake Analytics
A No limits Analytics Job
Service to power intelligent
action
Start in seconds, scale instantly, pay per job
Develop massively parallel programs with
simplicity
Debug and optimize your big data programs
with ease
Virtualize your analytics
Enterprise-grade security, auditing and
support
21. Azure Data Lake
YARN
U-SQL
Analytics HDInsight
Hive R Server
HDFS
Store
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business
priorities
Enterprise-grade security
Built on YARN, designed for the cloud
22. Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industry’s first elastic cloud data warehouse with enterprise-grade capabilities.
Integrated with on-premises and cloud assets.
Simple compute & storage billing
Pay for what you need
High performance without rewriting
applications
Low cost for latent data
Infrastructure, management and
support provided
Scales to petabytes of data with MPP processing
Resize compute nodes < 1 minute
Faster time to insight than other SMP offerings
Designed for “on-demand” workload
Integrated with Azure platform and
other Microsoft services
Enables hybrid solutions
Built on SQL Server experience &
technology
23. PolyBase
Query relational and non-relational data with T-SQL
By preview early this year PolyBase will support Teradata, Oracle,
SQL Server, MongoDB, Hadoop and Azure blob storage
24. Publish-subscribe data
distribution
Managed PaaS (Platform
as a Service) solution
Scales with your needs to
millions of events per
second
Provides a durable buffer
between event publishers
and event consumers
Azure Event Hubs
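The "durable buffer between event publishers and event consumers" idea can be sketched in a few lines. This is only an in-memory illustration of the pattern, not the Event Hubs API: the EventLog class, its methods, and the consumer names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only buffer: events stay in the log after being read."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next index

    def publish(self, event):
        # Publishers only append; they never wait for a consumer.
        self.events.append(event)

    def consume(self, consumer, max_events=10):
        # Each consumer tracks its own position, so readers are independent.
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for reading in (21.5, 22.0, 22.4):
    log.publish({"sensor": "s1", "temp": reading})

dashboard_batch = log.consume("dashboard", max_events=2)  # first two events
archive_batch = log.consume("archive")                    # all three events
```

Because each reader keeps its own offset, a slow consumer does not hold back a fast one, and a new consumer can start from the beginning of the buffer.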
25. Azure Stream Analytics
Process real-time data in Azure
Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure,
and applications
Performs time-sensitive analysis using SQL-like language against multiple real-time streams and
reference data
Outputs to persistent stores, dashboards or back to devices
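As a rough illustration of "time-sensitive analysis" over a stream, the sketch below averages readings per device in fixed, non-overlapping (tumbling) windows. It is plain Python rather than the Stream Analytics query language, and the device names, timestamps, and values are made up for the example.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average `value` per device within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, device, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, device) -> [total, count]
    for ts, device, value in events:
        window_start = ts - (ts % window_seconds)
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


events = [
    (0, "kiosk-1", 10.0),
    (4, "kiosk-1", 20.0),
    (7, "atm-7", 5.0),
    (12, "kiosk-1", 30.0),
]
averages = tumbling_window_avg(events, window_seconds=10)
```

Here the two early kiosk readings fall into the window starting at 0 and the later one into the window starting at 10, so each window produces its own result.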
[Diagram: example event sources: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM]
26. Azure Machine Learning
Get started with just a browser
Requires no provisioning; simply log
on to your Azure subscription or try
it for free off azure.com/ml
Experience the power of choice
Choose from hundreds of algorithms
and packages from R and Python or
drop in your own custom code
Take advantage of business-tested
algorithms from Xbox and Bing
Deploy solutions in minutes
With the click of a button, deploy
the finished model as a web service
that can connect to any data,
anywhere
Connect to the world
Brand and monetize solutions on
our global Machine Learning
Marketplace
https://datamarket.azure.com/
Beyond business intelligence – machine intelligence
Microsoft Azure
Machine Learning Studio
Modeling environment (shown)
Microsoft Azure
Machine Learning API service
Model in production as a web service
Microsoft Azure
Machine Learning Marketplace
APIs and solutions for broad use
27. Enable enterprise-wide self-service data source registration and discovery
A metadata repository that allows users to register, enrich,
understand, discover, and consume data sources
Delivers differentiated value through
‒ Data source discovery; rather than data discovery
‒ Support for data from any source; Structured and
unstructured, on premises and in the cloud
‒ Publishing, discovery and consumption through any tool
‒ Annotation crowdsourcing: empowering any user to
capture and share their knowledge.
This, while allowing IT to maintain control and oversight
28. Azure Data Factory
Connect to relational or non-
relational data that is on-
premises or in the cloud
Orchestrate data movement &
data processing
Publish to Power BI users as a
searchable data view
Operationalize (schedule,
manage, debug) workflows
Lifecycle management,
monitoring
Orchestrate trusted information production in Azure
C#
MapReduce
Hive
Pig
Stored Procedures
Azure Machine Learning
35. Azure Analysis Services
Azure Analysis Services is based on the proven analytics engine that has helped
organizations turn complex data into a trusted, single source of truth for years.
Built for
hybrid data
Access and model
data on-premises,
in the cloud, or both
Interactive
visualization
Quick, highly interactive
self-service data discovery
with support of major
data visualization tools
Proven
technology
Powerful, proven tabular
models built from SQL Server
2016 Analysis Services
Cloud
powered
Easy to deploy, scale, and
manage as a platform-as-
a-service solution
37. Fully managed database service
built on a native JSON data model
Application controlled schema with
massive scale-out enables iterative
development and evolving data models
Automatic indexing enables robust
querying over schema-free data
Integrated transactional JavaScript
processing + tunable consistency enable
high performance application
experiences
Azure DocumentDB
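To make "automatic indexing over schema-free data" concrete, here is a minimal sketch that indexes every path of every JSON document on insert, so any field can be queried without declaring a schema first. It illustrates the idea only; the AutoIndex class, the path format, and the sample documents are assumptions, not how DocumentDB is implemented.

```python
import json
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested JSON document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc


class AutoIndex:
    """Hypothetical store that indexes all paths of each inserted document."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, value) -> set of document ids

    def insert(self, doc_id, raw_json):
        doc = json.loads(raw_json)
        self.docs[doc_id] = doc
        for path, value in flatten(doc):
            self.index[(path, value)].add(doc_id)

    def find(self, path, value):
        return sorted(self.index.get((path, value), set()))


store = AutoIndex()
store.insert("a", '{"name": "Ann", "tags": ["vip"], "address": {"city": "Oslo"}}')
store.insert("b", '{"name": "Bo", "address": {"city": "Oslo"}, "age": 41}')

oslo_ids = store.find("/address/city", "Oslo")
vip_ids = store.find("/tags", "vip")
```

The two documents have different shapes, yet both are queryable by any path they contain, which is what lets the data model evolve without index maintenance by the application.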
38. SQL Server on Linux
(Preview today, GA in
mid-2017)
Red Hat - Microsoft
Partnership
(Nov 2015)
Microsoft joins Eclipse
Foundation (Mar 2016).
HD Insight PaaS on
Linux GA (Sep 2015)
C:\Users\markhill>
root@localhost: #
bash
Azure Marketplace 60% of all images in
Azure Marketplace
are based on
Linux/OSS
In partnership with the Linux
Foundation, Microsoft releases the
Microsoft Certified Solutions Associate
(MCSA) Linux on Azure certification.
493,141,677 ?????? Microsoft Open Source Hub
Ross Gardler: President Apache Software
Foundation
Wim Coekaerts: Oracle’s Mr Linux
1 out of 4 VMs on Azure runs
Linux, and getting larger every
day
• 28.9% of All VMs are Linux
• >50% of new VMs
40. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
42. Petabyte size files and
Trillions of objects
• Store data in its native format
• PB sized files, 200x larger than
anyone else
• Scalable throughput for
massively parallel analytics
• No need to redesign
application or repartition data
at higher scale
TBs
EBs
Store
43. Any type
of analytics
• Batch, interactive, streaming,
machine learning
• Allows for exploratory analytics
over data
• Analyze with Hadoop and
Microsoft solutions
Cortana Intelligence Suite
YARN
U-SQL
Analytics HDInsight
HDFS
Store
Hive R Server
44. Start in seconds, Scale
instantly, Pay per job
with Analytics
• Process big data jobs in 30
seconds
• No infrastructure to worry
about (no servers, no VMs, no
clusters)
• Instantly scale analytic units up
or down (processing power)
• Architected for cloud scale and
performance
• Frees you up to focus only on
your business logic
45. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
46. Easy for administrators
to spin up quickly
• Deploy big data projects
in minutes
• No hardware to install,
tune, configure or deploy
• No infrastructure or
software to manage
• Scale from tens to thousands
of machines instantly
47. Debug and Optimize
your Big Data
programs with ease
• Deep integration with
Visual Studio, Visual Studio
Code, Eclipse, & IntelliJ
• Easy for novices to write
simple queries
• Integrated with U-SQL,
Hive, Storm, and Spark
• Actively offers recommendations
to improve performance and
reduce cost
• Playback visually displays job run
48. Develop massively
parallel programs with
simplicity
• U-SQL: a simple
and powerful language that’s
familiar and easily extensible
• Unifies the declarative
nature of SQL with expressive
power of C#
• Leverage existing libraries in
.NET languages, R and Python
• Massively parallelize code on
diverse workloads (ETL, ML,
image tagging, facial detection)
49. Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the
network between stores (federated query/logical data
warehouse)
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by
maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
• Filters
• Joins
U-SQL
Query
Query
Azure
Storage Blobs
Azure SQL
in VMs
Azure
SQL DB
Azure Data
Lake Analytics
Azure
SQL Data Warehouse
Azure
Data Lake Storage
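The benefit of pushing filters to remote SQL sources can be shown with a small stand-in. The sketch below uses an in-memory SQLite database to play the role of a remote store; the table, columns, and data are hypothetical, and it is not U-SQL. It only compares how many rows cross the "network" with and without pushdown.

```python
import sqlite3


def make_remote_source():
    """Stand-in for a remote SQL store (hypothetical table and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 101)],
    )
    return conn


remote = make_remote_source()

# Without pushdown: copy every row across, then filter locally.
all_rows = remote.execute("SELECT id, region, amount FROM orders").fetchall()
local_filtered = [row for row in all_rows if row[1] == "EU"]

# With pushdown: send the filter to the source and move only matching rows.
pushed = remote.execute(
    "SELECT id, region, amount FROM orders WHERE region = ?", ("EU",)
).fetchall()

rows_moved_without = len(all_rows)
rows_moved_with = len(pushed)
```

Both approaches return the same result set, but in this example the pushed-down query moves a quarter of the rows. The same reasoning applies to joins evaluated at the source.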
50. Easy for data scientists
with familiar R language
R Server for HDInsight
• Largest portable R parallel
analytics library
• Terabyte-scale machine
learning—1,000x larger than
in open source R
• Up to 100x faster performance
using Spark and optimized
vector/math libraries
• Enterprise-grade security
and support
*Applies to HDInsight only
51. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
52. Highest availability
guarantee in the industry
for peace of mind
• Managed, monitored and
supported by Microsoft
• Enterprise-leading SLA—
99.9% uptime
• No IT resources needed for
upgrades and patching
• Microsoft monitors your
deployment so you don’t
have to
99.9% SLA
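For a sense of what a 99.9% uptime figure permits, the arithmetic can be worked through directly. This sketch simply converts the percentage into minutes over a period; how an actual SLA measures and excludes downtime is not covered by the slide and is not modeled here.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Minutes of downtime a given uptime percentage permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


per_month = allowed_downtime_minutes(99.9, 30)   # 30-day month
per_year = allowed_downtime_minutes(99.9, 365)   # non-leap year
```

That works out to roughly 43 minutes in a 30-day month, or about 8.8 hours over a year.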
53. Azure Regions
38 Regions Worldwide, 32 Generally Available
100+ datacenters
Top 3 networks in the world
2.5x AWS, 7x Google DC Regions
G Series – Largest VM in World, 32 cores, 448GB Ram, SSD…
54. Always encrypted,
Role-based security
& Auditing
• Always encrypted; in motion
using SSL, and at rest using keys
in Azure Key Vault
• Single sign-on, multi-factor
authentication and seamless
integration of on-premises
identities with Active Directory
• Fine-grained POSIX-based ACLs
for role-based access controls
• Auditing every access /
configuration change
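A simplified sketch of how a POSIX-style ACL check resolves access is shown below. It follows the usual order (owner, named users, groups, then other) but leaves out the ACL mask for brevity. The Acl class and the user and group names are hypothetical, and this is not the Data Lake Store API.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    owner: str
    owner_perms: str                                  # e.g. "rwx"
    user_perms: dict = field(default_factory=dict)    # named users -> perms
    group_perms: dict = field(default_factory=dict)   # named groups -> perms
    other_perms: str = "---"

    def allows(self, user, groups, wanted):
        """Return True if `user` may perform `wanted` ('r', 'w' or 'x')."""
        if user == self.owner:
            return wanted in self.owner_perms
        if user in self.user_perms:
            return wanted in self.user_perms[user]
        matching = [self.group_perms[g] for g in groups if g in self.group_perms]
        if matching:
            # Any matching group entry that grants the permission is enough.
            return any(wanted in perms for perms in matching)
        return wanted in self.other_perms


acl = Acl(owner="alice", owner_perms="rwx",
          user_perms={"bob": "r--"},
          group_perms={"analysts": "r-x"})

can_bob_write = acl.allows("bob", ["analysts"], "w")
can_carol_read = acl.allows("carol", ["analysts"], "r")
can_dave_read = acl.allows("dave", [], "r")
```

Note that bob's named-user entry takes precedence over his group membership, which is why the more specific entry can be used to narrow access for one person.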
55. Lower total cost
of ownership
• No hardware
• Hadoop support included with
Azure support
• Pay only for what you use
• Independently scale storage
and compute
• No need to hire specialized
operations team
• 63% lower total cost of
ownership than on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud
with Microsoft Azure HDInsight”
57. Recognized by
top analysts
Forrester Wave for Big Data
Hadoop Cloud
• Named industry leader by
Forrester with the most
comprehensive, scalable, and
integrated platforms*
• Recognized for its cloud-first
strategy that is paying off*
*The Forrester WaveTM: Big Data Hadoop Cloud Solutions, Q2 2016.
58. Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under the “Presentations” tab)
Editor's notes
Fluff, but point is I bring real work experience to the session
Key goal of slide: To convey what every IT person knows: The data warehouse and what it’s for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?
Slide talk track:
What is the “traditional” data warehouse?
IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who access the warehouse using BI tools are working with data that has been provisioned by IT and represents accurate data sanctioned by the company.
However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
Data is now the key strategic business asset. Every device, every customer, every activity – everything that’s happening in the world around us - is producing incredibly rich data that can help us create new experiences, new efficiencies, new business models and even new inventions. Leveraging this data can be the differentiator for your business. For example, IDC estimates companies that are leaders in using data assets to their advantage will capture $1.6 trillion more in business value than those that lag behind.
While data is pervasive, actionable intelligence from data is elusive. Our customers want to transform data to intelligent action and reinvent their business processes. To do this they need to more easily analyze massive amounts of data – so they can move from seeing “what happened” and understanding “why it happened” to predicting “what will happen” and ultimately, knowing “what should I do”. Only then can they create the intelligent enterprise.
Result:
Used across Microsoft in Office, Xbox Live, Azure, Windows, Bing and Skype
Supports ten thousand developers running experimentations
Manages exabytes of data
https://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
Everything: technology, cost, culture
Our portfolio of products provides customers with the power to deploy the solution that suits their business needs.
Your choice of platform, whether on-premises, hybrid or private or public cloud, doesn’t limit you now or in the future. Migrating or expanding becomes an easy process and doesn’t require excessive downtime or introduce potential threats to your business success.
With Microsoft, you can seamlessly scale up to larger processing and storage capabilities, or scale out by adding additional servers in parallel arrangement.
T: SQL Server is a trusted market leader, and it’s the cornerstone of our data warehouse offering.
Reliable Open Source analytics with an Industry leading SLA
HDInsight allows you to easily spin up enterprise-grade open source cluster types guaranteed with the industry’s best 99.9% SLA and 24/7 support. We guarantee this SLA for the entire big data solution, not just the VM instances. HDInsight is architected for full redundancy and high availability including head node replication, data geo-replication, and built-in standby NameNode making HDInsight resilient to critical failures not addressed in standard Hadoop implementations. Azure also offers cluster monitoring and 24x7 enterprise support backed by Microsoft and Hortonworks with 37 combined committers for Hadoop core, more than all other managed cloud providers combined to support your deployment and the ability to fix and commit code back to Hadoop.
Enterprise Grade Security & Monitoring
HDInsight protects your data assets and easily extends your on-premises security and governance controls to the cloud. We feature single sign-on (SSO), multi-factor authentication and seamless management of millions of identities through Azure Active Directory. You can authorize users and groups with fine-grained access control policies over all your enterprise data with Apache Ranger. HDInsight meets HIPAA, PCI, SOC compliance, ensuring your enterprise data assets are always protected with the highest security and regulatory compliance. To ensure the highest level of business continuity, HDInsight extends capabilities for alerting, monitoring, defining pre-emptive actions, and enhanced workload protection through native integration with Azure Operations Management Suite (OMS).
Most Productive platform for developers and scientists
HDInsight offers developers tailored experiences through rich productivity suites for Hadoop & Spark with integrated development environments using Visual Studio, Eclipse, and IntelliJ supporting Scala, Python, R, Java, and .Net. HDInsight gives data scientists the ability to create narratives that combine code, statistical equations, and visualizations that tell a story about the data through integration to the two most popular notebooks: Jupyter and Zeppelin. HDInsight is also the only managed cloud Hadoop solution with integration to Microsoft R Server. Multi-threaded math libraries and transparent parallelization in R Server means handling up to 1000x more data and up to 50x faster speeds than open source R—helping you train more accurate models for better predictions than previously possible.
Cost effective cloud scale
HDInsight has decoupled compute and storage, enabling you to cost-effectively scale workloads up or down, independent of storage. Local storage can still be used for caching and fast I/O. Spark and interactive Hive users can choose SSD memory for interactive performance; while Kafka users can retain all streaming data in premium managed disks. You only pay for the compute and storage you use and are given the ability to choose any Azure VM types that enables the best utilization of resources. A recent study showed HDInsight delivering 63% lower TCO than deploying Hadoop on premises over 5 years.*
Integration with leading Productivity Applications
In the broader ecosystem for Hadoop, there is a thriving market of independent software vendors (ISVs) who provide value added solutions. Through a unique design where every cluster is extended with edge nodes and script action, HDInsight lets customers spin up Hadoop and Spark clusters pre-integrated and pre-tuned with any ISV application out-of-the-box. Datameer, Cask, AtScale, and StreamSets are a few such applications that are very popular on the HDInsight platform today.
Easy for administrators to manage
With HDInsight, administrators can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or set up. There is also no need to patch the operating system or upgrade the Hadoop versions. Azure does it for you. Launch your first cluster in minutes.
Petabyte size files and Trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files where a single file can be greater than a petabyte in size which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.
Scalable throughput for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes of data efficiently.
HDFS for the Cloud: Microsoft Azure Data Lake Store supports any application that uses the open Apache Hadoop Distributed File System (HDFS) standard. By supporting HDFS, you can easily migrate your existing Hadoop and Spark data to the cloud without recreating your HDFS directory structure.
Always encrypted, Role-based security & Auditing: Data Lake Store protects your data assets and extends your on-premises security and governance controls to the cloud easily. Data is always encrypted; in motion using SSL, and at rest using service or user managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
Enterprise-grade Support: We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
Start in seconds, Scale instantly, Pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that allows you to write code once and automatically have it be parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
Debug and Optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you only use 50 AUs resulting in a 20x cost savings.
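The 20x figure in the example above follows from simple proportionality. The sketch below assumes, for illustration only, that job cost scales linearly with the number of AUs allocated for a fixed run time; the function name is hypothetical.

```python
def au_savings_factor(requested_aus, needed_aus):
    """How many times cheaper a job is when billed for only the AUs it needs.

    Assumes cost is proportional to allocated AUs for a fixed run time.
    """
    if needed_aus <= 0 or requested_aus < needed_aus:
        raise ValueError("expected 0 < needed_aus <= requested_aus")
    return requested_aus / needed_aus


factor = au_savings_factor(1000, 50)  # the case described in the notes
```

With 1000 AUs requested and 50 needed, the ratio is 1000 / 50 = 20, matching the savings quoted in the notes.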
Virtualize your analytics: The power to act on all your data with optimized data virtualization of your relational sources such as Azure SQL Server on VMs, Azure SQL Database, and Azure SQL Data Warehouse. Queries are automatically optimized by moving processing close to the source data, without data movement, thereby maximizing performance and minimizing latency.
Enterprise-grade Security, Auditing and Support: Extend your on-premises security and governance controls to the cloud for meeting your security and regulatory compliance needs. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities are built in through Azure Active Directory. Role Based Access control, and the ability to audit all processing and management operations are on by default. We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
We are planning to release a preview of this functionality early next year as part of SQL Server V.Next CTPs, exact release dates are still in flux.
By preview early next year PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage (not MySQL!). We will continue to add more sources until GA.
http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016
When it comes to key BI investments we are making it much easier to manage relational and non-relational data with PolyBase technology that allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is there are not enough people out there with Hadoop and MapReduce skills, and this technology simplifies the skill set needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
Comparison of IoT Hub and Event Hubs: https://azure.microsoft.com/en-us/documentation/articles/iot-hub-compare-event-hubs/
Azure Stream Analytics is a cost effective event processing engine that helps uncover real-time insights from devices, sensors, infrastructure, applications, and data. It will enable various opportunities including Internet of Things (IoT) scenarios such as real-time fleet management or gaining insights from devices like mobile phones and connected cars. Deployed in the Azure cloud, Stream Analytics has elastic scale where resources are efficiently allocated and paid for as requested. Developers are given a rapid development experience where they describe their desired transformations in SQL and the system abstracts the complexities of the parallelization, distributed computing, and error handling from them.
Looking forward into H2 FY15, Stream Analytics will become generally available after previewing at TechEd EMEA 2014.
Microsoft’s Big Data vision in the cloud is to enable organizations to solve large, complex problems end-to-end, from storing and managing TBs of data without investing in hardware and software, to seamless integration with the 1 billion users of Excel. As part of this vision, Microsoft offers Azure Machine Learning, designed to democratize the complex task of advanced analytics.
Advanced analytics is using products like Azure Machine Learning to find new and actionable insights that traditional approaches to business intelligence are unlikely to discover. An easy way to think about this is thinking about a dashboard. Today when confined by only BI tools without a connection to machine learning, it is solely the job of the human looking at the spreadsheet to gain insights and react to the data. But a human can only consume so many variables. A computer, on the other hand, can consume a great deal more variables to provide much deeper insight on the data. Humans can then react to the data to make decisions that drive competitive advantage, as well as program the computer further to recognize important patterns in the future. This is why we say beyond business intelligence – machine intelligence.
The accessibility of our solution starts with setup. Previously you needed to provision an on-premises workspace for machine learning, which meant thinking about server space and a host of other considerations. Today you can get started with just a browser. With only an Azure subscription, you can take advantage of the full functionality of Azure Machine Learning within minutes. Taking a test drive is even easier: click Get Started at azure.com/ml and, with just a Microsoft ID, you’re off to the races.
Another limitation of other machine learning solutions is siloed environments that allow only one programming language or make changing from one algorithm to another time-consuming and complex. With Azure ML, you can experience the power of choice. That choice extends to language, with both Python and R being first-class citizens of Azure ML, and to algorithm. You can choose from hundreds of algorithms, including business-tested ones running our Microsoft businesses today. And swapping out algorithms to land on the right one for you is done with a click. Additionally, you can drop in custom R and Python code – your “special sauce” – and mix and match that with the other options in the tool.
Most revolutionary of all, you can deploy solutions in minutes as a web service, which is simply a URL that can connect to any data, anywhere, including on-premises or in another cloud environment. The ability to put a model into production almost immediately, as well as revise it easily, is unique to Microsoft and allows companies to stay on top of the changing business landscape more effectively than is offered by any other provider today.
We even take that a step further, allowing model developers to connect to the world with our Machine Learning Marketplace, where they can publish finished solutions and APIs with their own brand and business model. Developers can also discover machine learning solutions there without any machine learning skills needed – the data science is inside. Check it out at https://datamarket.azure.com/.
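Since a deployed model is described above as a web service reachable at a URL, a client only needs to send an HTTP request to it. The sketch below builds such a request in Python without sending it. The URL, the API key, the column names, and the JSON payload shape are placeholders and assumptions for illustration; the real endpoint and request format come from the deployed service's own documentation page.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) an HTTP POST request for a scoring web service."""
    # Assumed payload layout: one named input holding column names and row values.
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",       # placeholder URL
    "PLACEHOLDER_KEY",                     # placeholder API key
    ["tenure_months", "monthly_spend"],    # hypothetical feature columns
    [["12", "49.90"]],
)
# To call a real deployed service: urllib.request.urlopen(request)
```

Because the caller only needs a URL and a key, the same request can be issued from an on-premises application or from another cloud.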
Azure Data Factory is a cloud service for creating, managing, and monitoring the production of trusted information from on-premises and cloud data sources using transformative analytics at scale. Data Factory can be used in solutions to gain insights from operational and service health telemetry data, analyze customer actions to determine an optimal targeted marketing strategy, or predict customer churn from customer profile and service log data. Instead of writing hard-to-manage custom code to wire together a data warehouse with Hadoop, NoSQL, and SaaS, use Data Factory to quickly create and deploy highly available data processing pipelines, significantly cutting your time to solution and your operational costs. Get a single monitoring view of all of your data processing pipelines along with data lineage and service health. Bring together on-premises data like SQL Server and cloud data like Azure SQL Database, Blobs, and Tables with the transformative analytics of HDInsight (Hive, Pig, MapReduce, custom .NET code), and even Azure Machine Learning, to produce trusted information that is easily consumed by BI tools or applications.
Looking forward into H2 FY15, Data Factory will become generally available after previewing at TechEd EMEA 2014.
Power BI Desktop is a self-service BI tool that lets users pull data together from multiple data sources, transform and clean that data, model it and add custom calculations, and then visually explore it and create interactive reports that can be easily published and shared through the Power BI service.
In addition, you can now create your own custom visualizations through our open source visualization framework. More information is available at powerbi.com/visuals.
Power BI dashboards
With updates to Power BI, customers can now see all their data through a single pane of glass. Live Power BI dashboards show visualizations and KPIs from data that resides both on-premises and in the cloud, providing a consolidated view across their business regardless of where their data lives.
Customers can then explore their data further by drilling through the dashboard into the underlying reports, discovering new insights that they can pin back to the dashboard to monitor performance going forward.
Natural Language Interface - With Power BI we continue to find new ways to simplify how people analyze and gain insight from data, providing industry-leading features such as natural language query. Natural language query gives users an easier way to interact with their data, allowing them to type questions about their data and receive answers in the form of live visualizations. Power BI integration with Cortana now lets you ask these questions directly from Cortana and have answers from your Power BI data surfaced to you by Cortana. These data-driven answers can range from simple numeric values (“revenue for the last quarter”) to charts (“revenue over time”), maps (“revenue by region”), or data represented through any of the other Power BI data visualizations. Combined with the Cortana Analytics suite, this opens up amazing new opportunities to use Cortana to enable your business, and your customers' businesses, to get things done in more helpful, proactive, and natural ways.
Quick Insights - A new way to help users find hidden insights in their data. The new Quick Insights feature automatically scans the data that users publish to Power BI and detects patterns and trends in it. Through a partnership with Microsoft Research, the Quick Insights feature uses a growing list of algorithms to automatically discover and visualize correlations, outliers, trends, seasonality, change points in trends, and other factors in your data in seconds.
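As a rough illustration of what automatically detecting an outlier can mean, here is a small Python sketch that flags values far from the mean. It is not the Quick Insights algorithm, which these notes do not describe; the z-score rule, the threshold of 2.0, and the revenue figures are assumptions chosen for the example.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # All values are identical, so nothing stands out.
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]


# Hypothetical monthly revenue with one unusual month.
monthly_revenue = [100, 102, 98, 101, 99, 100, 180, 101, 99, 100]
outliers = find_outliers(monthly_revenue)
```

Here only the seventh month (index 6, value 180) is flagged; a production feature would combine many such checks across many columns.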
Bot Framework provides everything you need to build and connect intelligent bots that interact naturally wherever your users are talking, from text/SMS to Skype, Slack, Office 365 mail, and other popular services.
Bot Framework consists of three main components: Bot Connector, Bot Builder, and Bot Directory
At Microsoft, we’ve been offering APIs for a very long time across the company. In delivering the Microsoft Cognitive Services APIs, we started with 4 last year at /build (2015), added 7 more last December, and today (May 2016) have 22 APIs in our collection.
Cognitive Services are available individually or as a part of the Cortana Intelligence Suite, formerly known as Cortana Analytics, which provides a comprehensive collection of services powered by cutting-edge research into machine learning, perception, analytics and social bots.
These APIs are powered by Microsoft Azure.
Developers and businesses can use this suite of services and tools to create apps that learn about our world and interact with people and customers in personalized, intelligent ways.
Key points: Summarize key benefits for Azure Analysis Services
Talk track:
As already mentioned, Azure Analysis Services is based on the proven analytics engine in SQL Server 2016 Analysis Services, which has helped organizations turn complex data into a trusted, single source of truth for years.
This means that BI professionals who are familiar with SQL Server Analysis Services tabular models can get started quickly and do not need to learn new tools or skills.
And with the power of the cloud, BI professionals do not need to manage infrastructure on-premises. They can easily deploy the BI solution and benefit from the scalability of the cloud.
Organizations store data in the cloud and on-premises. Azure Analysis Services is built for hybrid data. Data can be accessed in the cloud, on-premises, or a combination of both, enabling a hybrid solution. So customers do not have to move on-premises data to the cloud.
And last but not least, Azure Analysis Services enables interactive data visualization over billions of rows of data. Because it supports BI industry standards such as XML/A and MDX, business users can access data using their preferred data visualization tool, whether that is Power BI, Excel, or another major data visualization tool.
To summarize, Azure Analysis Services is simple to use: it is easy to get started, you can use your existing skills to create BI semantic models, and you can use your favorite data visualization tools to analyze your data.
Slide objective
Show broad commitment to R by preserving freely available, enhanced editions, Windows and SQL Server editions and R Server editions for leading EDWs, Linux and Hadoop platforms.
Differentiate free, open editions from commercial by mentioning availability of commercial 24x7 support, and enhancements to support very large scale data analytics at speed.
Microsoft Azure DocumentDB is the highly-scalable NoSQL document database-as-a-service that
enables query over schema-free data and multi-document transaction processing
helps deliver configurable and reliable performance
and enables rapid development
DocumentDB is the right solution for applications that run in the cloud when predictable throughput, low latency, and flexible query are key.
Fully managed PaaS database service backed by the power of Microsoft Azure. Unlike many other NoSQL offerings, DocumentDB was built for the cloud to perform and scale in a multi-tenant environment. Cluster administration, replication, and other management functions are handled for the customer automatically. DocumentDB is backed by a 99.95% availability SLA (at GA) to provide consistent, reliable performance.
Application controlled schema with massive scale-out enables iterative development and evolving data models. DocumentDB supports a schema-free data model where the application defines the data model. This supports modern application development scenarios where applications are developed iteratively with many versions supported concurrently and data models continuously evolve.
Automatic indexing enables robust querying over schema-free data. DocumentDB is the first of its kind to offer SQL over schema-free JSON data and multi-document transactional processing.
Integrated transactional JavaScript processing + tunable consistency enable high performance application experiences. DocumentDB supports stored procedures, triggers, and user-defined functions. It also supports tunable consistency with well-defined click stops to enable developers to tune database performance based on the application’s needs.
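To show what querying over schema-free data means in practice, here is a small Python sketch that filters JSON documents with differing shapes. It imitates the idea only and does not use DocumentDB: the documents, field names, and the `select_where` helper are hypothetical. The filter is roughly analogous to a query such as `SELECT c.id FROM c WHERE c.reading >= 25`.

```python
import json

# Hypothetical documents in one collection; note that their fields differ.
raw_documents = [
    '{"id": "1", "type": "telemetry", "device": "sensor-7", "reading": 21.5}',
    '{"id": "2", "type": "profile", "name": "Ana", "tags": ["beta"]}',
    '{"id": "3", "type": "telemetry", "device": "sensor-9", "reading": 30.1, "unit": "C"}',
]


def select_where(documents, field, minimum):
    """Return ids of documents whose numeric `field` is at least `minimum`.

    Documents that lack the field are skipped instead of raising an error,
    which is what lets one query run over documents of different shapes.
    """
    return [
        doc["id"]
        for doc in documents
        if isinstance(doc.get(field), (int, float)) and doc[field] >= minimum
    ]


documents = [json.loads(text) for text in raw_documents]
hot_ids = select_where(documents, "reading", 25)
```

The profile document has no `reading` field and is simply ignored, so no schema has to be declared or migrated before the query runs.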
The key scenarios for DocumentDB are the following:
Emitting telemetry and logging data
Storing/querying event and workflow data
Persisting device and app configuration data
User generated content
Scalable, iterative app development
Petabyte-size files and trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files, where a single file can be greater than a petabyte in size, which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.
Scalable throughput for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes
Start in seconds, scale instantly, pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage, or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
Debug and optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you only use 50 AUs, resulting in a 20x cost savings.
Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that allows you to write code once and automatically have it parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
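The 20x figure in the AU recommendation example above can be checked with simple arithmetic. The sketch below assumes that cost scales linearly with the number of AUs reserved and that the job runs for the same duration either way (the unused AUs were idle); the function name is hypothetical.

```python
def au_savings_factor(requested_aus, needed_aus):
    """Return how many times cheaper a job is when run with only the AUs it needs.

    Assumes cost is proportional to AUs reserved and job duration is unchanged.
    """
    if needed_aus <= 0 or requested_aus < needed_aus:
        raise ValueError("expected 0 < needed_aus <= requested_aus")
    return requested_aus / needed_aus


factor = au_savings_factor(requested_aus=1000, needed_aus=50)
```

With 1000 AUs requested and 50 needed, the ratio is 1000 / 50 = 20, matching the 20x savings quoted in the notes.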
With Microsoft Azure HDInsight, Microsoft R Server is now available as an option when you create HDInsight clusters in Azure. This new capability provides data scientists, statisticians, and R programmers with on-demand access to scalable, distributed methods of analytics on HDInsight.
Clusters can be sized to the projects and tasks at hand and torn down when they're no longer needed. Since they're part of Azure HDInsight, these clusters come with enterprise-level 24/7 support, an SLA of 99.9% uptime, and the flexibility to integrate with other components in the Azure ecosystem.
R Server on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size loaded to either Azure Blob or Data Lake storage. Since R Server is built on open source R, the R-based applications you build can leverage any of the 8000+ open source R packages, as well as the routines in ScaleR, Microsoft’s big data analytics package that's included with R Server.
The edge node of a cluster provides a convenient place to connect to the cluster and to run your R scripts. With an edge node, you have the option of running ScaleR’s parallelized distributed functions across the cores of the edge node server. You also have the option to run them across the nodes of the cluster by using ScaleR’s Hadoop Map Reduce or Spark compute contexts.
The models or predictions that result from analyses can be downloaded for use on-premises. They can also be operationalized elsewhere in Azure, such as through an Azure Machine Learning Studio web service.
https://azure.microsoft.com/en-us/regions/
Always encrypted, role-based security, and auditing: Data Lake Store protects your data assets and extends your on-premises security and governance controls to the cloud easily. Data is always encrypted: in motion using SSL, and at rest using service-managed or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
1) Copy source data into the Azure Data Lake Store (Twitter data example)
2) Massage/filter the data using Hadoop (or skip using Hadoop and use stored procedures in SQL DW/DB to massage data after step #5)
3) Pass data into Azure ML to build models using Hive query (or pass in directly from Azure Data Lake Store)
4) Azure ML feeds prediction results into the data warehouse
5) Non-relational data in Azure Data Lake Store copied to data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data)
6) Power BI pulls data from data warehouse to build dashboards and reports
7) Azure Data Catalog captures metadata from Azure Data Lake Store and SQL DW/DB
8) Power BI and Excel can pull data from the Azure Data Lake Store via HDInsight
9) To support high concurrency if using SQL DW, or for easier end-user data layer, create an SSAS cube