
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Part 1


Lance Olson. Cortana Analytics is a fully managed big data and advanced analytics suite that helps you transform your data into intelligent action. Come to this two-part session to learn how you can do "big data" processing and storage in Cortana Analytics. In the first part, we will provide an overview of the processing and storage services. We will then talk about the patterns and use cases that make up most big data solutions. In the second part, we will go hands-on, showing you how to get started today with writing batch/interactive queries, real-time stream processing, or NoSQL transactions, all over the same repository of data. Crunch petabytes of data by scaling out your computation power to a cluster of any size. Store any amount of unstructured data in its native format with no limits on file or account size. All of this can be done with no hardware to acquire or maintain and minimal time to set up, giving you the value of "big data" within minutes. Go to https://channel9.msdn.com/ to find the recording of this session.


  1. Data complexity: variety and velocity • Petabytes
  2. Massive compute and storage • Deployment expertise • Data of all volume, variety, velocity • Speed • Scale • Economics • Always up, always on • Open and flexible • Time to value
  3. Big Data
  4. HDInsight: Script • SQL • NoSQL • Streaming • Batch • Map reduce • In Memory • Core Engine
  5. • Microsoft’s cloud Hadoop offering • 100% open source Apache Hadoop • Built on the latest releases for Hadoop • Up and running in minutes with no hardware to deploy • .NET and Java skills and deep integration with Visual Studio • Utilize familiar BI tools for analysis, including Microsoft Excel • 99.9% Enterprise Service Level Agreement
  6. Microsoft contribution to Apache code
  7. Diagram labels: Name Node • Job Tracker • HMaster • Coordination • Data Node (x4) • Task Tracker (x4) • Region Server (x4). • Random, fast (real-time) read/write access to your Big Data • Host very large tables (billions of rows × millions of columns) on clusters of commodity hardware • Runs on top of the Hadoop Distributed File System (HDFS) • Provides flexibility in that new columns can be added to column families at any time
  8. Stream processing • Search and query • Data analytics (Excel) • Web/thick client dashboards • Devices to take action • RabbitMQ / ActiveMQ
  9. • Single execution model for multiple tasks (SQL queries, Streaming, Machine Learning, and Graph) • Up to 100x faster processing performance • Developer friendly (Java, Python, Scala) • BI tool of choice (Power BI, Tableau, Qlik, SAP) • Notebook experience (Jupyter/iPython, Zeppelin). Diagram labels: Spark SQL • Spark Streaming • Machine Learning • Graph
  10. Spark for Azure HDInsight: in-memory computation engine, fully managed
  11. • Managed & supported by Microsoft • Familiarity of Windows • Re-use common tools, documentation, samples from Hadoop/Linux ecosystem • Add Hadoop projects that were authored on Linux to HDInsight • Easier transition from on-premises to cloud
  12. Partner Spotlight: AtScale. Analysts Use Traditional BI Tools Against HDInsight
  13. • HDFS for the Cloud • Unlimited Storage, Petabyte Files • Optimized for Massive Throughput • High frequency, low latency, read immediately • Managed and secured
  14. [Figure: data volumes labeled PB, TB, GB]
  15. Neudesic partnered with one of the nation's largest utility companies, which recently deployed Smart Utility Meters for power customers: nearly a million meters sending usage data every 15 minutes. The result: an Azure hybrid big data processing solution that enabled the customer to perform gap analytics (a process for identifying gaps in the power usage readings) over 7x faster than their previous solution. Billions of Smart Meter reads are processed to identify the nature and duration of the gaps and so mitigate revenue losses. Diagram labels: Smart Meters • Business Rules Processing • BI Layer • Blob Storage • HDInsight • Input • Processed Output data • ELT • Local SQL DB for customer and other confidential data • Extract processed data from blob storage • AzCopy • SSIS • Input files
  16. Big Data in Retail: • Clickstream analytics • Online recommendation engine • 360° view of the customer • Analyze brand sentiment • Localized, personalized promotions • Optimal store layout. Leading computer manufacturer in the world: • Use clickstream to deliver custom website ecommerce experience • Targeted ads for abandoned carts • Use unstructured data from website and social for data mining • Combine with sales data for 360° view. • Gather data from table-side devices at restaurants • Predict promotions/offers and content to upsell to guests • Gather social media sentiment from customer feedback • Combined with POS data, can determine right product mix. Leading multi-national retailer: • Track weather information (temperature/forecast) to predict shelf space for different seasons • Sentiment analysis on feedback. Leading online clothing retailer: • Use clickstream to understand who is viewing their site • Building recommendation engine based on users’ clickpaths
  17. Ziosk turned to Microsoft gold partner Artis Consulting to deploy a hybrid solution consisting of the Analytics Platform System, Azure HDInsight, Power BI, and Azure Machine Learning. “Until now, we haven’t had the ability to optimize the guest experience based on their specific interactions with the devices. With Azure, we can close the loop.” Kevin Mowry, Ziosk Chief Software Architect
  18. Big Data in Health: • Predictive analysis of patient health & clinical decision support • Population, risk, and care management • Real-time quality measures to assist providers with regulatory requirements • Medical research data (e.g. genomics) • Recruit cohorts for pharmaceutical trials • Process large volumes of data from any healthcare provider EHR system • Assist in showing compliance • Store 7-30 years of data to meet audit requirements • Scan handwritten notes and do natural language processing • Analyze if symptoms might map to bigger outbreak • Collect clinical trial data (from automated equipment, sensors) • Find patterns in this data (chemical compositions, enzymes) • Process 6 years' worth of data in a few hours without any infrastructure
  19. Big Data in Financial Services: • New account risk screens • Fraud prevention • Trading risk • Maximize deposit spread • Insurance underwriting • Accelerate loan processing • Actively monitor currencies used by UK manufacturers in supply chain to do risk analysis • Monitor UK GDP to help customers stay on top of economic trends • Needed to handle increasing amounts of finance, compliance, and legal data from trading operations • Trading data drives strategic decisions • Track customer feedback on social media and on their blog posts/website to understand loyalty • Predict at-risk clients to reach out to • Process data for actuaries to analyze results to understand risks for insurance companies • Milliman’s application understands relationships between people, process, and technology to manage risk
  20. Tangerine partners with Microsoft to build a solution with Analytics Platform System for the data warehouse and uses PolyBase to query Azure HDInsight in the cloud. “With pre-built integration using PolyBase to query both the relational data warehouse and Hadoop in the cloud, the solution will allow us to reap the benefits of both relational and non-relational data regardless of where it lives.”
  21. Resources: http://azure.microsoft.com/en-us/documentation/services/hdinsight/ • http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/ • http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data • http://channel9.msdn.com/Shows/Data-Exposed • http://azure.microsoft.com/en-us/pricing/free-trial/
  22. Applications • Web and Social • Devices • Sensors • Queryable Table • Hive
  23. Diagram labels: Head node • Name node • Data nodes/task nodes • JDBC • ODBC • Query Console • Metastore • Thrift server • Command line interface (CLI) • Compiler, Optimizer, Executor • Hadoop • Hive • Visual Studio
  24. [Chart: TPCH1 (1 TB data), latency in minutes; series labels: Scale up/Scale out • Tez • Partitioning • ORCFile • Vectorization]
  25. Diagram labels: Transformation • Collection • Presentation and action • Event Queuing System • Long-term storage • Search and query • Data analytics (Excel) • Web/thick client dashboards • Devices to take action • Event hub • Event producers • Applications • Web and social • Devices • Sensors • Live Dashboards • Apache HBase on HDInsight • DocumentDB • Solr • Azure Search • MongoDB • SQL • Cloud gateways (web APIs) • Field gateways • Kafka/RabbitMQ/ActiveMQ • Event hubs • Azure ML • Storage adapters • Stream processing • Storm • HDInsight • Stream Analytics
  26. Storm Essentials
  27. Streaming data analysis • Easy to program • A distributed real-time processing platform • Fault tolerant: failure is expected, and embraced • Fast: clocked at 1M+ messages per second per node • Scalable: thousands of workers per cluster • Reliable: guaranteed message delivery, exactly-once semantics
  28. Storm Essentials
  29. Stream: unbounded sequence of Tuples • Tuple: core unit of data; immutable set of key/value pairs • Spout: source of streams; wraps a streaming data source and emits Tuples
  30. Core functions of a streaming computation | Receive tuples and do stuff: • Compute • Write to a data store • Read from a data store • Perform arbitrary computation • (Optionally) emit additional streams
  31. Storm Essentials
  32. (no text)
  33. Cloud gateways • Data Generator • Counter Bolt • Aggregate • Writer Bolt • Live dashboard
  34. Storm Essentials
  35. Managed services • Open source platform • Scale-up and scale-down • Event Hub • Visual Studio • Azure HBase, SQL Database, DocumentDB • Speed: analyse millions of messages per second
  36. Support for authoring Storm Topologies: • Create Storm projects from available template • Submit a topology with C# bolts/spouts • Submit Topologies containing Java spouts/bolts • Monitor Topologies within VS • Troubleshoot Topologies. In essence, you never need to leave Visual Studio for Storm Projects
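  37. Technology comparison: Storm on HDInsight vs. Azure Stream Analytics

      | Category | Feature | Storm on HDInsight | Azure Stream Analytics |
      |---|---|---|---|
      | Management & Operations | Service | Managed Cluster | Managed Service |
      | Management & Operations | Price | Link to the pricing page. | Link to the pricing page. |
      | Management & Operations | Microsoft Supported | Yes | Yes |
      | Management & Operations | Open Source | Yes | No |
      | Development Experience | SQL DSL | No | Yes |
      | Development Experience | Extensible | Yes | No |
      | Development Experience | Temporal Operators | No. Customers write custom code | Yes |
      | Development Experience | Authoring/Debugging | Tools via Visual Studio | Interactive authoring and debugging via Azure portal |
      | Input/Output | Data Ingress | No restriction | Event Hub, Azure Blobs |
      | Input/Output | Data Egress | No restriction | Support to write data to Event Hubs, Blob store, Azure Table, Azure SQL DB, Power BI |
      | Input/Output | Supports Multiple Inputs | Yes | Yes |
      | Input/Output | Generate Multiple Outputs | Yes | Yes |
      | Input/Output | Data Format | No restriction | Avro, JSON, CSV |
      | Performance | Scalability | Yes | Yes |
      | Performance | Elastic Scale | Yes | Yes |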
  38. Web App • Devices • Streaming Service • Batch Analytics • HBase • Hadoop
  39. Web App • HBase • Twitter Spout • Sentiment Indexer • Broadcaster • Counter • Writer • SignalR • Storm
  40. Resources: HBase: The Definitive Guide • Online HBase Book • https://github.com/hdinsight/hbase-sdk-for-net • https://github.com/maxluk/tweet-sentiment • Get started using HBase in HDInsight • Tutorial: Building Tweet Sentiment App
  41. [Screenshot: {ISV App Name}] Steps: Basics (configure your app) • HDInsight cluster (configure your cluster) • Virtual network (configure your virtual network) • Summary (see a deployment summary). Basics: Windows Server 2012 R2 Datacenter • Hadoop • Create
  42. [Screenshot: New HDInsight Cluster] Basics: Windows Server 2012 R2 Datacenter • Hadoop • Create. New HDInsight Cluster: Hadoop • mycluster001 • HDInsight_Telemetry • myresourcegroup • Linux (Ubuntu 12.04 LTS)
  43. [Screenshot: New HDInsight Cluster] Basics: Windows Server 2012 R2 Datacenter • Hadoop • Create. New Virtual Network
  44. [Screenshot: New HDInsight Cluster] Basics: Windows Server 2012 R2 Datacenter • Hadoop • Create. Summary: Cluster name: newcluster001 • Cluster type: Storm • Cluster operating system: Linux (Ubuntu 12.04 LTS) • Cluster data source: (new) storage001 (Azure Storage) • Head nodes: 2 nodes (D12) • Worker nodes: 4 nodes (D14) • Zookeeper nodes: 3 nodes (D12) • Metastores selected: Hive: yes, Oozie: no
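
The Storm slides describe spouts that emit tuples, bolts that receive tuples and compute, and an example chain of Data Generator, Counter Bolt, Aggregate, and Writer Bolt. The following is only a minimal plain-Python sketch of those ideas, not the Storm API: every function name here (`make_tuple`, `data_generator_spout`, `counter_bolt`, `aggregate_bolt`, `writer_bolt`, `run_topology`) is hypothetical, generators stand in for streams, and the input is a finite list rather than an unbounded source.

```python
from collections import Counter
from types import MappingProxyType
from typing import Any, Dict, Iterable, Iterator, List, Mapping

# Assumption: a "tuple" is modelled as a read-only mapping of key/value pairs.
StormTuple = Mapping[str, Any]


def make_tuple(**fields: Any) -> StormTuple:
    """Build a read-only key/value record (hypothetical helper)."""
    return MappingProxyType(dict(fields))


def data_generator_spout(source: Iterable[str]) -> Iterator[StormTuple]:
    """Spout: wraps a data source and emits one tuple per event."""
    for device_id in source:
        yield make_tuple(device=device_id)


def counter_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: keeps a running count per device and emits the updated count."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["device"]] += 1
        yield make_tuple(device=tup["device"], count=counts[tup["device"]])


def aggregate_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: remembers the latest count per device and emits one summary
    tuple once the (finite, demo-only) input is exhausted."""
    latest: Dict[str, int] = {}
    for tup in stream:
        latest[tup["device"]] = tup["count"]
    yield make_tuple(totals=dict(latest))


def writer_bolt(stream: Iterable[StormTuple], store: List[dict]) -> None:
    """Bolt: writes each tuple to a data store (here, an in-memory list)."""
    for tup in stream:
        store.append(dict(tup))


def run_topology(source: Iterable[str]) -> List[dict]:
    """Wire spout -> counter -> aggregate -> writer and return the store."""
    store: List[dict] = []
    writer_bolt(aggregate_bolt(counter_bolt(data_generator_spout(source))), store)
    return store
```

Calling `run_topology(["a", "b", "a"])` pushes three events through the chain and leaves a single summary record in the store. A real Storm topology would distribute these stages across workers and run continuously. This sketch only illustrates the data flow between the components.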
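
The smart-meter case study describes gap analytics as identifying gaps in readings that should arrive every 15 minutes, along with the duration of each gap. The deck does not show how the customer implemented it, so the following is just one possible single-machine sketch under stated assumptions: `find_gaps` and `INTERVAL` are hypothetical names, readings are plain timestamps for one meter, and a gap is any pair of consecutive readings with at least one expected read missing between them.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumption: meters report on a fixed 15-minute cadence, as stated in the case study.
INTERVAL = timedelta(minutes=15)


def find_gaps(
    timestamps: List[datetime], interval: timedelta = INTERVAL
) -> List[Tuple[datetime, datetime, int]]:
    """Return (last_seen, next_seen, missing_reads) for every gap.

    A gap exists when at least one expected reading is missing between
    two consecutive timestamps.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, curr in zip(ordered, ordered[1:]):
        # Whole intervals elapsed, minus the one reading that did arrive.
        missing = (curr - prev) // interval - 1
        if missing > 0:
            gaps.append((prev, curr, missing))
    return gaps
```

For example, readings at 00:00, 00:15, 01:00, and 01:15 yield one gap from 00:15 to 01:00 with two missing reads. At the scale the slide mentions (billions of reads), the same per-meter logic would run as a distributed job rather than in a single process.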
