Data Science in the cloud with Microsoft Azure
MARTIN THORNALLEY
DATA SOLUTION ARCHITECT, MICROSOFT
Introduction
Data Science Definition
“Data science is an interdisciplinary field about
processes and systems to extract knowledge or
insights from data in various forms, either structured
or unstructured, which is a continuation of some of
the data analysis fields such as statistics, machine
learning, data mining, and predictive analytics”
https://en.wikipedia.org/wiki/Data_science
Data Science Skillset
http://berkeleysciencereview.com/how-to-become-a-data-scientist-before-you-graduate/
The Cloud
Why does the Cloud matter for Data Science?
 High capacity and cost effective data storage
 Flexible, elastic compute capacity
 Ready to use technologies
 Choice of Infrastructure or Platform
 Enables Agile & DevOps
 Operational reliability and security
 Pay as you go
Microsoft Azure Cloud Platform
 Wide range of services covering Compute, Web & Mobile, Data & Storage, Analytics, Internet of Things & Intelligence, plus many more; see http://azureplatform.azurewebsites.net/en-us/
 Easy to get started: free to try for 30 days with a limited spend, plus free credits with an MSDN licence; see https://azure.microsoft.com/en-gb/free/
 Comprehensive documentation and examples
 Global presence with many recognisable brands fully committed
 Huge investment and growing rapidly
Data Science Process
https://azure.microsoft.com/en-us/documentation/articles/data-science-process-overview/
Worked Example
NYC taxis
 2013 NYC taxi trips and fares – open but non-trivial dataset
 24 CSV files: 12 trip and 12 fare, one of each per month
 ~20GB compressed, ~50GB uncompressed, 170+ million records
 medallion – vehicle identifier
 hack license – driver identifier
 passenger count
 pickup & dropoff – datetime, longitude, latitude
 trip – time and distance
 fare – payment type, fare amount, surcharge, MTA tax, tip amount, tolls amount, total amount
http://www.andresmh.com/nyctaxitrips/
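Because trips and fares arrive in separate files, they have to be matched up before a tip can be related to a journey. The sketch below is one possible way to do that in plain Python. It assumes the two files share the columns medallion, hack_license and pickup_datetime and that these together identify a journey; the column names and the sample rows are hypothetical and not taken from the slides.

```python
import csv
import io

# Tiny made-up samples standing in for one trip file and one fare file.
# Column names are assumptions based on the fields listed on the slide.
TRIP_CSV = """medallion,hack_license,pickup_datetime,passenger_count,trip_distance
M1,H1,2013-01-01 15:11:48,4,1.0
M2,H2,2013-01-06 00:18:35,1,1.5
M3,H3,2013-01-07 09:00:00,2,3.2
"""

FARE_CSV = """medallion,hack_license,pickup_datetime,payment_type,fare_amount,tip_amount
M1,H1,2013-01-01 15:11:48,CSH,6.5,0.0
M2,H2,2013-01-06 00:18:35,CRD,6.0,1.5
"""

# Assumed join key: vehicle, driver and pickup time.
KEY = ("medallion", "hack_license", "pickup_datetime")


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_trip_fare(trips, fares):
    """Inner-join trip rows to fare rows on the assumed key."""
    fare_by_key = {tuple(f[k] for k in KEY): f for f in fares}
    joined = []
    for trip in trips:
        fare = fare_by_key.get(tuple(trip[k] for k in KEY))
        if fare is not None:  # trips without a matching fare are dropped
            joined.append({**fare, **trip})
    return joined


rows = join_trip_fare(read_rows(TRIP_CSV), read_rows(FARE_CSV))
for row in rows:
    print(row["medallion"], row["trip_distance"], row["tip_amount"])
```

At the full 170+ million records this join would be run in Hive on the cluster rather than in memory; the logic is the same.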
Predictions
 Predict whether a specific journey will result in a tip – binary classification
 Predict which class of tip a specific journey will fall into – multiclass classification
 Predict how much the tip will be for a specific journey – regression
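All three prediction tasks use the same tip amount as their target, shaped differently each time. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the class boundaries (5, 10 and 20) are illustrative values chosen for the example, not ranges given in the slides.

```python
# Upper bounds of the tip classes; illustrative values, not from the slides.
TIP_UPPER_BOUNDS = [5.0, 10.0, 20.0]


def tipped(tip_amount):
    """Binary classification target: 1 if any tip was paid, else 0."""
    return 1 if tip_amount > 0 else 0


def tip_class(tip_amount):
    """Multiclass target: 0 for no tip, then one class per range."""
    if tip_amount <= 0:
        return 0
    for index, upper in enumerate(TIP_UPPER_BOUNDS, start=1):
        if tip_amount <= upper:
            return index
    return len(TIP_UPPER_BOUNDS) + 1  # everything above the last bound


def targets(tip_amount):
    """Return the three targets for one journey."""
    return {
        "tipped": tipped(tip_amount),        # binary classification
        "tip_class": tip_class(tip_amount),  # multiclass classification
        "tip_amount": tip_amount,            # regression
    }


print(targets(0.0))
print(targets(7.5))
```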
A Data Science Environment
Data Science Virtual Machine
Create Linux and Windows virtual machines in minutes
 Wide range of configurations - CPU cores, memory, disks, network
speeds
 Scale to what you need
 Pay only for what you use
 Enhance security and compliance
 Preloaded from the Azure Marketplace with a full set of tools and utilities, e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R, Jupyter, etc.
Storage Accounts
Massively scalable cloud storage for your applications
 Security-enhanced, durable, and highly available across the globe
 Industry-leading performance with exabytes of capacity
 Pay only for what you use
 Open, multi-platform support
HDInsight
A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service
made easy
 Scale to petabytes on demand
 Crunch all data—structured, semi-structured, unstructured
 Skip buying and maintaining hardware
 Spin up Apache Hadoop, Spark, and R clusters in the cloud
 Use Excel or your favourite BI tool to visualize Hadoop data
 Connect on-premises Hadoop clusters with the cloud
Azure Machine Learning
A fully managed cloud service that enables you to easily build, deploy,
and share predictive analytics solutions.
 Powerful cloud based analytics, now part of Cortana Intelligence
Suite
 Azure Machine Learning Studio includes hundreds of built-in
packages and support for custom code
 Share your solution with the world in the Gallery or on the Azure
Marketplace
The Process
Preparation & Exploration
 Copy data using AzCopy and decompress
 Inspect files and load into RStudio
 Create external Hive tables and load the data
 Query over the full dataset for further exploration
 Remove erroneous data, e.g. invalid passenger numbers or lat/long values
 Engineer features using Hive:
   Distance from start to finish using the Haversine calculation
   Binary indicator for tips
   Tip level based on ranges for multiclass classification
 Downsample the dataset and save as an internal table for Machine Learning
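The cleaning, distance and downsampling steps above were done in Hive. As a rough illustration of the same logic, here is a minimal Python sketch. The field names, the plausibility bounds (1 to 6 passengers, a bounding box around New York City) and the sampling approach are all assumptions made for the example, not values from the slides.

```python
import math
import random

EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.0
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def is_plausible(trip):
    """Reject rows with odd passenger counts or coordinates far from NYC.

    The bounds are illustrative assumptions.
    """
    lats = (trip["pickup_latitude"], trip["dropoff_latitude"])
    lons = (trip["pickup_longitude"], trip["dropoff_longitude"])
    return (1 <= trip["passenger_count"] <= 6
            and all(40.0 <= lat <= 41.5 for lat in lats)
            and all(-75.0 <= lon <= -73.0 for lon in lons))


def downsample(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


trip = {
    "passenger_count": 2,
    "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781,
}
if is_plausible(trip):
    distance = haversine_miles(
        trip["pickup_latitude"], trip["pickup_longitude"],
        trip["dropoff_latitude"], trip["dropoff_longitude"])
    print(round(distance, 1), "miles")
```

The Haversine distance is a straight line over the Earth's surface, so it will usually be shorter than the metered trip distance; that difference can itself be a useful feature.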
Machine Learning & Deployment
 Import Data using Hive Query
 Build Training Experiments
 Evaluate model performance
 Create Predictive Experiments
 Publish Web Service
 Test Web Service
 Call from Excel
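For the binary tip/no-tip model, evaluating performance comes down to comparing predicted labels against actual ones. The sketch below computes a few commonly used metrics by hand to show what they measure. The function names and the sample labels are hypothetical, and this is not a description of how Azure Machine Learning computes its own evaluation output.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 means the journey was tipped.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

If most journeys in the sample are tipped (or most are not), accuracy alone can look good for a model that always predicts the majority class, which is why precision and recall are worth checking alongside it.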
Next Steps
To build a fully fledged enterprise solution with regular data ingestion and model execution, consider the following:
 Data Catalog
 Data Factory
 Event Hubs & Stream Analytics
 Power BI
 Cognitive Services
Conclusion
Summary
 Microsoft Azure provides a wide range of technologies for Data
Science activities
 Platform services reduce the management overhead
 No capacity limitations and flexible provisioning – pay as you go
 Choice of Open Source and Microsoft – use the best tool for the task
 The tools are well integrated
 Azure Machine Learning makes it trivial to deploy your models
 It’s quick and easy to get started
Getting Started
 Sign up for free
https://azure.microsoft.com/en-gb/free/
 Create a Data Science VM
https://azure.microsoft.com/en-us/marketplace/partners/microsoft-ads/standard-data-science-vm/
 Visit Cortana Intelligence Gallery
https://gallery.cortanaintelligence.com/
Q&A
Thank You
Martin Thornalley
Data Solution Architect, Microsoft
@mthornal
martin.thornalley@microsoft.com
https://www.linkedin.com/in/martinthornalley

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Último (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Data Science in the cloud with Microsoft Azure

  • 1. Data Science in the cloud with Microsoft Azure: MARTIN THORNALLEY, DATA SOLUTION ARCHITECT, MICROSOFT
  • 3. Data Science Definition: “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics” (https://en.wikipedia.org/wiki/Data_science)
  • 5. The Cloud: Why does the Cloud matter for Data Science?
      • High capacity and cost effective data storage
      • Flexible, elastic compute capacity
      • Ready to use technologies
      • Choice of Infrastructure or Platform
      • Enables Agile & DevOps
      • Operational reliability and security
      • Pay as you go
  • 6. Microsoft Azure Cloud Platform
      • Wide range of services covering Compute, Web & Mobile, Data & Storage, Analytics, Internet of Things & Intelligence plus many more, see http://azureplatform.azurewebsites.net/en-us/
      • Easy to get started, free to try for 30 days but limited spend, also MSDN licence free credits, see https://azure.microsoft.com/en-gb/free/
      • Comprehensive documentation and examples
      • Global presence with many recognisable brands fully committed
      • Huge investment and growing rapidly
  • 9. NYC taxis
      • 2013 NYC taxi trips and fares – open but non-trivial dataset
      • 24 CSV files – 12 trip, 12 fare, 1 for each month
      • ~20GB compressed, ~50GB uncompressed, 170+ million records
      • medallion – vehicle identifier
      • hack license – driver identifier
      • passenger count
      • pickup & dropoff – datetime, longitude, latitude
      • trip – time and distance
      • fare – payment type, fare amount, surcharge, mta tax, tip amount, tolls amount, total amount
      • Source: http://www.andresmh.com/nyctaxitrips/
  • 10. Predictions
      • Predict whether a specific journey will result in a tip – binary classification
      • Predict what class of tip will be for a specific journey – multiclass classification
      • Predict how much a tip will be for a specific journey – regression
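The three prediction tasks above differ only in the target derived from the tip amount. A minimal Python sketch of that derivation follows. The function name, the dataclass and especially the class boundaries are assumptions made for illustration; the slides mention tip ranges but do not list them.

```python
from dataclasses import dataclass

# Assumed upper bounds (in dollars) for the tip classes. The slides refer to
# ranges without giving values, so these numbers are purely illustrative.
TIP_CLASS_UPPER_BOUNDS = (0.0, 5.0, 10.0, 20.0)


@dataclass
class TipTargets:
    tipped: int        # binary classification target: 1 if any tip was given
    tip_class: int     # multiclass target: index of the tip range
    tip_amount: float  # regression target: the tip itself


def tip_targets(tip_amount: float) -> TipTargets:
    """Derive the three prediction targets from one fare record's tip."""
    tipped = 1 if tip_amount > 0 else 0
    # Class 0 means no tip; anything above the last bound gets the top class.
    tip_class = len(TIP_CLASS_UPPER_BOUNDS)
    for index, upper in enumerate(TIP_CLASS_UPPER_BOUNDS):
        if tip_amount <= upper:
            tip_class = index
            break
    return TipTargets(tipped, tip_class, tip_amount)
```

Calling `tip_targets(3.5)` gives a record with `tipped` set to 1 and, under the assumed bounds, `tip_class` set to 1. The same source column therefore feeds all three models.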
  • 12. Data Science Virtual Machine: Create Linux and Windows virtual machines in minutes
      • Wide range of configurations - CPU cores, memory, disks, network speeds
      • Scale to what you need
      • Pay only for what you use
      • Enhance security and compliance
      • Preloaded with full set of tools and utilities from Azure MarketPlace e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R, Jupyter, etc.
  • 13. Storage Accounts: Massively scalable cloud storage for your applications
      • Security-enhanced, durable, and highly available across the globe
      • Industry-leading performance with exabytes of capacity
      • Pay only for what you use
      • Open, multi-platform support
  • 14. HDInsight: A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy
      • Scale to petabytes on demand
      • Crunch all data: structured, semi-structured, unstructured
      • Skip buying and maintaining hardware
      • Spin up Apache Hadoop, Spark, and R clusters in the cloud
      • Use Excel or your favourite BI tool to visualize Hadoop data
      • Connect on-premises Hadoop clusters with the cloud
  • 15. Azure Machine Learning: A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.
      • Powerful cloud based analytics, now part of Cortana Intelligence Suite
      • Azure Machine Learning Studio includes hundreds of built-in packages and support for custom code
      • Share your solution with the world in the Gallery or on the Azure Marketplace
  • 17. Preparation & Exploration
      • Copy data using AzCopy and decompress
      • Inspect files and load into RStudio
      • Create external Hive tables and load
      • Query over full dataset for further exploration
      • Remove erroneous data e.g. passenger numbers, lat/long
      • Engineer features using Hive
      • Distance from start to finish using Haversine calculation
      • Binary indicator for tips
      • Tip level based on ranges for multiclass classification
      • Downsample dataset and save as internal table for Machine Learning
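The cleaning, distance and downsampling steps above are done in Hive in the talk. As a rough illustration only, the same logic can be sketched in Python. The plausibility limits, the function names and the seeded random sampling are all assumptions; the slides do not give the thresholds or the sampling method used.

```python
import math
import random

EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius

# Assumed plausibility limits; the slides do not state the actual thresholds.
MAX_PASSENGERS = 6
LAT_RANGE = (40.0, 42.0)
LON_RANGE = (-75.0, -72.0)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_plausible(passenger_count, lat, lon):
    """Reject rows with impossible passenger counts or coordinates."""
    return (1 <= passenger_count <= MAX_PASSENGERS
            and LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def downsample(rows, fraction, seed=0):
    """Keep roughly `fraction` of the rows, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]
```

A row with zero passengers or coordinates of (0, 0) fails `is_plausible`, which is the kind of erroneous record the cleaning step is meant to drop. The fixed seed keeps the sample reproducible between runs.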
  • 18. Machine Learning & Deployment
      • Import Data using Hive Query
      • Build Training Experiments
      • Evaluate model performance
      • Create Predictive Experiments
      • Publish Web Service
      • Test Web Service
      • Call from Excel
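Once the web service is published it can be called from code as well as from Excel. The Python sketch below only builds the HTTP request and does not send it. The URL and API key are placeholders, the column names are hypothetical, and the JSON layout is assumed to follow the request-response format of Azure Machine Learning Studio web services; the API help page of a published service shows the exact schema to use.

```python
import json
import urllib.request

# Placeholders only: a published service has its own URL and API key.
SERVICE_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_scoring_request(columns, row, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request that scores a single row."""
    # Assumed payload layout: one named input holding column names and rows.
    payload = {
        "Inputs": {
            "input1": {"ColumnNames": list(columns), "Values": [list(row)]}
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
```

With a real URL and key, passing the returned object to `urllib.request.urlopen` would send the request and return the service's JSON response.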
  • 19. Next Steps: To build a fully fledged enterprise solution with regular data ingestion and model execution consider the following:
      • Data Catalog
      • Data Factory
      • Event Hubs & Stream Analytics
      • Power BI
      • Cognitive Services
  • 21. Summary
      • Microsoft Azure provides a wide range of technologies for Data Science activities
      • Platform services reduce the management overhead
      • No capacity limitations and flexible provisioning – pay as you go
      • Choice of Open Source and Microsoft – use the best tool for the task
      • The tools are well integrated
      • Azure Machine Learning makes it trivial to deploy your models
      • It’s quick and easy to get started
  • 22. Getting Started
      • Sign up for free https://azure.microsoft.com/en-gb/free/
      • Create a Data Science VM https://azure.microsoft.com/en-us/marketplace/partners/microsoft-ads/standard-data-science-vm/
      • Visit Cortana Intelligence Gallery https://gallery.cortanaintelligence.com/
  • 23. Q&A
  • 24. Thank You Martin Thornalley Data Solution Architect, Microsoft @mthornal martin.thornalley@microsoft.com https://www.linkedin.com/in/martinthornalley