18. Azure Machine Learning
• A "new" cloud-based service from Microsoft
• Integrates with existing Cloud technologies
• Use ready-made algorithms
• Program custom algorithms tuned to your problem
• You can evaluate it for free
22. ML Studio
• Browser based
• Drag n Drop
• Flowchart-y
• Example data sets
• Use R or Python
• Excellent intro wizard
24. Provides tools to:
• Import data
• Filter and aggregate data
• Create machine learning models
• Run experiments
• Publish finished model
26. The Learning Process
• Define a problem you want to solve
• Design a solution
• Experiment!
• Identify your data
• Train the model with the data
• Evaluate against expected results (speed and accuracy)
• Adapt data or algorithm (or both)
• Repeat
• Save the best model
• Publish
• Run with live data
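The train, evaluate, adapt, repeat loop above can be sketched in a few lines. This is a minimal illustration only, not the ML Studio workflow: the "model" is a plain moving average, the candidate window sizes stand in for "adapt data or algorithm", and the CPU samples are invented for the example.

```python
import time


def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def evaluate(series, window):
    """Return (mean absolute error, elapsed seconds) for one candidate."""
    start = time.perf_counter()
    errors = []
    for i in range(window, len(series)):
        predicted = moving_average_forecast(series[:i], window)
        errors.append(abs(predicted - series[i]))
    elapsed = time.perf_counter() - start
    return sum(errors) / len(errors), elapsed


def find_best_window(series, candidate_windows):
    """Experiment loop: try each candidate and keep the most accurate one."""
    best = None
    for window in candidate_windows:
        mae, elapsed = evaluate(series, window)
        if best is None or mae < best["mae"]:
            best = {"window": window, "mae": mae, "seconds": elapsed}
    return best


# Hypothetical CPU% samples, invented for illustration only.
cpu = [20, 22, 25, 30, 38, 45, 50, 48, 40, 33, 27, 22]
best = find_best_window(cpu, candidate_windows=[1, 2, 3, 4])
print(best["window"], round(best["mae"], 2), best["seconds"])
```

The loop records elapsed time alongside the error because the slide asks for both speed and accuracy; here only accuracy decides the winner.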
31. Scaling
• Instances auto-scale based on the CPU% metric using Azure’s standard scaling model.
• Azure standard scaling is slow
• By the time the autoscaler notices we need more capacity, the demand has often disappeared!
• Not a good user experience
34. Hackathon!
• Can we build a better autoscaler?
• Spin-up before high demand
• Tear-down when idle
• Better Cost vs UX
36. Requirements
• What will "we" need on a given date or time?
• Do "we" need to take action now to compensate for what will happen in 20 minutes time?
• Number of instances
• Predicted CPU
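One way to turn these requirements into a decision is sketched below. It assumes a forecast of total CPU demand is already available, and it sizes for the peak expected within the 20-minute lead time. The function names, the 60% per-instance CPU target, and the forecast numbers are all hypothetical choices for illustration, not values from the project.

```python
import math


def instances_needed(predicted_total_cpu, target_cpu_per_instance=60.0,
                     min_instances=1):
    """Instances required to keep each one at or below the target CPU%."""
    required = math.ceil(predicted_total_cpu / target_cpu_per_instance)
    return max(min_instances, required)


def scaling_action(current_instances, forecast, lead_minutes=20,
                   target_cpu_per_instance=60.0):
    """Return how many instances to add (positive) or remove (negative) now.

    `forecast` is a list of (minutes_ahead, predicted_total_cpu) pairs.
    Only points within the lead time are considered, and the peak among
    them drives the decision so capacity is ready before demand arrives.
    """
    horizon = [cpu for minutes, cpu in forecast if minutes <= lead_minutes]
    if not horizon:
        return 0  # nothing forecast inside the lead time: take no action
    required = instances_needed(max(horizon), target_cpu_per_instance)
    return required - current_instances


# Hypothetical forecasts, invented for illustration only.
busy = [(5, 110.0), (10, 150.0), (20, 260.0), (30, 400.0)]
idle = [(5, 30.0), (10, 20.0), (20, 25.0)]
print(scaling_action(2, busy))   # spin up before the demand arrives
print(scaling_action(4, idle))   # tear down when idle
```

Using the peak within the window, rather than the single value at 20 minutes, is a deliberately cautious choice: it avoids tearing down instances that a shorter-term spike would still need.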
37. Best Predictor of Demand?
• Sessions?
• Instance Memory Use?
• Instance CPU?
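A simple way to compare candidate predictors is to correlate each metric now with demand some time later. The sketch below does that with a hand-written Pearson correlation. The sample series, the lag of two samples, and the idea of ranking by absolute correlation are assumptions for illustration; the data is constructed so that sessions lead demand, and says nothing about which metric actually won.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(metric, demand, lag):
    """Correlate metric[t] with demand[t + lag]."""
    if lag <= 0 or lag >= len(metric):
        raise ValueError("lag must be between 1 and len(metric) - 1")
    return pearson(metric[:-lag], demand[lag:])


# Hypothetical samples, invented for illustration only.
sessions = [10, 20, 40, 80, 60, 30, 10, 20, 50, 90]
memory = [50, 51, 50, 52, 51, 50, 51, 52, 50, 51]
cpu = [10, 10, 10, 20, 40, 80, 60, 30, 10, 20]
demand = [1, 1, 1, 2, 4, 8, 6, 3, 1, 2]  # built to follow sessions by 2 samples

candidates = {"sessions": sessions, "memory": memory, "cpu": cpu}
scores = {name: abs(lagged_correlation(values, demand, lag=2))
          for name, values in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```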
38.
39. Table Storage Diagnostics
• Too slow
• Purging
• ML queries all or nothing
• ML Data Reader stops after 4GB
• GB !!!!
• ML times out after ~3 hours
43. Neural Net Experiments
• Feed Forward NN
• Written using R libraries
• Good predictor for 10-20 minute window
• Too inaccurate after that
• Best compromise between precision and speed
• Recurrent NN better at forecasting
• RNN execution time too long
• Need to reduce data to optimal subset
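To train a feed-forward network on a time series, the data first has to be framed as fixed-size inputs with a target some steps ahead. The sketch below shows one common framing, assuming regularly sampled values. It is written in Python rather than the R used in the experiments, and `make_windows`, `n_lags`, and `horizon` are hypothetical names. Keeping `n_lags` small is also one way to reduce the data fed to the model.

```python
def make_windows(series, n_lags, horizon):
    """Build (features, target) pairs for a fixed-horizon forecast.

    Each row uses the `n_lags` most recent samples as features and the
    value `horizon` steps after the last feature as the target.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be at least 1")
    rows = []
    for end in range(n_lags, len(series) - horizon + 1):
        features = series[end - n_lags:end]
        target = series[end + horizon - 1]
        rows.append((features, target))
    return rows


# With samples every 10 minutes (an assumption), horizon=2 means
# "predict 20 minutes ahead from the last 30 minutes of data".
series = list(range(8))
rows = make_windows(series, n_lags=3, horizon=2)
for features, target in rows:
    print(features, "->", target)
```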
44. Stream Analytics
• Real-time data analysis
• Fast
• SQL-like syntax
• Range of inputs and outputs
• Interesting development
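Much real-time analysis of this kind comes down to aggregating events over fixed time windows, which Stream Analytics expresses in its query language by grouping on a windowing function such as a tumbling window. The sketch below is plain Python, not the Stream Analytics API: it only illustrates what a tumbling-window average computes. The function name and the event data are hypothetical.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds):
    """Average values over fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, value) pairs.
    Returns a dict mapping each window's start time to its average.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start][0] += value
        totals[window_start][1] += 1
    return {start: total / count
            for start, (total, count) in sorted(totals.items())}


# Hypothetical CPU% events, invented for illustration only.
events = [(0, 20), (30, 40), (60, 50), (90, 70), (125, 90)]
averages = tumbling_average(events, window_seconds=60)
print(averages)
```

Each event lands in exactly one window, which is what distinguishes a tumbling window from sliding or hopping windows.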
50. Anomalies
• Dev Process is painful
• Syntax Errors
• “Test” Import Behaviour
• Starting and Stopping and Starting and Stopping
63. Bugs
• We were pushing the environment quite hard
• YMMV
• ML studio has bugs
• Parallel tasks are not actually parallel
• ML portal is missing functionality, which prevents it from being production ready
64. #DevOps
• Sharing models is "public" - Gallery
• No export support
• No support (yet) for model deployment
• Still Drag n Drop
• PowerShell for EventHubs and Stream Analytics
65. Machine Learning
• Parallel R processing library would help
• Finding an appropriate solution often requires a data science specialist
• Solution is only as good as your data
• You may need to compromise on accuracy for speed
• Cost: hosting, plus each call to the service