While machine learning and data mining have had a profound impact on how we model applications and use data to improve products, there is scope for extending prediction algorithms to lower levels of the stack as well. Useful applications of machine learning in ACS could include resource allocation that is aware of usage statistics, fault prediction, and load balancing. In this talk we will:
* take a broad overview of what machine learning/data mining is and how it is being used in today's tech ecosystem
* explore ways in which we can make ACS more efficient
* discuss some recent advancements from the research community in how ML can benefit datacenters
Anurag Awasthi - Machine Learning applications for CloudStack
1. Applications of ML/AI in CloudStack
Anurag Awasthi
Software Engineer at ShapeBlue
anurag.awasthi@shapeblue.com
2. $whoami
● Software Engineer @ ShapeBlue and contributor to Apache CloudStack
○ Things I have worked on - CloudStack feature development, KVM, VR.
○ Relatively new to ACS
● Formerly in engineering at Twitter, PocketGems, Microsoft Research.
○ Diverse experiences - Backend, Web, iOS, Android, Machine Learning.
● Loves programming (github.com/anuragaw), dogs and trekking
3. Introduction
● Aim of this talk
○ Introduce, explore, ignite a debate, demystify.
○ The aim is not to train data scientists but to propose ML in ACS
● Machine learning toolbox
○ Common tools, examples
● Current Scenario
● Possible use cases in ACS
● Conclusion
4. Machine learning toolbox
● What is Machine Learning?
○ Data, data and more data
○ Classification
○ Prediction
○ Underlying loss function: Optimization
● Supervised vs Unsupervised
● Common models
○ Support Vector Machines
○ Decision trees
○ Neural networks
○ Logistic regression
○ Deep learning
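To make "classification" and "underlying loss function: optimization" concrete, here is a minimal sketch of logistic regression trained by gradient descent in plain Python. The data (CPU utilisation versus an "overloaded" label) and all function names are hypothetical and chosen only for illustration; nothing here comes from CloudStack itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average cross-entropy loss of the model sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Minimise the log loss with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical supervised data: CPU utilisation (0..1) and an "overloaded" label.
cpu = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90]
overloaded = [0, 0, 0, 0, 1, 1, 1, 1]

initial_loss = log_loss(0.0, 0.0, cpu, overloaded)
w, b = train_logistic(cpu, overloaded)
final_loss = log_loss(w, b, cpu, overloaded)

def predict(x):
    """Classify a utilisation value as overloaded (True) or not (False)."""
    return sigmoid(w * x + b) >= 0.5
```

Training is nothing more than repeatedly nudging `w` and `b` in the direction that lowers the loss; the same idea underlies the larger models listed above.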
5. Machine learning toolbox
Y = m * X + c
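The line Y = m * X + c is the simplest prediction model: given example pairs (X, Y), find the slope m and intercept c that fit them best. A minimal sketch using ordinary least squares follows; the data (running VMs versus host power draw) is hypothetical and constructed to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: number of running VMs vs. host power draw in watts.
vms = [0, 5, 10, 15, 20]
watts = [100, 150, 200, 250, 300]

m, c = fit_line(vms, watts)

def predict(x):
    """Predict power draw for x running VMs using the fitted line."""
    return m * x + c
```

With this data the fit recovers m = 10 and c = 100, so `predict(30)` gives 400 watts.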
13. Current scenario
● Automated control system implemented at Google to cool its data centers autonomously
● Google claims it saved 30% energy
● Methodology:
○ PUE (Power Usage Effectiveness) = total power delivered to the facility / power used by IT equipment
○ According to the Uptime Institute, the typical data center has an average PUE of 2.5
○ The input features (DC input variables) included the IT load, weather conditions, number of chillers and cooling towers running, equipment setpoints, etc.
○ "Using the machine learning framework developed in this paper, we are able to predict DC PUE within 0.004 +/- 0.005, approximately 0.4 percent error for a PUE of 1.1"
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42542.pdf
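The PUE figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the kilowatt readings are hypothetical, and `overhead_fraction` is a helper introduced here for illustration, not a term from the paper.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: facility power divided by IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

def overhead_fraction(pue_value):
    """Share of facility power that does not reach IT equipment (cooling, losses)."""
    return 1 - 1 / pue_value

# Hypothetical readings for a facility at the quoted typical PUE of 2.5.
typical = pue(2500, 1000)
typical_overhead = overhead_fraction(typical)

# Relative size of a 0.004 prediction error at a PUE of 1.1, in percent.
relative_error_pct = 0.004 / 1.1 * 100
```

At a PUE of 2.5, 60% of the power entering the facility never reaches IT equipment. An error of 0.004 at a PUE of 1.1 is about 0.36%, which matches the quoted "approximately 0.4 percent".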
14. Current scenario
● Facebook uses ML in data center operations and job scheduling.
● Other companies: IBM, Huawei and HPE (which has written a white paper about ML reducing downtime)
● Litbit has developed the first AI-powered data center operator, Dac.
○ Dac will be able to use strategically placed Internet of Things (IoT) sensors to detect loose electric wires in server rooms or water leaks in a cooling system.
15. Current scenario
● AI-based DCIM tools are on the rise as well: they gather raw IoT data, centralize it, and use AI algorithms to identify patterns, generating actionable information that gives operators clear visibility into IT and infrastructure asset behavior, along with recommended corrective actions. E.g. Hewlett Packard Enterprise's InfoSight.
● Maya HTT's data center infrastructure management software, Datacenter Clarity LC, uses AI-powered tools to analyze individual servers to detect anomalies and opportunities for optimization.
16. Possible use cases in ACS
● Time for ACS to evolve?
● We don’t need ML scientists in the community, but we do need some awareness.
● Perhaps start a dialogue in...
17. Possible use cases in Apache CloudStack
● Energy aware CloudStack
● Load balancing techniques
○ Network traffic congestion can be avoided
● VM host placement & migration
○ VM deployment that is more aware of host usage
● Volume host placement & migration
○ More available storage
● Host failure prediction and maintenance
○ Trigger migrations in case of failures and notify admins
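As one illustration of usage-aware VM host placement, the sketch below forecasts each host's next CPU load from its recent history and places the VM on the host with the most projected headroom. Everything here is an assumption made for the example: the host names, the utilisation samples, and the functions `predict_next` and `choose_host` are hypothetical and are not part of any CloudStack API. The exponentially weighted moving average is only a stand-in for whatever trained model would be used in practice.

```python
def predict_next(samples, alpha=0.5):
    """Forecast the next value with an exponentially weighted moving average."""
    estimate = samples[0]
    for s in samples[1:]:
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate

def choose_host(cpu_history, vm_demand, capacity=1.0):
    """Return the host with the lowest projected load that still fits the VM."""
    best_host, best_load = None, None
    for host, samples in cpu_history.items():
        projected = predict_next(samples) + vm_demand
        if projected > capacity:
            continue  # host would be over capacity after placement
        if best_load is None or projected < best_load:
            best_host, best_load = host, projected
    return best_host

# Hypothetical CPU utilisation history (fraction of capacity, oldest first).
history = {
    "host-a": [0.70, 0.80, 0.90],  # rising load
    "host-b": [0.50, 0.40, 0.30],  # falling load
    "host-c": [0.20, 0.60, 0.95],  # spiky load
}

small_vm_host = choose_host(history, vm_demand=0.2)
large_vm_host = choose_host(history, vm_demand=0.7)
```

Here the small VM lands on `host-b`, while the large VM fits nowhere and `choose_host` returns `None`, which a caller could treat as a signal to trigger a migration or notify an admin.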
18. Possible use cases in ACS
● Smarter Router
○ AI-driven security tools injected into VRs
○ Failure prediction and self-healing, not just service-driven behavior
● Log analysis for recommendations
○ Generated logs can provide meaningful information about possible healing actions
○ Can help some, if not all, failure cases to self-heal
● Architecture-wise, separate plugins that leverage existing open source ML tools can be used
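A rough sketch of what log analysis for recommendations could look like is given below: a tiny bag-of-words scorer learns which words go with which healing action from labelled examples, then suggests an action for a new log line. The log messages and action names are invented for this example and are not real CloudStack log output; a real plugin would more likely use an existing open source ML library than this hand-rolled baseline.

```python
from collections import Counter, defaultdict

def tokenize(line):
    """Lowercase the line and keep purely alphabetic tokens."""
    return [t for t in line.lower().split() if t.isalpha()]

def train(labelled_lines):
    """Count how often each word appears under each recommended action."""
    counts = defaultdict(Counter)
    for line, action in labelled_lines:
        counts[action].update(tokenize(line))
    return counts

def recommend(counts, line):
    """Suggest the action whose training words overlap most with the line."""
    tokens = tokenize(line)
    scores = {action: sum(c[t] for t in tokens) for action, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical labelled log lines (not real CloudStack messages).
training = [
    ("agent disconnected from host heartbeat timeout", "restart-agent"),
    ("heartbeat timeout on host agent", "restart-agent"),
    ("storage pool out of space", "migrate-volume"),
    ("volume allocation failed no space on pool", "migrate-volume"),
]
model = train(training)

agent_action = recommend(model, "host agent heartbeat lost")
storage_action = recommend(model, "no space left on primary storage pool")
unknown_action = recommend(model, "user logged in")
```

Lines with no overlap with the training vocabulary return `None`, so the sketch makes no recommendation rather than guessing.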
19. Conclusion
● Challenges:
○ Community needs to be aware of future trends
○ Data:
■ Where is the data? (Some effort in logging host level performance)
■ How to manage data?
● Next Steps:
○ Open a dialogue on users@ and dev@ to gather opinions
○ Add API support so that non-ACS users/experts can help train models and export data