CIS – 508: Cloud Based Machine Learning Platforms

Abstract- Recently, there has been a massive
increase in the scale and sophistication of Machine
Learning and Data Mining (MLDM) problems and
techniques. As a result, there has also been a rapidly
increasing need for cloud based systems that can
execute these MLDM algorithms efficiently.
Technology giants such as Amazon, Google,
Microsoft, IBM, and other organizations have created
cloud based machine learning platforms to address the
expanding demand of customers and clients who need
these machine learning algorithms to derive
meaningful insights from their data. This review
discusses several of the cloud based tools and
platforms in the current market which attend to the
dire need for handling the large volume of data and
conveniently applying machine learning algorithms to
them for efficient predictive analysis.
Index Terms— Machine Learning, Data Mining,
Predictive Analytics, Cloud Based ML Platforms.
I. INTRODUCTION
Since the 1950s, when ‘Artificial Intelligence’
(AI) achieved recognition as a discipline,
machine learning (ML) has been at its core. Over the
years, machine learning has undergone major
transformations, beginning in the mid-1990s, when a
great deal of focus was on logistic regression,
Support Vector Machines (SVM), and PageRank. In
2005, neural network research achieved significant
breakthroughs, with applications in
Computer Vision and Natural Language Processing.
Recently, with the advent of ‘Big Data’ and cloud
computing technologies, organizations have started
developing cloud based machine learning platforms
to integrate the convenience of cloud computing and
the power of machine learning. These include
Amazon, Microsoft, Google, BigML, FICO,
Yottamine, and IBM. This integration is a necessity
in the industry due to various reasons, several of
which are mentioned below:
• Accessibility of machine learning:
Machine learning, in several ways, makes
use of historical data to forecast future data.
This makes it even more important for
machine learning to be available at all times
and connected to all possible sources of data,
including the cloud.
• Easy utilization of ML algorithms: Earlier,
ML tools such as SAS had to be installed on a
desktop machine in order to apply ML
algorithms. With cloud platforms,
ML algorithms can be applied directly in the
cloud with ease.
• Variability in ML workload: The time an ML
algorithm takes to process a workload varies from
system to system, depending on each system's
specifications. Cloud based ML is insulated from
such variations because the processing is carried
out in the cloud itself.
• Inexpensive storage and processing power of
the cloud: Cloud computing has
revolutionized data storage. The low cost of
cloud storage has made cloud computing the easiest
way to evolve platforms and serve customers.
This review dives into various cloud based machine
learning platforms launched in the industry recently
to understand the underlying agenda behind their
advent in the machine learning world.
Review: Cloud Based Machine Learning
Platforms
Sagar Khashu, Student, Arizona State University, MSBA
II. AMAZON MACHINE LEARNING
On April 9, 2015, at the AWS Summit in San
Francisco, Amazon Web Services launched its
machine learning service. For some time,
Amazon had been looking for ways to make more
efficient use of its EC2 cloud. It claims to have added
516 features in 2014, and that data
transfers in its storage service increased by
102% compared to the previous year, while
computation activity simultaneously went up
93%. The integration of the machine learning
platform has been the most interesting development
for its EC2 cloud yet.
Amazon Machine Learning is a managed service
which builds predictive models from a
user’s historical data. It can be used to predict
customer turnover and buying patterns, and also to find
issues in customer support. Amazon.com has
capitalized on its machine learning abilities by
developing “recommender systems”, which provide
recommendations based on customer purchases or
interests. Amazon claims that the
same machine learning techniques used in Amazon
Machine Learning help it make 50
billion predictions per week on Amazon.com.
Amazon Machine Learning makes use of industry
standard logistic regression algorithms to generate
models. It also provides security measures
through encryption and secure (SSL) connections to
safeguard client data. One true upside to
Amazon ML is that it can train models on datasets of
up to 100GB, even with minor discrepancies in the
data, and generate the required model. Training fails if
more than 10,000 records, or more than 10% of the
records, in the dataset are incomplete or missing.
Amazon ML includes
powerful model evaluation features, which help test
the bias and accuracy of trained models.
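As a rough illustration of the logistic regression model family named above, the following minimal sketch trains a one-feature classifier with stochastic gradient descent. It is not Amazon ML's implementation; the function names, the toy data, and the `passes`, `learning_rate`, and `l2` knobs are assumptions chosen only for this example.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(rows, labels, passes=20, learning_rate=0.1, l2=0.01, seed=0):
    """Fit weights with SGD. `passes` and `l2` are illustrative knobs (assumed)."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)  # visit the records in a new order on each pass
        for i in order:
            x, y = rows[i], labels[i]
            error = predict_proba(weights, bias, x) - y
            # Gradient step on the log loss plus an L2 penalty on the weights.
            weights = [w - learning_rate * (error * v + l2 * w)
                       for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: small feature values are negative, large are positive.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(rows, labels, passes=200)
```

The trained model returns a score between 0 and 1 for each record; how that score is turned into a class label is a separate decision.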
Amazon Machine Learning also provides several
parameters to fine tune the learning process: (a)
target size of the model, (b) the number of passes to
be made over the data, and (c) the type and amount
of regularization applied to the model. Additionally,
it allows the interpretation cut-off score for
binary classification models to be adjusted, enabling an informed
trade-off between the different kinds of mistakes that a
trained model can make. Once the model is ready,
there are two ways of retrieving predictions:
(a) the batch API or (b) the real-time API. The batch API is used
to make predictions for large datasets; it works
offline and returns all predictions together. The
real-time API is used to predict individual
input data records instantaneously.
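The effect of the cut-off score on the kinds of mistakes a binary classifier makes can be sketched as follows. This is a generic illustration, not Amazon ML code; the scores, labels, and the `confusion_counts` helper are hypothetical.

```python
def confusion_counts(scores, labels, cutoff):
    """Count outcomes when records scoring >= cutoff are predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1  # correctly flagged positive
        elif predicted == 1 and label == 0:
            counts["fp"] += 1  # false alarm
        elif predicted == 0 and label == 0:
            counts["tn"] += 1  # correctly left alone
        else:
            counts["fn"] += 1  # missed positive
    return counts


# Hypothetical model scores and true labels for seven records.
scores = [0.10, 0.35, 0.40, 0.55, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1]

low = confusion_counts(scores, labels, cutoff=0.3)   # catches every positive
high = confusion_counts(scores, labels, cutoff=0.7)  # raises no false alarms
```

On this toy data the low cut-off misses no positives but raises two false alarms, while the high cut-off raises none but misses two positives. Which error is costlier depends on the application.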
Amazon Machine Learning is a developer-friendly
platform which does not require much prior machine
learning knowledge from the user. The
only user prerequisites are: (a) a clear idea of your
problems and targets, and (b) the maximum amount
of relevant, accurate data with minimum assumptions.
The first, although it may seem trivial, is crucial
for understanding whether Amazon Machine Learning
fits your scenario, because not all
problems can be solved by it. The second is
required to avoid underfitting or overfitting the
model; selecting the correct features
for your requirements is therefore important for good
prediction results. Thus, it is recommended to plan
an ‘evaluation phase’ that splits the data into two
segments: a training dataset, to train the model, and a test
dataset, to test the model generated in training.
Amazon Machine Learning can be helpful even
during this evaluation phase: it analyzes the data
source and helps you understand the correlations
within the data through statistics and visualizations.
This can help you choose the right features as inputs
for your model. Moreover, the beauty of Amazon
ML is that it trains and tests many complex models
on its own, even altering various parameters by itself
to finally come up with the “best” predictive model
for your problem. As long as a valid data source is
provided to it, it can be used to solve most low-level
problems.
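The split into training and test data recommended for the evaluation phase can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `train_test_split` helper, the 30% test fraction, and the fixed seed are illustrative choices, not settings taken from Amazon ML.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten records, identified here by integers.
records = list(range(10))
train, test = train_test_split(records)
```

Shuffling before splitting avoids a test set made up only of the most recent or most similar records, which would give a misleading accuracy estimate.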
Amazon Machine Learning is highly scalable and
can generate billions of predictions in real time with
high throughput. One can start small and scale up as
the application grows, without any setup cost,
because pricing is pay-as-you-go.
III. MICROSOFT AZURE MACHINE LEARNING
Microsoft marked the release of its new cloud
based machine learning platform on Feb 18, 2015, to
empower companies to utilize the power of the cloud
to build applications and APIs as well as predict
future events. Its beta version was released on June
16, 2014; between then and the
official release, Microsoft introduced several
features, such as support for Python. Moreover, the
platform also supports
R, Hadoop, and Spark, giving it an edge for
processing big data.
Microsoft Azure ML gives developers the ability
to create predictive analytics models and deploy
them as cloud web services. It makes it possible
to integrate and easily access a variety of data
sources, apply popular ML algorithms, evaluate
models extensively, and support
end-to-end workflows for building predictive models,
with the developer integrated into a repeatable
workflow pattern.
Fig. 1. Workflow depicting the iterative nature of predictive
model generation in Microsoft Azure ML. (source: Microsoft
Azure Essentials: Azure Machine Learning (ISBN
9780735698178), by Jeff Barnes)
Microsoft Azure ML is based on an iterative
process of building models. It has the ability to
generate “experimental” models for the data,
determine their accuracy, fail fast, and move on to
developing the next model. This loop continues until
it has produced the best predictive model for the data.
Fig. 1 illustrates the detailed steps involved in Azure
ML for achieving the desired predictive model.
Microsoft Azure ML also helps you clean the data,
compile it, and analyze the training and testing data
sets for discrepancies. Using the preprocessed
data and its attributes, it generates a model with its
numerous built-in ML algorithms. It then evaluates
the credibility of this model by determining how
accurately it predicts the outcome. If the model does
not reach the required minimum accuracy,
it reruns its various built-in algorithms
and re-ensembles until it attains the desired model
with the best possible confidence factor/accuracy.
This feedback mechanism forms the backbone of the
Azure ML model generation and refining process.
After the model has been refined for better prediction, it can
be deployed as a scalable web service, which
gives the predictive model the flexibility of
cloud platform integration.
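The build, evaluate, and retry loop described above can be sketched generically as follows. This is not Azure ML code: the candidate "models" are hypothetical one-line rules, and `select_model`, `accuracy`, and the 0.95 target are assumptions made for the illustration.

```python
def accuracy(model, rows, labels):
    """Fraction of records for which the model predicts the true label."""
    correct = sum(1 for x, y in zip(rows, labels) if model(x) == y)
    return correct / len(rows)


def select_model(candidates, rows, labels, target_accuracy):
    """Try candidates in order and stop as soon as one meets the target."""
    best_name, best_score = None, -1.0
    for name, model in candidates:
        score = accuracy(model, rows, labels)
        if score > best_score:
            best_name, best_score = name, score
        if best_score >= target_accuracy:
            break  # good enough: no need to try further candidates
    return best_name, best_score


# Hypothetical evaluation data and candidate rules.
rows = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = [
    ("always_zero", lambda x: 0),
    ("cutoff_at_2", lambda x: 1 if x > 2 else 0),
    ("cutoff_at_4", lambda x: 1 if x > 4 else 0),
    ("cutoff_at_6", lambda x: 1 if x > 6 else 0),
]
best_name, best_score = select_model(candidates, rows, labels, target_accuracy=0.95)
```

Here the first two candidates fall short of the target, so the loop moves on; the third meets it, and the fourth is never evaluated. In practice the accuracy should be measured on held-out test data rather than on the data used for training.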
The underlying algorithms in Azure ML are
divided into three groups based on their utility:
(a) classification algorithms, which
predict one or more class labels for a record depending
on the attributes of the dataset; (b) regression
algorithms, which predict continuous
values for the target variable and can also be used to
predict continuous values based on time
series; and (c) clustering algorithms, which
group records together based on the
values of their attributes.
Azure Machine Learning uses a variety of
underlying ML algorithms which can be broadly
classified into two categories: Supervised and
Unsupervised learning. Azure ML uses
‘supervised learning’ to train a model on data with
known inputs and outputs; the resulting model
is then used to predict the unknown output
values in testing datasets. Similarly,
‘unsupervised learning’ is used by Azure ML to
observe natural patterns in data and
develop models based on those
similarities.
Microsoft Azure ML Studio, which is the primary
tool used to develop the predictive analytic solutions
and models, provides a highly interactive workspace
to build, test, iterate, and deploy models with ease.
The entire environment is cloud based and self-
sufficient, which makes it accessible through
virtually any web browser from any part of the world.
Microsoft claims to have built Azure Machine
Learning on the technology
used in Xbox and Bing. Moreover, the
acquisition of ‘Revolution Analytics’ on April 6,
2015 could prove to be a game changer, as it lets
Microsoft integrate the power of the R environment into the
Azure ML platform.
IV. GOOGLE PREDICTION API
Google launched its cloud based Prediction API
platform back in 2011, well before the other cloud
based ML platforms, which were released fairly
recently.
Google's agenda for the
platform was not limited to developing
predictive models; it also provided the facility
to develop Smart Apps, which can offer
meaningful suggestions to their users as and
when required. The focus has been to extend the
platform's utility to apps worldwide and to generate
predictive models as an automatic response to an app's
stream of data, which the app can then use to provide
suggestions to its users. This can be done in three
simple steps: (i) Upload: uploading your dataset to
Google Storage (cloud), since prediction is done after
analyzing the attribute values of the historical
data; (ii) Train: building the model from your data
by applying various machine learning algorithms and
suggesting the best possible model obtained as a
result; and (iii) Predict: generating new predictions
from the developed model to provide
meaningful suggestions to the app users.
Google Prediction API has been termed a
“Black Box” by several critics, because one gets no
control over or visibility into the underlying complex
mechanisms and algorithms that produce the
best possible predictive model. Google
Prediction API is used for one of three types of
problems: (a) Regression, which requires a
continuous output value as the predicted value; (b)
Classification, in which the output value can take only
a specific set of values/labels; and (c) Binary
Classification, in which the output can take either of
two possible values (for instance, True and False).
Google Prediction API makes things easier by
placing no restrictions on the type of input data. The
only requirement is that the dataset be formatted
correctly, such that the first column
holds the target variable and each row acts as an
input vector of attributes. Other aspects such as
feature selection, normalization, and data type
detection are all handled by Google Prediction API.
An upside to this platform is that it allows hassle-free
updates to the generated model without going
through the training phase again. In addition, Google
Prediction API can easily be used with all other
Google services. For security reasons, the data
provided to Google Cloud Storage is replicated to
multiple ‘ambiguous’ data centers as well as
replicated within the data center.
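The input layout described above, with the target variable in the first column and the attributes in the remaining columns of each row, can be illustrated with a short parsing sketch. It uses only Python's standard `csv` module and makes no calls to Google's service; the sample rows and the `parse_training_csv` helper are hypothetical.

```python
import csv
import io


def parse_training_csv(text):
    """Split each CSV row into (target, attributes): first column is the target."""
    targets, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        targets.append(row[0])
        features.append(row[1:])
    return targets, features


# Hypothetical two-row training file for a text classification task.
sample = '"spam","win a free prize now"\n"ham","meeting moved to 3pm"\n'
targets, features = parse_training_csv(sample)
```

A numeric first column would correspond to a regression target in the same layout, while a label such as "spam" or "ham" corresponds to classification.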
A big drawback of Google Prediction API is that it
supports only ‘supervised learning’; ‘unsupervised
learning’ is not yet supported. Another hindrance is
that Google
Prediction API supports only Python scripts
via an API call. Therefore, for non-coders, it is
advisable to use API Explorer as a web interface.
Also, it does not support all file types as data sources:
a ‘.csv’ file of up to 2.5GB is accepted,
along with a few other file loading options.
Prediction API can be useful for a wide variety of
applications such as gene expression, fraud
detection, language identification, customer habit
analysis, sentiment analysis, and other such
applications. Overall, Google Prediction API is a
useful platform for real-time predictions based on
‘supervised learning’.
V. CONCLUSION
Creating machine learning algorithms and testing
them iteratively in order to devise the best predictive
model is a costly and tedious affair. The new era of
technology has simplified this tedious task with the
introduction of cloud based machine learning
platforms for various applications. This makes the
process of developing complex machine learning
models simple, even for people without statistical or
data mining backgrounds. With a variety of
companies offering their platforms for diverse
purposes, each of them aims to dominate the
potential customer base for these platforms.
Amazon Machine Learning, an offering of their
fast-growing Amazon Web Services, is looking to
dig its roots deeper in the market by providing
efficient predictive models for both supervised and
unsupervised learning scenarios, which could prove
to be a huge positive in expanding their potential
customer base.
The flexibility provided by the Microsoft Azure
ML platform in terms of types of dataset and
interface is unparalleled. Moreover, their recent
acquisition of Revolution Analytics may prove to be
a huge advantage for them in the cloud based
machine learning platform market.
Google Prediction API, which has been in the
market for a significant time now, has been aiming
towards the development of Smart Apps with the
help of their platform. This will enable not just
companies, but also individuals to realize the power
of Google’s Prediction API. Moreover, easy
integration of Google’s Prediction API with
Google’s other APIs will diversify its utility.
With the advent of the IoT, there is going to be a
massive increase in the need for cloud based machine
learning platforms to analyze the
enormous stream of data that will be generated
by devices. As technology evolves, there will be a
rising need for not just predictive, but prescriptive
analytics as well. There is no doubt that smart
machines are going to play a significant role in the
way businesses develop in the future.
REFERENCES
[1] Introduction to Decision Trees - J.R. Quinlan
[2] http://radar.oreilly.com/2015/05/on-the-evolution-of-machine-learning.html
[3] http://www.informationweek.com/cloud/infrastructure-as-a-service/amazon-launches-machine-learning-as-a-service/d/d-id/1319868
[4] http://cloudacademy.com/blog/aws-machine-learning/
[5] https://aws.amazon.com/machine-learning/faqs/
[6] Microsoft Azure Essentials: Azure Machine Learning, Jeff Barnes, published April 2015, 237 pages
[7] http://blogs.technet.com/b/machinelearning/archive/2015/04/06/microsoft-closes-acquisition-of-revolution-analytics.aspx
[8] http://techcrunch.com/2015/02/18/microsoft-officially-launches-azure-machine-learning-big-data-platform/ - TechCrunch
[9] http://techcrunch.com/2014/06/16/microsoft-announces-azure-ml-cloud-based-machine-learning-platform-that-can-predict-future-events/ - TechCrunch
[10] http://www.kdnuggets.com/2015/04/cloud-machine-learning-amazon-ibm-watson-microsoft-azure.html - KD Nuggets
[11] http://cloudacademy.com/blog/google-prediction-api/
[12] https://youtu.be/FJDP_0Mrb-w
[13] https://cloud.google.com/prediction/docs/faq
[14] http://www.v3.co.uk/v3-uk/feature/2404892/the-rise-of-machine-learning-microsoft-aws-and-ibm-leading-the-era-of-ai
[15] http://www.kdnuggets.com/2014/12/ibm-watson-analytics-microsoft-azure-machine-learning-p1.html

Más contenido relacionado

La actualidad más candente

Building Intelligent Apps with MongoDB & Google Cloud
Building Intelligent Apps with MongoDB & Google CloudBuilding Intelligent Apps with MongoDB & Google Cloud
Building Intelligent Apps with MongoDB & Google CloudMongoDB
 
Home robots meet IBM Watson for Voice UI, and AI
Home robots meet IBM Watson for Voice UI, and AIHome robots meet IBM Watson for Voice UI, and AI
Home robots meet IBM Watson for Voice UI, and AIBill Liu
 
Building Enterprise Mashups - Web 2.0 conference
Building Enterprise Mashups - Web 2.0 conferenceBuilding Enterprise Mashups - Web 2.0 conference
Building Enterprise Mashups - Web 2.0 conferencemogrinz
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive AnalyticsRonald.Ramos
 
Getting Started with Azure AutoML
Getting Started with Azure AutoMLGetting Started with Azure AutoML
Getting Started with Azure AutoMLVivek Raja P S
 
Machine Learning in Microsoft Azure
Machine Learning in Microsoft AzureMachine Learning in Microsoft Azure
Machine Learning in Microsoft AzureDmitry Petukhov
 
John Robert: Making your machine learning model usable by others
John Robert: Making your machine learning model usable by othersJohn Robert: Making your machine learning model usable by others
John Robert: Making your machine learning model usable by othersLviv Startup Club
 
Your data in the cloud windows azure
Your data in the cloud   windows azureYour data in the cloud   windows azure
Your data in the cloud windows azureNigel Watson
 
Globant - Amazon recognition workshop - 2018
Globant - Amazon recognition workshop - 2018  Globant - Amazon recognition workshop - 2018
Globant - Amazon recognition workshop - 2018 Globant
 

La actualidad más candente (9)

Building Intelligent Apps with MongoDB & Google Cloud
Building Intelligent Apps with MongoDB & Google CloudBuilding Intelligent Apps with MongoDB & Google Cloud
Building Intelligent Apps with MongoDB & Google Cloud
 
Home robots meet IBM Watson for Voice UI, and AI
Home robots meet IBM Watson for Voice UI, and AIHome robots meet IBM Watson for Voice UI, and AI
Home robots meet IBM Watson for Voice UI, and AI
 
Building Enterprise Mashups - Web 2.0 conference
Building Enterprise Mashups - Web 2.0 conferenceBuilding Enterprise Mashups - Web 2.0 conference
Building Enterprise Mashups - Web 2.0 conference
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive Analytics
 
Getting Started with Azure AutoML
Getting Started with Azure AutoMLGetting Started with Azure AutoML
Getting Started with Azure AutoML
 
Machine Learning in Microsoft Azure
Machine Learning in Microsoft AzureMachine Learning in Microsoft Azure
Machine Learning in Microsoft Azure
 
John Robert: Making your machine learning model usable by others
John Robert: Making your machine learning model usable by othersJohn Robert: Making your machine learning model usable by others
John Robert: Making your machine learning model usable by others
 
Your data in the cloud windows azure
Your data in the cloud   windows azureYour data in the cloud   windows azure
Your data in the cloud windows azure
 
Globant - Amazon recognition workshop - 2018
Globant - Amazon recognition workshop - 2018  Globant - Amazon recognition workshop - 2018
Globant - Amazon recognition workshop - 2018
 

Similar a Cloud Machine Learning Platform Review

my ppt preentation.pptx
my ppt preentation.pptxmy ppt preentation.pptx
my ppt preentation.pptxSaikiran447644
 
Sustainable & Composable Generative AI
Sustainable & Composable Generative AISustainable & Composable Generative AI
Sustainable & Composable Generative AIDebmalya Biswas
 
Future of work machine learning and middle level jobs 112618
Future of work machine learning and middle level jobs 112618Future of work machine learning and middle level jobs 112618
Future of work machine learning and middle level jobs 112618Economic Strategy Institute
 
Machine learning in the enterprise
Machine learning in the enterpriseMachine learning in the enterprise
Machine learning in the enterpriseJesus Rodriguez
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionCognizant
 
Compositional AI: Fusion of AI/ML Services
Compositional AI: Fusion of AI/ML ServicesCompositional AI: Fusion of AI/ML Services
Compositional AI: Fusion of AI/ML ServicesDebmalya Biswas
 
Azure ml and dynamics 365
Azure ml and dynamics 365Azure ml and dynamics 365
Azure ml and dynamics 365Jivtesh Singh
 
K-MUG Azure Machine Learning
K-MUG Azure Machine LearningK-MUG Azure Machine Learning
K-MUG Azure Machine LearningPraveen Nair
 
MLOps for Compositional AI
MLOps for Compositional AIMLOps for Compositional AI
MLOps for Compositional AIDebmalya Biswas
 
From notebook to production with Amazon Sagemaker
From notebook to production with Amazon SagemakerFrom notebook to production with Amazon Sagemaker
From notebook to production with Amazon SagemakerAmazon Web Services
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine LearningEng Teong Cheah
 
Mining Intelligent Insights: AI/ML for Financial Services
Mining Intelligent Insights: AI/ML for Financial ServicesMining Intelligent Insights: AI/ML for Financial Services
Mining Intelligent Insights: AI/ML for Financial ServicesAmazon Web Services LATAM
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsAmazon Web Services
 
Why MLOps is Essential for AI-enabled Enterprises.pdf
Why MLOps is Essential for AI-enabled Enterprises.pdfWhy MLOps is Essential for AI-enabled Enterprises.pdf
Why MLOps is Essential for AI-enabled Enterprises.pdfEnterprise Insider
 

Similar a Cloud Machine Learning Platform Review (20)

my ppt preentation.pptx
my ppt preentation.pptxmy ppt preentation.pptx
my ppt preentation.pptx
 
Technovision
TechnovisionTechnovision
Technovision
 
Sustainable & Composable Generative AI
Sustainable & Composable Generative AISustainable & Composable Generative AI
Sustainable & Composable Generative AI
 
unit_5.pdf
unit_5.pdfunit_5.pdf
unit_5.pdf
 
Future of work machine learning and middle level jobs 112618
Future of work machine learning and middle level jobs 112618Future of work machine learning and middle level jobs 112618
Future of work machine learning and middle level jobs 112618
 
How Does Cloud-based Machine Learning impact your Business?
How Does Cloud-based Machine Learning impact your Business?How Does Cloud-based Machine Learning impact your Business?
How Does Cloud-based Machine Learning impact your Business?
 
Machine learning in the enterprise
Machine learning in the enterpriseMachine learning in the enterprise
Machine learning in the enterprise
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business Revolution
 
Compositional AI: Fusion of AI/ML Services
Compositional AI: Fusion of AI/ML ServicesCompositional AI: Fusion of AI/ML Services
Compositional AI: Fusion of AI/ML Services
 
Azure ml and dynamics 365
Azure ml and dynamics 365Azure ml and dynamics 365
Azure ml and dynamics 365
 
K-MUG Azure Machine Learning
K-MUG Azure Machine LearningK-MUG Azure Machine Learning
K-MUG Azure Machine Learning
 
MLOps for Compositional AI
MLOps for Compositional AIMLOps for Compositional AI
MLOps for Compositional AI
 
From notebook to production with Amazon Sagemaker
From notebook to production with Amazon SagemakerFrom notebook to production with Amazon Sagemaker
From notebook to production with Amazon Sagemaker
 
TechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptxTechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptx
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Introducing Amazon SageMaker
Introducing Amazon SageMakerIntroducing Amazon SageMaker
Introducing Amazon SageMaker
 
Mining Intelligent Insights: AI/ML for Financial Services
Mining Intelligent Insights: AI/ML for Financial ServicesMining Intelligent Insights: AI/ML for Financial Services
Mining Intelligent Insights: AI/ML for Financial Services
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital Markets
 
Why MLOps is Essential for AI-enabled Enterprises.pdf
Why MLOps is Essential for AI-enabled Enterprises.pdfWhy MLOps is Essential for AI-enabled Enterprises.pdf
Why MLOps is Essential for AI-enabled Enterprises.pdf
 

Cloud Machine Learning Platform Review

  • 1. CIS – 508: Cloud Based Machine Learning Platforms  Abstract- Recently, there has been a massive increase in the scale and sophistication of Machine Learning and Data Mining (MLDM) problems and techniques. As a result, there has also been a rapidly increasing need for cloud based systems that can execute these MLDM algorithms efficiently. Technology giants such as Amazon, Google, Microsoft, IBM, and other organizations have created cloud based machine learning platforms to address the expanding demand of customers and clients who need these machine learning algorithms to derive meaningful insights from their data. This review discusses several of the cloud based tools and platforms in the current market which attend to the dire need for handling the large volume of data and conveniently applying machine learning algorithms to them for efficient predictive analysis. Index Terms— Machine Learning, Data Mining, Predictive Analytics, Cloud Based ML Platforms. I. INTRODUCTION ince the 1950’s, when `Artificial Intelligence’ (AI) achieved recognition as a discipline, machine learning (ML) has been at its core. Over the year’s machine learning has undergone major transformations, beginning in the mid 90’s where a great deal of focus was on logistic regression, Support Vector Machines (SVM), and PageRank. In 2005, neural networks underwent significant research breakthroughs with its applications in Computer Vision and Natural Language Processing. Recently, with the advent of ‘Big Data’ and cloud computing technologies, organizations have started developing cloud based machine learning platforms to integrate the convenience of cloud computing and the power of machine learning. These include Amazon, Microsoft, Google, BigML, FICO, Yottamine, and IBM. 
This integration is a necessity in the industry due to various reasons, several of which are mentioned below:  Accessibility of machine learning: Machine Learning, in several ways, makes use of historical data to forecast future data. This makes it even more important for machine learning to be available at all times and connected to all possible sources of data, including the cloud.  Easy utilization of ML algorithms: Earlier, it was required to download ML tools like SAS on your desktop in order to apply ML algorithms. With the help of cloud platforms, ML algorithms can be applied directly on the cloud with ease.  Variability in ML workload: ML algorithms processing workload varies from system to system, based on its specifications. Cloud based ML is resistant to such variations because the processing is carried out on the cloud itself.  Inexpensive storage and processing power of the cloud: Cloud computing has revolutionized the concept of memory storage. Cheap availability of cloud storage has made cloud computing the easiest way to evolve platforms and service customers. This review dives into various cloud based machine learning platforms launched in the industry recently to understand the underlying agenda behind their advent in the machine learning world. Review: Cloud Based Machine Learning Platforms Sagar Khashu, Student, Arizona State University, MSBA S
  • 2. CIS – 508: Cloud Based Machine Learning Platforms II. AMAZON MACHINE LEARNING On April 9, 2015 at the Amazon Summit in San Francisco, Amazon Web Services launched their machine learning service. Since quite some time, Amazon’s been on a steady lookout for efficient utility of their EC2 cloud. They claim to have added 516 additional features in 2014, resulting in data transfers in its storage service increasing by up by 102% compared to the previous year, while computation activity has simultaneously gone up 93%. The integration of the machine learning platform has been the most interesting development for their EC2 cloud, yet. Amazon Machine Learning is a managed service which deploys predictive models by looking at a user’s historical data. It can be used to depict customer turnover and buying patterns, and also find issues in customer supports. Amazon.com has capitalized on their machine learning abilities by developing “Recommender systems”, which provide recommendations based on customer purchases or interests. As per their claims, they have used the same machine learning techniques, as in Amazon Machine Learning, which helps them make 50 billion predictions per week on Amazon.com. Amazon Machine Learning makes use of industry standard logistic regression algorithms to generate models. It also provides proper security measures through encryption and secure (SSI) connections to safeguard their client data. One true upside to Amazon ML is that it can train models on datasets of up to 100GB, even with minor discrepancies in the data, and generate the required model. It fails if there are more than 10,000 or 10% incomplete/missing records in the dataset. Amazon ML includes powerful model evaluation features, which help test the biasing and accuracy of predicted models. 
Amazon Machine Learning also provides several parameters to fine tune the learning process: (a) the target size of the model, (b) the number of passes to be made over the data, and (c) the type and amount of regularization applied to the model. Additionally, it lets the user adjust the interpretation cut-off score for binary classification models, enabling an informed trade-off between the different kinds of mistakes that a trained model can make.
Once the model is ready, there are two ways of retrieving predictions: (a) the batch API and (b) the real-time API. The batch API is used to make predictions for large datasets; it works offline and returns all predictions together. The real-time API is used to predict individual input records instantaneously.
Amazon Machine Learning is a developer friendly platform that does not require much prior machine learning knowledge from the user. The only user prerequisites are: (a) a clear idea of your problems and targets, and (b) the maximum amount of relevant, true data with minimum assumptions. The first, although it may seem trivial, is crucial for judging whether Amazon Machine Learning fits your scenario, because not all problems can be solved by it. The second is required to avoid under-fitting or over-fitting the model; selecting the correct features based on your requirements is therefore important for obtaining the desired prediction results. Thus, it is recommended to plan an ‘evaluation phase’ in which the data is split into two segments: a train dataset, used to train the model, and a test dataset, used to test the model generated in training. Amazon Machine Learning can help even during this evaluation phase by analyzing the data source and exposing the correlations within the data through statistics and visualizations. This can help you choose the right features as inputs for your model.
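The evaluation phase and the cut-off trade-off described above can be illustrated with a short, platform-independent sketch. It does not use the Amazon ML API; the helper names `train_test_split` and `count_errors`, the 30% default test fraction, and the convention that a score at or above the cut-off means "positive" are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def count_errors(scores: Sequence[float], labels: Sequence[int],
                 cutoff: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at a given cut-off score.

    Assumption: a score >= cutoff is interpreted as the positive class (1).
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    return false_pos, false_neg
```

Evaluating `count_errors` on the test segment at several cut-offs shows the trade-off directly: raising the cut-off tends to reduce false positives at the cost of more false negatives, and lowering it does the opposite.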
Moreover, the beauty of Amazon ML is that it trains and tests many complex models on its own, even adjusting various parameters by itself, to arrive at the “best” predictive model for your problem. As long as it is given a valid data source, it can be used to solve most low-level problems. Amazon Machine Learning is highly scalable and can generate billions of predictions in real-time with high throughput. One can start small and scale up as the application grows, without any setup cost, because the service is pay as you go.
III. MICROSOFT AZURE MACHINE LEARNING
Microsoft released its new cloud based machine learning platform on Feb 18, 2015, to enable companies to use the power of the cloud to build applications and APIs as well as predict future events. Its Beta version was released on June 16, 2014, and since then Microsoft has introduced several
features such as the addition of Python, before its official release. Moreover, the platform also supports R, Hadoop, and Spark, giving it an edge for processing big data.
Microsoft Azure ML gives developers the ability to create predictive analytics models and deploy them as cloud web services. It makes it possible to integrate and easily access a variety of data sources, apply popular ML algorithms, evaluate models extensively, and support end-to-end workflows for building predictive models by bringing the developer into a repeatable workflow pattern.
Fig. 1. Workflow depicting the iterative nature of predictive model generation in Microsoft Azure ML. (source: Microsoft Azure Essentials: Azure Machine Learning (ISBN 9780735698178), by Jeff Barnes)
Microsoft Azure ML is based on an iterative process of building models. It can generate “experimental” models for the data, determine their accuracy, fail fast, and move on to developing the next model. This loop continues until it has produced the best predictive model for the data. Fig. 1 illustrates the detailed steps involved in Azure ML for achieving the desired predictive model.
Microsoft Azure ML also helps you clean the data, compile it, and analyze the training and testing data sets for discrepancies. Using the preprocessed data and its attributes, it generates a model with its numerous built-in ML algorithms. It then evaluates the credibility of this model by measuring how accurately it predicts the outcome. If the model does not reach the required minimum accuracy, Azure ML reruns its built-in algorithms and re-ensembles them until it attains the desired model with the best possible confidence factor/accuracy. This feedback mechanism forms the backbone of the Azure ML model generation and refining process.
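The feedback loop described above (train an experimental model, measure its accuracy, move on until the required minimum is met) can be sketched in a few lines of plain Python. This is an illustration of the idea only and does not use any Azure ML API; the type aliases, `select_model`, and the two toy trainers are hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Row = Tuple[Sequence[float], int]          # (attributes, label)
Model = Callable[[Sequence[float]], int]   # maps attributes to a predicted label
Trainer = Callable[[List[Row]], Model]     # builds a model from training rows


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    if not rows:
        return 0.0
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


def select_model(trainers: Iterable[Trainer], train: List[Row], test: List[Row],
                 min_accuracy: float) -> Tuple[Optional[Model], float]:
    """Try candidate trainers in turn; stop once min_accuracy is reached."""
    best_model: Optional[Model] = None
    best_score = -1.0
    for trainer in trainers:
        model = trainer(train)
        score = accuracy(model, test)
        if score > best_score:
            best_model, best_score = model, score
        if best_score >= min_accuracy:
            break  # good enough; no need to try further candidates
    return best_model, best_score


# Two toy "algorithms", purely for illustration.
def majority_trainer(train: List[Row]) -> Model:
    """Always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def threshold_trainer(train: List[Row]) -> Model:
    """Split on the midpoint between class means of the first attribute.

    Assumes both classes are present and class 1 has the larger mean.
    """
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] >= cut else 0
```

Passing `[majority_trainer, threshold_trainer]` to `select_model` with a high `min_accuracy` makes the loop discard the weak first candidate and keep the second, which mirrors the "fail fast, and move on" behaviour described in the text.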
After the model has been refined for better prediction, it can be deployed as a scalable web service, which gives the predictive model the flexibility of cloud platform integration.
The underlying algorithms in Azure ML are divided into three groups based on their utility: (a) classification algorithms, which assign records to class labels and are used to predict one or more labels for a record based on the attributes of the dataset; (b) regression algorithms, which predict continuous values for the target variable and can also be used to predict continuous values based on time-series; and (c) clustering algorithms, which group records together based on the values of their attributes.
Azure Machine Learning uses a variety of underlying ML algorithms, which can be broadly classified into two categories: supervised and unsupervised learning. Azure ML uses ‘supervised learning’ to train on datasets with known inputs and outputs and to produce a model that predicts the unknown output values in testing datasets. Similarly, Azure ML uses ‘unsupervised learning’ to observe natural patterns in data and develop predictive models based on those similarities.
Microsoft Azure ML Studio, the primary tool used to develop predictive analytics solutions and models, provides a highly interactive workspace to build, test, iterate, and deploy models with ease. The entire environment is cloud based and self-sufficient, which makes it accessible through virtually any web browser from any part of the world.
Microsoft claims to have built Azure Machine Learning on the technology incorporated in Xbox and Bing. Moreover, the acquisition of ‘Revolution Analytics’ on April 6, 2015 could prove to be game changing, as it allows Microsoft to integrate the power of the R environment into the Azure ML platform.
IV. GOOGLE PREDICTION API
Google launched its cloud based prediction API
platform back in 2011, well before the other cloud based ML platforms, which were released fairly recently. Google's aim for the platform was not limited to developing predictive models; it also provided the facility to develop Smart Apps, which can offer meaningful suggestions to their users as and when required. The focus has been to extend the platform's utility to apps worldwide and to generate predictive models as an automatic response to their stream of data, which the apps can then use to provide suggestions to their users. This can be done in three simple steps:
(i) Upload: Uploading your dataset to Google Storage (cloud), since prediction is done after analyzing the attribute values of the historical data.
(ii) Train: Building the model from your data by applying various machine learning algorithms and suggesting the best possible model obtained as a result.
(iii) Predict: Generating new predictions from the developed model to provide meaningful suggestions to the app users.
Google Prediction API has been termed a “Black Box” by several critics, because the user gets no control over, or visibility into, the complex mechanisms and algorithms that run underneath to produce the best possible predictive model. Google Prediction API is used for one of three types of problems: (a) Regression, which requires a continuous output value as the predicted value, (b) Classification, in which the output can take only a specific set of values/labels, and (c) Binary Classification, in which the output can take either of two possible values (for instance, True and False).
Google Prediction API makes things easier by placing no restrictions on the type of input data. The only requirement is that the dataset be formatted correctly, such that the first column represents the target variable and each row acts as an input vector of attributes.
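The dataset layout described above (target variable in the first column, one input vector per row) can be produced with Python's standard `csv` module. The sketch below only prepares and re-reads such a file; it makes no calls to Google Prediction API, and the helper names `to_training_csv` and `parse_training_csv` are hypothetical.

```python
import csv
import io
from typing import List, Sequence, Tuple


def to_training_csv(rows: Sequence[Tuple[object, Sequence[object]]]) -> str:
    """Serialize (target, attributes) pairs with the target in column one."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for target, attributes in rows:
        writer.writerow([target, *attributes])
    return buffer.getvalue()


def parse_training_csv(text: str) -> List[Tuple[str, List[str]]]:
    """Read the layout back: first field is the target, the rest are inputs."""
    parsed = []
    for record in csv.reader(io.StringIO(text)):
        if record:  # skip blank lines
            parsed.append((record[0], record[1:]))
    return parsed
```

For example, `to_training_csv([("spam", ["free money", 3])])` yields a single line whose first field is the label `spam`, followed by the attribute values.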
Other aspects such as feature selection, normalization, and data type detection are all handled by Google Prediction API. An advantage of this platform is that it allows hassle-free updates to the generated model without going through the training phase again. In addition, Google Prediction API can easily be used with all other Google services. For security reasons, the data provided to Google Cloud Storage is replicated to multiple ‘ambiguous’ data centers as well as replicated within the data center.
A big drawback of Google Prediction API is that it supports only ‘supervised learning’; ‘unsupervised learning’ is not yet supported. Another hindrance is that it supports only Python scripts via an API call. Therefore, non-coders are advised to use API Explorer for Web interfacing. Also, it does not support all file types as data sources. A ‘.csv’ file of up to 2.5GB is accepted, along with a few other file loading options.
Prediction API can be useful for a wide variety of applications such as gene expression, fraud detection, language identification, customer habit analysis, and sentiment analysis. Overall, Google Prediction API is a useful platform for real-time predictions based on ‘supervised learning’.
V. CONCLUSION
Creating machine learning algorithms and testing them iteratively in order to devise the best predictive model is a costly and tedious affair. The introduction of cloud based machine learning platforms for various applications has simplified this task. These platforms make the process of developing complex machine learning models simple, even for people without statistical or data mining backgrounds. A variety of companies now offer platforms for diverse purposes, and each of them aims to dominate the potential customer base for these platforms.
Amazon Machine Learning, an offering of their fast-growing Amazon Web Services, is looking to dig its roots deeper in the market by providing efficient predictive models for both supervised and unsupervised learning scenarios, which could prove to be a huge positive in expanding their potential customer base. The flexibility provided by the Microsoft Azure ML platform in terms of types of dataset and interface is unparalleled. Moreover, their recent acquisition of Revolution Analytics may prove to be a huge advantage for them in the cloud based machine learning platform market.
Google Prediction API, which has been in the market for a significant time now, has been aimed at the development of Smart Apps. This will enable not just companies, but also individuals, to realize the power of Google’s Prediction API. Moreover, easy integration of Google’s Prediction API with Google’s other APIs will diversify its utility.
With the advent of IoT, there is going to be a massive increase in the need for cloud based machine learning platforms to analyze the enormous stream of data that will be generated by devices. As technology evolves, there will be a rising need for not just predictive, but prescriptive analytics as well. There is no doubt that smart machines are going to play a significant role in the way businesses develop in the future.
REFERENCES
[1] Introduction to Decision Trees - J.R. Quinlan
[2] http://radar.oreilly.com/2015/05/on-the-evolution-of-machine-learning.html
[3] http://www.informationweek.com/cloud/infrastructure-as-a-service/amazon-launches-machine-learning-as-a-service/d/d-id/1319868
[4] http://cloudacademy.com/blog/aws-machine-learning/
[5] https://aws.amazon.com/machine-learning/faqs/
[6] Microsoft Azure Essentials: Azure Machine Learning, Published: April 2015, 237 pages, Jeff Barnes
[7] http://blogs.technet.com/b/machinelearning/archive/2015/04/06/microsoft-closes-acquisition-of-revolution-analytics.aspx
[8] http://techcrunch.com/2015/02/18/microsoft-officially-launches-azure-machine-learning-big-data-platform/ - TechCrunch
[9] http://techcrunch.com/2014/06/16/microsoft-announces-azure-ml-cloud-based-machine-learning-platform-that-can-predict-future-events/ - TechCrunch
[10] http://www.kdnuggets.com/2015/04/cloud-machine-learning-amazon-ibm-watson-microsoft-azure.html - KD Nuggets
[11] http://cloudacademy.com/blog/google-prediction-api/
[12] https://youtu.be/FJDP_0Mrb-w
[13] https://cloud.google.com/prediction/docs/faq
[14] http://www.v3.co.uk/v3-uk/feature/2404892/the-rise-of-machine-learning-microsoft-aws-and-ibm-leading-the-era-of-ai
[15] http://www.kdnuggets.com/2014/12/ibm-watson-analytics-microsoft-azure-machine-learning-p1.html