Cloud Machine Learning Platform Review
1. CIS – 508: Cloud Based Machine Learning Platforms
Abstract- Recently, there has been a massive
increase in the scale and sophistication of Machine
Learning and Data Mining (MLDM) problems and
techniques. As a result, there has also been a rapidly
increasing need for cloud based systems that can
execute these MLDM algorithms efficiently.
Technology giants such as Amazon, Google,
Microsoft, IBM, and other organizations have created
cloud based machine learning platforms to address the
expanding demand of customers and clients who need
these machine learning algorithms to derive
meaningful insights from their data. This review
discusses several of the cloud based tools and
platforms in the current market which attend to the
dire need for handling the large volume of data and
conveniently applying machine learning algorithms to
them for efficient predictive analysis.
Index Terms— Machine Learning, Data Mining,
Predictive Analytics, Cloud Based ML Platforms.
I. INTRODUCTION
Since the 1950s, when ‘Artificial Intelligence’
(AI) achieved recognition as a discipline,
machine learning (ML) has been at its core. Over the
years, machine learning has undergone major
transformations, beginning in the mid-1990s, when a
great deal of focus was on logistic regression,
Support Vector Machines (SVM), and PageRank. In
2005, neural networks underwent significant
research breakthroughs with their applications in
Computer Vision and Natural Language Processing.
Recently, with the advent of ‘Big Data’ and cloud
computing technologies, organizations have started
developing cloud based machine learning platforms
to integrate the convenience of cloud computing and
the power of machine learning. These organizations
include Amazon, Microsoft, Google, BigML, FICO,
Yottamine, and IBM. This integration is a necessity
in the industry for various reasons, several of
which are mentioned below:
Accessibility of machine learning:
Machine Learning, in several ways, makes
use of historical data to forecast future data.
This makes it even more important for
machine learning to be available at all times
and connected to all possible sources of data,
including the cloud.
Easy utilization of ML algorithms: Previously,
ML tools such as SAS had to be installed on a
desktop machine in order to apply ML
algorithms. With the help of cloud platforms,
ML algorithms can be applied directly on the
cloud with ease.
Variability in ML workload: The processing
workload of ML algorithms varies from system
to system, depending on each system’s
specifications. Cloud based ML is insulated
from such variations because the processing
is carried out on the cloud itself.
Inexpensive storage and processing power of
the cloud: Cloud computing has
revolutionized the concept of memory
storage. Cheap availability of cloud storage
has made cloud computing the easiest way to
evolve platforms and service customers.
This review dives into various cloud based machine
learning platforms launched in the industry recently
to understand the underlying agenda behind their
advent in the machine learning world.
Review: Cloud Based Machine Learning
Platforms
Sagar Khashu, Student, Arizona State University, MSBA
II. AMAZON MACHINE LEARNING
On April 9, 2015, at the Amazon Summit in San
Francisco, Amazon Web Services launched its
machine learning service. For some time, Amazon
has been looking for ways to make efficient use of
its EC2 cloud. Amazon claims to have added 516
features in 2014, with data transfers in its storage
service increasing by 102% compared to the previous
year and computation activity rising by 93% over the
same period. The integration of the machine learning
platform has been the most interesting development
for the EC2 cloud so far.
Amazon Machine Learning is a managed service
which builds predictive models from a user’s
historical data. It can be used to predict
customer turnover and buying patterns, and also to
find issues in customer support. Amazon.com has
capitalized on its machine learning abilities by
developing “Recommender systems”, which provide
recommendations based on customer purchases or
interests. According to Amazon, the same machine
learning techniques that are offered in Amazon
Machine Learning help it make 50 billion
predictions per week on Amazon.com.
Amazon Machine Learning makes use of industry
standard logistic regression algorithms to generate
models. It also provides security measures in the
form of encryption and secure (SSL) connections to
safeguard client data. One true upside to
Amazon ML is that it can train models on datasets of
up to 100GB, even with minor discrepancies in the
data, and generate the required model. Training
fails if more than 10,000 records, or more than 10%
of the records, in the dataset are incomplete or
missing. Amazon ML includes powerful model
evaluation features, which help test the bias and
accuracy of the generated models.
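As an illustration of the logistic regression approach mentioned above, the following is a minimal sketch of a binary logistic regression trained with stochastic gradient descent. It is not Amazon's implementation; the function names and the `passes`, `l2`, and `learning_rate` parameters are assumptions chosen for this example.

```python
import math
import random


def train_logistic_regression(rows, labels, passes=10, l2=0.01,
                              learning_rate=0.1, seed=0):
    """Fit a binary logistic regression with stochastic gradient descent.

    passes: number of full sweeps over the data (illustrative name).
    l2: strength of L2 regularization on the weights (illustrative name).
    """
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)  # visit records in a fresh random order each pass
        for i in order:
            z = bias + sum(w * x for w, x in zip(weights, rows[i]))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - labels[i]
            # Gradient step on the log loss plus the L2 penalty.
            weights = [w - learning_rate * (error * x + l2 * w)
                       for w, x in zip(weights, rows[i])]
            bias -= learning_rate * error
    return weights, bias


def predict_score(weights, bias, row):
    """Return the model's probability that the record belongs to class 1."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: small feature values are class 0, large ones are class 1.
rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic_regression(rows, labels, passes=200)
```

After training on the toy data, `predict_score(weights, bias, [0.0])` falls below 0.5 and `predict_score(weights, bias, [3.0])` rises above it, which is the behaviour a hosted service would expose as a prediction score.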
Amazon Machine Learning also provides several
parameters to fine tune the learning process: (a)
target size of the model, (b) the number of passes to
be made over the data, and (c) the type and amount
of regularization applied to the model. Additionally,
it lets the user adjust the cut-off score used to
interpret binary classification models, enabling an
informed trade-off between the different kinds of
mistakes that a trained model can make. Once the
model is ready, there are two ways of retrieving the
predictions: (a) batch API or (b) real-time API. The
batch API is used to make predictions for large
datasets; it works offline and returns all
predictions together. The real-time API is used for
instantaneous prediction of individual input data
records.
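The trade-off controlled by the cut-off score can be made concrete with a small sketch. The scores and labels below are invented for illustration, and `confusion_counts` is a hypothetical helper, not part of any Amazon API.

```python
def confusion_counts(scores, labels, cutoff):
    """Count outcomes when records scoring at or above cutoff are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1  # true positive
        elif predicted == 1 and label == 0:
            fp += 1  # false positive
        elif predicted == 0 and label == 0:
            tn += 1  # true negative
        else:
            fn += 1  # false negative
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Invented model scores and true labels for six records.
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]

low = confusion_counts(scores, labels, cutoff=0.3)
high = confusion_counts(scores, labels, cutoff=0.7)
print(low)   # {'tp': 3, 'fp': 2, 'tn': 1, 'fn': 0}
print(high)  # {'tp': 2, 'fp': 0, 'tn': 3, 'fn': 1}
```

A low cut-off catches every positive record at the cost of two false positives, while a high cut-off removes the false positives but misses one positive record. Which mistake is cheaper depends on the application.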
Amazon Machine Learning is a developer friendly
platform which does not require much prior machine
learning knowledge from the user. The only user
prerequisites are: (a) a clear idea of your
problems and targets, and (b) the maximum amount
of relevant, true data with minimum assumptions.
The first one, although it may seem trivial, is crucial
for judging whether Amazon Machine Learning
fits your scenario, because not all
problems can be solved by it. The second one is
required to avoid under-fitting or overfitting of the
model; selecting the correct features for your
requirements is therefore important for obtaining
the desired prediction results. It is also
recommended to plan an ‘evaluation phase’ in which
the data is split into two segments: a train dataset,
to train the model, and a test dataset, to test the
model generated in training.
Amazon Machine Learning can be helpful even
during this evaluation phase by analyzing the data
source and better understanding the correlations
within the data through statistics and visualizations.
This can help you choose the right features as inputs
for your model. Moreover, the beauty of Amazon
ML is that it trains and tests many complex models
on its own, even altering various parameters by itself
to finally come up with the “best” predictive model
for your problem. As long as a valid data source is
provided to it, it can be used to solve most low-level
problems.
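The train/test split recommended for the evaluation phase can be sketched in a few lines. This is a generic illustration and not the mechanism Amazon ML uses internally; the function name, the 30% test fraction, and the fixed seed are assumptions.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle the records and hold out a fraction of them for testing."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))
train, test = split_train_test(records)
```

With ten records and a test fraction of 0.3, seven records are used for training and three are held out for testing. Shuffling first avoids a biased split when the source data is sorted.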
Amazon Machine Learning is highly scalable and
can generate billions of predictions in real time with
high throughput. One can start small and scale up as
the application grows without any setup cost,
because pricing is pay-as-you-go.
III. MICROSOFT AZURE MACHINE LEARNING
Microsoft released its new cloud based machine
learning platform on February 18, 2015, to
empower companies to use the power of the cloud
to build applications and APIs as well as predict
future events. Its beta version was released on June
16, 2014, after which Microsoft introduced several
features, such as support for Python, before the
official release. Moreover, the platform also supports
R, Hadoop, and Spark, giving it an edge for
processing big data.
Microsoft Azure ML gives developers the ability
to create predictive analytics models and deploy
them as cloud web services. It lets them integrate
and easily access a variety of data sources, apply
popular ML algorithms, evaluate models extensively,
and follow end-to-end, repeatable workflows for
building predictive models.
Fig. 1. Workflow depicting the iterative nature of predictive
model generation in Microsoft Azure ML. (source: Microsoft
Azure Essentials: Azure Machine Learning (ISBN
9780735698178), by Jeff Barnes)
Microsoft Azure ML is based on an iterative
process of building models. It has the ability to
generate “experimental” models for the data,
determine their accuracy, fail fast, and move on to
developing the next model. This loop continues until
it has produced the best predictive model for the data.
Fig. 1 illustrates the detailed steps involved in Azure
ML for achieving the desired predictive model.
Microsoft Azure ML also helps you clean the data,
compile it, and analyze the training and testing data
sets for discrepancies. Utilizing the preprocessed
data and its attributes, it generates a model using its
numerous built-in ML algorithms. It then evaluates
the credibility of this model by measuring how
accurately it predicts the outcome. If the model does
not reach the required minimum accuracy, Azure ML
reruns its various built-in algorithms
and re-ensembles them until it attains the desired
model with the best possible confidence
factor/accuracy.
This feedback mechanism forms the backbone of the
Azure ML model generation and refining process.
After refining the model for better prediction, it can
then be deployed as a scalable web service which
provides the predictive models with the flexibility of
cloud platform integration.
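The feedback loop described above (build a candidate, measure its accuracy, fail fast, and move on) can be sketched as follows. This is a simplified illustration and not Azure ML code; the candidate models are hypothetical one-line rules, and the 0.9 accuracy target is an assumption.

```python
def accuracy(model, rows, labels):
    """Fraction of records for which the model predicts the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def select_model(candidates, rows, labels, target_accuracy=0.9):
    """Try candidates in order and stop once one meets the accuracy target."""
    best_name, best_model, best_score = None, None, -1.0
    for name, model in candidates:
        score = accuracy(model, rows, labels)
        if score > best_score:
            best_name, best_model, best_score = name, model, score
        if best_score >= target_accuracy:
            break  # good enough: no need to evaluate further candidates
    return best_name, best_model, best_score


# Hypothetical candidates: each is a simple rule on one numeric feature.
candidates = [
    ("always_zero", lambda row: 0),
    ("threshold_at_5", lambda row: 1 if row[0] > 5 else 0),
    ("threshold_at_2", lambda row: 1 if row[0] > 2 else 0),
]

# Held-out test data, kept separate from whatever was used for training.
test_rows = [[1], [2], [3], [4], [6], [7]]
test_labels = [0, 0, 1, 1, 1, 1]

name, model, score = select_model(candidates, test_rows, test_labels)
print(name, score)  # threshold_at_2 1.0
```

The first two candidates fall short of the target and are discarded; the third classifies every test record correctly, so the loop stops and returns it.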
The underlying algorithms in Azure ML are
divided into three groups based on their utility:
(a) classification algorithms, which learn to
assign records to class labels and are then used
to predict one or more labels for new records based
on the attributes of the dataset, (b) regression
algorithms, which are used to predict continuous
values for the target variable, including
continuous values in a time series, and
(c) clustering algorithms, which are
used to group records together based on the
values of their attributes.
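To make the clustering category concrete, the sketch below runs k-means on a single numeric attribute. K-means is used here only as a familiar example of a clustering algorithm; the source does not state which algorithms Azure ML uses, and the function name and starting centers are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numeric values around the given starting centers (k-means)."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to the nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


values = [1, 2, 3, 10, 11, 12]
centers = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)  # [2.0, 11.0]
```

No labels are supplied: the two groups emerge purely from the values of the attribute, which is what distinguishes clustering from classification and regression.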
Azure Machine Learning uses a variety of
underlying ML algorithms which can be broadly
classified into two categories: supervised and
unsupervised learning. With ‘supervised learning’,
Azure ML trains a model on datasets with
known inputs and outputs and then uses that model
to predict the unknown output
values in testing datasets. With
‘unsupervised learning’, Azure ML
observes natural patterns in data and
develops models based on those
similarities.
Microsoft Azure ML Studio, which is the primary
tool used to develop the predictive analytic solutions
and models, provides a highly interactive workspace
to build, test, iterate, and deploy models with ease.
The entire environment is cloud based and self-
sufficient, which makes it accessible through
virtually any web browser from any part of the world.
Microsoft claims to have built Azure Machine
Learning on the technology
incorporated in Xbox and Bing. Moreover, the
acquisition of ‘Revolution Analytics’ on April 6,
2015 could prove to be game changing, as it allows
Microsoft to integrate the power of the R
environment into the Azure ML platform.
IV. GOOGLE PREDICTION API
Google launched their cloud based prediction API
platform back in 2011, well before the other cloud
based ML platforms, which were released fairly
recently.
Google’s aim for the platform was not limited to
developing predictive models; it also provided the
facility to develop Smart Apps, which can offer
relevant suggestions to their users as and
when required. The focus has been to extend the
platform’s utility to apps worldwide and to generate
predictive models automatically from an app’s
stream of data, which the app can then use to provide
suggestions to its users. This can be done in three
simple steps: (i) Upload: uploading your dataset to
Google Storage (cloud), since prediction is done after
analyzing the values of attributes in the historical
data, (ii) Train: building the model from your data
by applying various machine learning algorithms and
suggesting the best possible model obtained as a
result, and (iii) Predict: generating new predictions
based on the developed model to provide
meaningful suggestions to the app users.
Google Prediction API has been termed a
“Black Box” by several critics, because one gets no
control over or visibility into the underlying complex
mechanisms and algorithms running to provide the
best possible predictive model. Google
Prediction API can be applied to three types of
problems: (a) Regression: which requires a
continuous output value as the predicted value, (b)
Classification: when the output value can take only
a specific set of values/labels, and (c) Binary
Classification: in which the output can take either of
two possible values (for instance, True and False).
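The distinction between the three problem types can be sketched as a simple check on the target column. This is a heuristic written for illustration only; it is not the rule Google Prediction API applies, and `infer_problem_type` is a hypothetical name.

```python
def infer_problem_type(targets):
    """Guess the task from a column of target strings (illustrative heuristic)."""
    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    if all(is_number(t) for t in targets):
        return "regression"              # continuous numeric output
    if len(set(targets)) == 2:
        return "binary classification"   # exactly two possible labels
    return "classification"              # a fixed set of labels


print(infer_problem_type(["1.5", "2.0", "3.7"]))        # regression
print(infer_problem_type(["True", "False", "True"]))    # binary classification
print(infer_problem_type(["spam", "ham", "promo"]))     # classification
```

A real service has to resolve ambiguous cases as well, for example numeric class codes such as "0" and "1", which this sketch would treat as regression.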
Google Prediction API makes things easier by
placing no restrictions on the type of input data. The
only requirement is for the dataset to be formatted in
the right manner, such that the first column
represents the target variable and each row acts as an
input vector of attributes. Other aspects such as
feature selection, normalization, and data type
detection are all handled by Google Prediction API.
An upside to this platform is that it allows hassle-free
updates to the generated model without going
through the training phase again. In addition, Google
Prediction API can easily be used with all other
Google services. For security reasons, the data
provided to Google Cloud Storage is replicated to
multiple ‘ambiguous’ data centers as well as
replicated within the data center.
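The required layout, with the target variable in the first column and the input attributes in the remaining columns of each row, can be illustrated with a short parser. The sample rows are invented and `parse_training_csv` is a hypothetical helper; only the column convention comes from the description above.

```python
import csv
import io


def parse_training_csv(text):
    """Split each CSV row into (target, features); the first column is the target."""
    examples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        examples.append((row[0], row[1:]))
    return examples


# Invented training data: a label followed by one text attribute.
sample = (
    '"positive","great product, works well"\n'
    '"negative","stopped working"\n'
)
examples = parse_training_csv(sample)
print(examples[0])  # ('positive', ['great product, works well'])
```

Using the `csv` module rather than splitting on commas keeps quoted attributes that contain commas intact.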
A big drawback of Google Prediction API is that it
only supports ‘supervised learning’. ‘Unsupervised
learning’ is not yet supported by Google Prediction
API. Another hindrance to using Google
Prediction API is that it supports only Python scripts
via an API call. Therefore, for non-coders, it is
advisable to use API Explorer for Web interfacing.
Also, it does not support all file types as data sources.
A ‘.csv’ file of up to 2.5GB is accepted,
along with a few other file loading options.
Prediction API can be useful for a wide variety of
applications such as gene expression, fraud
detection, language identification, customer habit
analysis, sentiment analysis, and other such
applications. Overall, Google Prediction API is a
useful platform for real-time predictions based on
‘supervised learning’.
V. CONCLUSION
Creating machine learning algorithms and testing
them iteratively in order to devise the best predictive
model is a costly and tedious affair. The new era of
technology has simplified this tedious task with the
introduction of cloud based machine learning
platforms for various applications. This makes the
process of developing complex machine learning
models simple, even for people without statistical or
data mining backgrounds. With a variety of
companies offering their platforms for diverse
purposes, each of them aims to dominate the
potential customer base for these platforms.
Amazon Machine Learning, an offering of the
fast-growing Amazon Web Services, is looking to
establish deeper roots in the market by providing
efficient predictive models for both supervised and
unsupervised learning scenarios, which could prove
to be a huge positive in expanding its potential
customer base.
The flexibility provided by the Microsoft Azure
ML platform in terms of types of dataset and
interface is unparalleled. Moreover, their recent
acquisition of Revolution Analytics may prove to be
a huge advantage for them in the cloud based
machine learning platform market.
Google Prediction API, which has been in the
market for a significant time now, has been aiming
towards the development of Smart Apps with the
help of their platform. This will enable not just
companies, but also individuals to realize the power
of Google’s Prediction API. Moreover, easy
integration of Google’s Prediction API with
Google’s other APIs will diversify its utility.
With the advent of IoT, there is going to be a
massive increase in the need for cloud based machine
learning platforms to analyze the
enormous stream of data that will be generated
from devices. As technology evolves, there will be a
rising need for not just predictive, but prescriptive
analytics as well. There is no doubt that smart
machines are going to play a significant role in the
way businesses develop in the future.
REFERENCES
[1] Introduction to Decision Trees - J.R. Quinlan
[2] http://radar.oreilly.com/2015/05/on-the-evolution-of-machine-learning.html
[3] http://www.informationweek.com/cloud/infrastructure-as-a-service/amazon-launches-machine-learning-as-a-service/d/d-id/1319868
[4] http://cloudacademy.com/blog/aws-machine-learning/
[5] https://aws.amazon.com/machine-learning/faqs/
[6] Microsoft Azure Essentials: Azure Machine Learning, Jeff Barnes, published April 2015, 237 pages
[7] http://blogs.technet.com/b/machinelearning/archive/2015/04/06/microsoft-closes-acquisition-of-revolution-analytics.aspx
[8] http://techcrunch.com/2015/02/18/microsoft-officially-launches-azure-machine-learning-big-data-platform/ - TechCrunch
[9] http://techcrunch.com/2014/06/16/microsoft-announces-azure-ml-cloud-based-machine-learning-platform-that-can-predict-future-events/ - TechCrunch
[10] http://www.kdnuggets.com/2015/04/cloud-machine-learning-amazon-ibm-watson-microsoft-azure.html - KD Nuggets
[11] http://cloudacademy.com/blog/google-prediction-api/
[12] https://youtu.be/FJDP_0Mrb-w
[13] https://cloud.google.com/prediction/docs/faq
[14] http://www.v3.co.uk/v3-uk/feature/2404892/the-rise-of-machine-learning-microsoft-aws-and-ibm-leading-the-era-of-ai
[15] http://www.kdnuggets.com/2014/12/ibm-watson-analytics-microsoft-azure-machine-learning-p1.html