1. Presented by: Abdul Ahad Abro
Data Science & Predictive Analytics
Computer Engineering Department, Ege University, Turkey
Presentations 2
May 16, 2017
2. Agenda
Data Science
Predictive Analytics
Machine Learning
Machine Learning Algorithms
Regression & Classification
Microsoft Azure Machine Learning Studio
Regression with Microsoft Excel
Academic Studies / Articles / Publications
3. What is Data Science?
Data science, also known as data-driven science, is
an interdisciplinary field about scientific methods,
processes and systems to extract knowledge from
data in various forms, either structured or
unstructured, similar to Knowledge Discovery in
Databases (KDD) [4].
Or
Data science is an umbrella term that covers many other fields, such as machine learning, data
mining, big data, statistics, data visualization and data analytics.
4. Predictive Analytics
Predictive analytics is a branch of advanced analytics that is
used to make predictions about unknown future events. It
uses techniques from data mining, statistics,
modeling, machine learning, and artificial intelligence to analyze
current data and make predictions about the future. It brings
together management, information technology, and business
process modeling to make those predictions. The patterns found in
historical and transactional data can be used to identify risks and
opportunities for the future.
6. Machine Learning
A branch of artificial intelligence concerned with the design and development of
algorithms that allow computers to evolve behaviors based on empirical data.
The construction and study of systems that can learn from data.
7. Machine Learning Algorithms
Supervised learning --- Predicting the future.
Learn from past examples to predict the future.
Unsupervised learning --- Understanding the past.
Making sense of data.
Learning the structure of data.
Compressing data for consumption.
Semi-supervised learning --- A mix of supervised and unsupervised
learning.
Reinforcement learning --- Allows the machine or software agent to
learn its behavior based on feedback from the environment. This
behavior can be learnt once and for all, or keep adapting as time
goes by.
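The difference between supervised and unsupervised learning can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, and the choice of scikit-learn's LinearRegression and KMeans is ours, not something the slides prescribe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Synthetic, illustrative data: two well-separated groups of 1-D points
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.5, 50),
                    rng.normal(10.0, 0.5, 50)]).reshape(-1, 1)

# Supervised: targets are known for past examples, so the model learns X -> y
y = 3.0 * X.ravel() + 2.0
supervised = LinearRegression().fit(X, y)
prediction = supervised.predict([[4.0]])  # predict the target of an unseen input

# Unsupervised: no targets are given, the model looks for structure in X alone
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = unsupervised.labels_

print("Predicted target for x=4:", prediction[0])
print("Cluster sizes:", np.bincount(cluster_ids))
```

The supervised model needs the targets `y` to learn from, while the clustering step only ever sees `X`.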
9. Regression: In supervised learning, the target variable in regression must be continuous.
Categorical target variables are modelled in classification.
Regression places less or even no emphasis on using probability to describe the random
variation between the predictor and the target.
Regression is used to predict continuous values.
Classification: Classification is used to predict which class a data point belongs to (a discrete value).
The training data (observations, measurements, etc.) are accompanied by labels
indicating the class of each observation.
New data is classified based on the training set.
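As a minimal sketch of the distinction, the same predictor can feed a regression model (continuous target) or a classification model (discrete target). The numbers and the variable names `engine_size`, `price` and `consumption_class` are hypothetical values invented for illustration; they are not taken from the Automobile data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical predictor: one feature per sample
engine_size = np.array([[1.0], [1.2], [1.6], [2.0], [2.5], [3.0], [3.5], [4.0]])

# Regression: the target is a continuous value
price = 5000.0 * engine_size.ravel() + 3000.0
reg = LinearRegression().fit(engine_size, price)
predicted_price = reg.predict([[2.2]])[0]

# Classification: the target is a discrete class label
consumption_class = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = low, 1 = high (made up)
clf = LogisticRegression().fit(engine_size, consumption_class)
predicted_class = clf.predict([[3.8]])[0]

print("Predicted price:", predicted_price)
print("Predicted class:", predicted_class)
```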
10. Abstract
The use of the term Data Science is becoming increasingly common. Machine
learning, specifically supervised learning with classification and regression, is carried out in
Microsoft Azure Machine Learning Studio, and the results are compared with classification
and regression in Microsoft Excel for sophistication and accuracy. In this
process, the classification and linear regression results are considered approximately the same
in both cases, as the same data set is used.
12. Introduction
Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights
from data in various forms. Data Science is not just Machine Learning. Data science is a concept to
unify statistics, data analysis and their related methods in order to understand and analyze actual
phenomena with data.
15. Methodology
Supervised machine learning methods for regression and classification have been designed mostly for
data types that lie in vector spaces, i.e. response (output) and/or predictor (input) variables are
often arranged into vectors of predefined dimensionality [2]. This methodology uses machine
learning, specifically classification and linear regression. A data set is required
to perform the classification and linear regression methods.
Many data sets are available from online resources; this research uses a data set obtained from
the sample data sets of Microsoft Azure Machine Learning Studio, chosen as sophisticated, accurate and authentic. The
data set, named Automobile, is multivariate and includes 205 instances with 26 attributes.
It covers various automobile specifications such as fuel type,
aspiration, number of doors, engine-location, length, width, height, horsepower, city-mpg,
highway-mpg, peak-rpm, price and so on.
16. Methodology (continued)
Linear regression and classification are carried out in Microsoft Azure Machine Learning
Studio over the dependent and independent variables (attributes) of the data set. The studio provides
built-in saved data sets, trained models, machine learning algorithms, and score model and evaluation model
components that are ready to use. In addition, to verify and compare the results of Microsoft Azure
Machine Learning Studio, Microsoft Excel is used for linear regression on the same
data set, and the final results are compared.
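One way to see why two tools can agree on the same data set: if both fit the line by ordinary least squares (an assumption here, since the slides do not state which solver each tool was configured with), the coefficients are determined by the data alone. The sketch below compares a direct least-squares solve in NumPy with scikit-learn's LinearRegression on synthetic data; the variable names `horsepower` and `price` are borrowed from the attribute list, but the values are randomly generated, not the real Automobile data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not the Automobile data set)
rng = np.random.default_rng(42)
horsepower = rng.uniform(50, 250, size=100)
price = 120.0 * horsepower + 2000.0 + rng.normal(0, 500, size=100)

# Route 1: solve the least-squares problem directly on [1, x]
A = np.column_stack([np.ones_like(horsepower), horsepower])
intercept_ls, slope_ls = np.linalg.lstsq(A, price, rcond=None)[0]

# Route 2: a library implementation of ordinary least squares
model = LinearRegression().fit(horsepower.reshape(-1, 1), price)

print("Direct solve :", intercept_ls, slope_ls)
print("scikit-learn :", model.intercept_, model.coef_[0])
```

Both routes minimise the same sum of squared residuals, so any difference between them is numerical round-off only.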
18. Academic Studies
Regression and classification using extreme learning machine based
on L1-norm and L2-norm. (Least Absolute Deviations and Least Squares)
Extreme learning machine (ELM) is a very simple machine learning algorithm and it can achieve a good
generalization performance with extremely fast speed. Therefore it has practical significance for data analysis in
real-world applications. In the information age, there has been a growing interest in the study of data analysis
techniques. Techniques of data analysis can extract previously unknown, hidden, but potentially useful
information and knowledge from original data, which is helpful to provide suggestions or decisions for future
actions.
Data analysis plays a huge guidance role for making future plans in practical applications. In this paper, a novel
algorithm called L1–L2-ELM was proposed as an effective technology in data analysis. It can deal with multiple-
output regression and multiple-class classification problems in a unified framework.
Xiong Luo, Xiaohui Chang, Xiaojuan Ban
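To give a feel for why a basic extreme learning machine is simple and fast, here is a minimal sketch: the hidden-layer weights are drawn at random and never trained, and only the output weights are fitted, here by ordinary least squares. This is the plain ELM idea, not the L1–L2-ELM proposed in the paper; the function names, the tanh activation, the hidden-layer size and the sine-curve data are all assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (not trained)
    H = np.tanh(X @ W + b)                       # hidden-layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # only these weights are learned
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Illustrative regression task: approximate a sine curve
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
train_mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
print("Training MSE:", train_mse)
```

Training reduces to a single linear solve, which is where the speed comes from.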
19. Academic Studies
Minimal Learning Machine: A novel supervised distance-based approach for
Regression and classification.
Supervised machine learning methods for regression and classification have been designed mostly for data types
that lie in vector spaces, i.e. response (i.e. output) and/or predictor (i.e. input) variables are often arranged into
vectors of predefined dimensionality. There are other types of data, however, such as graphs, sequences, shapes,
images, trees and covariance matrices, which are less amenable to being treated within standard
regression/classification frameworks. These data types usually do not lie in a natural vector space, but rather in a
metric space.
For regression tasks, there are some prior works in which the response and/or the predictor variables are
expressed as distance (i.e. dissimilarity) matrices. A comprehensive set of computer experiments illustrates that
the proposed method achieves accuracies that are comparable to more traditional machine learning methods for
regression and classification thus offering a computationally valid alternative to such approaches.
Amauri Holanda de Souza Júnior, Francesco Corona, Guilherme A. Barreto, Yoan Miche, Amaury Lendasse
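A rough sketch of the distance-based idea, under several simplifying assumptions that are ours and not necessarily the authors': inputs and outputs are one-dimensional, every training point serves as a reference point, the map between input-space and output-space distances is fitted by linear least squares, and the output is recovered with a simple linearised multilateration step. The function names `mlm_fit` and `mlm_predict` are hypothetical.

```python
import numpy as np

def mlm_fit(x, y):
    """Learn a linear map from input-space distances to output-space distances."""
    Dx = np.abs(x[:, None] - x[None, :])  # distances between inputs and references
    Dy = np.abs(y[:, None] - y[None, :])  # distances between outputs and references
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return B

def mlm_predict(x_new, x_ref, y_ref, B):
    """Estimate output distances, then locate the output that best matches them."""
    d_out = np.abs(x_new - x_ref) @ B     # estimated distances to reference outputs
    # From (y - t_k)^2 = d_k^2, subtracting the k = 0 equation gives a_k * y = c_k
    t0, d0 = y_ref[0], d_out[0]
    a = -2.0 * (y_ref[1:] - t0)
    c = d_out[1:] ** 2 - d0 ** 2 - y_ref[1:] ** 2 + t0 ** 2
    return float(a @ c / (a @ a))         # least-squares solution for scalar y

# Illustrative data: a smooth curve sampled on a grid
x_train = np.linspace(0.0, 3.0, 31)
y_train = x_train ** 2
B = mlm_fit(x_train, y_train)

train_pred = np.array([mlm_predict(v, x_train, y_train, B) for v in x_train])
new_pred = mlm_predict(1.55, x_train, y_train, B)
print("Prediction at x=1.55:", new_pred)
```

Only distances enter the model, which is why the same scheme can in principle be applied to data that live in a metric space rather than a vector space.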
20. Python Code
# Required packages
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Read the CSV file and return the feature rows and the target values
def get_data(file_name):
    data = pd.read_csv(file_name)
    x_parameter = []
    y_parameter = []
    for single_square_feet, single_price_value in zip(data['square_feet'], data['price']):
        # scikit-learn expects each sample as a list of features
        x_parameter.append([float(single_square_feet)])
        y_parameter.append(float(single_price_value))
    return x_parameter, y_parameter

x, y = get_data('input_data.csv')
print(x)
print(y)
21. Python Code
# Fit a linear model and predict the target for a new input value
def linear_model_main(X_parameters, Y_parameters, predict_value):
    # Create linear regression object
    regr = linear_model.LinearRegression()
    regr.fit(X_parameters, Y_parameters)
    # predict() expects a 2-D input: one row per sample, one column per feature
    predict_outcome = regr.predict([[predict_value]])
    predictions = {}
    predictions['intercept'] = regr.intercept_
    predictions['coefficient'] = regr.coef_
    predictions['predicted_value'] = predict_outcome
    return predictions

x, y = get_data('input_data.csv')
predict_value = 700
result = linear_model_main(x, y, predict_value)
print("Intercept value:", result['intercept'])
print("Coefficient:", result['coefficient'])
print("Predicted value:", result['predicted_value'])

# Function to show the results of the linear fit model
def show_linear_line(X_parameters, Y_parameters):
    # Create linear regression object
    regr = linear_model.LinearRegression()
    regr.fit(X_parameters, Y_parameters)
    # Blue points are the data, the red line is the fitted model
    plt.scatter(X_parameters, Y_parameters, color='blue')
    plt.plot(X_parameters, regr.predict(X_parameters), color='red', linewidth=4)
    plt.xticks(())
    plt.yticks(())
    plt.show()

show_linear_line(x, y)
22. References
[1] Xiong Luo, Xiaohui Chang, Xiaojuan Ban (2015). Regression and classification using extreme
learning machine based on L1-norm and L2-norm. (Least Absolute Deviations and Least Squares).
[2] Amauri Holanda de Souza Júnior, Francesco Corona, Guilherme A. Barreto, Yoan Miche, Amaury
Lendasse (2015). Minimal Learning Machine: A novel supervised distance-based approach for
regression and classification.
[3] Sumit Mund (2015). Microsoft Azure Machine Learning.
[4] https://en.wikipedia.org/wiki/Data_science