1. Python and Data Analytics
• Understand the problem by understanding the data
• Predictive model building: balancing performance, complexity,
and big data
4. Predictive model building
The process of building a predictive model is called
training.
Attributes, the variables used to make predictions, are also known as:
◦ Predictors.
◦ Features
◦ Independent variables
◦ Input
Labels, the values being predicted, are also known as:
◦ Outcomes
◦ Targets
◦ Dependent variables
◦ Responses
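The split between attributes and labels can be sketched in a few lines. This is a minimal illustration, not the deck's own code: it assumes a CSV layout in which the last column is the label, uses three rows from the well-known iris data, and the helper name `split_features_labels` is hypothetical.

```python
import csv
import io

# Three rows in the usual iris CSV layout:
# four numeric attributes followed by the class label.
RAW = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def split_features_labels(text):
    """Return (X, y): X holds the attribute rows, y the matching labels."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        X.append([float(value) for value in row[:-1]])  # predictors / features
        y.append(row[-1])                               # outcome / target
    return X, y


X, y = split_features_labels(RAW)
print("attributes of first row:", X[0])
print("label of first row:", y[0])
```

Training then means fitting a model that maps each row of `X` to the matching entry of `y`.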
5. A machine learning project may not be
linear, but it has a number of well-known
steps:
Define Problem.
Prepare Data.
Evaluate Algorithms.
Improve Results.
Present Results.
6. The iris dataset has the following structure
Attributes are numeric so you have to figure out
how to load and handle data.
It is a classification problem, allowing you to
practice with perhaps an easier type of supervised
learning algorithm.
It is a multi-class classification problem (multinomial)
that may require some specialized handling.
It only has 4 attributes and 150 rows, meaning it is
small and easily fits into memory.
All of the numeric attributes are in the same units
and the same scale, not requiring any special scaling
or transforms to get started.
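The structure described above can be checked directly. The sketch below assumes scikit-learn is installed and uses its bundled copy of the iris data via `load_iris`; the deck itself may load the data from a CSV file instead.

```python
from sklearn.datasets import load_iris

# Load the copy of the iris data that ships with scikit-learn (an assumption;
# the same checks apply to a CSV copy loaded another way).
iris = load_iris()
X, y = iris.data, iris.target

print("rows, attributes:", X.shape)
print("attribute names:", list(iris.feature_names))
print("class names:", list(iris.target_names))
print("min per attribute:", X.min(axis=0))
print("max per attribute:", X.max(axis=0))
```

The attribute names all carry the same unit, and the minimum and maximum values show that the four columns sit on a similar scale.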
7. Machine Learning in Python: Step-By-Step
Installing the Python and SciPy
platform.
Loading the dataset.
Summarizing the dataset.
Visualizing the dataset.
Evaluating some algorithms.
Making some predictions.
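The steps above can be sketched end to end as follows. This is only one possible outline under stated assumptions: scikit-learn is installed, the data comes from its bundled `load_iris`, a k-nearest-neighbours classifier stands in for "some algorithms", and the split size and `random_state` are arbitrary choices. The visualization step is left out to keep the sketch short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Loading the dataset (bundled copy; an assumption for this sketch).
iris = load_iris()
X, y = iris.data, iris.target

# Summarizing the dataset.
print("shape:", X.shape)
print("mean of each attribute:", X.mean(axis=0))

# Hold back 20% of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# Evaluating one algorithm (k-nearest neighbours chosen only as an example).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Making some predictions on the held-back rows.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("accuracy on held-back rows:", accuracy)
```

In practice several algorithms would be compared in the evaluation step before one is picked for the final predictions.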
8. Basic libraries in Python
NumPy's array type augments the Python language
with an efficient data structure useful for numerical
work, e.g., manipulating matrices. NumPy also
provides basic numerical routines, such as tools for
finding eigenvectors.
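As a small illustration of both points, the sketch below builds a matrix as a NumPy array and finds its eigenvalues and eigenvectors with `np.linalg.eig`. The matrix values are arbitrary and chosen only for the example.

```python
import numpy as np

# A small symmetric matrix stored as a NumPy array (values are arbitrary).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check the defining property A v = lambda v for every pair.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(eigenvalues[i], np.allclose(A @ v, eigenvalues[i] * v))
```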
SciPy contains additional routines needed in
scientific work: for example, routines for computing
integrals numerically, solving differential equations,
optimization, and sparse matrices.
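Two of these routines can be tried in a few lines. The sketch assumes SciPy is installed; the integrand and the function being minimized are arbitrary examples, not taken from the deck.

```python
import math

from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, estimated_error = integrate.quad(math.sin, 0.0, math.pi)
print("integral:", area)

# Optimization: find the x that minimizes (x - 3)^2, which is x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
print("minimizer:", result.x)
```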
The matplotlib module produces high-quality plots.
With it you can turn your data or your models into
figures for presentations or articles. There is no need
to do the numerical work in one program, save the data,
and plot it with another program.
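A minimal sketch of computing and plotting in the same script is shown below. It assumes matplotlib and NumPy are installed, selects the non-interactive `Agg` backend so that it runs without a display, and writes the PNG to an in-memory buffer; passing a file name such as `"sine.png"` to `savefig` would write a file instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

# Numerical work: sample one period of a sine curve.
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)

# Plotting, in the same program.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render the figure as PNG into memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print("PNG size in bytes:", len(png_bytes))
```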
9. The Pandas module builds on many other modules and adds
some unique features of its own, which makes it a very
powerful module.
Pandas is great for data manipulation, data analysis, and data
visualization.
The Pandas module uses its own objects to allow data analysis
at fairly high performance compared with typical
Python procedures. With it, we can easily read from and write
to CSV files, or even databases.
From there, we can manipulate the data by columns, create
new columns, and even base the new columns on other
column data.
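The read, derive, and write cycle described above might look like this. The sketch assumes pandas is installed; the column names and the `sepal_ratio` column are made up for illustration, and in-memory text buffers stand in for real CSV files.

```python
import io

import pandas as pd

# A tiny CSV held in memory; a file path would work the same way.
csv_text = "sepal_length,sepal_width\n5.1,3.5\n7.0,3.2\n6.3,3.3\n"
df = pd.read_csv(io.StringIO(csv_text))

# Create a new column based on other column data.
df["sepal_ratio"] = df["sepal_length"] / df["sepal_width"]
print(df)

# Write the result back out as CSV (again to memory rather than to disk).
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```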
The scikit-learn library offers:
Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable
10. NumPy: Base n-dimensional array
package
SciPy: Fundamental library for scientific
computing
Matplotlib: Comprehensive 2D/3D
plotting
IPython: Enhanced interactive console
Sympy: Symbolic mathematics
Pandas: Data structures and analysis
11. 1. Downloading, Installing and Starting
Python SciPy
1.1 Install SciPy Libraries
There are 5 key libraries that you will need to
install. Below is a list of the Python SciPy
libraries required for this tutorial:
scipy
numpy
matplotlib
pandas
sklearn
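Once the libraries are installed, a short script can confirm that each one imports and report its version. This is a sketch rather than an official check, and the helper name `check_libraries` is hypothetical. Note that `sklearn` is the import name; the package is installed under the name `scikit-learn`.

```python
import importlib

# Import names of the five libraries listed above.
REQUIRED = ["scipy", "numpy", "matplotlib", "pandas", "sklearn"]


def check_libraries(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = check_libraries(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Any library reported as not installed needs to be installed before continuing.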