Data Science Tutorial
What’s in it for you?
What is Data Science? Who is a Data Scientist?
What are the skills of a Data Scientist?
What does a Data Scientist do?
Data Acquisition
Data Preparation
Data Mining with Tableau
Model Building & Testing
Model Maintenance
Summary
What is Data Science?
Data Science is the area of study that involves extracting knowledge from all the data you can gather.
Translating a business problem into a research project, and then translating the results back into a practical solution, requires creativity and a deep understanding of the business domain.
Fraudulent transactions are on the rise on the internet, and the results can be devastating.
Data Science can help detect fraud using advanced machine learning algorithms and prevent large monetary losses.
The people who do all this work are called Data Scientists.
What does a Data Scientist do?
Data Acquisition
Data Preparation
Data Mining
Data Modelling
Model Maintenance
Data Acquisition
Data Acquisition involves:
• Extracting data from multiple source systems (databases, flat files, etc.)
• Integrating and transforming the data into a homogeneous format
• Loading the transformed data into a data warehouse
This process is commonly referred to as ETL (Extract, Transform, Load).
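The three ETL stages can be sketched in a few lines of Python. This is only an illustration, assuming pandas and the standard-library sqlite3 module: the two sources, the column names, and the `customers` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import io
import sqlite3

import pandas as pd

# Extract: two hypothetical sources, a flat file and records from a database.
flat_file = io.StringIO("customer_id,Country\n1,france\n2,SPAIN\n")
from_csv = pd.read_csv(flat_file)
from_db = pd.DataFrame({"CustomerId": [3], "country": ["Germany"]})

# Transform: bring both sources to one homogeneous schema and format.
from_csv = from_csv.rename(columns={"customer_id": "CustomerId", "Country": "country"})
combined = pd.concat([from_csv, from_db], ignore_index=True)
combined["country"] = combined["country"].str.title()

# Load: write the unified table into the stand-in warehouse.
warehouse = sqlite3.connect(":memory:")
combined.to_sql("customers", warehouse, index=False)
loaded = pd.read_sql("SELECT * FROM customers ORDER BY CustomerId", warehouse)
print(loaded)
```

Dedicated ETL tools do the same three steps at scale, with scheduling and monitoring on top.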
Data Acquisition
Various tools are available for the ETL process, such as Talend Studio, DataStage, and Informatica.
Data Preparation
Data Preparation is the most essential part of any Data Science project. It typically consumes around 60% of the time spent on the project.
There are several steps we can take to ensure data is used in the most productive and meaningful manner:
• Data Cleaning
• Data Transformation
• Handling Outliers
• Data Integration
• Data Reduction
Data Preparation
DATA CLEANING:
• Data cleaning is important because bad data may lead to a bad model
• It handles missing values, which may occur for various reasons
• NULL or otherwise unwanted values are also handled at this stage
• Clean data improves business decisions and increases productivity
Data Preparation
DATA TRANSFORMATION:
• Data transformation turns raw data formats into the desired outputs
• It involves normalization of the data, for example:
  • Min-max normalization
  • Z-score normalization
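The two normalization methods named above can be sketched with pandas. Min-max normalization rescales each value x to (x - min) / (max - min), and z-score normalization rescales it to (x - mean) / standard deviation. The credit scores below are hypothetical values used only for illustration.

```python
import pandas as pd

# Hypothetical credit scores used only for illustration.
scores = pd.Series([500.0, 600.0, 700.0, 900.0])

# Min-max normalization: rescale to the [0, 1] range.
min_max = (scores - scores.min()) / (scores.max() - scores.min())

# Z-score normalization: zero mean and unit standard deviation.
# ddof=0 uses the population standard deviation.
z_score = (scores - scores.mean()) / scores.std(ddof=0)

print(min_max.tolist())
print(z_score.round(3).tolist())
```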
Data Preparation
HANDLING OUTLIERS:
• Outliers are observations that are distant from the rest of the data
• They can be useful or harmful, depending on the analysis
• Univariate and multivariate methods are used to detect them
• Outliers can be used for fraud detection
• Plots used: scatter plots, box plots
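One common univariate method is the interquartile range (IQR) rule, which is also what the whiskers of a box plot are usually based on. The sketch below assumes pandas; the transaction amounts are hypothetical, and the 1.5 multiplier is the conventional choice rather than anything prescribed by the slides.

```python
import pandas as pd

# Hypothetical transaction amounts; 5000 is far from the rest.
amounts = pd.Series([20, 25, 22, 30, 27, 24, 5000])

q1, q3 = amounts.quantile(0.25), amounts.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside these bounds are flagged as outliers.
outliers = amounts[(amounts < lower) | (amounts > upper)]
print(outliers.tolist())
```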
Data Preparation
DATA INTEGRITY:
• Data integrity means that data is accurate and reliable
• It matters wherever business decisions are regularly made on the basis of data
Data Preparation
DATA REDUCTION:
• This step reduces large amounts of data to their meaningful parts
• It frees up storage capacity
• It helps reduce costs
• It involves data deduplication, which eliminates redundant data
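Deduplication, the last point above, can be sketched with pandas. The customer records are hypothetical, and only exact duplicate rows are removed here.

```python
import pandas as pd

# Hypothetical customer records with one exact duplicate row.
customers = pd.DataFrame({
    "CustomerId": [1, 2, 2, 3],
    "Geography": ["France", "Spain", "Spain", "Germany"],
})

# Keep the first occurrence of each row and renumber the index.
deduplicated = customers.drop_duplicates().reset_index(drop=True)
print(len(customers), "->", len(deduplicated))
```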
Data Preparation – Data Cleaning
Data cleaning is one of the most common tasks of data preparation which involves ensuring that the data is:
Valid
Consistent
Uniform
Accurate
Data Preparation – Data Cleaning
Let’s have a look at a bank’s data with its customer details.
Data Preparation – Data Cleaning
We can import and read the data using the pandas library for Python.
Let’s look at the Geography column, which has some missing values.
In this case, we might not want to assume the location of the customers.
Data Preparation – Data Cleaning
So, let’s fill the missing values with an empty string.
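The slide's code was an image that did not survive, so here is a minimal sketch of the step it describes, assuming pandas. The three rows are a hypothetical extract of the bank data, read from an in-memory string so the example runs without a file.

```python
import io

import pandas as pd

# Hypothetical extract of the bank data; the second Geography value is missing.
csv_text = "CustomerId,Geography,CreditScore\n1,France,619\n2,,608\n3,Spain,502\n"
data = pd.read_csv(io.StringIO(csv_text))

missing_before = int(data["Geography"].isna().sum())

# Replace missing locations with an empty string rather than guessing one.
data["Geography"] = data["Geography"].fillna("")
print(missing_before, data["Geography"].tolist())
```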
Data Preparation – Data Cleaning
In numerical columns, filling the missing values with the column mean can help us even out the data set:
Data Preparation – Data Cleaning
data.CreditScore = data.CreditScore.fillna(data.CreditScore.mean())
We can also remove all rows with missing values; however, this is a very aggressive step.
Dropping all rows with any NA values is easy:
data.dropna()
Data Preparation – Data Cleaning
We can also drop only the rows in which all values are NA:
data.dropna(how='all')
Data Preparation – Data Cleaning
We can also set a threshold; for example, keep only the rows that have at least 5 non-null values:
data.dropna(thresh=5)
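The mean fill and the three dropna variants can be compared side by side on a small frame. This is a sketch with hypothetical values; because the frame has only three columns, the threshold is 2 rather than the 5 used above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one missing value, row 2 is entirely missing.
data = pd.DataFrame({
    "CreditScore": [600.0, np.nan, np.nan, 700.0],
    "Age": [40.0, 35.0, np.nan, 50.0],
    "Balance": [0.0, 100.0, np.nan, 250.0],
})

# Fill missing credit scores with the column mean (650 here).
mean_filled = data["CreditScore"].fillna(data["CreditScore"].mean())

any_na_dropped = data.dropna()            # drops every row with any NA
all_na_dropped = data.dropna(how="all")   # drops only rows that are all NA
thresh_kept = data.dropna(thresh=2)       # keeps rows with >= 2 non-null values

print(len(any_na_dropped), len(all_na_dropped), len(thresh_kept))
```

Note that dropna returns a new DataFrame; the original is unchanged unless the result is assigned back.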
Data Mining
After acquiring, preparing, and cleaning the data, a Data Scientist discovers patterns and relationships in the data to make better business decisions.
It is a discovery process for uncovering hidden and useful knowledge, commonly referred to as ‘Data Mining’.
Let me show you how to do data mining using Tableau.
Data Mining using Tableau
You can download the Tableau software from https://www.tableau.com/
You can connect your CSV file to Tableau Desktop to start mining the data. You can also connect to various other sources.
Tableau will automatically identify the fields as Dimensions (descriptive fields) and Measures (numeric fields).
Problem Statement
It is important to understand the problem statement first!
In this problem, we have a dataset of a bank’s customers.
We want to understand the exit behavior of the customers based on different variables, for instance:
• Gender
• Credit card holding
• Geography
Data Mining using Tableau
We have a dataset of about 10,000 rows, and we need to identify the customers who are most likely to exit.
Let’s first evaluate on the basis of gender, to understand whether this variable will affect the model.
Data Mining using Tableau
Tableau recognizes the ‘exited’ column as a Measure because of its numerical data type. However, ‘exited’ is categorical for us, as it tells whether a person has left or not.
So, we have moved the variable ‘exited’ from Measures to Dimensions.
Data Mining using Tableau
We put ‘gender’ on Columns and ‘exited’ on Color, where:
0 – people who did not exit
1 – people who did exit
For better understanding, let’s prepare a bar chart that shows how many male and female customers stayed or exited.
Data Mining using Tableau
You can add aliases so that 0 is shown as ‘Stayed’ and 1 as ‘Exited’.
To compare the exit rates, you can also add a reference line by right-clicking the right axis and selecting ‘Add Reference Line’.
Data Mining using Tableau
With the reference line added, we can see that female customers exit at an above-average rate compared to male customers!
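The same comparison can be cross-checked outside Tableau. The sketch below assumes pandas and a hypothetical eight-row sample; the column names `Gender` and `Exited` mirror the fields used in the charts but are assumptions about the actual file.

```python
import pandas as pd

# Hypothetical sample; column names mirror the fields used in the charts.
customers = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "Exited": [1, 1, 0, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the share of customers who exited.
exit_rate = customers.groupby("Gender")["Exited"].mean()
overall_rate = customers["Exited"].mean()  # plays the role of the reference line

print(exit_rate)
print(overall_rate)
```

Replacing `Gender` with another categorical column gives the equivalent check for credit card holding or geography.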
We can make the same comparison on the basis of whether or not customers have a credit card. Whether a person has a credit card is again categorical, so we put it into Dimensions and then evaluate:
Data Mining using Tableau
As explained earlier, we have added the aliases and a reference line.
Here, we cannot differentiate the exit behavior of customers who have a credit card from that of customers who do not. Hence, a customer leaving the bank does not appear to depend on holding a credit card.
Data Mining using Tableau
Similarly, we can evaluate on the basis of geography:
Here, we can see that more customers leave the bank in Germany than in France or Spain. So, we treat this as an anomaly that can impact the model.
Data Mining using Tableau
So, we can ignore the variable ‘has credit card’ because it will not impact the model, and focus on other variables such as ‘gender’ and ‘geography’, which can actually have an impact.
We can then make business decisions based on these findings.
Let’s see some advantages of data mining.
Advantages of Data Mining
• Predicts future trends
• Reveals customer patterns
• Helps decision making
• Enables quick fraud detection
• Helps in choosing the right algorithm
What is Model Building?
The model is built by selecting a machine learning algorithm that suits the data, the use case, and the available resources.
Many choose to build more than one model and go ahead only with the best fit.
There are many machine learning algorithms, chosen based on the data and the problem at hand.
Machine Learning Algorithms used by Data Scientists
Supervised, continuous:
• Regression (Linear, Multiple)
• Decision Trees
• Random Forest
Supervised, categorical:
• Classification (KNN, Logistic Regression, SVM, Naïve Bayes)
Unsupervised, continuous:
• Clustering (K-Means, PCA)
Unsupervised, categorical:
• Association Analysis
• Hidden Markov Model
What is Model Building?
Let me take you through a use case for better understanding: predicting world happiness.
Model Building - Multiple Linear Regression
Questions to ask:
1) How can we describe the data?
2) Can we build a predictive model to calculate the happiness score?
Variables Used in Data:
Happiness Rank: determined by a country’s happiness score
Happiness Score: a score given to a country based on adding up the rankings that a population has given to each category (normalized)
Country: the country in question
Region: the region that the country belongs to (different from continent)
Economy: GDP per capita of the country
Family: quality of family life, nuclear and joint family
Health: ranking of healthcare availability and average life expectancy in the country
Freedom: how much individuals are able to conduct themselves based on their free will
Trust: trust that the government is not corrupt
Generosity: how much the country is involved in peacekeeping and global aid
Dystopia Residual: the Dystopia happiness score (1.85), i.e. the score of a hypothetical country that ranks lower than the lowest-ranking country on the report, plus the residual value of each country (a number left over from the normalization of the variables that cannot be explained)
Model Building - Multiple Linear Regression
Importing the Python Libraries:
Model Building - Multiple Linear Regression
Preparing and Describing the Data:
• Read the CSV file using pandas’ read_csv and put it in a DataFrame, happiness_2015
• Similarly, for 2016 and 2017, store the data in happiness_2016 and happiness_2017 respectively
Model Building - Multiple Linear Regression
We can concatenate the three data sets, or we can build a model separately for one CSV file.
head() shows the first rows of the DataFrame, which here are the countries with the highest happiness score.
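Reading, concatenating, and previewing the yearly data can be sketched as follows, assuming pandas. The two-row CSV strings are hypothetical stand-ins for the real files, the column names are assumptions, and the `Year` column is added here only so the rows remain distinguishable after stacking.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the yearly CSV files, already ordered by rank.
csv_2015 = "Country,Happiness Score\nSwitzerland,7.587\nIceland,7.561\n"
csv_2016 = "Country,Happiness Score\nDenmark,7.526\nSwitzerland,7.509\n"

happiness_2015 = pd.read_csv(io.StringIO(csv_2015))
happiness_2016 = pd.read_csv(io.StringIO(csv_2016))

# Tag each frame with its year before stacking them.
happiness_2015["Year"] = 2015
happiness_2016["Year"] = 2016
combined = pd.concat([happiness_2015, happiness_2016], ignore_index=True)

top_2015 = happiness_2015.head(1)  # first row only
print(top_2015)
print(combined.shape)
```

With real files, `io.StringIO(...)` would be replaced by the path to each CSV.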
Model Building - Multiple Linear Regression
We can create a visual that gives a more appealing view of where each country is placed in the world ranking report.
The lighter-colored countries have a lower ranking.
The darker-colored countries have a higher ranking on the report (i.e. they are the “happiest”).
Model Building - Multiple Linear Regression
We can examine the correlation between Happiness Score and Happiness Rank by plotting a scatter plot.
Model Building - Multiple Linear Regression
Let’s dissect the graph:
• The happiness score determines how the country is ranked, so the happiness score is the predictor and the happiness rank is the dependent variable
• The higher the score, the lower the numerical rank and the higher the happiness rating
• Therefore, happiness score and rank are negatively correlated (as score increases, rank decreases)
Because rank is directly dependent on the happiness score, we can drop rank from our data frame.
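The negative relationship between score and rank can be checked numerically as well as visually. This is a small sketch assuming pandas, with hypothetical scores; the rank is derived from the score, just as in the report.

```python
import pandas as pd

# Hypothetical scores; rank 1 goes to the highest score.
scores = pd.Series([7.5, 6.9, 6.1, 5.2, 4.0], name="Happiness Score")
ranks = scores.rank(ascending=False).rename("Happiness Rank")

# Pearson correlation between score and rank.
correlation = scores.corr(ranks)
print(round(correlation, 3))
```

A value close to -1 confirms that rank carries no information beyond the score, which is why it can be dropped.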
Model Building - Multiple Linear Regression
We can draw a heat map to see the correlation between the variables.
We can see that happiness score is strongly correlated with economy and health, followed by family and freedom.
Thus, we should expect these variables to play a strong role in our model when we find the coefficients.
Model Building - Multiple Linear Regression
Now, we can start to use scikit-learn (sklearn) to construct the model.
We drop the categorical columns and the happiness rank because we will not explore them in this report.
Let’s divide the dataset into train and test sets for further model building and testing:
Model Building - Multiple Linear Regression
So, the data is split in an 80:20 ratio between the train and test datasets respectively.
Then, we import sklearn’s linear regression and fit it to the training set.
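The split-and-fit step can be sketched with scikit-learn. The original code was shown as an image, so this is an assumption about its shape rather than a copy: the data is synthetic (the target is the sum of three random features plus a constant), and `random_state=0` is chosen only to make the example repeatable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, target is their sum plus 2.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X.sum(axis=1) + 2.0

# test_size=0.2 gives the 80:20 split between train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(X_train.shape, X_test.shape)
print(model.intercept_, model.coef_)
```

Because the synthetic target is an exact sum of the features, the fitted coefficients come out at 1, much like the near-1 coefficients in the happiness model.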
Model Building - Multiple Linear Regression
Model Building - Multiple Linear Regression
Evaluate the predicted values and calculate the difference:
Model Building - Multiple Linear Regression
We can print the intercept and coefficients for the train dataset:
Model Building - Multiple Linear Regression
Trick: Right after you have trained your model, you know the order of the coefficients, so you can print each coefficient together with its corresponding feature.
Tip: you can also use a dictionary to reuse the coefficients later.
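Both the trick and the tip can be sketched in a few lines, assuming scikit-learn. The feature names and weights are hypothetical; the point is only that `coef_` follows the column order of the training data, so names and coefficients can be paired with zip.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature names and synthetic data with known weights.
feature_names = ["Economy", "Family", "Health"]
rng = np.random.default_rng(1)
X = rng.random((50, 3))
y = X @ np.array([1.0, 2.0, 3.0])

model = LinearRegression().fit(X, y)

# coef_ follows the column order of X, so zip pairs each name with its weight.
coefficients = dict(zip(feature_names, model.coef_))
for name, value in coefficients.items():
    print(f"{name}: {value:.3f}")
```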
Model Building - Multiple Linear Regression
For testing the model, let’s find out mean error:
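The slide does not say which error metric was used, so the sketch below shows two common choices, mean absolute error and root mean squared error, computed with NumPy on hypothetical actual and predicted scores.

```python
import numpy as np

# Hypothetical actual and predicted happiness scores.
y_true = np.array([7.5, 6.9, 6.1, 5.2])
y_pred = np.array([7.4, 7.0, 6.1, 5.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
print(mae, rmse)
```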
Model Building - Multiple Linear Regression
Using the model’s predict method, we can predict the happiness scores for the first 100 countries in our data.
How do these predictions compare to the actual values in our data?
Let’s create a plot of our actual happiness score versus the predicted happiness score.
Model Building - Multiple Linear Regression
Hence, we can say that our model is a pretty good indicator of the actual happiness score, as there is a strong positive correlation between the actual and the predicted values.
Model Building - Multiple Linear Regression
The Multiple Linear Regression Model for Happiness Score:
HappinessScore = (0.0001289)
  + (1.000048 * Economy)
  + (0.999997 * Family)
  + (0.999865 * Health)
  + (0.999920 * Freedom)
  + (0.999993 * Trust)
  + (1.000029 * Generosity)
  + (0.999971 * DystopiaResidual)
Communicate Results
But the battle is not over yet!
Before you present your data model and insights, first understand:
• What is your goal?
• Who is your target audience?
After you are clear about your objective, think about:
• What’s the question?
• What’s your methodology?
• What’s the answer?
Then, you’re finally ready to communicate your results to the business teams so that they can move easily into the execution phase.
Model Maintenance
After we have deployed the model, it is also important to prevent the model from deteriorating. How can we do that?
ASSESS: Once in a while, we run a fresh sample of data through the model to assess its performance.
RETRAIN: If the assessment shows that the model’s performance is not satisfactory, we retrain the model.
REBUILD: If retraining fails too, we have to rebuild: find all the variables again and build a new model.
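The assess, retrain, rebuild cycle can be sketched as a small decision function. Everything here is an assumption for illustration: the helper names `assess` and `maintain`, the mean-absolute-error metric, the `max_error` threshold, and the synthetic data. In practice the retrained model would be assessed on a held-out sample rather than on the data it was trained on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def assess(model, X_fresh, y_fresh):
    """Return the mean absolute error of the model on a fresh sample."""
    return float(np.mean(np.abs(model.predict(X_fresh) - y_fresh)))


def maintain(model, X_fresh, y_fresh, max_error):
    """Keep, retrain, or flag the model for a rebuild based on its error."""
    if assess(model, X_fresh, y_fresh) <= max_error:
        return model, "kept"
    retrained = LinearRegression().fit(X_fresh, y_fresh)
    if assess(retrained, X_fresh, y_fresh) <= max_error:
        return retrained, "retrained"
    return retrained, "rebuild needed"


rng = np.random.default_rng(0)
X_old = rng.random((50, 2))
model = LinearRegression().fit(X_old, X_old.sum(axis=1))

# Fresh data where the relationship has drifted (the weights changed).
X_new = rng.random((50, 2))
y_new = X_new @ np.array([3.0, -1.0])
model, status = maintain(model, X_new, y_new, max_error=0.05)
print(status)
```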
Summary
• What is data science?
• What does a data scientist do?
• Data acquisition
• Data mining using Tableau
• Model building
• Model maintenance
Model Building Using ARIMA
Plot the data:
We can observe that the mean and variance change with time. Hence, this is a non-stationary series.
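A quick way to see changing mean and variance without a plot is to compare rolling statistics at the start and end of the series. The sketch below assumes pandas and NumPy; the series is synthetic (an upward trend with growing oscillations), and the 12-point window is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic series with an upward trend and growing fluctuations.
t = np.arange(120)
series = pd.Series(t * 0.5 + (1 + t / 20) * np.sin(t))

rolling_mean = series.rolling(window=12).mean()
rolling_var = series.rolling(window=12).var()

# For a stationary series these would stay roughly constant over time.
print(rolling_mean.iloc[11], rolling_mean.iloc[-1])
print(rolling_var.iloc[11], rolling_var.iloc[-1])
```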
Model Building Using ARIMA
Auto Correlation Function(ACF):
Model Building Using ARIMA
Partial Auto Correlation Function(PACF):
Data Science Training | Data Science For Beginners | Data Science With Python Course | Simplilearn

React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
React JS Vs Next JS - What's The Difference | Next JS Tutorial For Beginners ...
 
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
Backpropagation in Neural Networks | Back Propagation Algorithm with Examples...
 
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
How to Become a Business Analyst ?| Roadmap to Become Business Analyst | Simp...
 
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
Career Opportunities In Artificial Intelligence 2023 | AI Job Opportunities |...
 
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
Programming for Beginners | How to Start Coding in 2023? | Introduction to Pr...
 
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
Best IDE for Programming in 2023 | Top 8 Programming IDE You Should Know | Si...
 
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
React 18 Overview | React 18 New Features and Changes | React 18 Tutorial 202...
 
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
What Is Next JS ? | Introduction to Next JS | Basics of Next JS | Next JS Tut...
 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
 

Último

Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
ssuserdda66b
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 

Último (20)

FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Data Science Training | Data Science For Beginners | Data Science With Python Course | Simplilearn

  • 2. What’s in it for you? • What is Data Science? • Who is a Data Scientist? • What are the skills of a Data Scientist? • What does a Data Scientist do? • Data Acquisition • Data Preparation • Data Mining with Tableau • Model Building & Testing • Model Maintenance • Summary
  • 3. What is Data Science? Data Science is the area of study which involves extracting knowledge from all the data you can gather
  • 4. What is Data Science? Hence, translating a business problem into a research project, and then translating the results back into a practical solution requires a deep understanding of the business domain and creativity
  • 5. What is Data Science? Fraudulent transactions are on the rise on the internet, and the results can be devastating
  • 6. What is Data Science? Data Science can help in fraud detection using advanced machine learning algorithms and prevent great monetary losses
  • 7. What is Data Science? And people doing all this work are called Data Scientists
  • 8-9. What does a Data Scientist do? • Data Acquisition • Data Preparation • Data Mining • Data Modelling • Model Maintenance
  • 10-12. Data Acquisition involves: • Extracting data from multiple source systems (databases, flat files, etc.) • Integrating and transforming the data into a homogeneous format • Loading the transformed data into a data warehouse
  • 13-14. Data Acquisition: data flows from multiple source systems (databases, flat files, etc.) through transformation into the data warehouse. This process is commonly referred to as ETL (Extract, Transform, Load)
  • 15. Data Acquisition: Various tools are available for the ETL process, such as Talend Studio, DataStage, and Informatica
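The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not how the tools named above work internally. The flat-file contents, the `accounts` and `customers` tables, and all function names are hypothetical, and in-memory SQLite databases stand in for both a source system and the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: a CSV export with inconsistent formatting.
FLAT_FILE = "customer_id,country,balance\n1, france ,100.50\n2,GERMANY,250\n"


def extract_flat_file(text):
    """Extract: read rows from a CSV flat file into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_database(conn):
    """Extract: read rows from a (hypothetical) operational database table."""
    columns = ("customer_id", "country", "balance")
    cursor = conn.execute("SELECT customer_id, country, balance FROM accounts")
    return [dict(zip(columns, row)) for row in cursor]


def transform(rows):
    """Transform: bring every row into one homogeneous format."""
    return [
        (int(r["customer_id"]), str(r["country"]).strip().title(), float(r["balance"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, country TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


# Stand-in source database holding one row.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE accounts (customer_id, country, balance)")
source_db.execute("INSERT INTO accounts VALUES (3, 'spain', 75.25)")

# Stand-in data warehouse.
warehouse = sqlite3.connect(":memory:")
rows = transform(extract_flat_file(FLAT_FILE) + extract_database(source_db))
load(warehouse, rows)

loaded = warehouse.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(loaded)
```

Keeping each stage in its own function mirrors the three ETL steps, so a new source system only needs another extract function.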
  • 16. What does a Data Scientist do? • Data Acquisition • Data Preparation • Data Mining • Model Building • Model Maintenance
  • 17. Data Preparation The most essential part of any Data Science project is Data Preparation. It consumes 60% of the time spent on the project. There are many things we can do to ensure data is used in the most productive and meaningful manner
  • 18. Data Preparation (Data Cleaning • Data Transformation • Handling Outliers • Data Integration • Data Reduction) DATA CLEANING: • Data cleaning is important because bad data may lead to a bad model • It handles missing values, which may occur for various reasons • NULL or unwanted values are also handled here • It improves business decisions and increases productivity
  • 19. Data Preparation DATA TRANSFORMATION: • Data transformation turns raw data formats into the desired outputs • It involves normalization of the data, for example: • Min-max normalization • Z-score normalization
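The two normalization methods named on this slide can be sketched as follows. Min-max normalization rescales values to the range [0, 1], and z-score normalization shifts them to mean 0 and standard deviation 1. The function names and the sample credit scores are illustrative assumptions, and the population standard deviation is used here as one possible choice.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center on the mean and scale by the standard deviation: (v - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical credit scores.
credit_scores = [400, 500, 600, 700, 800]
min_max = min_max_normalize(credit_scores)
z_scores = z_score_normalize(credit_scores)
print(min_max)
print(z_scores)
```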
  • 20. Data Preparation HANDLING OUTLIERS: • Outliers are observations that are distant from the rest of the data • They can be good or bad for the data • Univariate and multivariate methods are used for detection • Outliers can be used for fraud detection • Plots used: scatter plots, box plots
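The slide does not name a specific detection method. As one common univariate example, the 1.5 × IQR rule that box plots use to draw their whiskers can be sketched like this. The function name, the multiplier `k`, and the transaction amounts are assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical transaction amounts with one suspiciously large value.
transaction_amounts = [20, 22, 23, 25, 26, 27, 29, 30, 31, 500]
outliers = iqr_outliers(transaction_amounts)
print(outliers)
```

In a fraud-detection setting, the flagged values would be candidates for review rather than rows to delete automatically.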
  • 21. Data Preparation DATA INTEGRITY: • Data integrity means that the data is accurate and reliable • It is important where business decisions are regularly made on the basis of data
  • 22. Data Preparation DATA REDUCTION: • This step reduces large volumes of data into meaningful parts • It frees up storage capacity • It helps reduce costs • It involves data deduplication, which eliminates redundant data
  • 23. Data Preparation – Data Cleaning Data cleaning is one of the most common data preparation tasks. It involves ensuring that the data is: • Valid • Consistent • Uniform • Accurate
  • 24. Data Preparation – Data Cleaning Let’s have a look at a bank’s data with its customer details
  • 25. Data Preparation – Data Cleaning We can import and read the data using Python’s pandas library
  • 26. Data Preparation – Data Cleaning Let’s look at the Geography column, which has some missing values. In this case, we might not want to assume the location of the customers
  • 27. Data Preparation – Data Cleaning So, let’s fill the missing values with an empty string
  • 28. Data Preparation – Data Cleaning In numerical columns, filling the missing or unwanted values with the column mean can help us even out the data set: data.CreditScore = data.CreditScore.fillna(data.CreditScore.mean())
  • 29-31. Data Preparation – Data Cleaning We can remove all rows with unwanted values; however, this is a very aggressive step. Dropping all rows with any NA values is easy: data.dropna() We can also drop only the rows in which all values are NA: data.dropna(how='all') We can also set a threshold, for example requiring each row to have at least 5 non-null values: data.dropna(thresh=5)
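The cleaning steps above can be tried on a small frame. This is a sketch under assumptions: the tiny DataFrame is a made-up stand-in for the bank's customer data, and because it has only three columns the threshold example uses `thresh=2` instead of the `thresh=5` from the slides.

```python
import pandas as pd

# Hypothetical stand-in for the bank's customer data; the last row is entirely missing.
data = pd.DataFrame(
    {
        "Geography": ["France", None, "Spain", None],
        "CreditScore": [600.0, 650.0, None, None],
        "Balance": [0.0, 1200.5, 300.0, None],
    }
)

# Fill missing text with an empty string and missing numbers with the column mean.
filled = data.copy()
filled["Geography"] = filled["Geography"].fillna("")
filled["CreditScore"] = filled["CreditScore"].fillna(filled["CreditScore"].mean())

# Aggressive: drop every row that has at least one missing value.
complete_rows = data.dropna()

# Milder: drop only the rows in which every value is missing.
not_empty_rows = data.dropna(how="all")

# Threshold: keep rows that have at least 2 non-null values.
mostly_filled_rows = data.dropna(thresh=2)

print(filled)
print(len(complete_rows), len(not_empty_rows), len(mostly_filled_rows))
```

Note that `dropna` returns a new DataFrame and leaves `data` unchanged unless the result is assigned back.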
  • 32. Data Mining (workflow: Data Acquisition • Data Preparation • Data Mining • Model Building • Model Maintenance)
  • 33. Data Mining After acquiring, preparing, and cleaning the data, a Data Scientist discovers patterns and relationships in the data to make better business decisions
  • 34. Data Mining It is a discovery process for uncovering hidden and useful knowledge, commonly referred to as ‘Data Mining’
  • 35. Data Mining Let me show you how to do data mining using Tableau.
  • 36. Data Mining using Tableau You can easily download Tableau software from https://www.tableau.com/
  • 37. Data Mining using Tableau You can connect your CSV file to Tableau Desktop to start mining the data! You can also connect to various other sources
  • 38. Data Mining using Tableau Tableau will automatically identify the fields as Dimensions (descriptive) and Measures (numeric fields)
  • 39. Data Mining using Tableau Problem Statement: It is important to understand the problem statement first! In this problem, we have a dataset of a bank’s customers. We want to understand the exit behavior of the customers based on different variables, for instance: • Gender • Credit card holding • Geography
  • 40. Data Mining using Tableau Now, we have a dataset of about 10,000 rows, and we need to find out which customers are likely to exit first
  • 41. Data Mining using Tableau Let’s first evaluate on the basis of gender, to understand whether this variable will affect the model
  • 42. Data Mining using Tableau We have moved the variable ‘exited’ from Measures to Dimensions
  • 43. Data Mining using Tableau Here, Tableau recognizes the ‘exited’ column as a measure because of its numerical data type. However, ‘exited’ is categorical for us, as it tells whether a person has left or not. So, we have placed ‘exited’ in Dimensions
  • 44. Data Mining using Tableau We put ‘gender’ on Columns and ‘exited’ on Color, where: 0 – people who did not exit 1 – people who did exit
  • 45. Data Mining using Tableau For better understanding, let’s prepare a bar chart that shows the male and female customers who stayed or exited
  • 46. Data Mining using Tableau You can assign aliases so that 0 is displayed as ‘Stayed’ and 1 as ‘Exited’
  • 47. Data Mining using Tableau To compare the exit rates, you can also add a reference line by right-clicking on the right axis and selecting ‘Add Reference Line’
  • 48. Data Mining using Tableau
  • 49. Data Mining using Tableau So, we have added the reference line, and we can say that female customers exit at a higher-than-average rate compared with male customers!
  • 50. Data Mining using Tableau We can also evaluate on the basis of whether or not customers have a credit card. Again, whether a person has a credit card is categorical, so we put it into Dimensions and then evaluate. As explained earlier, we have added the aliases and a reference line.
  • 51. Data Mining using Tableau Here, we can see that we cannot differentiate the exit behavior between people who have a credit card and those who do not. Hence, a customer leaving the bank does not depend on that person holding a credit card
  • 52. Data Mining using Tableau Similarly, we can evaluate on the basis of geography. Here, we can see that more customers leave the bank in Germany than in France or Spain. So, we treat this as an anomaly, and it can impact the model
  • 53. Data Mining using Tableau So, we can ignore the variable ‘has credit card’ because it will not impact the model
  • 54. Data Mining using Tableau And focus on other variables like ‘gender’, ‘geography’ which can actually have an impact
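The comparison made visually in Tableau can also be approximated in pandas: compute the exit rate per group and compare it with the overall rate, which plays the role of the reference line. This is a sketch on made-up data, not the bank dataset from the slides. The column names `Gender`, `HasCrCard`, and `Exited` and the helper `exit_rate_gap` are assumptions.

```python
import pandas as pd

# Hypothetical sample of customers (1 = exited, 0 = stayed).
customers = pd.DataFrame(
    {
        "Gender": ["Female"] * 6 + ["Male"] * 6,
        "HasCrCard": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
        "Exited": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    }
)


def exit_rate_gap(df, column):
    """Spread between the highest and lowest exit rate across the groups of `column`."""
    rates = df.groupby(column)["Exited"].mean()
    return rates.max() - rates.min()


# Overall exit rate: the counterpart of the reference line in the charts.
overall_rate = customers["Exited"].mean()

# Exit rate per gender, to compare against the overall rate.
rate_by_gender = customers.groupby("Gender")["Exited"].mean()

# A large gap suggests the variable matters; a gap near zero suggests it can be ignored.
gaps = {col: exit_rate_gap(customers, col) for col in ["Gender", "HasCrCard"]}

print(overall_rate)
print(rate_by_gender)
print(gaps)
```

In this made-up sample the gap for `Gender` is large while the gap for `HasCrCard` is zero, which matches the kind of conclusion drawn from the charts.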
  • 55. Data Mining using Tableau So, we can make business decisions based on these findings. Let’s see some advantages of data mining
  • 56. Advantages of Data Mining • Predicts future trends • Signifies customer patterns • Helps decision making • Quick fraud detection • Choosing the right algorithm
  • 57. Model Building (workflow: Data Acquisition • Data Preparation • Data Mining • Model Building • Model Maintenance)
  • 58. What is Model Building? The model is built by selecting a machine learning algorithm that suits the data, use case, and available resources
  • 59. What is Model Building? Many choose to build more than one model and go ahead only with the best fit
  • 60. What is Model Building? There are many machine learning algorithms, chosen based on the data and the problem at hand
  • 61-64. Machine Learning Algorithms used by Data Scientists (grouped on the slide as Supervised or Unsupervised, and as Continuous or Categorical) • Regression (Linear, Multiple) • Decision Trees • Random Forest • Classification (KNN, Logistic Regression, SVM, Naïve Bayes) • Clustering (K-Means) and PCA • Association Analysis • Hidden Markov Model
  • 65. What is Model Building? Let me take you through a use case for better understanding
  • 66. Model Building - Multiple Linear Regression Predicting world happiness. Questions to ask: 1) How can we describe the data? 2) Can we build a predictive model to calculate the happiness score?
  • 67. Variables used in the model • Happiness Rank • Happiness Score • Country • Region • Economy • Family • Health • Freedom • Trust • Generosity • Dystopia Residual
  • 68. Model Building - Multiple Linear Regression Variables used in the data: • Happiness Rank: determined by a country’s happiness score • Happiness Score: a score given to a country based on adding up the rankings that a population has given to each category (normalized) • Country: the country in question • Region: the region that the country belongs to (different from continent) • Economy: GDP per capita of the country • Family: quality of family life, nuclear and joint family • Health: ranking of healthcare availability and average life expectancy in the country • Freedom: how much individuals are able to conduct themselves based on their free will • Trust: trust that the government is not corrupt • Generosity: how much the country is involved in peacekeeping and global aid • Dystopia Residual: the Dystopia happiness score (1.85), i.e. the score of a hypothetical country that has a lower rank than the lowest-ranking country on the report, plus the residual value of each country (a number that is left over from the normalization of the variables and cannot be explained)
  • 69. Model Building - Multiple Linear Regression Importing the Python libraries:
  • 70. Model Building - Multiple Linear Regression Preparing and describing the data: • Read the CSV file using the pandas function read_csv and put it in a DataFrame named happiness_2015 • Similarly, for 2016 and 2017, store the data in happiness_2016 and happiness_2017 respectively
  • 71. Model Building - Multiple Linear Regression happiness_2016
  • 72. Model Building - Multiple Linear Regression happiness_2017
  • 73. Model Building - Multiple Linear Regression We can concatenate the three data sets, or we can build a model separately for one CSV file. head() shows the first rows, which here are the countries with the highest happiness score
  • 74. Model Building - Multiple Linear Regression We can create a visual that gives us a more appealing view of where each country is placed in the world ranking report
  • 75. Model Building - Multiple Linear Regression The lighter colored countries have a lower ranking. The darker colored countries have a higher ranking on the report (i.e. are the “happiest”)
  • 76. Model Building - Multiple Linear Regression We can find out the correlation between Happiness Score and Happiness Rank by plotting a scatterplot
  • 77. Model Building - Multiple Linear Regression And the code is right here!
  • 78. Model Building - Multiple Linear Regression Let’s dissect the graph: • Happiness score determines how the country is ranked, so we treat happiness score as the predictor and happiness rank as the dependent variable • The higher the score, the lower the numerical rank and the higher the happiness rating • Therefore, happiness score and rank are negatively correlated (as score increases, rank decreases)
  • 79. Model Building - Multiple Linear Regression However, we see that rank is directly dependent on happiness score. So, we can drop rank from our data frame
  • 80. Model Building - Multiple Linear Regression We can draw a heat map and see the correlation between the variables
  • 81. Model Building - Multiple Linear Regression We can see that happiness score is strongly correlated with economy and health, followed by family and freedom.
  • 82. Model Building - Multiple Linear Regression Thus, in our model, we should see strong correlation between variables when finding the coefficients.
  • 83. Model Building - Multiple Linear Regression Now, we can start to use scikit-learn (sklearn) to construct the model. We drop the categorical values and happiness rank because we will not explore them in this report
  • 84. Model Building - Multiple Linear Regression Let’s divide the dataset into train and test sets for further model building and testing. The data is split in an 80:20 ratio between the train and test datasets, respectively
  • 85. Model Building - Multiple Linear Regression Then, we import sklearn’s linear regression and fit the regression to the training set
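The split and fit steps described above can be sketched as follows. The original code from the slides is not part of this transcript, so this is an illustration on synthetic data rather than the happiness CSV files. The feature names, the random seed, and the noise level are assumptions. The target is built as the sum of its components plus a little noise, similar in spirit to how the happiness score adds up its categories.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the numeric happiness columns (assumed names).
feature_names = ["Economy", "Family", "Health"]
rng = np.random.default_rng(0)
X = rng.random((150, len(feature_names)))

# Target: the sum of the components plus small random noise.
y = X.sum(axis=1) + rng.normal(0, 0.001, size=150)

# 80:20 split between the train and test datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the multiple linear regression on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
print(model.intercept_, model.coef_)
```

Because the target is a plain sum of the features, the fitted coefficients come out very close to 1, which is the same pattern seen in the final happiness equation.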
  • 86. Model Building - Multiple Linear Regression Evaluate the predicted values and calculate the difference:
  • 87. Model Building - Multiple Linear Regression We can print the intercept and coefficients for the train dataset:
  • 88. Model Building - Multiple Linear Regression Trick: right after you have trained your model, you know the order of the coefficients, so you can print each coefficient together with its corresponding feature. Tip: you can also store them in a dictionary to reuse the coefficients later
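The trick and the tip can be sketched like this: the entries of `coef_` follow the column order of the training data, so zipping them with the feature names gives a reusable dictionary. The tiny dataset and the feature names are assumptions chosen so that the true coefficients (2 and 3) and intercept (1) are known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data with two features (assumed names).
feature_names = ["Economy", "Family"]
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])

# Target built from known coefficients: y = 1 + 2 * Economy + 3 * Family.
y_train = 1.0 + 2.0 * X_train[:, 0] + 3.0 * X_train[:, 1]

model = LinearRegression().fit(X_train, y_train)

# coef_ is ordered like the columns of X_train, so it can be paired with the names.
coefficients = dict(zip(feature_names, model.coef_))

for name, value in coefficients.items():
    print(f"{name}: {value:.4f}")
print(f"intercept: {model.intercept_:.4f}")
```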
  • 89. Model Building - Multiple Linear Regression For testing the model, let’s find out the mean error:
  • 90. Model Building - Multiple Linear Regression Using the fitted model’s predict method, we can predict the happiness scores for the first 100 countries in our data. How do these predictions compare to the actual values in our data?
  • 91. Model Building - Multiple Linear Regression Let’s create a plot of our actual happiness score versus the predicted happiness score.
  • 92. Model Building - Multiple Linear Regression Hence, we can say that our model is a pretty good indicator of the actual happiness score as there is a strong positive correlation between the actual and the predicted values
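The comparison of actual and predicted scores can be quantified with a few lines of NumPy. The slides do not say which error measure was used, so the mean absolute error and root mean squared error shown here are assumptions, and the score values are made up for illustration.

```python
import numpy as np

# Hypothetical actual and predicted happiness scores for six countries.
actual = np.array([7.5, 6.9, 6.1, 5.2, 4.4, 3.6])
predicted = np.array([7.4, 7.0, 6.0, 5.3, 4.5, 3.5])

errors = predicted - actual

# Average size of the prediction error, ignoring its sign.
mean_absolute_error = np.mean(np.abs(errors))

# Root mean squared error, which penalizes large misses more strongly.
root_mean_squared_error = np.sqrt(np.mean(errors ** 2))

# Pearson correlation between actual and predicted values.
correlation = np.corrcoef(actual, predicted)[0, 1]

print(mean_absolute_error, root_mean_squared_error, correlation)
```

A correlation close to 1 together with a small error supports the conclusion that the predictions track the actual scores closely.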
  • 93. Model Building - Multiple Linear Regression The Multiple Linear Regression Model for Happiness Score: HappinessScore = 0.0001289 + (1.000048 * Economy) + (0.999997 * Family) + (0.999865 * Health) + (0.999920 * Freedom) + (0.999993 * Trust) + (1.000029 * Generosity) + (0.999971 * DystopiaResidual)
  • 95-96. Communicate Results But the battle is not over yet! Before you present your data model and insights, first understand: • What is your goal? • Who is your target audience?
  • 97. Communicate Results After you are clear with your objective, think about: • What’s the question? • What’s your methodology? • What’s the answer?
  • 98. Communicate Results Then, you’re finally ready to communicate your results to the business teams so that the work moves easily into the execution phase
  • 99. Model Maintenance (workflow: Data Acquisition • Data Preparation • Data Mining • Model Building • Model Maintenance)
  • 100. Model Maintenance After we have deployed the model, it is also important to prevent the model from deteriorating. How can we do that?
  • 101. Model Maintenance ASSESS: Once in a while, we have to run a fresh sample of data through the model to assess it RETRAIN: After assessment, if you are not happy with the performance of the model, you retrain it REBUILD: If retraining fails too, you have to rebuild by finding all the variables again and building a new model
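The assess, retrain, rebuild sequence can be sketched as a small decision routine. Everything here is a hypothetical illustration: the toy model is a single slope, `evaluate` is a mean absolute error, `retrain` keeps the same variable but refits the coefficient, and `max_error` is an arbitrary acceptance threshold.

```python
def evaluate(model, sample):
    """ASSESS: mean absolute error of the model on a fresh sample of (x, y) pairs."""
    return sum(abs(model["slope"] * x - y) for x, y in sample) / len(sample)


def retrain(model, sample):
    """RETRAIN: same variable, new coefficient (least-squares slope through the origin)."""
    slope = sum(x * y for x, y in sample) / sum(x * x for x, _ in sample)
    return {"slope": slope}


def maintain(model, fresh_sample, max_error):
    """Return the model to use and the action taken: 'keep', 'retrain', or 'rebuild'."""
    if evaluate(model, fresh_sample) <= max_error:
        return model, "keep"
    candidate = retrain(model, fresh_sample)
    # For brevity the same sample is reused; in practice a held-out part would be better.
    if evaluate(candidate, fresh_sample) <= max_error:
        return candidate, "retrain"
    # Retraining did not help: the variables must be reconsidered and a new model built.
    return None, "rebuild"


deployed = {"slope": 2.0}
_, action_stable = maintain(deployed, [(1, 2), (2, 4)], max_error=0.5)
refit, action_drift = maintain(deployed, [(1, 3), (2, 6), (3, 9)], max_error=0.5)
_, action_broken = maintain(deployed, [(1, 1), (2, 8), (3, 27)], max_error=0.5)
print(action_stable, action_drift, action_broken)
```

The three calls show a model that still fits, one whose coefficient has drifted, and one whose underlying relationship no longer matches the model form.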
  • 102. Summary • What is data science? • What does a data scientist do? • Data acquisition • Data mining using Tableau • Model building • Model maintenance
  • 103. Model Building Using ARIMA Plot the data: we can observe that the mean and variance change with time. Hence, this is a non-stationary series
  • 104. Model Building Using ARIMA Autocorrelation Function (ACF):
  • 105. Model Building Using ARIMA Partial Autocorrelation Function (PACF):

Editor's notes

  1. Remove title case
  2. The important deciding variable seems to be score (as it determines rank).
  3. The darker red the square, the stronger the positive correlation; naturally, each variable has a correlation of 1 with itself.
  4. The darker red the square, the stronger the positive correlation; naturally, each variable has a correlation of 1 with itself.
  5. RETRAIN: The variables remain the same but you train your model with fresh sample of data so coefficients will change
  6. Natural language processing to enable it to communicate successfully in English (or some other human language). Knowledge representation to store information provided before or during the interrogation. Automated reasoning to use the stored information to answer questions and to draw new conclusions. Machine learning to adapt to new circumstances and to detect and extrapolate patterns.