Getting Started with Data Science
Using Python
https://www.linkedin.com/in/andreilyskov/
https://www.quora.com/profile/Andrei-Lyskov-1
About The Speaker
What is Data Science?
Why the Hype Around Data Science?
● IBM predicts that demand for data scientists will soar by 28% by 2020
● Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in
the US have data science skills, while hundreds of companies are hiring for those roles.
● Software engineering is a common starting point for professionals who are in the
top five fastest-growing jobs today. The career path to Machine Learning Engineer
and Big Data Developer begins with a solid software engineering background.
● Data Science gives you career flexibility
Who Are Data Scientists?
https://www.datascience.com/blog/data-scientist-skills
Application - Security
Application - Sports
Application - Finance
Application - Microsoft (Skype Product)
● The first is a product feature called Skype Translator. As its name implies,
Skype uses machine learning to translate a conversation between two people
speaking different languages through the use of a third-party bot that joins
your call.
Skype Translator – How it Works | Skype Blogs
● The second is detecting fraudulent Skype users. Examples range from
users who send spammy messages to credit card and online payment fraud.
This is an important application of machine learning: as you can imagine, a
platform that’s riddled with spammers and fraudsters is not one that will likely
retain many users.
Detecting Fraudulent Skype Users via Machine Learning
Learning Data Science With Python - Libraries
NumPy is a library for the Python
programming language, adding
support for large, multi-
dimensional arrays and matrices,
along with a large collection of
high-level mathematical functions
to operate on these arrays.
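As a rough illustration of the description above, the following minimal sketch builds a small two-dimensional array and applies a few of NumPy's array operations. The values and variable names are made up for the example.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; the values are arbitrary.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Element-wise math applies to the whole array without an explicit loop.
doubled = matrix * 2

# Aggregations can run over the whole array or along one axis.
column_means = matrix.mean(axis=0)  # mean of each column
total = matrix.sum()                # sum of every element

# Matrix product of the 2x3 matrix with its 3x2 transpose gives a 2x2 matrix.
gram = matrix @ matrix.T

print(matrix.shape)
print(column_means)
print(gram)
```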
Pandas is a software library
written for the Python
programming language for
data manipulation and
analysis. In particular, it offers
data structures and operations
for manipulating numerical
tables and time series.
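A small sketch of what "numerical tables and time series" can look like in practice. The sales table below is hypothetical data invented for the example, and the column names are assumptions.

```python
import pandas as pd

# A small, made-up table of daily sales (hypothetical data for illustration).
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-01-01", "2018-01-02", "2018-01-08", "2018-01-09"]
        ),
        "store": ["A", "B", "A", "B"],
        "revenue": [100.0, 150.0, 120.0, 130.0],
    }
)

# Table manipulation: total revenue per store.
revenue_by_store = sales.groupby("store")["revenue"].sum()

# Time series: index by date and resample the daily rows into weekly totals.
weekly = sales.set_index("date")["revenue"].resample("W").sum()

print(revenue_by_store)
print(weekly)
```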
scikit-learn is a free software machine learning
library that features various
classification, regression and
clustering algorithms including
support vector machines,
random forests, gradient
boosting, and k-means and is
designed to interoperate with the
Python numerical and scientific
libraries NumPy and SciPy.
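To make the classification use case concrete, here is a minimal sketch that trains a random forest with scikit-learn on the Iris dataset bundled with the library. The choice of dataset, split size, and hyperparameters is an assumption made for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small built-in dataset as NumPy arrays (150 flowers, 4 features).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a random forest, one of the algorithms named above.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# score() reports the fraction of correct predictions on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```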
Matplotlib is a plotting library for the Python
programming language and its
numerical mathematics extension
NumPy
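A minimal sketch of plotting NumPy arrays with Matplotlib. It assumes an off-screen backend (Agg) so it runs without a display, and the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Sample 100 points between 0 and 2*pi.
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()

# A line plot of sin(x) and a scatter plot of every tenth cos(x) sample.
ax.plot(x, np.sin(x), label="sin(x)")
ax.scatter(x[::10], np.cos(x[::10]), color="tab:orange", label="cos(x) samples")

ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Line and scatter plot from NumPy arrays")
ax.legend()

# Write the figure to disk; the file name is an arbitrary choice.
fig.savefig("example_plot.png")
```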
Keras is an open source
neural network library written
in Python. It is capable of
running on top of TensorFlow,
Microsoft Cognitive Toolkit,
Theano, or MXNet. It was
developed with a focus on
enabling fast experimentation
TensorFlow is an open-source
software library for dataflow
programming across a range of
tasks. It is a symbolic math
library, and is also used for
machine learning applications
such as neural networks.
Learning Data Science With Python - Tools
Jupyter Notebook is an open-source web application that
allows you to create and share
documents that contain live code,
equations, visualizations and
narrative text
http://jupyter.org/
Crestle is your GPU-enabled
Jupyter environment in the
cloud.
https://www.crestle.com/
Google Colaboratory (Colab) is similar to Jupyter Notebook, but
with the added benefit of Google Docs-style sharing and
collaboration
https://colab.research.google.com
Data Science Steps
● Data Gathering
Unless you’re at a company with great data governance, you’re likely going to have some trouble
accessing the data you want. Whether that's because your company has neglected to put the
necessary systems in place to gather data, or the data that they are collecting is fragmented and
scattered across the organization, you’ll have to first spend some time gathering whatever data you’ll
need to do your job. That means having discussions with relevant stakeholders, and getting the
necessary credentials to access databases within your organization.
● Data Preparation
Once you have access to data, you’ll need to spend some time cleaning and formatting it. This is
where Data Science can often become more of an art than a science. Unlike datasets you’ll find in
competitions, the real world has very messy datasets. Missing values, errors in data collection, data
formatting, normalization, outliers - these are all issues that you’ll have to learn to deal with.
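The preparation issues listed above can be sketched with pandas as follows. The records, column names, and the specific choices (median imputation, the 1.5 * IQR outlier rule, min-max scaling) are assumptions made for illustration; real projects often need different rules.

```python
import pandas as pd

# Hypothetical raw records showing the kinds of problems listed above.
raw = pd.DataFrame(
    {
        "city": [" Montreal", "montreal ", "Toronto", "Toronto", "Toronto"],
        "age": [34.0, None, 29.0, 41.0, 38.0],
        "income": [52000.0, 61000.0, 58000.0, 1000000.0, 64000.0],
    }
)

clean = raw.copy()

# Data formatting: trim whitespace and use consistent capitalization.
clean["city"] = clean["city"].str.strip().str.title()

# Missing values: fill gaps with the column median (one option among many).
clean["age"] = clean["age"].fillna(clean["age"].median())

# Outliers: drop rows outside 1.5 * IQR of income (a common rule of thumb).
q1, q3 = clean["income"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

# Normalization: rescale income to the 0-1 range (min-max scaling).
income_min, income_max = clean["income"].min(), clean["income"].max()
clean["income_scaled"] = (clean["income"] - income_min) / (income_max - income_min)

print(clean)
```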
● Exploration
Before diving into building any models, you’ll want to explore the data to try to glean
some insights. Clustering algorithms, scatterplots, bar graphs, and Chernoff faces are all
interesting ways of visualizing data that will lead to a better understanding of the
structure of your data and aid you in your model building step.
● Model Building
With your data cleaned and formatted, you’ll have an opportunity to explore a variety of
models to see which one works best. Random Forests, SVMs, Bayesian Predictors,
Neural Networks, Deep Learning, K-Nearest Neighbours - all models you should
familiarize yourself with. There is no one-size-fits-all model, so you will again need to
develop intuition on which model suits your particular problem.
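One way to try several models on the same data is cross-validation. The sketch below compares four of the model families named above using scikit-learn. The synthetic dataset stands in for a real, cleaned one, and the hyperparameters are library defaults or arbitrary picks, so the ranking it prints says nothing about any real problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic data stands in for a real, cleaned dataset (an assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Candidate models to compare; the names are labels chosen for this example.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
print(f"Best on this data: {best_name}")
```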
● Model Validation
Prediction accuracy is a common benchmark for whether your model is performing well; however,
there are often other evaluation metrics to consider. False positives and false negatives are
important to think about from the perspective of the problem you’re working on. If you’re predicting
disease, you’ll care more about minimizing false negatives, since one may result in a person’s death -
whereas a false positive will only lead to additional testing.
● Model Deployment
Finally, you’ll deploy your model into the wild. As you gather more data and feedback on how it’s
doing, you’ll be able to tweak and improve it as time goes on.
*This is by no means a comprehensive list of steps, and there are certainly other things you’ll need
to do to do well in your job. However, this is a good high-level overview of the steps
involved in tackling data science problems.
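To illustrate the validation point about false negatives, here is a small sketch using scikit-learn's metrics on hand-made labels for a hypothetical disease screen. The labels are invented for the example. It shows how a model can look acceptable on accuracy while still missing half of the true cases.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical screening results: 1 = has the disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# For binary labels, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # share of all predictions that are correct
recall = recall_score(y_true, y_pred)        # share of sick patients the model caught
precision = precision_score(y_true, y_pred)  # share of positive calls that were right

print(f"false negatives={fn}, false positives={fp}")
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, precision={precision:.2f}")
```

Here accuracy is 70%, yet recall is only 50%: two of the four sick patients were missed, which is the costly error in this scenario.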
Case Study
Building a regression model to
predict housing prices
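The slides do not include the case study's data or code, so the following is only a sketch of what such a regression model might look like. It generates synthetic housing data (the features, coefficients, and noise level are all invented) and fits a linear regression with scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic housing data: every number below is an assumption for illustration.
rng = np.random.default_rng(0)
n_houses = 500
area = rng.uniform(50, 250, n_houses)     # floor area in square metres
bedrooms = rng.integers(1, 6, n_houses)   # 1 to 5 bedrooms
age = rng.uniform(0, 60, n_houses)        # building age in years
noise = rng.normal(0, 20000, n_houses)    # unexplained price variation
price = 50000 + 2000 * area + 10000 * bedrooms - 500 * age + noise

# Feature matrix with one row per house and one column per feature.
X = np.column_stack([area, bedrooms, age])

# Hold out 20% of the houses to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# R^2 and mean absolute error on the held-out houses.
r2 = r2_score(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)

print(f"R^2 on held-out houses: {r2:.3f}")
print(f"Mean absolute error: {mae:,.0f}")
print("Estimated price per square metre:", round(model.coef_[0]))
```

With real data, the same structure applies, but the preparation and exploration steps described earlier would come first.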
Building A Portfolio
Questions?
Resources
Podcasts
● Data Skeptic
● Data Stories
● Learning Machines 101
● Linear Digressions
● O’Reilly Data Show
● Talking Machines
● This Week in Machine Learning and AI
● Siraj Raval (YouTube)
Websites/Blogs
● Dataquest.io
● Kaggle.com
● Quora.com
● analyticsvidhya.com
● Coursera.org
● https://developers.google.com/machine-learning/crash-course/
● https://portal.azure.com/
● https://www.luis.ai/
Communities
● experfylabs.slack.com/messages/C0L736X36/
● opendatacommunity.slack.com
● dcommunity.slack.com
● kagglenoobs.slack.com
● pythondev.slack.com
Books
● Hands-On Machine Learning with Scikit-Learn and TensorFlow
● Python Machine Learning, 1st Edition
● Everybody Lies: Big Data, New Data
● Weapons of Math Destruction
● Big Data: A Revolution That Will Transform How We Live, Work, and Think
● Turing: Pioneer of the Information Age
● Avogadro Corp
● Code: The Hidden Language of Computer Hardware and Software
● Superintelligence: Paths, Dangers, Strategies
● Visual Explanations: Images and Quantities, Evidence and Narrative
● Pattern Recognition and Machine Learning (Information Science and Statistics)
● Storytelling with Data: A Data Visualization Guide for Business Professionals
● An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani
TV Shows/Documentaries
● Humans (2015-)
● Person of Interest
● Intelligence
● Minority Report
● Almost Human
● Robot & Frank
● Her
● Black Mirror
● I, Robot
● Ex Machina
● The Secret Rules of Modern Living: Algorithms

Which standard is best for your content?
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Automation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementAutomation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions management
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Transport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MITransport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MI
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey Hightower
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 

Getting Started with Data Science Using Python

  • 1. Getting Started with Data Science Using Python
  • 3–5. What is Data Science?
  • 6. Why the Hype Around Data Science?
      ● IBM predicts that demand for data scientists will soar by 28% by 2020.
      ● Data scientist roles have grown over 650% since 2012, but currently only 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles.
      ● Software engineering is a common starting point for professionals in the top five fastest-growing jobs today. The career path to Machine Learning Engineer and Big Data Developer begins with a solid software engineering background.
      ● Data science gives you career flexibility.
  • 7–9. Who Are Data Scientists? https://www.datascience.com/blog/data-scientist-skills
  • 13. Application – Microsoft (Skype Product)
      ● The first is a product feature called Skype Translator. As the name implies, Skype uses machine learning to translate a conversation between two people speaking different languages, through a third-party bot that joins your call. (Source: "Skype Translator – How it Works", Skype Blogs)
      ● The second is detecting fraudulent Skype users; examples range from users who send spammy messages to credit card and online payment fraud. This is an important application of machine learning: a platform that’s riddled with spammers and fraudsters is unlikely to retain many users. (Source: "Detecting Fraudulent Skype Users via Machine Learning")
  • 14. Learning Data Science With Python - Libraries
      ● NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
      ● Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
      ● scikit-learn is a free software machine learning library that features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, and k-means. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
  • 15. Learning Data Science With Python - Libraries
      ● Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
      ● Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or MXNet. It was developed with a focus on enabling fast experimentation.
      ● TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.
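As a rough illustration of the NumPy and Pandas descriptions above, here is a minimal sketch. The arrays, column names, and prices are made up for this example and do not come from the slides.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array and a vectorized operation over it.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
column_means = matrix.mean(axis=0)  # mean of each column

# pandas: a small labeled table (hypothetical values, for illustration only).
homes = pd.DataFrame({
    "city": ["A", "A", "B"],
    "price": [100.0, 140.0, 90.0],
})
mean_price_by_city = homes.groupby("city")["price"].mean()

print(column_means)
print(mean_price_by_city)
```

NumPy handles the raw numeric arrays, while pandas adds labels and table-style operations such as grouping on top of them.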
  • 16. Learning Data Science With Python - Tools
      ● Jupyter Notebook: open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. http://jupyter.org/
      ● Crestle: your GPU-enabled Jupyter environment in the cloud. https://www.crestle.com/
      ● Google Colab: similar to Jupyter Notebook, but with the added benefit of “Google Docs”-style sharing and collaboration. https://colab.research.google.com
  • 17. Data Science Steps
      ● Data Gathering: Unless you’re at a company with great data governance, you’re likely going to have some trouble accessing the data you want. Whether that's because your company has neglected to put the necessary systems in place to gather data, or because the data it collects is fragmented and scattered across the organization, you’ll first have to spend some time gathering whatever data you need to do your job. That means having discussions with relevant stakeholders and getting the necessary credentials to access databases within your organization.
      ● Data Preparation: Once you have access to data, you’ll need to spend some time cleaning and formatting it. This is where data science can often become more of an art than a science. Unlike the datasets you’ll find in competitions, real-world datasets are very messy. Missing values, errors in data collection, data formatting, normalization, outliers - these are all issues that you’ll have to learn to deal with.
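The Data Preparation step above can be sketched with pandas. This is only one possible approach under stated assumptions: the `sqft` column and its values are hypothetical, and median imputation, the 1.5 × IQR outlier rule, and min-max scaling are common conventions chosen for illustration, not methods named in the slides.

```python
import pandas as pd

# Made-up raw data: one missing value and one obvious outlier.
raw = pd.DataFrame({"sqft": [850.0, 900.0, None, 1000.0, 20000.0]})

# 1. Missing values: fill with the column median (less sensitive to outliers
#    than the mean).
clean = raw.copy()
clean["sqft"] = clean["sqft"].fillna(clean["sqft"].median())

# 2. Outliers: keep only rows within 1.5 * IQR of the quartiles.
q1, q3 = clean["sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["sqft"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

# 3. Normalization: min-max scale the column to the [0, 1] range.
lo, hi = clean["sqft"].min(), clean["sqft"].max()
clean["sqft_scaled"] = (clean["sqft"] - lo) / (hi - lo)

print(clean)
```

Whether to impute, drop, or flag such rows depends on the problem; the order of the steps also matters, since an outlier left in place would distort the scaling.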
  • 18. Data Science Steps
      ● Exploration: Before diving into building any models, you’ll want to explore the data to try to glean some insights. Clustering algorithms, scatterplots, bar graphs, and Chernoff faces are all interesting ways of visualizing data that will lead to a better understanding of its structure and aid you in the model building step.
      ● Model Building: With your data cleaned and formatted, you’ll have an opportunity to explore a variety of models to see which one works best. Random Forests, SVMs, Bayesian Predictors, Neural Networks, Deep Learning, K-Nearest Neighbours - these are all models you should familiarize yourself with. No one model fits all problems, so you will again need to develop intuition about which model suits your particular problem.
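One hedged way to act on the "no one model fits all" point above is to score a few candidate models on the same data with cross-validation. The sketch below uses scikit-learn on a synthetic dataset; the dataset, the two models, and their settings are assumptions picked for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data (stands in for a cleaned, formatted dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Two of the model families mentioned above, with illustrative settings.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in mean_accuracy.items():
    print(f"{name}: {score:.3f}")
```

A comparison like this is a starting point; which model wins on synthetic data says little about a real problem.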
  • 19. Data Science Steps
      ● Model Validation: Prediction accuracy is a common benchmark for whether your model is performing well, but there are often other evaluation metrics to consider. False positives and false negatives are important to think about from the perspective of the problem you’re working on. If you’re predicting disease, you’ll care more about minimizing false negatives, since one may result in a person’s death, whereas a false positive will only lead to additional testing.
      ● Model Deployment: Finally, you’ll deploy your model into the wild. As you gather more data and feedback on how it’s doing, you’ll be able to tweak and improve it over time.
      * This is by no means a comprehensive list of steps, and there are certainly other things you’ll need to do to do well in your job. However, it is a good high-level overview of the steps involved in tackling data science problems.
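To make the Model Validation point above concrete, the following sketch counts false positives and false negatives by hand and compares accuracy with recall. The labels are invented for illustration (1 = has the disease, 0 = healthy), and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Made-up labels: 3 sick patients, 7 healthy ones.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)      # share of sick patients the model caught
precision = tp / (tp + fp)   # share of positive predictions that were right

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Here the model is right 70% of the time yet misses two of the three sick patients, which is why recall matters more than accuracy in the disease example.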
  • 20. Case Study: Building a regression model to predict housing prices
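The case study slides themselves are not part of this transcript, so the following is not the speaker's model. It is a minimal sketch of what a housing-price regression can look like, assuming a single hypothetical feature (square footage) and made-up, noise-free prices, fitted by ordinary least squares with NumPy.

```python
import numpy as np

# Made-up data: price (in thousands) generated as 50 + 0.1 * sqft, no noise.
sqft = np.array([800.0, 1000.0, 1200.0, 1500.0, 2000.0])
price = 50.0 + 0.1 * sqft

# Design matrix with a column of ones so the fit includes an intercept.
X = np.column_stack([np.ones_like(sqft), sqft])

# Ordinary least squares: find coefficients minimizing ||X @ coef - price||.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = coef


def predict(square_feet):
    """Predicted price (in thousands) for a home of the given size."""
    return intercept + slope * square_feet


print(f"intercept={intercept:.2f} slope={slope:.4f}")
print(f"predicted price for 1800 sqft: {predict(1800.0):.1f}")
```

Real housing data would have noise, several features, and a held-out test set for validation; this only shows the fitting and prediction mechanics.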
  • 24. Resources
      ● Podcasts: Data Skeptic; Data Stories; Learning Machines 101; Linear Digressions; O’Reilly Data Show; Talking Machine; This week in Machine Learning and AI; Siraj Raval (Youtube)
      ● Websites/Blogs: Dataquest.io; Kaggle.com; Quora.com; analyticsvidhya.com; Coursera.org; https://developers.google.com/machine-learning/crash-course/; https://portal.azure.com/; https://www.luis.ai/
      ● Communities: experfylabs.slack.com/messages/C0L736X36/; opendatacommunity.slack.com; dcommunity.slack.com; kagglenoobs.slack.com; pythondev.slack.com
  • 25. Resources
      ● Books: Hands-On Machine Learning with Scikit-Learn and TensorFlow; Python Machine Learning, 1st Edition; Everybody Lies: Big Data, New Data; Weapons of Math Destruction; Big Data: A Revolution That Will Transform How We Live, Work, and Think; Turing: Pioneer of the Information Age; Avogadro Corp; Code: The Hidden Language of Computer Hardware and Software; Superintelligence: Paths, Dangers, Strategies; Visual Explanations: Images and Quantities, Evidence and Narrative; Pattern Recognition and Machine Learning (Information Science and Statistics); Storytelling with Data: A Data Visualization Guide for Business Professionals; An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani
      ● TV Shows/Documentaries: Humans (2015-); Person of Interest; Intelligence; Minority Report; Almost Human; Robot & Frank; Her; Black Mirror; I, Robot; Ex Machina; The Secret Rules of Modern Living: Algorithms