SlideShare a Scribd company logo
1 of 25
Download to read offline
©2018 dataiku, Inc.
Applied Data Science Online Course
1st Class: Learning the Basics,
concepts, & your first ML model
©2018 dataiku, Inc.
● September 20th at 12PM ET: Learning the Basics, concepts, & your first ML model
● September 27th at 12PM ET: The data science workflow, building a predictive model flow
● October 4th at 12PM ET: Getting dirty; data preparation and feature creation
● October 11th at 12PM ET: Understanding your model - and communicating about it
Curriculum
Go from Small to Big Data in 4 weeks
©2018 dataiku, Inc.
> Intro (that’s now)
> Background: going from small to big data
> Machine Learning definitions & basic concepts
> The data science workflow
> Questions
> Hands-on exercise: Titanic Prediction data
The plan for today
Learning the Basics, concepts, & your first ML model
©2018 dataiku, Inc.
Going from Small to Big Data
©2018 dataiku, Inc.
Local processing
● Limited Power (100k lines)
● Downloading and opening csv or xls on a local place
● Not distributed
Database processing
● Can process billions of lines
● File are not stored and process in the same space
than co-worker
● Distributed analysis
Local processing vs. Database processing
Going from Small to Big Data
©2018 dataiku, Inc.
The basic element you’re working on when you’re modifying data in Excel is the cell.
When you’re working with data from a database, your basic element is a column.
Whether you’re cleaning your data or enriching it with new variables, you’ll be creating new columns in new datasets,
never changing one line of a file at a time.
Cell-to-cell Modifications vs. Mass Actions
Going from Small to Big Data
VS.
©2018 dataiku, Inc.
Potential pain points for analysts to transition
Going from Small to Big Data
Interacting
with database
How to connect to
Amazon Web
Service? Hadoop?
How to extract and
transform the data
1 2 4
Collaboration
with other
profile
How to benefit and
interact with the
works of a Data
Scientist or Data
Engineer?
3
Working with
Big Data
Extract and
Transform my data
on very large files
Advanced
Analytics
How to create
Machine Learning
Models without
coding skills? And to
handle Geography,
Time series…?
©2018 dataiku, Inc.
Concepts and Definitions
©2018 dataiku, Inc.
Data Science: An interdisciplinary field that uses scientific
methods, processes, algorithms and systems to extract
knowledge and insights from data in various forms, both
structured and unstructured (wikipedia)
What are we talking about?
Definitions
©2018 dataiku, Inc.
Machine Learning: A field of study focused on constructed systems that learn from large amount of data to make
predictions or find relations.
What are we talking about?
Definitions
©2018 dataiku, Inc.
Different types of Machine Learning
Definitions
Supervised Unsupervised
Data is labeled, algorithm predicts an
output from the input data
Data isn’t labeled, algorithm learns the
inherent structure of the data and makes a
prediction
Examples:
• Predicting the genre of a song based on a
label
Examples:
• Predicting the genre of a song without a label
©2018 dataiku, Inc.
Different types of Machine Learning
Definitions
Prediction Clustering
Goal: Create a model that can predict a
target variable
Goal: Separate data into clusters based on
similarity (no specific target)
Examples:
• Predict the sales price of an apartment
• Forecast the winner of an election
• Diagnose a disease
Examples:
• Find groups of similar apartments
• Segment voters into demographic groups
• Group diseases based on symptoms
©2018 dataiku, Inc.
Different types of Prediction
Definition
If target is (continuous) then regression
If target is (discrete) then classification
Ex: predicting price of airline tickets
Ex: predicting fraud
©2018 dataiku, Inc.
Different types of Prediction
Definition
©2018 dataiku, Inc.
Different types of Machine Learning
Examples
Predicting mortgage defaults
Forecasting lifetime spending of customer
Grouping songs into genres
Predicting amount of snowfall
Segmenting website visitors
Recommending movies to Netflix users
Detecting unusual financial transactions
Prediction Clustering
(classification)
(regression)
(regression)
(classification)
(regression)
©2018 dataiku, Inc.
What’s in a dataset
Definitions
Feature
Observation
● Types of data
©2018 dataiku, Inc.
Train, test, validate
If performance on test set starts to decline, think about retraining your model
Training set
Used to create your model
Validation set
Used to measure performance
Testing set
Used to check model performance
after deployment
©2018 dataiku, Inc.
Train, test, validate
Random or Time based
Training set Validation set Test set
70%
20%
10%
©2018 dataiku, Inc.
The Data Science Workflow
©2018 dataiku, Inc.
7 steps of a data projects
The Data Science Workflow
©2018 dataiku, Inc.
Advanced version of the workflow
The Data Science Workflow
Data Acquisition &
Understanding
Data Preparation Model Creation
Evaluation Deployment
Dataset 1
Scored
dataset
Scored
dataset
Iteration 1
Iteration 2
Iteration n
Dataset 2
Dataset n
Business
Understanding
©2018 dataiku, Inc.
Qu s o s?
©2018 dataiku, Inc.
Hands-on
©2018 dataiku, Inc.
Kaggle Titanic Challenge
Titanic Use Case
Predicting who survived the tragedy
©2018 dataiku, Inc.
About Dataiku - Your Path to Enterprise AI

More Related Content

What's hot

Main principles of Data Science and Machine Learning
Main principles of Data Science and Machine LearningMain principles of Data Science and Machine Learning
Main principles of Data Science and Machine LearningNikolay Karelin
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science IntroductionGang Tao
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AINeo4j
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018TigerGraph
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...SoftServe
 
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data Platform
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data PlatformActionable Carbon Tracking and Analysis with the Neo4j Graph Data Platform
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data PlatformNeo4j
 
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMachine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMaris R
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data VisualizationRaffael Marty
 
Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation PlatformKarthik Murugesan
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing Luay AL-Assadi
 

What's hot (20)

Main principles of Data Science and Machine Learning
Main principles of Data Science and Machine LearningMain principles of Data Science and Machine Learning
Main principles of Data Science and Machine Learning
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science Introduction
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
 
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data Platform
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data PlatformActionable Carbon Tracking and Analysis with the Neo4j Graph Data Platform
Actionable Carbon Tracking and Analysis with the Neo4j Graph Data Platform
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMachine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation Platform
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
 

Similar to Applied Data Science Course Part 1: Concepts & your first ML model

Nonprofits + Data: Pathway to Innovation
Nonprofits + Data: Pathway to InnovationNonprofits + Data: Pathway to Innovation
Nonprofits + Data: Pathway to InnovationTim Sarrantonio
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
Cloudera Cares + DataKind | 7 May 2015 | London, UK
Cloudera Cares + DataKind | 7 May 2015 | London, UKCloudera Cares + DataKind | 7 May 2015 | London, UK
Cloudera Cares + DataKind | 7 May 2015 | London, UKCloudera, Inc.
 
CC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsCC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsSebastian Dennerlein
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilDataMites
 
Certified Data Analyst Course in Chennai-April
Certified Data Analyst Course in Chennai-AprilCertified Data Analyst Course in Chennai-April
Certified Data Analyst Course in Chennai-AprilDataMites
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilDataMites
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilDataMites
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data SciencePouria Amirian
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data SciencePouria Amirian
 
Data Analytics Course in Coimbatore-April
Data Analytics Course in Coimbatore-AprilData Analytics Course in Coimbatore-April
Data Analytics Course in Coimbatore-AprilDataMites
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data AnalyticsUtkarsh Sharma
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptxXanGwaps
 

Similar to Applied Data Science Course Part 1: Concepts & your first ML model (20)

Nonprofits + Data: Pathway to Innovation
Nonprofits + Data: Pathway to InnovationNonprofits + Data: Pathway to Innovation
Nonprofits + Data: Pathway to Innovation
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Cloudera Cares + DataKind | 7 May 2015 | London, UK
Cloudera Cares + DataKind | 7 May 2015 | London, UKCloudera Cares + DataKind | 7 May 2015 | London, UK
Cloudera Cares + DataKind | 7 May 2015 | London, UK
 
Bigowl aitech
Bigowl aitechBigowl aitech
Bigowl aitech
 
CC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsCC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithms
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-April
 
Certified Data Analyst Course in Chennai-April
Certified Data Analyst Course in Chennai-AprilCertified Data Analyst Course in Chennai-April
Certified Data Analyst Course in Chennai-April
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-April
 
Certified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-AprilCertified Data Analytics Course in Chennai-April
Certified Data Analytics Course in Chennai-April
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Data Analytics Course in Coimbatore-April
Data Analytics Course in Coimbatore-AprilData Analytics Course in Coimbatore-April
Data Analytics Course in Coimbatore-April
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 

More from Dataiku

Applied Data Science Course Part 2: the data science workflow and basic model...
Applied Data Science Course Part 2: the data science workflow and basic model...Applied Data Science Course Part 2: the data science workflow and basic model...
Applied Data Science Course Part 2: the data science workflow and basic model...Dataiku
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...Dataiku
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) Dataiku
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
 
The US Healthcare Industry
The US Healthcare IndustryThe US Healthcare Industry
The US Healthcare IndustryDataiku
 
How to Build Successful Data Team - Dataiku ?
How to Build Successful Data Team -  Dataiku ? How to Build Successful Data Team -  Dataiku ?
How to Build Successful Data Team - Dataiku ? Dataiku
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Dataiku
 
04Juin2015_Symposium_Présentation_Coyote_Dataiku
04Juin2015_Symposium_Présentation_Coyote_Dataiku 04Juin2015_Symposium_Présentation_Coyote_Dataiku
04Juin2015_Symposium_Présentation_Coyote_Dataiku Dataiku
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015 Dataiku
 
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015Dataiku
 
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team
Dataiku -  Big data paris 2015 - A Hybrid Platform, a Hybrid Team Dataiku -  Big data paris 2015 - A Hybrid Platform, a Hybrid Team
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team Dataiku
 
The paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHThe paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHDataiku
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuDataiku
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku
 
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Dataiku
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...Dataiku
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku
 
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages JaunesBreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages JaunesDataiku
 

More from Dataiku (20)

Applied Data Science Course Part 2: the data science workflow and basic model...
Applied Data Science Course Part 2: the data science workflow and basic model...Applied Data Science Course Part 2: the data science workflow and basic model...
Applied Data Science Course Part 2: the data science workflow and basic model...
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...Dataiku  -  data driven nyc  - april  2016 - the  solitude of the data team m...
Dataiku - data driven nyc - april 2016 - the solitude of the data team m...
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
The US Healthcare Industry
The US Healthcare IndustryThe US Healthcare Industry
The US Healthcare Industry
 
How to Build Successful Data Team - Dataiku ?
How to Build Successful Data Team -  Dataiku ? How to Build Successful Data Team -  Dataiku ?
How to Build Successful Data Team - Dataiku ?
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem
 
04Juin2015_Symposium_Présentation_Coyote_Dataiku
04Juin2015_Symposium_Présentation_Coyote_Dataiku 04Juin2015_Symposium_Présentation_Coyote_Dataiku
04Juin2015_Symposium_Présentation_Coyote_Dataiku
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015
 
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015
Coyote & Dataiku - Séminaire Dixit GFII du 13 04-2015
 
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team
Dataiku -  Big data paris 2015 - A Hybrid Platform, a Hybrid Team Dataiku -  Big data paris 2015 - A Hybrid Platform, a Hybrid Team
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team
 
The paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHThe paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECH
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - Dataiku
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
 
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages JaunesBreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes
 

Recently uploaded

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Applied Data Science Course Part 1: Concepts & your first ML model

  • 1. ©2018 dataiku, Inc. Applied Data Science Online Course 1st Class: Learning the Basics, concepts, & your first ML model
  • 2. ©2018 dataiku, Inc. ● September 20th at 12PM ET: Learning the Basics, concepts, & your first ML model ● September 27th at 12PM ET: The data science workflow, building a predictive model flow ● October 4th at 12PM ET: Getting dirty; data preparation and feature creation ● October 11th at 12PM ET: Understanding your model - and communicating about it Curriculum Go from Small to Big Data in 4 weeks
  • 3. ©2018 dataiku, Inc. > Intro (that’s now) > Background: going from small to big data > Machine Learning definitions & basic concepts > The data science workflow > Questions > Hands-on exercise: Titanic Prediction data The plan for today Learning the Basics, concepts, & your first ML model
  • 4. ©2018 dataiku, Inc. Going from Small to Big Data
  • 5. ©2018 dataiku, Inc. Local processing ● Limited Power (100k lines) ● Downloading and opening csv or xls on a local place ● Not distributed Database processing ● Can process billions of lines ● File are not stored and process in the same space than co-worker ● Distributed analysis Local processing vs. Database processing Going from Small to Big Data
  • 6. ©2018 dataiku, Inc. The basic element you’re working on when you’re modifying data in Excel is the cell. When you’re working with data from a database, your basic element is a column. Whether you’re cleaning your data or enriching it with new variables, you’ll be creating new columns in new datasets, never changing one line of a file at a time. Cell-to-cell Modifications vs. Mass Actions Going from Small to Big Data VS.
  • 7. ©2018 dataiku, Inc. Potential pain points for analysts to transition Going from Small to Big Data Interacting with database How to connect to Amazon Web Service? Hadoop? How to extract and transform the data 1 2 4 Collaboration with other profile How to benefit and interact with the works of a Data Scientist or Data Engineer? 3 Working with Big Data Extract and Transform my data on very large files Advanced Analytics How to create Machine Learning Models without coding skills? And to handle Geography, Time series…?
  • 9. ©2018 dataiku, Inc. Data Science: An interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured (wikipedia) What are we talking about? Definitions
  • 10. ©2018 dataiku, Inc. Machine Learning: A field of study focused on constructed systems that learn from large amount of data to make predictions or find relations. What are we talking about? Definitions
  • 11. ©2018 dataiku, Inc. Different types of Machine Learning Definitions Supervised Unsupervised Data is labeled, algorithm predicts an output from the input data Data isn’t labeled, algorithm learns the inherent structure of the data and makes a prediction Examples: • Predicting the genre of a song based on a label Examples: • Predicting the genre of a song without a label
  • 12. ©2018 dataiku, Inc. Different types of Machine Learning Definitions Prediction Clustering Goal: Create a model that can predict a target variable Goal: Separate data into clusters based on similarity (no specific target) Examples: • Predict the sales price of an apartment • Forecast the winner of an election • Diagnose a disease Examples: • Find groups of similar apartments • Segment voters into demographic groups • Group diseases based on symptoms
  • 13. ©2018 dataiku, Inc. Different types of Prediction Definition If target is (continuous) then regression If target is (discrete) then classification Ex: predicting price of airline tickets Ex: predicting fraud
  • 14. ©2018 dataiku, Inc. Different types of Prediction Definition
  • 15. ©2018 dataiku, Inc. Different types of Machine Learning Examples Predicting mortgage defaults Forecasting lifetime spending of customer Grouping songs into genres Predicting amount of snowfall Segmenting website visitors Recommending movies to Netflix users Detecting unusual financial transactions Prediction Clustering (classification) (regression) (regression) (classification) (regression)
  • 16. ©2018 dataiku, Inc. What’s in a dataset Definitions Feature Observation ● Types of data
  • 17. ©2018 dataiku, Inc. Train, test, validate If performance on test set starts to decline, think about retraining your model Training set Used to create your model Validation set Used to measure performance Testing set Used to check model performance after deployment
  • 18. ©2018 dataiku, Inc. Train, test, validate Random or Time based Training set Validation set Test set 70% 20% 10%
  • 19. ©2018 dataiku, Inc. The Data Science Workflow
  • 20. ©2018 dataiku, Inc. 7 steps of a data projects The Data Science Workflow
  • 21. ©2018 dataiku, Inc. Advanced version of the workflow The Data Science Workflow Data Acquisition & Understanding Data Preparation Model Creation Evaluation Deployment Dataset 1 Scored dataset Scored dataset Iteration 1 Iteration 2 Iteration n Dataset 2 Dataset n Business Understanding
  • 24. ©2018 dataiku, Inc. Kaggle Titanic Challenge Titanic Use Case Predicting who survived the tragedy
  • 25. ©2018 dataiku, Inc. About Dataiku - Your Path to Enterprise AI