Machine learning
Vladimir Alekseichenko
"Rocket science" or daily bread?
Changes over time
10 min each
36,500,000 minutes
~70 years
Driver vs. mechanic
dataworkshop.eu
Bike Sharing Demand
Task - kaggle
Solution - github.com/dataworkshop
Understand Business & Data
Read and explore data
Feature Engineering
Create new features based on existing ones
Feature Selection
Select only useful features
Model Selection
Find the best model(s): model A, model B, model C, model D, model E
Tuning Hyperparameters
Find the best hyperparameters for a given model
Ensemble Modeling
Combine a few models into one better model: 0.6 x model B + 0.4 x model E

datetime             season  temp   count
2011-01-01 08:32:02  1       9.23   5
2012-04-02 12:10:00  2       18.78  32
2012-08-07 15:47:01  3       15.45  15

datetime             season  temp   hour  day  month  …  count  count_log
2011-01-01 08:32:02  1       9.23   8     1    1      …  5      1.609
2012-04-02 12:10:00  2       18.78  12    2    4      …  32     3.466
2012-08-07 15:47:01  3       15.45  15    7    8      …  15     2.708
Understand the business and the data
Working days
Weekend
Feature engineering
• quantitative => ranges: 1 to 10, 11 to 20…
• dates => day, month, year, hour, is it a weekend…
• categorical/qualitative (red, green, white)
  • assign a numeric identifier (1, 2, 3)
  • create n binary columns (is it red? etc.)
  • probabilities with respect to the target variable
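The transformations listed above can be sketched with the Python standard library alone. This is a minimal illustration rather than the code from the talk's repository: the three rows are copied from the sample table in the slides, while the helper names (`date_features`, `bin_label`, `label_ids`, `one_hot`) and the colour list are assumptions made for the example. The natural logarithm is used for `count_log` because it reproduces the values shown in the slides (1.609, 3.466, 2.708).

```python
import math
from datetime import datetime

# Rows copied from the slide's sample table: (datetime, season, temp, count).
rows = [
    ("2011-01-01 08:32:02", 1, 9.23, 5),
    ("2012-04-02 12:10:00", 2, 18.78, 32),
    ("2012-08-07 15:47:01", 3, 15.45, 15),
]


def date_features(stamp):
    """Split a timestamp string into the calendar features named on the slide."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "year": dt.year,
        "is_weekend": dt.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
    }


def bin_label(value, width=10):
    """Map a positive integer to a range label such as '1-10' or '11-20'."""
    low = ((value - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def label_ids(values):
    """Assign 1, 2, 3... to categories in order of first appearance."""
    ids = {}
    for value in values:
        ids.setdefault(value, len(ids) + 1)
    return ids


def one_hot(values):
    """Create one binary column per category ("is it red?" and so on)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


engineered = []
for stamp, season, temp, count in rows:
    row = {"season": season, "temp": temp, "count": count}
    row.update(date_features(stamp))
    row["count_bin"] = bin_label(count)
    row["count_log"] = round(math.log(count), 3)  # natural log, as in the slide
    engineered.append(row)

# Hypothetical categorical column, using the colours from the slide.
colors = ["red", "green", "white", "red"]
color_ids = label_ids(colors)
color_columns = one_hot(colors)

for row in engineered:
    print(row)
print(color_ids)
print(color_columns[0])
```

Numeric identifiers imply an ordering that the categories may not have, which is one reason the binary-column variant is listed as an alternative.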
Feature selection
• The fewer the better (a simpler model)
• Keep the most valuable ones (ideally just one :)
• Features are (usually) dependent on each other, so be careful… (verify empirically)
• Faster
Variance
Univariate
Recursive
xgbfir
https://github.com/limexp/xgbfir
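The first three approaches named above (variance, univariate, recursive) have counterparts in scikit-learn, and a rough sketch of each follows. The data is synthetic and invented for the example: one constant column, two informative columns and one pure-noise column. The choice of `f_regression` and `LinearRegression` is likewise an assumption; the slides do not say which scorer or estimator was used. xgbfir is not shown here.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_regression
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the bike-sharing set).
rng = np.random.default_rng(0)
n = 200
constant = np.ones(n)            # never changes, so it carries no information
x1 = rng.normal(size=n)          # strongly related to the target
x2 = rng.normal(size=n)          # more weakly related to the target
noise = rng.normal(size=n)       # unrelated to the target
X = np.column_stack([constant, x1, x2, noise])
y = 3.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

# Variance: drop features whose values never change.
variance_filter = VarianceThreshold(threshold=0.0)
X_var = variance_filter.fit_transform(X)
kept_by_variance = variance_filter.get_support()

# Univariate: score each remaining feature against the target on its own.
univariate = SelectKBest(score_func=f_regression, k=2).fit(X_var, y)
kept_by_univariate = univariate.get_support()

# Recursive: refit a model repeatedly, dropping the weakest feature each round.
recursive = RFE(LinearRegression(), n_features_to_select=1).fit(X_var, y)
kept_by_rfe = recursive.get_support()

print("variance filter keeps:  ", kept_by_variance)
print("univariate (k=2) keeps: ", kept_by_univariate)
print("recursive (1 left) keeps:", kept_by_rfe)
```

Univariate scoring looks at one feature at a time, so it can misjudge features that only matter in combination; that is the dependence the slide warns about, and the reason to check the result empirically.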
Model selection
• Linear
• Decision Tree
• Random Forest
• Gradient Boosting
• Neural Network
Linear
https://github.com/dataworkshop/model_evaluation/blob/master/step1-regression.ipynb
Decision Tree
http://xgboost.readthedocs.io/en/latest/model.html
Ensemble trees
http://xgboost.readthedocs.io/en/latest/model.html
Ensemble trees
• Bagging (bootstrap aggregation)
• Random Forest
• Extra Trees
• Boosting
• Gradient Boosting
XGBoost
(Extreme Gradient Boosting)
“When in doubt, use xgboost” - Owen Zhang
Choosing a model
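One common way to choose among the model families listed above is to score each of them with the same cross-validation split and compare. The sketch below assumes scikit-learn and a synthetic demand curve (peak at midday plus a temperature effect) invented for the example; it is not the Kaggle data, and the neural network and XGBoost are left out to keep the dependencies small.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for hourly bike demand (an assumption for illustration).
rng = np.random.default_rng(42)
n = 400
hour = rng.integers(0, 24, size=n)
temp = rng.uniform(0, 30, size=n)
demand = 100 - 80 * np.cos(2 * np.pi * hour / 24) + 3 * temp + rng.normal(0, 10, size=n)
X = np.column_stack([hour, temp])

candidates = {
    "linear": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Mean R^2 over 5 folds for every candidate; higher is better.
scores = {
    name: cross_val_score(model, X, demand, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:18s} R^2 = {score:.3f}")
print("best:", best_name)
```

On this data the linear model cannot follow the daily cycle in the raw `hour` column, while the tree-based models can, which is the kind of difference such a comparison is meant to expose.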
Hyperparameter tuning
• Grid Search
• Random Search
• Bayesian
hyperopt
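Grid search and random search can both be sketched with scikit-learn, as below. The estimator, the parameter ranges and the synthetic data are assumptions chosen for the example. Bayesian optimisation with hyperopt follows the same pattern of proposing parameters and scoring them, but it is not shown here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for hourly bike demand (an assumption for illustration).
rng = np.random.default_rng(7)
n = 300
hour = rng.integers(0, 24, size=n)
temp = rng.uniform(0, 30, size=n)
demand = 100 - 80 * np.cos(2 * np.pi * hour / 24) + 3 * temp + rng.normal(0, 10, size=n)
X = np.column_stack([hour, temp])

# Grid search: evaluate every combination in a small, explicit grid.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]},
    cv=5,
    scoring="r2",
)
grid.fit(X, demand)

# Random search: evaluate a fixed number of combinations drawn from wider ranges.
random_search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_distributions={
        "max_depth": list(range(1, 21)),
        "min_samples_leaf": list(range(1, 41)),
    },
    n_iter=15,
    cv=5,
    scoring="r2",
    random_state=0,
)
random_search.fit(X, demand)

print("grid search:  ", grid.best_params_, round(grid.best_score_, 3))
print("random search:", random_search.best_params_, round(random_search.best_score_, 3))
```

The grid here costs 9 combinations times 5 folds; the random search covers a space of 800 combinations with only 15 trials, which is why it tends to scale better as the number of hyperparameters grows.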
Ensemble modeling
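The pipeline diagram combines two models with weights 0.6 and 0.4. A weighted average of predictions is one simple way to read that, sketched below in plain Python. The prediction values for "model B" and "model E" are made up for the example, and the `blend` helper is a hypothetical name, not something taken from the talk.

```python
def blend(predictions, weights):
    """Return the weighted average of several models' predictions, row by row."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(n_rows)
    ]


# Hypothetical predictions from two already-trained models.
model_b = [10.0, 30.0, 20.0]
model_e = [20.0, 40.0, 10.0]

# Weights from the slide: 0.6 for model B, 0.4 for model E.
blended = blend([model_b, model_e], [0.6, 0.4])
print(blended)
```

In practice the weights would themselves be chosen on held-out data, since weights tuned on the training set tend to favour whichever model overfits most.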
Neuron
(Artificial) Neural Network
MNIST
Data
Neural Network
Error: 1.60%
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
source
Challenges
Overfitting
http://mlwiki.org/index.php/Overfitting
Cross-validation
http://blog.goldenhelix.com/bchristensen/cross-validation-for-genomic-prediction-in-svs/
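Cross-validation makes overfitting visible by comparing the score on the folds a model was trained on with the score on the fold it did not see. The sketch below assumes scikit-learn and a noisy sine curve invented for the example; it contrasts an unrestricted decision tree with one limited to depth 3.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(0, 2 * np.pi, size=300)
y = np.sin(x) + rng.normal(0, 0.5, size=300)
X = x.reshape(-1, 1)

# Five folds: each model is trained on 4/5 of the rows and scored on the rest.
folds = KFold(n_splits=5, shuffle=True, random_state=0)


def train_and_validation_r2(max_depth):
    """Return mean R^2 on the training folds and on the held-out folds."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    result = cross_validate(
        model, X, y, cv=folds, scoring="r2", return_train_score=True
    )
    return result["train_score"].mean(), result["test_score"].mean()


deep_train, deep_cv = train_and_validation_r2(None)   # grows until it memorises
shallow_train, shallow_cv = train_and_validation_r2(3)

print(f"unrestricted tree: train R^2 = {deep_train:.2f}, held-out R^2 = {deep_cv:.2f}")
print(f"depth-3 tree:      train R^2 = {shallow_train:.2f}, held-out R^2 = {shallow_cv:.2f}")
```

A large gap between the training score and the held-out score is the usual symptom of overfitting; the unrestricted tree fits the training folds almost perfectly yet does worse on unseen rows than the simpler tree.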
Creativity is worth a lot
https://techcrunch.com/2016/11/19/how-data-science-and-rocket-science-will-get-humans-to-mars
source
The wave is already coming…
are you ready?
Thank you
@slon1024
hello@vova.me
dataworkshop.eu

AIMeetup #3: Machine learning - rocket science or daily bread?

  • 1. Uczenie maszynowe Vladimir Alekseichenko „rocket science” czy chleb powszedni?
  • 3.
  • 4.
  • 5.
  • 6. 10min na jeden 36 500 000 minut ~70 lat
  • 7.
  • 10. Bike Sharing Demand Zadnie - kaggle Rozwiązanie - github.com/dataworkshop
  • 11. Understand Business & Data Read and explore data Feature Engineering Create a new ones based on already exists Feature Selection Select only useful features Model Selection Find the best model(s) model A model B model C model D model E Tuning Hyperparameters Find the best hyperparameters for given model Ensemble Modeling Combine few models into one more better x0.6 x0.4+ mode l B mode l E datetime season temp count 2011-01-01 08:32:02 1 9.23 5 2012-04-02 12:10:00 2 18.78 32 2012-08-07 15:47:01 3 15.45 15 datetime season temp hour day month … count count_log 2011-01-01 08:32:02 1 9.23 8 1 1 … 5 1.609 2012-04-02 12:10:00 2 18.78 12 2 4 … 32 3.466 2012-08-07 15:47:01 3 15.45 15 7 8 … 15 2.708 mode l B mode l E datetime season temp hour day month … count count_log 2011-01-01 08:32:02 1 9.23 8 1 1 … 5 1.609 2012-04-02 12:10:00 2 18.78 12 2 4 … 32 3.466 2012-08-07 15:47:01 3 15.45 15 7 8 … 15 2.708
  • 13. Understand the Business and the Data
  • 18. Feature engineering
      • quantitative => ranges: 1 to 10, 11 to 20…
      • dates => day, month, year, hour, is it a weekend…
      • categorical/qualitative (red, green, white)
          • assign a numeric identifier (1, 2, 3)
          • create n binary columns (is it red? etc.)
          • probabilities with respect to the target variable
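The transformations on this slide can be sketched in plain Python. This is an illustrative sketch, not the speaker's code: all function names are hypothetical, and the "probabilities" item is interpreted here as the mean of a binary target per category.

```python
from datetime import datetime


def bin_quantitative(value, width=10):
    """Map a positive integer to a range label such as '1-10' or '11-20'."""
    low = ((value - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def date_features(ts):
    """Split a 'YYYY-MM-DD HH:MM:SS' timestamp into calendar features."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "is_weekend": dt.weekday() >= 5,  # Saturday = 5, Sunday = 6
    }


def label_encode(values):
    """Assign numeric ids (1, 2, 3, ...) in order of first appearance."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids) + 1)
    return [ids[v] for v in values]


def one_hot(values):
    """Create one binary column per distinct category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def target_rate(values, targets):
    """Replace each category with the mean of a binary target within it."""
    totals, counts = {}, {}
    for v, t in zip(values, targets):
        totals[v] = totals.get(v, 0) + t
        counts[v] = counts.get(v, 0) + 1
    return [totals[v] / counts[v] for v in values]


colors = ["red", "green", "white", "red"]
encoded = label_encode(colors)
binary_columns = one_hot(colors)
rates = target_rate(colors, [1, 0, 1, 0])
```

In practice, target-based rates are usually computed on training data only, so that the target does not leak into validation rows.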
  • 21. Feature selection
      • The fewer, the better (simpler model)
      • Keep the most valuable ones (ideally just one :)
      • Features are (usually) dependent on each other, so be careful… (verify empirically)
      • Faster
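One simple heuristic consistent with these points is a correlation filter: rank features by their absolute correlation with the target and skip any feature that is nearly collinear with one already kept. This is only a sketch of one possible approach, not the method from the talk; the feature values and the 0.95 threshold are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, target, redundancy_threshold=0.95):
    """Greedy filter: strongest features first, dropping near-duplicates."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[k])) >= redundancy_threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy values (illustrative only): atemp closely tracks temp.
features = {
    "temp": [9, 18, 15, 22, 12],
    "atemp": [10, 18, 16, 22, 12],
    "windspeed": [3, 1, 4, 1, 5],
}
target = [10, 28, 22, 36, 16]

selected = select_features(features, target)
```

Because the features depend on each other, a filter like this is a starting point at best; as the slide says, the final choice should be checked empirically against validation error.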
  • 25. Model selection
      • Linear
      • Decision Tree
      • Random Forest
      • Gradient Boosting
      • Neural Network
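Choosing among candidate models usually comes down to comparing them on held-out data. The sketch below compares two deliberately tiny models (a mean baseline and a one-variable linear fit) with k-fold cross-validation; the data and the helper names are hypothetical, and the same loop would apply to the model families listed on the slide.

```python
def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=3):
    """Mean squared error averaged over k folds (fold i holds out every k-th point)."""
    errors = []
    pairs = list(zip(xs, ys))
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.append(sum((model(x) - y) ** 2 for x, y in held_out) / len(held_out))
    return sum(errors) / k


# Toy data with a roughly linear trend (illustrative only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

candidates = {"mean baseline": fit_mean, "linear": fit_linear}
scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
```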
  • 29. Ensemble trees
      • Bagging (bootstrap aggregation)
          • Random Forest
          • Extra Trees
      • Boosting
          • Gradient Boosting
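Bagging can be sketched in a few lines: train each model on a bootstrap sample (drawn with replacement) and average the predictions. The example below uses one-split regression stumps as the base model to stay short; this is an illustrative assumption, since Random Forest and Extra Trees use full trees and add further randomization on top.

```python
import random


def fit_stump(xs, ys):
    """Regression stump: choose the split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # splitting at the maximum leaves nothing on the right
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def bagging(xs, ys, n_models=25, seed=0):
    """Average the predictions of stumps trained on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Toy step-shaped data (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
ensemble = bagging(xs, ys)
```

Boosting differs in that models are trained sequentially, each one focusing on the errors of the ensemble so far, rather than independently on resampled data.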
  • 30. XGBoost (Extreme Gradient Boosting): “When in doubt, use xgboost” (Owen Zhang)
  • 33. Hyperparameter tuning
      • Grid Search
      • Random Search
      • Bayesian
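The first two strategies can be sketched side by side. Grid search evaluates every combination of a fixed set of values, while random search samples combinations from ranges. The `validation_error` function below is a hypothetical stand-in for "train a model and score it on validation data", and the hyperparameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random


def validation_error(learning_rate, depth):
    """Hypothetical objective with its minimum at learning_rate=0.1, depth=4."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


def grid_search(grid):
    """Try every combination in the grid; return (best_params, best_error)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        err = validation_error(**params)
        if best is None or err < best[1]:
            best = (params, err)
    return best


def random_search(n_trials=50, seed=0):
    """Sample random combinations; return (best_params, best_error)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.5),
            "depth": rng.randint(2, 8),  # inclusive on both ends
        }
        err = validation_error(**params)
        if best is None or err < best[1]:
            best = (params, err)
    return best


grid = {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6]}
grid_best = grid_search(grid)
random_best = random_search()
```

Bayesian optimization, the third item, instead uses the results of earlier trials to decide which combination to try next; it is not shown here.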
  • 40. MNIST
  • 41. Data
  • 50. The wave is already coming… are you ready?