Smart approach in development and
deployment process for various ML models
Jelena Pekez (Advanced Analytics Team Lead)
Miloš Josifović (Big Data Architect)
Danijel Ilievski (Senior ML Engineer)
Comtrade System Integration
Introduction
→Since 87% of models are never deployed, all steps should be planned at the
beginning of Data Science Lifecycle (pipeline):
1. Manage
2. Develop
3. Deploy
4. Monitor
→The first goal is to reduce time to production for new ML models
by developing Smart Generic Data Mart(s).
→With Smart Data Mart(s) we can prototype an ML model and evaluate its feasibility.
→The final goal is to generate Production Models and orchestrate them easily.
[Figure: data science lifecycle diagram. Labels: Problem Formulation; Data mart design (ADS); Data Preprocess; Modeling; Results Interpretation; Deployment (PROD. MODEL)]
ADS smart development to support all future ML models
→Planning the Data Mart for the first ML model in a program takes considerable time:
• Collect at high-level all possible future use-cases
• Come up with all relevant and available data sources
• Identify the customer activities the company is interested in
• Combine data from structured and unstructured data sources
• Do extensive feature engineering (text processing, normalization, binning,…)
• Comply with GDPR
• Define proper access rights on selected Data Mart(s)
• Resolve data quality issues at the very beginning to avoid endless reloads
For the next ML model, data scientists can spend more time on creative activities using the developed Analytical Datamarts/Sets (ADS)
Smart generic data mart(s)
→Creating Multipurpose Data Marts:
• Generate list of target features and relevant target events
• Design it so new events can be easily added
• Eliminate data that have no business/use-case value
• Filter out system records - clean data
• Build the initial (starting) base table(s): what is the definition of a customer?
• Aggregate data to different granularity levels to catch behavior trends
• Feature Engineering does indeed make a difference!
Generate new ML training datasets quickly and easily
Data Science requires domain knowledge: it makes a big difference
→How much domain knowledge do I need? Depends.
→Domain knowledge is critical for data preparation, productization and orchestration
→Which data points add value?
→Domain knowledge is necessary in data pre-processing:
• Outlier detection, feature importance, model selection, model evaluation stage...
[Figure: Venn diagram with Data Science at the intersection of Domain Knowledge; Math, Stats & ML; and Computer Science]
You have to get the best of both worlds!
Control your data mart(s) in production
→Steps in data pipeline for data quality check:
• Missing data vs Loaded data - aggregations
• Duplicates – the same records were repeated
• Relative change threshold - increment or decrement in the number of records
• Statistical expected range
• Data drift – target variable distribution
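The checks listed above can be sketched in a few small functions. This is only an illustrative sketch, not the pipeline from the talk: the function names, the 20% change threshold, the 3-sigma range and the 0.05 drift tolerance are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean, stdev

# All names and thresholds below are hypothetical; tune them per data mart.

def load_is_complete(source_count, loaded_count):
    """Missing vs loaded data: compare aggregated record counts."""
    return source_count == loaded_count

def find_duplicates(rows):
    """Return the records (as hashable tuples) that occur more than once."""
    return [row for row, n in Counter(rows).items() if n > 1]

def exceeds_change_threshold(previous_count, current_count, threshold=0.2):
    """Relative increment or decrement in the number of records."""
    change = (current_count - previous_count) / previous_count
    return abs(change) > threshold

def outside_expected_range(history, value, n_sigmas=3.0):
    """Flag a value outside mean +/- n_sigmas * standard deviation of past loads."""
    return abs(value - mean(history)) > n_sigmas * stdev(history)

def target_rate_drift(reference, current, tolerance=0.05):
    """Compare the share of positive (1) targets between two loads."""
    ref_rate = sum(reference) / len(reference)
    cur_rate = sum(current) / len(current)
    return abs(cur_rate - ref_rate) > tolerance

rows = [(1, "2022-10", 50.0), (2, "2022-10", 20.0), (1, "2022-10", 50.0)]
duplicates = find_duplicates(rows)
count_alarm = exceeds_change_threshold(previous_count=1000, current_count=1300)
range_alarm = outside_expected_range([100, 102, 98, 101, 99], 130)
drift_alarm = target_rate_drift([0, 0, 0, 1] * 25, [0, 1] * 50)
```

Each function returns a plain value or boolean, so the checks can be run after every load and their results logged or used to stop the pipeline.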
[Figure: data pipeline]
Example how Generic Data Set can help to focus on
Data Science – Transfer between DWH and Data Lake
→Data on two platforms (DWH – SQL database, Data Lake – Hadoop)
→Data can be transferred among databases:
• Through SQL federation / DB link – with certain specifics/products compatibility
• Via Spark engine (PySpark) to Hadoop
→The aim is to simplify data transfer between platforms so that
Data Scientists can do it on their own, without:
• Dealing with Spark jobs directly
• Managing Hadoop security (Kerberos, read-write permissions, etc.)
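One way to hide the Spark details is a thin wrapper that a Data Scientist calls with just a source and a target table. The sketch below is an assumption about how such a wrapper could look, not the tool from the talk: `transfer_table` and its parameters are hypothetical, and it expects an existing SparkSession that the platform team has already configured and authenticated (Kerberos included). It relies only on Spark's standard JDBC reader and `saveAsTable` writer.

```python
def transfer_table(spark, jdbc_url, source_table, target_table,
                   user, password, mode="overwrite"):
    """Copy one DWH table into a Data Lake table via Spark's JDBC reader.

    `spark` is an already configured SparkSession (hypothetical setup:
    created and authenticated elsewhere, so the caller never touches it).
    """
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)          # DWH connection string
        .option("dbtable", source_table)  # table or subquery to read
        .option("user", user)
        .option("password", password)
        .load()
    )
    # Write the result as a managed table on the Hadoop side.
    df.write.mode(mode).saveAsTable(target_table)
    return target_table
```

A call would then be a one-liner such as `transfer_table(spark, url, "ADS.DS_PAYMENT", "lake.ds_payment", user, password)`, where the connection string and table names are placeholders.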
Speed up writing SQL queries
→ADS → [GENERATE SQL QUERY] → Training/Scoring table
→Query automation for training table
→Input for the Python script, followed by an example of the Python script:
SCHEMA | SOURCE | VAR_IN | VAR_OUT | FUNCTIONS | PERIODS | ZERO EXCLUDE
ADS | DS_PAYMENT | TOTAL_PAYMENT_AMT | TOTAL_PAYMENT_AMT | [MAX, AVG/P] | [3, 6] | 1
ADS | DS_PAYMENT | TOTAL_PAYMENT_CNT | TOTAL_PAYMENT_CNT | [SUM] | [1] | 1
ADS | DS_PAYMENT | MAX_PAYMENT_AMT | MAX_PAYMENT_AMT | [MAX] | [3] | 1
ADS | DS_PAYMENT | MIN_PAYMENT_AMT | MIN_PAYMENT_AMT | [MIN] | [3] | 1
ADS | DS_PAYMENT | ADD_PAYMENT_CNT | ADD_PAYMENT_CNT | [AVG/P] | [6] | 1
ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [SUM] | [1] | 1
ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [AVG/P, MAX, MIN] | [3, 6] | 1
ADS | DS_USAGE | USAGE_OUT_IN_PACK_DUR | USAGE_OUT_IN_PACK_DUR | [SUM] | [1] | 1
ADS | DS_USAGE | NVL(USAGE_OUT_REG_INT_DUR, 0) + NVL(USAGE_OUT_INT_DUR,0) | USAGE_OUT_INT_DUR | [AVG/P] | [6] | 1
for i, line in enumerate(variables):
    for i2, k in enumerate(line[2]):  # function
        for i3, kk in enumerate(line[3]):  # period
            # No trailing comma after the very last column
            if (i == len(variables) - 1) and (i2 == len(line[2]) - 1) and (i3 == len(line[3]) - 1):
                zarez = ''
            else:
                zarez = ','
            # Builds the aggregation column, e.g. AVG(FIELD_NAME) AS NEW_FIELD_NAME
            divider = ''
            if k.upper() == 'AVG/P':
                # Average per period: SUM over the window divided by its length
                func1 = 'SUM'
                func2 = '_' + 'AVG'
                divider = '/' + str(kk)
            elif (k.upper() == 'SUM') and (str(kk) == '1'):
                func1 = 'SUM'
                func2 = ''
            else:
                func1 = k
                func2 = '_' + k
            query += (func1 + '(' + line[1] + '_' + str(kk) + 'M' + ')' + divider + ' AS ' + line[1] + func2 + '_' + str(kk) + 'M' + zarez + ' \n')
…
for i, line in enumerate(variables):
    for i2, line2 in enumerate(line[3]):
        if (i == len(variables) - 1) and (i2 == len(line[3]) - 1):
            zarez = ''
        else:
            zarez = ','
        # Optionally exclude zero values from the window
        if line[4] == 1:
            zero_rule = 'AND {varijabla} <> 0'.format(varijabla=line[0])
        else:
            zero_rule = ''
        query += ("CASE WHEN TIME_ID BETWEEN ADD_MONTHS('{datum_place}', {vreme2}) AND "
                  "'{datum_place}' {zero_rule} THEN {varijabla} ELSE NULL END AS "
                  "{varijabla2}_{vreme}M{zarez_place}".format(
                      varijabla=line[0], varijabla2=line[1], datum_place=datum,
                      vreme2=-1 * (int(line2) - 1), zero_rule=zero_rule,
                      vreme=line2, zarez_place=zarez)) + ' \n'
query += "FROM\n"
…
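Since the script on the slide is only shown in part, here is a self-contained sketch of the same idea that can be run end to end. It is a reconstruction under assumptions, not the authors' script: the row layout (VAR_IN, VAR_OUT, FUNCTIONS, PERIODS, ZERO_EXCLUDE) is inferred from the input table, while the function names, the `CUSTOMER_ID` key, the reference date and the way the outer and inner SELECT are joined are hypothetical, because the FROM clause is cut off on the slide.

```python
def aggregate_columns(variables):
    """Outer SELECT list: one aggregate per (variable, function, period)."""
    cols = []
    for _var_in, var_out, functions, periods, _zero in variables:
        for func in functions:
            for period in periods:
                source = f"{var_out}_{period}M"
                if func.upper() == "AVG/P":
                    # Average per period: window sum divided by window length.
                    cols.append(f"SUM({source})/{period} AS {var_out}_AVG_{period}M")
                elif func.upper() == "SUM" and str(period) == "1":
                    cols.append(f"SUM({source}) AS {var_out}_{period}M")
                else:
                    cols.append(f"{func}({source}) AS {var_out}_{func}_{period}M")
    return cols

def window_columns(variables, datum):
    """Inner SELECT list: keep a value only if it falls in the time window."""
    cols = []
    for var_in, var_out, _functions, periods, zero_exclude in variables:
        for period in periods:
            zero_rule = f" AND {var_in} <> 0" if zero_exclude == 1 else ""
            cols.append(
                f"CASE WHEN TIME_ID BETWEEN ADD_MONTHS('{datum}', {-(int(period) - 1)}) "
                f"AND '{datum}'{zero_rule} THEN {var_in} ELSE NULL END AS {var_out}_{period}M"
            )
    return cols

def build_query(variables, datum, source, key="CUSTOMER_ID"):
    """Assemble the full statement; `key` and the nesting are assumptions."""
    outer = ",\n  ".join(aggregate_columns(variables))
    inner = ",\n    ".join(window_columns(variables, datum))
    return (
        f"SELECT {key},\n  {outer}\n"
        f"FROM (\n  SELECT {key},\n    {inner}\n  FROM {source}\n) T\n"
        f"GROUP BY {key}"
    )

# Two rows taken from the input table; the date is a made-up example.
variables = [
    ("TOTAL_PAYMENT_AMT", "TOTAL_PAYMENT_AMT", ["MAX", "AVG/P"], [3, 6], 1),
    ("TOTAL_PAYMENT_CNT", "TOTAL_PAYMENT_CNT", ["SUM"], [1], 1),
]
query = build_query(variables, "2022-10-31", "ADS.DS_PAYMENT")
print(query)
```

Collecting the columns in lists and joining them with `",\n"` avoids the bookkeeping for the last comma that the loop indices handle in the original.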
Develop phase - Devote more time to the creative side
→Improve traditional ML development processes:
• Benefit from pre-trained models (deep learning – mainly image recognition)
• Automated machine learning (AutoML) – pretty good in supervised ML
→AutoML:
• Reduces data scientists' workload and compensates for a lack of experience
• Automates tasks such as Feature Selection, Data Preprocessing, Hyperparameter Optimization and Model/Algorithm Selection
• Lets you focus more on the data side
• Is no silver bullet: it is more an exploration tool than an optimal model generation tool
MLBox, Auto-Sklearn, TPOT, H2O AutoML, Auto Keras, Auto PyTorch, Google Cloud AutoML, DataRobot, etc.
Deploy phase - you don't get any value out of a model sitting on someone's computer
→Phase where model is transferred to a production environment.
→Same best-practice principles and design patterns for software also apply to ML models
→ML model should be deployed as part of existing data pipeline
→Output of ML model should be monitored for bias
→ML model in deploy phase:
• Registered in appropriate repository
• Passed testing
• Model artifacts are retained
→Validate model → Publish model → Deliver model
→Don't update Python libraries before proper testing in the development environment 😊
Deploy phase – more than one ML model
→Model registry:
• Place for all trained/production-ready models (with version control)
• Alternative models as backup
• All model artifacts, model dependencies, evaluation metrics, documentation
• Which dataset was used for training / model lineage
• Log performance details of the model and comparison with other models
• Tracking models through their whole lifecycle (training, staging and production)
→A model registry enables faster deployment of new models and retraining of current ones
→Shared by multiple team members (team collaboration)
→Ties together business rules and output from the production model
→Consume the model through API integration
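To make the registry responsibilities above concrete, here is a minimal in-memory sketch. It is an illustration only: a real registry would be a shared, persistent service, and every name here (`ModelRegistry`, `ModelVersion`, the `churn` model, dataset names and artifact paths) is hypothetical.

```python
from dataclasses import dataclass

STAGES = ("training", "staging", "production")

@dataclass
class ModelVersion:
    name: str
    version: int
    training_dataset: str  # lineage: which dataset the model was trained on
    metrics: dict          # evaluation metrics, e.g. {"auc": 0.81}
    artifact_path: str     # where the serialized model is stored
    stage: str = "training"

class ModelRegistry:
    """In-memory stand-in for a shared, versioned model registry."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion

    def register(self, name, training_dataset, metrics, artifact_path):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, training_dataset,
                             dict(metrics), artifact_path)
        versions.append(entry)
        return entry

    def get(self, name, version):
        return self._versions[name][version - 1]

    def promote(self, name, version, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        entry = self.get(name, version)
        if stage == "production":
            # Demote the current production model to staging
            # so it stays available as a backup.
            for other in self._versions[name]:
                if other.stage == "production":
                    other.stage = "staging"
        entry.stage = stage
        return entry

    def production(self, name):
        for entry in self._versions.get(name, []):
            if entry.stage == "production":
                return entry
        return None

    def compare(self, name, metric):
        """Versions sorted by a metric, highest first."""
        return sorted(self._versions[name],
                      key=lambda e: e.metrics[metric], reverse=True)

# Hypothetical usage: two versions of a churn model.
registry = ModelRegistry()
registry.register("churn", "ADS.TRAIN_2022_09", {"auc": 0.78}, "models/churn/1")
registry.register("churn", "ADS.TRAIN_2022_10", {"auc": 0.81}, "models/churn/2")
registry.promote("churn", 1, "production")
registry.promote("churn", 2, "production")  # version 1 moves to staging
```

An API layer in front of `production(name)` would let consumers always score with the current production version without knowing which version that is.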
Conclusion
• Smart Generic Data Mart(s)
• Single pipeline for data transfer
• More creative time
• Easy deployment
Contact us at:
Danijel.Ilievski@comtrade.com
Jelena.Pekez@comtrade.com
Milos.Josifovic@comtrade.com
Q&A
www.comtradeintegration.com
Copyright © 2020 Comtrade. All rights reserved. The content of this presentation is copyright protected. Any reproduction, distribution, or modification is not allowed.
The information, solutions, and opinions contained in this presentation are of informative nature only and are not intended to be a comprehensive study, nor should they be relied on or treated as
a means to provide a complete solution or advice, since we may not be aware of all specific circumstances of the case. We try to provide quality information, but we make no claims, promises, or
guaranties about the accuracy, completeness, or adequacy of the information contained herein.
Thank you

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 

Último (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

[DSC Europe 22] Smart approach in development and deployment process for various ML models - Danijel Ilievski & Milos Josifovic

  • 1. Smart approach in development and deployment process for various ML models
    Jelena Pekez (Advanced Analytics Team Lead), Miloš Josifović (Big Data Architect), Danijel Ilievski (Senior ML Engineer)
  • 2. Comtrade System Integration: Introduction
    → Since 87% of models are never deployed, all steps should be planned at the beginning of the Data Science Lifecycle (pipeline):
      1. Manage
      2. Develop
      3. Deploy
      4. Monitor
    → The first goal is to reduce go-to-production time for new ML models by developing Smart Generic Data Mart(s).
    → With Smart Data Mart(s) we can prototype an ML model and evaluate feasibility.
    → The final goal is to generate Production Models and easily orchestrate them.
    [Diagram labels: Problem Formulation, Data mart design (ADS), Data Preprocess, Modeling, Results Interpretation, Deployment (PROD.MODEL)]
  • 3. ADS smart development to support all future ML models
    → Planning a Data Mart for the first ML model in a program takes a great deal of time:
      • Collect, at a high level, all possible future use cases
      • Identify all relevant and available data sources
      • Identify the customer activities the company is interested in
      • Combine data from structured and unstructured data sources
      • Perform extensive feature engineering (text processing, normalization, binning, …)
      • Comply with GDPR regulation
      • Define proper access rights on the selected Data Mart(s)
      • Resolve data quality issues at the very beginning to reduce endless reloads
    → For the next ML model, data scientists can spend more time on creative activities by using the developed Analytical Data Marts/Sets (ADS).
  • 4. Smart generic data mart(s)
    → Creating multipurpose Data Marts:
      • Generate a list of target features and relevant target events
      • Design it so that new events can be added easily
      • Eliminate data that has no business or use-case value
      • Filter out system records (clean data)
      • Make the initial (starting) base table(s): what is the definition of a customer?
      • Aggregate data to different granularity levels to catch behavior trends
      • Feature engineering does indeed make a difference!
    → Generate new ML training datasets quickly and easily
  • 5. Data Science requires domain knowledge, and it makes a big difference
    → How much domain knowledge do I need? It depends.
    → Domain knowledge is critical for data preparation, productization and orchestration
    → Which data points add value?
    → Domain knowledge is necessary in data pre-processing and beyond:
      • Outlier detection, feature importance, model selection, model evaluation stage...
    → You have to get the best of both worlds!
    [Diagram labels: DATA SCIENCE, DOMAIN KNOWLEDGE, MATH, STATS & ML, COMPUTER SCIENCE]
  • 6. Control your data mart(s) in production
    → Steps in the data pipeline for data quality checks:
      • Missing data vs. loaded data: aggregations
      • Duplicates: the same records were repeated
      • Relative change threshold: increment or decrement in the number of records
      • Statistical expected range
      • Data drift: target variable distribution
    [Diagram label: Data Pipeline]
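A few of the checks listed on this slide can be sketched in plain Python. This is a minimal illustration, not the presenters' implementation: the function names, the 20% threshold and the 3-sigma range are all assumptions chosen for the example.

```python
import statistics
from collections import Counter


def find_duplicates(records):
    """Return the records that occur more than once (records must be hashable)."""
    counts = Counter(records)
    return [record for record, n in counts.items() if n > 1]


def relative_change(previous_count, current_count):
    """Relative increment or decrement in the number of records between two loads."""
    if previous_count == 0:
        raise ValueError("previous_count must be non-zero")
    return (current_count - previous_count) / previous_count


def exceeds_threshold(previous_count, current_count, threshold=0.2):
    """True when the record count moved by more than `threshold` (assumed 20%)."""
    return abs(relative_change(previous_count, current_count)) > threshold


def outside_expected_range(values, history, n_sigmas=3.0):
    """Return values lying more than n_sigmas standard deviations from the historical mean."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    low, high = mean - n_sigmas * std, mean + n_sigmas * std
    return [v for v in values if not (low <= v <= high)]
```

In a real pipeline each check would typically run after a load and block or flag the batch when it fails; the thresholds would come from domain knowledge rather than fixed defaults.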
  • 7. Example of how a Generic Data Set can help to focus on Data Science: transfer between DWH and Data Lake
    → Data lives on two platforms (DWH: SQL database; Data Lake: Hadoop)
    → Data can be transferred between databases:
      • Through SQL federation / DB link, subject to product-specific compatibility
      • Via the Spark engine (PySpark) to Hadoop
    → The aim is to simplify data transfer between platforms so that Data Scientists can do it on their own, without:
      • Dealing with Spark jobs directly
      • Managing Hadoop security (Kerberos, read-write permissions, etc.)
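One way to hide Spark details behind a single call is a small wrapper like the one below. It is a sketch under assumptions: the function name `transfer_table`, its parameters and the Parquet target are hypothetical, and Kerberos or permission handling is not covered. The only Spark calls used are the standard `DataFrameReader` JDBC options and `DataFrameWriter.mode(...).parquet(...)`.

```python
def transfer_table(spark, jdbc_url, source_table, target_path, user, password):
    """Copy one DWH table to the data lake as Parquet through Spark's JDBC reader.

    `spark` is expected to be an existing SparkSession; all other names are
    illustrative and not taken from the presentation.
    """
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", source_table)
        .option("user", user)
        .option("password", password)
        .load()
    )
    # Overwrite keeps reruns idempotent; other write modes may suit other cases.
    df.write.mode("overwrite").parquet(target_path)
    return target_path
```

A data scientist would then call something like `transfer_table(spark, url, "ADS.DS_PAYMENT", "/lake/ads/ds_payment", user, password)` without writing a Spark job by hand.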
  • 8. Speed up writing SQL queries
    → ADS → [GENERATE SQL QUERY] → Training/Scoring table
    → Query automation for the training table
    → Input for the Python script:

      SCHEMA | SOURCE | VAR_IN | VAR_OUT | FUNCTIONS | PERIODS | ZERO EXCLUDE
      ADS | DS_PAYMENT | TOTAL_PAYMENT_AMT | TOTAL_PAYMENT_AMT | [MAX, AVG/P] | [3, 6] | 1
      ADS | DS_PAYMENT | TOTAL_PAYMENT_CNT | TOTAL_PAYMENT_CNT | [SUM] | [1] | 1
      ADS | DS_PAYMENT | MAX_PAYMENT_AMT | MAX_PAYMENT_AMT | [MAX] | [3] | 1
      ADS | DS_PAYMENT | MIN_PAYMENT_AMT | MIN_PAYMENT_AMT | [MIN] | [3] | 1
      ADS | DS_PAYMENT | ADD_PAYMENT_CNT | ADD_PAYMENT_CNT | [AVG/P] | [6] | 1
      ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [SUM] | [1] | 1
      ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [AVG/P, MAX, MIN] | [3, 6] | 1
      ADS | DS_USAGE | USAGE_OUT_IN_PACK_DUR | USAGE_OUT_IN_PACK_DUR | [SUM] | [1] | 1
      ADS | DS_USAGE | NVL(USAGE_OUT_REG_INT_DUR, 0) + NVL(USAGE_OUT_INT_DUR,0) | USAGE_OUT_INT_DUR | [AVG/P] | [6] | 1

    → Example of the Python script (excerpt):

      # Each entry of `variables` appears to be [VAR_IN, VAR_OUT, FUNCTIONS, PERIODS, ZERO_EXCLUDE].
      # `zarez` is the comma separator; it is omitted after the last column.
      for i, line in enumerate(variables):
          for i2, k in enumerate(line[2]):  # function
              for i3, kk in enumerate(line[3]):  # period
                  if (i == len(variables) - 1) and (i2 == len(line[2]) - 1) and (i3 == len(line[3]) - 1):
                      zarez = ''
                  else:
                      zarez = ','
                  # Creates the aggregation column, e.g. AVG(FIELD_NAME) AS NEW_FIELD_NAME
                  divider = ''
                  if 'AVG/P' == str.upper(k):
                      func1 = 'SUM'
                      func2 = '_' + 'AVG'
                      divider = '/' + str(kk)
                  elif ('SUM' == str.upper(k)) and (kk == '1'):
                      func1 = 'SUM'
                      func2 = ''
                  else:
                      func1 = k
                      func2 = '_' + k
                  query += (func1 + '(' + line[1] + '_' + str(kk) + 'M' + ')' + divider
                            + ' AS ' + line[1] + func2 + '_' + str(kk) + 'M' + zarez + ' \n')
      …
      for i, line in enumerate(variables):
          for i2, line2 in enumerate(line[3]):
              if (i == len(variables) - 1) and (i2 == len(line[3]) - 1):
                  zarez = ''
              else:
                  zarez = ','
              # Optionally exclude zero values from the aggregation window
              if line[4] == 1:
                  zero_rule = 'AND {varijabla} <> 0'.format(varijabla=line[0])
              else:
                  zero_rule = ''
              query += ("CASE WHEN TIME_ID BETWEEN ADD_MONTHS('{datum_place}', {vreme2}) AND '{datum_place}' "
                        "{zero_rule} THEN {varijabla} ELSE NULL END AS {varijabla2}_{vreme}M{zarez_place}".format(
                            varijabla=line[0], varijabla2=line[1], datum_place=datum,
                            vreme2=-1 * (int(line2) - 1), zero_rule=zero_rule,
                            vreme=line2, zarez_place=zarez)) + ' \n'
      query += ("FROM\n …
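The aggregation step on this slide can be condensed into a self-contained sketch. It follows the same naming rules as the excerpt (AVG/P becomes SUM divided by the period, and a one-month SUM gets no function suffix), but the function name `aggregation_columns`, the tuple layout of its input and the use of integer periods are assumptions made for illustration.

```python
def aggregation_columns(variables):
    """Build the SELECT-list of aggregation columns.

    `variables` is a list of (var_out, functions, periods) tuples, e.g.
    ("TOTAL_PAYMENT_AMT", ["MAX", "AVG/P"], [3, 6]). This layout is hypothetical.
    """
    columns = []
    for var_out, functions, periods in variables:
        for func in functions:
            func = func.upper()
            for period in periods:
                source = f"{var_out}_{period}M"  # per-period column produced upstream
                if func == "AVG/P":
                    # Average per period: sum over the window divided by its length
                    expr = f"SUM({source})/{period} AS {var_out}_AVG_{period}M"
                elif func == "SUM" and period == 1:
                    # A one-month sum keeps the plain column name
                    expr = f"SUM({source}) AS {var_out}_{period}M"
                else:
                    expr = f"{func}({source}) AS {var_out}_{func}_{period}M"
                columns.append(expr)
    # Joining avoids tracking whether the current column is the last one
    return ",\n".join(columns)
```

Joining a list at the end removes the need for the last-element comma check that the original loop performs by index.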
  • 9. Develop phase: devote more time to the creative side
    → Improve traditional ML development processes:
      • Benefit from pre-trained models (deep learning, mainly image recognition)
      • Automated machine learning (AutoML), which is pretty good at supervised ML
    → AutoML:
      • Reduces the DS workload or compensates for a lack of experience
      • Handles tasks like feature selection, data preprocessing, hyperparameter optimization, and model/algorithm selection
      • Lets you focus more on the data side
      • Is no silver bullet; it is more of an exploration tool than an optimal model generation tool
    → Tools: MLBox, Auto-Sklearn, TPOT, H2O AutoML, Auto Keras, Auto PyTorch, Google Cloud AutoML, DataRobot, etc.
  • 10. Deploy phase: you don't get any value out of a model sitting on someone's computer
    → This is the phase where the model is transferred to a production environment.
    → The same best-practice principles and design patterns used for software also apply to ML models
    → An ML model should be deployed as part of the existing data pipeline
    → The output of the ML model should be monitored for bias
    → An ML model in the deploy phase:
      • Is registered in an appropriate repository
      • Has passed testing
      • Has its model artifacts retained
    → Validate model → Publish model → Deliver model
    → Don't update Python libraries before proper testing in the development environment 😊
  • 11. Deploy phase: more than one ML model
    → Model registry:
      • A place for all trained/production-ready models (with version control)
      • Alternative models as backup
      • All model artifacts, model dependencies, evaluation metrics, documentation
      • Which dataset was used for training (model lineage)
      • Logs performance details of the model and comparisons with other models
      • Tracks models through their whole lifetime (training, staging and production)
    → A model registry enables faster deployment of your models and retraining of current ones
    → Shared by multiple team members (team collaboration)
    → Tie together business rules and the output of the production model
    → Consume the model through API integration
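To make the registry idea concrete, here is a minimal in-memory sketch. It is an illustration only: `ModelRegistry`, `ModelVersion` and their methods are hypothetical names, and a production registry would persist artifacts and metadata rather than keep them in a dictionary.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    training_dataset: str  # lineage: which dataset the model was trained on
    metrics: dict          # evaluation metrics logged at registration time
    stage: str = "training"


class ModelRegistry:
    STAGES = ("training", "staging", "production")

    def __init__(self):
        self._models = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, training_dataset, metrics):
        """Store a new version; version numbers start at 1 and increase per model."""
        versions = self._models.setdefault(name, [])
        model_version = ModelVersion(name, len(versions) + 1, training_dataset, dict(metrics))
        versions.append(model_version)
        return model_version

    def promote(self, name, version, stage):
        """Move a version to another lifecycle stage."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        model_version = self._models[name][version - 1]
        model_version.stage = stage
        return model_version

    def production_model(self, name):
        """Return the newest version in production, or None if there is none."""
        candidates = [m for m in self._models.get(name, []) if m.stage == "production"]
        return candidates[-1] if candidates else None
```

Keeping older versions in the list is what makes the "alternative models as backup" point possible: a rollback is just another `promote` call.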
  • 12. Conclusion
    • Smart Generic Data Mart(s)
    • Single pipeline for data transfer
    • Easy deployment
    • More creative time
  • 13. Contact us at: Danijel.Ilievski@comtrade.com, Jelena.Pekez@comtrade.com, Milos.Josifovic@comtrade.com
  • 14. Q&A
  • 15. Thank you
    www.comtradeintegration.com
    Copyright © 2020 Comtrade. All rights reserved. The content of this presentation is copyright protected. Any reproduction, distribution, or modification is not allowed. The information, solutions, and opinions contained in this presentation are of informative nature only and are not intended to be a comprehensive study, nor should they be relied on or treated as a means to provide a complete solution or advice, since we may not be aware of all specific circumstances of the case. We try to provide quality information, but we make no claims, promises, or guarantees about the accuracy, completeness, or adequacy of the information contained herein.

Editor's notes

  1. DANIJEL
  2. DANIJEL. During deployment in large organizations, we have to orchestrate more than one ML model. It is best to keep in mind from the very beginning of a project that there will be more ML models in the future, and to organize everything so that new models can be added easily. … From the very beginning, special focus in the Data Science Lifecycle should be on data quality and production. Foundation for more models in the future: developing the analytical dataset for future models can be treated as a separate project.
  3. JELENA. So, if we go into more detail: when developing a model, the focus is on data preparation. Organize DB tables considering performance and optimization. Analyze which columns and important sources to add. Think through the sources and target tables, how to organize the tables in terms of performance and logic, and keep a high-level view of what the use cases are.
  4. JELENA. MENTION: Organize DB tables considering performance and optimization. Feature engineering isn't about generating a higher quantity of new features; it's about the quality of the features created.
  5. DANIJEL OR JELENA. Domain knowledge cannot be optimized. Make an instruction file listing the field names and the action for handling nulls in each: constant value, Max(), Min(), Mean(), nearby value, regression, or delete record. Domain knowledge will allow you to take the impact of your machine learning skills to a much higher level of significance. Random forests, for example, can handle heterogeneous data types right out of the box. As a data scientist with domain knowledge, you will have the answer to the question "Which data points add value?"; you just need to find them.
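The "instruction file" for null handling mentioned in this note could look like a simple field-to-action mapping applied before modeling. The sketch below is an assumption about how such a file might be used: the function name `apply_null_rules`, the rule vocabulary and the row layout are hypothetical, and only the constant, max, min, mean and delete actions are covered (nearby value and regression are left out).

```python
import statistics


def apply_null_rules(rows, rules):
    """Fill or drop nulls according to per-field rules.

    rows:  list of dicts, with None marking a missing value.
    rules: {field: action}, where action is "max", "min", "mean", "delete",
           or a ("constant", value) tuple.
    """
    # Drop records first so they do not influence the fill statistics.
    delete_fields = [f for f, action in rules.items() if action == "delete"]
    kept = [dict(r) for r in rows if all(r.get(f) is not None for f in delete_fields)]

    for field, action in rules.items():
        if action == "delete":
            continue
        # Assumes at least one observed value exists for max/min/mean rules.
        observed = [r[field] for r in kept if r.get(field) is not None]
        if isinstance(action, tuple) and action[0] == "constant":
            fill = action[1]
        elif action == "max":
            fill = max(observed)
        elif action == "min":
            fill = min(observed)
        elif action == "mean":
            fill = statistics.mean(observed)
        else:
            raise ValueError(f"unknown action for {field}: {action!r}")
        for r in kept:
            if r.get(field) is None:
                r[field] = fill
    return kept
```

Keeping the rules in one mapping means a domain expert can review and change them without touching the preprocessing code.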
  6. DANIJEL
  7. MILOS. Benefits / suggestions: parallel execution; no temp data on the initial database; fast transfer; be careful about data types specified at table level.
  8. DANIJEL
  9. JELENA TO THE END. Efficiently automate all regular, manual, and tedious workloads of ML implementations. Falls short for feature engineering. Can easily overfit (watch the label distribution, the number of outliers, etc.).
  10. Deploying the model as a standalone container is easier.