LegIA Squad
BEHIND THE
SCENES OF
DATA SCIENCE
by Loïc Lejoly
Today's Outline
Topic Highlights
• Data is everywhere
• Data access complexity
• Data science projects
• Typical data science workflow
• How to facilitate Data Acquisition / Data Exploitation
• Labelling tool LegIAnnotate
• What I need to start a project
• Concrete examples
Data is everywhere
“Without a systematic way to start and
keep data clean, bad data will happen.”
— Donato Diorio
Since the rise of data storage, we have been collecting large quantities of
information about users, processes, monitoring, logs, etc.
This information can be stored in flat files, databases, images, etc.
Data access complexity
Companies own a large amount of data, and part of it is not exploited for
various reasons:
• Unawareness of data usefulness
• Unclear added value of data to the company
• Difficulty accessing data
• Data quality problems
• Storage systems date back to the 20th century
• No documentation (e.g. no data semantics)
• Complex reverse engineering
• Compatibility problems (e.g. no existing connectors between old and new systems)
• People who worked on it are no longer available
Data science projects
DATA SCIENCE PROJECTS
LEVERAGE YOUR BUSINESS
DATA PROJECTS
• Data collection / Labelling
• Database Creation / Management
• Data warehouse Creation /
Management
• Data architecture / storage
STATISTICAL PROJECTS
• Machine Learning
• Optimization
• Artificial Intelligence
• R&D
DATA VISUALIZATION PROJECTS
• Excel sheets (tables / charts)
• Visualization and Analytics tools
(Power BI, Tableau, Plotly, D3.js etc.)
Not only analytical projects
Can be small and simple projects
Can be bigger projects
Typical data science workflow
1. Detect the sources to use and set up
the pipeline to collect data from them.
2. Prepare the data: transform raw data
into the desired data format.
3. Exploit the Data Mart in a lab
environment, using analytics and/or
machine learning algorithms to draw
insights.
4. Move to a production environment
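The first three steps of the workflow can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the inline CSV, the column names (`sensor_id`, `temperature`) and the function names `collect`, `prepare` and `explore` are all hypothetical, and step 4 (moving to production) is out of scope here.

```python
import csv
import io
from statistics import mean

# Hypothetical raw source: in practice this would be a file, a database or an API.
RAW_CSV = """sensor_id,temperature
a,21.5
a,
b,19.0
b,20.0
"""


def collect(source: str) -> list[dict]:
    """Step 1: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(source)))


def prepare(rows: list[dict]) -> list[dict]:
    """Step 2: drop incomplete rows and convert text to numbers."""
    prepared = []
    for row in rows:
        if row["temperature"] == "":
            continue  # skip records with a missing measurement
        prepared.append(
            {"sensor_id": row["sensor_id"], "temperature": float(row["temperature"])}
        )
    return prepared


def explore(rows: list[dict]) -> dict[str, float]:
    """Step 3: a very simple analysis, the mean temperature per sensor."""
    by_sensor: dict[str, list[float]] = {}
    for row in rows:
        by_sensor.setdefault(row["sensor_id"], []).append(row["temperature"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


insights = explore(prepare(collect(RAW_CSV)))
print(insights)
```

Keeping each step as a separate function makes it easier to swap the source or the preparation rules later without touching the analysis.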
How to facilitate data acquisition / preparation
DATA ACQUISITION
• Ask yourself what data is needed
• Detect where and on which infrastructure
this data is stored
• Use of external data (Data enrichment)
• Free and paid data (e.g. weather,
images)
• Custom crawling scripts
• Crowdsourcing (e.g. labelling tools)
DATA PREPARATION
• Data Cleaning (data cleansing)
• Data profiling
• Understand the data
• Determine the quality of output
• Data granularity
• Current infrastructure vs. new one (e.g. move
Excel files to flat files, relational or non-relational
DBs)
• Custom scripts
Acquiring or preparing data is rarely an easy task.
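As an illustration of data profiling, the sketch below computes, for each column, the share of missing values and the number of distinct values, then flags columns that look unreliable. The sample records, the function names and the 0.3 threshold are assumptions made for this example, not part of the talk.

```python
def profile(rows: list[dict]) -> dict[str, dict]:
    """Per column: ratio of missing values and number of distinct values."""
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # None and empty strings are both treated as missing here.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing_ratio": 1 - len(present) / len(values),
            "distinct": len(set(present)),
        }
    return report


def low_quality_columns(report: dict[str, dict], threshold: float = 0.3) -> list[str]:
    """Columns whose missing ratio exceeds the (hypothetical) threshold."""
    return [col for col, stats in report.items() if stats["missing_ratio"] > threshold]


# Hypothetical records, for illustration only.
records = [
    {"customer": "A", "city": "Liège"},
    {"customer": "B", "city": ""},
    {"customer": "C", "city": "Liège"},
    {"customer": "D", "city": None},
]

report = profile(records)
flagged = low_quality_columns(report)
print(report)
print(flagged)
```

A report like this helps to understand the data before deciding which cleaning steps are worth writing custom scripts for.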
A Labelling tool: LegIAnnotate
An image annotation tool to create datasets that will serve to train computer
vision models.
Benefits:
• Collaborative labelling tool
• Easy to use
• Customizable to suit your needs
• Data storage standardization
• Full application control
Link: https://legiannotate.nrb-ai.nrbdigital.be/
Data science project starter pack
• What is my final goal with this project? (Important)
• What will my outcome be?
• What kind of data do I have? Which format? Which quality?
• A good overview of the business
• Different expert profiles (Data Scientists, full stack devs, DB architects, etc.)
• Communication
Data science project starter pack
Lab env. (conda, virtualenv, etc.)
Jupyter Notebook
Analytics and ML libs
Languages (R, Python, JavaScript, etc.)
External Sources
Databases
Document files
Production Env.
Visualization
Dashboarding
Will a data science project be successful?
Do I have data?
Is my data of good quality?
Is this data sufficient?
Let’s start
Let’s collect more data
Depends on the use case
Do I need data?
Let’s collect data
Let’s start
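The questions above can be read as a small decision procedure. The sketch below encodes one plausible reading of the flowchart; the branch order is an assumption, since the arrows of the original diagram are not available in the text, and the function and parameter names are hypothetical.

```python
def next_step(
    have_data: bool,
    good_quality: bool = False,
    sufficient: bool = False,
    need_data: bool = True,
) -> str:
    """Return the suggested next action (assumed reading of the flowchart)."""
    if not have_data:
        # No data yet: collect some only if the project actually needs it.
        return "Let's collect data" if need_data else "Let's start"
    if not good_quality:
        # Poor-quality data may still be usable, depending on the project.
        return "Depends on the use case"
    return "Let's start" if sufficient else "Let's collect more data"


print(next_step(have_data=True, good_quality=True, sufficient=False))
```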
Example: Fraud detection in insurance
• What is my final goal with this project?
Detect fraudulent affiliates
• What will my outcome be?
A confidence measure (e.g. probability of being
a fraudulent affiliate)
• What kind of data do I have? Which format?
Which quality?
Data is stored on old database systems and we
are not sure about the data quality. Data is not
labelled (i.e. we do not have a target associated
with the record)
Data extraction and quality problems -> Migration vs existing data storage
[Diagram: DATA -> MODEL (ML algorithm: decision trees) -> OUTPUT ("CLAIM FRAUD!")]
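To make the "confidence measure" outcome concrete, here is a minimal sketch of how a decision tree turns a record into a fraud probability. The tree is written by hand: the feature names (`claims_last_year`, `claim_amount`), the thresholds and the leaf probabilities are purely illustrative assumptions, not values from a trained model or from the project described here. In a real project the tree would be learned from labelled records, which this example notes are not yet available.

```python
# Hypothetical, hand-written tree: features, thresholds and leaf
# probabilities are illustrative, not taken from a trained model.
TREE = {
    "feature": "claims_last_year",
    "threshold": 3,
    "left": {"fraud_probability": 0.05},  # claims_last_year <= 3
    "right": {
        "feature": "claim_amount",
        "threshold": 10_000,
        "left": {"fraud_probability": 0.30},  # claim_amount <= 10 000
        "right": {"fraud_probability": 0.80},  # claim_amount > 10 000
    },
}


def fraud_probability(record: dict, node: dict = TREE) -> float:
    """Walk the tree from the root to a leaf and return the leaf's probability."""
    while "fraud_probability" not in node:
        go_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["fraud_probability"]


claim = {"claims_last_year": 5, "claim_amount": 20_000}
print(fraud_probability(claim))
```

Returning a probability rather than a yes/no answer lets the business choose its own cut-off for which claims to investigate.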
Example: Optimization of data center energy consumption
• What is my final goal with this project?
Reduce the energy consumption of a data
center
• What will my outcome be?
Give recommendations about parameters
to tweak to reduce consumption
• What kind of data do I have? Which
format? Which quality?
Data about the data center energy
consumption as well as information about
the elements that could influence the
energy consumption (e.g. weather)
• Good comprehension of the business
Data center automation engineers
who can share their expertise
[Diagram: DATA (data center information at regular intervals) -> MODEL (optimization model) -> OUTPUT (recommendations)]
• Difficulty keeping the
collectors up to date with the data
platform
• The data collection process is done
without consulting the data
scientists.
Meet the Team
LEJOLY LOÏC
Data Scientist at NRB
@LoicLejoly
in/loic-lejoly/
loic.lejoly@nrb.be
DOLORIS SAMY
Data Scientist at NRB
@SamyDoloris
in/samy-doloris-490421158/
samy.doloris@nrb.be
LEILA REBBOUH
Head of Data Science at NRB
@leilarebbouh
in/leilarebbouh/
leila.rebbouh@nrb.be

Editor's notes

  1. Webinar focused on data science. The term is trending, but what lies behind it is less well known. As the name suggests, it relates to data as well as science; this talk focuses on the DATA. Also worth mentioning: data science projects involve not only data scientists but both IT and non-IT profiles, and there are different data scientist profiles (business-centric, data-centric, statistical/ML-centric).
  2. Modules in apps and services make it easy to collect data (Google Analytics, cloud services, cookies, etc.)
  3. A lot of data is not used properly, or not used enough. Example: a website in English whose user data shows that 60% of the users are French.
  4. Various types of data science projects
  5. Flat file = CSV, TSV, etc. Data granularity: periodicity; level of detail (sensor temperature only vs. sensor temperature, humidity, wind, etc.)
  6. Tool based on the Make Sense GitHub repo. Open source. Sponsor crowdsourcing.
  7. Important -> to avoid taking a wrong development path and losing crucial time -> unreachable project
  8. Continuous iteration process
  9. Depends on the use case: an example is a model that detects bad-quality data based on a certain threshold