The coding portion of Data Science

The “definition” of a Data Scientist says that one should know math and statistics, have domain- or business-specific knowledge, and know how to put it into code. Nobody knows to what extent this knowledge should be present in a single unicorn. One thing is for sure: it grows over time. Knowing how to implement and use ML models as repeatable tasks is what separates statisticians and researchers from the Data Scientists who help businesses improve their performance. That’s where the art of coding comes in.

  1. 1. THE CODING PORTION OF DATA SCIENCE MILOS MILOVANOVIC milos@thingsolver.com CTO & Co-Founder ENLIGHTEN YOUR DATA November, 2019 www.thingsolver.com
  2. 2. WHY DO I CARE ABOUT THE TOPIC? “There is one problem with your topic - you are not a Data Scientist!” - Valentina Djordjevic, Head of Data Science @ THINGS SOLVER. Building Data Products: As a business owner, I need to ensure that our products work and improve clients’ business processes. Technical Lead: As a CTO, I need to ensure that my colleagues have the necessary skill set and that our technology runs smoothly. Data Engineering: As a Data Engineer, I work with Data Scientists on productizing and optimizing ML workflows.
  3. 3. CAN I BE A DATA SCIENTIST WITHOUT CODING? Common algorithms are already known, coded and optimized. Explicit coding is being replaced with drag-and-drop interfaces. Data science is becoming more automated with options like Google’s Cloud AutoML, DataRobot, ... Basic knowledge of Python and/or R helps us to tackle our ML tasks with common algorithms. VS. * https://www.glassdoor.com/research/data-scientist-personas/
  4. 4. LEARNING DATA SCIENCE - Robert Chang, Data Scientist @ Airbnb Link: https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7 Computer Science, Math & Statistics, Business Knowledge, DATA SCIENCE, Machine Learning, Software Development, Traditional Research
  5. 5. LEARNING DATA SCIENCE - Robert Chang, Data Scientist @ Airbnb Link: https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7 Math & Statistics, Business Knowledge, DATA SCIENCE, Machine Learning, Software Development, Traditional Research, Computer Science
  6. 6. WHAT DO ACADEMIC INSTITUTIONS TEACH US? WHAT DO WE NEED TO LEARN IN REAL LIFE? Import Data, Build Features, Modeling, Model Evaluation, Problem Formulation, ETL, Productizing, Integration with Business Processes, Debugging, Context!
  7. 7. BUT WE CANNOT BLAME ACADEMIA... The Data Science and ML fields are expanding too fast for academic institutions to keep pace. Time and resource restrictions in a Master’s degree limit the content that can be taught. Data Science and ML include too many fields to be taught completely over a 4-5 year degree program.
  8. 8. WHAT DOES KAGGLE TEACH US? 1. Join a Competition 2. Build and Submit Your Model 3. Watch the Leaderboard and Win Prizes. Incredible Datasets, Impressive Kernels, Huge learning base. Cleaned (and labeled) Data, Runtime Environment Preparation Kernels, Kaggle Stars, Benchmark against other solutions. * https://www.kaggle.com/challenge-yourself
  9. 9. KAGGLE VS REALITY 1. Join a Competition 2. Build and Submit Your Model 3. Watch the Leaderboard and Win Prizes. Incredible Datasets, Impressive Kernels, Huge learning base. Cleaned (and labeled) Data, Runtime Environment Preparation Kernels, Kaggle Stars, Benchmark against other solutions. Problems are already formulated! Datasets are prepared! Data is labeled! Lack of Decision Process! * https://www.kaggle.com/challenge-yourself
  10. 10. DECISION MAKING PROCESS. ADAPTIVE: Strategic and operational business models change. CONTEXTUAL: Decision making depends on business context (internal and external). REGULATED: Legislation dictates the applicability and usage of different models. REALTIME: Decisions are made in a fast-paced manner. INTERRELATED: Decision making is based on the interconnection of different business processes.
  11. 11. DATA IS MESSY. ACCESS: Different Databases, VPNs, Security. CLEAN & PREPARE: Remove dirty data, Transform, Disparate data sources. Network Protocols, Connectors, Security Policies, SQL; SQL, Efficient Code, Data Transformation.
  12. 12. DATA SCIENCE IS TEAMWORK. Build features, Train model A, Train model B, Deploy to production. NO SILOS! REUSABILITY, EXPLAINABILITY
  13. 13. CODE VERSIONING. Entries in .gitignore: .ipynb_checkpoints and */.ipynb_checkpoints/*
  14. 14. thingsolver.com
  15. 15. BUILDING A ROBUST AND OPTIMAL SYSTEM. Building a model in your Jupyter Notebook VS building a model in a live and robust environment
  16. 16. BUILDING A ROBUST AND OPTIMAL SYSTEM. Building a model in your Jupyter Notebook VS building a model in a live and robust environment: Pipeline Automation, Scale, Monitoring, Integrate, Quality Assurance, Build & Deployment
  17. 17. BUILDING A ROBUST AND OPTIMAL SYSTEM. Building a model in your Jupyter Notebook VS DATA PRODUCT: Pipeline Automation (Efficient Code), Scale (Distributed Code), Monitoring (Logging), Integrate (API Design), Quality Assurance (Unit Testing), Build & Deployment (Pluggable Packaging)
  18. 18. BUILDING A ROBUST AND OPTIMAL SYSTEM. Building a model in your Jupyter Notebook VS: Pipeline Automation, Scale, Monitoring (Logging), Integrate (API Design), CI / CD, Build & Deployment (Pluggable Packaging)
  19. 19. IS THE MODEL WORTH THE INVESTMENT? PIPELINE EFFICIENCY $
  20. 20. Data Science is more than pure analytics: ITERATIVE INTERCONNECTED ADAPTIVE PROCESSES AUTOMATION
  21. 21. Data Science is more than pure analytics: ITERATIVE INTERCONNECTED ADAPTIVE PROCESSES AUTOMATION LEARNING
  22. 22. THANK YOU! MILOS MILOVANOVIC milos@thingsolver.com CTO & Co-Founder ENLIGHTEN YOUR DATA credits: Photo by Kevin Ku from Pexels
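
Slide 11 names two steps, ACCESS and CLEAN & PREPARE, and lists SQL and data transformation among the skills involved. A minimal sketch of those two steps follows, using only the Python standard library. The table `raw_orders`, its columns, and the helper names `demo_connection`, `load_rows` and `clean_rows` are assumptions made for illustration; they do not come from the slides, and a real source system would sit behind the connectors, VPNs and security policies the slide mentions.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database standing in for a real source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("  Alice ", "10.5"), (None, "3"), ("Bob", "n/a"), ("carol", "7")],
    )
    return conn


def load_rows(conn):
    """ACCESS: pull the raw rows out of the database with plain SQL."""
    query = "SELECT customer, amount FROM raw_orders ORDER BY rowid"
    return conn.execute(query).fetchall()


def clean_rows(rows):
    """CLEAN & PREPARE: drop dirty rows and normalise the remaining ones."""
    cleaned = []
    for customer, amount in rows:
        if customer is None or not customer.strip():
            continue  # missing customer name: treated as dirty data
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # non-numeric amount: treated as dirty data
        cleaned.append((customer.strip().lower(), value))
    return cleaned
```

With the sample rows above, `clean_rows(load_rows(demo_connection()))` keeps only the rows for "alice" and "carol"; the row without a customer and the row with the amount "n/a" are dropped.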
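
Slide 13 shows two `.gitignore` entries that keep Jupyter checkpoint folders out of version control. As a small sketch, a team could make sure every project repository carries those entries with a helper like the one below. The function name `ensure_gitignore_patterns` is hypothetical; only the two patterns come from the slide.

```python
from pathlib import Path

# The two patterns shown on the slide.
NOTEBOOK_PATTERNS = [".ipynb_checkpoints", "*/.ipynb_checkpoints/*"]


def ensure_gitignore_patterns(repo_dir, patterns=NOTEBOOK_PATTERNS):
    """Append any missing patterns to <repo_dir>/.gitignore.

    Returns the list of patterns that had to be added.
    """
    gitignore = Path(repo_dir) / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    missing = [p for p in patterns if p not in present]
    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Calling the helper twice on the same directory is safe: the second call finds both patterns already present and leaves the file unchanged.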
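
Slide 17 pairs Monitoring with Logging and Quality Assurance with Unit Testing as parts of a data product. One way to read that, sketched below under stated assumptions, is to move a notebook cell into a plain function that logs what it does and is covered by tests. The function `build_features`, the `is_large` feature with its threshold of 100, and the logger name `pipeline` are invented for this example and are not taken from the talk.

```python
import logging
import unittest

logger = logging.getLogger("pipeline")


def build_features(records):
    """Turn raw (name, amount) records into feature dictionaries."""
    features = []
    for name, amount in records:
        if amount < 0:
            # Monitoring: leave a trace of every record that gets dropped.
            logger.warning("Skipping %s: negative amount %s", name, amount)
            continue
        features.append({"name": name, "amount": amount, "is_large": amount >= 100})
    logger.info("Built %d feature rows from %d records", len(features), len(records))
    return features


class BuildFeaturesTest(unittest.TestCase):
    """Quality assurance: pin down the behaviour the pipeline relies on."""

    def test_flags_large_amounts(self):
        rows = build_features([("a", 150), ("b", 20)])
        self.assertEqual([r["is_large"] for r in rows], [True, False])

    def test_skips_negative_amounts_and_logs(self):
        with self.assertLogs("pipeline", level="WARNING") as captured:
            rows = build_features([("a", -1), ("b", 5)])
        self.assertEqual(len(rows), 1)
        self.assertEqual(len(captured.output), 1)
```

Saved as a module, the tests can be run with `python -m unittest <module name>`; the same function can then be imported by a scheduled pipeline instead of being copied out of a notebook.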
