How to program your way into data science?


  1. How to program your way into Data Science? Eeshan Chatterjee, Data Scientist @ MediaIQ Digital. https://in.linkedin.com/in/eeshanchatterjee www.github.com/EeshanChatterjee
  2. What is Data? Google definition:
     ● Facts and statistics collected together for reference or analysis.
     ● The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
     ● Things known or assumed as facts, making the basis of reasoning or calculation.
     Umm... OK. But what is data in the business world? Let's simplify the entire thing. If you can observe it, record it, store it, and measure it, it's gonna help your business. This is the data that is important to you.
  3. What data does my business generate? Each and every department, right from the CEO's office to the janitorial division, collects data. Stored!
     ● People Data
     ● Sales Data
     ● Customer Satisfaction Data
     ● Industrial Production & Wastage Data
     ● Travel Data
     ● Energy Data
  4. Now the Buzzword: Data Science
  5. The Basics: How did we arrive at Data Science?
     ● The Era of Statistical Insight: Measure KPIs, Model Key Metrics, Operations Research
     ● The Era of Business Intelligence: Dashboards, Frequent Updates, Business Analytics
     ● The Era of Data Science: Cockpits, Distributed Computation, Federated Data, Intelligent Systems
     Guess what didn't change: Help Business make Better Decisions!
  6. The Basics: If it's always been the same core job, can a statistician call himself a Data Scientist? Well... Not exactly. Today the job has diversified, demanding a wider skillset!
     [Diagram labels. Roles: Data Design Architect, Data Engineer, Requirement/Business Analyst. Skill areas: Math & Statistics, Business & Domain, Tech & Computer Science. Spanning label: Design Thinking.]
  7. But... Programming for Everything? Actually, yes. Let's look at a popular cheatsheet circulating on the internet. Infographic courtesy: http://nirvacana.com/thoughts/becoming-a-data-scientist/ Guess what: we can't tick off 15% of this checklist without programming!
  8. Programming for Math: The Algo Whiz Codebook
     Topics: Scripting Language, Packages, Data Structures, Notebooks & Markdown, Plotting Techniques, Classes & Functions, Cross-Language Execution
     ● Choose your scripting language. R & Python are the popular choices.
     ● Use what's out there. Prebuilt packages for almost every technique are freely available for use.
     ● Interactive plots cut down EDA time by a huge margin.
  9. R or Python? The holy grail of data science choices! It is indeed difficult to choose between the two. Their capabilities are pretty much the same. So, which one do I choose?*
     Choose R when:
     ● You are beginning to explore your data
     ● You are looking to find one-time insight or are developing an analysis methodology
     ● You want to try out a broad spectrum of techniques to find the best ensembles to use
     Choose Python when:
     ● You have a good understanding of the data and the techniques you want to use
     ● You want to deploy your analysis methodology as a persistent large-scale production system
     ● You want to train deep models on GPUs
     * This one is based on my experience and opinion. It has worked for me. The next person you ask will have a different take on the matter.
  10. Programming for Tech: The Scale-Out Toolbox
     ● Data Platforms: Ingestion & Management Services (Java)
     ● Distribution & Scale: Hadoop, Yarn, Scala, JADE... (Java)
     ● Efficient Processing: Low-level Subroutines (C++)
     ● GPGPU & Large Scale ML: CUDA, OpenGL, MPI (C/C++)
     Key points:
     ● C++ and Java form the backbone of almost every at-scale data system.
     ● Most NoSQL & NewSQL databases are based on Java.
     ● Large-scale machine learning with millions of data points most certainly needs GPU-scale processing.
  11. Programming for the Business: The Decision-Maker's Cockpit. Image courtesy: http://exposedata.com/tutorial/canvas/
     ● Interactive charts make answering business questions intuitive.
     ● Real-time updates allow decisions based on the latest information available.
     ● Bird's-eye and drill-down capabilities allow for multiple perspectives without losing context.
  12. Design Thinking and Programming: Design Thinking lets you break down and analyse the problem and synthesize the best solution from the multiple solutions possible.
     [Diagram labels: Current State; Complication 1, Roadblock 2, Issue 3; Possible Solutions 1 to 4; Prototype Solutions 1 to 4; At-Scale Solution; Consumption; Desired Future State.]
     Stages: Define | Ideate | Prototype | Iterate | Develop | Deploy
  13. Questions? Eeshan Chatterjee, eeshanchatterjee@gmail.com. Thank You!
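
Slide 8 advises picking a scripting language and using prebuilt packages rather than writing every technique by hand. A minimal sketch of that idea in Python follows. It is not from the original deck: the `daily_sales` figures and the `summarize` helper are hypothetical, invented for illustration. The only library used is the standard `statistics` module.

```python
import statistics

# Hypothetical daily sales figures, invented purely for illustration.
daily_sales = [120, 135, 128, 150, 142, 160, 155]


def summarize(values):
    """Return basic descriptive statistics using prebuilt library functions."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(daily_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same quick summary is usually a first step in exploratory data analysis; a fuller workflow would likely swap in a dedicated data-analysis package, but the point of the slide holds either way: the building blocks already exist.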
