Numerical Relativity as preparation for Industrial Data Science: a personal perspective

Invited talk presented by Ken Smith, CIO/CTO of Applied Technical Systems, at the American Physical Society's 2014 April Meeting in Savannah, GA

Abstract:
Much of the conversation in commercial enterprises these days revolves around industry buzzwords such as Big Data, Data Science, and being Data Driven. Beyond the hype surrounding these terms, there is a real, continuously growing movement for organizations to make better use of the data assets they have to inform decisions, strategy, and policy. This push is not unique to the commercial sector; governmental and academic organizations are also embracing such initiatives. The skills required to staff a Data Science project typically come from a number of disciplines, including computer science, statistics, modeling and simulation, and information technology, but the emerging wisdom in the community is that the rigor and discipline of a scientific background often make for the best data scientists. In this talk, I will offer a personal perspective on making the transition from a career in computational physics (specifically Numerical Relativity) to a career in industry, where I have focused on helping organizations make more informed decisions through better access to, and analysis of, the data at their disposal. I will identify the skills and training that carry over from a background in physics, discuss the gaps in that preparation, hypothesize as to where this industry is headed, and offer a frank look at life outside of academia.

  1. Numerical Relativity as preparation for Industrial Data Science: a personal perspective. Ken Smith, CIO/CTO. APS April Meeting, 2014-04-06.
  2. Overview: Who am I? What is data science? Why is it a viable (maybe even desirable) career option for physicists? How do you get started? Note: all image attributions appear at the end of the slide deck.
  3. Who am I? [Timeline figure, 2002 to 2014. Roles shown: grad student, lecturer, sr. scientist, CIO, sr. scientist, architect. Fields shown: physics education, numerical relativity / astrophysics, machine learning, natural language processing, software architecture.]
  4. Selected projects
     • Automatically categorizing text documents into topics based solely on content
     • Improving entity (person, location, organization) extraction techniques for large bodies of text within the US Army
     • Developing new tools for US Patent Examiners within the USPTO
     • Modeling and linking disparate datasets associated with supply & maintenance of US Navy systems
     • Designing systems to organize and visualize skills mix of employees within a company
  5. WHAT IS DATA SCIENCE? SKILLS, TRENDS, ACTIVITIES
  6. The sexiest job?
     “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades.”
     Hal Varian, Chief Economist, Google, January 2009
     http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
     http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
  7. Data Science Skills & Disciplines. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  8. Data Science post-Prism. http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
  9. Trends: Data Storage. IBM 350 in 1956: 3.75 MB capacity; 6.4 kB/s data transfer; 50 disk platters, each 24 in. in diameter; weight > 1 ton; leased for $3200/mo. http://old-photos.blogspot.com/2011/06/hard-drive.html
  10. Trends: Data Storage. http://www.mkomo.com/cost-per-gigabyte-update
  11. Trends: Open Source Software. https://github.com/blog/1724-10-million-repositories
  12. Trends: Quantized Self. The 2012 Feltron Report. http://feltron.com/ar12_02.html
  13. Trends: Quantized Self. The 2012 Feltron Report. http://feltron.com/ar12_02.html
  14. Trends: Quantized Self & Ubiquitous Sensors
  15. Trends: Digital Exhaust
  16. Trends: Targeted Marketing
     Father walks into a Minneapolis Target store: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
     Manager apologizes and calls back a few days later to apologize again. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
     Data mining determined a set of signals that a pregnant shopper may be getting near to her due date:
     • larger quantities of unscented lotion
     • supplements like calcium, magnesium and zinc
     • scent-free soap
     • extra-big bags of cotton balls
     • hand sanitizers
     • washcloths
     http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
  17. What data scientists do
     “What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
     http://www.oreilly.com/data/free/what-is-data-science.csp
  18. What does a data scientist do? http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
  19. WHY IS DATA SCIENCE VIABLE FOR PHYSICISTS?
  20. Where do data scientists come from?
     “People often assume that data scientists need a background in computer science. In my experience, that hasn’t been the case: my best data scientists have come from very different backgrounds. The inventor of LinkedIn’s People You May Know was an experimental physicist. A computational chemist on my decision sciences team had solved a 100-year-old problem on energy states of water. An oceanographer made major impacts on the way we identify fraud. Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich underlying trends in the data.”
     DJ Patil, former Chief Scientist for LinkedIn
     http://radar.oreilly.com/2011/09/building-data-science-teams.html
  21. Insight Data Science Fellows. An intensive six-week post-doctoral training fellowship bridging the gap between academia and data science. http://insightdatascience.com/
  22. Projected Data Science Demand. https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
  23. Recent NSF data on employment at PhD award. http://www.nsf.gov/statistics/sed/digest/2012/
  24. AIP Physics Career Statistics. http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010 http://aip.org/statistics/physics-trends/physics-phds-1-year-later
  25. Physics prep for Data Science (Warning: gross generalizations)
     What you have:
     • Analytical/problem-solving mindset
     • Presentation skills (oral, written, & graphical)
     • Mathematical preparation
     • Curiosity
     • Understanding that reference frames can only ever be local
     What you are missing:
     • Sufficient training in statistics
       – Regression beyond linear
       – Classification techniques
       – Machine learning
     • SQL (Database)
     • Information Visualization (psychology of design)
     • Business/Finance acumen
  26. Curriculum Recommendations
     • Introduce statistical analysis techniques into the graduate (possibly undergraduate) core physics curriculum.
     • Make computer science courses available in high school. The ability to program is becoming a foundational skill along with reading, writing, and arithmetic.
     http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159
     http://csedweek.org/promote
  27. HOW DO YOU GET STARTED?
  28. http://nirvacana.com/thoughts/becoming-a-data-scientist/
  29. Resources available
     • Insight Data Science Fellows Program: http://insightdatascience.com/
     • Coursera: Stanford Machine Learning: https://www.coursera.org/course/ml
     • Coursera: U. Washington Intro to Data Science: https://www.coursera.org/course/datasci
     • Coursera: Princeton Algorithms Part I: https://www.coursera.org/course/algs4partI
     • General Assembly Data Science: https://generalassemb.ly/education/data-science
  30. Learn and compete!
     “Kaggle is the world's largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.”
     www.kaggle.com/about
  31. Thanks! Twitter: @Ken_2scientists. http://www.atsid.com http://slidesha.re/1idf43d
  32. Image Sources
     Slide 7: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
     Slide 8: http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
     Slide 9: http://old-photos.blogspot.com/2011/06/hard-drive.html
     Slide 10: http://www.mkomo.com/cost-per-gigabyte-update
     Slide 11: https://github.com/blog/1724-10-million-repositories
     Slides 12, 13: http://feltron.com/ar12_02.html
     Slide 14: http://www.fitbit.com
     Slide 15: https://chrome.google.com/webstore/detail/collusion-for-chrome/ganlifbpkcplnldliibcbegplfmcfigp
     Slide 18: http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
     Slide 21: http://insightdatascience.com/
  33. Image Sources
     Slide 22: https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
     Slide 23: http://www.nsf.gov/statistics/sed/digest/2012/
     Slide 24: http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010 and http://aip.org/statistics/physics-trends/physics-phds-1-year-later
     Slide 26: http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159 and http://csedweek.org/promote
     Slide 28: http://nirvacana.com/thoughts/becoming-a-data-scientist/
     Slide 30: http://www.kaggle.com/competitions
