Dark Matter, Public Health, and Scientific Computing

Description

Talk given at the 8th IEEE International Conference on eScience in Chicago on October 10, 2012.

Transcript

  1. Dark Matter, Public Health, and Scientific Computing. Greg Wilson
  2. “Dark Matter Developers”, Scott Hanselman (March 2012): “[We] hypothesize that there is another kind of developer than the ones we meet all the time. We call them Dark Matter Developers. They don't read a lot of blogs, they never write blogs, they don't go to user groups, they don't tweet or facebook, and you don't often see them at large conferences... [They] aren't chasing the latest beta or pushing...limits, they are just producing.” http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx
  3. I'm Not That Optimistic ● 95% of scientists have their heads down – Doing science instead of talking about using GPU clouds to personalize collaborative management of reproducible peta-scale workflows ● Not because they don't want to do all that Star Trek stuff ● But because e-science is out of reach
  4. Shiny Toys
  5. Grimy Reality
  6. So Here We Are (labels: “You”, “Them”)
  7. Surely You're Exaggerating: 1. How many graduate students write shell scripts to analyze each new data set instead of running those analyses by hand?
  8. Surely You're Exaggerating: 2. How many of them use version control to keep track of their work and collaborate with colleagues?
  9. Surely You're Exaggerating: 3. How many routinely break large problems into pieces small enough to be comprehensible, testable, and reusable?
  10. Surely You're Exaggerating: 3. How many routinely break large problems into pieces small enough to be comprehensible, testable, and reusable? And how many know those are the same things?
  11. “Not My Problem”: Actually your biggest problem. If someone is chronically malnourished as a child, giving them a CT scanner when they're 20 has no effect on their wellbeing.
  12. Their Biggest Problem Too. Them: “I don't know much about this, but I'd like some clean water.” Us: “Let's talk about brain scans. They're cool.”
  13. We've Left the Majority Behind
  14. We've Left the Majority Behind: Other than Googling for things, the majority of scientists do not use computers more effectively today than they did 28 years ago.
  15. The Goalposts: A scientist is computationally competent if she can build, use, validate, and share software to ● Manage and process data ● Tell if it's been processed correctly ● Find and fix problems when it hasn't been ● Keep track of what she's done ● Share work with others ● Do all of these things efficiently
  16. A Driver's License Exam ● Developed with the Software Sustainability Institute for the DiRAC Consortium ● Formative assessment – Do you know what you need to know in order to get the most out of this gear? ● Pencils ready? Note: the actual exam allows for several different programming languages, version control systems, etc.
  17. A Driver's License Exam, Question 1: Check out a working copy of the exam materials from Subversion. Scoring: Could do it easily 1.0; Could struggle through 0.5; Wouldn't know where to start 0.0; Don't understand the question -1.0
  18. A Driver's License Exam, Question 2: Use find and grep in a pipe to create a list of all .dat files in the working copy, redirect the output to a file, and add that file to the repository. (Same scoring scale as slide 17.)
  19. A Driver's License Exam, Question 3: Write a shell script that runs a legacy program for each parameter in a set. (Same scoring scale as slide 17.)
  20. A Driver's License Exam, Question 4: Edit the Makefile provided so that if any .dat file changes, analyze.py is re-run to create the corresponding .out file. (Same scoring scale as slide 17.)
  21. A Driver's License Exam, Question 5: Write four unit tests to exercise a function that calculates running sums. Explain why your four tests are most likely to uncover bugs in the function. (Same scoring scale as slide 17.)
  22. A Driver's License Exam, Question 6: Explain when and how a function could pass your tests, but still fail on real data. (Same scoring scale as slide 17.)
  23. A Driver's License Exam, Question 7: Do a code review of the legacy program from Q3 (about 50 lines long) and list the four most important improvements you would make. (Same scoring scale as slide 17.)
  24. How Are We Doing? a) How did you do?
  25. How Are We Doing? a) How did you do? b) How do you think most scientists do?
  26. How Are We Doing? a) How did you do? b) How do you think most scientists do? c) Do you think a scientist could use a peta-scale GPU provenance cloud without these skills?
  27. How Are We Doing? a) How did you do? b) How do you think most scientists do? c) Do you think a scientist could debug a peta-scale GPU provenance cloud without these skills?
  28. How Are We Doing? a) How did you do? b) How do you think most scientists do? c) Do you think a scientist could extend a peta-scale GPU provenance cloud without these skills?
  29. So Why Is This Your Problem? If you're only helping the small minority lucky enough to have acquired the skills that mastering your shiny toy depends on... your potential user base is many times smaller than it could be.
  30. It Is Therefore Obvious That... ● Put more computing courses in the curriculum! – Except the curriculum is already full
  31. It Is Therefore Obvious That... ● Put a little computing in every course! – Still adds up: 5 minutes/lecture = 4 courses/degree – First thing cut when running late – The blind leading the blind
  32. It Is Therefore Obvious That... ● Move all our teaching online! – Where do I start? – Signal vs. noise – “It's not a MOOC, it's a MESS”. Everyone who understands something correctly understands it the same way. Everyone who misunderstands something misunderstands differently.
  33. What Has Worked ● Target graduate students
  34. What Has Worked ● Target graduate students ● Intensive short courses (2 days to 2 weeks)
  35. What Has Worked ● Target graduate students ● Intensive short courses (2 days to 2 weeks) ● Focus on practical skills
  36. What Has Worked ● Target graduate students ● Intensive short courses (2 days to 2 weeks) ● Focus on practical skills ● Follow up with 6-8 weeks of once-a-week online tutorials and help – Still tuning this part
  37. (no transcribed text)
  38. What We Teach ● Unix shell ● Python ● Version control ● Testing ● Matrix programming, image processing, SQL, ...
  39. What We Teach (same text as slide 38)
  40. “That's Merely Useful” ● None of this is publishable any longer – Which means it's ineffective career-wise for academic computer scientists
  41. “That's Merely Useful” ● None of this is publishable any longer – Which means it's ineffective career-wise for academic computer scientists ● But it works – Two independent assessments have validated the impact of what we do for 1/3-2/3 of learners – That's at least a six-fold increase in the number of scientists who can get work done without karōshi
  42. Majestic Equality. Anatole France (1844-1924): “The law, in its majestic equality, forbids the rich and poor alike to sleep under bridges, to beg in the streets, and to steal bread.”
  43. Majestic Equality. Anatole France (1844-1924): “The law, in its majestic equality, forbids the rich and poor alike to sleep under bridges, to beg in the streets, and to steal bread.” Thanks to modern computers, every scientist can now devote her working life to installation and configuration issues.
  44. “If you build a man a fire, you'll keep him warm for a night. If you set a man on fire, you'll keep him warm for the rest of his life.” (Terry Pratchett)
  45. Host a Workshop
  46. Teach a Workshop ● All our materials are CC-BY licensed ● We will help train you
  47. Shine Some Light ● Include a few lines about your software stack and working practices in your scientific papers ● Ask about them when reviewing – Where's the repository containing the code? – What's the coverage of your unit tests? – How did you track provenance?
  48. Be the Change You Want to See: Would you rather increase the productivity of the top 5% of scientists ten-fold? Or double the productivity of the other 95%?
  49. Be the Change You Want to See (same text as slide 48)
  50. Be the Change You Want to See: Would you rather see your best ideas left on the shelf because they're out of reach? Or raise a generation who can all do the things we think are exciting?
  51. http://software-carpentry.org
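
Slide 20 asks for a Makefile rule that re-runs analyze.py whenever a .dat file changes. The decision Make takes for such a rule is a timestamp comparison: an output is out of date if it is missing or older than its input. The sketch below illustrates that check in Python rather than in Make. The function names `needs_rerun` and `stale_outputs` are hypothetical, and because the talk does not say how analyze.py is invoked, the code only reports which outputs are stale.

```python
import os


def needs_rerun(dat_path, out_path):
    """Return True if the output is missing or older than its input."""
    if not os.path.exists(out_path):
        return True
    return os.path.getmtime(dat_path) > os.path.getmtime(out_path)


def stale_outputs(dat_paths):
    """Pair each .dat file with its .out file and keep the stale pairs."""
    pairs = []
    for dat in dat_paths:
        # Assumption: the output sits next to the input, with .out replacing .dat.
        out = os.path.splitext(dat)[0] + ".out"
        if needs_rerun(dat, out):
            pairs.append((dat, out))
    return pairs
```

A driver script could loop over the returned pairs and run the analysis for each one. In a real project the Makefile does this bookkeeping itself.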
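
Slide 21 asks for four unit tests of a function that calculates running sums. The exam's own function is not shown in the talk, so the `running_sum` below is a stand-in written for illustration, and the four tests are one plausible choice rather than the expected answer. Each test targets a different way the function could go wrong: empty input, a single element, ordinary accumulation, and sign handling.

```python
from itertools import accumulate


def running_sum(values):
    """Return the list of cumulative sums of `values` (illustrative stand-in)."""
    return list(accumulate(values))


def test_empty_input():
    # Boundary case: nothing to add should give nothing back, not an error.
    assert running_sum([]) == []


def test_single_value():
    # Off-by-one errors often show up with exactly one element.
    assert running_sum([5]) == [5]


def test_several_values():
    # The ordinary case: each entry is the total so far.
    assert running_sum([1, 2, 3, 4]) == [1, 3, 6, 10]


def test_negative_values_cancel():
    # Mixed signs catch code that assumes totals only ever grow.
    assert running_sum([3, -3, 2, -2]) == [3, 0, 2, 0]
```

The functions follow the `test_` naming convention that test runners such as pytest collect automatically. They can also be called directly.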
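
Slide 22 asks how a function could pass its tests and still fail on real data. Two common ways are sketched below for a running-sum function: a missing value stored as NaN, and floating-point rounding. The `running_sum` function and the sample readings are made up for illustration and are not taken from the exam.

```python
import math
from itertools import accumulate


def running_sum(values):
    """Return the list of cumulative sums of `values` (illustrative stand-in)."""
    return list(accumulate(values))


# The kind of check a small test suite typically contains: it passes.
assert running_sum([1, 2, 3]) == [1, 3, 6]

# Hypothetical readings with one missing value stored as NaN.
readings = [1.0, float("nan"), 3.0]
totals = running_sum(readings)
# The NaN propagates into every running total from the gap onward.
poisoned = [math.isnan(t) for t in totals]

# Floating-point rounding: ten additions of 0.1 do not land exactly on 1.0.
tenths = running_sum([0.1] * 10)
exactly_one = tenths[-1] == 1.0
close_to_one = math.isclose(tenths[-1], 1.0)
```

Neither problem is a bug in the addition itself. Both come from properties of real data that tests written with small integers never exercise.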
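
The choice posed on slide 48 can be made concrete with a little arithmetic. The sketch below assumes that every scientist starts from the same baseline productivity. The talk does not state this, and the comparison would shift if the top 5% already produce far more than the rest. The function `total_output` and its parameters are hypothetical names.

```python
def total_output(top_share, top_gain, rest_gain, baseline=1.0):
    """Average output per scientist after applying the two multipliers.

    Assumes (not stated in the talk) that everyone starts at `baseline`.
    """
    rest_share = 1.0 - top_share
    return baseline * (top_share * top_gain + rest_share * rest_gain)


# Option A: the top 5% become ten times as productive, the rest are unchanged.
boost_top = total_output(top_share=0.05, top_gain=10.0, rest_gain=1.0)

# Option B: the top 5% are unchanged, the other 95% become twice as productive.
boost_rest = total_output(top_share=0.05, top_gain=1.0, rest_gain=2.0)
```

Under the equal-baseline assumption, option A raises average output to about 1.45 times the baseline and option B to about 1.95 times.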

