The byproduct of sericulture in different industries.pptx
SENCER_panel.ppt
1. Musings on Data Science and
Students Experiencing Data Analytics
New England SENCER Center for Innovation
Prof. Randy Paffenroth
Data Science Program
Department of Mathematical Sciences
Worcester Polytechnic Institute
rcpaffenroth@wpi.edu
2014
2. My Research
"Internet Connectivity Access layer" by User:Ludovic.ferre -
Internet_Connectivity_Overview2_Access.svg. Licensed under Creative
Commons Attribution-Share Alike 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Internet_Connectivity_Access_layer
.svg#mediaviewer/File:Internet_Connectivity_Access_layer.svg
3. This is a panel, so I want to be
provocative!
Provocative
Adjective
1. tending or serving to provoke; inciting,
stimulating, irritating, or vexing.
So, I will be a little sad if I don’t end up irritating
anyone
4. The first war: Terminology
• Analyzing data has a long history!
• There have been many terms that have been
used to describe such endeavors:
• Statistics
• Artificial Intelligence
• Machine learning
• Data analytics
• Since I happen to work in a “Data Science”
program perhaps I may be allowed the
indulgence of using that terminology…
6. Experiments, observations, and numerical simulations in many
areas of science and business are currently generating terabytes
of data, and in some cases are on the verge of generating
petabytes and beyond. Analyses of the information contained in
these data sets have already led to major breakthroughs in fields
ranging from genomics to astronomy and high-energy physics and
to the development of new information-based industries.
- Frontiers in Massive Data Analysis, National Research Council of the National Academies
Given a large mass of data, we can by judicious selection
construct perfectly plausible unassailable theories—all of
which, some of which, or none of which may be right.
- Paul Arnold Srere
7. The ability to take data—to be able to understand it, to process it, to
extract value from it, to visualize it, to communicate it—that’s going to
be a hugely important skill in the next decades, not only at the
professional level but even at the educational level for elementary
school kids, for high school kids, for college kids. Because now we
really do have essentially free and ubiquitous data. So the
complimentary scarce factor is the ability to understand that data and
extract value from it.
-Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
My personal goal: Getting students to be able to
think critically about data.
8. What is Big Data?
The are many examples of "data", but what makes some of
it “big”? The classic definition revolves around the three
Vs.
Volume, velocity, and variety.
Volume: There is a just a lot of it being generated all
the time. Things get interesting and “big”, when you
can’t fit it all on one computer anymore. Why? There
are many ideas here such as MapReduce, Hadoop, etc.
that all revolve around being able to process data that
goes from Terabytes, to Petabytes, to Exabytes.
Velocity: Data is being generated very quickly. Can
you even store it all? If not, then what do you get rid of
and what do you keep?
Variety: The data types you mention all take different
shapes. What does it mean to store them so that you
can play with or compare them?
http://pl.wikipedia.org
/wiki/Green_Giant#m
ediaviewer/Plik:Jolly_
green_giant.jpg
9. Is Big Data the same as Data
Science?
Are Big Data and Data Science the same thing?
I wouldn't say so...
Data Science can be done on small data sets.
And not everything done using Big Data would
necessarily be called Data Science.
Big Data
Data
Science
10. Is Big Data the same as Data
Science?
Are Big Data and Data Science the same thing?
I wouldn't say so...
Data Science can be done on small data sets.
And not everything done using Big Data would
necessarily be called Data Science.
But there certainly is a substantial overlap!
Big Data
Data
Science
11. Can you even be certain?
For real world problems, I
claim that you will never be
certain of any inferences from
data.
I mean, what happens to your
carefully thought out marketing
plan for some rocking slacks
when the Martians land.
What is unacceptable is when
the data you actually have
does not support the
conclusion you report.
Public domain image
12. It can be easy to fool yourself!
Human beings are really
good at pattern
detection...
http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars)
Perhaps a bit too good!
13. It can be easy to fool yourself!
http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars)
14. Skills for Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
15. Which is most important?
http://en.wikipedia.org/wiki/View_of_the_World_from_9th_Avenue
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
16. WPI Data Science Program:
A Collaboration
Business School
Computer
Science
Department
Mathematical
Sciences
Department
17. M.S. in Data Science Program
INTEGRATIVE DATA SCIENCE (3 CREDITS)
GRADUATE QUALIFYING PROJECT OR MS THESIS
(3 TO 9 CREDITS)
MATHEMATICAL
ANALYTICS
(3 CREDITS)
DATA ACCESS &
MANAGEMENT
(3 CREDITS)
DATA
ANALYTICS &
MINING
(3 CREDITS)
BUSINESS
INTELLIGENCE &
CASE STUDIES
(3 CREDITS)
CONCENTRATION AND ELECTIVES
(9 TO 15 CREDITS)
18. Data Science Core
I N T E G R AT I V E D ATA S C I E N C E :
D S 5 0 1 I N T R O D U C T I O N T O D ATA S C I E N C E ( N E W C O U R S E )
M AT H E M AT I C A L A N A LY T I C S ( S E L E C T O N E ) :
M A 5 4 3 / D S 5 0 2 S TAT I S T I C A L M E T H O D S F O R D ATA S C I E N C E ( N E W
C O U R S E )
M A 5 4 2 R E G R E S S I O N A N A LY S I S
M A 5 5 4 A P P L I E D M U LT I V A R I AT E A N A LY S I S
D ATA A C C E S S A N D M A N A G E M E N T ( S E L E C T O N E ) :
C S 5 4 2 D ATA B A S E M A N A G E M E N T S Y S T E M S
M I S 5 7 1 D ATA B A S E A P P L I C AT I O N S D E V E L O P M E N T
C S 5 6 1 A D V A N C E D T O P I C S I N D ATA B A S E S Y S T E M S
C S 5 8 5 / D S 5 0 3 B I G D ATA M A N A G E M E N T ( N E W C O U R S E )
D ATA A N A LY T I C S A N D M I N I N G ( S E L E C T O N E ) :
C S 5 4 8 K N O W L E D G E D I S C O V E R Y A N D D ATA M I N I N G
C S 5 3 9 M A C H I N E L E A R N I N G
C S 5 8 6 / D S 5 0 4 B I G D ATA A N A LY T I C S ( N E W C O U R S E )
B U S I N E S S I N T E L L I G E N C E A N D C A S E S T U D I E S ( S E L E C T O N E ) :
M I S 5 8 4 B U S I N E S S I N T E L L I G E N C E
M K T 5 6 8 D ATA M I N I N G B U S I N E S S A P P L I C AT I O N S
Data Science Certificate
Program (18 credits);
• 15 CREDIT DATA SCIENCE CORE
plus
• 3 CREDIT ELECTIVE
19. 2014 Data Science Cohort
NATIONALITY
C A M B O D I A
I N D I A
C H I N A
P A K I S T A N
T A I W A N
I R A N
U . S . A .
B R A Z I L
N E P A L
A F G H A N I S T A N
I N D O N E S I A
EDUCATIONAL FOUNDATION
QUANTITATIVE/ COMPUTATIONAL
BACKGROUNDS
PROGRAMMING WITH DATA STRUCTURES
AND ALGORITHMS FOR COMPUTATIONAL
SKILLS
QUANTITATIVE SKILLS
CALCULUS, LINEAR ALGEBRA AND
STATISTICS
EMPLOYMENT HISTORIES
SENIOR RESEARCH ANALYST
SENIOR BUSINESS ANALYST
PATIENT FINANCIAL SERVICES
DATA BASE ANALYST-ARCHITECT
DECISION SCIENTIST
MINISTRY OF FINANCE
LAHEY HEALTH
TECHNICAL PROGRAM MANAGEMENT
U.S. DEPARTMENT OF STATE
66.70% Male
33.3% Female
GENDER
10%
FULBRIGHT
SCHOLARS
20. 2014 Data Science Cohort
FALL 2014
Total Applicants 126
Total acceptances 33
Fulbright Scholars 3
Brazil Science Mobility Student 1
Countries Represented 9
Domestic Students 5
International Students 28
Many hold more than one earned Bachelor’s Degree
US Universities include Columbia, UNH and WPI
Dean Oates gave two Awards of $5K to outstanding
students.
These awards help attract top students.
21. Skills Acquired by Our Students
Fundamental/Technical :
SQL/ Data Modeling / Cleaning
Data Integration / Warehousing
Statistical Learning / Machine Learning
Distributed Computing
Big Data Management
Classif./Regression/DecisionTrees
Business Intelligence
Distributed Mining Algorithms
Professional Skills:
Business Use Cases / Entrepreneurship
Interdisciplinary Teams / Leadership
Tools :
Oracle /MySQL/DB2/SQLServer
R / SAS / SciKit
Weka /RapidMiner /MatLab
IBM Cognos / SPSS Modeler
Hadoop / Mahout / Cassandra
Python / Java / Cloud Computing
Storm / Sparc / InfoSphere Streams
Spotfire / Tableaux
Professional Skills:
Story Telling / Visualization
Presentations / Reports