SlideShare una empresa de Scribd logo
1 de 22
Musings on Data Science and
Students Experiencing Data Analytics
New England SENCER Center for Innovation
Prof. Randy Paffenroth
Data Science Program
Department of Mathematical Sciences
Worcester Polytechnic Institute
rcpaffenroth@wpi.edu
2014
My Research
"Internet Connectivity Access layer" by User:Ludovic.ferre -
Internet_Connectivity_Overview2_Access.svg. Licensed under Creative
Commons Attribution-Share Alike 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Internet_Connectivity_Access_layer
.svg#mediaviewer/File:Internet_Connectivity_Access_layer.svg
This is a panel, so I want to be
provocative!
Provocative
Adjective
1. tending or serving to provoke; inciting,
stimulating, irritating, or vexing.
So, I will be a little sad if I don’t end up irritating
anyone 
The first war: Terminology
• Analyzing data has a long history!
• There have been many terms that have been
used to describe such endeavors:
• Statistics
• Artificial Intelligence
• Machine learning
• Data analytics
• Since I happen to work in a “Data Science”
program perhaps I may be allowed the
indulgence of using that terminology…
Whatever we call it, what makes
things different now?
Experiments, observations, and numerical simulations in many
areas of science and business are currently generating terabytes
of data, and in some cases are on the verge of generating
petabytes and beyond. Analyses of the information contained in
these data sets have already led to major breakthroughs in fields
ranging from genomics to astronomy and high-energy physics and
to the development of new information-based industries.
- Frontiers in Massive Data Analysis, National Research Council of the National Academies
Given a large mass of data, we can by judicious selection
construct perfectly plausible unassailable theories—all of
which, some of which, or none of which may be right.
- Paul Arnold Srere
The ability to take data—to be able to understand it, to process it, to
extract value from it, to visualize it, to communicate it—that’s going to
be a hugely important skill in the next decades, not only at the
professional level but even at the educational level for elementary
school kids, for high school kids, for college kids. Because now we
really do have essentially free and ubiquitous data. So the
complimentary scarce factor is the ability to understand that data and
extract value from it.
-Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
My personal goal: Getting students to be able to
think critically about data.
What is Big Data?
 The are many examples of "data", but what makes some of
it “big”? The classic definition revolves around the three
Vs.
 Volume, velocity, and variety.
 Volume: There is a just a lot of it being generated all
the time. Things get interesting and “big”, when you
can’t fit it all on one computer anymore. Why? There
are many ideas here such as MapReduce, Hadoop, etc.
that all revolve around being able to process data that
goes from Terabytes, to Petabytes, to Exabytes.
 Velocity: Data is being generated very quickly. Can
you even store it all? If not, then what do you get rid of
and what do you keep?
 Variety: The data types you mention all take different
shapes. What does it mean to store them so that you
can play with or compare them?
http://pl.wikipedia.org
/wiki/Green_Giant#m
ediaviewer/Plik:Jolly_
green_giant.jpg
Is Big Data the same as Data
Science?
 Are Big Data and Data Science the same thing?
 I wouldn't say so...
 Data Science can be done on small data sets.
 And not everything done using Big Data would
necessarily be called Data Science.
Big Data
Data
Science
Is Big Data the same as Data
Science?
 Are Big Data and Data Science the same thing?
 I wouldn't say so...
 Data Science can be done on small data sets.
 And not everything done using Big Data would
necessarily be called Data Science.
 But there certainly is a substantial overlap!
Big Data
Data
Science
Can you even be certain?
 For real world problems, I
claim that you will never be
certain of any inferences from
data.
 I mean, what happens to your
carefully thought out marketing
plan for some rocking slacks
when the Martians land.
 What is unacceptable is when
the data you actually have
does not support the
conclusion you report.
Public domain image
It can be easy to fool yourself!
Human beings are really
good at pattern
detection...
http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars)
Perhaps a bit too good!
It can be easy to fool yourself!
http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars)
Skills for Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Which is most important?
http://en.wikipedia.org/wiki/View_of_the_World_from_9th_Avenue
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
WPI Data Science Program:
A Collaboration
Business School
Computer
Science
Department
Mathematical
Sciences
Department
M.S. in Data Science Program
INTEGRATIVE DATA SCIENCE (3 CREDITS)
GRADUATE QUALIFYING PROJECT OR MS THESIS
(3 TO 9 CREDITS)
MATHEMATICAL
ANALYTICS
(3 CREDITS)
DATA ACCESS &
MANAGEMENT
(3 CREDITS)
DATA
ANALYTICS &
MINING
(3 CREDITS)
BUSINESS
INTELLIGENCE &
CASE STUDIES
(3 CREDITS)
CONCENTRATION AND ELECTIVES
(9 TO 15 CREDITS)
Data Science Core
I N T E G R AT I V E D ATA S C I E N C E :
D S 5 0 1 I N T R O D U C T I O N T O D ATA S C I E N C E ( N E W C O U R S E )
M AT H E M AT I C A L A N A LY T I C S ( S E L E C T O N E ) :
M A 5 4 3 / D S 5 0 2 S TAT I S T I C A L M E T H O D S F O R D ATA S C I E N C E ( N E W
C O U R S E )
M A 5 4 2 R E G R E S S I O N A N A LY S I S
M A 5 5 4 A P P L I E D M U LT I V A R I AT E A N A LY S I S
D ATA A C C E S S A N D M A N A G E M E N T ( S E L E C T O N E ) :
C S 5 4 2 D ATA B A S E M A N A G E M E N T S Y S T E M S
M I S 5 7 1 D ATA B A S E A P P L I C AT I O N S D E V E L O P M E N T
C S 5 6 1 A D V A N C E D T O P I C S I N D ATA B A S E S Y S T E M S
C S 5 8 5 / D S 5 0 3 B I G D ATA M A N A G E M E N T ( N E W C O U R S E )
D ATA A N A LY T I C S A N D M I N I N G ( S E L E C T O N E ) :
C S 5 4 8 K N O W L E D G E D I S C O V E R Y A N D D ATA M I N I N G
C S 5 3 9 M A C H I N E L E A R N I N G
C S 5 8 6 / D S 5 0 4 B I G D ATA A N A LY T I C S ( N E W C O U R S E )
B U S I N E S S I N T E L L I G E N C E A N D C A S E S T U D I E S ( S E L E C T O N E ) :
M I S 5 8 4 B U S I N E S S I N T E L L I G E N C E
M K T 5 6 8 D ATA M I N I N G B U S I N E S S A P P L I C AT I O N S
Data Science Certificate
Program (18 credits);
• 15 CREDIT DATA SCIENCE CORE
plus
• 3 CREDIT ELECTIVE
2014 Data Science Cohort
NATIONALITY
C A M B O D I A
I N D I A
C H I N A
P A K I S T A N
T A I W A N
I R A N
U . S . A .
B R A Z I L
N E P A L
A F G H A N I S T A N
I N D O N E S I A
EDUCATIONAL FOUNDATION
QUANTITATIVE/ COMPUTATIONAL
BACKGROUNDS
PROGRAMMING WITH DATA STRUCTURES
AND ALGORITHMS FOR COMPUTATIONAL
SKILLS
QUANTITATIVE SKILLS
CALCULUS, LINEAR ALGEBRA AND
STATISTICS
EMPLOYMENT HISTORIES
SENIOR RESEARCH ANALYST
SENIOR BUSINESS ANALYST
PATIENT FINANCIAL SERVICES
DATA BASE ANALYST-ARCHITECT
DECISION SCIENTIST
MINISTRY OF FINANCE
LAHEY HEALTH
TECHNICAL PROGRAM MANAGEMENT
U.S. DEPARTMENT OF STATE
66.70% Male
33.3% Female
GENDER
10%
FULBRIGHT
SCHOLARS
2014 Data Science Cohort
FALL 2014
Total Applicants 126
Total acceptances 33
Fulbright Scholars 3
Brazil Science Mobility Student 1
Countries Represented 9
Domestic Students 5
International Students 28
Many hold more than one earned Bachelor’s Degree
US Universities include Columbia, UNH and WPI
Dean Oates gave two Awards of $5K to outstanding
students.
These awards help attract top students.
Skills Acquired by Our Students
Fundamental/Technical :
SQL/ Data Modeling / Cleaning
Data Integration / Warehousing
Statistical Learning / Machine Learning
Distributed Computing
Big Data Management
Classif./Regression/DecisionTrees
Business Intelligence
Distributed Mining Algorithms
Professional Skills:
Business Use Cases / Entrepreneurship
Interdisciplinary Teams / Leadership
Tools :
Oracle /MySQL/DB2/SQLServer
R / SAS / SciKit
Weka /RapidMiner /MatLab
IBM Cognos / SPSS Modeler
Hadoop / Mahout / Cassandra
Python / Java / Cloud Computing
Storm / Sparc / InfoSphere Streams
Spotfire / Tableaux
Professional Skills:
Story Telling / Visualization
Presentations / Reports
Data Science Tools for Students:
Free!
Software:
•Python
•http://www.python.org/
• iPython: http://ipython.org/
• Numpy: http://www.numpy.org/
• Pandas: http://pandas.pydata.org/
• Matplotlib: http://matplotlib.org/
• Mayavi: http://mayavi.sourceforge.net/
• Scikit-learn: http://scikit-
learn.org/stable/
Data:
•UCI Machine learning
repository
• http://archive.ics.uci.edu/ml/
•Kaggle
• https://www.kaggle.com/
•U.S. Government
• https://www.data.gov/

Más contenido relacionado

Similar a SENCER_panel.ppt

Getting to Know Your Data with R
Getting to Know Your Data with RGetting to Know Your Data with R
Getting to Know Your Data with RStephen Withington
 
Ellicium Solutions - Making Data Science Work
Ellicium  Solutions - Making Data Science Work Ellicium  Solutions - Making Data Science Work
Ellicium Solutions - Making Data Science Work Ellicium Solutions Inc.
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG DataPrasant Misra
 
Data Modelling Fundamentals course 3 day synopsis
Data Modelling Fundamentals course 3 day synopsisData Modelling Fundamentals course 3 day synopsis
Data Modelling Fundamentals course 3 day synopsisChristopher Bradley
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science processMathieu d'Aquin
 
Technology Trends, Consumer Experience @MICA 2016
Technology Trends, Consumer Experience @MICA 2016Technology Trends, Consumer Experience @MICA 2016
Technology Trends, Consumer Experience @MICA 2016Ravi Pal
 
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...Massimiliano Crosato
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International DevelopmentAlex Rascanu
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data ScienceKrishna Sankar
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist prateek kumar
 
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnWhat does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnPraj H
 
Computational Thinking & STEM = PBL in action
Computational Thinking & STEM = PBL in actionComputational Thinking & STEM = PBL in action
Computational Thinking & STEM = PBL in actionSusan S. Wells
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 

Similar a SENCER_panel.ppt (20)

Getting to Know Your Data with R
Getting to Know Your Data with RGetting to Know Your Data with R
Getting to Know Your Data with R
 
Ellicium Solutions - Making Data Science Work
Ellicium  Solutions - Making Data Science Work Ellicium  Solutions - Making Data Science Work
Ellicium Solutions - Making Data Science Work
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Interview
InterviewInterview
Interview
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Small data big impact
Small data big impactSmall data big impact
Small data big impact
 
Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG Data
 
Data Modelling Fundamentals course 3 day synopsis
Data Modelling Fundamentals course 3 day synopsisData Modelling Fundamentals course 3 day synopsis
Data Modelling Fundamentals course 3 day synopsis
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science process
 
Vikram emerging technologies
Vikram emerging technologiesVikram emerging technologies
Vikram emerging technologies
 
Technology Trends, Consumer Experience @MICA 2016
Technology Trends, Consumer Experience @MICA 2016Technology Trends, Consumer Experience @MICA 2016
Technology Trends, Consumer Experience @MICA 2016
 
KOHN.ppt
KOHN.pptKOHN.ppt
KOHN.ppt
 
KOHN.ppt
KOHN.pptKOHN.ppt
KOHN.ppt
 
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...
Mirko Lorenz Data Driven Journalism Overview Seminar Ordine dei Giornalisti d...
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International Development
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data Science
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist
 
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnWhat does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
 
Computational Thinking & STEM = PBL in action
Computational Thinking & STEM = PBL in actionComputational Thinking & STEM = PBL in action
Computational Thinking & STEM = PBL in action
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 

Más de nagarajan740445

principles of design thinking and start a new business in bengaluru.pptx
principles of design thinking and start a new business in bengaluru.pptxprinciples of design thinking and start a new business in bengaluru.pptx
principles of design thinking and start a new business in bengaluru.pptxnagarajan740445
 
how to start the MSME business in India.pptx
how to start the MSME business in India.pptxhow to start the MSME business in India.pptx
how to start the MSME business in India.pptxnagarajan740445
 
digital age mode Industry presentation.pptx
digital age mode Industry presentation.pptxdigital age mode Industry presentation.pptx
digital age mode Industry presentation.pptxnagarajan740445
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxnagarajan740445
 
Inroduction to ERP system core functions and challenages.pptx
Inroduction to ERP system core functions and challenages.pptxInroduction to ERP system core functions and challenages.pptx
Inroduction to ERP system core functions and challenages.pptxnagarajan740445
 
MDD in CAP (Saundra Stock).ppt
MDD in CAP (Saundra Stock).pptMDD in CAP (Saundra Stock).ppt
MDD in CAP (Saundra Stock).pptnagarajan740445
 
Intestinal Obstruction (1).ppt
Intestinal Obstruction (1).pptIntestinal Obstruction (1).ppt
Intestinal Obstruction (1).pptnagarajan740445
 
marketing analytics 1.pptx
marketing analytics 1.pptxmarketing analytics 1.pptx
marketing analytics 1.pptxnagarajan740445
 
first rule of marketing analytics forget about the customer.pptx
first rule of marketing analytics  forget about the customer.pptxfirst rule of marketing analytics  forget about the customer.pptx
first rule of marketing analytics forget about the customer.pptxnagarajan740445
 
marketing analytics.pptx
marketing  analytics.pptxmarketing  analytics.pptx
marketing analytics.pptxnagarajan740445
 
BUSINESS_ANALYTICS_ppt.ppt
BUSINESS_ANALYTICS_ppt.pptBUSINESS_ANALYTICS_ppt.ppt
BUSINESS_ANALYTICS_ppt.pptnagarajan740445
 
Tamil Nadul List of Doctors-2020.pdf
Tamil Nadul List of Doctors-2020.pdfTamil Nadul List of Doctors-2020.pdf
Tamil Nadul List of Doctors-2020.pdfnagarajan740445
 
malabsorptionsyndrome-141120082515-conversion-gate02.pdf
malabsorptionsyndrome-141120082515-conversion-gate02.pdfmalabsorptionsyndrome-141120082515-conversion-gate02.pdf
malabsorptionsyndrome-141120082515-conversion-gate02.pdfnagarajan740445
 

Más de nagarajan740445 (20)

principles of design thinking and start a new business in bengaluru.pptx
principles of design thinking and start a new business in bengaluru.pptxprinciples of design thinking and start a new business in bengaluru.pptx
principles of design thinking and start a new business in bengaluru.pptx
 
how to start the MSME business in India.pptx
how to start the MSME business in India.pptxhow to start the MSME business in India.pptx
how to start the MSME business in India.pptx
 
digital age mode Industry presentation.pptx
digital age mode Industry presentation.pptxdigital age mode Industry presentation.pptx
digital age mode Industry presentation.pptx
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
scorpio case study.pptx
scorpio case study.pptxscorpio case study.pptx
scorpio case study.pptx
 
geetha 1SP21BA009.pptx
geetha 1SP21BA009.pptxgeetha 1SP21BA009.pptx
geetha 1SP21BA009.pptx
 
gagana ppt 1.pptx
gagana ppt 1.pptxgagana ppt 1.pptx
gagana ppt 1.pptx
 
SCM + PUF_Day 3.pptx
SCM + PUF_Day 3.pptxSCM + PUF_Day 3.pptx
SCM + PUF_Day 3.pptx
 
Inroduction to ERP system core functions and challenages.pptx
Inroduction to ERP system core functions and challenages.pptxInroduction to ERP system core functions and challenages.pptx
Inroduction to ERP system core functions and challenages.pptx
 
MDD in CAP (Saundra Stock).ppt
MDD in CAP (Saundra Stock).pptMDD in CAP (Saundra Stock).ppt
MDD in CAP (Saundra Stock).ppt
 
Intestinal Obstruction (1).ppt
Intestinal Obstruction (1).pptIntestinal Obstruction (1).ppt
Intestinal Obstruction (1).ppt
 
marketing analytics 1.pptx
marketing analytics 1.pptxmarketing analytics 1.pptx
marketing analytics 1.pptx
 
first rule of marketing analytics forget about the customer.pptx
first rule of marketing analytics  forget about the customer.pptxfirst rule of marketing analytics  forget about the customer.pptx
first rule of marketing analytics forget about the customer.pptx
 
marketing analytics.pptx
marketing  analytics.pptxmarketing  analytics.pptx
marketing analytics.pptx
 
Cardiac.pptx
Cardiac.pptxCardiac.pptx
Cardiac.pptx
 
NERCOMPfinal_jfg.ppt
NERCOMPfinal_jfg.pptNERCOMPfinal_jfg.ppt
NERCOMPfinal_jfg.ppt
 
Data Analytics .pptx
Data Analytics .pptxData Analytics .pptx
Data Analytics .pptx
 
BUSINESS_ANALYTICS_ppt.ppt
BUSINESS_ANALYTICS_ppt.pptBUSINESS_ANALYTICS_ppt.ppt
BUSINESS_ANALYTICS_ppt.ppt
 
Tamil Nadul List of Doctors-2020.pdf
Tamil Nadul List of Doctors-2020.pdfTamil Nadul List of Doctors-2020.pdf
Tamil Nadul List of Doctors-2020.pdf
 
malabsorptionsyndrome-141120082515-conversion-gate02.pdf
malabsorptionsyndrome-141120082515-conversion-gate02.pdfmalabsorptionsyndrome-141120082515-conversion-gate02.pdf
malabsorptionsyndrome-141120082515-conversion-gate02.pdf
 

Último

JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 

Último (20)

JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 

SENCER_panel.ppt

  • 1. Musings on Data Science and Students Experiencing Data Analytics New England SENCER Center for Innovation Prof. Randy Paffenroth Data Science Program Department of Mathematical Sciences Worcester Polytechnic Institute rcpaffenroth@wpi.edu 2014
  • 2. My Research "Internet Connectivity Access layer" by User:Ludovic.ferre - Internet_Connectivity_Overview2_Access.svg. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Internet_Connectivity_Access_layer .svg#mediaviewer/File:Internet_Connectivity_Access_layer.svg
  • 3. This is a panel, so I want to be provocative! Provocative Adjective 1. tending or serving to provoke; inciting, stimulating, irritating, or vexing. So, I will be a little sad if I don’t end up irritating anyone 
  • 4. The first war: Terminology • Analyzing data has a long history! • There have been many terms that have been used to describe such endeavors: • Statistics • Artificial Intelligence • Machine learning • Data analytics • Since I happen to work in a “Data Science” program perhaps I may be allowed the indulgence of using that terminology…
  • 5. Whatever we call it, what makes things different now?
  • 6. Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics and to the development of new information-based industries. - Frontiers in Massive Data Analysis, National Research Council of the National Academies Given a large mass of data, we can by judicious selection construct perfectly plausible unassailable theories—all of which, some of which, or none of which may be right. - Paul Arnold Srere
  • 7. The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it. -Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers My personal goal: Getting students to be able to think critically about data.
  • 8. What is Big Data?  The are many examples of "data", but what makes some of it “big”? The classic definition revolves around the three Vs.  Volume, velocity, and variety.  Volume: There is a just a lot of it being generated all the time. Things get interesting and “big”, when you can’t fit it all on one computer anymore. Why? There are many ideas here such as MapReduce, Hadoop, etc. that all revolve around being able to process data that goes from Terabytes, to Petabytes, to Exabytes.  Velocity: Data is being generated very quickly. Can you even store it all? If not, then what do you get rid of and what do you keep?  Variety: The data types you mention all take different shapes. What does it mean to store them so that you can play with or compare them? http://pl.wikipedia.org /wiki/Green_Giant#m ediaviewer/Plik:Jolly_ green_giant.jpg
  • 9. Is Big Data the same as Data Science?  Are Big Data and Data Science the same thing?  I wouldn't say so...  Data Science can be done on small data sets.  And not everything done using Big Data would necessarily be called Data Science. Big Data Data Science
  • 10. Is Big Data the same as Data Science?  Are Big Data and Data Science the same thing?  I wouldn't say so...  Data Science can be done on small data sets.  And not everything done using Big Data would necessarily be called Data Science.  But there certainly is a substantial overlap! Big Data Data Science
  • 11. Can you even be certain?  For real world problems, I claim that you will never be certain of any inferences from data.  I mean, what happens to your carefully thought out marketing plan for some rocking slacks when the Martians land.  What is unacceptable is when the data you actually have does not support the conclusion you report. Public domain image
  • 12. It can be easy to fool yourself! Human beings are really good at pattern detection... http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars) Perhaps a bit too good!
  • 13. It can be easy to fool yourself! http://en.wikipedia.org/wiki/Cydonia_(region_of_Mars)
  • 14. Skills for Data Science http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 15. Which is most important? http://en.wikipedia.org/wiki/View_of_the_World_from_9th_Avenue http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 16. WPI Data Science Program: A Collaboration Business School Computer Science Department Mathematical Sciences Department
  • 17. M.S. in Data Science Program INTEGRATIVE DATA SCIENCE (3 CREDITS) GRADUATE QUALIFYING PROJECT OR MS THESIS (3 TO 9 CREDITS) MATHEMATICAL ANALYTICS (3 CREDITS) DATA ACCESS & MANAGEMENT (3 CREDITS) DATA ANALYTICS & MINING (3 CREDITS) BUSINESS INTELLIGENCE & CASE STUDIES (3 CREDITS) CONCENTRATION AND ELECTIVES (9 TO 15 CREDITS)
  • 18. Data Science Core I N T E G R AT I V E D ATA S C I E N C E : D S 5 0 1 I N T R O D U C T I O N T O D ATA S C I E N C E ( N E W C O U R S E ) M AT H E M AT I C A L A N A LY T I C S ( S E L E C T O N E ) : M A 5 4 3 / D S 5 0 2 S TAT I S T I C A L M E T H O D S F O R D ATA S C I E N C E ( N E W C O U R S E ) M A 5 4 2 R E G R E S S I O N A N A LY S I S M A 5 5 4 A P P L I E D M U LT I V A R I AT E A N A LY S I S D ATA A C C E S S A N D M A N A G E M E N T ( S E L E C T O N E ) : C S 5 4 2 D ATA B A S E M A N A G E M E N T S Y S T E M S M I S 5 7 1 D ATA B A S E A P P L I C AT I O N S D E V E L O P M E N T C S 5 6 1 A D V A N C E D T O P I C S I N D ATA B A S E S Y S T E M S C S 5 8 5 / D S 5 0 3 B I G D ATA M A N A G E M E N T ( N E W C O U R S E ) D ATA A N A LY T I C S A N D M I N I N G ( S E L E C T O N E ) : C S 5 4 8 K N O W L E D G E D I S C O V E R Y A N D D ATA M I N I N G C S 5 3 9 M A C H I N E L E A R N I N G C S 5 8 6 / D S 5 0 4 B I G D ATA A N A LY T I C S ( N E W C O U R S E ) B U S I N E S S I N T E L L I G E N C E A N D C A S E S T U D I E S ( S E L E C T O N E ) : M I S 5 8 4 B U S I N E S S I N T E L L I G E N C E M K T 5 6 8 D ATA M I N I N G B U S I N E S S A P P L I C AT I O N S Data Science Certificate Program (18 credits); • 15 CREDIT DATA SCIENCE CORE plus • 3 CREDIT ELECTIVE
  • 19. 2014 Data Science Cohort NATIONALITY C A M B O D I A I N D I A C H I N A P A K I S T A N T A I W A N I R A N U . S . A . B R A Z I L N E P A L A F G H A N I S T A N I N D O N E S I A EDUCATIONAL FOUNDATION QUANTITATIVE/ COMPUTATIONAL BACKGROUNDS PROGRAMMING WITH DATA STRUCTURES AND ALGORITHMS FOR COMPUTATIONAL SKILLS QUANTITATIVE SKILLS CALCULUS, LINEAR ALGEBRA AND STATISTICS EMPLOYMENT HISTORIES SENIOR RESEARCH ANALYST SENIOR BUSINESS ANALYST PATIENT FINANCIAL SERVICES DATA BASE ANALYST-ARCHITECT DECISION SCIENTIST MINISTRY OF FINANCE LAHEY HEALTH TECHNICAL PROGRAM MANAGEMENT U.S. DEPARTMENT OF STATE 66.70% Male 33.3% Female GENDER 10% FULBRIGHT SCHOLARS
  • 20. 2014 Data Science Cohort FALL 2014 Total Applicants 126 Total acceptances 33 Fulbright Scholars 3 Brazil Science Mobility Student 1 Countries Represented 9 Domestic Students 5 International Students 28 Many hold more than one earned Bachelor’s Degree US Universities include Columbia, UNH and WPI Dean Oates gave two Awards of $5K to outstanding students. These awards help attract top students.
  • 21. Skills Acquired by Our Students Fundamental/Technical : SQL/ Data Modeling / Cleaning Data Integration / Warehousing Statistical Learning / Machine Learning Distributed Computing Big Data Management Classif./Regression/DecisionTrees Business Intelligence Distributed Mining Algorithms Professional Skills: Business Use Cases / Entrepreneurship Interdisciplinary Teams / Leadership Tools : Oracle /MySQL/DB2/SQLServer R / SAS / SciKit Weka /RapidMiner /MatLab IBM Cognos / SPSS Modeler Hadoop / Mahout / Cassandra Python / Java / Cloud Computing Storm / Sparc / InfoSphere Streams Spotfire / Tableaux Professional Skills: Story Telling / Visualization Presentations / Reports
  • 22. Data Science Tools for Students: Free! Software: •Python •http://www.python.org/ • iPython: http://ipython.org/ • Numpy: http://www.numpy.org/ • Pandas: http://pandas.pydata.org/ • Matplotlib: http://matplotlib.org/ • Mayavi: http://mayavi.sourceforge.net/ • Scikit-learn: http://scikit- learn.org/stable/ Data: •UCI Machine learning repository • http://archive.ics.uci.edu/ml/ •Kaggle • https://www.kaggle.com/ •U.S. Government • https://www.data.gov/