This document discusses the need for data science skills and proposes a curriculum to address the skills gap. It notes that the Web has evolved from static HTML to user-generated content, and now to machines understanding information. Current jobs require skills in data analysis, idea generation, and hypothesis testing. An IBM study found that roughly a quarter of enterprises report major skills gaps in mobile, cloud, social, and analytics technologies. The proposed curriculum aims to teach the needed skills directly while keeping students engaged. Core classes focus on algorithms, systems, architecture, and machine intelligence. The curriculum is designed to bridge undergraduate and graduate programs and uses Python to keep students engaged through hands-on projects. A future data science graduate program is outlined, covering data engineering, networks, visualization, scalable systems, and big data analysis, among other topics.
1. Agile Data Science
Dr. Ahmet Bulut (ahmetbulut@sehir.edu.tr)
Istanbul Sehir University, Istanbul, Turkey
2. Web ...
• In the nineties, the Web served lots of static
HTML pages created by a small set of people at
select institutions and news agencies.
• 21st century: the number of contributors and the
amount of information have skyrocketed with the
rise of platforms that enable rapid collaboration
and personal contribution.
• Web 3.0: MACHINES understanding, generating,
and consuming information.
3. Skills Required!
• Current environment awash with data.
• Skills needed from undergraduates:
(i) data analysis,
(ii) idea generation, and
(iii) hypothesis testing.
• Raise awareness at the K-12 level of which
undergraduate skills are being forged at
universities.
• The demand for these skills will percolate down into K-12.
4. Skills Gap
• A recent IBM study highlights that roughly
1/4 of enterprises report having major skill
gaps in four pivotal emerging technologies:
(i) Mobile Computing,
(ii) Cloud Computing,
(iii) Social Business, and
(iv) Business Analytics.
Source: IBM developerWorks and IBM Center for Applied Insights, Tech Trends Study,
November 2012.
6. Our “bridging” solution
• To connect academia and industry: a core set of
classes designed to teach the areas that the faculty
identified as the most important skill sets, based on
the years they personally spent in industry.
7. Curriculum flavor
• NO (-) to Programming Languages class.
• YES (+) to broader Systems class.
• The idea is to teach students how to run a web
application on top of a database that may be
distributed to handle increasing load or to
enable rapid data warehousing.
• The goal is to drop students in the ocean, not in
a sandbox environment.
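As a rough illustration of the "web application on top of a database" idea, the following is a minimal sketch using only the Python standard library. It is not taken from the course material: the file name `visits.db`, the `visits` table, and the function names are all hypothetical, and SQLite is a single-file stand-in for the possibly distributed database the slide has in mind.

```python
import sqlite3
from wsgiref.simple_server import make_server

DB_PATH = "visits.db"  # hypothetical file name, chosen for illustration


def init_db(conn):
    """Create the single table this sketch uses, if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT NOT NULL)"
    )
    conn.commit()


def make_app(conn):
    """Return a WSGI application that logs each request path to the database."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        # Write: record this visit.
        conn.execute("INSERT INTO visits (path) VALUES (?)", (path,))
        conn.commit()
        # Read: count all visits to the same path so far.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM visits WHERE path = ?", (path,)
        ).fetchone()
        body = f"{path} has been visited {count} time(s)\n".encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app


def run(host="localhost", port=8000):
    """Serve the application until interrupted (single-threaded)."""
    conn = sqlite3.connect(DB_PATH)
    init_db(conn)
    with make_server(host, port, make_app(conn)) as server:
        server.serve_forever()
```

Calling `run()` starts a local server; each request to a path such as `http://localhost:8000/hello` stores a row and reports how many times that path has been requested. Keeping the connection behind `make_app` means the storage layer could later be swapped for a different database without touching the request handling.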
8. Key design principles
• (1): Leave little room for bloating the curriculum
with unnecessary classes.
• (2): Bridge the gap between undergraduate and
graduate programs.
• (3): Keep students engaged at all times. Pick a
programming language for instruction that is
versatile and agile.
9. Realization
• (1): Leave little room for bloating the curriculum
with unnecessary classes.
ALGORITHMS
SYSTEMS
ARCHITECTURE
MACHINE INTELLIGENCE
SOFTWARE
10. Realization
• (2): Bridge the gap between undergraduate and
graduate programs.
Graduate Program: Data Engineering, ...
Undergraduate Program: Programming Practice, ...
Dilute...
11. Realization
• (3): Keep students engaged at all times. Pick a
programming language for instruction that is
versatile and agile.
Python
12. Fruits
• Spring ’13 - Programming Practice Class Projects:
(i) Movie Recommendation System: Apply the collaborative
filtering learned in class to the Netflix dataset.
(ii) News Filter: Present news from multiple news sites in a
form that is easy to digest. Use classification and
textual properties to categorize the data.
(iii) Tweetpy: Capture the relationship between social media
and stock prices. Use the gathered statistics to test
whether they can predict the stock price. Use SQLite
or pickle to store the data.
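The collaborative filtering used in the Movie Recommendation System project can be sketched roughly as follows. This is one common variant (user-based filtering with cosine similarity), not necessarily the one taught in class, and the ratings, user names, and function names are invented for illustration; they are not taken from the Netflix dataset or the student projects.

```python
from math import sqrt

# Toy ratings, invented for illustration (not taken from the Netflix dataset).
RATINGS = {
    "ayse": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bora": {"Alien": 4, "Brazil": 5, "Dune": 4},
    "cem": {"Alien": 1, "Casablanca": 5, "Dune": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over co-rated movies."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    # No similar user has rated the movie: no prediction is possible.
    return weighted_sum / weight_total if weight_total else None


def recommend(ratings, user, top_n=3):
    """Rank the movies `user` has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {m for r in ratings.values() for m in r} - seen
    scored = [(predict(ratings, user, m), m) for m in candidates]
    scored = [(s, m) for s, m in scored if s is not None]
    return [m for s, m in sorted(scored, reverse=True)[:top_n]]
```

With the toy data, `recommend(RATINGS, "ayse")` returns `['Dune']`, the only movie that user has not rated. The predicted rating leans toward the rating given by "bora", whose tastes on the co-rated movies are closer. On a dataset of realistic size, a sparse matrix representation would be a more practical choice than nested dictionaries.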
14. Future: Data Science Grad Program
(1) Data Engineering: Information retrieval and data engineering on practical
applications.
(2) Networks: Graph- and game-theoretic analysis of the Web, social networks, and
sponsored search markets.
(3) Data Visualization: Techniques to visualize high-dimensional data for
insight discovery.
(4) Scalable Systems: How to build consumer-facing Web systems that can scale.
(5) Big Data Analysis: Tools used for analyzing Big Data.
(6) Probabilistic Graphical Networks: Establish relationships between entities
and objects for probabilistic inference.
(7) Machine Learning: Theory behind well-established classification, regression,
and clustering methodologies.
(8) Linear Dynamical Systems: Representation of dynamic systems in state
space to understand their evolution over time.
(9) Optimization: Techniques used to optimize real world problems with real
constraints.
15. Thank you!
• Dr. Ahmet Bulut
Department of Computer Science
Istanbul Sehir University
34660 Istanbul, Turkey
e-mail: ahmetbulut@sehir.edu.tr
phone: +90 216 559 9089