1. The AI & I at Work
Tarek Hoteit, PhD – IT Director, TR Labs at Thomson Reuters
October 19, 2018 – University of Texas at Dallas MIS Club
http://tarek.computer
3. Agenda
• Short history of Data Science & AI
• Data Science and AI together 4 ever
• Practical Techniques for data scientists when using AI
• Real demos for work
4. History of Data Science
1962 John W. Tukey predicted the effect of modern-day electronic computing on
data analysis as an empirical science [1]
5. History of Data Science
1965 “Programma 101”, the 1st commercial programmable desktop calculator
6. History of Data Science
1981 IBM released its 1st personal computer (the IBM PC), followed by Apple's GUI-based Lisa in 1983
8. History of Data Science
Data Scientists (from Chapter 3: Roles & responsibilities of individuals and institutions)
The interests of data scientists – the information and computer scientists, database and
software engineers and programmers, disciplinary experts, curators and expert annotators,
librarians, archivists, and others, who are crucial to the successful management of a digital
data collection – lie in having their creativity and intellectual contributions fully recognized.
In pursuing these interests, they have the responsibility to:
• conduct creative inquiry and analysis; enhance through consultation, collaboration, and
coordination the ability of others to conduct research and education using digital data
collections;
• be at the forefront in developing innovative concepts in database technology and
information sciences, including methods for data visualization and information
discovery, and applying these in the fields of science and education relevant to the
collection;
• implement best practices and technology; serve as a mentor to beginning or
transitioning investigators, students and others interested in pursuing data science;
• design and implement education and outreach programs that make the benefits of data
collections and digital information science available to the broadest possible range of
researchers, educators, students, and the general public.
2005 National Science Board advocates data science as a career
9. History of Data Science
2010 data science takes center stage in computer technology as customers use more devices,
social media, and mobile, and machines become faster
11. History of AI
1956 – The Dartmouth Summer Research Project on Artificial Intelligence takes place
Proposal dated August 31, 1955, authored by:
J. McCarthy, Dartmouth College
M. L. Minsky, Harvard University
N. Rochester, I.B.M. Corporation
C.E. Shannon, Bell Telephone Laboratories
12. History of AI
1968 – 2001: A Space Odyssey by Stanley Kubrick is released, featuring the intelligent computer HAL 9000
13. History of AI
1950s – 60s: reasoning AI and prototypes draw high interest
1970s: the first AI winter sets in
1980s: another hype cycle with expert systems and neural networks
Late 1980s – 1990s: the second AI winter
14. History of AI
Late 1990s – 2000s: hype starts again (Deep Blue beats Kasparov at chess in 1997)
2006 – Geoffrey Hinton's group at the University of Toronto revives deep learning
2011 – IBM Watson wins at Jeopardy!
2016 – AlphaGo beats Go champion Lee Sedol
2017 – AlphaGo Zero beats AlphaGo 100 to 0 after learning from scratch
18. We now have two types of Data Scientists
Data Scientist Type A – Analytical
• Focuses on the why
• Heavy on statistics, machine learning fundamentals, and data wrangling
• Uses Python/R and SQL
Data Scientist Type B – Builder/Machine Learning Engineer
• Focuses on creating new products
• Heavy on machine learning, software engineering, linear algebra, and differential equations
• Uses Python/Java/Scala, Docker, and cloud computing
Jesse Steinweg-Woods https://www.datascience.com/blog/guide-to-popular-data-science-jobs
19. Common coding ground? Python is favored in machine learning and data science job postings
Based on Indeed.com job trends, last updated late 2017
20. Python & Data Science libraries are heavily used for data
analysis
“The number of Data Scientists is constantly growing and at the moment the number of Data Scientists is larger than the number of Web Developers among Python users.” – JetBrains 2018 “The State of Developer Ecosystem Survey in 2018”
https://www.jetbrains.com/research/devecosystem-2018/python/
21. Note: Java and JavaScript are still the most popular programming languages among developers, but more and more people are learning Python
JetBrains 2018 “The State of Developer Ecosystem Survey in 2018”
22. To move from Data Scientist Type A to Type B
You need to build a solid foundation for your data and move up the pyramid
Monica Rogati “The AI Hierarchy of Needs” https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
24. Jupyter Notebooks for data scientists
• Python Python Python
• Core modules: NumPy, SciPy, Matplotlib
• Work environment: Jupyter Notebooks – works with Python, R, C++, Julia and more
http://jupyter.org/try
• Anaconda or VirtualEnv to isolate the Python work environment
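A minimal sketch of the isolation idea, using Python's built-in `venv` module as a stand-in for the third-party VirtualEnv or Anaconda tools named above. The function name `create_isolated_env` and the directory name `demo-env` are hypothetical, chosen only for illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path."""
    # with_pip=False keeps the sketch fast and offline; use True to bootstrap pip
    # clear=True wipes any existing environment at env_dir first
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_dir)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"  # hypothetical environment name
        python_path = create_isolated_env(env_dir)
        print("pyvenv.cfg present:", (env_dir / "pyvenv.cfg").exists())
        print("interpreter:", python_path, python_path.exists())
```

From a shell, `python -m venv demo-env` does the same thing; packages installed with that environment's own pip stay out of the system Python.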
25. Complete coding experience using JupyterLab
• Try JupyterLab, the next-generation web-based user interface
• pip install jupyterlab or conda install -c conda-forge jupyterlab
26. Or cloud-based development solutions – Google Colab
Google Colab is free at https://colab.research.google.com/, and you can use its GPUs
27. More useful resources for researchers
• Nurture AI – curated summaries of research papers
https://nurture.ai/home
• AutoML https://cloud.google.com/automl/
• Public Data Search https://www.google.com/publicdata/directory
• Google Dataset Search https://toolbox.google.com/datasetsearch
29. Sentiment Analysis on Twitter using Django, Docker containers, Python & Google NLP
• Twitter API via the Tweepy Python library and a Twitter developer account
• Local Docker container running a PostgreSQL database
• Django & Python to run and manage the code and data
• Google Cloud Natural Language SDK to run sentiment analysis
GitHub source code: https://github.com/hoteit/sentiment-tweets
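The demo's flow (fetch tweets, score each one, store the results, then report) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code: an in-memory SQLite database stands in for the Dockerized PostgreSQL, hard-coded strings stand in for the tweets Tweepy would fetch, and `score_sentiment` is a hypothetical keyword-based stub in place of the Google Cloud Natural Language call, whose document sentiment score runs from -1.0 (negative) to 1.0 (positive).

```python
import sqlite3
from typing import Iterable

# Hypothetical word lists for the stub scorer; the real demo calls Google Cloud NLP instead
POSITIVE = {"love", "great", "happy", "awesome"}
NEGATIVE = {"hate", "bad", "sad", "awful"}


def score_sentiment(text: str) -> float:
    """Hypothetical stub returning a score in [-1.0, 1.0], like Google NLP's document score."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [1 if w in POSITIVE else -1 for w in words if w in POSITIVE or w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0


def store_scores(conn: sqlite3.Connection, tweets: Iterable[str]) -> None:
    """Score each tweet and persist it (SQLite here, PostgreSQL in the original demo)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO tweets (text, score) VALUES (?, ?)",
        ((t, score_sentiment(t)) for t in tweets),
    )
    conn.commit()


def average_sentiment(conn: sqlite3.Connection) -> float:
    """Aggregate the stored scores, as a Django dashboard view might."""
    (avg,) = conn.execute("SELECT AVG(score) FROM tweets").fetchone()
    return avg if avg is not None else 0.0


if __name__ == "__main__":
    sample = ["I love this conference!", "Bad wifi today.", "Heading to Dallas"]
    conn = sqlite3.connect(":memory:")
    store_scores(conn, sample)
    print(round(average_sentiment(conn), 3))
    conn.close()
```

Keeping the scorer behind one function means the stub can be swapped for the real cloud call without touching the storage or reporting code.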
30. Training Google AutoML to categorize customer reviews
• Searched for a dataset on https://toolbox.google.com/datasetsearch
• Found “100K+ Scraped Course Reviews from the Coursera Website (As of May 2017)”
• Analyzed the data and cleaned it where necessary (pretraining step)
• Created a Google Cloud AutoML project, activated the NLP APIs, and uploaded the data
• Imported the dataset, then trained, evaluated, and ran predictions with the model
No AI expertise needed!
GitHub source code: https://github.com/hoteit/coursereviews-automl
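The pretraining clean-up step might look like the sketch below. It assumes, hypothetically, that the downloaded file has `Review` and `Label` columns and that the AutoML import takes a headerless two-column CSV of text and label; check both against the actual dataset and the current AutoML documentation before uploading.

```python
import csv
import io
from typing import List, TextIO, Tuple


def clean_reviews(src: TextIO, text_col: str = "Review",
                  label_col: str = "Label") -> List[Tuple[str, str]]:
    """Drop empty, unlabeled, and duplicate reviews; normalize whitespace."""
    seen = set()
    rows = []
    for row in csv.DictReader(src):
        # Collapse runs of whitespace so near-identical reviews compare equal
        text = " ".join((row.get(text_col) or "").split())
        label = (row.get(label_col) or "").strip()
        if not text or not label or text in seen:
            continue  # skip blanks, missing labels, and exact duplicates
        seen.add(text)
        rows.append((text, label))
    return rows


def write_import_csv(rows: List[Tuple[str, str]], dst: TextIO) -> None:
    """Write headerless text,label rows; csv quoting protects commas inside reviews."""
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny hypothetical sample in the assumed column layout
    raw = io.StringIO(
        'Id,Review,Label\n'
        '1,"Great course,  very clear",5\n'
        '2,,3\n'
        '3,"Great course, very clear",5\n'
        '4,Too fast,2\n'
    )
    out = io.StringIO()
    write_import_csv(clean_reviews(raw), out)
    print(out.getvalue())
```

Working on file-like objects rather than paths keeps the cleaning logic easy to test before pointing it at the real 100K-row file.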
31. Fun time with AWS DeepLens – deep learning-enabled video camera
• Chose a project template on https://console.aws.amazon.com/deeplens
• Registered the DeepLens device & deployed the project
• Model configured using SageMaker; in this case, an SSD architecture with a ResNet-50 feature extractor, stored on S3 and accessible via Lambda
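After each inference, the Lambda function on the device mostly does post-processing. The sketch below shows only that step: keep detections above a confidence threshold and rescale their boxes from the model's input size back to the camera frame. The detection dictionary shape (`label`, `prob`, `xmin`, `ymin`, `xmax`, `ymax`), the 300x300 input size, the 0.5 threshold, and the `LABELS` map are all assumptions for illustration, not the DeepLens API itself.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical class-ID-to-name map; the real one ships with the chosen model/template
LABELS: Dict[int, str] = {0: "person", 1: "dog"}


@dataclass
class Box:
    label: str
    prob: float
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def postprocess(detections: List[dict], frame_w: int, frame_h: int,
                input_w: int = 300, input_h: int = 300,
                threshold: float = 0.5) -> List[Box]:
    """Keep confident detections and map their boxes from model-input to frame pixels."""
    x_scale = frame_w / input_w
    y_scale = frame_h / input_h
    boxes = []
    for det in detections:
        if det["prob"] < threshold:
            continue  # drop low-confidence detections
        boxes.append(Box(
            label=LABELS.get(det["label"], "unknown"),
            prob=det["prob"],
            # round to the nearest pixel after rescaling
            xmin=round(det["xmin"] * x_scale),
            ymin=round(det["ymin"] * y_scale),
            xmax=round(det["xmax"] * x_scale),
            ymax=round(det["ymax"] * y_scale),
        ))
    # Most confident detection first
    return sorted(boxes, key=lambda b: b.prob, reverse=True)


if __name__ == "__main__":
    raw = [
        {"label": 0, "prob": 0.9, "xmin": 30, "ymin": 60, "xmax": 150, "ymax": 240},
        {"label": 1, "prob": 0.3, "xmin": 10, "ymin": 10, "xmax": 50, "ymax": 50},
    ]
    for box in postprocess(raw, frame_w=1920, frame_h=1080):
        print(box)
```

Filtering before rescaling keeps the per-frame work small, which matters on an edge device running inference in a loop.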
Image source: http://mattturck.com/wp-content/uploads/2018/07/Matt_Turck_FirstMark_Big_Data_Landscape_2018_Final.png
Image source: https://www.kdnuggets.com/2017/01/data-science-puzzle-revisited.html
Indeed job trends source: https://www.indeed.com/jobtrends/q-python-and-%22machine-learning%22-q-python-and-%22data-science%22-q-%22data-science%22-and-%22machine-learning%22-q-java-and-%22data-science%22-q-javascript-and-%22data-science%22-q-%22R%22-and-%22data-science%22-q-%22R%22-and-%22machine-learning%22.html