This document provides an overview of a course on data literacy and ethics in the lab taught by Chris Wiggins and Matt Jones. The course aims to teach skills and knowledge that are not covered in statistics or social science curricula. It pairs technical skills like Python programming with readings on the political and ethical contexts of data and technology.
The talk introduces the key hypotheses driving the class, gives a student's-eye view of the first lecture, and discusses approaches to research ethics. It also previews the course structure, which includes lectures, labs using Jupyter notebooks, and discussions on Slack. The goals are to develop multiple literacies, show how capabilities relate to power dynamics, and consider data technologies throughout history and their social impacts.
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literacy & Ethics in the Lab on February 13th, 2018)
1. Data Literacy and Ethics in the Lab
Chris Wiggins + Matt Jones
data-ppf.github.io
Course supported by Collaboratory Fellows Fund, Columbia University
(talk presented 2018-02-13 at Digital Life Seminar, Cornell Tech)
2. overview
1. hypotheses driving the class
2. student-eye view of the course (following “lecture 1”)
3. ethics
a. defining vs enforcing
b. research vs industry: two institutional moments
c. curricula extant vs curricula needed
4. show and tell:
a. syllabus/readings
b. paired Python/Jupyter notebooks (readings on Mondays; Python on Wednesdays)
c. Slack (not just for discussions)
4. hypotheses driving the class
1. there is important material being taught neither to future statisticians nor to future senators
a. outside the technical canon yet also
b. present only at the advanced level in STS, to our knowledge unpaired with technical engagement
2. multicapabilities [1] are teachable without prerequisite
a. functional: via pre-authored Jupyter notebooks, as in-class labs
b. rhetorical: in-class labs as well as discussion
c. critical: discussions + readings as well as in-class labs
3. pair intellectual changes with political and ethical context
a. what powers motivated this advance?
b. how did this advance rearrange power? (cf. Rogaway) [2]
[1] cf. Selber, S. (2004). Multiliteracies for a digital age. SIU Press.
[2] Rogaway, P. (2015). The Moral Character of Cryptographic Work. IACR Cryptology ePrint Archive, 2015, 1162
17. how did this end up in my news feed?
- math
- hardware
- system
- funding
- market
- regulation
- data
this was not possible 20 years ago.
- why?
- what did people do instead?
23. “Automated Inference on Criminality using Face Images”
(arXiv:1611.04135v1)
“In all cultures and all periods of recorded human history, people share the belief that the face alone suffices to reveal innate traits of a person.”
24. We’ve been here before
J Am Inst. Criminal Law, 1912, on Lombroso, 1899
28. Statistical sciences always political
Dream of sciences of social difference central to development of statistics and the data sciences
29. Florence Nightingale & Data Visualization
“Experience has shown that without special information and skilful application of the resources of science in preserving health, the drain on our home population must exhaust our means. The introduction, therefore, of a proper sanitary system into the British army is of essential importance to the public interests.” (1858)
30. Florence Nightingale & Data Visualization
“Upon the British race alone the integrity of that empire at this moment appears to depend. The conquering race must retain possession.” (1858)
31. Every week:
Scientific and mathematical development
Technologies and engineering
Driving forces: money, prestige, resources, Imperial competition
Power, ethics, and data intensive knowledge
32. Tech story: three chronological stages
Data and Math
Data and Engineering
Data and Technology
33. Data technologies
Census and government survey
Information processing machines and
digital computers
Always on network infrastructure
34. Power
How should social and political order be organized on the basis of science and engineering?
How do technologies transform the social and political order?
How do technologies augment and diminish democratic orders? Autocratic ones?
35. Power and politics*
New technologies mean new capabilities.
These capabilities are first available to those in power
(cf. “The future is already here — it's just not very evenly distributed.” --Gibson)
- How does this distribution of capability reorder power?
- How are data-empowered algorithms an example of this dynamic
■ of capability, and
■ of reinforcing or distributing power?
* politics here meaning the dynamics of power, not to be confused with “voting”
46. Experimental design, hypothesis tests, and decision theory
“To play this game with the greatest chance of success, the experimenter cannot afford to exclude the possibility of any possible arrangement of soil fertilities, and his best strategy is to equalize the chance that any treatment shall fall on any plot by determining it by chance himself.”
- Joan Fisher Box
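The randomization idea in the quote above can be tried in a lab-style notebook. What follows is a minimal sketch, not material from the course: the function name `randomize_treatments`, the plot labels, the three treatments, and the seed are all hypothetical choices made for illustration. It assigns each treatment to the same number of plots and lets a random shuffle decide which plot receives which treatment, so that no arrangement of soil fertilities is favored by the experimenter.

```python
import random
from collections import Counter


def randomize_treatments(plots, treatments, seed=None):
    """Assign treatments to plots at random with equal replication.

    Hypothetical helper for illustration: every treatment is used on the
    same number of plots, and chance alone decides which plots those are.
    """
    if len(plots) % len(treatments) != 0:
        raise ValueError("number of plots must be a multiple of the number of treatments")
    rng = random.Random(seed)  # a seed makes the layout reproducible
    replicates = len(plots) // len(treatments)
    # Build the balanced list of labels, e.g. A, A, A, A, B, B, B, B, ...
    labels = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(labels)  # chance, not the experimenter, picks the arrangement
    return dict(zip(plots, labels))


# Illustrative field: 12 plots and 3 treatments (assumed values).
plots = [f"plot_{i}" for i in range(1, 13)]
treatments = ["A", "B", "C"]

layout = randomize_treatments(plots, treatments, seed=0)
counts = Counter(layout.values())

for plot, treatment in layout.items():
    print(plot, treatment)
print(dict(counts))
```

Rerunning with a different seed gives a different layout, while each treatment still appears on four plots. The balance is fixed by design and only the arrangement is left to chance.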
47. World War 2: Turing and statistical cryptography
55. Weekly structure
Monday
Lecture and discussion
Expectation
arrive having done the week’s readings
Wednesday
Laboratory
Expectation
arrive with laptop ready to collaborate
57. Two tracks
more technical background track (60%)
● pursue a semester-long project culminating in a 15 pp paper and any associated code
● complete 3 problem sets
● short final presentation on paper
more humanistic background track (60%)
● write a 10 pp paper on a topic of their choice
● complete 5 problem sets, which will involve both computational and written work
● short final presentation on paper
61. ethics
1. placement of ethics within multicapabilities:
Q: integrate or separate?
A: our approach: lead up to, i.e., foreshadow throughout, via shock and awe, then provide framing
62. ethics
2. A curriculum / ideational setting for:
- granularities
- Belmont, Menlo Park [1]
Common Rule / IRB
- users + products
- society + markets [2]
three-party game (including law/regulation) [3]
define v. enforce
industry v. “research”
[1] Salganik, M. J. (2017). Bit by bit: social research in the digital age. Princeton University Press.
[2] Pasquale, F. (2015). The black box society: The secret algorithms that control money and information. Harvard University Press.
[3] Janeway, W. H. (2012). Doing capitalism in the innovation economy: markets, speculation and the state. Cambridge University Press.
63. defining via granularities: principles->standards->rules [1]
1. Respect for Persons:
- informed consent; respect for individuals’ autonomy and individuals impacted;
- protection for individuals with diminished autonomy or decision making capability.
2. Beneficence: do not harm; assess risk.
3. Justice: fair distribution of benefits of research; selection of subjects; and allocation of burdens.
4. Respect for Law and Public Interest:
- legal due diligence;
- transparency in methods and results;
- accountability.
[1] Solum, L. B. (2009). Legal theory lexicon: Rules, standards, and principles (blog post).
[2] Dittrich, D., & Kenneally, E. (2012). The Menlo Report: Ethical principles guiding information and communication technology research. US Department of Homeland Security.
64. 4. show and tell:
a. syllabus
b. paired Python/Jupyter notebooks
c. Slack
65. recap
1. hypotheses driving the class
2. student-eye view of the course (following “lecture 1”)
3. what we talk about when we talk about ethics
a. defining vs enforcing
b. two institutional moments: research vs industry
c. curricula extant, curricula needed
4. show and tell:
a. syllabus/readings
b. paired Python/Jupyter notebooks (readings on Mondays; Python on Wednesdays)
c. Slack (not just for discussions)
66. for more info, code, syllabus, etc. please see
“Data: Past, Present, and Future” course page
data-ppf.github.io
68. responses: uses of history
1. “make the present strange” (- J. Grimmelmann)
- emphasize the diverse ways of thinking about the problem from before it was “settled”: help us see what could have been?
- make human by revealing the human interests and conflicts
2. provides a new way into the technical materials which doesn’t presume or advantage a particular prior curriculum
3. provides window into the ethical questions
- distance us from our “settled” narrative via the past
- show debates that are different now (e.g., eugenics debates)