This was a talk I gave at SXSW 2016. It outlines the current state of applied ethics in data science as a profession, describes key reasons a code of ethics should be constructed, and proposes a framework to begin the discussion.
2. WHAT IS THIS?
‣ Advertisers and ethics… WTF!
‣ What, me ethical?
‣ Mapping the code.
‣ Why do this at all?
3. WHAT IS THIS NOT?
‣ An attempt to get you to Tweet about something
‣ A vision for Tim’s perfect future
‣ A shameless plug for any association, business
or way of thinking
11. WHAT IS A DATA SCIENTIST?
‣ Statistics
‣ Data Strategy
‣ Social Science
‣ Coding chops
‣ Good Looks
12. AND WE SEEM TO HAVE MORE AND MORE
OF THEM IN THE WORLD IN GENERAL
13. O’Reilly 2015 Data Science Salary Survey
http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf
of approximately 600 respondents
Reported age (percent of respondents):
<21: 1%
21–25: 9%
26–30: 23%
31–35: 25%
36–40: 14%
41–45: 13%
46–50: 6%
51–55: 5%
56+: 4%
THEY ARE ALSO A YOUNG BUNCH
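The "young bunch" claim can be checked with a little arithmetic on the chart values. The snippet below is a minimal sketch, not part of the original talk: it assumes the nine percentages on the slide pair with the nine age bands in left-to-right order, and the names `AGE_SHARES` and `share_under` are made up for illustration.

```python
# Age-band shares as read from the slide's bar chart (percent of respondents).
# ASSUMPTION: each value is paired with a band in left-to-right order.
AGE_SHARES = {
    "<21": 1, "21-25": 9, "26-30": 23, "31-35": 25, "36-40": 14,
    "41-45": 13, "46-50": 6, "51-55": 5, "56+": 4,
}


def share_under(bands, cutoff_label):
    """Sum the shares of every band listed before cutoff_label."""
    total = 0
    for label, pct in bands.items():
        if label == cutoff_label:
            break
        total += pct
    return total


total = sum(AGE_SHARES.values())               # sanity check on the pairing
under_36 = share_under(AGE_SHARES, "36-40")    # respondents aged 35 or younger
print(f"total = {total}%, under 36 = {under_36}%")
```

Under that pairing the shares add up to 100%, and more than half of the respondents report an age of 35 or younger.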
14. AND THAT MAKES SENSE AS
IT IS A YOUNG PROFESSION
FIRST USE OF “DATA SCIENCE”
1996: Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan.
THE PAPER THAT LAUNCHED 1,000 NERDS
2001: William S. Cleveland publishes “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
15. MOREOVER, NEW ENTRANTS INTO THE
FIELD ARE NOT GIVEN VERY MUCH
ETHICAL TRAINING
Surveyed Syllabi from 13 Intro to Data Science Courses
16. ONLY THREE HAVE AT LEAST ONE
MENTION OF AN “ETHICS” COMPONENT
IN THE SYLLABUS
23. Earl, I think Data
Science needs a code
of ethics.
Yup.
24. A CODE OF ETHICS WOULD
‣ Establish credibility and responsibility outside
of nerd-dom
‣ Provide a starting point to act as technology
changes
‣ Galvanize the disparate data practitioner
community
28. A TIMELINE OF ETHICAL CODES
~2300 BCE: Egyptian Code of Ma’at
~1200 BCE: Jewish Torah
~500 BCE: Hippocratic Oath
~1000: Bushido Warrior Code
~1600: Pirate’s Code of the Brethren
1831: French Foreign Legion Code d'Honneur
1914: Journalist’s Creed
1946: Nuremberg Code
1981: I.R.B. - Exempt Common Rule
1985: International Statistical Institute
1992: Association for Computing Machinery
1999: American Statistical Association
2005: Draft Model Bioethicists Code
Timeline annotation: increase of professional codes
29. ETHICAL CODES ARE NOT ALL THE SAME
BUT THEY HAVE TWO CLASSES OF
CHARACTERISTICS
Inward
facing goals
Outward
facing goals
30. INWARD FACING GOALS
‣ Provide guidance when norms are not
explicit
‣ Reduce internal conflicts and build a
common purpose
‣ Establish professional behavior
‣ Deter unethical behavior with sanctions and
internal reporting structures
31. OUTWARD FACING GOALS
‣ Protect vulnerable populations who could be
harmed by profession’s activities
‣ Establish the profession as a distinct moral
community worthy of autonomy
‣ Serve as tool for disputes between member
and non-member parties
‣ Create institutions resilient to external
pressures
32. PROMOTE POSITIVE ENFORCEMENT
‣ Accept that the distributed nature of
professional communities creates too many
judicial problems for active regulation
‣ Construct the code with consensus
allowing for broad buy-in
‣ Set boundaries and expectations of the
practicing community, allowing for self-
affirming social control mechanisms
33. OVERALL A PROFESSIONAL
CODE OF ETHICS SHOULD:
‣ Mediate internal group needs and external
community interactions
‣ Adapt to future unknown circumstances
‣ Inspire collective identity supporting
adherence and adoption
34. OKAY PROFESSOR, SO WHAT IS THE
REAL REASON DATA SCIENCE NEEDS
AN ETHICAL CODE?
36. "In economics, moral hazard occurs
when one person takes more risks
because someone else bears the
burden of those risks."
– Wikipedia
https://en.wikipedia.org/wiki/Moral_hazard
40. DATA SCIENCE IS STEEPED IN
MORAL HAZARD
‣ Connections between data and the people
it represents are very abstracted
‣ Digital creations affect people we never
see
‣ Unintended algorithmic consequences are
almost never known or explored
‣ When was the last time an algorithm ever
“hurt” anybody?
43. “Data can be useful
or anonymous,
but never both.”
– Paul Ohm,
“Broken Promises of Privacy: Responding to
the Surprising Failure of Anonymization,”
UCLA Law Review 57, p. 1702
44. THUS A CODE WOULD NEED
TO MAINTAIN THE UTILITY
OF DATA
WHILE BALANCING
CONTROL OF THAT DATA
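The trade-off in slides 43 and 44 can be made concrete with a toy check. The sketch below is not from the talk: it uses a handful of made-up records and a hypothetical coarsening rule, counts how many records share each combination of attributes (the idea behind k-anonymity), and shows that coarsening makes individuals harder to single out at the price of detail.

```python
from collections import Counter

# Hypothetical toy records: (zip code, birth year, sex). Not real data.
RECORDS = [
    ("78701", 1984, "F"),
    ("78702", 1986, "F"),
    ("78705", 1981, "F"),
    ("78751", 1992, "M"),
    ("78756", 1995, "M"),
    ("78758", 1990, "M"),
]


def smallest_group(rows):
    """Return k, the size of the smallest group of rows with identical values."""
    return min(Counter(rows).values())


def generalize(row):
    """Coarsen one record: keep 3 zip digits, round birth year down to its decade."""
    zip_code, birth_year, sex = row
    return (zip_code[:3] + "**", birth_year // 10 * 10, sex)


k_raw = smallest_group(RECORDS)                              # every row is unique
k_coarse = smallest_group([generalize(r) for r in RECORDS])  # rows now share groups
print(f"k before coarsening: {k_raw}, after: {k_coarse}")
```

In the raw form each record is unique, so anyone who knows a person's zip code, birth year and sex can pick out their row. After coarsening, every record hides among others, but the analyst can no longer tell neighborhoods or exact ages apart. That tension between utility and control is what a code would have to balance.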
45. A FRAMEWORK FOR A CODE IS
COMPOSED OF THREE CLUSTERS
Data Ethics Code
‣ Privacy: 3rd party usage, Business applications, Bio-information
‣ Community: Protection of subjects, Mathematical responsibility, Safety of used data & analysis
‣ Identity: Ownership, Verification, Right to be forgotten, Incorrect data correction
46. PRIVACY
‣ 3rd party data: Once you buy or sell data, what are the ethics around
using it? You did ‘buy it’, right?
‣ Business applications: What is the relationship between privacy of internet
exploration and advertisement of relevant products?
‣ Bio-information: Is data generated from your body owned differently?
47. COMMUNITY
‣ Protection of subjects: How do we protect the people our analysis affects
from negative consequences?
‣ Mathematical responsibility: Is there a system for correct use of professional
tools and continuing education?
‣ Safety of used data & analysis: Once data is used, how is it discarded and
how is sensitive analysis protected?
48. IDENTITY
‣ Ownership: Is there a need for a centralized personal data safe?
‣ Validation: How do means of validation affect access, privacy and safety?
‣ Incorrect data correction: What are the mechanisms to correct bad data?
49. THESE COMPONENTS PROVIDE THE
BASIS FOR CONVERSATION NOT A
HARD STRUCTURE
[Diagram repeated from slide 45: Data Ethics Code with its Privacy, Community, and Identity clusters]
61. ESTIMATED $100 MILLION - $500 MILLION
2006 - data theft
http://www.lifehealthpro.com/2015/06/18/the-10-most-expensive-data-breaches?t=regulatory&slreturn=1456110972&page=5
62. HIGH ESTIMATES: $4 BILLION
2011 - data breach of marketing data from 75 client companies
http://www.eweek.com/c/a/Security/Epsilon-Data-Breach-to-Cost-Billions-in-WorstCase-Scenario-459480
70. Some folks working on this:
‣ The Council for Big Data, Ethics and Society
‣ Certified Analytics Professionals
‣ Michael McFarland, S.J. - Computer Scientist
‣ Cynthia Dwork - Microsoft Research
‣ Kord Davis - Digital Strategist