3. • Data – a record (can be tangible or electronic)
that is used as a basis for decision-making,
discussion or calculation that requires
processing and/or analysis to have meaning
• Data scientist – a professional who uses the
scientific method to answer questions with
data
• Data quality – the truthfulness of data
4. • Signal – a meaningful interpretation of data
that is based on scientific evidence and
knowledge
• Noise – other interpretations of data
• Algorithm – a set of rules used in problem
solving
5. • Statistics – the collection and analysis of
numerical data in large quantities
• Statistical significance – a statistical assessment
of how unlikely the observed finding would be if
it were caused by chance alone
• Causation – a relationship between a first and
second phenomenon in which the second is a
consequence of the first.
• Spurious correlation – an apparent relationship
between two variables that is produced by a hidden
or lurking variable (or by coincidence) rather than
by one causing the other
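
As a rough illustration of the spurious correlation defined above, the following Python sketch simulates two outcomes that never influence each other but both depend on a third, lurking variable. Everything here is an assumption made for illustration: the variable names (`temperature`, `ice_cream_sales`, `sunburn_cases`), the coefficients, and the noise levels are invented, not taken from the slides. The sketch compares the raw correlation with the correlation that remains after the lurking variable's linear effect is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """Remove the linear effect of zs from ys using a least-squares slope."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical lurking variable: daily temperature.
temperature = [rng.gauss(20, 5) for _ in range(2000)]

# Both outcomes depend on temperature, but not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 5) for t in temperature]

raw_r = pearson(ice_cream_sales, sunburn_cases)

# Correlation after controlling for temperature (a partial correlation).
partial_r = pearson(residuals(ice_cream_sales, temperature),
                    residuals(sunburn_cases, temperature))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this simulated setup the raw correlation should come out strongly positive while the partial correlation should sit near zero, which is the pattern a lurking variable leaves behind. Real data are rarely this clean, and the lurking variable is often unmeasured.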
6.
7.
8. • Ludic fallacy – treating the real world
(complex!) as if it behaved like the simplified
models used in experiments and mathematics
• Naïve interventionism – preferring to do
something over nothing when nothing may be
more appropriate
• Naïve rationalism – belief that explanations
will necessarily follow investigations
9. Risks to ethics
• Hammurabi Risk Management – the builder
knows more than the inspector and can hide
flaws in the foundations
• Ethical inversion – putting the needs of the
profession ahead of ethics (aka politics)
• Narrative fallacy – the need to fit a story to a
set of facts
10. Recognizing ethical risk spots
• Could the action be damaging to people or
community?
• Does the action have ramifications beyond
legal or institutional concerns?
11. Ethics and data
• Analysts shouldn’t attempt to provide explanations
beyond their ability
• Analysts should explain their methods at a level their
client can understand, including the limitations of the
data and the insights
• Analysts should protect confidential information
• Analysts should avoid conflicts of interest
• Analysts should use the data science method
– Careful observation
– Analysis for potential meaning
– Formation of hypotheses
– Empirical testing of hypotheses
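
One possible way to make the last step, empirical testing of hypotheses, concrete is a permutation test, which estimates how often a difference at least as large as the observed one would appear by chance alone. The Python sketch below is a swapped-in illustration, not a method named in the slides. The function name `permutation_p_value`, its parameters, and the `control` and `treated` measurements are all hypothetical and invented for the example.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis they are arbitrary.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements, invented for illustration only.
control = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 11.9, 12.2]
treated = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0, 12.6, 13.2]

p = permutation_p_value(control, treated)
print(f"estimated p-value: {p:.4f}")
```

A small p-value says only that chance alone is an unlikely explanation for the difference. It does not establish causation, and it does not rule out a lurking variable.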
Editor's notes
Graphics from Tyler Vigen; tylervigen.com
Prostate cancer example: treatable, but sometimes treatment is worse than disease
Surgeon success percentages example – not taking a risky operation that could save a life because if it fails the surgeon or hospital is harmed.
Narrative fallacy example: Why did Donald Trump get elected president despite polls leaning otherwise?
Narrative: disenfranchised working class voters
Narrative: the Russians did it
Fitting the narrative makes you overlook data