Talk on "Algorithmic Accountability Reporting" by Nicholas Diakopoulos, a computer scientist, Tow Fellow at the Columbia University Journalism School and incoming member of the faculty at the Philip Merrill College of Journalism.
1. Algorithmic Accountability Reporting: On the Investigation of Black Boxes
Nicholas Diakopoulos, Ph.D.
Columbia University Journalism School
(soon to be University of Maryland College of Journalism)
@ndiakopoulos – http://www.nickdiakopoulos.com
2. We should interrogate the architecture of cyberspace as we interrogate the code of Congress.
-- Lawrence Lessig, Code is Law, 2000
7. Algorithmic Accountability
How can we characterize the bias or power of an algorithm?
When might algorithms be wronging us, or making consequential decisions?
What role might journalists play in holding algorithmic power to account?
9. Transparency
Voluntary incentives for self-disclosure about algorithms
Trade secrets
Including FOIA exception
Gaming / manipulation
Goodhart’s Law: “when a measure becomes a target, it ceases to be a good measure.”
Cognitive complexity
Transparency information needs to be accessible and understandable
10. Adversarial Investigation
Reverse Engineering
“the process of extracting the knowledge or design blueprints from anything man-made”
Systematic examination to unearth a model of how the system works
Uncover unintended side-effects of the implementation
12. geo / cookies / prices
Staples.com
WSJ Price Discrimination
Jennifer Valentino-DeVries, Jeremy Singer-Vine, and Ashkan Soltani. Websites Vary Prices, Deals Based on Users’ Information. Wall Street Journal. Dec. 2012.
Price discrimination
Do different people pay different prices depending on their geography or browser history? Yes!
15. Discriminatory / Unfair
Mistake that denies a service
Censorship
Breaks law or social norm
False Prediction
Other Stories from Algorithms?
16. Teaching journalists to do algorithmic accountability
It’s messy and hard!
Legal issues
EULAs, DMCA, Computer Fraud and Abuse Act
Ethical implications of publishing more info
Gaming, individual privacy
Transparency policy
What factors to expose, frequency, format of disclosure
What’s Next?
17. Thanks! Questions?
Nick Diakopoulos
Twitter: @ndiakopoulos
Email: nicholas.diakopoulos@gmail.com
Web: http://www.nickdiakopoulos.com
More Info
Algorithmic Accountability Reporting: On the
Investigation of Black Boxes. Tow Center. Feb. 2014.
http://towcenter.org/algorithmic-accountability-2/
18. Algorithms In Media: Search
Search Engine Autocomplete
Google Autocomplete FAQ: “we exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement.”
Editorial Criteria
Boundaries of censorship, differences among search engines, mistakes?
19. Algorithms In Media: Trends
Implications for formation of publics?
How are trends defined and measured?
What might be missed as a result?
Almost 14 years ago Lawrence Lessig taught us that “Code is Law” – that the architecture of systems, the code and algorithms that run them, can be a powerful influence on liberty. Let me give you a quick example of algorithmic power.
There are now dozens of sites online that collect, organize, and search-engine-optimize mug shot photos. Having a mug shot online can be embarrassing, and many of these sites extort payment from people to have their photos removed.
As a society we can deal with this shady behavior through laws, or through market forces like credit card companies refusing to process payments. Or it can be mitigated with algorithmic power, which is what Google just did by down-ranking these sites.
Here are some of the examples where I’ve found algorithms being used in government and corporations. I think there’s a gold mine of stories out there. I’m wondering if algorithms could be a new beat for computational journalists?
Traditionally investigative journalism has looked at uncovering hidden information about institutions. Turns out that algorithms are really good at hiding and obfuscating information so there’s a natural fit for journalism here. What we lack as a public is clarity on how an algorithm exercises its power over us, given that power is opaque, hidden behind complexity in a black box. We need to get *inside* that box.
The crux of algorithmic power is really autonomous decision-making. We might start to assess algorithmic power by thinking about the atomic decisions that algorithms make, which can be composed and composited to arrive at higher-level operations like summarization. This framework can help identify what we might focus on when investigating an algorithm, and suggests questions: criteria, errors, biases in training data, editorial criteria.
Prioritization: fire inspections in New York; parolee attention.
Classification: Content ID (infringing or not).
Association: relationships between entities in an investigation.
Filtering: censorship on social media, e.g. Chinese censorship or the filtering of child pornography from search results.
Transparency is the vogue response these days and certainly an increasingly important way that journalists deal with their own bias. But there are some limitations and challenges to applying it to algorithms.
In reality the process is probably a bit closer to historiography (or archaeology): a historian studying the 12th century can sample only the data that time has chosen to preserve in the form of written accounts, artifacts, and the archaeological record. You take what you can get, and interviews are not an option. If interviews with the designers do become available, they can form a powerful comparison point between the deduced and the expressed design intent, and lead to insights about whether the system is performing “as designed.”
Algorithms may be black boxes, but they have two little holes in the side: one for inputs and one for outputs. If you vary the inputs in enough different ways and pay close attention to the outputs, you can start piecing together a theory of how the algorithm works. I also like the analogy to recipes: given a cake and a set of ingredients, can you figure out the recipe that turns those ingredients into a similar cake?
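That probing idea can be sketched in a few lines of Python. Everything here is hypothetical: `black_box_price` is a stand-in for the opaque system under investigation (in a real audit it would be a live website queried from simulated user profiles); the point is the probing loop, which systematically varies each input and records the output.

```python
def black_box_price(distance_to_rival_miles, has_browsing_history):
    """Hypothetical pricing algorithm we pretend we cannot see inside."""
    base = 17.29
    if distance_to_rival_miles <= 20:
        base -= 1.50          # discount near a competitor's store
    if has_browsing_history:
        base -= 0.50          # returning shoppers see a small discount
    return round(base, 2)

def probe(distances, histories):
    """Systematically vary the inputs and tabulate the outputs."""
    observations = []
    for d in distances:
        for h in histories:
            observations.append((d, h, black_box_price(d, h)))
    return observations

if __name__ == "__main__":
    for d, h, price in probe([5, 15, 25, 50], [False, True]):
        print(f"distance={d:3d}mi  history={h!s:5}  price=${price}")
```

Reading the table of observations, an investigator could notice that price drops whenever distance falls below some threshold, and form a theory of the rule inside the box without ever seeing its code.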
At the WSJ they used reverse engineering to understand how online commerce sites like Staples dynamically set prices based on things like geography and browser history. This involved simulating thousands of surfing sessions and recording prices. Also: Rosetta Stone, Orbitz, Home Depot.
The tendency to see discount prices appeared to be tied most strongly to distance from an OfficeMax or Office Depot — Staples’ main competitors.
• ZIP Codes within about 20 miles of a competitor’s store tended to see discount prices.
• ZIP Codes farther away from a rival store tended to see higher prices, even more so if the ZIP Code contained a Staples store.
• Cities (which were more likely to have Staples stores as well as competitor stores) tended to see discount prices in the Journal’s tests as well.
• The Journal also examined numerous other possible factors, including income and race, and found that none were tied as strongly to price as competitor-store locations.
What they found is that, statistically speaking, the strongest correlation with price involved the distance to a rival’s store from the center of a ZIP Code. Prices tended to be higher in areas with less competition, including rural or poor areas.
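As an illustration of that kind of analysis (with invented numbers, not the Journal’s actual data or methodology), one could tabulate observed prices per ZIP Code alongside candidate factors and compare their correlations with price:

```python
# Sketch: which candidate factor correlates most strongly with price?
# The data below is fabricated purely for demonstration.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (distance to rival store in miles, median income in $1000s, observed price)
zip_codes = [
    (2, 45, 15.79), (8, 60, 15.79), (18, 38, 15.79),
    (30, 72, 17.29), (55, 30, 17.29), (80, 50, 17.29),
]
dist = [z[0] for z in zip_codes]
income = [z[1] for z in zip_codes]
price = [z[2] for z in zip_codes]

print("price vs. distance:", round(pearson(dist, price), 2))
print("price vs. income:  ", round(pearson(income, price), 2))
```

In this made-up dataset, distance to a rival store correlates far more strongly with price than income does, mirroring the shape of the Journal’s finding.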
So what might make a story out of an algorithm? Maybe it’s discriminatory or unfair in some way. It makes a mistake that denies a service. It censors something. Its output breaks a law or social norm. Or it falsely predicts something with real consequences.
Trends can be personalized: different people see different trends. How are trends detected? What are the implications for publics forming around trends?
Filters are imperfect: they de-emphasize legitimate reviews and leave fake reviews intact. Yelp is a massive review platform, not just for restaurants but for all kinds of small businesses, and businesses depend on the star ratings and reviews they get. To protect the consumer, Yelp has an algorithmic filter that weeds out suspicious reviews and de-emphasizes them on another page, making it hard to even see how the removed reviews aggregate, or how they would contribute to the overall score. This is the nature of the allegations: a business could have a low rating while several high ratings sit hidden. Another example: a number of Russian YouTube videos have been blocked from within Germany. The reason? These videos contain background music playing from a Russian car radio.