Presented at the 5th International Conference on eSocial Science, as part of a workshop on the law and ethics of eSocial Science research. It outlines three domains I am currently researching and some of the ethical issues I have encountered, including reporting on a third party (Facebook), deception (craigslist) and information access (grouphug.us).
Ethical challenges for online social science research: Networks, rentals and confessionals
1. Ethical challenges for online social
science research: Networks,
Rentals and Confessionals
Bernie Hogan
Research Fellow, Oxford Internet Institute
NCeSS - 5th International Conference on e-Social Science
June 24, 2009. Cologne, Germany
Wednesday, June 24, 2009 1
2. Three unethical
studies?
• Facebook network research
• Craigslist audit study
• Grouphug.us
4. What are the techniques?
• Spidering - Technically fussy, often considered
inappropriate by data controller
• API - Technically restrictive, gives false sense of data
ownership (See Facebook Developer Terms of Use
Section 2.A.6)
• Datadump - Facebook gives you the data
• Someone else’s application - May not give data, but only
a picture.
• Handcoding - Spidering for masochists
5. Who gets the data?
• Golder, S., Wilkinson, D. M., and Huberman, B. A. (2007).
Rhythms of social interaction: Messaging within a
massive online network. In 3rd International Conference on
Communities and Technologies, East Lansing, MI. Springer.
• Traud, A., Kelsic, E., Mucha, P., and Porter, M. (2008). Community
structure in online collegiate networks. Working paper.
• Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., and Christakis, N.
(2008). Tastes, ties, and time: A new social network
dataset using facebook.com. Social Networks, 30(4):330–342.
6. But isn’t it anonymous? No.
• Backstrom, L., Dwork, C., and Kleinberg, J. (2007).
Wherefore art thou r3579x? : anonymized social
networks, hidden patterns, and structural
steganography. In Proceedings of the 16th international
conference on World Wide Web, pages 181–190. ACM New
York, NY, USA.
• Direct attack needs ~ sqrt(log(n)) nodes.
• Narayanan, A. and Shmatikov, V. (2009). De-anonymizing
social networks. Forthcoming: IEEE C&S.
• Starting with even less and matching to an existing network
can get over 90% of the network accurately.
7. Or simply use this guy
Zimmer, Michael. 2009. “But the Data is Already Public”: On the Ethics of Research in Facebook.
8th International Conference of Computer Ethics: Philosophical Enquiry. Corfu, Greece.
8. The only anonymous
network is one where
you don’t know
the network structure.
This is unrealistic.
9. So what’s the precedent?
• Personal networks with informed consent.
• Name generators have historically asked individuals
to report data on their friends.
• They pass through an ethical loophole because this is
recall data.
• Information networks, however, permit not only data
created by an individual, but also friend-of-a-friend data
that is merely accessible to, not created by, the respondent.
10. Facebook properties enable you to
report on your friends to a third party.
Respondent
Friend 1 ? Friend 2
13. Methods
• This is a University of Toronto ethics board-approved
audit study.
• We selected craigslist.org, a highly popular free online
classifieds site.
• From March to June 2007 we selected approximately 10
new ads each day for inclusion in the study.
• Each landlord was emailed 5 messages. Each message
included one of five ethnicities randomly assigned with
one of five message bodies. Each experiment used one
gender only.
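The assignment step described above can be sketched as follows. This is a minimal illustration, not the study's actual tool: it assumes that, for each landlord, every ethnicity and every message body is used exactly once and that the send order is also randomized. The ethnicity labels are taken from the results table later in the deck; `assign_messages` and the numbering of bodies 1 to 5 are hypothetical.

```python
import random

# Ethnicity labels as they appear in the results table of this deck.
ETHNICITIES = ["Arab", "Black", "SE Asian", "Caucasian", "Jewish"]
N_BODIES = 5  # the slide states five message bodies; numbering is hypothetical


def assign_messages(rng):
    """Return five (ethnicity, body) pairs for one landlord.

    Assumption: each ethnicity and each body appears exactly once,
    paired at random, and the send order is shuffled as well.
    """
    bodies = list(range(1, N_BODIES + 1))
    rng.shuffle(bodies)                     # random ethnicity/body pairing
    pairs = list(zip(ETHNICITIES, bodies))
    rng.shuffle(pairs)                      # random send order
    return pairs


if __name__ == "__main__":
    for ethnicity, body in assign_messages(random.Random()):
        print(f"{ethnicity}: message body {body}")
```

Passing an explicit `random.Random` instance keeps a run reproducible when a seed is supplied, which helps when auditing which landlord received which combination.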
14. [Screenshot of a craigslist rental ad, with annotations]
1. Price and number of bedrooms almost always in header.
2. Masked email address.
3. Well-formed date.
4. PostingID - key to linking data.
5. Link to well-formed Google map, or failing that, nearest intersection.
15. [Screenshot of the message-sending tool, with annotations]
• We send messages out one day after the posting (rather than immediately) at short regular intervals. The parameters can be tuned.
• Jitter means that messages are sent at a random time within "5" minutes of the specified time. Makes batches of messages look more realistic.
• By default we alternate between male and female names.
• This window shows the five name / message combinations that will be sent out.
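A rough sketch of the scheduling idea, under stated assumptions: "within 5 minutes" is read here as a uniform offset of up to five minutes either side of the nominal time, and the 30-minute spacing between messages is a placeholder, since the slide only says "short regular intervals". The function name `schedule_sends` and all parameter names are hypothetical.

```python
import random
from datetime import datetime, timedelta


def schedule_sends(posted_at, n_messages=5,
                   delay=timedelta(days=1),
                   interval=timedelta(minutes=30),  # placeholder spacing
                   jitter_minutes=5, rng=None):
    """Return jittered send times for one ad.

    Nominal times start `delay` after the posting and are spaced by
    `interval`; each is then shifted by a uniform random offset of at
    most `jitter_minutes` in either direction (an assumption).
    """
    rng = rng or random.Random()
    times = []
    for i in range(n_messages):
        nominal = posted_at + delay + i * interval
        offset = timedelta(minutes=rng.uniform(-jitter_minutes, jitter_minutes))
        times.append(nominal + offset)
    return times


if __name__ == "__main__":
    for t in schedule_sends(datetime(2007, 3, 1, 9, 0)):
        print(t.isoformat(timespec="seconds"))
```

Because the spacing is much larger than the jitter in this sketch, the messages keep their order while no longer landing on exact clock boundaries.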
16. [Screenshot of a generated message, with annotations]
• Date
• Email address.
• 1 of 5 different message bodies.
• 1 of 5 female Arabic names
• Secret posting ID: ddhfegjfb = 337546951
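The single example on this slide (ddhfegjfb = 337546951) is consistent with a digit-to-letter substitution in which 0 maps to a, 1 to b, and so on up to 9 mapping to j. The sketch below assumes that this is the whole scheme; the slide does not confirm it, and the function names are hypothetical.

```python
def encode_posting_id(posting_id: int) -> str:
    """Map each decimal digit to a letter: 0 -> 'a', 1 -> 'b', ..., 9 -> 'j'."""
    return "".join(chr(ord("a") + int(digit)) for digit in str(posting_id))


def decode_posting_id(token: str) -> int:
    """Invert encode_posting_id; rejects letters outside 'a'..'j'."""
    if not token or any(c < "a" or c > "j" for c in token):
        raise ValueError(f"not a valid encoded posting ID: {token!r}")
    return int("".join(str(ord(c) - ord("a")) for c in token))


if __name__ == "__main__":
    print(encode_posting_id(337546951))    # ddhfegjfb
    print(decode_posting_id("ddhfegjfb"))  # 337546951
```

A token like this lets a reply be linked back to its ad without showing the landlord a recognizable posting number. It is obfuscation only and would not resist anyone who guessed the mapping.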
17. Map of rentals in
Greater Toronto Area
Geographic distribution
of rental ads
(97% showing)
18. Ranked responses for names by ethnicity and gender
• We ranked each of the 50 names from 1 (least responses) to 50 (most responses).
• The table shows the sum of the ranks for all 5 names used in each ethnicity-gender combination.

             Male   Female
Total         519      756
Arab           31      113
Black          97      129
SE Asian       88      179
Caucasian     146      164
Jewish        157      171
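The table's arithmetic can be checked directly. The sketch below uses only the figures on the slide: it confirms that the column totals match the cell values and that the grand total equals 1 + 2 + ... + 50 = 1275, as it must when 50 names are ranked 1 to 50. The mean rank per name (rank sum divided by 5) is a convenience added here and is not on the slide.

```python
# Rank sums transcribed from the slide (5 names per ethnicity-gender cell).
RANK_SUMS = {
    "Arab":      {"Male": 31,  "Female": 113},
    "Black":     {"Male": 97,  "Female": 129},
    "SE Asian":  {"Male": 88,  "Female": 179},
    "Caucasian": {"Male": 146, "Female": 164},
    "Jewish":    {"Male": 157, "Female": 171},
}
N_NAMES = 50
NAMES_PER_CELL = 5

male_total = sum(cell["Male"] for cell in RANK_SUMS.values())
female_total = sum(cell["Female"] for cell in RANK_SUMS.values())
expected_total = N_NAMES * (N_NAMES + 1) // 2  # sum of ranks 1..50

# Mean rank per name in each cell; lower means fewer responses.
mean_rank = {
    (ethnicity, gender): rank_sum / NAMES_PER_CELL
    for ethnicity, cell in RANK_SUMS.items()
    for gender, rank_sum in cell.items()
}
lowest = min(mean_rank, key=mean_rank.get)

if __name__ == "__main__":
    print(male_total, female_total, male_total + female_total, expected_total)
    print("lowest mean rank:", lowest, mean_rank[lowest])
```

For scale, the smallest possible rank sum for five names is 1 + 2 + 3 + 4 + 5 = 15 and the largest is 46 + 47 + 48 + 49 + 50 = 240.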
19. Issues
• Racism is often difficult to assess through
direct questioning.
• Deception in this study is necessary.
• There is no direct personal harm, and no
direct manipulation.
21. Online confessional site
• What constitutes anonymity?
• Grouphug is a website of approximately
one million posts (approximately 95%
unique).
• Does not store IP, actively discourages
quoting other posts and encodes the
entries in non-sequential strings
(timestamps exist but are hidden)
22. Nothing here to see...
(catch 22)
23. Ok, here are some examples
• “I am so happy that I can confess again. I don't
even care about seeing my confessions on here,
it's just the feeling of getting it off your chest and
sending it away!” (136158003)
• “I pee in the shower because I hate everyone I
live with.” (255678370)
24. Some worse examples
• “I paid my friend 200 dollars to do over 400 pages of
homework for the year, so that i can ditch school as
much as i want, while lying to my mother and saying im
still going to school” (194778021)
• “I have HPV, its a std. I have known about it for 7
years, but that has not stopped me from having sex with
9 people with out a condom. 4 of the girls where
married. I have never told anyone about my std. I have
no idea how many people are infected because of me,
it keeps me up at night.” (275447713)
25. So...
• Do we ignore anonymous confessionals as too
toxic, or treat them as insight into the id?
• Can we even analyze this data or merely view
it as passive bystanders? Are there legal
implications, especially dealing with data
designed to resist tracking? What is my
responsibility if I can do nothing to follow up
(or even confirm the veracity of the
statement)?
26. Summary
• Facebook - the ethics of capturing someone else’s
relationships is ambiguous. The network I see is not mine -
it is what I am allowed to see. I defer to Facebook’s terms
of use.
• Craigslist - the ethics of understanding racism as it
actually operates online is problematic. I defer to utilitarian
arguments and approval from the ethics board.
• Grouphug - the ethics of viewing and storing, let alone
analyzing, confessionals is ambiguous. How can we ensure
there is no personally identifying information without looking
for it? How can we anonymize a million entries?
27. Opportunities
• We can get unprecedented access to
society in the wild.
• But is this fair? Is it justified?
• How close to ‘the social good’ must one be
to justify this work?
28. Thank You
Bernie Hogan
bernie.hogan@oii.ox.ac.uk