1. BYTE:
Big data roadmap and cross-disciplinary community for
addressing societal externalities
Ethical and social issues in big data practice
Rachel Finn, Anna Donovan and Kush Wadhwa
Trilateral Research & Consulting, LLP
BYTE WP2 Workshop
Lyon, 11 Sept 2014
2. WP2: Elements of societal impact
Task Description
T2.1 Economic issues in big data
T2.2 Legal issues in big data
T2.3 Social and ethical issues in big data*
T2.4 Political issues in big data
T2.5 Public perceptions relevant to big data*
T2.6 Open access to data
T2.7 Validation workshop
Task 2.3 → D2.1
Task 2.5 → D2.2
For information related to open access and big data (T2.6), please see D2.3
@BYTE_EU www.byte-project.eu
3. Objectives
To understand what potential social and ethical externalities exist relative to big data processing
To offer informed conjecture as to what members of the public might expect in a big data
environment
4. Methodology
Both based on desk research / literature review
◦ Review of social and ethical issues focused on academic journal articles, research reports, media
materials, etc.
◦ Review of public perceptions and aspirations focused on public opinion surveys
◦ E.g., Special Eurobarometer 359: Attitudes on Data Protection and Electronic Identity in the European Union 2012
◦ Big Data: Public views on the Collection, Sharing and Use of Personal Data by Government and Companies 2014
◦ Unisys Security Index: UK 2014
5. Practices examined
• Transparency
• Profiling and tracking
• Re-use / secondary use
• Data access
6. Transparency
Potential positive impacts
◦ Increased support for processing of data
◦ Information Commissioner’s Office (UK) – “Companies are asking… ‘should we do this with the data’?”
◦ Transparency may lead to greater trust, and more willingness to provide data
Potential negative impacts
◦ Data sabotage – “once actors realise that an institution is collecting data and looking for patterns, they can
attempt to sabotage this by providing false information”
◦ A “chilling effect” – individuals restrain themselves from particular behaviours because they suspect that their
activities are being monitored
Unisys 2014 survey:
75% of British people will not shop or bank with organisations they cannot trust to safeguard their personal
information
7. Profiling / tracking
Potential positive impacts
◦ Trend identification
◦ Personalisation
◦ Efficiency
Potential negative impacts
◦ Discrimination
◦ Objectification
◦ Exploitation
◦ Privacy infringement
Eurobarometer Flash 225:
What is personal data? Information about tastes and opinions (27%), nationality (26%), hobbies, sports and
places visited (25%), and websites visited (25%).
8. Re-use / secondary use
Potential positive impacts
◦ Use of “data exhaust” for innovation or to capture efficiencies
◦ Limits the need for costly duplication of resources
Potential negative impacts
◦ The “data gap”
◦ Extending “discriminatory” practices
Eurobarometer 359:
34% of respondents were concerned that their information is being used without their knowledge and
23% were concerned about their information being used in different contexts from the ones that were
disclosed to them
9. Data access
Potential positive impacts
◦ Opening access to data can enable the linking of data sets to generate new insights
◦ Differential access may be appropriate in some circumstances
Potential negative impacts
◦ Creation of a “digital hierarchy”
◦ Gender, race and class bias in those creating the digital models
◦ Potential privacy infringements when data sets are opened, linked and mined.
Ipsos Mori 2014:
90% support the use of people’s data to help develop treatment for cancer,
75% support data being used to improve the scheduling of transport services, and
70% support data use to prevent crimes
10. QUESTIONS
Any questions?
Key contacts:
◦ Rachel Finn, rachel.finn@trilateralresearch.com
◦ Anna Donovan, anna.donovan@trilateralresearch.com
Thank you
Editor’s notes
Slide 2 – Work package 2 is made up of the following tasks. In this discussion I am going to be focusing on findings from T2.3, as outlined in Chapter 4 of D2.1, and D2.2.
If you would like information on open access and big data, please see D2.3, which was circulated, in draft form, with the final workshop agenda.
The reason I am going to focus on these two tasks is the significant overlap between the social and ethical impacts of big data processes, and the concerns and aspirations expressed by members of the public in Europe with regard to big data. For example, personalisation and efficiency are key potential positive impacts of big data, and these expectations and aspirations are shared by members of the public.
Slide 3 – Objectives
To understand what potential social and ethical externalities exist relative to big data processing
These may be positive or negative – personalisation, discrimination, trust or unwanted data linkages.
To offer informed conjecture as to what members of the public might expect in a big data environment
Again, these may be negative perceptions, for example around the security of their data, potential for discrimination, etc. or positive aspirations – e.g., greater efficiency, personalisation, etc.
Slide 4 – methodology
Review of public perceptions and aspirations focused on public opinion surveys as well as reports, articles and media information about major surveys.
No surveys specific to big data exist as yet, so we focused on surveys related to security, privacy, data protection and other ICT practices.
Sciencewise Expert Resource Centre
Both of these result in a situation whereby people attempt to manipulate the information that is collected about them.
Whether transparency has positive or negative implications for big data, those implications are connected to the levels of trust that users hold in the big data companies and organisations performing the relevant practices.
Slide 7 –
Trend identification: Purchasing behaviour, driving behaviour, etc.
Personalisation – goods and services that individuals are actually interested in, rather than random products. Amazon uses this very efficiently, and Google tailors your search results to aspects of your profile and previous behaviour. However, personalisation may not be an unmitigated positive: “The benefits of personalization tend to accrue to businesses but the harms are inflicted on dispersed and unorganized individuals.”
Taipale - Cited in Bollier, David, “The Promise and Peril of Big Data”, The Aspen Institute: Communications and Society Program, Washington, DC, 2010, p.23.
Negative impacts
Discrimination – big data can reinforce existing inequalities, particularly in data mining applications related to consumer credit, government services, etc., which only seek to identify relationships and do not examine the social causes of those relationships. Furthermore, particular groups are easier to collect data from, and so they may be over-represented in particular types of data sets. Yet big data assumes N = all.
Objectification – people become over-determined by their data profiles.
Exploitation – profit is generated “on the backs” of those about whom data is collected.
Slide 8
Data exhaust refers to information that is generated as a result of other, primary processes. For example, collections of location data resulting from the provision of navigation services, or energy usage information resulting from billing services.
The data gap was originally identified / discussed in a 2012 Royal Society report. It refers to data being divorced from the context in which it was created.
Finally, re-use or secondary use of data could extend discriminatory practices: where the original data collection left out particular social groups (e.g., those of different physical capabilities, ages or wealth levels), that exclusion may be compounded if secondary processing of the information, particularly to gain new insights into relationships, continues to exclude or differently consider such groups.
Differential access may be appropriate in situations where personal data is being processed, or where there is potential for spurious relationships to emerge.
This can create a digital hierarchy where only a limited number of big data actors have the potential and means to access big data sets, and extract benefits of that access. Such access differentials may stifle innovation.
Students of the history of science already know that the questions that are asked and the scientific models that are created are contextualised by our social positions. If those who are asking the questions are disproportionately white, male and economically privileged, this could have a significant impact on the findings of big data.