VISVESVARAYA TECHNOLOGICAL UNIVERSITY 
Belagavi, Karnataka 
A Seminar Report on 
“BIG DATA PRIVACY ISSUES IN PUBLIC SOCIAL MEDIA” 
Submitted in partial fulfillment of the requirements for the award of the Degree of 
Master of Technology 
In 
Computer Science and Engineering 
Submitted 
by 
SUPRIYA R 
1VI14SCS11 
Guided By 
Mrs. MARY VIDYA JOHN 
Assistant Professor, Dept. of CSE 
Vemana IT, Bengaluru – 34. 
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 
VEMANA INSTITUTE OF TECHNOLOGY 
Koramangala, Bengaluru – 560 034. 
2014-2015
Karnataka Reddyjana Sangha® 
VEMANA INSTITUTE OF TECHNOLOGY 
Koramangala, Bengaluru – 34 
(Affiliated to Visvesvaraya Technological University, Belagavi) 
Department of Computer Science & Engineering 
Certificate 
This is to certify that SUPRIYA R, bearing U.S.N 1VI14SCS11, a student of I semester, 
Computer Science & Engineering has completed the seminar entitled “BIG DATA 
PRIVACY ISSUES IN PUBLIC SOCIAL MEDIA” in partial fulfillment of the 
requirements for the award of the degree of Master of Technology in Computer Science and Engineering 
of the Visvesvaraya Technological University, Belagavi during the academic year 2014-2015. The 
seminar report has been approved as it satisfies the academic requirements in respect of seminar 
work prescribed for the said degree. 
________________ ______________ 
Signature of the Guide Signature of HOD 
MRS. MARY VIDYA JOHN DR.NARASIMHA MURTHY K.N 
Name of the Examiner Signature with date 
1.____________________ ________________ 
2.____________________ ________________
ACKNOWLEDGEMENT 
I am indeed very happy to greatly acknowledge the numerous personalities involved in 
lending their help to make my seminar “Big Data Privacy Issues in Public Social Media” a 
successful one. 
Salutations to my beloved and esteemed institute “Vemana Institute of Technology” for 
having well-qualified staff and labs furnished with the necessary equipment. 
I express my gratitude to DR. P. GIRIDHARA REDDY, Principal, Vemana IT, for 
providing the best facilities; without his constructive criticism and suggestions I would not 
have been able to complete this work. 
I express my sincere gratitude to DR.NARASIMHA MURTHY K.N, Professor and 
H.O.D (PG), Dept of Computer Science and Engineering, a great motivator and think-tank. I am 
very much grateful to him for providing an opportunity to present my seminar. 
I also express my gratitude to my internal guide Mrs. MARY VIDYA JOHN, Assistant 
Professor, for her encouragement and useful guidance in preparing the report and presenting 
the seminar. 
I also thank my parents and friends, whose moral support and constant guidance made 
my efforts fruitful. 
Date: SUPRIYA R 
Place: Bangalore 1VI14SCS11
ABSTRACT 
Big Data is a new label given to a diverse field of data-intensive information in 
which the datasets are so large that they are difficult to work with effectively. The term has 
been mainly used in two contexts, firstly as a technological challenge when dealing with 
data-intensive domains such as high energy physics, astronomy or internet search, and 
secondly as a sociological problem when data about us is collected and mined by companies 
such as Facebook, Google, mobile phone companies and governments. 
The concentration here is on the second issue from a new perspective, namely how the 
user can gain awareness of the personally relevant part of Big Data that is publicly available 
in the social web. The amount of user-generated media uploaded to the web is expanding 
rapidly, and it is beyond the capabilities of any human to sift through it all to see which 
media impacts our privacy. Based on an analysis of social media in Flickr, Locr, Facebook 
and Google+, the privacy implications and potential of the emerging trend of geo-tagged social 
media are discussed. Finally, a concept is presented with which users can stay informed 
about which parts of the social Big Data are relevant to them. 
CONTENTS 
CHAPTER NO. TITLE PAGE NO 
ABSTRACT iii 
CONTENTS iv 
LIST OF FIGURES v 
1 INTRODUCTION 1 
1.1 Overview 1 
2 LITERATURE SURVEY 3 
2.1 Location privacy in pervasive computing 3 
2.2 Facebook’s trainwreck: exposure and invasion 6 
2.3 Facebook’s privacy settings: Who cares? 8 
2.4 Protecting location privacy with K-Anonymity 12 
3 PROBLEM STATEMENT 14 
4 SYSTEM ANALYSIS 16 
4.1 Existing system 16 
4.2 Proposed system 17 
5 THREAT ANALYSIS 19 
5.1 Awareness of damaging media in big datasets 19 
5.2 Analysis of Service Privacy 20 
6 SURVEY OF METADATA IN SOCIAL MEDIA 24 
7 CONCLUSION AND FUTURE WORK 27 
REFERENCES 30
LIST OF FIGURES 
FIGURE NO. TITLE PAGE NO 
Figure 1 Mixed zone arrangement 05 
Figure 2 Facebook message in Dec 2009 10 
Figure 3 Facebook’s simplified setting 11 
Figure 4 Public Privacy metadata from Flickr 22 
Figure 5 Public Privacy metadata from Locr 23 
Big Data Privacy Issues in Public Social Media 
CHAPTER 1 
INTRODUCTION 
In this electronic age, an increasing number of organizations face an explosion of 
data, and the size of the databases used in today’s enterprises has been growing at 
exponential rates. Data is generated through many sources like business processes, 
transactions, social networking sites, web servers, etc., and exists in structured as well as 
unstructured form. Today’s business applications have enterprise features such as large 
scale, data intensity and web orientation, and are accessed from diverse devices including 
mobile devices. Processing and analyzing this huge amount of data, or extracting meaningful 
information from it, is a challenging task. 
1.1 Overview 
Big data is an all-encompassing term for any collection of data sets so large and 
complex that it becomes difficult to process them using traditional data processing 
applications. It is becoming a hot topic in many areas where datasets are so large that they 
can no longer be handled effectively or even completely. Or put differently, any task that 
is comparatively easy to execute when operating on a small set of data, but becomes 
unmanageable when dealing with the same problem with a large dataset can be classified 
as a Big Data problem. Data sets grow in size in part because they are increasingly being 
gathered by ubiquitous information-sensing mobile devices, remote sensing, software logs, 
cameras, microphones and wireless sensor networks. 
Typical problems encountered when dealing with Big Data include capture, 
storage, dissemination, search, analytics and visualization. Common example domains are 
the traditional data-intensive sciences such as astronomy, high-energy physics, meteorology, 
genomics, and biological and environmental research, in which peta- and exabytes of data 
are generated. Here even the capture and storage of the data is a challenge. 
But there are also new domains emerging on the Big Data paradigm such as data 
warehousing, Internet and social web search, finance and business informatics. Here 
datasets can be small compared to the previous domains, however the complexity of the 
data can still lead to the classification as a Big Data problem. 
M.Tech, 1st sem, Dept.of CSE, VIT 2014-15 1
The traditional Big Data applications such as astronomy and other e-sciences usually 
operate on non-personal information and as such usually do not have significant privacy 
issues. The privacy critical Big Data applications lie in the new domains of the social web, 
consumer and business analytics and governmental surveillance. In these domains Big Data 
research is being used to create and analyze profiles of users, for example for market 
research, targeted advertisement, workflow improvement or national security. 
These are very contentious issues, since it is entirely up to the controller of the Big Data 
sets whether the information obtained from various sources is used for criminal purposes or not. 
In particular in the context of the social web there is an increasing awareness of the value, 
potential and risk of the personal data which users voluntarily upload to the web. 
However, the Big Data issue in this area has focused almost entirely on what the 
controlling companies do with this information. These concerns are being addressed by 
calls for regulatory intervention, i.e. regulating what companies are allowed to do with the 
data users give them or what data they are allowed to gather about users. 
This report examines the social media Big Data issue. It discusses how the growing 
proliferation and capabilities of mobile devices are creating a deluge of social media which 
affects our privacy. Due to the vast amounts of data being uploaded every day, it is next to 
impossible to be aware of everything which affects us. A concept is then discussed which 
can be used to regain control of some of the Big Data deluge created by other social web users. 
CHAPTER 2 
LITERATURE SURVEY 
2.1 Location Privacy in Pervasive Computing 
Mix zones 
Most theoretical models of anonymity and pseudonymity originate from the work 
of David Chaum, whose pioneering contributions include two constructions for anonymous 
communications (the mix network and the dining cryptographers algorithm) and the 
notion of anonymity sets. The paper introduces a new entity for location systems, the mix 
zone, which is analogous to a mix node in communication systems. Using the mix 
zone to model the spatiotemporal problem, useful techniques from the anonymous 
communications field can be adopted. 
Mix networks and mix nodes 
A mix network is a store-and-forward network that offers anonymous 
communication facilities. The network contains normal message-routing nodes alongside 
special mix nodes. Even hostile observers who can monitor all the links in the network 
cannot trace a message from its source to its destination without the collusion of the mix 
nodes. In its simplest form, a mix node collects n equal-length packets as input and reorders 
them by some metric (for example, lexicographically or randomly) before forwarding them, 
thus providing unlinkability between incoming and outgoing messages. 
For brevity, some essential details about the layered encryption of packets are 
omitted; interested readers should consult Chaum’s original work. 
The number of distinct senders in the batch provides a measure of the unlinkability 
between the messages coming in and going out of the mix. Chaum later abstracted this last 
observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann 
and Marit Köhntopp, whose recent survey generalizes and formalizes the terminology on 
anonymity and related topics, the anonymity set is “the set of all possible subjects who 
might cause an action.” 
In anonymous communications, the action is usually that of sending or receiving a 
message. The larger the anonymity set’s size, the greater the anonymity offered. 
Conversely, when the anonymity set reduces to a singleton—the anonymity set cannot 
become empty for an action that was actually performed—the subject is completely 
exposed and loses all anonymity. 
Mix zones 
A mix zone for a group of users is defined as a connected spatial region of maximum size in 
which none of these users has registered any application callback; for 
a given group of users there might be several distinct mix zones. In contrast, an application 
zone is an area where a user has registered for a callback. The middleware system can 
define the mix zones and calculate them separately for each group of users as the spatial 
areas currently not in any application zone. Because applications do not receive any 
location information when users are in a mix zone, the identities are “mixed.” 
Assuming users change to a new, unused pseudonym whenever they enter a mix 
zone, applications that see a user emerging from the mix zone cannot distinguish that user 
from any other who was in the mix zone at the same time and cannot link people going into 
the mix zone with those coming out of it. If a mix zone has a diameter much larger than the 
distance the user can cover during one location update period, it might not mix users 
adequately. 
Figure 1. A sample mix zone arrangement with three application zones. The airline agency 
(A) is much closer to the bank (B) than the coffee shop (C). Users leaving A and C at the 
same time might be distinguishable on arrival at B. 
Figure 1 provides a plan view of a single mix zone with three application zones around the 
edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B 
than C, so if two users leave A and C at the same time and a user reaches B at the next 
update period, an observer will know the user emerging from the mix zone at B is not the 
one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the 
time, the user can only be the one from A. If the maximum size of the mix zone exceeds 
the distance a user covers in one period, mixing will be incomplete. The degree to which 
the mix zone anonymizes users is therefore smaller than the anonymity set size suggests. 
2.2 Facebook’s Privacy Trainwreck: Exposure, Invasion, and Social Convergence 
The tech world has a tendency to view the concept of ‘private’ as a single bit that is 
either 0 or 1. Data are either exposed or not. When companies make a decision to make 
data visible in a more ‘efficient’ manner, it is often startling, prompting users to speak of a 
disruption of ‘privacy’. 
Facebook did not make anything public that was not already public, but the News Feed 
disrupted the social dynamics. The reason for this is that privacy is not simply about the 
state of an inanimate object or set of bytes; it is about the sense of vulnerability that an 
individual experiences when negotiating data. On Facebook, people were wandering 
around accepting others as Friends, commenting on others’ pages, checking out what others 
posted, and otherwise participating in the networked environment. Now, imagine that 
everyone involved notices all this because it is displayed when they log in. 
By aggregating all this information and projecting it to everyone’s News Feed, 
Facebook made what was previously obscure difficult to miss (and even harder to forget). 
Those data were all there before but were not efficiently accessible; they were not 
aggregated. The acoustics changed and the social faux pas was suddenly very visible. 
Without having privacy features, participants had to reconsider each change that they made 
because they knew it would be broadcast to all their Friends. 
With Facebook, participants have to consider how others might interpret their 
actions, knowing that any action will be broadcast to everyone with whom they consented 
to digital Friendship. For many, it is hard to even remember whom they listed as Friends, 
let alone assess the different ways in which they may interpret information. Without being 
able to see their Friends’ reactions, they are not even aware of when their posts have been 
misinterpreted. 
Facebook implies that people can maintain an infinite number of friends provided they 
have digital management tools. While having hundreds of Friends on social network sites 
is not uncommon, users are not actually keeping up with the lives of all of those people. 
These digital Friends are not necessarily close friends and friendship management tools are 
not good enough for building and maintaining close ties. While Facebook assumes that all 
Friends are friends, participants have varied reasons for maintaining Friendship ties on the 
site that have nothing to do with daily upkeep. 
The News Feed does not distinguish between these – all Friends are treated equally and 
updates come from all Friends, not just those that an individual deems to be close friends. 
To complicate matters, when data are there, people want to pay attention, even if it doesn’t 
help them. People relish personal information because it is the currency of social hierarchy 
and connectivity. Cognitive addiction to social information is great for Facebook because 
News Feeds makes Facebook sticky. 
Privacy is not simply about zeros and ones; it is about how people experience their 
relationship with others and with information. Privacy is a sense of control over 
information, the context where sharing takes place, and the audience who can gain access. 
Information is not private because no one knows it; it is private because the knowing is 
limited and controlled. In most scenarios, the limitations are often more social than 
structural. Secrets are considered the most private of social information because keeping 
knowledge private is far more difficult than spreading it. There is an immense gray area 
between secrets and information intended to be broadcast as publicly as possible. By and 
large, people treated Facebook as being in that gray zone. Participants were not likely to 
post secrets, but they often posted information that was only relevant in certain contexts. 
The assumption was that if you were visiting someone’s page, you could access information 
in context. When snippets and actions were broadcast to the News Feed, they were taken 
out of context and made far more visible than seemed reasonable. In other words, with 
News Feeds, Facebook obliterated the gray zone. Social convergence occurs when 
disparate social contexts are collapsed into one. Even in public settings, people are 
accustomed to maintaining discrete social contexts separated by space. How one behaves 
is typically dependent on the norms in a given social context. 
How one behaves in a pub differs from how one behaves in a family park, even 
though both are ostensibly public. Social convergence requires people to handle disparate 
audiences simultaneously without a social script. While social convergence allows 
information to be spread more efficiently, this is not always what people desire. As with 
other forms of convergence, control is lost with social convergence. Can we celebrate 
people’s inability to control private information in the same breath as we celebrate 
mainstream media’s inability to control what information is broadcast to us? 
Privacy is not an inalienable right – it is a privilege that must be protected socially 
and structurally in order to exist. The question remains as to whether or not privacy is 
something that society wishes to support. When Facebook launched News Feeds, they 
altered the architecture of information flow. This was their goal, and with a few tweaks they 
were able to convince their users that the advantages of News Feeds outweighed security 
through obscurity. Users quickly adjusted to the new architecture; they began taking actions 
solely so that they could be broadcast to Friends’ News Feeds. 
2.3 Facebook privacy settings: Who cares? 
Facebook’s approach to privacy was initially network–centric. By default, students’ 
content was visible to all other students on the same campus, but no one else. Through a 
series of redesigns, Facebook provided users with controls for determining what could be 
shared with whom, initially allowing them to share with “No One”, “Friends”, “Friends– 
of–Friends”, or a specific “Network”. When Facebook became a platform upon which other 
companies could create applications, users’ content could then be shared with third–party 
developers who used Facebook data as part of the “Apps” that they provided. The company 
introduced privacy settings to allow users to determine which third parties could access 
what content when users encountered a message whenever they chose to add an application. 
Over time, Facebook introduced the ability to share content with “Everyone” (inside 
Facebook or not). Increasingly, the controls got more complex and media reports suggest 
that users found themselves uncertain about what they meant (Bilton, 2010). Recognizing 
the validity of this point, Facebook eventually simplified its privacy settings page 
(Zuckerberg, 2010). 
At each point when Facebook introduced new options for sharing content, the 
default was to share broadly. Following a series of redesigns in December 2009, Facebook 
added a prompt when users logged on asking them to reconsider their privacy settings for 
various types of content on the site, including “Posts I Create: Status Updates, Likes, 
Photos, Videos, and Notes“ (see Figure 2). For each item, users were given two options: 
“Everyone” or “Old Settings.” The toggle buttons defaulted to “Everyone.” 
This message appeared when users logged into their account and it was impossible to go to 
the rest of the site without addressing the prompt. Faced with this obligatory prompt, many 
users may well have just clicked through, accepting the defaults that Facebook had chosen. 
In doing so, these users made much of their content more accessible than was previously 
the case. As part of these changes, everyone’s basic profile and Friends list became 
available to the public, regardless of whether or not they had previously chosen to restrict 
access; this was later retracted. 
When Facebook was challenged by the Federal Trade Commission to defend its 
decision about this approach, the company representative noted that 35 percent of users 
who had never before edited their settings did so when prompted. Facebook used these data 
to highlight that more people engaged with Facebook privacy settings than the industry 
average of 5–10 percent. While 35 percent may be significantly more than the industry 
average, and Facebook did not specify what percentage of users had never adjusted their 
settings, there is still likely a sizeable majority that accepted the site’s defaults each time 
changes had been implemented. 
Figure 2: The message Facebook users saw in December 2009. 
As privacy advocates and regulators investigated what these changes meant, 
Facebook moved toward another series of modifications that depended on users’ 
willingness to share information. 
In April 2010, at its f8 conference, Facebook announced Instant Personalization and 
Social Plugins, two services that allowed partners to leverage the social graph — the 
information about one’s relationships on the site that the user makes available to the system 
— and provide a channel for sharing information between Facebook and third parties. For 
example, Web sites could implement a Like button on their own pages that enables users 
to share content from that site with the user’s connections on Facebook. Sites could also 
implement an Activity Feed that publicizes what a person’s Friends do on that site. These 
tools were built for developers, and likely few users, journalists, or regulators had any sense 
of what data from their accounts and actions on Facebook were being shared with whom 
under what circumstances. As the discussions became more vociferous, Facebook was 
increasingly pressured to respond. 
On 26 May 2010, Zuckerberg announced that Facebook heard the concerns and 
believed that the major issue on the table was Facebook’s confusing privacy settings page. 
Facebook unveiled a new privacy settings page that, while simpler, also removed many of 
the controls that allowed users to limit what content could be restricted (see Figure 3). This 
quelled much of the news coverage, but it is unlikely that it would lead to the end of 
controversy as in July 2010, a Canadian law firm filed a class action suit against Facebook 
over privacy issues. 
Figure 3: Facebook’s “simplified” privacy settings, July 2010. 
2.4 Protecting Location Privacy with Personalized k-Anonymity: Architecture and 
Algorithms 
There are two popular approaches to protect location privacy in the context of 
Location Based Services (LBS) usage: policy-based and anonymity-based approaches. 
In policy-based approaches, mobile clients specify their location privacy 
preferences as policies and completely trust that the third-party LBS providers adhere to 
these policies. 
In the anonymity-based approaches, the LBS providers are assumed to be semi-honest 
instead of completely trusted. The paper develops efficient and scalable system-level 
facilities for a k-anonymity-preserving management of location information, protecting 
location privacy by ensuring location k-anonymity. It assumes that anonymous location-based 
applications do not require user identities for providing service. 
The concept of k-anonymity was originally introduced in the context of relational 
data privacy. It addresses the question of “how a data holder can release its private data 
with guarantees that the individual subjects of the data cannot be identified whereas the 
data remain practically useful”. For instance, a medical institution may want to release a 
table of medical records with the names of the individuals replaced with dummy identifiers. 
However, some set of attributes can still lead to identity breaches. These attributes are 
referred to as the quasi-identifier. For instance, the combination of birth date, zip code, and 
gender attributes in the disclosed table can uniquely determine an individual. By joining 
such a medical record table with some publicly available information source like a voters 
list table, the medical information can be easily linked to individuals. K-anonymity 
prevents such a privacy breach by ensuring that each individual record can only be released 
if there are at least k - 1 distinct individuals whose associated records are indistinguishable 
from the former in terms of their quasi-identifier values. 
In the context of LBSs and mobile clients, location k-anonymity refers to the k-anonymous 
usage of location information. A subject is considered location k-anonymous if 
and only if the location information sent from a mobile client to an LBS is indistinguishable 
from the location information of at least k − 1 other mobile clients. The paper proposes 
location perturbation as an effective technique for supporting location k-anonymity and 
dealing with location privacy breaches exemplified by the location inference attack 
scenarios. If the location information sent by each mobile client is perturbed by replacing 
the position of the mobile client with a coarser-grained spatial range such that there are 
k − 1 other mobile clients within that range (k > 1), then the adversary will have uncertainty in 
matching the mobile client to a known location-identity association or an external 
observation of the location-identity binding. This uncertainty increases with increasing 
values of k, providing a higher degree of privacy for mobile clients. Even though the LBS 
provider knows that person A was reported to visit location L, it cannot match a message 
to this user with certainty, because there are at least k different mobile clients within 
the same location L. As a result, it cannot perform further tracking without ambiguity. 
CHAPTER 3 
PROBLEM STATEMENT 
The amount of social media being uploaded to the web is growing rapidly and 
there is still no end to this trend in sight. The ease-of-use of modern smartphones and the 
proliferation of high-speed mobile networks is facilitating a culture of spontaneous and 
carefree uploading of user-generated content. To give an idea of the scale of this 
phenomenon: Just in the last two years the number of photos uploaded to Facebook per 
month has risen from 2 billion to over 6 billion. From a personal perspective an 
overwhelming majority of these photos have no privacy relevance for oneself. Finding the 
few that are relevant is a daunting task. 
While one’s own media is uploaded consciously, the flood of media uploaded by 
others is so huge that it is almost impossible to stay aware of all media in which one might 
be depicted. This can be classified as a Big Data problem on the user’s side, though not 
on the provider’s side. Current social networks and photo-sharing sites mainly focus on the 
privacy of users’ own media in terms of access control, but do little to deal with the privacy 
implications created by other users’ media. There are ever more complex settings allowing 
a user to decide who is allowed to see what content but only content owned by the user. 
However, the issue of staying on top of what others are uploading (mostly in good faith), 
that might also be relevant to the user, is still very much outside the control of that user. 
Social networks which allow tagging of users usually inform affected users when they are 
tagged. However, if no such tagging is done by the uploader or a third party, there are 
currently no mechanisms to inform users of relevant media. 
A second important emerging trend is the capability of many modern devices to 
embed geo-data and other metadata into the created content. While the privacy issues of 
location-based services such as Foursquare or Qype have been discussed at great length, 
the privacy issues of location information embedded into uploaded media have not yet 
received much attention. There is one very significant difference between these two 
categories. In the first, users reveal their current location to access online services, such as 
Google Maps, Yelp or Qype, or actually publish their location on a social network 
site like Foursquare, Google Latitude or Facebook Places. In this category, the user mainly 
affects his own privacy. There is a large body of work examining privacy preserving 
techniques to protect a user’s own privacy, ranging from solutions which are installed 
locally on the user’s mobile device, to solutions which use online services relying on group-based 
anonymisation algorithms, as for instance mix zones or k-anonymity. 
The second category is created by media which contain location information. This 
can have all the same privacy implications for the creator of the media; however, a critical 
and hitherto often overlooked issue is the fact that the location and other metadata 
contained in pictures and videos can also affect people other than the uploader himself. 
This is a critical oversight and an issue which will gain importance as the mobile smart 
device boom continues. 
CHAPTER 4 
SYSTEM ANALYSIS 
4.1 EXISTING SYSTEM 
Privacy issues are categorized into two classes. Firstly, homegrown problems: 
Someone uploads a piece of compromising media of himself with insufficient protection or 
forethought which causes damage to his own privacy. A prime example of this category is 
someone uploading compromising pictures of himself into a public album instead of a 
private one or onto his Timeline instead of a message. The damage done in these cases is 
very obvious since the link between the content and the user is direct and the audience 
(often the peer circle) has direct interest in the content. One special facet of this problem is 
that what is considered damaging content by the user can and often does change over time. 
While this is a serious problem, especially amongst the Facebook generation, this issue is 
a small data problem and thus is not the focus of this work. 
Secondly, we have the Big Data problems created by others: an emerging threat to users' online privacy comes from other users' media. What makes this threat particularly bad is that the person harmed is not involved in the uploading process and thus cannot take any pre-emptive precautions, and that the amount of data being uploaded is so vast it cannot be sighted manually. There are also currently no countermeasures, except after-the-fact legal ones, to prevent others from uploading potentially damaging content about someone. 
There are two requirements for this form of privacy threat to have an effect: 
Firstly, to cause harm to a person a piece of media needs to be able to be 
associated/linked to the person in some way. This link can either be non-technical, such as 
being recognizable in a photo, or technical such as a profile being (hyper-) linked to a photo. 
There is also the grey area of textual references to a person near to the photo or embedded 
in the metadata of the photo. This metadata does not directly create a technical link to a 
profile, but it opens the possibility for search engines to index the information and make it 
searchable, thus creating a technical link. 
Secondly, the piece of media in question must contain content that is harmful to the person linked to it. This can again be non-technical, such as being depicted in a compromising way. More interestingly, however, it can also be technical: in these cases metadata or associated data causes harm. For instance, time and location data can indicate that a person was at an embarrassing location, took part in a political event, or was not where he said he was. 
Since the uploading of this type of damaging media cannot be effectively prevented, 
awareness is the key issue in combating this emerging privacy problem. 
4.2 PROPOSED SYSTEM 
The privacy awareness concept consists of a watchdog client and a server-side watchdog service. Using a GPS-enabled mobile device, a user can activate the watchdog client to 
locally track his position during times he considers relevant to his privacy. Then his device 
can request the privacy watchdog to show him media that could potentially affect him 
whenever the user is interested in the state of his online privacy. For this the watchdog 
client sends the location traces to the watchdog server or servers which then respond with 
a list of media which was taken in the vicinity (both spatially and temporally) of the user. 
The watchdog service can be operated in three different ways. The first two would 
be value-added services which can be offered by the media sharing sites or social networks 
(SN) themselves. In both these cases the existing services would need to be extended by an 
API to allow users to search for media by time and location. The first type of service would 
do this via the regular user account. Thus, it would be able to see all public pictures, but also 
all pictures restricted in scope but visible to the user. 
The benefit of this service type is that, through the integration with the user's account, pictures which aren't publicly available can be searched. These private pictures are 
typically from social network friends; thus the likelihood of pictures involving the user 
is higher and the set of people able to view them is more relevant. This type of 
service also has the benefit-of-sorts that the location information is valuable to the SN, so 
it has an incentive to offer this kind of value-added service. The privacy downside of 
searching with the user’s account is that the SN receives the information of when and where 
a user was. While there are certainly users who would not mind this information being sent 
to their SN if it means they get to see and remove embarrassing photos in a timely manner, 
there are also certainly users who do not wish their SN to know when and where they were, 
particularly amongst the clientele who wish to protect their online privacy. 
The second type of watchdog service would also be operated by the SN. However, 
it does not require a user account to do the search and can be queried anonymously. This 
type of service would thus have a smaller scope, since it can only access publicly available 
media. A further drawback of this type of service is that there is less of an incentive for the 
SN provider to implement such a service. While there are sites such as Locr that allow such 
queries, most SN sites do not. Without outside pressure there is less intrinsic value for them 
to include such a service compared to the first type. 
The third type of service would be a stand-alone service which can be operated by a 
third party. The stand-alone service operates like an indexing search machine, which crawls 
publicly available media and its metadata and allows this database to be queried. Possible 
incentive models for this approach include pay-per-use, subscription, ad-based or 
community services. The visibility scope would be the same as for the second type of 
service. 
All three types of service are mainly focused on detecting relevant media events 
and breaking down the Big Data problem to humanly manageable sizes. The concept is 
mainly focused on bringing possibly relevant media to the attention of the user without 
overburdening him. The system does not explicitly protect against malicious uploads, in which the uploader intentionally tries to harm another person while concealing the activity, even though the watchdog service proposed here could make such subterfuge harder. But even without full protection from malicious activity, we believe that such a watchdog would improve the current state of the art by enabling users to gain better awareness of the relevant part of the social media Big Dataset. 
CHAPTER 5 
THREAT ANALYSIS 
5.1 Awareness of Damaging Media in Big Datasets 
Most popular social networks and media sharing sites allow users to tag objects and 
people in their uploaded media. Additionally, some services also extract embedded 
metadata and use this information for indexing and linking. Media is annotated with names, 
comments, or is directly linked to users’ profiles. In particular the direct linking of profiles 
to pictures was initially met with an outcry of privacy concerns since it greatly facilitated 
finding information about people. For this reason, social networks quickly introduced the 
option to prevent people from linking them in media. However, there is also a positive side 
to this form of direct linking, since the linked person is usually made aware of the linked media and can then take action to have unwanted content removed or restrict the visibility 
of the link. 
While the privacy mechanisms of current services are still limited, hidden and often confusing, once the link is made the affected people can take action. A more critical case is the non-linked tagging of photos, in which a free text tag contains identifying information and/or malicious comments. However, there is no automated mechanism to inform a user that he was named in or near a piece of media. The named person might not even be a member of the service where the media was uploaded. The threat of this kind of linking differs significantly from the one described above. While the immediate damage 
can be smaller since no automated notification is sent to friends of the user, the threat can 
remain hidden far longer. The person can remain unaware of this media whereas others can 
stumble upon it or be informed by peers. 
The final case of damaging media does not contain any technical link. Without any 
link to the person in question, this kind of media can only cause harm if someone who knows that person stumbles across it and makes the connection. 
While the likelihood of causing noticeable harm is smaller, it is still possible. The viral 
spreading of media has caused serious embarrassment and harm in real world cases. The 
critical issue here is that there is currently no way for a person to pro-actively search for 
this kind of media in the Big Data deluge to mitigate this threat. 
5.2 Analysis of Service Privacy 
The different types of watchdog services discussed in the proposed system can help 
to reduce the number of relevant pieces of media a user needs to keep an eye on if users 
don’t want uncontrolled media of themselves to be online. However, the devil is in the 
detail since this form of service can also have serious privacy implications itself if designed 
in the wrong way. Care must be taken to facilitate the different usage and privacy 
requirements of different users. The critical issue is the fact that to request the relevant 
media a user must send location information to the watchdog service. 
When using the first type of service, there is little which can be done to protect the 
location privacy of the user since the correlation between the location query and the user 
account is direct. One option to protect privacy to a degree is an obfuscation approach: for every true query, a number of fake queries could also be sent, making it harder (but far from impossible) for the SN provider to ascertain the true location. However, this approach does not scale well, for two reasons. Firstly, it creates a higher load on the SN. More critically, the likelihood of deducing the true location rises as more queries are sent, unless great care is taken in creating the fake paths and masking the source IP addresses. 
Protecting the user's privacy in the second case is simpler. Since the queries do not require an account, the only way a user can be tracked directly is via his IP address. Using an anonymizing service such as Tor, sequential queries cannot be linked together, and creating a tracking profile becomes significantly harder. The anonymous trace data can of course still be used by the SN, but the missing link to the user makes it less critical for the user 
himself. The third type of service is probably the most interesting privacy-wise, since the 
economic model behind the service will significantly impact the privacy techniques which 
can be applied to this model. Most payment models would require user credentials to log 
in and thus would allow the watchdog service provider to track the user. In this case it 
would have to be the reputation of the service provider which the user would have to trust, 
similar to the case of commercial anonymizing proxies. In an ad-based approach no user credentials are needed, so it would be possible to use the service anonymously via Tor. If so, the watchdog client would need to be open source to ensure that no secret tracking information is stored there. In community-based approaches a privacy model like that of 
Tor can be used, to ensure that none of the participating nodes gets enough information to 
track a single user. Each of these proposed service types has different privacy benefits and 
disadvantages. 
The following section overviews our privacy analysis of different media hosting 
sites. It includes media access control as well as metadata handling. 
Flickr provides the most fine-grained privacy/access control settings of all analyzed 
services. Privacy settings can be defined on metadata as well as the image itself. One 
particularly interesting feature of Flickr is the geo-fence. The geo-fence feature enables 
users to define privacy regions on a map by putting a pin on it and setting a radius. Access 
to GPS data of the user’s photos inside these regions is only allowed for a restricted set of 
users (friends, family, and contacts). Flickr allows its users to tag and add people to images. 
If a user revokes a person tag of himself in an image, no one can add the person to that 
image again. 
Facebook extracts the title and description of an image from metadata during the 
upload process of some clients. All photos are resized after upload and metadata is stripped 
off. Facebook uses face recognition for friend tagging suggestions based on already tagged 
friends. Access to images is restricted by Facebook's ever-changing, complex and sometimes abstruse privacy settings. 
Picasa Web & Google+ store the complete EXIF metadata of all images. It is 
accessible by everyone who can access the image. The access to images is regulated on a 
per-album base. It can be set to public, restricted to people who know the secret URL to 
the album, or to the owner only. A noteworthy feature is that geo-location data can be 
protected separately. Google+ and Picasa Web allow the tagging of people in images. 
Locr is a geo-tagging focused photo-sharing site. As such, location information is 
included in most images. By default all metadata is retained in all images. Access control 
is set on a per image basis. Anybody who can see an image can also see the metadata. There 
are also extensive location-based search options. Geo-data is extracted from uploaded files 
or set by people on the Locr website. Locr uses reverse geocoding to add textual location 
information to images in its database. 
Instagram and PicPlz are services/mobile apps that allow posting images in a Twitter-like way. Resized images, stripped of metadata but with optional location data, are stored by the services. Additionally, they allow the posting of photos to different services like Flickr, Facebook, Dropbox, Foursquare and more. Depending on the service used, metadata is stored or discarded. For instance, when uploading a photo to Flickr, metadata is stripped from the actual file, but the title, description and geo-location are extracted from the image and can be set by the user. In contrast, the Hipstamatic mobile app preserves in-file metadata when uploading images to Flickr. 
CHAPTER 6 
SURVEY OF METADATA IN SOCIAL MEDIA 
To underpin the growing prevalence of privacy-relevant metadata, and of location data in particular, and to judge potential dangers and benefits based on real-world data, we analyzed a set of 20,000 publicly available Flickr images and their metadata. 
Flickr was chosen as the premiere photo-sharing website because it can be legally crawled, offers the full extent of privacy mechanisms, and does not remove metadata in general. We crawled one photo each from 20,000 random Flickr users. Of these, 68.8% were Pro users, for whom the original file could be accessed as well. For the others, only the metadata available via the Flickr API was accessed. This includes data automatically extracted from EXIF data during upload and data manually added via the website or semi-automatically set by client applications. 23% of the 20k users denied access to their extracted EXIF data in the Flickr database. 
We also took a set of 3,000 images made with a camera phone from 3,000 random mobile Flickr users. 46.8% of the mobile users were Pro users, and only 2% denied access to EXIF data in the Flickr database. GPS location data was present in 19% of the 20k dataset and in 34% of the 3k mobile phone dataset. While Flickr hosts many semi-professional DSLR 
photos, mobile phones are becoming the dominant photo-generation tool, with the iPhone 4 currently being the most common camera on Flickr. Textual location information such as street or city names is currently not used much on Flickr. 
However, as reverse geocoding becomes more common in client applications this will change (cf. Locr in Figure 5). To evaluate the potential privacy impact, we manually checked which photos contained people and a geo-reference, but no user profile tags – i.e. images which could contain people who are unaware of the photo. We found 16% of the 20k images and 28% of the 3k mobile photos fulfilling these criteria. 
Fig. 4. Public privacy-related metadata in 5.7k random and 1k mobile user original Flickr 
photos 
Further, the subset of images available from Pro users was analyzed, since these can contain the unaltered metadata from the camera. From the 20k dataset, 5,761 images contained in-file metadata; from the 3k dataset, 1,050 images did. 
Figure 4 shows the percentage values for the different types of metadata contained in the files. For the remaining images, metadata was either manually removed by the uploader or the image never had any in the first place. Of the 20k dataset, only 3% of the in-file metadata contained GPS data, compared to 32% in the mobile 3k dataset. This shows a clear dominance of mobile devices when it comes to publishing GPS metadata. 
This is unsurprising, since most compact and DSLR cameras currently do not have GPS receivers and only few photographers attach external GPS devices to these cameras – but this will likely change with future cameras. However, combined with the fact that mobile phones are becoming the dominant type of camera when it comes to the number of published pictures, it is to be expected that the amount of GPS data available for scrutiny, use and abuse will rise further. 
Fig. 5. Public privacy-related metadata of 5k random photos from Locr service 
A 5k dataset of random photos from Locr was collected, and the metadata in the images as well as the images' HTML pages built from the Locr database were analyzed. Figure 5 shows the results from this dataset. Particularly interesting is the high rate of non-GPS-based location information. 
This is a trend to watch since most location stripping mechanisms only strip GPS 
information and leave other text-based tags intact. Furthermore, the amount of camera-ID metadata is notable, since these IDs can be used to link different pictures and infer metadata even if it has been stripped from some of the photos. 
To summarize, one third of the pictures taken by the dominant camera devices contain GPS information, and about one third of these images depict people. Thus, about 10% of all photos could harm other people's privacy without them knowing about it. 
CHAPTER 7 
CONCLUSION AND FUTURE WORK 
This report discussed the threat to an individual's privacy that is created by other people's social media, and gave a brief overview of the privacy capabilities of common social media services with regard to protecting users from other people's activities. Based on this survey, the Big Data privacy implications and the potential of the emerging trend of geo-tagged social media were analyzed. Three concepts were then presented for how location information can actually help users stay in control of the flood of potentially harmful or interesting social media uploaded by others. 
Applications can be built or modified to use pseudonyms rather than true user identities as a route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. 
Furthermore, to withstand more sophisticated attacks, users should change pseudonyms frequently, even while being tracked. Drawing on the methods developed for anonymous communication, the conceptual tools of mix zones and anonymity sets can be used to analyze location privacy. 
Although anonymity sets provide a first quantitative measure of location privacy, the measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture, in which the user has less privacy, because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately. 
Two services worth mentioning which collect the type of information needed for a privacy watchdog are Social Camera and Locaccino. Social Camera is a mobile app that detects faces in a picture and tries to recognize them with the help of the Facebook profile pictures of people in the user's friends list. Recognized people can be automatically tagged and pictures instantly uploaded to Facebook. Locaccino is a Foursquare-type application which allows users to upload location-based information to Facebook. 
These two apps show the willingness of users to share this kind of information on the social web. 
Ahern et al. analyze the privacy decisions of mobile users in the photo-sharing process. They identify relationships between the location of the photo capture and the corresponding privacy settings, and recommend the use of context information to help users set privacy preferences and to increase users' awareness of information aggregation. 
Work by Fang and LeFevre focuses on helping the user find appropriate privacy settings in social networks. They present a system where the user initially only needs to set up a few rules. Through the use of active machine learning algorithms, the system then helps the user protect private information based on individual behavior and taste. 
Mannan et al. address the problem that private user data is shared not only within social networks, but also through personal web pages. They focus on privacy-enabled web content sharing and utilize existing instant-messaging friendship relations to create and enforce access policies. 
The three works shown above all focus on protecting a user's privacy against dangers created by the user himself while sharing media. None of them discusses how users can be protected from other people's media, which holds for most of the research done in this area. 
Besmer et al. present work which allows users who are tagged in photos to send a request to the owner to hide the linked photo from certain people. This approach also follows the idea that forewarned is forearmed and that creating awareness of critical content is the first step towards solving the problem. However, the work relies on direct technical tags and as such does not cover the same scope as the privacy watchdog suggested here. 
Work that also takes other users' media into account is presented by Squicciarini et al., who postulate that most shared data does not belong to a single user only. They therefore propose a system to share the ownership of media items and thereby strive to establish collaborative privacy management for shared content. 
Their prototype is implemented as a Facebook app and is based on game theory, rewarding users who promote co-ownership of media items. While the work does take other users' media into account, unlike the approach presented here it does not cope with previously unknown and unrelated users. 
REFERENCES 
[1] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. Nair. Overexposed?: privacy patterns and considerations in online and mobile photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 357–366, 2007. 
[2] C. A. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and P. Samarati. An Obfuscation-Based Approach for Protecting Location Privacy. IEEE Transactions on Dependable and Secure Computing, 8(1):13–27, June 2009. 
[3] A. Beresford and F. Stajano. Location privacy in pervasive computing. IEEE Pervasive Computing, 2(1):46–55, Jan. 2003. 
[4] A. Besmer and H. Richter Lipford. Moving beyond untagging. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, page 1563, Apr. 2010. 
[5] D. Boyd. Facebook’s privacy trainwreck: Exposure, invasion, and social convergence. 
Convergence: The International Journal of Research into New Media Technologies, 
14(1):13–20, 2008. 
[6] d. Boyd and K. Crawford. Six Provocations for Big Data. SSRN eLibrary, 2011. 
[7] D. Boyd and E. Hargittai. Facebook privacy settings: Who cares? First Monday, 15(8), 2010. 
[8] Carnegie Mellon's Mobile Commerce Laboratory. Locaccino - a user-controllable location-sharing tool. http://locaccino.org/, 2011. 
[9] E. Eldon. New Facebook Statistics Show Big Increase in Content Sharing, Local 
Business Pages. http://goo.gl/ebGQH, February 2010. 
[10] Facebook. Statistics. http://www.facebook.com/press/info.php?statistics, 2011. 
[11] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th International Conference on World Wide Web - WWW '10, page 351. ACM Press, Apr. 2010. 
[12] Flickr. Camera Finder. http://www.flickr.com/cameras, October 2011. 
[13] B. Gedik and L. Liu. Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, Jan. 2008. 
[14] M. Mannan and P. C. van Oorschot. Privacy-enhanced sharing of personal content on the web. In Proceedings of the 17th International Conference on World Wide Web - WWW '08, page 487. ACM Press, April 2008. 
[15] Metadata Working Group. Gartner special report - pattern-based strategy: getting value from big data. www.gartner.com/patternbasedstrategy, 2012. 
[16] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. In Proceedings of the 18th International Conference on World Wide Web - WWW '09, page 521. ACM Press, Apr. 2009. 
[17] Viewdle. SocialCamera. http://www.viewdle.com/products/mobile/index.html. 

Más contenido relacionado

La actualidad más candente

Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentalsrjain51
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesKathirvel Ayyaswamy
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Thingspateelhs
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
 
Addressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop WayAddressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop WayXoriant Corporation
 
Big data ppt
Big data pptBig data ppt
Big data pptYash Raj
 
Big Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation SlideBig Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation SlideSlideTeam
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and OpportunitiesKenny Huang Ph.D.
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
BIG Data and Methodology-A review
BIG Data and Methodology-A reviewBIG Data and Methodology-A review
BIG Data and Methodology-A reviewShilpa Soi
 
The 17 V’s of Big Data
The 17 V’s of Big DataThe 17 V’s of Big Data
The 17 V’s of Big DataIRJET Journal
 
Big data, data science & fast data
Big data, data science & fast dataBig data, data science & fast data
Big data, data science & fast dataKunal Joshi
 

La actualidad más candente (20)

Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 

Big data privacy issues in public social media

  • 3. ACKNOWLEDGEMENT
I express my gratitude to DR. P. GIRIDHARA REDDY, Principal, Vemana IT, for providing the best facilities; without his constructive criticism and suggestions I would not have been able to complete this curriculum. I express my sincere gratitude to DR. NARASIMHA MURTHY K.N, Professor and H.O.D (PG), Dept. of Computer Science and Engineering, a great motivator and think-tank; I am very grateful to him for providing an opportunity to present my seminar. I also express my gratitude to my internal guide, Mrs. MARY VIDYA JOHN, Assistant Professor, for her encouragement in preparing the report and presenting the paper, and for her useful guidance. I also thank my parents and friends, whose moral support and constant guidance made my efforts fruitful.
Date:
Place: Bangalore
SUPRIYA R (1VI14SCS11)
  • 4. ABSTRACT
Big Data is a new label given to a diverse field of data-intensive information in which the datasets are so large that they are difficult to work with effectively. The term has mainly been used in two contexts: first as a technological challenge when dealing with data-intensive domains such as high-energy physics, astronomy or internet search, and second as a sociological problem when data about us is collected and mined by companies such as Facebook, Google, mobile phone operators and governments. The focus here is on the second issue from a new perspective, namely how the user can gain awareness of the personally relevant part of the Big Data that is publicly available in the social web. The amount of user-generated media uploaded to the web is expanding rapidly, and it is beyond the capability of any human to sift through it all to see which media impact our privacy. Based on an analysis of social media in Flickr, Locr, Facebook and Google+, the privacy implications and potential of the emerging trend of geo-tagged social media are discussed. Finally, a concept is presented with which users can stay informed about which parts of the social Big Data are relevant to them.
  • 5. CONTENTS
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
  1.1 Overview
2 LITERATURE SURVEY
  2.1 Location privacy in pervasive computing
  2.2 Facebook's trainwreck: exposure and invasion
  2.3 Facebook's privacy settings: Who cares?
  2.4 Protecting location privacy with K-Anonymity
3 PROBLEM STATEMENT
4 SYSTEM ANALYSIS
  4.1 Existing system
  4.2 Proposed system
5 THREAT ANALYSIS
  5.1 Awareness of damaging media in big datasets
  5.2 Analysis of service privacy
6 SURVEY OF METADATA IN SOCIAL MEDIA
7 CONCLUSION AND FUTURE WORK
REFERENCES
  • 6. LIST OF FIGURES
Figure 1  Mix zone arrangement
Figure 2  Facebook message in Dec 2009
Figure 3  Facebook's simplified setting
Figure 4  Public privacy metadata from Flickr
Figure 5  Public privacy metadata from Locr
  • 7. Big Data Privacy Issues in Public Social Media
CHAPTER 1 INTRODUCTION
In this electronic age, an increasing number of organizations face an explosion of data, and the size of the databases used in today's enterprises has been growing at an exponential rate. Data is generated by many sources, such as business processes, transactions, social networking sites and web servers, and exists in structured as well as unstructured form. Today's business applications have enterprise characteristics: they are large-scale, data-intensive, web-oriented and accessed from diverse devices, including mobile devices. Processing and analyzing this huge amount of data, and extracting meaningful information from it, is a challenging task.
1.1 Overview
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. It is becoming a hot topic in many areas where datasets are so large that they can no longer be handled effectively or even completely. Put differently, any task that is comparatively easy to execute on a small set of data, but becomes unmanageable on a large dataset, can be classified as a Big Data problem. Data sets grow in size in part because they are increasingly gathered by ubiquitous information-sensing mobile devices, remote sensing, software logs, cameras, microphones and wireless sensor networks. Typical problems encountered when dealing with Big Data include capture, storage, dissemination, search, analytics and visualization. The traditional data-intensive sciences such as astronomy, high-energy physics, meteorology, genomics, and biological and environmental research, in which peta- and exabytes of data are generated, are common example domains. Here even the capture and storage of the data is a challenge.
But there are also new domains emerging under the Big Data paradigm, such as data warehousing, internet and social web search, finance and business informatics. Here the datasets can be small compared to those of the previous domains; however, the complexity of the data can still lead to their classification as a Big Data problem.
M.Tech, 1st sem, Dept. of CSE, VIT, 2014-15
  • 8. Big Data Privacy Issues in Public Social Media
The traditional Big Data applications such as astronomy and other e-sciences usually operate on non-personal information and as such usually do not raise significant privacy issues. The privacy-critical Big Data applications lie in the new domains of the social web, consumer and business analytics, and governmental surveillance. In these domains Big Data research is being used to create and analyze profiles of users, for example for market research, targeted advertisement, workflow improvement or national security. These are very contentious issues, since it is entirely up to the controller of the Big Data sets whether the information obtained from various sources is used for criminal purposes or not. In particular, in the context of the social web there is an increasing awareness of the value, potential and risk of the personal data which users voluntarily upload to the web. However, the Big Data debate in this area has focused almost entirely on what the controlling companies do with this information. These concerns are being addressed by calls for regulatory intervention, i.e. regulating what companies are allowed to do with the data users give them, or what data they are allowed to gather about users.
Here the social media Big Data issue is examined. The growing proliferation and capability of mobile devices is creating a deluge of social media which affects our privacy; due to the vast amount of data being uploaded every day, it is next to impossible for any individual to be aware of everything that affects them. A concept which can be used to regain control over some of the Big Data deluge created by other social web users is then discussed.
  • 9. Big Data Privacy Issues in Public Social Media
CHAPTER 2 LITERATURE SURVEY
2.1 Location Privacy in Pervasive Computing
Mix zones
Most theoretical models of anonymity and pseudonymity originate from the work of David Chaum, whose pioneering contributions include two constructions for anonymous communications (the mix network and the dining cryptographers algorithm) and the notion of anonymity sets. This work introduces a new entity for location systems, the mix zone, which is analogous to a mix node in communication systems. By using the mix zone to model the spatiotemporal problem, useful techniques from the anonymous communications field can be adopted.
Mix networks and mix nodes
A mix network is a store-and-forward network that offers anonymous communication facilities. The network contains normal message-routing nodes alongside special mix nodes. Even hostile observers who can monitor all the links in the network cannot trace a message from its source to its destination without the collusion of the mix nodes. In its simplest form, a mix node collects n equal-length packets as input and reorders them by some metric (for example, lexicographically or randomly) before forwarding them, thus providing unlinkability between incoming and outgoing messages. For brevity, some essential details about the layered encryption of packets are omitted; interested readers should consult Chaum's original work. The number of distinct senders in the batch provides a measure of the unlinkability between the messages coming into and going out of the mix. Chaum later abstracted this last observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann and Marit Köhntopp, whose survey generalizes and formalizes the terminology on anonymity and related topics, the anonymity set is "the set of all possible subjects who might cause an action."
  • 10. Big Data Privacy Issues in Public Social Media
In anonymous communications, the action is usually that of sending or receiving a message. The larger the anonymity set's size, the greater the anonymity offered. Conversely, when the anonymity set reduces to a singleton (the anonymity set cannot become empty for an action that was actually performed), the subject is completely exposed and loses all anonymity.
Mix zones
A mix zone for a group of users is defined as a connected spatial region of maximum size in which none of these users has registered any application callback; for a given group of users there might be several distinct mix zones. In contrast, an application zone is an area where a user has registered for a callback. The middleware system can define the mix zones and calculate them separately for each group of users as the spatial areas currently not covered by any application zone. Because applications do not receive any location information while users are in a mix zone, the identities are "mixed." Assuming users change to a new, unused pseudonym whenever they enter a mix zone, applications that see a user emerging from the mix zone cannot distinguish that user from any other who was in the mix zone at the same time, and cannot link people going into the mix zone with those coming out of it. If a mix zone has a diameter much larger than the distance a user can cover during one location update period, it might not mix users adequately.
  • 11. Big Data Privacy Issues in Public Social Media
Figure 1. A sample mix zone arrangement with three application zones. The airline agency (A) is much closer to the bank (B) than the coffee shop (C). Users leaving A and C at the same time might be distinguishable on arrival at B.
Figure 1 provides a plan view of a single mix zone with three application zones around its edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B than C is, so if two users leave A and C at the same time and a user reaches B at the next update period, an observer will know that the user emerging from the mix zone at B is not the one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the time, that user can only be the one from A. If the maximum size of the mix zone exceeds the distance a user covers in one period, mixing will be incomplete. The degree to which the mix zone anonymizes users is therefore smaller than one might believe from the anonymity set size alone.
  • 12. Big Data Privacy Issues in Public Social Media
2.2 Facebook's Privacy Trainwreck: Exposure, Invasion, and Social Convergence
The tech world has a tendency to view the concept of 'private' as a single bit that is either 0 or 1: data are either exposed or not. When companies decide to make data visible in a more 'efficient' manner, it is often startling, prompting users to speak of a disruption of 'privacy'. Facebook did not make anything public that was not already public, but search disrupted the social dynamics. The reason is that privacy is not simply about the state of an inanimate object or set of bytes; it is about the sense of vulnerability that an individual experiences when negotiating data. On Facebook, people were wandering around accepting others as Friends, commenting on others' pages, checking out what others posted, and otherwise participating in the networked environment. Now, imagine that everyone involved notices all of this because it is displayed when they log in. By aggregating all this information and projecting it to everyone's News Feed, Facebook made what was previously obscure difficult to miss (and even harder to forget). Those data were all there before, but they were not efficiently accessible; they were not aggregated. The acoustics changed, and the social faux pas was suddenly very visible. Without adequate privacy features, participants had to reconsider each change they made, because they knew it would be broadcast to all their Friends. With Facebook, participants have to consider how others might interpret their actions, knowing that any action will be broadcast to everyone with whom they consented to digital Friendship. For many, it is hard even to remember whom they listed as Friends, let alone assess the different ways in which those Friends may interpret information. Without being able to see their Friends' reactions, they are not even aware of when their posts have been misinterpreted.
Facebook implies that people can maintain infinite numbers of friends provided they have digital management tools. While having hundreds of Friends on social network sites is not uncommon, users are not actually keeping up with the lives of all of those people. These digital Friends are not necessarily close friends, and friendship management tools are not good enough for building and maintaining close ties. While Facebook assumes that all
Friends are friends, participants have varied reasons for maintaining Friendship ties on the site that have nothing to do with daily upkeep. The News Feed does not distinguish between these: all Friends are treated equally, and updates come from all Friends, not just those an individual deems close. To complicate matters, when data are there, people want to pay attention, even if doing so does not help them. People relish personal information because it is the currency of social hierarchy and connectivity. Cognitive addiction to social information is great for Facebook, because the News Feed makes Facebook sticky.

Privacy is not simply about zeros and ones; it is about how people experience their relationship with others and with information. Privacy is a sense of control over information, the context in which sharing takes place, and the audience that can gain access. Information is not private because no one knows it; it is private because the knowing is limited and controlled. In most scenarios, the limitations are more social than structural. Secrets are considered the most private of social information because keeping knowledge private is far more difficult than spreading it. There is an immense gray area between secrets and information intended to be broadcast as publicly as possible. By and large, people treated Facebook as being in that gray zone. Participants were not likely to post secrets, but they often posted information that was relevant only in certain contexts. The assumption was that anyone visiting someone's page would access that information in context. When snippets and actions were broadcast to the News Feed, they were taken out of context and made far more visible than seemed reasonable. In other words, with the News Feed, Facebook obliterated the gray zone. Social convergence occurs when disparate social contexts are collapsed into one.
Even in public settings, people are accustomed to maintaining discrete social contexts separated by space. How one behaves typically depends on the norms of a given social context: how one behaves in a pub differs from how one behaves in a family park, even though both are ostensibly public. Social convergence requires people to handle disparate audiences simultaneously, without a social script. While social convergence allows information to spread more efficiently, this is not always what people desire. As with other forms of convergence, control is lost with social convergence. Can we celebrate
people's inability to control private information in the same breath as we celebrate mainstream media's inability to control what information is broadcast to us? Privacy is not an inalienable right; it is a privilege that must be protected socially and structurally in order to exist. The question remains whether privacy is something that society wishes to support. When Facebook launched the News Feed, it altered the architecture of information flow. This was its goal, and with a few tweaks it was able to convince its users that the advantages of the News Feed outweighed security through obscurity. Users quickly adjusted to the new architecture; they began taking actions solely so that they could be broadcast to Friends' News Feeds.

2.3 Facebook privacy settings: Who cares?

Facebook's approach to privacy was initially network-centric. By default, students' content was visible to all other students on the same campus, but to no one else. Through a series of redesigns, Facebook provided users with controls for determining what could be shared with whom, initially allowing them to share with "No One", "Friends", "Friends-of-Friends", or a specific "Network". When Facebook became a platform upon which other companies could create applications, users' content could then be shared with third-party developers who used Facebook data as part of the "Apps" they provided. The company introduced privacy settings to allow users to determine which third parties could access what content; users encountered a message whenever they chose to add an application. Over time, Facebook introduced the ability to share content with "Everyone" (inside Facebook or not). Increasingly, the controls grew more complex, and media reports suggest that users found themselves uncertain about what they meant (Bilton, 2010).
Recognizing the validity of this point, Facebook eventually simplified its privacy settings page (Zuckerberg, 2010). At each point when Facebook introduced new options for sharing content, the default was to share broadly. Following a series of redesigns in December 2009, Facebook added a prompt when users logged on asking them to reconsider their privacy settings for various types of content on the site, including "Posts I Create: Status Updates, Likes, Photos, Videos, and Notes" (see Figure 2). For each item, users were given two options: "Everyone" or "Old Settings". The toggle buttons defaulted to "Everyone".
This message appeared when users logged into their account, and it was impossible to reach the rest of the site without addressing the prompt. Faced with this obligatory prompt, many users may well have just clicked through, accepting the defaults that Facebook had chosen. In doing so, these users made much of their content more accessible than was previously the case. As part of these changes, everyone's basic profile and Friends list became available to the public, regardless of whether or not they had previously chosen to restrict access; this was later retracted. When Facebook was challenged by the Federal Trade Commission to defend this approach, the company's representative noted that 35 percent of users who had never before edited their settings did so when prompted. Facebook used these data to highlight that more people engaged with Facebook's privacy settings than the industry average of 5-10 percent. While 35 percent may be significantly more than the industry average, and Facebook did not specify what percentage of users had never adjusted their settings, there is still likely a sizeable majority that accepted the site's defaults each time changes were implemented.
Figure 2: The message Facebook users saw in December 2009.

As privacy advocates and regulators investigated what these changes meant, Facebook moved toward another series of modifications that depended on users' willingness to share information. In April 2010, at its f8 conference, Facebook announced Instant Personalization and Social Plugins, two services that allowed partners to leverage the social graph (the information about one's relationships on the site that the user makes available to the system) and that provide a channel for sharing information between Facebook and third parties. For example, Web sites could implement a Like button on their own pages that enables users to share content from that site with their connections on Facebook. Sites could also implement an Activity Feed that publicizes what a person's Friends do on that site. These tools were built for developers, and likely few users, journalists, or regulators had any sense of what data from their accounts and actions on Facebook were being shared with whom
under what circumstances. As the discussions became more vociferous, Facebook was increasingly pressured to respond. On 26 May 2010, Zuckerberg announced that Facebook had heard the concerns and believed that the major issue on the table was Facebook's confusing privacy settings page. Facebook unveiled a new privacy settings page that, while simpler, also removed many of the controls that allowed users to limit what content could be restricted (see Figure 3). This quelled much of the news coverage, but it was unlikely to end the controversy: in July 2010, a Canadian law firm filed a class action suit against Facebook over privacy issues.

Figure 3: Facebook's "simplified" privacy settings, July 2010.
2.4 Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms

There are two popular approaches to protecting location privacy in the context of Location-Based Services (LBSs): policy-based and anonymity-based approaches. In policy-based approaches, mobile clients specify their location privacy preferences as policies and completely trust that the third-party LBS providers adhere to these policies. In anonymity-based approaches, the LBS providers are assumed to be semi-honest rather than completely trusted. The paper develops a k-anonymity-preserving management of location information, with efficient and scalable system-level facilities that protect location privacy by ensuring location k-anonymity; it assumes that anonymous location-based applications do not require user identities for providing service.

The concept of k-anonymity was originally introduced in the context of relational data privacy. It addresses the question of how a data holder can release its private data with guarantees that the individual subjects of the data cannot be identified while the data remain practically useful. For instance, a medical institution may want to release a table of medical records with the names of the individuals replaced by dummy identifiers. However, some set of attributes can still lead to identity breaches; these attributes are referred to as the quasi-identifier. For instance, the combination of birth date, zip code, and gender in the disclosed table can uniquely determine an individual. By joining such a medical record table with a publicly available information source such as a voter list, the medical information can easily be linked to individuals.
K-anonymity prevents such a privacy breach by ensuring that each individual record can only be released if there are at least k − 1 distinct individuals whose associated records are indistinguishable from it in terms of their quasi-identifier values. In the context of LBSs and mobile clients, location k-anonymity refers to the k-anonymous usage of location information: a subject is considered location k-anonymous if and only if the location information sent from a mobile client to an LBS is indistinguishable from the location information of at least k − 1 other mobile clients. The paper proposes location perturbation as an effective technique for supporting location k-anonymity and
dealing with the location privacy breaches exemplified by the location inference attack scenarios. If the location information sent by each mobile client is perturbed by replacing the position of the mobile client with a coarser-grained spatial range such that there are k − 1 other mobile clients within that range (k > 1), then the adversary faces uncertainty in matching the mobile client to a known location-identity association or to an external observation of the location-identity binding. This uncertainty increases with the value of k, providing a higher degree of privacy for mobile clients. Even though the LBS provider may know that person A was reported to visit location L, it cannot match a message to this user with certainty, because there are at least k different mobile clients within the same location L. As a result, it cannot perform further tracking without ambiguity.
CHAPTER 3
PROBLEM STATEMENT

The amount of social media being uploaded to the web is growing rapidly, and there is still no end to this trend in sight. The ease of use of modern smartphones and the proliferation of high-speed mobile networks are facilitating a culture of spontaneous and carefree uploading of user-generated content. To give an idea of the scale of this phenomenon: just in the last two years, the number of photos uploaded to Facebook per month has risen from 2 billion to over 6 billion. From a personal perspective, the overwhelming majority of these photos have no privacy relevance for oneself; finding the few that are relevant is a daunting task. While one's own media is uploaded consciously, the flood of media uploaded by others is so huge that it is almost impossible to stay aware of all media in which one might be depicted. This can be classified as a Big Data problem on the user's side, though not on the provider's side.

Current social networks and photo-sharing sites mainly focus on the privacy of users' own media in terms of access control, but do little to deal with the privacy implications created by other users' media. There are ever more complex settings allowing a user to decide who is allowed to see what content, but only for content owned by the user. The issue of staying on top of what others are uploading (mostly in good faith) that might also be relevant to the user is still very much outside that user's control. Social networks which allow tagging of users usually inform affected users when they are tagged; however, if no such tagging is done by the uploader or a third party, there are currently no mechanisms to inform users of relevant media. A second important emerging trend is the capability of many modern devices to embed geo-data and other metadata into the created content.
While the privacy issues of location-based services such as Foursquare or Qype have been discussed at great length, the privacy issues of location information embedded into uploaded media have not yet received much attention. There is one very significant difference between these two categories. In the first, users reveal their current location to access online services such as Google Maps, Yelp, or Qype, or actually publish their location on a social network
site like Foursquare, Google Latitude, or Facebook Places. In this category, the user mainly affects his own privacy. There is a large body of work examining privacy-preserving techniques to protect a user's own privacy, ranging from solutions installed locally on the user's mobile device to solutions using online services that rely on group-based anonymisation algorithms, such as mix zones or k-anonymity. The second category is created by media which contain location information. This can have all the same privacy implications for the creator of the media; however, a critical and hitherto often overlooked issue is the fact that the location and other metadata contained in pictures and videos can also affect people other than the uploader himself. This is a critical oversight, and an issue which will gain importance as the mobile smart-device boom continues.
CHAPTER 4
SYSTEM ANALYSIS

4.1 EXISTING SYSTEM

Privacy issues are categorized into two classes. First, homegrown problems: someone uploads a piece of compromising media of himself with insufficient protection or forethought, which damages his own privacy. A prime example of this category is someone uploading compromising pictures of himself into a public album instead of a private one, or onto his Timeline instead of into a message. The damage done in these cases is very obvious, since the link between the content and the user is direct and the audience (often the peer circle) has a direct interest in the content. One special facet of this problem is that what the user considers damaging content can and often does change over time. While this is a serious problem, especially amongst the Facebook generation, it is a small-data problem and thus not the focus of this work.

Second, there are the Big Data problems created by others: an emerging threat to users' online privacy comes from other users' media. What makes this threat particularly bad is that the person harmed is not involved in the uploading process, and thus cannot take any pre-emptive precautions, and that the amount of data being uploaded is so vast it cannot be manually reviewed. There are also currently no countermeasures, except after-the-fact legal ones, to prevent others from uploading potentially damaging content about someone. There are two requirements for this form of privacy threat to take effect. First, to cause harm to a person, a piece of media needs to be associated or linked with that person in some way. This link can either be non-technical, such as being recognizable in a photo, or technical, such as a profile being (hyper-)linked to a photo. There is also the grey area of textual references to a person near the photo or embedded in the metadata of the photo.
This metadata does not directly create a technical link to a profile, but it opens the possibility for search engines to index the information and make it searchable, thus creating a technical link.
Second, the piece of media in question must contain content harmful to the person linked to it. This can again be non-technical, such as being depicted in a compromising way. More interestingly, however, it can also be technical: in these cases metadata or associated data causes the harm. For instance, time and location data can indicate that a person was at an embarrassing location, took part in a political event, or was not where he said he was. Since the uploading of this type of damaging media cannot be effectively prevented, awareness is the key issue in combating this emerging privacy problem.

4.2 PROPOSED SYSTEM

The privacy awareness concept consists of a watchdog client and a server-side watchdog service. Using a GPS-enabled mobile device, a user can activate the watchdog client to locally track his position during times he considers relevant to his privacy. Whenever the user is interested in the state of his online privacy, his device can ask the privacy watchdog to show him media that could potentially affect him. For this, the watchdog client sends the location traces to the watchdog server or servers, which then respond with a list of media taken in the vicinity (both spatially and temporally) of the user.

The watchdog service can be operated in three different ways. The first two would be value-added services offered by the media-sharing sites or social networks (SNs) themselves. In both these cases the existing services would need to be extended by an API that allows users to search for media by time and location. The first type of service would do this via the regular user account. Thus, it would be able to see all public pictures, but also all pictures restricted in scope yet visible to the user. The benefit of this service type is that, through the integration with the user's account, pictures which are not publicly available can be searched.
These private pictures are typically from social network friends, and thus the likelihood of pictures involving the user is higher and the set of people able to view the pictures more relevant. This type of service also has the benefit of sorts that the location information is valuable to the SN, so the SN has an incentive to offer this kind of value-added service. The privacy downside of searching with the user's account is that the SN receives the information of when and where the user was. While there are certainly users who would not mind this information being sent to their SN if it means they get to see and remove embarrassing photos in a timely manner,
there are also certainly users who do not wish their SN to know when and where they were, particularly amongst the clientele who wish to protect their online privacy.

The second type of watchdog service would also be operated by the SN; however, it does not require a user account for the search and can be queried anonymously. This type of service would thus have a smaller scope, since it can only access publicly available media. A further drawback of this type of service is that there is less incentive for the SN provider to implement it. While there are sites such as Locr that allow such queries, most SN sites do not, and without outside pressure there is less intrinsic value for them in offering such a service compared to the first type.

The third type of service would be a stand-alone service operated by a third party. The stand-alone service operates like an indexing search engine, which crawls publicly available media and its metadata and allows this database to be queried. Possible incentive models for this approach include pay-per-use, subscription, ad-based, or community services. The visibility scope would be the same as for the second type of service.

All three types of service are mainly focused on detecting relevant media events and breaking the Big Data problem down to humanly manageable sizes. The concept aims to bring possibly relevant media to the attention of the user without overburdening him. The system does not explicitly protect against malicious uploads, in which the uploader intentionally tries to harm another person while attempting to conceal the activity, even though the watchdog service proposed here could make such subterfuge harder for the malicious uploader.
Even without full protection from malicious activity, we believe that such a watchdog would improve the current state of the art by enabling users to gain better awareness of the relevant part of the social media Big Dataset.
CHAPTER 5
THREAT ANALYSIS

5.1 Awareness of Damaging Media in Big Datasets

Most popular social networks and media-sharing sites allow users to tag objects and people in their uploaded media. Additionally, some services extract embedded metadata and use this information for indexing and linking. Media is annotated with names and comments, or is directly linked to users' profiles. In particular, the direct linking of profiles to pictures was initially met with an outcry of privacy concerns, since it greatly facilitated finding information about people. For this reason, social networks quickly introduced the option to prevent people from linking them in media. However, there is also a positive side to this form of direct linking, since the linked person is usually made aware of the linked media and can then take action to have unwanted content removed or to restrict the visibility of the link. While the privacy mechanisms of current services are still limited, hidden, and often confusing, once the link is made the affected people can take action.

A more critical case is the non-linked tagging of photos, in which a free-text tag contains identifying information and/or malicious comments. Here there is no automated mechanism to inform a user that he was named in or near a piece of media; the named person might not even be a member of the service where the media was uploaded. The threat of this kind of linking is significantly different from the one depicted above. While the immediate damage can be smaller, since no automated notification is sent to friends of the user, the threat can remain hidden far longer: the person can remain unaware of this media while others stumble upon it or are informed by peers. The final case of damaging media does not contain any technical link.
Without any link to the person in question, this kind of media can only cause harm if someone who has contact with that person stumbles across it and makes the connection. While the likelihood of causing noticeable harm is smaller, it is still possible; the viral spreading of media has caused serious embarrassment and harm in real-world cases. The critical issue here is that there is currently no way for a person to pro-actively search for this kind of media in the Big Data deluge to mitigate the threat.
5.2 Analysis of Service Privacy

The different types of watchdog services discussed in the proposed system can help to reduce the number of relevant pieces of media a user needs to keep an eye on if he does not want uncontrolled media of himself to be online. However, the devil is in the detail, since this form of service can itself have serious privacy implications if designed in the wrong way. Care must be taken to accommodate the different usage and privacy requirements of different users. The critical issue is that, to request the relevant media, a user must send location information to the watchdog service.

When using the first type of service, little can be done to protect the location privacy of the user, since the correlation between the location query and the user account is direct. One option to protect privacy to a degree is an obfuscation approach: for every true query, a number of fake queries could also be sent, making it less easy (but far from impossible) for the SN provider to ascertain the true location. However, this approach does not scale well, for two reasons. First, it creates a higher load on the SN. More critically, the likelihood of deducing the true location rises as more queries are sent, unless great care is taken in creating the fake paths and masking the source IP addresses.

Protecting the user's privacy in the second case is simpler. Since the queries do not require an account, the only way a user could be tracked directly is via his IP address. Using an anonymizing service such as Tor, sequential queries cannot be linked together, and creating a tracking profile becomes significantly harder. The anonymous trace data can of course still be used by the SN, but the missing link to the user makes it less critical for the user himself.
The third type of service is probably the most interesting privacy-wise, since the economic model behind the service will significantly impact the privacy techniques which can be applied to it. Most payment models would require user credentials to log in and would thus allow the watchdog service provider to track the user; in this case the user would have to trust the reputation of the service provider, similar to the case of commercial anonymizing proxies. In an ad-based approach no user credentials are needed, so it would be possible to use the service anonymously via Tor; the watchdog client would then need to be open source to ensure that no secret tracking information is stored in it. In community-based approaches a privacy model like that in
Tor can be used, to ensure that none of the participating nodes gets enough information to track a single user. Each of these proposed service types has different privacy benefits and disadvantages.

The following section gives an overview of our privacy analysis of different media-hosting sites, covering media access control as well as metadata handling.

Flickr provides the most fine-grained privacy and access control settings of all the analyzed services. Privacy settings can be defined on metadata as well as on the image itself. One particularly interesting feature of Flickr is the geo-fence: it enables users to define privacy regions on a map by putting a pin on it and setting a radius. Access to the GPS data of the user's photos inside these regions is only allowed for a restricted set of users (friends, family, and contacts). Flickr also allows its users to tag and add people to images; if a user revokes a person tag of himself in an image, no one can add that person to the image again.

Facebook extracts the title and description of an image from metadata during the upload process of some clients. All photos are resized after upload, and metadata is stripped off. Facebook uses face recognition for friend-tagging suggestions based on already tagged friends. Access to images is restricted by Facebook's ever-changing, complex, and sometimes abstruse privacy settings.

Picasa Web & Google+ store the complete EXIF metadata of all images, accessible to everyone who can access the image. Access to images is regulated on a per-album basis; it can be set to public, restricted to people who know the secret URL of the album, or to the owner only. A noteworthy feature is that geo-location data can be protected separately. Google+ and Picasa Web allow the tagging of people in images.

Locr is a geo-tagging-focused photo-sharing site. As such, location information is included in most images.
By default, all metadata is retained in all images. Access control is set on a per-image basis; anybody who can see an image can also see its metadata. There are also extensive location-based search options. Geo-data is extracted from uploaded files or set by people on the Locr website, and Locr uses reverse geocoding to add textual location information to the images in its database.
Instagram and PicPlz are services/mobile apps that allow posting images in a Twitter-like way. The services store resized images stripped of metadata but with optional location data. Additionally, they allow the posting of photos to different services like Flickr, Facebook, Dropbox, Foursquare, and more. Depending on the service used, metadata is stored or discarded. For instance, when uploading a photo to Flickr, metadata is stripped from the actual file, but the title, description, and geo-location are extracted from the image and can be set by the user. In contrast, the Hipstamatic mobile app preserves in-file metadata when uploading images to Flickr.

Figure: The logos of the different social media apps.
CHAPTER 6
SURVEY OF METADATA IN SOCIAL MEDIA

To underpin the growing prevalence of privacy-relevant metadata, and of location data in particular, and to judge potential dangers and benefits based on real-world data, we analyzed a set of 20,000 publicly available Flickr images and their metadata. Flickr was chosen as the premiere photo-sharing website because it can be legally crawled, offers the full extent of privacy mechanisms, and does not remove metadata in general. We crawled one photo each from 20k random Flickr users. Of these, 68.8% were Pro users, for whom the original file could be accessed as well; for the others, only the metadata available via the Flickr API was accessed. This includes data automatically extracted from EXIF data during upload and data manually added via the website or semi-automatically set by client applications. 23% of the 20k users denied access to their extracted EXIF data in the Flickr database. We also took a set of 3,000 images made with camera phones from 3k random mobile Flickr users; 46.8% of the mobile users were Pro users, and only 2% denied access to EXIF data in the Flickr database.

GPS location data was present in 19% of the 20k dataset and in 34% of the 3k mobile-phone dataset. While Flickr hosts many semi-professional DSLR photos, mobile phones are becoming the dominant photo-generation tool, with the iPhone 4 currently being the most common camera on Flickr. Textual location information like street or city names is currently not used much on Flickr; however, as reverse geocoding becomes more common in client applications, this will change (cf. Locr in Figure 5). To evaluate the potential privacy impact, we manually checked which photos contained people and a geo-reference but no user profile tags, i.e. images which could contain people who are unaware of the photo.
In the set of 20k images we found 16% and in in the set of 3k mobile photos we found 28% fulfilling this criteria M.Tech, 1st sem, Dept. Of CSE, VIT 2014-15 24
Fig. 4. Public privacy-related metadata in 5.7k random and 1k mobile user original Flickr photos

Further, the subset of images available from Pro users was analyzed, since these can contain the unaltered metadata from the camera. From the 20k dataset, 5,761 images contained in-file metadata; from the 3k dataset, 1,050 images did. Figure 4 shows the percentage values for the different types of metadata contained in the files. For the rest of the images, metadata was either manually removed by the uploader or the image never had any in the first place. Of the 20k dataset only 3% of the in-file metadata contained GPS data, compared to 32% from the mobile 3k dataset. This shows a clear dominance of mobile devices when it comes to publishing GPS metadata.
This is unsurprising, since most compact and DSLR cameras currently do not have GPS receivers and only a few photographers add external GPS devices to these cameras – but this will likely change with future cameras. However, combined with the fact that mobile phones are becoming the dominant type of camera when it comes to the number of published pictures, it is to be expected that the amount of GPS data available for scrutiny, use and abuse will rise further.

Fig. 5. Public privacy-related metadata of 5k random photos from the Locr service
A 5k dataset of random photos from Locr was collected, and the metadata in the images plus the images' HTML pages built from the Locr database was analyzed. Figure 5 shows the results from this dataset. Particularly interesting is the high rate of non-GPS-based location information. This is a trend to watch, since most location-stripping mechanisms only strip GPS information and leave other text-based tags intact. Furthermore, the amount of camera-ID metadata is notable, since these IDs can be used to link different pictures and infer metadata even if it has been stripped from some of the photos. To summarize, one third of the pictures taken by the dominant camera devices contain GPS information, and about one third of these images depict people. Thus, about 10% of all photos could harm other people's privacy without the affected people knowing about it.
CHAPTER 7

CONCLUSION AND FUTURE WORK

An analysis of the threat to an individual's privacy that is created by other people's social media was discussed, along with a brief overview of common social media services' capability of protecting users from other people's activities. Based on the survey, the Big Data privacy implications and potential of the emerging trend of geo-tagged social media were analyzed. Then three concepts were presented showing how location information can actually help users to stay in control of the flood of potentially harmful or interesting social media uploaded by others. Applications can be built or modified to use pseudonyms rather than true user identities, offering a route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. Furthermore, to resist more sophisticated attacks, users should change pseudonyms frequently, even while being tracked. Drawing on the methods developed for anonymous communication, the conceptual tools of mix zones and anonymity sets can be used to analyze location privacy. Although anonymity sets provide a first quantitative measure of location privacy, the measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture, in which the user has less privacy, because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately. Two services worth mentioning, which collect the type of information needed for a privacy watchdog, are Social Camera and Locaccino.
Social Camera is a mobile app that detects faces in a picture and tries to recognize them with the help of the Facebook profile pictures of people in the user's friends list. Recognized people can be automatically tagged and pictures instantly uploaded to Facebook. Locaccino is a Foursquare-type application which allows users to upload location-based information to Facebook.
These two apps show the willingness of users to share this kind of information in the social web. Ahern et al. analyze the privacy decisions of mobile users in the photo-sharing process. They identify relationships between the location of the photo capture and the corresponding privacy settings, and recommend the use of context information to help users set privacy preferences and to increase users' awareness of information aggregation. Work by Fang and LeFevre focuses on helping the user find appropriate privacy settings in social networks. They present a system where the user initially only needs to set up a few rules; through the use of active machine learning algorithms, the system then helps the user protect private information based on individual behavior and taste. Mannan et al. address the problem that private user data is shared not only within social networks but also through personal web pages. In their work they focus on privacy-enabled web content sharing and utilize existing instant-messaging friendship relations to create and enforce access policies. The three works above all focus on protecting a user's privacy from dangers created by the user himself while sharing media; they do not discuss how users can be protected from other people's media. The same holds for most of the research work done in this area. Besmer et al. present work which allows users that are tagged in photos to send a request to the owner to hide the linked photo from certain people. This approach also follows the idea that forewarned is forearmed and that creating awareness of critical content is the first step towards a solution of the problem. However, the work relies on direct technical tags and as such does not cover the same scope as the privacy watchdog suggested in the paper. Work that also takes into account other users' media is presented by Squicciarini et al., who postulate that most of the shared data does not belong to a single user only. They therefore propose a system to share the ownership of media items and thereby strive to establish collaborative privacy management for shared content.
Squicciarini et al.'s prototype is implemented as a Facebook app and is based on game theory, rewarding users who promote co-ownership of media items. While the work does take other users' media into account, unlike the approach presented here it does not cope with previously unknown and unrelated users.
REFERENCES

[1] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. Nair. Overexposed?: Privacy patterns and considerations in online and mobile photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 357–366, 2007.
[2] C. A. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and P. Samarati. An obfuscation-based approach for protecting location privacy. IEEE Transactions on Dependable and Secure Computing, 8(1):13–27, June 2009.
[3] A. Beresford and F. Stajano. Location privacy in pervasive computing. IEEE Pervasive Computing, 2(1):46–55, Jan. 2003.
[4] A. Besmer and H. Richter Lipford. Moving beyond untagging. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, page 1563, Apr. 2010.
[5] D. Boyd. Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1):13–20, 2008.
[6] D. Boyd and K. Crawford. Six provocations for Big Data. SSRN eLibrary, 2011.
[7] D. Boyd and E. Hargittai. Facebook privacy settings: Who cares? First Monday, 15(8), 2010.
[8] Carnegie Mellon's Mobile Commerce Laboratory. Locaccino - a user-controllable location-sharing tool. http://locaccino.org/, 2011.
[9] E. Eldon. New Facebook statistics show big increase in content sharing, local business pages. http://goo.gl/ebGQH, February 2010.
[10] Facebook. Statistics. http://www.facebook.com/press/info.php?statistics, 2011.
[11] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th International Conference on World Wide Web - WWW '10, page 351. ACM Press, Apr. 2010.
[12] Flickr. Camera Finder. http://www.flickr.com/cameras, October 2011.
[13] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, Jan. 2008.
[14] M. Mannan and P. C. van Oorschot. Privacy-enhanced sharing of personal content on the web. In Proceedings of the 17th International Conference on World Wide Web - WWW '08, page 487. ACM Press, April 2008.
[15] Metadata Working Group. Gartner special report - pattern-based strategy: Getting value from big data. www.gartner.com/patternbasedstrategy, 2012.
[16] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. In Proceedings of the 18th International Conference on World Wide Web - WWW '09, page 521. ACM Press, Apr. 2009.
[17] Viewdle. SocialCamera. http://www.viewdle.com/products/mobile/index.html.