Platforms and Analytical Gestures
1. Platforms and Analytical Gestures
Social Media Data Analysis with Digital
Methods
Bernhard Rieder
Universiteit van Amsterdam
Mediastudies Department
2. Introduction
Social media are taking an important place in contemporary life.
Among other things, they are discussed as data collections – in relation to
terms like big data, computational social science, digital methods,
database marketing, surveillance, social sorting, etc.
Many disciplines have begun to study social media, applying various
methodologies, but there is an explosion in data-driven research.
The promise is (cheap and detailed) access to what people do, not what
they say they do; to their behavior, exchange, ideas, and sentiments.
3. This presentation
How do we talk about social media data? How do we analyze them? What
is our frame of thought? How do we go further in terms of methodological
imagination and expressivity?
Instead of a totalizing search for a "logic" of data analysis, we could
inquire into the rich vocabulary of analytical gestures that constitute the
practice of data analysis.
Social media data analysis using digital methods (Rogers 2013):
1. Characteristics of social media
2. Analytical gestures
3. Some examples
4. Very large numbers and variety in users,
contents, purposes, arrangements, etc.
5. Social media are built on simple
point-to-point principles; this
allows for a wide variety of
topological arrangements to emerge
over time.
There is no average Twitter user.
But every account is also the same.
6. Platforms like Twitter
provide opportunities for
creating connections
between defined types of
entities (users, messages,
hashtags, resources, etc.).
They formalize and channel
expression, exchange, and
coordination.
"You cannot reply to a
hashtag."
7. The Web vs. Social Media
The Web "natively" only
knows one type of entity
(the web page) and one
type of connection (the
hyperlink).
"The Web does not know
what a blog is."
Technical formalization is
very unspecific in terms of
user practices.
8. The Web vs. Social Media
Social media define sets of
distinct entities as well as
distinct types of
connection.
Technical formalization is
explicitly related to
specific use practices.
"Social media are formal,
the Web is conventional."
"Facebook knows what a
song is."
9. The Web vs. Social Media
The open Web is difficult to study because of the separation between
technical markers and meaning. A link is not a like.
The more detailed the formalization, the more salient the data.
Social media platforms are essentially large databases.
10. Social media are built on simple point-
to-point principles; but elements are
dynamically aggregated into lists.
Social media platforms organize
exchange around market forms of
interaction; topological arrangements
result from histories of exchange and
technological mediation /
11. Standardization and formalization enhance calculability.
Platforms modulate visibility by using various ways of
processing formalized entities to produce aggregates.
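As a rough illustration of how formalized entities can be processed into an aggregate list, the sketch below counts hashtags across a handful of invented messages and ranks them. The message data and the `top_hashtags` helper are hypothetical; no real platform computes its trends this simply.

```python
from collections import Counter

# Hypothetical messages; each carries the hashtags a user attached to it.
messages = [
    {"user": "a", "hashtags": ["news", "election"]},
    {"user": "b", "hashtags": ["election"]},
    {"user": "c", "hashtags": ["cats"]},
    {"user": "a", "hashtags": ["election", "cats"]},
]

def top_hashtags(msgs, k=2):
    """Aggregate individual messages into a ranked list of hashtags."""
    counts = Counter(tag for m in msgs for tag in m["hashtags"])
    return counts.most_common(k)

trending = top_hashtags(messages)
print(trending)
```

Whatever falls below the cutoff `k` is simply not shown, which is one small way in which an aggregate modulates visibility.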
12. If outcomes are based both on platform
characteristics and historically developed
arrangements marked by diverse practices, how do we
understand these objects / practices / dynamics?
13. Social media produce detailed
data traces; data pools in
social media are centralized
and searchable.
Access to these proprietary
data is governed by technical
(API) and legal means (EULA).
Structure of APIs is closely
related to given
formalizations.
14. Data analysis for social media, first recommendation:
in order to select, process, and interpret data in a
meaningful way, we need to understand the
platform: entities, relations, modes of aggregation.
15. “facts and statistics collected together for reference or analysis. See also datum.
- Computing: the quantities, characters, or symbols on which operations are performed by a
computer, being stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
- Philosophy: things known or assumed as facts, making the basis of reasoning or
calculation.” (Oxford American Dictionary)
Define: data
Reasoning (OAD): "think rationally", "use one's mind", "calculate", "make sense
of", "come to the conclusion", "judge", "persuade", etc.
Reasoning as "giving reasons": What counts as a finding? As a valid argument or
method? What is "good" knowledge?
How do we reason on the basis of data?
16. What styles of reasoning?
Hacking (1991) builds the concept of "style of reasoning" on A. C.
Crombie’s (1994) "styles of scientific thinking":
☉ postulation and deduction
☉ experiment and empirical research
☉ reasoning by analogy
☉ ordering by comparison and taxonomy
☉ statistical analysis of regularities and probabilities
☉ genetic development
These are styles of "giving reasons", styles of making truth and knowledge.
What kind of reasoning are we mobilizing in data analysis? Is this simply
quantitative empiricism, counting facts?
17. Quality / quantity
"One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former
reads to the latter the first sentence of The Sociological Imagination: 'Nowadays men often
feel that their private lives are a series of traps.' Lazarsfeld immediately replies: 'How many
men, which men, how long have they felt this way, which aspects of their private lives
bother them, do their public lives bother them, when do they feel free rather than trapped,
what kinds of traps do they experience, etc., etc., etc.'." (Maurice Stein, cit. in Gitlin 1978)
Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs.
deductive, confirmatory vs. exploratory, understanding vs. explaining, etc.
The promise of data analysis, applied to exhaustive (and cheap) data, is to bridge
the gap between different epistemic stances, e.g. "quali-quanti" (Latour 2010).
We need to think creatively about new analytical gestures rather than slavishly
follow sterile methodological paradigms.
18. Flusser (1991) describes gestures as having convention and structure, but
also as different from reflexes, because they express a moment of freedom.
It is an "art of doing" (de Certeau 1980), a movement of the body that has
no sufficient causal explanation.
We investigate the structure of data by creating "views" of the data.
The notion of gesture indicates that data does not speak for itself; we
approach it with particular epistemic techniques (methods) related to a
sense of purpose, a "will to know" (Foucault 1976).
Analytical gestures
Data analysis for social media, second recommendation:
in order to select, process, and interpret data in a
meaningful way, we need to be clear about our
19.
20. Where are analytical gestures?
Analytical gestures produce orderings, lists, tables, charts, etc. that are
considered to be saying something about the data / phenomenon.
Analytical gestures mobilize the analytical capacities of:
☉ The platforms (formalization, aggregation) and their users (appropriation, use)
☉ The analytical tools and methods we use
☉ The researchers and their imagination, knowledge, and skill
We need to think about all three levels together.
21. There are counts everywhere,
but anything here can be
exploited for analysis.
22. Social media platforms are full of analytics: counts,
rankings, trends, recommendations, groupings, similarities,
and so on. How can we repurpose them for research?
Example 1: the platform
23. Step 1: Retrieve list of friends
from Facebook account
Step 2: Check for friendship
between each pair of users
Step 3: Project as network – use
friendship as link
Step 4: Spatialize network (use
structure to arrange nodes)
Step 5: Detect communities (use
structure to distinguish groups)
Step 6: Size and color nodes
Step 7: Interpret
Friendship analysis of my personal FB network:
Nodes: users / Links: "being friends"
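The steps above can be sketched in a few lines, under loud assumptions: the friend list and the `are_friends` check are invented stand-ins for platform data, spatialization (step 4) is left out, and connected components serve as a much cruder substitute for real community detection (step 5).

```python
from itertools import combinations

# Step 1: hypothetical list of friends retrieved from an account.
friends = ["ann", "bob", "cat", "dan", "eve"]

# Hypothetical stand-in for the pairwise friendship check of step 2.
known_friendships = {
    frozenset(("ann", "bob")),
    frozenset(("bob", "cat")),
    frozenset(("ann", "cat")),
    frozenset(("dan", "eve")),
}

def are_friends(u, v):
    return frozenset((u, v)) in known_friendships

# Steps 2-3: check every pair and project friendships as undirected links.
graph = {user: set() for user in friends}
for u, v in combinations(friends, 2):
    if are_friends(u, v):
        graph[u].add(v)
        graph[v].add(u)

def connected_components(g):
    """Crude grouping for step 5: sets of nodes that can reach each other."""
    seen, found = set(), []
    for start in g:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(g[node] - group)
        seen |= group
        found.append(group)
    return found

groups = connected_components(graph)
# Step 6: the number of friends (degree) could drive node size in a drawing.
degree = {user: len(neighbors) for user, neighbors in graph.items()}
print(groups, degree)
```

Step 7 stays with the researcher: the code produces groups and sizes, not an interpretation.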
Example 2: the methods and tools
24. Co-like analysis of my personal FB network:
Nodes: users / Links: "liking the same thing"
Step 1: Retrieve list of friends
from Facebook account.
Step 2: Retrieve liked entities for
every user.
Step 3: Project as network – if two
users like the same object, create
a link; for every other mutual like
add weight to connection.
Step 4: Spatialize network (use
structure to arrange nodes)
Step 5: Detect communities (use
structure to distinguish groups)
Step 6: Size and color nodes
Step 7: Interpret
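A minimal sketch of the projection in step 3, assuming liked entities have already been retrieved. The `likes` dictionary is invented for illustration; the weight of a link is simply the number of objects two users both like.

```python
from collections import Counter
from itertools import combinations

# Step 2: hypothetical liked entities per user.
likes = {
    "ann": {"jazz", "cycling", "film"},
    "bob": {"jazz", "film"},
    "cat": {"cycling"},
    "dan": {"chess"},
}

# Step 3: link two users if they share a like; weight = number of shared likes.
co_like = Counter()
for u, v in combinations(sorted(likes), 2):
    shared = likes[u] & likes[v]
    if shared:
        co_like[(u, v)] = len(shared)

print(dict(co_like))
```

Users without any shared like (here "dan") end up with no links, so the resulting network describes taste overlap rather than acquaintance.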
Example 3: our imagination
25. Analytical gestures
Methodological plasticity requires imagination, rigor, and self-criticism.
Example: the arithmetic mean (most common form of average) is
supposed to reveal a "central tendency" in a distribution:
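A small worked example, with invented numbers, of why this gesture needs care: when a distribution is strongly skewed, the mean can sit far from where most cases are, while the median stays close to them.

```python
from statistics import mean, median

# Hypothetical comments per post: most posts get a few, one gets very many.
comments_per_post = [1, 2, 2, 3, 3, 4, 250]

avg = mean(comments_per_post)
mid = median(comments_per_post)
print(f"mean={avg:.1f}, median={mid}")
```

Here six of the seven posts have four comments or fewer, yet the mean is close to 38; the median of 3 is arguably a better description of a typical post.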
26. Three (more) things to consider about data
The technical shape of data: data in relation to the social media platform
☉ Variety of "units": users, accounts, pages, groups, lists, messages, hashtags, words,
etc. (different forms of materiality, different analytical opportunities, etc.)
The social / cultural shape of data: data in relation to lived experience
☉ Demographic (age, sex, income, etc.)
☉ Post-demographic (taste, expression, etc.)
☉ Behavioral (trajectories, interaction, etc.)
The modes of analysis: the tools / methods (and their mathematics)
☉ Statistical (case centered) perspective
☉ Relational (structure centered) perspective
Data analysis for social media, third recommendation: in order to
select, process, and interpret data in a meaningful way, we need to be
knowledgeable about methods and the concepts they are based on.
27. Two kinds of mathematics
Statistics
Observed: objects and properties ("cases")
Data representation: the table
Visual representation: quantity charts
Inferred: relations between properties
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Data representation: the matrix
Visual representation: network diagrams
Inferred: structure of relations between objects
Grouping: clique (dense relations)
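To make the contrast concrete, the sketch below groups the same four invented users twice: once by class (a shared property in a table) and once by clique (every pair linked). All names, properties, and links are hypothetical, and the brute-force clique search is only reasonable for tiny examples.

```python
from itertools import combinations

# Statistical view: cases with properties, stored as a table.
table = [
    {"user": "ann", "age_group": "18-29"},
    {"user": "bob", "age_group": "30-49"},
    {"user": "cat", "age_group": "18-29"},
    {"user": "dan", "age_group": "30-49"},
]
classes = {}
for row in table:
    classes.setdefault(row["age_group"], []).append(row["user"])

# Graph view: objects with relations, stored as a set of undirected links.
links = {("ann", "bob"), ("bob", "dan"), ("ann", "dan"), ("cat", "dan")}

def linked(u, v):
    return (u, v) in links or (v, u) in links

users = [row["user"] for row in table]
# A clique of three: every pair inside the trio is linked.
triangles = [
    trio for trio in combinations(users, 3)
    if all(linked(u, v) for u, v in combinations(trio, 2))
]

print(classes)
print(triangles)
```

The two groupings do not coincide: "ann" shares a class with "cat" but forms a clique with "bob" and "dan".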
28. Facebook Page "ElShaheeed", June 2010 – June 2011, (Poell / Rieder, forthcoming)
7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!
29. Data captured from an API can be easily imported into standard statistical
tools that come with many analytical gestures built in (e.g. R, Excel, SPSS,
Rapidminer, …).
Statistics
Data analysis for social media, fourth recommendation: in order to
select, process, and interpret data in a meaningful way, we need to be
able to use tools skillfully and with a degree of awareness of how they
work.
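As an illustration of the import path mentioned on this slide, the sketch below turns a JSON response into CSV text that tools such as R, Excel, or SPSS can open. The response and its field names are invented; real API output will differ and usually needs more cleaning.

```python
import csv
import io
import json

# Hypothetical API response; the field names are invented for illustration.
api_response = json.loads("""
[
  {"id": "p1", "created": "2011-01-25", "likes": 120, "comments": 30},
  {"id": "p2", "created": "2011-01-26", "likes": 900, "comments": 410}
]
""")

fields = ["id", "created", "likes", "comments"]
buffer = io.StringIO()  # use open("posts.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(api_response)

csv_text = buffer.getvalue()
print(csv_text)
```

Each post becomes one row (a case) and each field one column (a property), which is the tabular form statistical tools expect.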
37. Two kinds of mathematics
Statistics
Observed: objects and properties
Inferred: relations
Data representation: the table
Visual representation: quantity charts
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Inferred: structure
Data representation: the matrix
Visual representation: network diagrams
Grouping: clique (dense relations)
38. 3 / The mathematics of structure
Graph theory has a long prehistory; social network analysis starts in the
1930s with Jacob Moreno's work.
Graph theory is "a mathematical model for any system involving a binary
relation" (Harary 1969); it makes relational structure calculable.
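A minimal sketch of what "calculable" can mean here: a binary relation, written as a set of ordered pairs, is turned into an adjacency matrix, and simple structural quantities are then read off by summing. The "follows" relation and the user names are invented.

```python
# A binary relation as a set of ordered pairs (hypothetical "follows" links).
follows = {("ann", "bob"), ("bob", "ann"), ("cat", "ann")}

nodes = sorted({name for pair in follows for name in pair})
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: matrix[i][j] == 1 when nodes[i] follows nodes[j].
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in follows:
    matrix[index[source]][index[target]] = 1

# Row sums give out-degree, column sums give in-degree.
out_degree = {n: sum(matrix[index[n]]) for n in nodes}
in_degree = {n: sum(row[index[n]] for row in matrix) for n in nodes}

# Count pairs that follow each other in both directions.
reciprocal = sum(
    1
    for i in range(len(nodes))
    for j in range(i + 1, len(nodes))
    if matrix[i][j] and matrix[j][i]
)

print(matrix, in_degree, out_degree, reciprocal)
```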
56. Recommendations
Data analysis for social media requires (in my view):
☉ Robust understanding of the social media platform
☉ A sense of purpose
☉ Conceptual understanding of methods and analytical gestures
☉ Knowledge of software tools for data analysis
Also: from the simple to the more complex, start out with platform
metrics and counting; prefer simple visualizations to complex ones.
Finally, let's not forget domain expertise!
57. Conclusions
There is a lot of excitement about social media data analysis, but our
understanding of styles and analytical gestures is still very poor.
We need interrogation and critiques of methodology that are developed
from engagement and historical / conceptual investigation.
We need analytical gestures that are more closely tied to concepts from
the humanities and social sciences.
Visualization and simpler tools are very interesting but require technical
and conceptual literacy to deliver more than (deceptive) illustrations.
58. Thank You
rieder@uva.nl
https://www.digitalmethods.net
http://thepoliticsofsystems.net
"Far better an approximate answer to the right question,
which is often vague, than an exact answer to the wrong
question, which can always be made precise. Data
analysis must progress by approximate answers, at best,
since its knowledge of what the problem really is will at
best be approximate." (Tukey 1962)
Editor's notes
Data can be thought of as a kind of "observation" rather than survey-based research.
I've been doing this for only three years, so I'm still learning a lot each day. But I would like to share some of the things I've learned so far. Not monolithic, but full of choice. => I'll give a couple of recommendations that are debatable, of course.
People do a lot of different things on Twitter, Facebook, etc. – and just because you and your immediate vicinity seem to have coherent practices, this does not mean others have. But there are conventions emerging, national, subcultural, etc. The banal, the political, and the commercial coexist in the same functional containers, behind the same interfaces.
Differentiation of scales (topological forms) is produced through technical means and emerges through social dynamics. Variations in scale are less institutional and more topological (example: big Twitter accounts). The idea that this would foster equality comes from the fact that indeed, everybody is a node. We think in terms of properties, not in terms of structure/dynamics. Status is not what you are, but how you are connected. => Variety in topics, variety in scales. Size is the main differentiator. (compare to university, organizations)
There are conventions, but relatively little formalization. Image: Anne Helmond / Esther Weltevrede (starting list compiled through experts). Less formalization, more convention: there are ways to define a blog, but this is not part of the Web's architecture.
Organized around a limited set of formalized entities (users, messages, etc.) that have defined properties and functionalities. A song is something that a user listens to through an app like Spotify. Facebook knows that songs appear on albums, that they are made by musicians.
People do a lot of different things on Twitter, Facebook, etc. – and just because you and your immediate vicinity seem to have coherent practices, this does not mean others have. Entities and types of relation are formalized in "domain specific ways" => FB social graph. Empty room vs. full room.
A stock market is basically a series of point-to-point connections that are constantly producing lists showing aggregates. A telephone network is a point-to-point system without aggregate lists. A television network is just a list. Social media: low barriers of entry into the market, but difficult to be seen. Lists provide orientation: where am I? What is happening? Etc.
Ordering devices, list-making devices. These lists are very significant on the interface side, but they are also important for researchers: to understand how the platforms order, and to use them for our own analytical purposes.
The diversity of practices, contents, geographies, topologies, intensities, motivations, etc. makes it hard to generalize and theorize dynamics of use. We need both theory and empiricism. We need both quantity and quality. => grounded theory. Image from: http://personal.anderson.ucla.edu/phillip.wool.2012/research_networks.htm
Very large scale systems on the one side, but highly concentrated data repositories on the other. Here at DMI we use a lot of listing devices (e.g. Google) to centralize and search the Web.
The promise of data analysis is, of course, to use that data to make sense of all the complexity. All of this to say: it's kind of complicated, but we've also got a lot of pretty good data. Practical: look at the help sections and other explanations, study API documentation.
Starting part two of this presentation. Research is essentially about giving reasons for a description/belief/affirmation.
C. Wright Mills vs. Paul Lazarsfeld
Just as the first recommendation, this can never be made entirely transparent.
Why look at these elements? Why these cutoff points? Why these visual representations? Why mark news outlets in red? (Maybe because we try to comment on the idea that Twitter is a medium for information diffusion.) There is freedom.
These
Find the most relevant, popular, dominant ideas, etc. – or the discarded, marginal, etc. "Repurpose" because the platform's goals are quite different from ours. They want to grow use and strategically inject advertising.
Visualization is, again, one type of analysis. Which properties of the network are "made salient" by an algorithm? We often build our analysis on platform elements that are then aggregated and examined in different ways.
This is a technical process, but to be a method, there needs to be a fit between a conceptual element and a technical one. These steps translate a large number of commitments to particular ideas. A postdemographic (Rogers) approach.
Allows for all kinds of folding, combinations, etc. – Math is not homogeneous, but sprawling! Different forms of reasoning, different modes of aggregation. These are already analytical frameworks, different ways of formalizing. There is a fast growing variety of analytical gestures focusing on large numbers of formalized and classed objects.
http://www.facebook.com/ElShaheeed (Created by Wael Ghonim, considered to be a central place for the sparking of the Egyptian Revolution) http://apps.facebook.com/netvizz/ (tool used for extraction)
A standardized way of looking at a distribution.
But if we look at the number of posts published on the page, this is a very different picture! So we want to compare!
Simply plotting events is an analytical gesture. (=> pattern)
Adding variables => allow for comparisons
Find outliers and interesting moments not only in terms of values, but relationships between values. Simple calculations.
In statistics, regression analysis is a technique for estimating the relationships among variables (correlation). A probabilistic relationship: height and weight are correlated; if you are very tall, there is a good chance that you also weigh more. This is a statistical, not a deterministic, relationship. Erosion of determinism in the 19th century. Title: Recherches sur la population, les naissances, les décès, les prisons, les dépôts de mendicité, etc., dans le royaume des Pays-Bas, par M. A. Quételet, … 1827 http://gallica.bnf.fr/ark:/12148/bpt6k81568v.r=.langEN
Positive correlation, but it's not 1:1
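A small sketch of what "positive, but not 1:1" looks like numerically, using the standard Pearson correlation coefficient. The daily counts are invented and the `pearson` helper is written out by hand for illustration.

```python
from math import sqrt

# Hypothetical daily counts for a page: posts published and comments received.
posts = [2, 4, 5, 7, 9]
comments = [30, 50, 45, 80, 95]

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(posts, comments)
print(f"r = {r:.2f}")
```

The coefficient comes out strongly positive but below 1: more posts tend to go with more comments, yet day three has more posts and fewer comments than day two.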
And now to graph theory.
Forsythe and Katz, 1946 – "adjacency matrix", Moreno, 1934
Visualization is, again, one type of analysis. Which properties of the network are "made salient" by an algorithm? http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/ Models behind: spring simulation, simulated annealing (http://wiki.cns.iu.edu/pages/viewpage.action?pageId=1704113)
Starting point: stop Islamization of the World
Visual / spatial analysis is already very interesting, but graph theory allows us to do much more. Networks are eminently calculable. "All in all, this process resulted in the specification of nine centrality measures based on three conceptual foundations. Three are based on the degrees of points and are indexes of communication activity. Three are based on the betweenness of points and are indexes of potential for control of communication. And three are based on closeness and are indexes either of independence or efficiency." (Freeman 1979) What concepts are they based on?
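Two of the three foundations named in the quote can be sketched briefly on an invented network. Degree is the number of direct links; closeness is computed here with one common normalization, (n - 1) divided by the sum of shortest-path distances, which assumes a connected graph. Betweenness is omitted to keep the example short.

```python
from collections import deque

# Hypothetical undirected network as adjacency lists.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}

def distances_from(g, start):
    """Breadth-first search: shortest path length to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in g[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Degree: an index of communication activity.
degree = {node: len(neighbors) for node, neighbors in graph.items()}

# Closeness: short distances to everyone else give a value near 1.
closeness = {}
for node in graph:
    total = sum(distances_from(graph, node).values())
    closeness[node] = (len(graph) - 1) / total

print(degree)
print(closeness)
```

In this toy network "b" ranks first on both measures, but the rankings need not agree in general, which is why the concept behind each measure matters.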
Starting point: stop Islamization of the World. What does this mean?
Starting point: stop Islamization of the World
Starting point: stop Islamization of the World
The main pages of the defence league and anti-islam clusters link to the Israel cluster, but the main pages of the Israel cluster do not link back. Very complex patterns in this network, beware of easy conclusions.
We can of course produce descriptive statistics! Baselining allows us to make "drawing the line" more informed. It does not eliminate bias – there is no "view from nowhere" – but maybe makes it more conscious.
Extend word lists (what am I missing?), account for refraction.
Project statistical measures into a network.
Larger roles of hashtags, not all are issue markers!
There is no need to analyze and visualize a graph as a network. Characterize hashtags in relation to a whole (their role beyond my sample), better understand our fishing pole and the weight it carries. TBT: Throwback Thursday