3. We love to label. We can make a list of almost any group of people
using broad labels.
Millennials (age). New Yorkers (location). Females (gender). Democrats
(voting preference). We can even combine these things: Millennial
Female New York Democrats.
4. We can tell basic things about the people in our list (How much money
do you make? How many people in your family? What do you do for a
living? Who will you vote for?). Aggregate these responses, and we
have a poll.
5. No matter our label, there is either a binary or multinomial
classification system we can compare our list to.
Democrats? How do they compare to Republicans?
New Yorkers? How do they compare against other states?
6. A lot of the time, these systems are expressed as personas in the
marketing world
But these are very broad generalizations at best. Easily identified, but
not particularly enlightening.
"Geography, age, and gender? We put that in the garbage heap"
- Netflix VP of product Todd Yellin
8. The social connections between people in a group are easier to
quantify now than they ever have been before.
(Followers of @x), or (followers of @x and @y but not @z)
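
Segments like these are plain set operations on follower lists. A minimal sketch, assuming the follower lists are already collected into a dict; the handles, user names, and the `segment` helper are all made up for illustration:

```python
def segment(followers, include, exclude=()):
    """Users who follow every account in `include` and none in `exclude`.

    `followers` maps an account handle to the set of users following it.
    """
    # Start from users who follow all of the included accounts.
    members = set.intersection(*(followers[a] for a in include))
    # Then drop anyone who also follows an excluded account.
    for account in exclude:
        members -= followers[account]
    return members


# Hypothetical follower lists.
followers = {
    "@x": {"ana", "ben", "cy", "dee"},
    "@y": {"ben", "cy", "eve"},
    "@z": {"cy", "fay"},
}

only_x = segment(followers, ["@x"])
x_and_y_not_z = segment(followers, ["@x", "@y"], exclude=["@z"])
```

Here `x_and_y_not_z` holds the users who follow both @x and @y but not @z.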
If we are good enough with principal component analysis, we can
effectively cluster on these attributes.
Comparisons between algorithmically-defined clusters tend to be
richer than comparisons using broad demographic labels.
10. “for anything to be made whole, the first step is to know what’s missing.”
― Christian Rudder
Once we've established clusters in a network graph, we have plenty of
metadata to aggregate and analyze.
11. The things we use to self-describe:
Bio information and keywords
Our location.
Our avatar.
Sidenote: Demographics are pretty tricky to figure out on networks
like Twitter. We've had a lot of success using deep learning to analyze
facial features of profile photos to determine age and gender.
14. Most importantly, the things we don't say
out loud: Our Interest Patterns.
The things we follow, our influencers on social media, play a huge role
in both the kinds of content we see and share, and in the sorts of
people we relate to.
Follower patterns are indicative of interest, of political affiliation, of
brand loyalty, and plenty more.
Being able to see signal through the noise is important: some
Twitter/Instagram users are so widely followed that they skew results
and give incorrect impressions of what we like.
Solution: Niche-rank social media accounts by popularity. Affinity
scoring. Like TF-IDF, but for influence.
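
To make the TF-IDF analogy concrete, here is a rough sketch of one way such a score could work. It is not the actual scoring formula behind the talk: the "term frequency" is assumed to be the share of a cluster that follows an account, and the "inverse document frequency" the log of how rare that account's followers are overall. The handles, users, and `affinity_scores` function are hypothetical.

```python
import math


def affinity_scores(follows, cluster):
    """TF-IDF-style score per account for one cluster (illustrative only).

    `follows` maps an account handle to the set of users following it.
    `cluster` is the set of users in the cluster being described.
    """
    # Assumption: the population is everyone who follows at least one account.
    all_users = set().union(*follows.values())
    scores = {}
    for account, fans in follows.items():
        if not fans:
            continue
        # "TF": how much of the cluster follows this account.
        tf = len(fans & cluster) / len(cluster)
        # "IDF": accounts that everyone follows carry no signal.
        idf = math.log(len(all_users) / len(fans))
        scores[account] = tf * idf
    return scores


# Hypothetical data: eight users, one cluster of four.
follows = {
    "@celebrity": {"u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"},
    "@niche_blog": {"u1", "u2", "u3"},
    "@other_niche": {"u6", "u7"},
}
cluster = {"u1", "u2", "u3", "u4"}

scores = affinity_scores(follows, cluster)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy data the celebrity account is followed by everyone, so its score collapses to zero, while the niche account that three of the four cluster members follow ranks first.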
16. Affinio clusters social graphs based on user interest patterns.
We then look at the aggregated metadata from the emergent clusters
to identify where tribes are distinct, and where they are related.
The most over-indexed attributes are surfaced as traits.
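
As an illustration of over-indexing, the sketch below compares how common an attribute is inside a cluster with how common it is in the whole population, scaled so that 100 means "same as baseline". The keyword data, the 120 cutoff, and the `over_index` helper are assumptions made for the example, not the product's actual method.

```python
from collections import Counter


def over_index(cluster_values, population_values, min_index=120.0):
    """Attributes that are more common in the cluster than in the population.

    Index = 100 * (share in cluster) / (share in population).
    Only attributes with index >= min_index are returned, highest first.
    """
    cluster_counts = Counter(cluster_values)
    population_counts = Counter(population_values)
    results = {}
    for value, count in cluster_counts.items():
        population_share = population_counts[value] / len(population_values)
        if population_share == 0:
            continue  # attribute unseen in the baseline; no ratio to compute
        cluster_share = count / len(cluster_values)
        index = 100.0 * cluster_share / population_share
        if index >= min_index:
            results[value] = index
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical bio keywords: one per user.
population = ["coffee"] * 2 + ["music"] * 5 + ["sports"] * 3
cluster = ["coffee"] * 2 + ["music"] * 2

traits = over_index(cluster, population)
```

Here "coffee" appears in half of the cluster but only a fifth of the population, so it indexes at roughly 250 and surfaces as a trait. "music" is just as common in the cluster as everywhere else, so it is filtered out.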
17. Different usage patterns between clusters also emerge:
social tribes that are particularly talkative
tribes that are likely to know one another
tribes whose interests are not shared elsewhere
These all present unique opportunities for analysis.
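
One of these patterns, tribes that are likely to know one another, can be approximated by how densely members follow each other. A minimal sketch, assuming follow relationships are available as directed (follower, followed) pairs; the user names and the `internal_density` helper are hypothetical:

```python
def internal_density(cluster, follow_edges):
    """Fraction of possible directed follows inside a cluster that exist.

    `follow_edges` is an iterable of (follower, followed) pairs.
    Returns a value between 0.0 (nobody follows anybody) and 1.0
    (everybody follows everybody else).
    """
    members = set(cluster)
    n = len(members)
    if n < 2:
        return 0.0
    # Keep only edges where both ends are inside the cluster.
    internal = {
        (a, b) for a, b in follow_edges
        if a in members and b in members and a != b
    }
    return len(internal) / (n * (n - 1))


# Hypothetical follow graph with two tribes.
edges = [
    ("a1", "a2"), ("a2", "a1"), ("a2", "a3"), ("a3", "a1"),
    ("a1", "b1"), ("b1", "b2"),
]
tribe_a = {"a1", "a2", "a3"}
tribe_b = {"b1", "b2", "b3"}

density_a = internal_density(tribe_a, edges)
density_b = internal_density(tribe_b, edges)
```

A tribe with a much higher internal density than the others is a reasonable candidate for "people who know one another". The same edge list could be reused to count posts per member for the talkative case.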