Analyzing people is great because we can just talk to them, but hard because their answers are fuzzy. This talk walks through two analyses: one of a Twitter botnet and one of how programmers pick programming languages. In both, interactive graph visualizations helped unravel the mysteries.
4. CASE STUDY:
TWITTER FRAUD
Naïve layouts on 1K+ node graphs
give impenetrable hairballs.
Gauss-Seidel Force-Directed Graph, O(N^2) n-body, GPU
Node: Twitter account
Edge: Friendship
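
To make the O(N^2) cost concrete, here is a minimal CPU sketch of a naive force-directed layout. It is a Fruchterman-Reingold-style loop written for illustration, not the GPU Gauss-Seidel implementation from the talk; the function name `force_layout` and the parameters `k` and `max_step` are assumptions. Every pair of nodes repels and every edge attracts, so the inner pair loop dominates the running time.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, max_step=0.05, seed=0):
    """Naive force-directed layout (illustrative sketch, not the talk's GPU code)."""
    nodes = list(nodes)
    rng = random.Random(seed)
    # Start every node at a random position in the square [-1, 1] x [-1, 1].
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: this is the O(N^2) part.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[a][0] += dx / dist * push
                force[a][1] += dy / dist * push
                force[b][0] -= dx / dist * push
                force[b][1] -= dy / dist * push
        # Attraction along each edge (friendship) pulls connected nodes together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[a][0] -= dx / dist * pull
            force[a][1] -= dy / dist * pull
            force[b][0] += dx / dist * pull
            force[b][1] += dy / dist * pull
        # Move each node along its net force, capped so the layout stays stable.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy) or 1e-9
            move = min(length, max_step)
            pos[n][0] += fx / length * move
            pos[n][1] += fy / length * move
    return pos

# Toy graph with made-up account names.
layout = force_layout(["a", "b", "c"], [("a", "b")])
```

On a graph with thousands of accounts the pair loop alone runs millions of times per iteration, which is why approximations and GPUs become attractive.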
Friends and friend-of-friends of a bot
who randomly messaged real
people and retweeted them.
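
The friends and friend-of-friends neighborhood can be collected with a breadth-first search limited to two hops. This is a minimal sketch assuming the friendships are already in memory as an adjacency dictionary; `ego_network` and the account names are hypothetical and do not come from the Twitter API.

```python
from collections import deque

def ego_network(adjacency, seed, max_hops=2):
    """Return {account: hop distance} for accounts within max_hops of seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        account = queue.popleft()
        # Do not expand accounts that already sit at the hop limit.
        if hops[account] == max_hops:
            continue
        for friend in adjacency.get(account, ()):
            if friend not in hops:
                hops[friend] = hops[account] + 1
                queue.append(friend)
    return hops

# Toy friendship lists (made-up account names).
friends = {
    "bot": ["a", "b"],
    "a": ["bot", "c"],
    "b": ["bot"],
    "c": ["a", "d"],
    "d": ["c"],
}
nearby = ego_network(friends, "bot")
```

Here `nearby` contains the bot, its friends `a` and `b`, and the friend-of-friend `c`; account `d` is three hops away and is left out.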
5. Even on a small graph (77 nodes),
smart design starts adding clarity
6. With smart layouts, fake account clusters pop out
ForceAtlas2 Layout, O(n log n) n-body, GPU
The spambot
is an entrypoint
to more bots…
Obviously fake
account names
7. A quiet small business that buys
virtual game currency from gamers…
8. Who somehow got exactly
1 message massively
trended & advertised by Twitter
10. Relationships hard to see without
graphs with smart layouts & interactions.
Next step: explore the time dimension
Ex: how do mobs launch from Twitter?
11. Leo A. Meyerovich, @lmeyerov,
Graphistry
THE
SOCIOLOGY
OF
PROGRAMMING
LANGUAGES
16. Let’s run a competition for the
friendliest language! (Glicko2)
Each survey response is a game match:
1. Person A says Python beats C in friendliness
2. Person A says Java beats C in friendliness
3. Person B says C beats APL in friendliness
…
17. Score Points set by a Bookie
Every language starts with rank 1000
1. “Person A: Python friendlier than C”
Python’s rank goes up
2. “Person B: Python friendlier than C”
Python already > C, less valuable win
3. “Person C: Haskell friendlier than Python”
Problem: little known about Haskell (“sparse”)
Haskell beat a high-rank language: big level increase!
(Bayesian!)
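
The three cases above can be reproduced with a small rating update. This sketch uses the simpler Glicko-1 formulas rather than Glicko2 itself (Glicko2 adds a volatility term and batches games into rating periods), and it updates after every single match. The starting rating of 1000 comes from the slide; the starting rating deviation of 350 is Glicko's conventional default and is an assumption here, as are the names `update` and `START`.

```python
import math

Q = math.log(10) / 400

def g(rd):
    """Discount an opponent's rating by how uncertain that rating is."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected(r, r_opp, rd_opp):
    """Probability that the player beats the opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))

def update(player, opponent, score):
    """Glicko-1 update for one match; score is 1 for a win, 0 for a loss.

    player and opponent are (rating, rating_deviation) pairs.
    """
    r, rd = player
    r_opp, rd_opp = opponent
    e = expected(r, r_opp, rd_opp)
    d2 = 1 / (Q**2 * g(rd_opp) ** 2 * e * (1 - e))
    denom = 1 / rd**2 + 1 / d2
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1 / denom)  # each match makes the rating more certain
    return new_r, new_rd

START = (1000.0, 350.0)  # rating from the slide; deviation is an assumed default
python = c = haskell = START

# 1. Person A: Python friendlier than C.
python_1, c_1 = update(python, c, 1), update(c, python, 0)
gain_1 = python_1[0] - python[0]

# 2. Person B: Python friendlier than C (Python is already ahead).
python_2 = update(python_1, c_1, 1)
gain_2 = python_2[0] - python_1[0]

# 3. Person C: Haskell friendlier than Python (Haskell is still unrated).
haskell_1 = update(haskell, python_2, 1)
gain_3 = haskell_1[0] - haskell[0]
```

With these assumptions `gain_2` is smaller than `gain_1` because the second win was expected, and `gain_3` is the largest of the three because an uncertain, unrated language beat a highly rated one.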
25. Relationships hard to see without
graphs with smart layouts & interactions.
Step 2 of analysis is correlate (step 1 is count).
Correlations are relationships,
so explore them as graphs!
30. Survey of 1,679 Developers
Extrinsic factors
dominate!
(on last
project)
31. FUTURE STEP:
Now that we’ve counted things, let’s correlate them!
Topics in Free-form Responses
Answer Correlations
32. Relationships hard to see without
graphs with smart layouts & interactions.
Step 2 of analysis is correlate (step 1 is count).
Correlations are relationships,
so explore them as graphs!
Powerful because correlations everywhere:
raw features, inferred topics, …
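
One way to turn correlations into a graph is to treat each survey column as a node and keep an edge wherever the correlation is strong. The sketch below is an illustration only: the column names and values are made-up toy data, not survey results, and `correlation_edges` with its `threshold` parameter is a hypothetical helper.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def correlation_edges(columns, threshold=0.5):
    """Return (column_a, column_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Toy yes/no answers from six imaginary respondents (invented for illustration).
answers = {
    "feature_a": [1, 1, 0, 0, 1, 0],
    "feature_b": [1, 1, 0, 0, 1, 1],
    "feature_c": [0, 1, 0, 1, 0, 1],
}
edges = correlation_edges(answers)
```

The resulting edge list can be loaded into any graph layout tool, with the correlation value `r` used as the edge weight. The same approach works whether the columns are raw features or inferred topics.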