Networks All Around Us: Extracting networks from your problem domain
1. METIS MEETUP
Networks All Around Us: Analyzing Networks in your Problem Domain | 3/3/2016
Russell Jurney
http://bit.ly/socialnetworkanalysis2
15. PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks (hard)
query in batch
query in realtime
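The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a graph database: the entity names, edge types, and the `out` and `walk` helpers are all hypothetical, chosen only to show what "thinking in graph walks" looks like.

```python
# Hypothetical toy property graph: every name here is illustrative only.
# Entities become nodes with properties.
nodes = {
    "alice": {"label": "person"},
    "bob": {"label": "person"},
    "acme": {"label": "company"},
}

# Relationships become typed, directed edges with their own properties.
edges = [
    {"src": "alice", "type": "knows", "dst": "bob", "since": 2014},
    {"src": "bob", "type": "works_at", "dst": "acme", "since": 2015},
]


def out(start, edge_type):
    """One step of a walk: follow outgoing edges of one type from a node."""
    return [e["dst"] for e in edges if e["src"] == start and e["type"] == edge_type]


def walk(start, *edge_types):
    """Chain steps: follow each edge type in turn from the current frontier."""
    frontier = [start]
    for edge_type in edge_types:
        frontier = [dst for node in frontier for dst in out(node, edge_type)]
    return frontier


# "Where do the people Alice knows work?" expressed as a two-step walk.
employers_of_friends = walk("alice", "knows", "works_at")
```

A graph query language such as Gremlin expresses the same idea as a chain of steps; the sketch only makes the frontier-by-frontier mechanics visible.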
19. EXPORT LINKS AS JSON
final Graph g = TinkerFactory.createClassic();
try (final OutputStream os = new FileOutputStream("jsondump/links.json")) {
    GraphSONWriter.build().create().writeGraph(os, g);
}
20. THEN USE SNA LIBRARIES
#
# Example - calculate friendship dispersion
#
import networkx as nx

di_graph = nx.DiGraph()
# util is the presenter's own helper module (not shown in the slides)
all_edges = util.json_cr_file_2_array('jsondump/links.json')
for edge in all_edges:
    if 'type' in edge and edge['type'] == 'partnership':
        di_graph.add_edge(edge['domain1'], edge['domain2'])
dispersion = nx.dispersion(di_graph)
21. TOOLS OF
SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
center-pieces
22. CENTRALITY
Centrality is a way of measuring how central or important a particular
node is in a social network.
OR
What nodes should I care about?
27. DEGREE CENTRALITY
# computation
count connections
…it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
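As a minimal sketch of "count connections", assuming the graph is available as a list of directed (source, target) pairs. The edge list and the `degree_counts` helper are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical directed edges (source, target); names are illustrative only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]


def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for src, dst in edges:
        out_degree[src] += 1  # gregariousness: links this node sends
        in_degree[dst] += 1   # popularity: links this node receives
    return in_degree, out_degree


in_degree, out_degree = degree_counts(edges)
```

Here `c` receives three links and sends none, so it is the most popular node; `a` sends the most links. Nodes that never appear on one side of an edge simply count as zero.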
28. DEGREE CENTRALITY IN GREMLIN
# all-links-the-same-type-centrality
g.V().out().groupCount()
29. CLOSENESS CENTRALITY
# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness
# meaning
communication efficiency
spread of information
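The computation above can be sketched with a breadth-first search on an unweighted graph. This is an illustrative version under stated assumptions: the adjacency dict and helper names are hypothetical, farness is taken as the sum of hop counts to every reachable node, and the score is scaled by the number of reachable nodes so values fall between 0 and 1.

```python
from collections import deque

# Hypothetical undirected path graph a - b - c - d, as an adjacency dict.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}


def hop_distances(graph, source):
    """Breadth-first search: shortest hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, node):
    """Reciprocal of farness, scaled by the number of other reachable nodes."""
    dist = hop_distances(graph, node)
    farness = sum(dist.values())
    if farness == 0:  # isolated node: nothing is reachable
        return 0.0
    return (len(dist) - 1) / farness


scores = {node: closeness(graph, node) for node in graph}
```

On this path the middle nodes `b` and `c` score 0.75 and the endpoints score 0.5, matching the intuition that central nodes reach everyone in fewer hops.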
32. EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
pagerank is an eigenvector centrality
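One common way to compute this is power iteration: repeatedly replace each node's score with a sum over its neighbors' scores and renormalize until the scores stop changing. The sketch below is a minimal version under assumptions, not a reference implementation: the graph and function name are hypothetical, and each node also keeps its own score on every round, a small variation that helps the iteration settle on graphs where plain neighbor sums would oscillate.

```python
import math

# Hypothetical undirected graph: triangle a-b-c plus a pendant node d on c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def eigenvector_centrality(graph, iterations=100, tol=1e-9):
    """Power iteration: a node scores highly when its neighbors score highly."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        updated = {
            node: scores[node] + sum(scores[nb] for nb in graph[node])
            for node in graph
        }
        # Rescale to unit Euclidean length so the values stay bounded.
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        # Stop once the scores have stopped moving.
        if sum(abs(updated[node] - scores[node]) for node in graph) < tol:
            return updated
        scores = updated
    return scores


scores = eigenvector_centrality(graph)
```

In this graph `c` ranks first, `a` and `b` tie, and `d` ranks last: `d` has only one connection, but it still earns some score because that connection is to the best-connected node.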
33. EIGENVECTOR CENTRALITY IN GREMLIN
g.V()
.repeat(out('relationship_type').groupCount('m').by('unique_key'))
.times(n).cap('m')