Knowledge graphs (KGs) have recently emerged as a powerful way to represent knowledge in multiple communities, including data mining, natural language processing and machine learning. Large-scale KGs like Wikidata and DBpedia are openly available, while in industry, the Google Knowledge Graph is a good example of a proprietary KG that continues to fuel impressive advances in Google's semantic search capabilities. Yet crowdsourced and automatically constructed KGs alike suffer from noise, which arises during KG construction as well as during search and inference. In this talk, I will discuss how to build and use such knowledge graphs effectively, despite noise and sparse labeled data, to solve real-world social problems such as providing insights in disaster situations and helping law enforcement fight human trafficking. I will conclude with lessons learned and the applicability of research techniques to industrial problems. The talk is designed to appeal to both business and technical leaders.
14. What do these problems have in common
(besides being really hard)?
15. 1. Very messy, raw data, with both
redundancy and irrelevance
2. Users are also producers i.e. we cannot
just ‘build’ the system and hand it off
3. Domains are largely non-analytic (e.g.,
we don’t have a model/equations for
human trafficking)
19. Space of design decisions
[Diagram: Raw data, Search+GUI, and Representation + Infrastructure, with the remaining components shown as question marks]
25. Space of design decisions: example from human trafficking
[Diagram components: Raw data; Search+GUI; Knowledge Graph (KG); Domain discovery; Define KG schema; Representation + Infrastructure; Flexible inputs; Query reformulation; KG Construction]
49. Academic Output
~15 publications over the course of the program
• 7 more currently under review
• 2 best paper awards
• Upcoming special issue call on knowledge construction and management
• 2 upcoming books, incl. graduate-level textbook on knowledge graphs (MIT Press, 2018)
Multiple tutorials/demonstrations at top-tier academic conferences
• Tutorials on knowledge graph construction and data mining over Web corpora/unusual domains in
KDD17, ISWC17, AAAI18, WWW18
• At ISWC17, only full-day tutorial accepted; had near-capacity attendance
• Demos at ISWC17, AAAI18 (nominated for Best Demo)
• Case study at CHI18
Selected papers
• Knowledge Graphs for Social Good: An Entity-centric Search Engine for the Human Trafficking Domain
(IEEE Transactions on Big Data, 2017)
• Information Extraction in Illicit Domains (WWW, 2017)
• Unsupervised Entity Resolution on Multi-type Graphs (ISWC 2016)
56. DIG capabilities
Aggregations
Facets
Dossier Generation
Networks
Provenance
Structured Queries
Interface Customization
• Capabilities that generic search
engines like Google do not
currently support
• Domain-specific
–Allows a user to specify her schema
–No prior constraints
• Insight
–Supports aggregations, network
analysis, faceted search, dossiers...
• Graph
–Uses a knowledge graph
representation + efficient NoSQL
query reformulation
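Faceted search and aggregation, two of the capabilities listed above, can be illustrated with a minimal sketch. This is not how DIG implements them; the records, field names, and helper functions below are all hypothetical and exist only to show the idea of counting values per facet and narrowing results by a chosen facet value.

```python
from collections import Counter

# Hypothetical toy records standing in for knowledge graph entities.
# The names, cities, and phone numbers are made up for illustration.
RECORDS = [
    {"name": "Abby", "city": "Los Angeles", "phone": "555-0101"},
    {"name": "Kandy", "city": "Los Angeles", "phone": "555-0102"},
    {"name": "Leah", "city": "Seattle", "phone": "555-0101"},
]


def facet_counts(records, field):
    """Aggregation: count how often each value of `field` occurs."""
    return Counter(r[field] for r in records if field in r)


def filter_by(records, **constraints):
    """Faceted search: keep records that match every field=value constraint."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in constraints.items())
    ]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "city"))
    print(filter_by(RECORDS, phone="555-0101"))
```

Selecting a facet value (here, a shared phone number) narrows the result set, and the same counts can then be recomputed on the narrowed set to drive the next refinement.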
58. How can we tell when two actors are really one and the same?
• Advanced name matching algorithm based on machine learning, phonetic similarity and illicit webpage-specific word embeddings
Example name pairs: Abbie/Abby, Candy/Kandy, Kim/Kimmy, Lea/Leah, Nicki/Nikki
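Name variants like those listed above can be compared with a simple phonetic key plus a string-similarity score. The sketch below is only an illustration of the phonetic-similarity ingredient: it is not the algorithm from the talk, and it leaves out the machine learning and word-embedding components entirely. The key is a Soundex-like simplification that also encodes the first letter (so that "C" and "K" collide), and the function names, weights, and the 0.75 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Soundex-style consonant groups; vowels and Y get no code.
_CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        _CODES[ch] = digit


def phonetic_key(name):
    """Soundex-like key that, unlike standard Soundex, encodes every letter."""
    key = ""
    prev = ""
    for ch in name.upper():
        if not ch.isalpha():
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        # H and W do not break a run of identical codes; vowels do.
        if ch not in "HW":
            prev = code
    return key


def name_similarity(a, b):
    """Average of phonetic-key agreement (0 or 1) and character-level similarity."""
    same_key = 1.0 if phonetic_key(a) == phonetic_key(b) else 0.0
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return 0.5 * same_key + 0.5 * ratio


def is_match(a, b, threshold=0.75):
    """Hypothetical decision rule; the threshold is an arbitrary choice."""
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [("Abbie", "Abby"), ("Candy", "Kandy"), ("Kim", "Kimmy"),
             ("Lea", "Leah"), ("Nicki", "Nikki")]
    for a, b in pairs:
        print(a, b, is_match(a, b))
```

A key this coarse will also merge names that are unrelated, which is one reason a real system would combine it with additional signals rather than rely on it alone.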
59. Other use-cases
• Evaluated on five investigative domains beyond human trafficking, each with its own domain-specific needs
–Narcotics
–Counterfeit Electronics Manufacturing
–Securities Fraud
–Mail Shipment Fraud
–Illegal Weapons Sales
• User engagement was high
–Investigators were able to customize their domain in just one day, with less than an hour of training
–Have expressed interest in continuing to refine and use the search engine internally