How can we verify identity using unstructured data from a user device? While biometrics like fingerprint scanning and facial recognition are often used for authentication, research in natural language processing has found that people's use of language can also be uniquely identifying.
In this session, we will discuss multiple facets of language modeling:
• Its efficacy on different kinds of unstructured text within a corporate network
• Its use as a technique to detect anomalous user activity, compromised accounts, and stolen credentials
• Its role as an integral part of a cybersecurity program, alongside UEBA and risk-adaptive protection
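As a rough illustration of the idea, the sketch below trains one unigram language model per author and attributes new text to the author whose model gives it the highest log-likelihood. This is a minimal sketch under stated assumptions, not the model used in the session: the class name `UnigramAuthorModel`, the whitespace tokenizer, the add-alpha smoothing, and the sample sentences are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the session's real
    # preprocessing is not described in the source.
    return text.lower().split()


class UnigramAuthorModel:
    """Hypothetical per-author unigram model with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}   # author -> Counter of token frequencies
        self.totals = {}   # author -> total number of tokens seen
        self.vocab = set()

    def fit(self, samples):
        # samples: iterable of (author, text) pairs
        for author, text in samples:
            tokens = tokenize(text)
            self.counts.setdefault(author, Counter()).update(tokens)
            self.vocab.update(tokens)
        self.totals = {a: sum(c.values()) for a, c in self.counts.items()}
        return self

    def log_likelihood(self, author, text):
        counts = self.counts[author]
        # One extra vocabulary slot is reserved for unseen tokens.
        denom = self.totals[author] + self.alpha * (len(self.vocab) + 1)
        return sum(
            math.log((counts[tok] + self.alpha) / denom)
            for tok in tokenize(text)
        )

    def predict(self, text):
        # Attribute the text to the author whose model scores it highest.
        return max(self.counts, key=lambda a: self.log_likelihood(a, text))


# Toy data, invented for illustration only.
training = [
    ("alice", "please review the quarterly budget forecast"),
    ("alice", "the budget forecast needs review"),
    ("bob", "deploy the server patch tonight"),
    ("bob", "server patch deploy failed restart"),
]
model = UnigramAuthorModel().fit(training)
print(model.predict("budget forecast review"))
```

For anomaly detection, one option is to compare the predicted author with the account that actually sent the text and flag mismatches for review.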
33. SOMETIMES MODELS ARE JUST BAD
Jeff Dasovich
Second-largest training set from Enron
Most unique tokens
We are more likely to guess Richard Sanders as the author
Common top-25 tokens include
'know', 'like', 'call', 'get', 'time', 'would', 'thanks'
Why do we fail to identify Jeff?
He liked to embed news articles in his emails
… This article showed up on Wednesday . Thought
you might be interested .
Texas Journal -- Energy traders cite gains , but some
math is missing -- Volatile prices for natural gas and
electricity are creating high-voltage counting on these
gains could be in for a jolt down the road ...
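One way to see why shared top tokens hurt attribution is to compare two authors' most frequent tokens directly. The sketch below is illustrative only: the helper names `top_tokens` and `jaccard_overlap`, the whitespace tokenizer, and the optional stop-word set are assumptions, not the session's actual preprocessing.

```python
from collections import Counter


def top_tokens(texts, k=25, stopwords=frozenset()):
    # Count alphabetic tokens across all of an author's texts and
    # return the k most frequent ones.
    counts = Counter()
    for text in texts:
        counts.update(
            tok for tok in text.lower().split()
            if tok.isalpha() and tok not in stopwords
        )
    return [tok for tok, _ in counts.most_common(k)]


def jaccard_overlap(tokens_a, tokens_b):
    # Share of tokens the two lists have in common (0.0 to 1.0).
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A high overlap between two authors' top-25 lists suggests that the most frequent tokens alone carry little discriminating signal, and embedded third-party text such as pasted news articles pulls an author's token distribution further away from their own writing.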
35. EXTENDING TO OTHER STRUCTURED CONTENT
Demonstrated a solution that
Addresses the task of entity identification
Increases performance according to quantitative precision assessment
Improves performance over time with additional experience
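The slides do not say how the precision assessment was computed. A common reading is per-author precision: of all the texts attributed to an author, the fraction that the author actually wrote. The sketch below assumes that definition; the function name `precision_by_author` and the label format are hypothetical.

```python
from collections import Counter


def precision_by_author(true_authors, predicted_authors):
    # Precision for author A = correct predictions of A / all predictions of A.
    if len(true_authors) != len(predicted_authors):
        raise ValueError("label lists must have the same length")
    predicted = Counter(predicted_authors)
    correct = Counter(
        pred for truth, pred in zip(true_authors, predicted_authors)
        if truth == pred
    )
    # Authors that were never predicted have undefined precision and are omitted.
    return {author: correct[author] / n for author, n in predicted.items()}
```

Tracking this number per author, rather than as a single aggregate, makes it easier to spot individual authors whom the model consistently confuses with someone else.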
Potential future applications
Chat or phone transcript
Command line activity
Database / SIEM queries