One Step Closer to the Matrix: Machine Learning & Augmented Reality in Streaming Data
1. CarolinaCon 11
One Step Closer to the Matrix: Machine
Learning and Augmented Reality in
Streaming Data
Rob Weiss
John Eberhardt
2. What’s the Story?
• Rob and John have been working together for years
• Rob is a Network Engineer and Hacker
• John is a Data Scientist and Architect
• Two Great Tastes that Taste Great Together
• Different perspectives bring new answers
• Rob and John are interested in how to create a paradigm
shift in user interaction with data and network security
• We are also probably slightly insane
3. The Defender’s Challenge
• The attacker has an inherent advantage – no rules!
• So the defense problem is asymmetric
• Classical methods fail more rapidly as computing power
becomes cheaper and more readily available
• The Fortress or “Big Walls” security model is outdated and,
frankly, ineffective
• Qualified people are in short supply
• Can we crowdsource network defense?
4. How We Got Started
• A research project in a galaxy far, far away
• We started modeling zero day attacks
• We combined machine learning and streaming analytics to
detect novel patterns statistically
• It worked well enough, but there were limitations
• Not sensitive enough
• Not specific enough
• Proprietary software limited flexibility
• It still required a pretty sophisticated operator – and
those are in short supply
• So . . .
5. Taking a Different Approach
• Could we do for raw data what GUIs did for computers
and revolutionize human interaction with data?
• Complex streaming analytics are not tractable to the
human
• The “last mile” requires a user interface that creates flow
for the human analyst out of data
• Harness the power of metaphor to explain complex
concepts to the human analyst (e.g. Windows)
• Streaming Analytics + Streaming User Experience = “Data
Looming”
• Can we really make a prosthetic for the brain?
7. Data Looming
• Can you point out every individual thread and show me
how it is woven? Probably not.
• Can you tell me what it is? I sure hope so!
Data Looming
Watch threads on a loom – to the naked eye,
the loom is too complex and moving too
quickly for you to pick out the details, but you
can quickly see when the overall pattern
changes – usually within very few iterations. A
simple, intuitive, scalable visualization of
streaming analytics allows the human analyst
to connect the “last mile” of disconnected
events and is at the heart of what we are doing
– merging complex streaming analytics with the
sparse pattern detection capabilities of the
human brain.
8. Pattern Recognition is For the Birds
A child can learn to recognize this pattern in 15 seconds, but a
computer still can’t.
#1 - Eagle #2 - Swan #3 - ????
9. Getting to The Big Idea
• Zero Day Work
• William Gibson’s Neuromancer
• The Matrix
• John Maeda’s Simplicity by Design
• Open Source
• Network Expertise
• Data Science Expertise
• Crowdsourcing
• Hacktastic Innovation Explosion!!!
10. How I Did It by Victor Frankenstein
• Accelerate data analysis by extending streaming analytics to
broader groups of less skilled human analysts
• Combine the speed, precision and recall of a computer,
through an immersive interface, with the inherent sparse
pattern recognition capabilities of the human brain
• Streaming Analytics allow for rapid, real time
adjudication of data and make the user experience
dynamic
• An immersive user experience makes complex analytics
data “real” to the human and enables experiential
learning
• Combining them in a single environment enables sparse
pattern recognition in dynamic systems
11. How I Did It Continued (Abby Normal)
• Data: Streaming data from sensors, collectors, files, etc.
• Platform: Streaming analytics process and analyze these
data, including attribution to the real world
• Visual Language Construct: Integrates streaming data,
streaming analytics, and streaming user experience in a
pluggable architecture
• Streaming User Experience: Immersive 3-D user experience
allows analysts to interact directly with streaming data and
analytics
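The four layers above can be sketched as a chain of pluggable stages. This is a minimal illustration in plain Python generators, not the Kafka/Storm implementation the talk describes. The names `sensor`, `label_role`, `to_visual`, and `build_pipeline` are hypothetical, the port thresholds are an assumed stand-in for the real analytics, and the colors follow the scheme used in the demo slides.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]
Stage = Callable[[Iterable[Event]], Iterator[Event]]


def sensor(packets: Iterable[Event]) -> Iterator[Event]:
    """Data layer: yield raw events one at a time, as a collector would."""
    for packet in packets:
        yield dict(packet)


def label_role(events: Iterable[Event]) -> Iterator[Event]:
    """Analytics layer: tag each event with a role (assumed rule, not the talk's)."""
    for event in events:
        src, dst = event["src_port"], event["dst_port"]
        # Assumption: an ephemeral source port talking to a low port is a client.
        if src > 49151 and dst < 1024:
            role = "client"
        elif src < 1024 and dst > 49151:
            role = "server"
        else:
            role = "undetermined"
        yield {**event, "role": role}


def to_visual(events: Iterable[Event]) -> Iterator[Event]:
    """User-experience layer: map each role to a display color."""
    colors = {"client": "green", "server": "blue", "undetermined": "yellow"}
    for event in events:
        yield {**event, "color": colors[event["role"]]}


def build_pipeline(*stages: Stage) -> Stage:
    """Visual-language-construct layer: chain any stages in the given order."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        stream: Iterable[Event] = events
        for stage in stages:
            stream = stage(stream)
        return iter(stream)
    return run


if __name__ == "__main__":
    pipeline = build_pipeline(sensor, label_role, to_visual)
    for shown in pipeline([{"src_port": 54760, "dst_port": 80}]):
        print(shown)
```

Because each stage only consumes and produces an event stream, a different analytic or a different rendering can be swapped in without touching the other stages, which is the property the "pluggable architecture" bullet is after.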
12. Architecture (Meet the Architect)
• Data Sensor (N+1)
• Data Collector (N+1)
• Kafka: Zookeeper, Kafka Queue
• Storm: Nimbus, Worker Node, Trident-ML
• Analytics Platform
• Visual Language Construct
• Streaming User Experience
• Analytics and Countermeasures
• Game Players
13. Design Principles
Principle: Enables
• Open Source Components: Supports integration of streaming analytics
and immersive user experience to create a dynamic feedback loop;
rapidly adapt the platform from lessons learned from human experience
• Streaming Analytics: Accelerating analytics to keep pace with data
collection (facilitating high collection rate)
• Immersive Streaming User Experience: Extending the user interface to
allow broader groups of analysts to use sophisticated analytics
(addressing the recruiting challenge)
• Pluggable Architecture: “Bring your own” tools and analytics supports
crowdsourcing and allows for aggressive exploitation of new analytics
and user experience paradigms
14. Larry Bird: Network Defender of the Future
A basketball player can watch your network. When an attack occurs, our player can quickly
identify the pattern shift using the same brain computation the player uses to identify a
shift in the offensive strategy of the opposing basketball team. Think of this as a data
prosthetic for the human brain.
15. Enough of Us Talking at You
• Fight fire with fire – crowdsource all comers and create an
asymmetric defense
• Align economic incentives, human behaviors, and defense
objectives
• Do for data what GUIs did for computers – make it
accessible!
• This isn’t about technology . . . it’s about revolutionizing the
way humans interact with data to enable a game-changing
leap forward
18. Demo Concept
Concept
• Normal work environment – “normal” patterns give way to aberrations
• This behavior is focused on network data, but could easily be any other
streaming data
Design
• Analytics cluster traffic based on source and destination port patterns
over time using k-means clustering
• Cubes represent nodes on the network; streaming spheres represent
packets
• Colors represent the behavior of nodes / packets based upon traffic –
Green is a client, Blue is a Server, Yellow is “undetermined behavior”
                   Green (client)   Blue (server)   Yellow (??)
Source Centroid    54760            1001            5066
Dest Centroid      791              54518           5511
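As a rough illustration of how the centroids in the table could drive the coloring, the sketch below performs only the assignment step of k-means: it colors a packet by whichever listed centroid is nearest to its (source port, destination port) pair. The function name `nearest_color` and the use of plain Euclidean distance are assumptions. The talk does not say how distance was measured, and this sketch ignores the "over time" aspect of the clustering.

```python
import math
from typing import Dict, Tuple

# Centroids as (source port, destination port), copied from the table above.
CENTROIDS: Dict[str, Tuple[float, float]] = {
    "green": (54760.0, 791.0),    # client
    "blue": (1001.0, 54518.0),    # server
    "yellow": (5066.0, 5511.0),   # undetermined behavior
}


def nearest_color(src_port: int, dst_port: int) -> str:
    """Return the color of the centroid closest to (src_port, dst_port)."""
    point = (float(src_port), float(dst_port))
    # Assumption: Euclidean distance in port space, as in textbook k-means.
    return min(CENTROIDS, key=lambda color: math.dist(point, CENTROIDS[color]))


if __name__ == "__main__":
    for ports in [(51000, 443), (80, 60000), (5060, 5060)]:
        print(ports, nearest_color(*ports))
```

A full k-means run would alternate this assignment step with recomputing each centroid as the mean of the points assigned to it. Here the centroids are held fixed at the values shown on the slide.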
19. Questions I Can Ask
• Is a given node on the network behaving as expected?
• Watch the node colors - they should be consistent in a normal network:
some white nodes, a lot of green (client) nodes, and some blue (server) nodes.
What happens over time?
• Does my use of source and destination ports mark me out as a client or server?
Does my role appear consistent or change?
• The node colors indicate what they are – watch the colors of the nodes –
machines should have clear and consistent roles
• Is the set of nodes I interact with consistent? Am I interacting
with different partners?
• Watch the stream patterns – machines should interact with consistent
groups
• Do my behaviors adhere to regular time cycles? Can I apply time cycles to any of
the above (e.g., a workday)?
• Watch the patterns change as cyclical time progresses in our “workday”
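Two of the questions above (does a node's role stay consistent, and does it keep talking to the same partners) can also be checked programmatically. The sketch below is one hypothetical way to do so over time-windowed observations. The tuple layouts, the function names `role_changes` and `new_peers`, and the idea of a fixed baseline period are all assumptions, not details from the talk.

```python
from typing import Dict, Iterable, List, Set, Tuple

RoleObs = Tuple[int, str, str]  # (time window, node, role); hypothetical layout
PeerObs = Tuple[int, str, str]  # (time window, node, peer); hypothetical layout


def role_changes(observations: Iterable[RoleObs]) -> List[Tuple[int, str, str, str]]:
    """Return (window, node, old_role, new_role) for every role switch."""
    last_role: Dict[str, str] = {}
    changes: List[Tuple[int, str, str, str]] = []
    for window, node, role in sorted(observations):
        previous = last_role.get(node)
        if previous is not None and previous != role:
            changes.append((window, node, previous, role))
        last_role[node] = role
    return changes


def new_peers(observations: Iterable[PeerObs],
              baseline_windows: int) -> List[Tuple[int, str, str]]:
    """Return (window, node, peer) for peers first seen after the baseline."""
    known: Dict[str, Set[str]] = {}
    alerts: List[Tuple[int, str, str]] = []
    for window, node, peer in sorted(observations):
        peers = known.setdefault(node, set())
        if peer not in peers:
            # Peers first seen during the baseline windows are treated as normal.
            if window >= baseline_windows:
                alerts.append((window, node, peer))
            peers.add(peer)
    return alerts


if __name__ == "__main__":
    roles = [(0, "a", "client"), (1, "a", "client"), (2, "a", "server")]
    print(role_changes(roles))
    peers = [(0, "a", "b"), (1, "a", "b"), (2, "a", "c")]
    print(new_peers(peers, baseline_windows=2))
```

The time-cycle question could be handled the same way by keying the baseline on position within the "workday" rather than on absolute window number.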
21. About Rob and John
• Rob Weiss is a senior systems engineer at G2
(www.g2-inc.com) with over 24 years of experience in government
and commercial markets. He started with Legos and is now a
tool builder and problem solver. He currently runs the Altamira
Red Team and performs information security research,
looking for hard problems to solve. Twitter: @3XPlo1T2
• John Eberhardt is a Data Scientist at 3E Services
(www.3eservicesllc.com) with 20 years of quantitative
problem solving and a penchant for trying to decipher
symbolism in obscure 16th century literature. John has
experience in analytical problem solving in healthcare, life
sciences, security, financial services, consumer products,
and transportation. Twitter: @JohnSEberhardt3
23. Squiggly (probably won’t use this)
• A self organizing system consists of groups A, B, and C
interacting
• Hence, the current state of A is {A|B,C}
• They influence each other {B|A,C}, {C|A,B} which means
the system is described by f{{A|B,C},{B|A,C},{C|A,B}}
• However, these groups are neither unitary nor static,
which means at any given time they can have
sub-attributes {Ai...An}, {Bi...Bn}, {Ci...Cn} that are unknown
• So now the system is described by f{{Ai | {Bi...Bn},
{Ci...Cn}},{Bi |{Ai...An}, {Ci...Cn}},{Ci |{Ai...An}, {Bi...Bn}}}
• How do you solve this NP-hard problem?