SHyRe
Streaming Hypothesis Reasoning
WILLIAM SMITH, PATRICK PAULSON, MARK BORKUM,
DEBORAH MCGUINNESS, BRENDA PRAGGASTIS, RUI YAN, YUE LIU
DAML 2016 – Seattle, WA
Smart Data Conference, 2015 – San Jose, California
January 26, 2016
The legends PROTECTED INFORMATION and PROPRIETARY INFORMATION apply to information describing Subject Inventions as defined in
Contract No. DE-AC05-76RL01830 and any other information which may be properly withheld from public disclosure thereunder
DOE’s National Laboratories are
Solving America’s Toughest Challenges
Mission Drivers
Analyzing Changing Online Landscapes
Seed LDRD Projects
- Signatures of Communities & Change
- Digital Currency Graph Forensics
- DarkNet Characterization
- Signatures in the Cloud
Signature Discovery Initiative (SDI)
Analysis in Motion (AIM)
National Security Computing
Disrupting Illicit Trafficking
Nuclear Security
National Defense
Homeland Security
Special Programs
Seattle Innovation District
Asymmetric Resilient Cybersecurity (ARC)
Cyber-Physical Systems
Ubiquitous Sensing
Analysis in Motion
Streaming Data Characterization & Processing
Library of foundational streaming algorithms, methods for extracting features from streams
Data reduction techniques like semantic characterization
Hypothesis Generation & Testing
Scalable symbolic deduction & incremental machine learning to track a stream
Generate, update, and validate human-understandable hypotheses from streaming classifiers
Human-Machine Feedback
Interaction with human interfaces to implicitly weight, tune, and modify underlying models
Visual strategies for bidirectional communication of and interaction with multiple hypotheses
Work Environments
Integration framework and testing range
Instrumentation to measure overall accuracy, utility, and throughput
January 27, 2016 7
AIM Program Area 1
Streaming Data Characterization
Compression Analysis (CA)
Video compression algorithms provide an efficient means of detecting and classifying events in a stream
Nonstandard features
Became full project at mid-year
Scalable Feature Extraction and Sampling (SFE)
Given a dataset, can we find a minimum subset that provides similar accuracy as the entire dataset?
Parallel setting using MPI
Open source library (MaTEX)
AIM Program Area 3
Human-Machine Feedback
User-Centered Hypothesis Definition (UCHD)
Transitioned to new PM and new technical focus in February
What does a machine-generated hypothesis look like to a human analyst?
Science of Interaction (SOI)
Use user clickstream data as an indicator of user sensemaking
Developed and open-sourced the Streaming Canvas software
UI engineering for use cases
User studies
AIM Program Area 3
Human-Machine Feedback
Mitigating Cognitive Depletion in Streaming Environments (CD)
Predict and mitigate human performance degradation
Quantify increase in error and impulsivity based on time from last break
Studies using Halo and exam data
User study planned
[Chart: Kills / Deaths, Halo: Reach]
Streaming Analytics
CHALLENGE
Craft machine-generated hypotheses as data arrive, steering data collection and using human feedback to tune a multi-classifier system.
PNNL IMPACT
Developing niche in interactive streaming analytics at scale; basis for invited keynotes at IEEE HCBDR, AAAS Big Data in Life Science, Data Science Innovation Summit, Science of Multi-INT.
Developed streaming automated detection of first point of failure in lithium battery through electron microscopy.
PNNL streaming architecture used as reference model for special programs sponsors.
Collaborators: Rensselaer Polytechnic, Laboratory for Analytic Sciences.
TXT VIS STREAM GRAPH STATS DATA PROV CYBER
Data Provenance & Workflow at Extreme Scale
CHALLENGE
Ensuring reliable performance and reproducibility of complex and adaptive workflows in extreme scale environments.
PNNL IMPACT
Workflow Performance Provenance ontology captures performance and reproducibility metrics across the complete system and application stack, helping to identify causal relationships.
ProvEn uses PNNL’s provenance ontology to record, correlate, and analyze events; distinguished from mainstream provenance by focusing on process not just data heritage.
PNNL is informing ASCR directions for future provenance investments.
Project Approach
National Security Computing Program Areas
INFRASTRUCTURE
- Data and workflow management
- HPC programming models and libraries
- Power, performance, and reliability modeling
- Resiliency theory
- Mobile and edge computing
- Embedded systems
- Systems engineering and agile development
- Cloud and streaming architectures
- Modeling and simulation
- Data quality and provenance
- Sampling strategies
- Experimental design
- Human language technology
- Computer vision
- Large graph analysis
- Recommender systems
- Social and behavioral science
ANALYTICS DECISION SUPPORT
- Visualization
- Human-computer interaction
- User experience design
- Semantic computing
- Operations research
- Test environments
- Analytic tradecraft and critical thinking
- Situational awareness
- Collaborative systems
- Training systems
MISSION AREAS AND OPERATIONAL DEPLOYMENT
Cyber analysis | Bio-surveillance | Social media analysis | Forensics | Emergency preparedness and response
Law enforcement | Critical infrastructure resiliency | Trafficking networks | Power grid management
Project Goals
Research Question
How do we structure the Semantic technology stack to consume and reason over a volatile data stream, and what are the effects of this configuration when expressing streaming data models through commercial off-the-shelf (COTS) reasoners?
Goals of Project
Build prototype frameworks that consume streaming data into a Semantic Web stack
Model streaming data in a Description Logic (DL) ontology and reason over the new graph using a set of DL-compliant reasoners
Model streaming data into an ontology, DL or comparable rule set, that can be compared across reasoning clients
Study the effects of cache maintenance, primarily data eviction, on the Semantic Web stack and results across reasoners
Develop an engineering proposal to convert the prototypes into a single platform that can be deployed on cloud networks (AWS, PIC)
Project Approach
Propositional data are streaming in at a certain rate, and we can only see some “window” of them at any given time.
We sample the data in the window and add them to a fixed-size cache.
We need effective methods of sampling.
The fixed-size cache differentiates our framing of the problem from agglomerative databases (i.e., “just store everything”).
Deductive reasoning is continuously performed over the cache in order to answer queries and corroborate/refute hypotheses as quickly as possible.
Low-latency, high-throughput reasoning on ephemeral data is a hard, open problem.
There will likely be many conclusions to bring to the attention of the user, so ranking is needed to prioritize attention.
The idea of ranking is not so hard, but determining the correct ordering is.
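The windowing, sampling, and fixed-size cache described above can be sketched as follows. This is a minimal illustration, not the SHyRe implementation: the uniform random sampling, the oldest-first eviction, and every name (`FixedSizeCache`, `sample_window`, `run`, `rate`) are assumptions, since the slides leave the choice of sampling and eviction methods open.

```python
import random
from collections import deque


class FixedSizeCache:
    """Bounded cache of facts; evicts the oldest fact when full (assumed policy)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._facts = deque()

    def add(self, fact):
        """Insert a fact; return the evicted fact, or None if nothing was evicted."""
        evicted = None
        if len(self._facts) >= self.capacity:
            evicted = self._facts.popleft()
        self._facts.append(fact)
        return evicted

    def __len__(self):
        return len(self._facts)

    def __iter__(self):
        return iter(self._facts)


def sample_window(window, rate, rng):
    """Keep each fact in the window with probability `rate` (uniform sampling)."""
    return [fact for fact in window if rng.random() < rate]


def run(stream, window_size, rate, cache, rng):
    """Cut the stream into windows, sample each one, and feed the cache.

    Returns the facts evicted along the way. A trailing partial window
    is ignored in this sketch.
    """
    evicted = []
    window = []
    for fact in stream:
        window.append(fact)
        if len(window) == window_size:
            for kept in sample_window(window, rate, rng):
                out = cache.add(kept)
                if out is not None:
                    evicted.append(out)
            window = []
    return evicted


if __name__ == "__main__":
    rng = random.Random(0)
    cache = FixedSizeCache(capacity=5)
    # Hypothetical import facts, one per HS code.
    stream = [("company", "imports", f"HS-{n}") for n in range(10237, 10257)]
    evicted = run(stream, window_size=4, rate=0.5, cache=cache, rng=rng)
    print(len(cache), "facts cached;", len(evicted), "evicted")
```

A reasoner would then be run over the contents of `cache` after each window. Returning the evicted facts lets the caller retract them from the reasoner's graph as well.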
Approach
[Diagram: a data stream (data rate, window size) is sampled into a fixed-size cache, with cache maintenance. Symbolic reasoning (Pellet, StarDog, AllegroGraph) runs over the cache, takes hypotheses / questions from the use cases (DINT, NMR), and produces ranked conclusions.]
Engineering Approach
J2EE Pipeline
Avro Packet Stream
Java Stream “Pull” Client
Use Case Java - Streaming Design Pattern
Java Pellet Reasoner
StarDog TripleStore / Reasoner
AllegroGraph TripleStore / Reasoner
Not Implemented / Reasoner
Four Concurrent States
Ingestion | Annotation | Query | Cache Management
Initialize | Load | Process
FAST | SLOW
SHyRe Decision Tree
Five possible outcomes:
1. Query Pellet with built-in Jena RDF functionality
2. Query Pellet with a SPARQL query
3. Encode the SPARQL query in URL format and curl a triplestore endpoint
4. Use the SNARL protocol to query StarDog with a SPARQL query
5. Use the AGQuery protocol to query AllegroGraph with a SPARQL query
a. *RDFS++ reasoning
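Outcome 3 can be illustrated with a short sketch. It follows the SPARQL 1.1 Protocol convention of passing the query text in a percent-encoded `query` URL parameter. The endpoint URL, the example query, and the function names are hypothetical, and Python stands in here for the Java used in the project.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(endpoint, sparql):
    """Return a GET URL carrying the SPARQL text in a `query` parameter.

    Assumes `endpoint` has no query string of its own.
    """
    return endpoint + "?" + urlencode({"query": sparql})


def curl_command(endpoint, sparql):
    """Return an equivalent curl invocation as an argument list (not executed)."""
    return [
        "curl",
        "-H", "Accept: application/sparql-results+json",
        build_query_url(endpoint, sparql),
    ]


if __name__ == "__main__":
    endpoint = "http://localhost:8080/sparql"  # hypothetical endpoint
    sparql = "SELECT ?s WHERE { ?s a <http://example.org/Compound> } LIMIT 10"
    url = build_query_url(endpoint, sparql)
    print(url)
    # Decoding the URL recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == sparql)
```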
Use Case 1: Nuclear Magnetic Resonance
What is Nuclear Magnetic Resonance?
NMR Accomplishments to Date
Research Question Answered
By consuming an undetermined number of scans, can we assemble an NMR run, model compounds within an ontology of background data, and then reason across this new combined model of compound and spectrum ontology?
Logic Constraints Answered
Streaming data – When is a spectrum fully assembled?
How do we decide which functions to model in the ontology, and which to apply to a query?
SHyRe NMR Model
Description Logic background ontology of compound classes and peaks (Pellet implementation)
RDFS background ontology of compound classes and peaks (StarDog / AllegroGraph implementations)
Consume and model an NMR run from a stream of spectrum scans
Query the NMR run after applying the compound background ontology
Use Case 2: Shipping a Strategic Surprise
How do we detect a Strategic Surprise?
[Diagram, four animation frames: a Ford exemplar and a Nike exemplar shown beside an import stream. Codes as extracted, in slide order:
Ford Exemplar: HS-10237 to HS-10242
Import Stream: HS-10243 to HS-10248
Nike Exemplar: HS-10301 to HS-10312
Unlabeled six-code group that changes per frame:
Frames 1-2: HS-10237, HS-10238, HS-10239, HS-10246, HS-10248, HS-10243
Frame 3: HS-10246, HS-10248, HS-10243, HS-10303, HS-10311, HS-10307
Frame 4: HS-10303, HS-10311, HS-10307, HS-10304, HS-10305, HS-10312]
Strategic Surprise Accomplishments
to Date
Research Question Answered
Based on a company’s import records, can we determine whether it is entering a new line of business (LOB)?
Logic Constraints Answered
Streaming data – we have to determine whether a record might be important in the future
Explain reasoning to enable user intervention / interaction and integration with other models
SHyRe Strategic Surprise Model
Model each company by the HSCODEs it imports
Identify companies that represent all companies in an LOB
Exemplar of the LOB
Use training data to get HSCODEs used by each exemplar
Count the number of matching HSCODEs between monitored company and exemplars
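The counting step in the model above can be sketched with set intersection. The code values are borrowed from the slide diagram for illustration only: which codes belong to which box, the two monitored windows, and the names `match_counts` and `best_match` are assumptions, not the project's code.

```python
def match_counts(monitored_codes, exemplars):
    """Count, per exemplar, how many monitored HS codes the exemplar also imports."""
    monitored = set(monitored_codes)
    return {name: len(monitored & set(codes)) for name, codes in exemplars.items()}


def best_match(counts):
    """Return the exemplar with the highest count, or None when nothing matches.

    Ties resolve to whichever exemplar comes first in the dictionary.
    """
    name = max(counts, key=counts.get, default=None)
    if name is None or counts[name] == 0:
        return None
    return name


# Illustrative exemplar profiles (assumed assignment of codes to exemplars).
EXEMPLARS = {
    "Ford": [f"HS-{n}" for n in range(10237, 10243)],
    "Nike": [f"HS-{n}" for n in range(10301, 10313)],
}

# Two hypothetical cache snapshots for one monitored company.
EARLY = ["HS-10237", "HS-10238", "HS-10239", "HS-10246", "HS-10248", "HS-10243"]
LATE = ["HS-10303", "HS-10311", "HS-10307", "HS-10304", "HS-10305", "HS-10312"]

if __name__ == "__main__":
    for label, window in (("early", EARLY), ("late", LATE)):
        counts = match_counts(window, EXEMPLARS)
        print(label, counts, "->", best_match(counts))
```

A change in the best-matching exemplar between snapshots is the kind of signal that could be surfaced to the analyst as a possible new line of business.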
[Chart: Required Input Records to Produce Output. Series: Inputs, Outputs. Output data labels: 0, 0, 15, 88, 129.]
Input Import Records | Output Results | CPU (seconds) | CPU (inputs / second)
0                    | 0              | 1.292         | -
1                    | 0              | 1.693         | -
10,000               | 15             | 77.619        | 128.834
20,000               | 88             | 185.553       | 107.786
30,000               | 129            | 330.895       | 90.663
40,000               | 169            | 508.902       | 78.601
Required Input Records to Produce Output
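The last column of the table is the input count divided by the CPU time. The short sketch below reproduces it from the other columns. It also derives the CPU time spent on each additional 10,000 records, which the slides do not report. The variable and function names are illustrative.

```python
# (input records, output results, CPU seconds, reported inputs / second)
RUNS = [
    (10_000, 15, 77.619, 128.834),
    (20_000, 88, 185.553, 107.786),
    (30_000, 129, 330.895, 90.663),
    (40_000, 169, 508.902, 78.601),
]


def throughput(inputs, cpu_seconds):
    """Average processing rate in input records per CPU second."""
    return inputs / cpu_seconds


def marginal_seconds(runs):
    """CPU seconds spent on each successive batch of records (derived, not reported)."""
    return [later[2] - earlier[2] for earlier, later in zip(runs, runs[1:])]


if __name__ == "__main__":
    for inputs, outputs, seconds, reported in RUNS:
        rate = throughput(inputs, seconds)
        print(f"{inputs:>6} inputs: {rate:.3f} inputs/s (reported {reported})")
    print("seconds per extra 10,000 records:",
          [round(s, 3) for s in marginal_seconds(RUNS)])
```

Each successive batch takes longer than the one before it, which is why the average rate falls as the input count grows.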
Project Challenges
Challenges
Reasoning Differences in Standards (RDFS / OWL EL/DL / RDFS++)
Reasoner | Difficulty
Pellet | Nearly complete OWL DL, but not currently maintained.
StarDog | Strict separation of A-Box / T-Box reasoning within OWL DL across embedded Pellet and StarDog systems. Creates oddly formed, verbose SPARQL queries.
AllegroGraph | Proprietary reasoning with inconsistent standards.
Complex cache eviction algorithms and unsupported SPARQL standards
Reasoner | Difficulty
Pellet | Requires complex internal storage algorithms to manipulate memory graphs
StarDog | SPARQL DELETE can only support literal triples. Variables within a DELETE invoke background graph indexing and frequently fail.
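One way to stay within the constraint described for StarDog is to evict with fully specified triples only. SPARQL 1.1 Update provides `DELETE DATA`, which by definition contains no variables or blank nodes. The sketch below builds such a request from triples the cache has already chosen to evict. It is an assumption about how eviction could be expressed, not the project's code: the function names and example IRIs are hypothetical, and sending the request to a store is left out.

```python
def format_term(term):
    """Render an IRI as <...>; pass through terms that are already quoted literals."""
    if term.startswith('"'):
        return term
    return "<" + term + ">"


def build_delete_data(triples):
    """Build a SPARQL 1.1 DELETE DATA request from ground triples.

    Each triple is a (subject, predicate, object) tuple of strings.
    Variables and blank nodes are rejected, since DELETE DATA forbids them.
    """
    lines = []
    for triple in triples:
        for term in triple:
            if term.startswith(("?", "$", "_:")):
                raise ValueError("DELETE DATA requires ground terms, got: " + term)
        lines.append("  " + " ".join(format_term(t) for t in triple) + " .")
    return "DELETE DATA {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    evicted = [
        ("http://example.org/scan1", "http://example.org/partOf", "http://example.org/run1"),
        ("http://example.org/scan1", "http://example.org/peakCount", '"42"'),
    ]
    print(build_delete_data(evicted))
```

Because the cache already knows exactly which triples it is dropping, no pattern matching is needed on the store side.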
Conclusions
Contract with Rensselaer Polytechnic Institute
Rui Yan and Yue Liu joined SHyRe team advised by Prof. Deborah McGuinness
Complete: International Conference for Biomedical Ontologies Paper
William Smith, Alan Chapell, Courtney Courley
Complete: Smart Data 2015 Conference
William Smith, Deborah McGuinness, Rui Yan
Complete: Conference on Information Knowledge Management 2015 Paper
Mark Borkum, William Smith, Deborah McGuinness, Rui Yan, Yue Liu
Complete: ISWC 2015 Workshop Paper
Rui Yan, Brenda Praggastis, William Smith, Deborah McGuinness
In Progress: Skolemization/Currying to enable decidable reasoning
Patrick Paulson
In Progress: Journal of Web Semantics, Streaming Edition Paper
William Smith
Human Centered Analytics
william.smith@pnnl.gov
+1.206.528.3356
SHYRE: Streaming
Hypothesis Reasoning
aim.pnnl.gov
Protected Information | Proprietary Information

Más contenido relacionado

La actualidad más candente

Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchgeetachauhan
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareGreg Makowski
 
data Fusion and log correlation
data Fusion and log correlationdata Fusion and log correlation
data Fusion and log correlationMahdi Sayyad
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel
 
Threat Detection in Surveillance Videos
Threat Detection in Surveillance VideosThreat Detection in Surveillance Videos
Threat Detection in Surveillance VideosDatabricks
 
Graphs in Telecommunications - Jesus Barrasa, Neo4j
Graphs in Telecommunications - Jesus Barrasa, Neo4jGraphs in Telecommunications - Jesus Barrasa, Neo4j
Graphs in Telecommunications - Jesus Barrasa, Neo4jNeo4j
 
Decentralized AI Draper
Decentralized AI   DraperDecentralized AI   Draper
Decentralized AI Drapergeetachauhan
 
Future is private intel dev fest
Future is private   intel dev festFuture is private   intel dev fest
Future is private intel dev festgeetachauhan
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataJongwook Woo
 
Red lambda FAQ's
Red lambda FAQ'sRed lambda FAQ's
Red lambda FAQ'sIla Group
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches Venkat Projects
 
Outlier and fraud detection using Hadoop
Outlier and fraud detection using HadoopOutlier and fraud detection using Hadoop
Outlier and fraud detection using HadoopPranab Ghosh
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
 
Nokia Augmented Reality
Nokia Augmented RealityNokia Augmented Reality
Nokia Augmented RealityStudioSFO
 
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...BAINIDA
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIJongwook Woo
 
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 CentralityUsing Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 CentralityTigerGraph
 

La actualidad más candente (20)

Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorch
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and software
 
data Fusion and log correlation
data Fusion and log correlationdata Fusion and log correlation
data Fusion and log correlation
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013
 
Threat Detection in Surveillance Videos
Threat Detection in Surveillance VideosThreat Detection in Surveillance Videos
Threat Detection in Surveillance Videos
 
[IJET-V2I4P10] Authors: Prof. Swetha.T.N, Dr. S.Bhargavi, Dr. Sreerama Reddy ...
[IJET-V2I4P10] Authors: Prof. Swetha.T.N, Dr. S.Bhargavi, Dr. Sreerama Reddy ...[IJET-V2I4P10] Authors: Prof. Swetha.T.N, Dr. S.Bhargavi, Dr. Sreerama Reddy ...
[IJET-V2I4P10] Authors: Prof. Swetha.T.N, Dr. S.Bhargavi, Dr. Sreerama Reddy ...
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
Graphs in Telecommunications - Jesus Barrasa, Neo4j
Graphs in Telecommunications - Jesus Barrasa, Neo4jGraphs in Telecommunications - Jesus Barrasa, Neo4j
Graphs in Telecommunications - Jesus Barrasa, Neo4j
 
Decentralized AI Draper
Decentralized AI   DraperDecentralized AI   Draper
Decentralized AI Draper
 
Future is private intel dev fest
Future is private   intel dev festFuture is private   intel dev fest
Future is private intel dev fest
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big Data
 
Red lambda FAQ's
Red lambda FAQ'sRed lambda FAQ's
Red lambda FAQ's
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches
 
Outlier and fraud detection using Hadoop
Outlier and fraud detection using HadoopOutlier and fraud detection using Hadoop
Outlier and fraud detection using Hadoop
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
Nokia Augmented Reality
Nokia Augmented RealityNokia Augmented Reality
Nokia Augmented Reality
 
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
Big data technology by Data Sciences Thailand ในงาน THE FIRST NIDA BUSINESS A...
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AI
 
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 CentralityUsing Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
 

Similar a Streaming Hypothesis Reasoning - William Smith, Jan 2016

Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG DataPrasant Misra
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICS
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICSHIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICS
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICSHappiest Minds Technologies
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesCambridge Semantics
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Ieee 2016 cs project topics list mtech
Ieee 2016 cs project topics  list mtechIeee 2016 cs project topics  list mtech
Ieee 2016 cs project topics list mtechSoftroniics india
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsHong-Linh Truong
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and OpportunitiesKenny Huang Ph.D.
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
A systems engineering methodology for wide area network selection
A systems engineering methodology for wide area network selectionA systems engineering methodology for wide area network selection
A systems engineering methodology for wide area network selectionAlexander Decker
 
Pantech java projects 2016-17
Pantech   java projects 2016-17Pantech   java projects 2016-17
Pantech java projects 2016-17Java Team
 
Pantech final year software projects (java)
Pantech   final year software projects (java)Pantech   final year software projects (java)
Pantech final year software projects (java)Senthil Kumar
 

Similar a Streaming Hypothesis Reasoning - William Smith, Jan 2016 (20)

Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG Data
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICS
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICSHIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICS
HIGH-IMPACT USE CASES POWERED BY NEXT-GENERATION NETWORK ANALYTICS
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
2018 learning approach-digitaltrends
2018 learning approach-digitaltrends2018 learning approach-digitaltrends
2018 learning approach-digitaltrends
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success Stories
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Ieee 2016 cs project topics list mtech
Ieee 2016 cs project topics  list mtechIeee 2016 cs project topics  list mtech
Ieee 2016 cs project topics list mtech
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflows
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
A systems engineering methodology for wide area network selection
A systems engineering methodology for wide area network selectionA systems engineering methodology for wide area network selection
A systems engineering methodology for wide area network selection
 
Pantech java projects 2016-17
Pantech   java projects 2016-17Pantech   java projects 2016-17
Pantech java projects 2016-17
 
Pantech final year software projects (java)
Pantech   final year software projects (java)Pantech   final year software projects (java)
Pantech final year software projects (java)
 

Más de Seattle DAML meetup

Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016Karin Strauss - DNA Storage, July 2016
Karin Strauss - DNA Storage, July 2016Seattle DAML meetup
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Seattle DAML meetup
 
Understanding disparities using the American Community Survey - Sean Green, M...
Understanding disparities using the American Community Survey - Sean Green, M...Understanding disparities using the American Community Survey - Sean Green, M...
Understanding disparities using the American Community Survey - Sean Green, M...Seattle DAML meetup
 
Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016
Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016
Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016Seattle DAML meetup
 
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Seattle DAML meetup
 
Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015Seattle DAML meetup
 
Hunting criminals with hybrid analytics -- October 2015
Hunting criminals with hybrid analytics -- October 2015Hunting criminals with hybrid analytics -- October 2015
Hunting criminals with hybrid analytics -- October 2015Seattle DAML meetup
 
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...Seattle DAML meetup
 
Adventures in Data Visualization - Jeff Heer, May 2015
Adventures in Data Visualization - Jeff Heer, May 2015Adventures in Data Visualization - Jeff Heer, May 2015
Adventures in Data Visualization - Jeff Heer, May 2015Seattle DAML meetup
 
The Road to Data Science - Joel Grus, June 2015
The Road to Data Science - Joel Grus, June 2015The Road to Data Science - Joel Grus, June 2015
The Road to Data Science - Joel Grus, June 2015Seattle DAML meetup
 
Scaling decision trees - George Murray, July 2015
Scaling decision trees - George Murray, July 2015Scaling decision trees - George Murray, July 2015
Scaling decision trees - George Murray, July 2015Seattle DAML meetup
 

Streaming Hypothesis Reasoning - William Smith, Jan 2016

  • 1. SHyRe Streaming Hypothesis Reasoning WILLIAM SMITH, PATRICK PAULSON, MARK BORKUM, DEBORAH MCGUINNESS, BRENDA PRAGGASTIS, RUI YAN, YUE LIU DAML 2016 – Seattle, WA Smart Data Conference, 2015 – San Jose, California January 26, 2016 The legends PROTECTED INFORMATION and PROPRIETARY INFORMATION apply to information describing Subject Inventions as defined in Contract No. DE-AC05-76RL01830 and any other information which may be properly withheld from public disclosure thereunder
  • 2. DOE’s National Laboratories are Solving America’s Toughest Challenges
  • 3. Mission Drivers Analyzing Changing Online Landscapes Seed LDRD Projects - Signatures of Communities & Change - Digital Currency Graph Forensics - DarkNet Characterization - Signatures in the Cloud Signature Discovery Initiative (SDI) Analysis in Motion (AIM) National Security Computing Disrupting Illicit Trafficking Nuclear Security National Defense Homeland Security Special Programs Seattle Innovation District Asymmetric Resilient Cybersecurity (ARC) Cyber-Physical Systems Ubiquitous Sensing
  • 4. Analysis in Motion. Streaming Data Characterization & Processing: library of foundational streaming algorithms, methods for extracting features from streams; data reduction techniques like semantic characterization. Hypothesis Generation & Testing: scalable symbolic deduction & incremental machine learning to track a stream; generate, update, and validate human-understandable hypotheses from streaming classifiers. Human-Machine Feedback: interaction with human interfaces to implicitly weight, tune, and modify underlying models; visual strategies for bidirectional communication of and interaction with multiple hypotheses. Work Environments: integration framework and testing range; instrumentation to measure overall accuracy, utility, and throughput.
  • 5. AIM Program Area 1: Streaming Data Characterization. Compression Analysis (CA): video compression algorithms provide an efficient means of detecting and classifying events in a stream; nonstandard features; became full project at mid-year. Scalable Feature Extraction and Sampling (SFE): given a dataset, can we find a minimum subset that provides accuracy similar to that of the entire dataset? Parallel setting using MPI; open source library (MaTEX).
  • 6. AIM Program Area 3: Human-Machine Feedback. User-Centered Hypothesis Definition (UCHD): transitioned to new PM and new technical focus in February; what does a machine-generated hypothesis look like to a human analyst? Science of Interaction (SOI): use user clickstream data as an indicator of user sensemaking; developed and open-sourced the Streaming Canvas software; UI engineering for use cases; user studies.
  • 7. AIM Program Area 3: Human-Machine Feedback. Mitigating Cognitive Depletion in Streaming Environments (CD): predict and mitigate human performance degradation; quantify increase in error and impulsivity based on time from last break; studies using Halo and exam data; user study planned. [Figure: kills / deaths, Halo: Reach]
  • 8. Streaming Analytics. CHALLENGE: Craft machine-generated hypotheses as data arrive, steering data collection and using human feedback to tune a multi-classifier system. PNNL IMPACT: Developing niche in interactive streaming analytics at scale; basis for invited keynotes at IEEE HCBDR, AAAS Big Data in Life Science, Data Science Innovation Summit, Science of Multi-INT. Developed streaming automated detection of first point of failure in lithium battery through electron microscopy. PNNL streaming architecture used as reference model for special programs sponsors. Collaborators: Rensselaer Polytechnic, Laboratory for Analytic Sciences.
  • 9. Data Provenance & Workflow at Extreme Scale. CHALLENGE: Ensuring reliable performance and reproducibility of complex and adaptive workflows in extreme scale environments. PNNL IMPACT: Workflow Performance Provenance ontology captures performance and reproducibility metrics across the complete system and application stack, helping to identify causal relationships. ProvEn uses PNNL’s provenance ontology to record, correlate, and analyze events; distinguished from mainstream provenance by focusing on process, not just data heritage. PNNL is informing ASCR directions for future provenance investments.
  • 10. Project Approach
  • 11. National Security Computing Program Areas. INFRASTRUCTURE - Data and workflow management - HPC programming models and libraries - Power, performance, and reliability modeling - Resiliency theory - Mobile and edge computing - Embedded systems - Systems engineering and agile development - Cloud and streaming architectures - Modeling and simulation - Data quality and provenance - Sampling strategies - Experimental design - Human language technology - Computer vision - Large graph analysis - Recommender systems - Social and behavioral science ANALYTICS DECISION SUPPORT - Visualization - Human-computer interaction - User experience design - Semantic computing - Operations research - Test environments - Analytic tradecraft and critical thinking - Situational awareness - Collaborative systems - Training systems. MISSION AREAS AND OPERATIONAL DEPLOYMENT: Cyber analysis | Bio-surveillance | Social media analysis | Forensics | Emergency preparedness and response | Law enforcement | Critical infrastructure resiliency | Trafficking networks | Power grid management
  • 12. Project Goals. Research Question: How do we structure the Semantic technology stack to consume and reason over a volatile data stream, and what are the effects of this configuration when expressing streaming data models through commercial off-the-shelf (COTS) reasoners? Goals of Project: build prototype frameworks to consume streaming data into a Semantic Web stack; model streaming data in a Description Logic (DL) ontology and reason over the new graph using a set of DL-compliant reasoners; model streaming data into an ontology, DL or comparable rule set, that can be compared across reasoning clients; study the effects of cache maintenance, primarily data eviction, on the Semantic Web stack and results across reasoners; develop an engineering proposal to convert the prototypes into a single platform that can be deployed on cloud networks (AWS, PIC).
  • 13. Project Approach. Propositional data are streaming in at a certain rate, and we can only see some “window” of them at any given time. We sample the data in the window and add them to a fixed-size cache. We need effective methods of sampling. The fixed-size cache differentiates our framing of the problem from agglomerative databases (i.e., “just store everything”). Deductive reasoning is continuously performed over the cache to try to answer queries and corroborate or refute hypotheses as quickly as possible. Low-latency, high-throughput reasoning on ephemeral data is a hard, open problem. There will likely be many conclusions to bring to the attention of the user, so ranking is needed to prioritize attention. The idea of ranking is not so hard, but determining the correct ordering is.
  • 14. Approach Fixed-size Cache Data Stream Window Size Data Rate Pellet StarDog AllegroG DINTNMR USE CASE Symbolic Reasoning Hypotheses / Questions Ranked Conclusions cache maintenance sampling 16
  • 15. Approach Fixed-size Cache Data Stream Window Size Data Rate Pellet StarDog AllegroG DINTNMR USE CASE Symbolic Reasoning Hypotheses / Questions Ranked Conclusions cache maintenance sampling 17
  • 16. January 27, 2016 18 Engineering Approach J2EE Pipeline AVRO Packet StreamStream JAVA Stream “Pull” Client Use Case JAVA - Streaming Design Pattern Use Case JAVA - Streaming Design Pattern JAVA Pellet Reasoner StarDog TripleStore / Reasoner AllegroGraph TripleStore / Reasoner Not Implemented / Reasoner
  • 17. Four Concurrent States: Ingestion, Annotation, Query, Cache Management. Initialize, Load, Process.
  • 18. Four Concurrent States: Ingestion, Annotation, Query, Cache Management. Initialize, Load, Process. FAST, SLOW.
  • 19. SHyRe Decision Tree
  • 20. SHyRe Decision Tree
  • 21. SHyRe Decision Tree. 5 Possible Outcomes: 1. Query Pellet with built-in JENA RDF functionality. 2. Query Pellet with SPARQL Query. 3. Encode SPARQL to URL format and CURL a triplestore endpoint. 4. Use SNARL protocol to query StarDog with SPARQL Query. 5. Use AGQuery protocol to query AllegroGraph with SPARQL Query. a. *RDFS++ Reasoning
  • 22. January 27, 2016 24 Engineering Approach J2EE Pipeline AVRO Packet StreamStream JAVA Stream “Pull” Client Use Case JAVA - Streaming Design Pattern Use Case JAVA - Streaming Design Pattern JAVA Pellet Reasoner StarDog TripleStore / Reasoner AllegroGraph TripleStore / Reasoner Not Implemented / Reasoner
  • 23. Use Case 1: Nuclear Magnetic Resonance
  • 24. What is Nuclear Magnetic Resonance?
  • 25. NMR Accomplishments to Date. Research Question Answered: By consuming an undefined count of scans, can we assemble an NMR run, model compounds within an ontology of background data, and then reason across this new combined model of compound and spectrum ontology? Logic Constraints Answered: Streaming data: when is a spectrum fully assembled? How do we decide which functions to model in the ontology, and which to apply to a query? SHyRe NMR Model: Description Logic background ontology of compound classes and peaks (Pellet implementation); RDFS background ontology of compound classes and peaks (StarDog / AllegroGraph implementations); consume and model an NMR run from a stream of spectrum scans; query the NMR run after applying the compound background ontology.
  • 28. Use Case 2: Shipping a Strategic Surprise
  • 29. January 27, 2016 31 How do we detect a Strategic Surprise? Ford Exemplar HS-10237 HS-10238 HS-10239 HS-10240 HS-10241 HS-10242 HS-10237 HS-10238 HS-10239 HS-10246 HS-10248 HS-10243 Import Stream HS-10243 HS-10244 HS-10245 HS-10246 HS-10247 HS-10248 Nike Exemplar HS-10301 HS-10302 HS-10303 HS-10304 HS-10305 HS-10306 HS-10307 HS-10308 HS-10309 HS-10310 HS-10311 HS-10312
  • 30. January 27, 2016 32 How do we detect a Strategic Surprise? Ford Exemplar HS-10237 HS-10238 HS-10239 HS-10240 HS-10241 HS-10242 HS-10237 HS-10238 HS-10239 HS-10246 HS-10248 HS-10243 Import Stream HS-10243 HS-10244 HS-10245 HS-10246 HS-10247 HS-10248 Nike Exemplar HS-10301 HS-10302 HS-10303 HS-10304 HS-10305 HS-10306 HS-10307 HS-10308 HS-10309 HS-10310 HS-10311 HS-10312
  • 31. January 27, 2016 33 How do we detect a Strategic Surprise? Ford Exemplar HS-10237 HS-10238 HS-10239 HS-10240 HS-10241 HS-10242 HS-10246 HS-10248 HS-10243 HS-10303 HS-10311 HS-10307 Import Stream HS-10243 HS-10244 HS-10245 HS-10246 HS-10247 HS-10248 Nike Exemplar HS-10301 HS-10302 HS-10303 HS-10304 HS-10305 HS-10306 HS-10307 HS-10308 HS-10309 HS-10310 HS-10311 HS-10312
  • 32. January 27, 2016 34 How do we detect a Strategic Surprise? Ford Exemplar HS-10237 HS-10238 HS-10239 HS-10240 HS-10241 HS-10242 HS-10303 HS-10311 HS-10307 HS-10304 HS-10305 HS-10312 Import Stream HS-10243 HS-10244 HS-10245 HS-10246 HS-10247 HS-10248 Nike Exemplar HS-10301 HS-10302 HS-10303 HS-10304 HS-10305 HS-10306 HS-10307 HS-10308 HS-10309 HS-10310 HS-10311 HS-10312
  • 33. Strategic Surprise Accomplishments to Date. Research Question Answered: Based on a company’s import records, can we determine if they are entering a new line of business (LOB)? Logic Constraints Answered: Streaming data: have to determine if a record might be important in the future; explain reasoning to enable user intervention / interaction and integration with other models. SHyRe Strategic Surprise Model: model each company by the HSCODEs it imports; identify companies that represent all companies in a LOB (the exemplar of the LOB); use training data to get the HSCODEs used by each exemplar; count the number of matching HSCODEs between the monitored company and the exemplars.
  • 34. Strategic Surprise Accomplishments to Date. [Chart: Required Input Records to Produce Output; series: Inputs, Outputs; output labels: 0, 0, 15, 88, 129]
  • 35. Strategic Surprise Accomplishments to Date: Required Input Records to Produce Output
        Input Import Records | Output Results | CPU (seconds) | CPU (inputs / second)
        0                    | 0              | 1.292         |
        1                    | 0              | 1.693         |
        10,000               | 15             | 77.619        | 128.834
        20,000               | 88             | 185.553       | 107.786
        30,000               | 129            | 330.895       | 90.663
        40,000               | 169            | 508.902       | 78.601
  • 36. Project Challenges
  • 37. Challenges
        Reasoning: Differences in Standards (RDFS / OWL EL/DL / RDFS++)
        Reasoner     | Difficulty
        Pellet       | Nearly complete OWL DL, but not currently maintained.
        StarDog      | Strict separation of A-Box / T-Box reasoning within OWL DL across embedded Pellet and StarDog systems. Creates oddly formed, verbose SPARQL queries.
        AllegroGraph | Proprietary reasoning with inconsistent standards.
        Complex cache eviction algorithms and unsupported SPARQL standards
        Reasoner     | Difficulty
        Pellet       | Requires complex internal storage algorithms to manipulate memory graphs.
        StarDog      | SPARQL DELETE can only support literal triples. Variables within a DELETE invoke background graph indexing and frequently fail.
  • 38. Conclusions. Contract with Rensselaer Polytechnic Institute: Rui Yan and Yue Liu joined SHyRe team, advised by Prof. Deborah McGuinness. Complete: International Conference for Biomedical Ontologies Paper (William Smith, Alan Chapell, Courtney Courley). Complete: Smart Data 2015 Conference (William Smith, Deborah McGuinness, Rui Yan). Complete: Conference on Information Knowledge Management 2015 Paper (Mark Borkum, William Smith, Deborah McGuinness, Rui Yan, Yue Liu). Complete: ISWC 2015 Workshop Paper (Rui Yan, Brenda Praggastis, William Smith, Deborah McGuinness). In Progress: Skolemization/Currying to enable decidable reasoning (Patrick Paulson). In Progress: Journal of Web Semantics, Streaming Edition Paper.
  • 39. William Smith Human Centered Analytics william.smith@pnnl.gov +1.206.528.3356 SHYRE: Streaming Hypothesis Reasoning aim.pnnl.gov Protected Information | Proprietary Information
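
The Strategic Surprise model on slide 33 (model each company by the HS codes it imports, then count matches against each line-of-business exemplar) can be illustrated with a minimal sketch. The exemplar code ranges and the sample windows below are assumptions loosely based on the illustration on slides 29 to 32, and the function names are hypothetical; this is not the SHyRe implementation, which reasons over an ontology rather than plain sets.

```python
def hs_range(start, end):
    """Build a set of illustrative HS code labels such as 'HS-10237'."""
    return {f"HS-{n}" for n in range(start, end + 1)}


# Assumed exemplar profiles (hypothetical values, for illustration only).
EXEMPLARS = {
    "Ford": hs_range(10237, 10248),
    "Nike": hs_range(10301, 10312),
}


def match_counts(imported_codes, exemplars=EXEMPLARS):
    """Count distinct HS codes shared by a monitored company and each exemplar."""
    observed = set(imported_codes)
    return {name: len(observed & codes) for name, codes in exemplars.items()}


def rank_lines_of_business(imported_codes, exemplars=EXEMPLARS):
    """Order exemplars by how many HS codes they share with the monitored company."""
    counts = match_counts(imported_codes, exemplars)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Two windows of the monitored company's import stream (assumed sample data).
early_window = ["HS-10237", "HS-10238", "HS-10239", "HS-10246", "HS-10248", "HS-10243"]
late_window = ["HS-10303", "HS-10311", "HS-10307", "HS-10304", "HS-10305", "HS-10312"]

print(rank_lines_of_business(early_window))
print(rank_lines_of_business(late_window))
```

A shift in the top-ranked exemplar between windows is the kind of signal the slides describe as a possible move into a new line of business. Because the cache is fixed in size, which records stay in each window matters as much as the counting itself.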

Editor's notes

  1. 2 outside views. Civil servants: amusing during election season, which stopped being a season long ago and is now just the perpetual state of things. Department of Defense: controversial when at peace, and necessary when somebody is somewhere they’re not supposed to be and it’s going to take a pretty penny to get them out. Third arm of government, DOE: infrastructure and science. 17 labs, national highway system, power plants, green energy, smart grids, environmental regulation, cyber security, disease tracking… Point out some of the labs: PNNL, nuclear labs, Sandia (Z machine), Fermi (collider), NREL, Energy Grid Lab. We are the support system for the myriad of other internal US departments that support state governments and national projects.
  2. PNNL strengths and cultural focus on … Focus on strengthening and leveraging the science base. Focus on impacting mission and developing the next generation. History: 3 innovations that won WWII. DOE labs' overall role: accelerate the rate of innovation (user facilities, next gen, scientific leadership); address enduring, S&T-centered mission challenges (naval reactors, nonproliferation, energy, stockpile); ensure ability to react to rapid change or crises (critical materials, Fukushima, 9/11, cyber); achieve and prevent technology surprise (security mission-centric); enhance economic competitiveness.
  3. Opportunity here for analytic provenance with SP sponsors; big challenge there. One other thing that makes ProvEn research different is that we are more focused on how the captured provenance can provide real, actionable insights for the users, e.g., why is my workflow performance so variable, what was different in this process from other instances, how can I improve workflow performance, etc.
  4. A strategic surprise is a material that doesn’t relate to a company's line of business, or that the specific company is embargoed from receiving. Infant incubators: great for premature babies, and also not bad for creating bioweapons. So if you’re a country with 10 hospitals and 4 premature babies, why would your state company need 10,000 infant incubators? Same with glove boxes: great for specific industries… that your country doesn’t have… so why do you need 10,000 glove boxes? A company that needs ONE industrial piece of equipment very rarely, like a French bakery: once every 20 years it will need an industrial HVAC system… so why did this one just import 50 of them? Loading dock switches: let’s say I buy up the loading dock right next to a company that does need to import something, and make a deal about things moving from loading dock 1-AC to 1-AD. We took a much broader point of view before the fine-grained loading dock switch…
  5. We started with a central research question like all PNNL projects… <read question> Through this question, and a slight project reorganization, the following goals were aligned to FY15. <read slides>
  6. Data pipe and sampling window: provided by an outside Java enterprise framework. Fixed-size cache: there were a couple of different caching algorithms, depending on whether we used in-memory caching, application caching, or triplestore caching. Several reasoning platforms were selected, depending mainly on claims from manuals and marketing resources. End user: human in the loop was a large requirement, but we did focus on an analyst function.
  7. We chose 3 core areas for FY15. (1) We selected three COTS (commercial off-the-shelf) reasoners for testing, each implementing a different degree of the OWL and RDFS specifications and offering different expressivity. (2) We researched cache maintenance, with emphasis on cache eviction and on how to maintain a stable, but relevant, graph for queries. (3) Communication with the underlying infrastructure: the original build of SHyRe was good at consuming and producing metrics about the query operation. FY15 focused not only on the use cases, but also on providing results back to the infrastructure for user review: a. talk to an automated entity extraction agent; b. provide conclusions to analytics UIs; c. create propositions for future use within SHyRe.
  8. The mile-high picture of the design pattern is a workflow engine (or state machine) that runs four concurrent processes. MOST IMPORTANTLY, after starting, each process MUST run independently and not require any synchronization mechanism beyond thread-safe programming. Ingestion: the ingestion client is responsible for connecting to a stream, decoding a byte-stream-encoded packet, and, upon initialization, completing an initial test of data conformity to the use case. It then provides the decoded data to the annotation mechanism. Ingestion is not responsible for establishing a medium-term storage solution; all decoded data is immediately stored in a FIFO list for annotation. Annotation: this is the state composed of processes for encoding data into an RDF graph and providing the graph to a reasoner. Annotating data with Semantic Web RDF identifiers creates a unique challenge, as each use case generally requires a different markup for decoded data. Query: the ability to create a question and pose it in a way a reasoner can answer using an RDF graph or triplestore. Querying the Semantic Web, after annotating and storing a data stream, is a variable state intended to run in tandem with all states after the initial ingestion. Because the graphs are created and modified as the stream arrives, annotation and query design must be composed for a streaming architecture. Cache maintenance… that’s really, really hard.
  9. Ingestion can run really, really fast. Querying can run really, really slow, especially depending on how you structure the query and logic. DL adds almost exponentially to query time as more triples are added to the graph, and with RDFS you have to take special care in how you construct the query to ensure something is always returned for cache management. And cache management never works.
  10. These four states create a decision tree. Data consumption from the architecture: now that we have the data, we have to make a decision based on annotation requirements, use case, and reasoner package. For in-memory systems that quickly assemble RDF models, no metering of FIFO cache consumption is necessary. Attempts to meter access to the temporary cache provide the time necessary to complete the string-building process. String manipulation into RDF triples. Template variable substitution into valid RDF/XML, Turtle, etc. graphs.
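As an illustration of the template-substitution path, here is a minimal sketch that fills one decoded record into a single N-Triples statement (which is also valid Turtle). The base IRI, the `hasPeak` predicate, and the record fields `scan_id` and `peak` are invented for the example and are not taken from the SHyRe ontology.

```python
from string import Template

# One N-Triples statement with placeholders for the variable parts.
TRIPLE_TEMPLATE = Template(
    '<$base/scan/$scan_id> <$base/hasPeak> '
    '"$peak"^^<http://www.w3.org/2001/XMLSchema#double> .'
)

def to_ntriples(record, base="http://example.org/nmr"):
    """Substitute one decoded record into the template."""
    return TRIPLE_TEMPLATE.substitute(
        base=base, scan_id=record["scan_id"], peak=record["peak"]
    )
```

For example, `to_ntriples({"scan_id": 7, "peak": 3.21})` yields one statement about `<http://example.org/nmr/scan/7>`. A real annotation step would also need to escape literal values, which this sketch omits.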
  11. Five query options: 1. Query Pellet with built-in Jena/OpenRDF functionality. 2. Query Pellet across the file system with a SPARQL query. 3. Encode SPARQL to URL format and curl a triplestore endpoint. 4. Use the SNARL protocol to query Stardog with a SPARQL query. 5. Use the AGQuery protocol to query AllegroGraph with a SPARQL query a. *RDFS++ reasoning
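The "encode SPARQL to URL format" option can be sketched as below. The standard SPARQL 1.1 Protocol passes the query text in a `query` parameter on a GET request; the endpoint URL in the usage note is a placeholder, and the function name is an assumption of this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(endpoint, sparql):
    """Percent-encode a SPARQL query into a GET URL for a triplestore endpoint."""
    return endpoint + "?" + urlencode({"query": sparql})

# The resulting URL can then be fetched from a shell, for example:
#   curl -H "Accept: application/sparql-results+json" "<url>"
```

For instance, `build_query_url("http://localhost:8080/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")` produces a URL with no raw spaces or braces, safe to hand to curl.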
  12. And here are the five outcomes you have to account for when evaluating this kind of system. 1 and 2: whether in memory or on the file system (if you're particularly pressed for time), Pellet is slow, but it isn't handcuffed when it comes to pretty much complete DL logic. This is the easiest setup in which to create a static query that always returns an expected true/false per entity within the query and ontology. 3 is the grand mystery, highly dependent on the triplestore. How did you populate it? Could you curl a query? Was there a proprietary loading method like ISQL or the MarkLogic pipeline demoed yesterday? How much file system is used? How do we know when we can query? 4 and 5: both Stardog and AllegroGraph have similar custom protocols for populating the graphs, and it becomes much more an issue of query composition, especially with standard SPARQL, RDFS, or OWL EL.
  13. Now that the broad overview is covered, we will focus on this yellow strip as our use case applications. This is where the majority of the custom SHyRe logic (not stream ingestion) had to be created, and where the majority of the design pattern and decisions we just discussed were deployed.
  14. Nuclear Magnetic Resonance (NMR) spectroscopy is an analytic technique that exploits the magnetic properties of certain atomic nuclei in order to determine the physical and chemical properties of the molecules in which they are contained (e.g., the chemical structure).
  15. Let's run through our accomplishments with each use case; we will begin with NMR. 1. <read question 1> - Yes, we can. However, there is a large scalability issue: as scans become more complex, our query time balloons from 10 seconds to 19 minutes. 2. <read logic constraints> - a. Yes, we tracked run numbers and did an additional test using query result completeness, i.e., when is a query no longer returning any new results? b. OWL DL vs. RDFS: this was generally decided for us by the reasoner being used. Stardog has an interesting quirk where it breaks queries into A-Box / T-Box reasoning, forcing the query author to be careful when composing queries and modeling an ontology. 3. Finally, this is what we came up with - <read slide>
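The "query result completeness" test can be illustrated with a small sketch: treat the query as complete once a chosen number of consecutive runs add no result that has not been seen before. The `patience` threshold and the function name are assumptions for illustration; the notes do not state how many quiet runs SHyRe required.

```python
def first_stable_run(results_per_run, patience=2):
    """Return the 1-based run number at which `patience` consecutive runs
    have produced no new results, or None if that never happens."""
    seen = set()
    quiet_runs = 0
    for run_number, results in enumerate(results_per_run, start=1):
        new_results = set(results) - seen
        seen |= new_results
        quiet_runs = 0 if new_results else quiet_runs + 1
        if quiet_runs >= patience:
            return run_number
    return None
```

With per-run results `[{"a"}, {"a", "b"}, {"b"}, {"a"}]`, the third and fourth runs add nothing new, so the function reports run 4.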
  16. MAJOR POINTS: 25 Time Linear Runs out of 25,000+. This should be a straight line with no bumps. There should not have been this much change in such a small amount of time, so something isn't tracking between the graphs, ontology, and queries.
  17. Roughly 1,730 seems to be the golden graph size, where we start to lose results and compound confirmations. However, 1,698 / 1,722 returns all 11 positives here. This brings up the question of graph utilization: every triple in this graph applies to a compound we're searching for. Each of these queries took roughly 20 seconds on an NMR run composed of between 30 and 50 scans. As scans increase, query time increases dramatically because ranging functions in the DL ontology have to search every triple 30 times. By the time we're at 250+ scans (~20K triples in the graph), queries take around 19 minutes and only return “possible” for all 30 compounds. This brings up the question of a “data deluge” that never negates or affirms a chemical, while also providing enough information that the probability is high, or nearly certain.
  18. A strategic surprise is a material that doesn't relate to a company's line of business, or that the specific company is embargoed from receiving. Infant incubators: great for premature babies, and also not bad for creating bioweapons. So if you're a country with 10 hospitals and 4 premature babies, why would your state company need 10,000 infant incubators? Same with glove boxes: great for specific industries… that your country doesn't have… so why do you need 10,000 glove boxes? A company that needs ONE industrial piece of equipment very rarely, like a French bakery: once every 20 years it will need an industrial HVAC system… so why did this one just import 50 of them? Loading dock switches: let's say I buy up the loading dock right next to a company that does need to import something, and make a deal about things moving from loading dock 1-AC to 1-AD. We took a much broader point of view before the fine-grained loading dock switch…
  19. Import stream: just your normal company pulling things in. Ford is good for automotive. Nike is good for athletic gear. Poor examples: Samsung, Toyota, General Electric. Walmart is actually on the line, but can be an exemplar. Sure, they import a lot of finished consumer goods, but how many stores do they own, and how often would they need raw materials and construction systems (HVAC) replaced? Walmart doesn't sell cars or reactors, and they don't need the raw goods to create their consumer goods. So there is a superstore category for finished consumer goods.
  20. Company A looks like Exemplar Ford, so they're importing automotive goods.
  21. Well… now they don't…
  22. Company A looks like Exemplar Nike, so they're importing athletic goods.
  23. Next we move on to the Strategic Surprise accomplishments. From the questions earlier in the talk and the established Strategic Surprise DL ontology, our problem became: how do we model so much data (record packets of 30+ values), and which values are useful for determining drift in business category? This was accomplished by establishing a set of exemplar companies, or companies that never or very infrequently import outside of a given business category. Constraints on logic included record decomposition, a common problem in streaming data. More importantly, and the problem I frequently have, is explaining when and why the SHyRe reasoner creates an output record. Records are created ONLY when a company begins aligning itself with an exemplar, and this drift toward a specific industry provides the metrics for SOI from SHyRe. That means in the SOI demonstration you're only going to see SHyRe lines stepping up as a company becomes more and more like a specific exemplar company within an industry. 3. Finally, this is what we came up with - <read slide>
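The "records only when a company aligns with an exemplar" behavior can be pictured with the toy sketch below. It is a stand-in, not the DL ontology reasoning described in the talk: the scoring rule (a running count of imports that fall in an exemplar's category), the category names, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical exemplars and the business categories they import.
EXEMPLARS = {
    "Ford": {"automotive"},
    "Nike": {"athletic gear"},
}

def alignment_records(imports, exemplars=EXEMPLARS):
    """Emit (company, exemplar, count) only when an import matches an
    exemplar's category; all other imports produce no output record."""
    counts = defaultdict(int)
    records = []
    for company, category in imports:
        for exemplar, categories in exemplars.items():
            if category in categories:
                counts[(company, exemplar)] += 1
                records.append((company, exemplar, counts[(company, exemplar)]))
    return records
```

Plotting the count per (company, exemplar) pair gives lines that only ever step up, and imports outside every exemplar category leave no trace in the output.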
  24. This follows common logic, in that most companies aren't rapidly entering different lines of business and importing goods that don't pertain to their business model.
  25. CPU seconds nearly double as company import tendencies are modeled. Processing power is halved by the time we ingest roughly a magnitude of 4 from the original 10,000 records. Good because it doesn't happen at 20,000, but an obvious bottleneck at some point.
  26. 30 seconds: it starts. What is OPA? What is SHyRe? What is the movement in between? Talk threshold. 1:30: SHyRe detects appliances. 2:10: detects a change. Go to the background. 5:20: talk console.
  27. So, what challenges did we have, beyond the scalability issues covered previously? I'm just reading this slide because it's my favorite slide and the bane of my last year of research. <read slide>