GraphAware®
SIGNALS FROM OUTER SPACE
Vlasta Kůs, Data Scientist @ GraphAware
graphaware.com
@graph_aware, @VlastaKus
How NASA Benefits from Graph-Powered NLP
‣ Database of learned knowledge across NASA’s programs & projects
‣ Unstructured text with basic metadata
‣ Collected since late 1950s (100s of millions of documents)
‣ Public dataset: ~1600 documents
NASA’s Lessons Learned
"1406",420,"Roberts, J ",
"VO'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)",
"The pressure regulator in the Viking Orbiter Propulsion Subsystem started leaking following a pyro firing that occurred prior to the near-Mars TCM. Likely causes were corrosion or residue from propellant migration or pyro valve blowby, or particulate contamination. Recommendations included using separate regulators for the fuel and oxidizer sides, incorporating a bellows in the pyro valve to eliminate blowby, and adding a isolation valve between the regulator and propellant tank.",
" The micro-scale effects of long-term propellant exposure should be investigated in order to better critique regulator design. ",
"JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https://nen.nasa.gov/web/11/viewall/-/viewall/420"
NASA’s Lessons Learned Database
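A record in this layout is ordinary quoted CSV, so it can be read with a standard CSV parser. The sketch below is only an illustration: the column names in COLUMNS are guesses made for this example (the slide does not name the columns), and the long text fields are shortened.

```python
import csv
import io

# Hypothetical column names: the slide shows values only, so these are guesses.
COLUMNS = ["row", "lesson_id", "submitter", "title", "abstract", "lesson",
           "center", "date", "col_9", "col_10", "col_11", "col_12",
           "col_13", "col_14", "url"]

# One record in the layout shown above; the long text fields are shortened here.
RAW = (
    '"1406",420,"Roberts, J ",'
    '"VO\'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)",'
    '"The pressure regulator in the Viking Orbiter Propulsion Subsystem started leaking.",'
    '" The micro-scale effects of long-term propellant exposure should be investigated. ",'
    '"JPL",1996-07-08,"",TRUE,"",1460,7,NA,'
    '"https://nen.nasa.gov/web/11/viewall/-/viewall/420"\n'
)


def parse_records(text):
    """Parse quoted CSV text into one dict per record, trimming stray spaces."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, (field.strip() for field in row)))
            for row in reader if row]


for record in parse_records(RAW):
    print(record["lesson_id"], record["center"], record["title"])
```

Quoted fields keep their embedded commas (for example the submitter name), which is why a CSV parser is safer here than splitting on commas.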
"673",1326,"Relvini, Kristine ",
"Lessons Learned Not Being Inputted Into Lessons Learned Information System (LLIS) Database",
"",
"If you don't document the lessons learned, you loose knowledgeable, shared information and tracking capacity across programs.",
"KSC",2002-10-11,"Aeronautics Research, Science, Exploration Systems, Space Operations, ",FALSE,"",702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/viewall/1326"
NASA’s Lessons Learned
Graph database = isolated data silos -> connected knowledge
‣ Efficient search
‣ Relationships among various areas
Apollo, Space Shuttle, Orion, …
‣ Pattern recognition (clusters, communities, correlations, …)
Example: correlation between corrosion of valves & topics involving batteries
‣ Useful for planning future projects and preventing/solving issues
NASA’s Lessons Learned
What is a Graph?
G = (V, E)
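As a minimal sketch of the definition G = (V, E), a graph can be held as a set of vertices plus a set of edges between them. The vertex names below are made up for illustration and are not taken from the actual NASA graph.

```python
# Vertices and edges of a small illustrative graph (names are invented).
V = {"Lesson 420", "JPL", "Viking Orbiter", "pressure regulator"}
E = {
    ("Lesson 420", "JPL"),
    ("Lesson 420", "Viking Orbiter"),
    ("Lesson 420", "pressure regulator"),
    ("Viking Orbiter", "pressure regulator"),
}


def neighbours(vertex, edges):
    """Return every vertex that shares an (undirected) edge with `vertex`."""
    found = set()
    for a, b in edges:
        if a == vertex:
            found.add(b)
        elif b == vertex:
            found.add(a)
    return found


# Every edge endpoint must be a member of V for G = (V, E) to be well formed.
assert all(a in V and b in V for a, b in E)
print(sorted(neighbours("pressure regulator", E)))
```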
WHY NEO4J?
It is a proper graph database
It is a proper database
Graph-Based Architecture: Knowledge Graph
EXAMPLE
‣ NLP = machine learning tools allowing computers to process - and perhaps understand - human languages
‣ Basic steps
Sentence segmentation
Tokenisation
Lemmatisation
Part of Speech (POS) tagging
Parsing
Named Entity Recognition (NER)
Sentiment analysis
…
Natural Language Processing
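The first two basic steps can be sketched with regular expressions. This is a deliberately naive illustration, not how Stanford CoreNLP or OpenNLP implement them: the function names are made up for this example, and the rule-based splitter would wrongly break after abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenise(sentence):
    """Split into word tokens (hyphens/apostrophes kept inside words) and punctuation."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


text = ("The pressure regulator started leaking. "
        "Likely causes were corrosion or residue.")
for sentence in split_sentences(text):
    print(tokenise(sentence))
```

Later steps (lemmatisation, POS tagging, parsing, NER) need trained models and are better left to one of the toolkits.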
Currently supported toolkits for human language processing
‣ Stanford CoreNLP
‣ developed at Stanford University
‣ fast, robust, production ready
‣ many pre-built models
‣ license: GPL v3+
‣ Apache OpenNLP
‣ developed by volunteers
‣ many pre-built models
‣ license: Apache License v2.0
NLP: Text Processors
‣ Named Entity Recognition (NER) = classification of words into predefined classes
‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country …
‣ Stanford NLP default entities: Person, Location, Date, Organisation, Number, Money, Percentage
‣ Custom NE classes -> training on a large tokenised & labelled corpus
‣ Wikipedia, Wikidata - rich sources of multilingual training data that can be extracted automatically
Named Entity Recognition
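To make the input and output of NER concrete, here is a toy dictionary (gazetteer) lookup. It is a stand-in for illustration only: a statistical NER model learns its classes from a labelled corpus rather than from a fixed list, and the MISSION label and the entries below are assumptions made for this example.

```python
# Hypothetical gazetteer: lower-cased token sequences mapped to an entity class.
GAZETTEER = {
    ("mars", "global", "surveyor"): "MISSION",
    ("viking", "orbiter"): "MISSION",
    ("nasa",): "ORGANISATION",
}


def tag_entities(tokens, gazetteer=GAZETTEER):
    """Greedy longest-match lookup; returns (start, end, label, text) spans."""
    max_len = max(len(key) for key in gazetteer)
    lowered = [t.lower() for t in tokens]
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + size]))
            if label:
                spans.append((i, i + size, label, " ".join(tokens[i:i + size])))
                i += size
                break
        else:
            i += 1  # no entity starts at this token
    return spans


print(tag_entities("Mars Global Surveyor was launched by NASA".split()))
```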
Custom Named Entities based on Wikipedia
NASA use case: identify names of space missions
Training - crawling Wikipedia & identifying relevant information
EXAMPLE
Universal Dependencies: cross-linguistically consistent grammatical relations among words in a sentence
Examples:
‣ amod (adjectival modifier)
Matt likes red wine.
‣ appos (appositional modifier)
Mars Global Surveyor (MGS) was an American robotic spacecraft …
‣ conj (conjunct)
It failed to respond to messages and commands.
‣ …
Universal Dependencies
‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence
Source: http://nlp.stanford.edu:8080/corenlp/process
Either find an efficient representation in some traditional database, or …
Graph-Powered NLP
Graph-Powered NLP
NLP and property graphs: natural fit
… use a property graph!
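A sketch of why annotated text maps naturally onto a property graph: tokens become nodes with properties, and word order and dependencies become typed relationships. The labels used here (Token, NEXT, DEPENDENCY) and the helper names are assumptions for this illustration, not the schema GraphAware actually uses; the dependency annotations for "Matt likes red wine." are written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


def sentence_to_graph(tokens, dependencies):
    """tokens: [(word, pos)]; dependencies: [(head_index, dependent_index, relation)]."""
    nodes = [Node(i, "Token", {"value": w, "pos": p}) for i, (w, p) in enumerate(tokens)]
    # Word order is kept as a chain of NEXT relationships.
    rels = [Relationship(i, i + 1, "NEXT") for i in range(len(tokens) - 1)]
    rels += [Relationship(h, d, "DEPENDENCY", {"type": r}) for h, d, r in dependencies]
    return nodes, rels


def modifiers(nodes, rels, relation):
    """Return (head, dependent) word pairs linked by the given dependency type."""
    return [(nodes[r.start].properties["value"], nodes[r.end].properties["value"])
            for r in rels
            if r.rel_type == "DEPENDENCY" and r.properties["type"] == relation]


tokens = [("Matt", "PROPN"), ("likes", "VERB"), ("red", "ADJ"), ("wine", "NOUN")]
dependencies = [(1, 0, "nsubj"), (1, 3, "obj"), (3, 2, "amod")]
nodes, rels = sentence_to_graph(tokens, dependencies)
print(modifiers(nodes, rels, "amod"))
```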
EXAMPLE
Unsupervised techniques tend to be underestimated, while …
‣ No need for time & money to get massive labeled training datasets
‣ Often faster to train & faster to predict
‣ Unsupervised deep learning
Unsupervised ML Algorithms
PageRank
PageRank = a measure of importance of a web page based on the number and quality of links from other pages
The formula reflects a model of a random surfer.
Source: https://en.wikipedia.org/wiki/PageRank
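The random-surfer model can be sketched as a power iteration: with probability d (the damping factor) the surfer follows a random outgoing link, otherwise jumps to a random page. The function name, the default d = 0.85, the fixed iteration count and the tiny link graph are choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph, page C collects links from three pages and ends up with the highest score, while D, which nobody links to, keeps only the random-jump share.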
Keyword Extraction: TextRank
Keywords = words/phrases that capture the semantic essence of a text
Graph-Based Unsupervised Algorithm:
‣ Construct a graph of word co-occurrences
‣ Assess the importance of words with the PageRank algorithm
‣ Use top 1/3 of words as keyword candidates
‣ Use universal dependencies to construct key phrases
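The first three steps above can be sketched as follows. This is a simplified illustration, not the GraphAware implementation: a small stop-word list stands in for the part-of-speech filter used in the TextRank paper, the window size and iteration count are arbitrary defaults, and the final key-phrase step based on universal dependencies is omitted.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real pipelines filter by POS tag).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "were",
             "or", "from", "that"}


def textrank_keywords(text, window=2, damping=0.85, iterations=30):
    """Rank words by PageRank over a co-occurrence graph; keep the top third."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected co-occurrence graph: link words at most `window` positions apart.
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    score = {w: 1.0 for w in graph}
    for _ in range(iterations):
        score = {w: (1 - damping) + damping * sum(score[n] / len(graph[n])
                                                  for n in graph[w])
                 for w in graph}
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:max(1, len(ranked) // 3)]


print(textrank_keywords(
    "The pressure regulator started leaking. Likely causes were corrosion "
    "of the regulator or residue from propellant migration."))
```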
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona, Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252.
Keyword Extraction: TextRank
Despite its simplicity, TextRank provides state-of-the-art results on a wide range of unstructured texts.
Leveraging universal dependencies allowed us to surpass the precision & recall of the original TextRank paper.
NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
Automatic text summarisation
‣ Abstractive
‣ Extractive
TextRank can be adapted for efficient sentence ranking for extractive summarisation.
Summarisation: TextRank
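A minimal sketch of that adaptation: sentences become graph nodes, edges are weighted by word overlap (normalised by sentence length, as in the TextRank paper), and a weighted PageRank ranks the sentences. The function names, the regex tokeniser and the sample sentences are assumptions made for this example.

```python
import math
import re


def sentence_similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    if len(a) < 2 or len(b) < 2:  # log(1) = 0 would make the denominator vanish
        return 0.0
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=30):
    """Return the sentences ordered from most to least central."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(sentences)
    weight = [[sentence_similarity(tokens[i], tokens[j]) if i != j else 0.0
               for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weight]
    score = [1.0] * n
    for _ in range(iterations):
        # Each sentence passes on its score in proportion to the edge weights.
        score = [(1 - damping) + damping * sum(weight[j][i] / totals[j] * score[j]
                                               for j in range(n) if totals[j] > 0)
                 for i in range(n)]
    order = sorted(range(n), key=lambda i: score[i], reverse=True)
    return [sentences[i] for i in order]


def summarise(sentences, k=2):
    """Keep the k best-ranked sentences, in their original order."""
    top = set(rank_sentences(sentences)[:k])
    return [s for s in sentences if s in top]


docs = ["The regulator valve leaked after the pyro firing.",
        "Corrosion of the regulator valve was the likely cause of the leak.",
        "Lunch was served at noon."]
print(summarise(docs, k=2))
```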
EXAMPLE
ConceptNet 5 = semantic network for understanding the meaning of words
‣ Relational knowledge from MIT’s Open Mind Common Sense project
‣ DBpedia (information from Wikipedia info-boxes)
‣ Wiktionary (free multilingual dictionary)
‣ …
Knowledge Enrichment: ConceptNet 5
Microsoft Concept Graph = semantic network introducing knowledge about concepts
‣ harnessed from billions of web pages and years’ worth of search logs
Expand the knowledge from external or other internal sources.
Knowledge Enrichment
‣ Latent Dirichlet Allocation (LDA) - generative statistical model that describes documents as a probabilistic mixture of a small number of topics
‣ Each topic described by a list of most relevant words
‣ Sample of topics from the NASA dataset
["design", "failure", "test", "result", "flight", "hardware", "mission", "testing", "system", "due"]
["pressure", "system", "cause", "valve", "propellant", "leak", "operation", "shuttle", "space", "gas"]
["space", "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "program"]
Topic Extraction: Latent Dirichlet Allocation
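To illustrate what "generative" means here, the sketch below samples a document the way LDA assumes documents arise: draw a topic mixture from a Dirichlet distribution, then pick a topic and a word for each position. It shows only the forward model; fitting LDA means inferring topics and mixtures from real text, which is not shown. The two topics reuse words from the slide, but the probabilities and the alpha values are invented for this example.

```python
import random

# Two illustrative topics; words come from the slide, probabilities are made up.
TOPICS = [
    {"pressure": 0.3, "valve": 0.3, "propellant": 0.2, "leak": 0.2},
    {"shuttle": 0.3, "crew": 0.3, "safety": 0.2, "astronaut": 0.2},
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Return (topic mixture, words) for one synthetic document."""
    mixture = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Choose a topic according to the document's mixture, then a word from it.
        topic = rng.choices(range(len(topics)), weights=mixture)[0]
        vocab, probs = zip(*topics[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return mixture, words


mixture, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=12,
                                   rng=random.Random(0))
print([round(m, 2) for m in mixture], " ".join(words))
```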
EXAMPLE
‣ Word embeddings = representation of words as multi-dimensional semantic vectors which encode linguistic patterns
‣ Use cases: word sense disambiguation, new distance functions between documents, starting point for further ML (e.g. NN classification)
‣ Word2vec = shallow two-layer neural network model for producing word embeddings
‣ ConceptNet Numberbatch - a set of state-of-the-art word embeddings
Word Embeddings
Word Embeddings: word2vec
Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
Word Embeddings: word2vec
Kusner et al.: http://mkusner.github.io/publications/WMD.pdf
Document distance: min. cumulative distance that all words need to travel
Semantic patterns representable as linear translations:
distance(Oslo -> Norway) similar to distance(Berlin -> Germany)
vec(Germany) - vec(Berlin) + vec(Oslo) ≈ vec(Norway)
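The linear-translation idea can be tried on toy vectors. The 2-D vectors below are hand-made so that the capital-to-country offset is identical for every pair; they are not trained embeddings, where the relation holds only approximately. The function names are likewise chosen for this example.

```python
import math

# Hand-made toy vectors (NOT trained): axis 0 separates capitals from countries,
# axis 1 identifies the country.
VECTORS = {
    "Berlin": [0.0, 1.0], "Germany": [1.0, 1.0],
    "Oslo": [0.0, 2.0], "Norway": [1.0, 2.0],
    "Paris": [0.0, 3.0], "France": [1.0, 3.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors=VECTORS):
    """Solve a : b :: c : ? as the nearest neighbour of vec(b) - vec(a) + vec(c)."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Berlin", "Germany", "Oslo"))
```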
Document Embeddings
Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2
Paragraph Vector (doc2vec): extension of word2vec
The additional paragraph node represents context (topic) of the current document.
Paragraph vectors have the same behaviour towards linear vector translations as word vectors.
Document Embeddings
doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
Document Embeddings
doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
Some of the neural networks applicable to text processing
‣ Shallow networks (word & document embeddings)
‣ Deep Auto-Encoders
‣ Convolutional Neural Networks
‣ Recurrent Neural Networks (LSTMs)
Deep Learning for Text Processing
Self-supervised Auto-Encoders: useful for vector embeddings (images, texts)
DeepLearning4J - Java-based deep learning library
Example of auto-encoder (e.g. stacked RBMs) …
Deep Learning: Auto-encoders
Works well for images, but problematic for texts (sparsity).
Convolutional Neural Networks
Y. Zhang, B. Wallace: arXiv:1510.03820
Classification of documents based on word embeddings and CNN
Deep Learning: Summarisation
S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
In the cited experiments, extractive summarisation (sentence ranking) notably outperforms abstractive summarisation.
Knowledge Graphs are a powerful problem-solving tool
‣ Augmented search
‣ Actionable knowledge
‣ Machine Learning
‣ Chatbots and Question answering systems
‣ Foundational to AI
Conclusion
www.graphaware.com @graph_aware

  • 1. GraphAware® SIGNALS FROM OUTER SPACE Vlasta Kůs, Data Scientist @ GraphAware graphaware.com @graph_aware, @VlastaKus How NASA Benefits from Graph-Powered NLP
  • 2. ‣ Database of learned knowledge across NASA’s programs & projects ‣ Unstructured text with basic metadata ‣ Collected since late 1950s (100s of millions of documents) ‣ Public dataset: ~1600 documents NASA’s Lessons Learned GraphAware®
  • 3. "1406",420,"Roberts, J “, "VO'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)”, "The pressure regulator in the Viking Orbiter Propulsion Subsystem started leaking following a pyro firing that occurred prior to the near-Mars TCM. Likely causes were corrosion or residue from propellant migration or pyro valve blowby, or particulate contamination. Recommendations included using separate regulators for the fuel and oxidizer sides, incorporating a bellows in the pyro valve to eliminate blowby, and adding a isolation valve between the regulator and propellant tank.“, " The micro-scale effects of long-term propellant exposure should be investigated in order to better critique regulator design. “, "JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https:// nen.nasa.gov/web/11/viewall/-/viewall/420" NASA’s Lessons Learned Database GraphAware®
  • 4. “673",1326,"Relvini, Kristine “, "Lessons Learned Not Being Inputted Into Lessons Learned Information System (LLIS) Database”, “", "If you don't document the lessons learned, you loose knowledgeable, shared information and tracking capacity across programs.“, "KSC",2002-10-11,"Aeronautics Research, Science, Exploration Systems, Space Operations, ",FALSE,"", 702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/ viewall/1326" NASA’s Lessons Learned GraphAware®
  • 5. Graph database = isolated data silos -> connected knowledge ‣ Efficient search ‣ Relationships among various areas Apollo, Space Shuttle, Orion, … ‣ Pattern recognition (clusters, communities, correlations, …) Example: correlation between corrosion of valves & topics involving batteries ‣ Useful for planning future projects and preventing/solving issues NASA’s Lessons Learned GraphAware®
  • 6. What is a Graph? GraphAware® G = (V, E)
  • 7. WHY NEO4J? GraphAware® It is a proper graph database It is a proper database
  • 10. ‣ NLP = machine learning tools allowing computers to process - and perhaps understand - human languages ‣ Basic steps Sentence segmentation Tokenisation Lemmatisation Part of Speech (POS) tagging Parsing Named Entities Recognition (NER) Sentiment analysis … Natural Language Processing GraphAware®
  • 11. Currently supported toolkits for human language processing ‣ Stanford CoreNLP ‣ developed at Stanford University ‣ fast, robust, production ready ‣ many pre-built models ‣ license: GPL v3+ ‣ Apache OpenNLP ‣ developed by volunteers ‣ many pre-built models ‣ license: Apache License v2.0 NLP: Text Processors GraphAware®
  • 12. ‣ Named Entity Recognition (NER) = classification of words into predefined classes ‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country … ‣ Stanford NLP default entities: Person, Location, Date, Organisation, Number, Money, Percentage ‣ Custom NE classes -> training on large tokenised & labeled corpus ‣ Wikipedia, Wikidata - rich sources of multilingual training data that can be extracted automatically Named Entity Recognition GraphAware®
  • 13. Custom Named Entities based on Wikipedia GraphAware® NASA use case: identify names of space missions Training - crawling Wikipedia & identifying relevant information
  • 15. Universal Dependencies: cross-linguistically consistent grammatical relations among words in a sentence Examples: ‣ amod (adjectival modifier) Matt likes red wine. ‣ appos (appositional modifier) Mars Global Surveyor (MGS) was an American robotic spacecraft … ‣ conj (conjunct) It failed to respond to messages and commands. ‣ … Universal Dependencies GraphAware®
  • 16. ‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence Source: http://nlp.stanford.edu:8080/corenlp/process Either find an efficient representation in some traditional database, or … Graph-Powered NLP GraphAware®
  • 17. Graph-Powered NLP GraphAware® NLP and property graphs: natural fit … use a property graph!
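To illustrate why annotated text fits a property graph, here is a rough sketch that stores a sentence as nodes and relationships carrying properties. The labels `Token`, `NEXT` and `DEPENDS_ON` and the property names are assumptions made for this example, not the schema GraphAware or Neo4j actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


def sentence_to_graph(tokens, dependencies):
    """Build token nodes, NEXT edges for word order and DEPENDS_ON edges for grammar.

    `dependencies` is a list of (head_index, dependent_index, relation) tuples.
    """
    nodes = [Node(i, "Token", {"text": t, "position": i}) for i, t in enumerate(tokens)]
    rels = [Relationship(i, i + 1, "NEXT") for i in range(len(tokens) - 1)]
    for head, dependent, relation in dependencies:
        rels.append(Relationship(head, dependent, "DEPENDS_ON", {"type": relation}))
    return nodes, rels


# "Matt likes red wine." with the amod relation from the Universal Dependencies slide:
# "red" (index 2) is an adjectival modifier of "wine" (index 3).
tokens = ["Matt", "likes", "red", "wine"]
dependencies = [(3, 2, "amod")]
nodes, rels = sentence_to_graph(tokens, dependencies)
```

Word order and grammatical structure end up as two kinds of relationship over the same nodes, so both can be traversed in a single query.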
  • 19. Unsupervised techniques tend to be underestimated, while … ‣ No need for time & money to get massive labeled training datasets ‣ Often faster to train & faster to predict ‣ Unsupervised deep learning Unsupervised ML Algorithms GraphAware®
  • 20. PageRank GraphAware® PageRank = a measure of importance of a web page based on the quality of links from other pages The formula reflects a model of a random surfer. Source: https://en.wikipedia.org/wiki/PageRank
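The random-surfer model can be sketched as a simple power iteration. This is an illustrative implementation, not the one used in the talk; the damping factor of 0.85 is the conventionally quoted value, and the iteration count and the handling of pages without outgoing links are assumptions of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B and D.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
example_rank = pagerank(example_links)
```

In this toy graph page C collects links from three other pages and ends up with the highest score, while D, which nobody links to, keeps only the baseline share.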
  • 21. Keyword Extraction: TextRank GraphAware® Keywords = words/phrases that capture the semantic essence of a text Graph-Based Unsupervised Algorithm: ‣ Construct a graph of word co-occurrences ‣ Assess the importance of words with the PageRank algorithm ‣ Use top 1/3 of words as keyword candidates ‣ Use universal dependencies to construct key phrases
  • 22. GraphAware® Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona, Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252. Keyword Extraction: TextRank Despite its simplicity, TextRank provides state-of-the-art results on a wide range of unstructured texts. Leveraging universal dependencies allowed us to surpass precision & recall of the original TextRank paper. NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
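The first three TextRank steps listed above (co-occurrence graph, PageRank scoring, top third as candidates) can be sketched as follows. This is a simplified illustration: it assumes the input is an already tokenised and filtered word list, the window size and iteration count are arbitrary choices, and the final step of building key phrases from universal dependencies is left out.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Link words that appear within `window` positions of each other (undirected)."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window + 1]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score words with PageRank-style updates on the undirected graph."""
    score = {w: 1.0 for w in graph}
    for _ in range(iterations):
        score = {
            w: (1.0 - damping)
            + damping * sum(score[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return score


def keyword_candidates(words, window=2):
    """Return the top third of words by score as keyword candidates."""
    graph = cooccurrence_graph(words, window)
    score = rank_words(graph)
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[: max(1, len(ranked) // 3)]


# Hypothetical, pre-filtered word sequence.
sample_words = ["valve", "leak", "valve", "tank", "valve", "pump", "valve", "seal"]
sample_keywords = keyword_candidates(sample_words, window=1)
```

In the sample, "valve" co-occurs with every other word and is therefore the only candidate that survives the top-third cut.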
  • 23. Automatic text summarisation ‣ Abstractive ‣ Extractive TextRank can be adapted for efficient sentence ranking for extractive summarisation. Summarisation: TextRank GraphAware®
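For extractive summarisation, the same ranking idea can be applied to sentences instead of words. The sketch below is an assumption-laden illustration: it uses Jaccard word overlap as the sentence similarity (a simplification swapped in here, not the measure from the TextRank paper or the talk), a naive regex sentence splitter, and a hypothetical sample text.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def summarise(text, n_sentences=1, damping=0.85, iterations=50):
    """Rank sentences on a weighted similarity graph and keep the top ones in order."""
    sentences = split_sentences(text)
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    totals = [sum(row) for row in weights]
    score = [1.0] * n
    for _ in range(iterations):
        score = [
            (1.0 - damping)
            + damping
            * sum(weights[j][i] / totals[j] * score[j] for j in range(n) if totals[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


# Hypothetical text; the last sentence is deliberately off-topic.
sample_text = (
    "The regulator started leaking after the pyro firing. "
    "The leaking regulator was caused by corrosion. "
    "Corrosion of the regulator came from propellant. "
    "Lunch was served at noon."
)
```

Sentences that share vocabulary with many others accumulate score, so the off-topic sentence is ranked last and dropped from short summaries.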
  • 25. ConceptNet 5 = semantic network for understanding the meaning of words ‣ Relational knowledge from MIT’s Open Mind Common Sense project ‣ DBPedia (information from Wikipedia info-boxes) ‣ Wiktionary (free multilingual dictionary) ‣ … Knowledge Enrichment: ConceptNet 5 GraphAware® Microsoft Concept Graph = semantic network introducing knowledge about concepts ‣ harnessed from billions of web pages and years’ worth of search logs
  • 26. Expand the knowledge from external or other internal sources. Knowledge Enrichment GraphAware®
  • 27. ‣ Latent Dirichlet Allocation (LDA) - generative statistical model that describes documents as a probabilistic mixture of a small number of topics ‣ Each topic described by a list of most relevant words ‣ Sample of topics from the NASA dataset ["design", "failure", "test", "result", "flight", "hardware", "mission", "testing", "system", "due"] ["pressure", "system", "cause", "valve", "propellant", "leak", "operation", "shuttle", "space", "gas"] ["space", "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "program"] Topic Extraction: Latent Dirichlet Allocation GraphAware®
  • 29. ‣ Word embeddings = representation of words as multi-dimensional semantic vectors which encode linguistic patterns ‣ Use cases: word sense disambiguation, new distance functions between documents, starting point for further ML (e.g. NN classification) ‣ Word2vec = shallow two-layer neural network model for producing word embeddings ‣ ConceptNet Numberbatch - consists of state-of-the-art word embeddings Word Embeddings GraphAware®
  • 30. Word Embeddings: word2vec GraphAware® Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
  • 31. Word Embeddings: word2vec GraphAware® Kusner et al.: http://mkusner.github.io/publications/WMD.pdf Document distance: min. cumulative distance that all words need to travel Semantic patterns representable as linear translations: distance(Oslo -> Norway) similar to distance(Berlin -> Germany) vec(Germany) - vec(Berlin) + vec(Oslo) ≈ vec(Norway)
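The linear-translation property can be illustrated with a toy example. The 2-D vectors below are hypothetical values picked by hand so that the capital-to-country offset is roughly constant; real word2vec embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Hypothetical 2-D vectors, chosen by hand for illustration only.
vectors = {
    "Berlin": (1.0, 1.0),
    "Germany": (1.0, 3.0),
    "Oslo": (4.0, 1.2),
    "Norway": (4.0, 3.1),
    "Paris": (7.0, 0.9),
    "France": (7.0, 3.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))
```

With these toy vectors, `analogy("Berlin", "Germany", "Oslo", vectors)` yields `"Norway"`: the Berlin-to-Germany offset, applied to Oslo, lands nearest to Norway.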
  • 32. Document Embeddings GraphAware® Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2 Paragraph Vector (doc2vec): extension of word2vec The additional paragraph node represents context (topic) of the current document. Paragraph vectors have the same behaviour towards linear vector translations as word vectors.
  • 33. Document Embeddings GraphAware® doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
  • 34. Document Embeddings GraphAware® doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
  • 35. Some of the neural networks applicable to text processing ‣ Shallow networks (word & document embeddings) ‣ Deep Auto-Encoders ‣ Convolutional Neural Networks ‣ Recurrent Neural Networks (LSTMs) Deep Learning for Text Processing GraphAware®
  • 36. Self-supervised Auto-Encoders: useful for vector embeddings (images, texts) DeepLearning4J - Java-based deep learning library Example of auto-encoder (e.g. stacked RBMs) … Deep Learning: Auto-encoders GraphAware® Works well for images, but problematic for texts (sparsity).
  • 37. Convolutional Neural Networks GraphAware® Y. Zhang, B. Wallace: arXiv:1510.03820 Classification of documents based on word embeddings and CNN
  • 38. Deep Learning: Summarisation GraphAware® S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 39. Deep Learning: Summarisation GraphAware® Extractive summarisation (sentence ranking) notably outperforms abstractive. S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 40. Knowledge Graphs are a powerful problem-solving tool ‣ Augmented search ‣ Actionable knowledge ‣ Machine Learning ‣ Chatbots and Question answering systems ‣ Foundational to AI Conclusion GraphAware®