| 1
Topic Pages
from articles to answers..
Deep Kayal, Elsevier
4.7.19
| 2
Elsevier!
So much more!
- Entity recognition and linking
- Text summarization
- Question answering
- Image understanding
- Ontology creation/alignment
- Knowledge graph creation
- User representation and
understanding
- Recommendations
- Search
- …
| 3
Working with Amsterdam Data Science
It’s been a brilliant journey so far!
• A successful internship program for 3 years
now
- With about 30 graduates
- More than 10 publications
- And 5 hires
• An AI lab with VU and UvA
- 3 PhD students and 2 post-doctoral
researchers
- Helping us with themes around information
extraction and search
• Inspiring others about Amsterdam as an attractive data science hub
| 4
???!!
W.T.H!
Talking about finding information…
| 5
Enter Topic Pages!
Definition
More info
Other related
concepts
| 6
Breaking down the problem
• The intention was to build a scientific encyclopedia
- Automatically
- From peer-reviewed, citable content
• An encyclopedia provides well-structured and meaningful information about concepts
- So we need to have a database of concepts, at the least
- We need to find the concepts in free text
- And, finally, show the region of text where the concept was found, if it is meaningful
| 7
Step 1: Tag the content and find candidates
• The first step was to tag all of the incoming textual data using pre-defined concepts
- At Elsevier, we have a large, general-purpose, semi-automatically made taxonomy, called
Omniscience
- It combines and extends several existing taxonomies such as EMMeT, MeSH, Reaxys, etc.
| 8
Step 1: Tag the content and find candidates…
• So we know what we have to tag text with..
- We still need to figure out a way to do the tagging
• We’ve developed a state-of-the-art tagging system, the Fingerprint engine (FPE), which uses NLP-driven rules to impose an external taxonomy on incoming text
- So, given a piece of text
- And, say, the branch of Omniscience dealing with chemistry concepts, it gives you annotations, each of which corresponds to a concept in the taxonomy
• Finally, every sentence which contains an annotation is a
candidate which can possibly be displayed on the topic page
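The tagging and candidate-selection step can be illustrated with a minimal Python sketch. It is only a stand-in: the actual Fingerprint engine and its NLP-driven rules are not shown in the talk, so the toy taxonomy (`TOY_TAXONOMY`), the concept IDs, the naive sentence splitter and the function names below are all hypothetical, and plain whole-word string matching replaces the real rules.

```python
import re
from dataclasses import dataclass

# Hypothetical toy taxonomy: surface form -> concept ID.
# The real Omniscience taxonomy is far larger and is not reproduced here.
TOY_TAXONOMY = {
    "amygdala": "concept:amygdala",
    "locus coeruleus": "concept:locus_coeruleus",
}


@dataclass(frozen=True)
class Annotation:
    concept_id: str
    start: int  # character offset of the match within the sentence
    end: int


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence, taxonomy):
    """Return one annotation per case-insensitive, whole-word taxonomy match."""
    annotations = []
    for surface, concept_id in taxonomy.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            annotations.append(Annotation(concept_id, match.start(), match.end()))
    return annotations


def find_candidates(text, taxonomy):
    """Keep every sentence that carries at least one annotation."""
    candidates = []
    for sentence in split_sentences(text):
        annotations = tag_sentence(sentence, taxonomy)
        if annotations:
            candidates.append((sentence, annotations))
    return candidates
```

Calling `find_candidates(document_text, TOY_TAXONOMY)` returns the annotated sentences of one document; sentences without any taxonomy match are dropped.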
| 9
For a single document, we get…
Candidates:
1) During adolescence, considerable social and biological
changes occur that interact with functional brain maturation,
some of which are sex-specific.
2) The amygdala is one brain area that has displayed sexual
dimorphism, specifically in socio-affective (superficial amygdala
[SFA]), stress (centromedial amygdala [CMA]), and learning and
memory (basolateral amygdala [BLA]) processing.
3) The amygdala has also been implicated in mood and anxiety
disorders which display sex-specific features, most prominently
observed during adolescence.
4) Using functional magnetic resonance imaging (fMRI), the
present study examined the interaction of age and sex on resting
state functional connectivity (RSFC) of amygdala sub-regions,
BLA and SFA, in a sample of healthy adolescents between the
ages 10 and 16 years (n = 122, 71 boys).
…
| 10
And aggregating over concepts…
• As an example, for amygdala we observe about 10k candidate sentences
- In mice, which do not form pair bonds, OTR in the medial amygdala and V1aR in the lateral septum are
essential for individual discrimination.272,289
- To determine the sites of action in the brain, DCS was microinjected into the NAcc, the amygdala, and the
caudate putamen.
- The central amygdala, however, is viewed as an important output region.
- The central amygdala then orchestrates responses appropriate to cope with the detected biologically-significant
event.
- The amygdala also innervates the locus coeruleus allowing emotional pain and physical stressors of withdrawal
to trigger noradrenergic (norepinephrine) (fight-or-flight) responses.
- …
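Aggregating candidates over concepts amounts to building an inverted index from concept to sentences. The sketch below assumes a hypothetical input shape (document ID plus a list of sentences with their concept IDs); the talk does not describe the actual storage format.

```python
from collections import defaultdict


def aggregate_by_concept(documents):
    """Group candidate sentences by concept.

    documents: iterable of (doc_id, [(sentence, [concept_id, ...]), ...]).
    Returns a dict mapping concept_id -> list of (doc_id, sentence).
    """
    index = defaultdict(list)
    for doc_id, candidates in documents:
        for sentence, concept_ids in candidates:
            # A sentence counts once per concept, even if tagged several times.
            for concept_id in set(concept_ids):
                index[concept_id].append((doc_id, sentence))
    return index
```

With such an index, the roughly 10k candidate sentences mentioned for amygdala would simply be the list stored under that concept's key.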
| 11
Step 2: Make a training set to train machine learning algorithms
• Once we had sentences, we needed to select the good ones for definitions and snippets
• A major challenge was the lack of training data
- Remember that this is highly specific information, pertaining to highly evolved domains of science
- Training data must be manually curated by subject-matter experts who know the field
- There are a lot of sentences to label!
• To collect data we devised a stratification:
- Hearst patterns (e.g., "is defined by", "is a")
- Which section did it come from?
- Sentence length
- Presence of other concepts
- Similarity of the main concept to other concepts
- Similarity to DBPedia definitions
- …
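A few of the stratification signals above can be sketched in Python. This is an illustration under assumptions: the length thresholds, the exact regular expression and the function names are hypothetical, and the similarity-based signals are left out because the talk does not say how they were computed.

```python
import re
from collections import defaultdict

# Definitional cues named on the slide: "is defined by" and "is a" (or "is an").
HEARST_PATTERN = re.compile(r"\b(?:is defined by|is an?)\b", re.IGNORECASE)


def length_bucket(n_tokens):
    """Bucket a whitespace token count; the thresholds are illustrative."""
    if n_tokens < 15:
        return "short"
    if n_tokens < 40:
        return "medium"
    return "long"


def stratum_key(sentence, section, other_concept_count):
    """Map a candidate sentence to a stratum made of simple, cheap signals."""
    return (
        bool(HEARST_PATTERN.search(sentence)),  # contains a definitional cue?
        section.lower(),                        # section the sentence came from
        length_bucket(len(sentence.split())),   # sentence length
        other_concept_count > 0,                # other concepts present?
    )


def group_into_strata(candidates):
    """candidates: iterable of (sentence, section, other_concept_count)."""
    strata = defaultdict(list)
    for sentence, section, other_count in candidates:
        strata[stratum_key(sentence, section, other_count)].append(sentence)
    return strata
```

Sampling from each stratum, rather than uniformly from all candidates, keeps rare but promising sentence types in front of the annotators.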
| 12
Active learning to gather data
[Figure: active-learning loop diagram with the labels Unlabeled Candidate Sentences; Learning Algorithm; Reinforcement (Q-) Learning; SME; Choose next; Predict; Train; Feedback; Label]
• Active learning is an efficient way to get the most informative training data out of the entire unlabeled set
- The learning algorithm is an LSTM network with a linear SVM as the final layer
- We use Q-learning to select the next sample out of the pre-computed strata
- The selector gets a reward if it picks a sentence which the classifier thinks is a good candidate, but the human annotator marks as bad
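The reward rule can be made concrete with a small Python sketch. It is a deliberate simplification and not the system from the talk: it uses a single-state, epsilon-greedy Q-update (effectively a bandit over strata) with no discounting, the LSTM classifier is replaced by a boolean `predicted_good` passed in by the caller, and the class name, learning rate and exploration rate are all hypothetical.

```python
import random


class StratumSelector:
    """Epsilon-greedy, single-state Q-learning over strata (illustrative only)."""

    def __init__(self, strata, alpha=0.5, epsilon=0.1, seed=0):
        self.q = {stratum: 0.0 for stratum in strata}  # one Q-value per stratum
        self.alpha = alpha      # learning rate (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.rng = random.Random(seed)

    def choose(self):
        """Pick the stratum to sample the next sentence from."""
        names = sorted(self.q)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)   # explore
        return max(names, key=self.q.get)   # exploit the highest Q-value

    def update(self, stratum, predicted_good, human_good):
        """Reward 1 when the classifier says 'good' but the annotator says 'bad'."""
        reward = 1.0 if (predicted_good and not human_good) else 0.0
        self.q[stratum] += self.alpha * (reward - self.q[stratum])
        return reward
```

In a labeling loop, `choose()` picks a stratum, a sentence from it goes to the subject-matter expert, and `update()` is called with the classifier's prediction and the expert's label. Strata where the classifier is confidently wrong then get sampled more often.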
| 13
Training set…
Is good definition? | Concept | Definition
1 | Massively Parallel Processing | Massively parallel processing is a means of crunching huge amounts of data by distributing the processing over hundreds or thousands of processors, which might be running in the same box or in separate, distantly located computers.
0 | Software Adaption | Software adaptation is a remarkably complex phenomenon and it must be and will be studied for some time.
1 | Hash tables | Hash tables are one of the most basic data structures used to provide fast access and compact storage for sparse data.
0 | Computational space | Computational space is an imagery of the two prior spaces that resides temporarily in the magnetic, semiconductor locations during the emulation and execution phases of the problems encountered in dealing with real-space scenarios.
1 | Flip-flops | Flip-flops are the principal memory circuits that will store past values and make them available when called for.
0 | Probabilistic sampling | Probabilistic sampling is when there is a well-formed population from which you are sampling.
| 14
Step 3: Machine Learning for Definition Classification
[Figure: results on a public dataset]
| 15
Overall
[Figure: pipeline diagram with the labels Content; Tagger; Annotated candidates; ML model; Definition; Snippet; Topic Pages!; Improve quality]
| 16
Impact
[Figure: Traffic and Conversions from Prototype 2016-2017. Series: Prototype SD traffic, Prototype Google traffic, Prototype SD conversions, Prototype Google conversions. Axes: Conversions; Pageviews/Traffic.]
| 17
Summary
• Topic pages are a freely available resource
• Great for users who want to find out more about what they’re reading, or learn about concepts previously unknown to them
• We use ensemble machine learning algorithms
• Deployed on Apache Spark clusters
• To continuously improve the quality of these pages
• Which keeps readers engaged
• And drives incoming traffic from search engines
| 18
Thanks! Bedankt!
Come talk to us in person if you’re not feeling too tired or shy! 
Or go to:
https://www.elsevier.com/about/careers

Topic Pages. From articles to answers.

  • 1. | 1 Topic Pages from articles to answers.. Deep Kayal, Elsevier 4.7.19
  • 2. | 2 Elsevier! So much more! - Entity recognition and linking - Text summarization - Question answering - Image understanding - Ontology creation/alignment - Knowledge graph creation - User representation and understanding - Recommendations - Search - …
  • 3. | 3 Working with Amsterdam Data Science It’s been a brilliant journey so far! • A successful internship program for 3 years now - With about 30 graduates - More than 10 publications - And 5 hires • An AI lab VU and UVA - 3 PhD students and 2 post-doctoral researchers - Helping us with themes around information extraction and search • Inspiring others about Amsterdam as attractive Data Science Hub
  • 4. | 4 ???!! W.T.H! Talking about finding information…
  • 5. | 5 Enter Topic Pages! Definition More info Other related concepts
  • 6. | 6 Breaking down the problem • The intention was to build a scientific encyclopedia - Automatically - From peer-reviewed, citable content • An encyclopedia provides well-structured and meaningful information about concepts - So we need to have a database of concepts, at the least - We need to find the concepts in free text - And, finally, show the region of text where the concept was found, if it is meaningful
  • 7. | 7 Step 1: Tag the content and find candidates • The first step was to tag all of the incoming textual data using pre-defined concepts - At Elsevier, we have a large, general-purpose, semi-automatically made taxonomy, called Omniscience - It combines and extends several existing taxonomies such as EMMeT, MeSH, Reaxys, etc.
  • 8. | 8 Step 1: Tag the content and find candidates… • So we know what we have to tag text with.. - We still need to figure out a way to do the tagging • We’ve developed a state-of-the-art tagging system, which we call the Fingerprint Engine (FPE); it uses NLP-driven rules to impose an external taxonomy on incoming text - So, given a piece of text - And, say, the branch of Omniscience dealing with chemistry concepts, it gives you annotations, each of which corresponds to a concept in the taxonomy • Finally, every sentence which contains an annotation is a candidate which can possibly be displayed on the topic page
  • 9. | 9 For a single document, we get… Candidates: 1) During adolescence, considerable social and biological changes occur that interact with functional brain maturation, some of which are sex-specific. 2) The amygdala is one brain area that has displayed sexual dimorphism, specifically in socio-affective (superficial amygdala [SFA]), stress (centromedial amygdala [CMA]), and learning and memory (basolateral amygdala [BLA]) processing. 3) The amygdala has also been implicated in mood and anxiety disorders which display sex-specific features, most prominently observed during adolescence. 4) Using functional magnetic resonance imaging (fMRI), the present study examined the interaction of age and sex on resting state functional connectivity (RSFC) of amygdala sub-regions, BLA and SFA, in a sample of healthy adolescents between the ages 10 and 16 years (n = 122, 71 boys). …
  • 10. | 10 And aggregating over concepts… • As an example, for amygdala we observe about 10k candidate sentences - In mice, which do not form pair bonds, OTR in the medial amygdala and V1aR in the lateral septum are essential for individual discrimination.272,289 - To determine the sites of action in the brain, DCS was microinjected into the NAcc, the amygdala, and the caudate putamen. - The central amygdala, however, is viewed as an important output region. - The central amygdala then orchestrates responses appropriate to cope with the detected biologically-significant event. - The amygdala also innervates the locus coeruleus allowing emotional pain and physical stressors of withdrawal to trigger noradrenergic (norepinephrine) (fight-or-flight) responses. - …
  • 11. | 11 Step 2: Make a training set to train machine learning algorithms • Once we had sentences, we needed to select the good ones for definitions and snippets • A major challenge was the lack of training data - Remember that this is highly specific information, pertaining to highly evolved domains of science - Training data must be manually curated by subject-matter experts who know the field - There are a lot of sentences to label! • To collect data we devised a stratification: - Hearst patterns (", i.e.,", "is defined by", "is a") - Which section did it come from? - Sentence length - Presence of other concepts - Similarity of the main concept to other concepts - Similarity to DBPedia definitions - …
  • 12. | 12 Active learning to gather data [Diagram labels: Unlabeled Candidate Sentences; Learning Algorithm; Reinforcement (Q-) Learning; SME; Choose next; Predict; Train; Feedback; Label] • Active Learning is an efficient way to get the most informative training data out of the entire unlabeled set - The learning algorithm is an LSTM network with a linear SVM as the final layer - And we use Q-learning to select the next sample out of the pre-computed strata - Such that it gets a reward if it selects a sentence which the classifier thinks is a good candidate, but the human annotator marks as bad
  • 13. | 13 Training set…
       Is good definition? | Concept | Definition
       1 | Massively Parallel Processing | Massively parallel processing is a means of crunching huge amounts of data by distributing the processing over hundreds or thousands of processors, which might be running in the same box or in separate, distantly located computers.
       0 | Software Adaption | Software adaptation is a remarkably complex phenomenon and it must be and will be studied for some time.
       1 | Hash tables | Hash tables are one of the most basic data structures used to provide fast access and compact storage for sparse data.
       0 | Computational space | Computational space is an imagery of the two prior spaces that resides temporarily in the magnetic, semiconductor locations during the emulation and execution phases of the problems encountered in dealing with real-space scenarios.
       1 | Flip-flops | Flip-flops are the principal memory circuits that will store past values and make them available when called for.
       0 | Probabilistic sampling | Probabilistic sampling is when there is a well-formed population from which you are sampling.
  • 14. | 14 Step 3: Machine Learning for Definition Classification Results on a public dataset
  • 16. | 16 Impact [Chart: Traffic and Conversions from Prototype, 2016-2017. Series: Prototype SD traffic, Prototype Google traffic, Prototype SD conversions, Prototype Google conversions. Axes: Conversions; Pageviews/Traffic.]
  • 17. | 17 Summary • Topic pages are a freely available resource • Great for users who want to find out more about what they’re reading or know about erstwhile unknown-to-them concepts • We use ensemble machine learning algorithms • Deployed on Apache Spark clusters • To continuously improve the quality of these pages • Which keeps readers engaged • And drives incoming traffic from search engines
  • 18. | 18 Thanks! Bedankt! Come talk to us in person if you’re not feeling too tired or shy! Or go to: https://www.elsevier.com/about/careers
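
The candidate-finding step (slide 8) and the stratification signals (slide 11) can be illustrated with a small sketch. This is not the Fingerprint Engine: the slides do not describe its rules, so the code below swaps in plain string matching against a tiny made-up concept list. `CONCEPTS`, `DEFINITION_CUES`, `find_candidates`, and `strata_features` are all hypothetical names, and only three of the listed signals (Hearst-style cue, sentence length, presence of other concepts) are covered.

```python
import re

# Hypothetical mini-taxonomy standing in for a branch of Omniscience.
CONCEPTS = ["amygdala", "locus coeruleus", "hash table"]

# Hearst-style definition cues named on slide 11 ("i.e.", "is defined by", "is a").
DEFINITION_CUES = re.compile(r"\bi\.e\.|\bis defined by\b|\bis an?\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter; real pipelines would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_concepts(sentence):
    """Return the concepts whose surface form (optionally plural) occurs in the sentence."""
    lowered = sentence.lower()
    return [c for c in CONCEPTS
            if re.search(r"\b" + re.escape(c) + r"s?\b", lowered)]


def find_candidates(text):
    """Every sentence containing at least one annotation is a candidate."""
    result = []
    for sentence in split_sentences(text):
        concepts = find_concepts(sentence)
        if concepts:
            result.append((sentence, concepts))
    return result


def strata_features(sentence, concept):
    """A few of the stratification signals, computed for one (sentence, concept) pair."""
    others = [c for c in find_concepts(sentence) if c != concept]
    return {
        "has_definition_cue": bool(DEFINITION_CUES.search(sentence)),
        "n_tokens": len(sentence.split()),
        "n_other_concepts": len(others),
    }


if __name__ == "__main__":
    doc = ("The amygdala is a brain region involved in emotion. "
           "It was raining. "
           "The amygdala also innervates the locus coeruleus.")
    for sent, concepts in find_candidates(doc):
        print(concepts, strata_features(sent, concepts[0]))
```

Grouping candidates by these feature values gives the strata from which sentences are later drawn for labeling.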

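The reward rule on slide 12 can be sketched as follows. This is a simplified illustration under stated assumptions, not the system described in the talk: the LSTM/SVM classifier and the expert annotator are reduced to boolean inputs, and the Q-learning is collapsed to a single state (closer to a bandit update than full Q-learning), with one value per stratum. `StratumSelector`, `alpha`, and `epsilon` are hypothetical names and parameters.

```python
import random


def reward(classifier_says_good, annotator_says_good):
    """1.0 when the classifier predicts 'good' but the expert marks the sentence 'bad'."""
    return 1.0 if classifier_says_good and not annotator_says_good else 0.0


class StratumSelector:
    """Keeps one Q-value per stratum and picks the next stratum to sample from."""

    def __init__(self, strata, alpha=0.5, epsilon=0.1, seed=0):
        self.q = {name: 0.0 for name in strata}
        self.alpha = alpha      # learning rate (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def choose(self):
        """Epsilon-greedy choice: mostly the best stratum, sometimes a random one."""
        names = sorted(self.q)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)
        return max(names, key=self.q.get)

    def update(self, stratum, r):
        """Single-state Q update: move the estimate toward the observed reward."""
        self.q[stratum] += self.alpha * (r - self.q[stratum])


if __name__ == "__main__":
    selector = StratumSelector(["has_cue", "no_cue"], epsilon=0.0)
    # The classifier liked a 'no_cue' sentence, the expert rejected it: informative sample.
    selector.update("no_cue", reward(True, False))
    print(selector.q, selector.choose())
```

Rewarding disagreements steers labeling effort toward strata where the current classifier is confidently wrong, which is where a new label is most informative.
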
Editor's notes

  1. https://www.sciencedirect.com/topics/neuroscience/amygdala
  2. http://data.elsevier.com/vocabulary/browse/OmniScience
  3. http://data.elsevier.com/vocabulary/browse/OmniScience
  4. http://data.elsevier.com/vocabulary/browse/OmniScience