Mathematical Language Processing via Tree Embeddings
Jack Wang, Andrew Lan, Richard Baraniuk
June 15, 2021
Mathematical Language Is Everywhere
textbooks
academic papers
Wikipedia articles
Difficult to extract and synthesize information from massive content
How to efficiently find relevant mathematical content?
The Mathematical Content Retrieval Problem
Desired: an efficient, automated system to aid indexing, searching, and organizing mathematical content
We focus on formula retrieval:
- Search for and retrieve similar equations, given a query equation
Current search engines lack the ability to search effectively for mathematical content
Example: a query equation from a machine learning textbook
- Search results contain only specific characters that match the input query, but NOT the entire equation
Desired retrieval: entire equations similar to the query, not just matching characters
Our Solution: Formula Representation via Tree Embeddings
A novel framework that learns a good representation of mathematical formulae
Based on the encoder-decoder architecture
● A novel encoding scheme: encode equations as trees
● A novel decoding scheme: generate equations as trees
Pipeline: formula → encoder → formula embedding → decoder → reconstructed formula
Training objective: minimize the reconstruction loss between the input formula and the reconstructed formula
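The idea of treating equations as trees can be illustrated with a minimal Python sketch. It shows one common way to store a formula as an operator tree and to linearize it with a depth-first traversal. The `Node` class, the `preorder` and `depth` helpers, and the example formula are assumptions made for illustration; the slides do not specify the authors' exact tree format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a formula tree: an operator or an operand (hypothetical)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> List[str]:
    """Linearize the tree depth-first, visiting a parent before its children."""
    labels = [node.label]
    for child in node.children:
        labels.extend(preorder(child))
    return labels


def depth(node: Node) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


# Example formula (chosen for illustration only): y = a * x + b
tree = Node("=", [
    Node("y"),
    Node("+", [
        Node("*", [Node("a"), Node("x")]),
        Node("b"),
    ]),
])

print(preorder(tree))  # ['=', 'y', '+', '*', 'a', 'x', 'b']
print(depth(tree))     # 4
```

Unlike a flat character string, the tree makes operator precedence and nesting explicit, which is the kind of structural information a tree-based encoder can exploit.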
Our Solution, part #1: Equation Encoding
Explicitly capture the semantic and syntactic information in an equation
Encoder: GRU
The encoder output is the formula embedding that we will use in the formula retrieval task
After the encoding step
- Decode to recover the input formula tree, using the formula embedding
- Tree beam search to improve reconstruction quality
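The slides do not describe how the tree beam search works, so the sketch below shows only ordinary beam search over a token sequence, written in Python. The `beam_search` function, the `toy_decoder` stand-in, and its probabilities are all hypothetical; a tree variant would expand tree nodes rather than a flat sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    beam_width: int,
    max_len: int,
    end_token: str = "<end>",
) -> List[Tuple[Sequence, float]]:
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == end_token:
                candidates.append((seq, score))  # finished: carry over unchanged
                continue
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + logp))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
        if all(seq and seq[-1] == end_token for seq, _ in beams):
            break
    return beams


def toy_decoder(prefix: Sequence) -> Dict[str, float]:
    """Toy stand-in for a trained decoder (made-up probabilities)."""
    table = {
        (): {"+": math.log(0.6), "*": math.log(0.4)},
        ("+",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("*",): {"x": math.log(0.9), "y": math.log(0.1)},
    }
    return table.get(prefix, {"<end>": 0.0})


best_seq, best_score = beam_search(toy_decoder, beam_width=2, max_len=4)[0]
print(best_seq, round(math.exp(best_score), 2))  # ('*', 'x', '<end>') 0.36
```

With `beam_width=1` the search is greedy and commits to `+` at the first step, ending at probability 0.30. Keeping two hypotheses recovers the higher-probability output, which is the usual reason beam search improves reconstruction quality.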
Formula Retrieval Experiment
- 18 query formulae
- Train (and search) on 770k equations
- Compute the embedding of all equations and queries
- Compute the cosine similarity between all equations and each query
- For each query, choose the top 25 most relevant equations
- Human evaluation: compute % of relevant equations for each query
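The retrieval and evaluation steps above can be sketched in a few lines of Python. The two-dimensional embeddings, the equation names, and the relevance labels below are made up for illustration. In the experiment described here the embeddings come from the trained encoder, k is 25, and relevance is judged by humans.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k corpus items most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda name: cosine_similarity(query, corpus[name]),
        reverse=True,
    )
    return ranked[:k]


def precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of retrieved items that were judged relevant."""
    return sum(1 for name in retrieved if name in relevant) / len(retrieved)


# Hypothetical embeddings for four equations and one query.
corpus = {
    "eq_a": [1.0, 0.0],
    "eq_b": [0.9, 0.1],
    "eq_c": [0.0, 1.0],
    "eq_d": [-1.0, 0.0],
}
query = [1.0, 0.1]

hits = top_k(query, corpus, k=2)
print(hits)                                         # ['eq_b', 'eq_a']
print(precision(hits, relevant={"eq_b", "eq_c"}))   # 0.5
```

For a corpus of hundreds of thousands of equations, the same ranking would normally be computed with vectorized matrix operations rather than a Python loop, but the logic is unchanged.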
Formula Retrieval: Main Results
Our method outperforms the data-driven baseline
Our method achieves state-of-the-art when combined with Approach0
Formula Retrieval: Examples
Our method retrieves structurally and semantically more similar formulae
Learnt Formula Representation: t-SNE Example
Our method learns good representations of different formulae
Summary
Framework to process equations via tree embeddings
- Novel encoder + decoder + beam search
- State-of-the-art formula retrieval performance
- Application to textbook math content search and beyond
Future work
- Joint math and text processing
- Deploy and pilot study at OpenStax
- Open-ended math solution feedback
Zhang et al. Math Operation Embeddings for Open-ended Solution Analysis and Feedback. To appear @EDM’21
https://arxiv.org/abs/2104.12047

Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 

Mathematical Language Processing via Tree Embeddings

  • 1. Mathematical Language Processing via Tree Embeddings Jack Wang, Andrew Lan, Richard Baraniuk June 15, 2021
  • 2. Mathematical Language Is Everywhere textbooks academic papers Wikipedia articles Difficult to extract and synthesize information from massive content How to efficiently find relevant mathematical content?
  • 3. The Mathematical Content Retrieval Problem Difficult to extract and synthesize information from massive content Desired: efficient, automated system to aid indexing, searching, and organizing mathematical contents We focus on formula retrieval: - Search for and retrieve similar equations, given a query equation
  • 4. The Mathematical Content Retrieval Problem Current search engines lack ability to effectively search for mathematical content Machine learning
  • 5. The Mathematical Content Retrieval Problem Current search engines lack ability to effectively search for mathematical content query equation in a machine learning textbook Search results contain only specific characters that match with input query but NOT the entire equation
  • 6. The Mathematical Content Retrieval Problem Desired retrieval
  • 7. Our Solution: Formula Representation via Tree Embeddings A novel framework that learns a good representation of mathematical formulae Based on the encoder-decoder architecture ● A novel encoding scheme: equation as trees ● A novel decoding scheme: generate equation as trees formula encoder decoder Reconstructed formula Formula embedding Minimize this reconstruction loss
  • 8. Our Solution, part #1: Equation Encoding Explicitly capture the semantic and syntactic information in an equation Encoder (GRU)
  • 9. Our Solution, part #1: Equation Encoding Encoder (GRU) The formula embedding that we will use in the formula retrieval task
  • 10. Our Solution, part #1: Equation Encoding Encoder (GRU) After the encoding step - Decode to recover the input formula tree, using the formula embedding - Tree beam search to improve reconstruction quality
  • 11. Formula Retrieval Experiment - 18 query formulae - Train (and search) on 770k equations - Compute the embedding of all equations and queries - Compute the cosine similarity between all equations and each query - For each query, choose the top 25 most relevant equations - Human evaluation: compute % of relevant equations for each query
  • 13. Formula Retrieval: Main Results Our method outperforms the data-driven baseline
  • 14. Formula Retrieval: Main Results Our method achieves state-of-the-art when combined with Approach0
  • 15. Formula Retrieval: Examples Our method retrieves structurally and semantically more similar formulae
  • 16. Learnt Formula Representation: t-SNE Example Our method embeds good representations of different formulae
  • 17. Summary Framework to process equations via tree embeddings - Novel encoder + decoder + beam search - State-of-the-art formula retrieval performance - Application to textbook math content search and beyond Future work - Joint math and text processing - Deploy and pilot study at OpenStax - Open-ended math solution feedback Zhang et al. Math Operation Embeddings for Open-ended Solution Analysis and Feedback. To appear @EDM’21 https://arxiv.org/abs/2104.12047
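
The retrieval steps on slide 11 (embed every equation and query, score by cosine similarity, keep the top 25 per query) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names `cosine_similarity` and `retrieve_top_k`, the zero-vector convention, and the toy two-dimensional embeddings are assumptions made for the example. In the talk, the embeddings come from the trained tree encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_embedding, corpus_embeddings, k=25):
    """Return (index, similarity) pairs for the k most similar corpus entries."""
    scored = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(corpus_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for formula embeddings (hypothetical values).
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.1]
top = retrieve_top_k(query, corpus, k=2)
print([index for index, _ in top])
```

With 770k equations, a vectorized or approximate nearest-neighbour search would likely be preferable to this linear scan; the sketch only shows the ranking logic.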

Editor's notes

  1. Hello, my name is Jack Wang, and today I am going to present my project on mathematical language processing.
  2. The question we focus on here is: how do we efficiently find relevant mathematical content?
  3. In this talk, I will primarily focus on formula retrieval as a representative problem. Namely, given a query equation, we would like to find the most relevant equations. You can think of this as a search engine such as Google, but devoted to mathematical formulae. The ability to search for formulae is useful for a number of education-related applications. For example, a student might want to search for relevant assessment questions given a query question, or search for relevant content in a textbook given a query formula.
  4. Here is a concrete hypothetical example. Say you have a machine learning textbook and you are searching for relevant formulae given a query formula. Current search engines lack the ability to effectively search for formulae.
  5. If you look at the retrieval results, you will find that they contain specific components that match the query but not the entire formula. This observation suggests that we need a method that better captures the semantics of a math formula, so that a search engine can return the most relevant ones.
  6. For example, this retrieval result is a good match to the query.
  7. In this project, we present a solution from a representation learning perspective. The starting point is that we want to learn a good representation of math formulae, so that we can use this representation for the formula retrieval task. Our solution is a novel framework that processes math formulae in the form of trees. This is because every formula can be inherently represented as a tree structure, and by explicitly learning their tree representations, our framework retains the inherent properties of formulae and therefore improves the retrieval performance. More specifically, the framework contains 3 key components. The first component is a tree encoder, which encodes the formula in its tree format into a vector representation, or embedding. The second component is a generator, which reconstructs the input formula tree. The entire pipeline is optimized end-to-end by minimizing the reconstruction error between the input formula tree and the reconstructed formula tree.
  8. As I mentioned earlier, this step allows us to explicitly capture the semantic and syntactic information in an equation.
  9. This embedding is what we will use for the formula retrieval task.
  10. To complete the pipeline, after the encoding step, we use a decoder that reconstructs the input formula in its tree format. To improve reconstruction quality, we also develop a beam search algorithm specifically for tree-structured data. I’ll skip the technical details, but you can find them in the paper.
  11. We validate our framework on a formula retrieval task. In this task, we have 18 query formulae.
  12. Here are some examples of queries. You can see that they are diverse in appearance and subject domain.
  13. First of all, we can observe that our method outperforms the other data-driven baseline on both metrics.
  14. So we develop a new method that combines the strengths of both our method and Approach0. We can see that this method achieves state-of-the-art performance on this formula retrieval task.
  15. We can see that our method retrieves equations that are semantically and structurally more similar to the query, whereas the TangentCFT baseline fails to do so in some cases.
  16. I also want to visualize the learnt formula representations. Here, we choose a small number of formulae from different math topics and plot their two-dimensional t-SNE embeddings. We can see that these embeddings form nice clusters, which indicates that our model learns meaningful representations of these formulae.
  17. And finally, we can apply our method to analyze students' step-wise answers to open-ended math questions. We have a paper that is going to appear at the Educational Data Mining conference later this month. The arXiv version is already out. If you are interested, you are welcome to check out the paper and attend our talk at EDM to learn more. Thanks.
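
Note 7 says that every formula can be represented as a tree. As a rough illustration, the sketch below builds an operator tree for a^2 + b^2 = c^2 and flattens it with a pre-order traversal. The `Node` class, the `power` helper, the choice of formula, and the pre-order linearization are all assumptions made for this example; the talk does not specify how the tree is fed to the GRU encoder, so this should not be read as the authors' encoding scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One symbol of a formula: an operator with children, or a leaf."""
    symbol: str
    children: List["Node"] = field(default_factory=list)


def preorder(node):
    """List the symbols of the tree, parent first, then children left to right."""
    tokens = [node.symbol]
    for child in node.children:
        tokens.extend(preorder(child))
    return tokens


def power(base, exponent):
    """Hypothetical helper: build the subtree for base^exponent."""
    return Node("^", [Node(base), Node(exponent)])


# Operator tree for a^2 + b^2 = c^2.
tree = Node("=", [Node("+", [power("a", "2"), power("b", "2")]), power("c", "2")])
tokens = preorder(tree)
print(tokens)
```

The tree keeps the operator hierarchy explicit: the `+` node owns both squared terms, which a flat character-level match of the kind shown on slide 5 cannot see.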