Kostiantyn Omelianchuk, Oleksandr Skurzhanskyi "Building a state-of-the-art approach to Grammatical Error Correction"
Who are we?
What is Grammarly?
Grammarly’s AI-powered writing
assistant helps you make your
communication clear and effective,
wherever you type.
● Correctness: grammar, spelling, punctuation, writing
mechanics, and updated terminologies
● Clarity: clear writing, conciseness, complexity, and
readability
● Engagement: vocabulary and variety
● Delivery: formality, politeness, and confidence
Grammarly’s
writing assistant
350+
team members across offices
in San Francisco, New York,
Kyiv, and Vancouver
2,000+
institutional and enterprise
clients
20M+
daily active users around
the world
Grammarly by the numbers
Grammarly’s NLP team
Our NLP team consists of 30+ people in the
following roles:
- applied research scientists
- machine learning engineers
- computational linguists
This work
This presentation is based on the paper written by:
- Kostiantyn Omelianchuk
- Oleksandr Skurzhanskyi
- Vitaliy Atrasevych
- Artem Chernodub
What task are we trying to solve?
The goal of the GEC task is to produce a grammatically correct
sentence from a sentence containing mistakes.
Example:
Source: He go at school.
Target: He goes to school.
GEC: Grammatical Error Correction
A Comprehensive Survey of Grammar Error Correction (Yu Wang*, Yuelin Wang*, Jie Liu, and Zhuo Liu)
GEC: History overview
Categorization of research
in GEC
GEC: Data
GEC: Shared task
- CoNLL-2014 (the test set is composed of 50 essays written by non-native English students; metric: M2 score)
- BEA-2019 (new data; the test set consists of 4,477 sentences; metric: ERRANT; see the evaluation sketch below)
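As a hedged illustration of ERRANT-style evaluation, the snippet below extracts edits between a source and a corrected sentence with the open-source errant package; the calls follow its documented Python API, but treat the details as an assumption rather than a description of the official scoring pipeline.

```python
# pip install errant  (also requires a spaCy English model)
import errant

annotator = errant.load("en")
orig = annotator.parse("He go at school .")
cor = annotator.parse("He goes to school .")
for edit in annotator.annotate(orig, cor):
    print(edit)  # edit span, error type, and suggested correction
```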
GEC: Progress
Sequence Tagging
We approach the GEC task as a sequence tagging problem.
In this formulation, for each token in the source sentence, a GEC
system should produce a tag (edit) that represents the correction
operation required for that token.
To solve this problem, we use a non-autoregressive
transformer-based model.
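To make the formulation concrete, here is a minimal sketch of such a tagger, assuming PyTorch and HuggingFace transformers; the encoder name and the tag-vocabulary size are illustrative placeholders, not our production configuration.

```python
# A minimal non-autoregressive GEC tagger: a pretrained transformer
# encoder plus one classification head applied to every token in parallel.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GecTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", num_tags=5000):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # One linear layer scores all tags for each token independently;
        # this is what makes decoding non-autoregressive.
        self.tag_head = nn.Linear(self.encoder.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.tag_head(hidden)  # (batch, seq_len, num_tags)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = GecTagger()
batch = tokenizer(["He go at school ."], return_tensors="pt")
tags = model(batch["input_ids"], batch["attention_mask"]).argmax(dim=-1)
# `tags` holds one edit-tag index per (sub)token, predicted in parallel.
```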
Basic transformations:
- KEEP: keep the current token unchanged
- DELETE: delete the current token
- APPEND_ti: append new token ti
- REPLACE_ti: replace the current token with token ti

g-transformations:
- CASE: change the case of the current token
- MERGE: merge the current and the next token
- SPLIT: split the current token into two
- NOUN NUMBER: convert singular nouns to plurals and vice versa
- VERB FORM: change the form of regular/irregular verbs
Token-level transformations

Source: A ten years old boy go school

Per-token edits (token: tags ⇨ result):
- A: $KEEP ⇨ A
- ten: $MERGE_HYPH ⇨ ten-
- years: $NOUN_NUMBER_SINGULAR, $MERGE_HYPH ⇨ year-
- old: $KEEP ⇨ old
- boy: $KEEP ⇨ boy
- go: $VERB_FORM_VB_VBZ, $APPEND_{to} ⇨ goes to
- school: $KEEP, $APPEND_{.} ⇨ school.

Target: A ten-year-old boy goes to school.

(The model predicts one tag per token per pass, so tokens needing several edits are handled over multiple iterations; see the iterative approach below.)
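To make the tag semantics concrete, here is a toy sketch of applying predicted tags to source tokens; apply_tags is a hypothetical helper that handles only a few of the edits above, not the full transformation set.

```python
# Toy application of token-level tags (illustrative, not the real system).
def apply_tags(tokens, tags):
    out = []
    merge_pending = False
    for token, tag in zip(tokens, tags):
        if tag == "$DELETE":
            continue  # drop the token entirely
        if tag.startswith("$REPLACE_"):
            token = tag[len("$REPLACE_"):]
        if merge_pending and out:
            out[-1] += "-" + token  # $MERGE_HYPH joins with a hyphen
            merge_pending = False
        else:
            out.append(token)
        if tag == "$MERGE_HYPH":
            merge_pending = True
        elif tag.startswith("$APPEND_"):
            out.append(tag[len("$APPEND_"):])
    return out

tokens = ["He", "go", "at", "school"]
tags = ["$KEEP", "$REPLACE_goes", "$REPLACE_to", "$APPEND_."]
print(" ".join(apply_tags(tokens, tags)))  # He goes to school .
```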
Sequence Tagging: Pros
1. Generating tags is a fully independent, easily parallelizable operation.
2. Smaller output vocabulary compared to seq2seq models.
3. Lower memory usage and faster inference compared to
encoder-decoder models.
Sequence Tagging: Cons
1. Generating tags independently relies on the assumption that
errors are independent of each other.
2. Word reordering is tricky for this architecture.
First baselines
Academia vs Industry
- different data (both training/evaluation)
- latency/throughput matters
- aggressive deadlines
Baseline architecture

Input sentence ⇨ Embeddings ⇨ Encoder ⇨ linear output layers (error detection, error replacement, error insertion) ⇨ Predicted tags ⇨ Postprocess ⇨ Corrected sentence
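A minimal sketch of what these output heads could look like on top of a shared encoder; the layer names and the binary detection head are assumptions, not the exact baseline code.

```python
import torch.nn as nn

class BaselineHeads(nn.Module):
    """Three independent linear heads over per-token encoder states."""
    def __init__(self, hidden_size, replace_vocab, insert_vocab):
        super().__init__()
        self.detection = nn.Linear(hidden_size, 2)              # error / no error
        self.replacement = nn.Linear(hidden_size, replace_vocab)
        self.insertion = nn.Linear(hidden_size, insert_vocab)   # token to insert

    def forward(self, hidden_states):  # (batch, seq_len, hidden_size)
        return {
            "detect": self.detection(hidden_states),
            "replace": self.replacement(hidden_states),
            "insert": self.insertion(hidden_states),
        }
```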
Baseline hyperparameters
- Embeddings (trainable random, GloVe, BERT, DistilBERT)
- Encoder (CNN, LSTM, stacked LSTM, pass-through, transformers)
- Output layers (1-2 for insertions, 1 for replacement, 1 for
detection)
- Output vocabulary size (100-50,000)
- Other (dropouts, training schedule, tp/tn ratio, etc.)
Baseline results (M2 score on CoNLL-2014 test):
- Grammarly Spellchecker: 26.8
- Stacked LSTM (vocab_size=1000): 30.5
- Stacked LSTM (vocab_size=1000; + g-transformations): 35.6
- Stacked LSTM (vocab_size=1000; + g-transformations; + BERT embeddings): 46.0
- Academic SOTA (single model): 61.3
Insights
- Increasing the size of the output vocabulary does not help
- Adding BERT embeddings helps a lot
- Training BERT with a small batch_size failed (did not converge);
training with bigger batches required gradient accumulation (sketched below)
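For reference, gradient accumulation can be sketched as follows (PyTorch-style; the callable arguments are stand-ins, not our training code):

```python
def train_with_accumulation(compute_loss, loader, optimizer, accum_steps=8):
    """Emulate a big effective batch (accum_steps * per-step batch size)
    by accumulating gradients over several small batches per step."""
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = compute_loss(batch) / accum_steps  # scale so gradients average
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```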
Sudden competitor
PIE paper
Our SOTA on NUCLE: 46
Transformers
Transformer
Attention Is All You Need
(Vaswani et al. 2017)
Multihead attention
BERT-like architecture
BERT (Devlin et al. 2018)
Training
transformers
huggingface / transformers
Training transformers.
First try
Transformers recipe
Things that made our transformers work:
● small learning rate (1e-5)
● big batch size (at least 128 sentences)
or gradient accumulation
● freeze the BERT encoder first and train only the linear layer,
then train jointly (see the sketch below)
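Reusing the GecTagger sketch from earlier, the freeze-then-joint part of this recipe might look roughly like the following; it illustrates the idea, not our exact training code.

```python
from torch.optim import Adam

def set_encoder_trainable(model, trainable):
    for param in model.encoder.parameters():
        param.requires_grad = trainable

# Phase 1: freeze the BERT encoder, train only the linear tag head.
set_encoder_trainable(model, False)
optimizer = Adam(model.tag_head.parameters(), lr=1e-3)
# ... train until the head converges ...

# Phase 2: unfreeze and fine-tune jointly with a small learning rate,
# combined with a big batch (128+ sentences) or gradient accumulation.
set_encoder_trainable(model, True)
optimizer = Adam(model.parameters(), lr=1e-5)
```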
BERTology works (on CoNLL-2014 test)
- LSTM: 35.6
- LSTM + BERT embeddings: 46.0
- DistilBERT: 52.8
- BERT-base: 57.3
Additional tricks
Iterative Approach
Original: A ten years old boy go school (zero corrections)
Iteration 1: A ten-years old boy goes school (2 total corrections)
Iteration 2: A ten-year-old boy goes to school (5 total corrections)
Iteration 3: A ten-year-old boy goes to school. (6 total corrections)
GECToR model: iterative pipeline

Input sentence ⇨ Encoder (transformer-based) ⇨ error detection + error correction linear layers ⇨ Predicted tags ⇨ Postprocess ⇨ Corrected sentence (repeat N times)
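In pseudocode, the loop could be sketched like this; predict_tags stands in for the encoder plus tagging heads, and apply_tags is the toy helper from earlier.

```python
def correct(sentence, predict_tags, apply_tags, max_iter=3):
    """Re-tag and re-apply edits until the model predicts only $KEEP
    tags or the iteration budget is exhausted."""
    tokens = sentence.split()
    for _ in range(max_iter):
        tags = predict_tags(tokens)
        if all(tag == "$KEEP" for tag in tags):
            break  # nothing left to correct
        tokens = apply_tags(tokens, tags)
    return " ".join(tokens)
```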
Staged training
● Training is split into N stages
● Each stage has its own data
● Each stage has its own hyperparameters
● At each stage, the model is initialized with the best weights
from the previous stage
Training stages

I. Pre-training on synthetic errorful sentences as in (Awasthi et al., 2019)
II. Fine-tuning on errorful-only sentences (new!)
III. Fine-tuning on a subset of errorful and error-free sentences as in (Kiyono et al., 2019)

Dataset       | # sentences | % errorful sentences
PIE-synthetic | 9,000,000   | 100.0%
Lang-8        | 947,344     | 52.5%
NUCLE         | 56,958      | 38.0%
FCE           | 34,490      | 62.4%
W&I+LOCNESS   | 34,304      | 67.3%
Training details
● Adam optimizer (Kingma and Ba, 2015)
● Early stopping after 3 epochs (of 10K updates each)
without improvement
● Epochs & batch sizes:
○ Stage I (pretraining): batch size=256, 20 epochs
○ Stages II, III (finetuning): batch size=128, 2-3 epochs
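Collected as a config sketch (the numbers come from the bullets above; the dictionary layout itself is only an illustration):

```python
TRAINING_CONFIG = {
    "optimizer": "Adam",  # Kingma and Ba, 2015
    "early_stopping": {
        "patience_epochs": 3,         # stop after 3 epochs ...
        "updates_per_epoch": 10_000,  # ... of 10K updates each
    },                                # without improvement
    "stage_I_pretraining":  {"batch_size": 256, "epochs": 20},
    "stage_II_finetuning":  {"batch_size": 128, "epochs": "2-3"},
    "stage_III_finetuning": {"batch_size": 128, "epochs": "2-3"},
}
```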
Training stage    | CoNLL-2014 (test)*, F0.5 | BEA-2019 (dev)*, F0.5
I                 | 49.9                     | 33.2
II                | 59.7                     | 44.4
III               | 62.5                     | 50.3
III + Inf. tweaks | 65.3                     | 55.5

* Results are given for GECToR (XLNet).
Inference tweaks 1
By increasing the probability of the $KEEP tag, we can force the model
to make only confident edits.
In this way, we increase precision at the cost of recall.
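A minimal sketch of this tweak; the tensor layout, keep-tag index, and bias value are illustrative.

```python
import torch

def biased_tag_prediction(tag_probs, keep_index, keep_bias=0.2):
    """Add a constant confidence bias to the $KEEP class so that only
    confident edits survive the argmax (precision up, recall down)."""
    probs = tag_probs.clone()            # (batch, seq_len, num_tags)
    probs[..., keep_index] += keep_bias  # favor "do nothing"
    return probs.argmax(dim=-1)          # one tag index per token
```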
Inference tweaks 2
We also compute the minimum probability of the incorrect class across
all tokens in the sentence.
This value (min_error_probability) should be higher than a threshold in
order to run the next iteration.
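Continuing the sketch above, the sentence-level check could look like this, following the slide's description literally (names illustrative):

```python
def should_run_next_iteration(error_probs, min_error_probability=0.5):
    """error_probs: per-token probability of the 'incorrect' class from
    the error detection head. Run another correction iteration only if
    the sentence-level value clears the threshold."""
    return error_probs.min().item() > min_error_probability
```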
BERT family
Varying encoders from pretrained transformers

Encoder | CoNLL-2014 (test): P / R / F0.5 | BEA-2019 (test): P / R / F0.5
LSTM**  | 51.6 / 15.3 / 35.0              | - / - / -
ALBERT  | 59.5 / 31.0 / 50.3              | 43.8 / 22.3 / 36.7
BERT    | 65.6 / 36.9 / 56.8              | 48.3 / 29.0 / 42.6
GPT-2   | 61.0 / 6.3 / 22.2               | 44.5 / 5.0 / 17.2
RoBERTa | 67.5 / 38.3 / 58.6              | 50.3 / 30.5 / 44.5
XLNet   | 64.6 / 42.6 / 58.6              | 47.1 / 34.2 / 43.8

* Training was performed on data from training stage II only. ** Baseline.
Model inference time
Speed comparison
● NVIDIA Tesla V100
● CoNLL-2014 (test)
● single model
● batch size=128
Final results
Results
Big picture
P.S. Writing paper
- It took us approximately 2 months with 4 authors to write 4 pages
- It's better to start as early as possible
Support your code
Thank you!
Links
● The paper
● The code
● GEC survey