The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by looking beyond the concrete events and characters. Specifically, the narrative is condensed into a single statement that mentions none of the characters in the original text, which demands a language model capable of comprehending connotative morality and exhibiting commonsense reasoning. The “pre-training + fine-tuning” paradigm is widely embraced in neural language models. In this paper, we propose an intermediary phase to establish an improved paradigm of “pre-training + further pre-training + fine-tuning”. Further pre-training generally refers to continual learning on task-specific or domain-relevant corpora before a model is applied to target tasks, and aims to bridge the gap in data distribution between the pre-training and fine-tuning phases. Our work is based on a Chinese dataset named STORAL-ZH that consists of 4k human-written story-moral pairs. Furthermore, we design a two-step process of domain-adaptive pre-training in the intermediary phase. The first step relies on a newly collected Chinese dataset of Confucian moral culture. The second step builds on the Chinese version of a frequently used commonsense knowledge graph (i.e., ATOMIC) to enrich the backbone model with inferential knowledge beyond morality. Comparisons with several strong models, including BERT-base, RoBERTa-base and T5-base, on two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.
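As a toy illustration of why an intermediary further pre-training phase helps (not the paper's actual method, which continues training transformer models on domain corpora), the sketch below continues training a tiny add-one-smoothed bigram language model on a hypothetical domain corpus and checks that domain-style text becomes more probable afterwards:

```python
import math
from collections import Counter

def train_bigram(corpus, counts=None):
    """Count bigrams; passing in existing counts continues (further pre-trains) the model."""
    counts = Counter() if counts is None else counts
    for sent in corpus:
        toks = ["<s>"] + sent.split()
        for a, b in zip(toks, toks[1:]):
            counts[(a, b)] += 1
    return counts

def log_prob(counts, sent, vocab_size):
    """Add-one-smoothed log-probability of a sentence under the bigram counts."""
    unigram = Counter()
    for (a, _), c in counts.items():
        unigram[a] += c
    lp = 0.0
    toks = ["<s>"] + sent.split()
    for a, b in zip(toks, toks[1:]):
        lp += math.log((counts[(a, b)] + 1) / (unigram[a] + vocab_size))
    return lp

general = ["the cat sat", "the dog ran"]                      # phase 1: generic pre-training data
moral = ["the sage values virtue", "virtue guides the sage"]  # phase 2: domain corpus (hypothetical)

pretrained = train_bigram(general)
further = train_bigram(moral, Counter(pretrained))  # further pre-training on domain text

test_sent = "the sage values virtue"
assert log_prob(further, test_sent, 20) > log_prob(pretrained, test_sent, 20)
```

The same distribution-gap argument motivates the paper's two-step domain-adaptive phase: each step moves the model's statistics closer to the target data before fine-tuning begins.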
May 2024 - Top 10 Cited Articles in Natural Language Computing (kevig)
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate the languages that humans use naturally to address computers.
This paper discusses the capabilities and limitations of GPT-3, a state-of-the-art language model, in the context of text understanding. We begin by describing the architecture and training process of GPT-3, and provide an overview of its impressive performance across a wide range of natural language processing tasks, such as language translation, question answering, and text completion. Throughout this research project, a summarizing tool was also created to help us retrieve content from any type of document, specifically IELTS Reading Test data in this project. We also aimed to improve the accuracy of the summarizing, as well as the question-answering capabilities of GPT-3, via long text
1) The document discusses a system called MaLTe (Machine Learning from Text) that aims to extract knowledge from technical expository texts using both natural language processing and machine learning techniques.
2) MaLTe will process texts containing narratives and examples, and output a representation of the knowledge in the form of Horn clauses. Some user interaction will be required during the translation process.
3) The document outlines several challenges in applying machine learning and natural language processing to knowledge extraction from real-world texts, including their logical structure and examples. It provides an example from a tax guide to illustrate these challenges.
The document describes a comparative study of various machine learning and neural network models for detecting abusive language on Twitter. It finds that a bidirectional GRU network trained on word-level features, with a Latent Topic Clustering module, achieves the most accurate results with an F1 score of 0.805 for detecting abusive tweets. Additionally, it explores using context tweets as additional features and finds this improves some models' performance.
Measuring massive multitask language understanding (San Kim)
Proposes a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text (kevig)
The article presents Sentiment Analysis (SA) and Tense Classification using a Skip-gram model for word-to-vector encoding of Nepali text. The experiment on SA for positive-negative classification is carried out in two ways. In the first experiment, the vector representation of each sentence is generated using the Skip-gram model followed by Multi-Layer Perceptron (MLP) classification; an F1 score of 0.6486 is achieved for positive-negative classification with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and the same experiment is carried out on the verb chunks; an F1 score of 0.6779 is observed for positive-negative classification with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentence-based sentiment analysis. The paper also proposes using the Skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each verb chunk is generated using the Skip-gram model followed by MLP classification, and the verb chunks yield a very low overall accuracy of 53%. The fourth experiment, conducted for tense classification using whole sentences, results in improved performance with an overall accuracy of 89%; past tenses are identified and classified more accurately than other tenses. Hence, sentence-based tense classification proves better than verb-chunk-based tense classification.
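The Skip-gram step above trains word vectors by predicting context words around each target word. A minimal sketch of the (target, context) training pairs such a model consumes (the actual vector training and the MLP classifier are omitted; the example tokens are hypothetical, not from the paper's Nepali corpus):

```python
def skipgram_pairs(tokens, window=2):
    """Generate the (target, context) pairs a Skip-gram model is trained on."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the target never predicts itself
                pairs.append((target, tokens[j]))
    return pairs

# With window=1, each word predicts only its immediate neighbours.
print(skipgram_pairs(["ram", "ghara", "gayo"], window=1))
```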
This document discusses natural language processing (NLP) and its role in augmentative and alternative communication. It begins with introducing NLP as a field of artificial intelligence that aims to allow computers to understand and process human language. It then describes augmentative and alternative communication as methods of non-verbal communication that are useful for those with speech impairments. The document goes on to discuss the different levels of NLP processing from phonology to pragmatics. It also outlines common approaches to NLP, including symbolic and statistical methods. The role of NLP is seen as enabling computerized communication to support augmentative communication.
The document describes a study that used large language models (LLMs) like GPT-3 and GPT-4 to complete knowledge from Wikidata. The researchers developed a pipeline called LLMKE that combines knowledge probing and Wikidata entity mapping. They were able to achieve a macro-averaged F1-score of 0.701 on the ISWC 2023 LM-KBC Challenge, with scores ranging from 1.00 to 0.328 depending on the domain. The results show that LLMs have varying knowledge depending on the domain and more work is needed to determine when they can be used for automatic knowledge base completion and correction.
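Macro-averaging, as used for the 0.701 score above, weights every domain equally regardless of how many facts it contains. A minimal sketch with hypothetical per-domain counts (not the challenge's actual relations or numbers):

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-domain counts: one easy domain, one hard one.
domains = {"countryCapital": (50, 0, 0), "musicGenre": (20, 40, 42)}
per_domain = {d: f1(*c) for d, c in domains.items()}
macro_f1 = sum(per_domain.values()) / len(per_domain)
```

Because each domain contributes equally, one well-covered domain cannot mask a domain where the LLM's knowledge is thin, which is exactly the per-domain spread the study reports.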
Cooperating Techniques for Extracting Conceptual Taxonomies from Text (Fulvio Rotella)
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. It identifies relevant concepts from text using keyword extraction, clustering, and computing relevance weights. It then generalizes similar concepts using WordNet to group concepts and disambiguate word senses. Preliminary evaluations show promising initial results.
A memory based processing approach to training student (Alexander Decker)
This document discusses the role of memory in student interpreter training. It argues that adequate training of students' memory abilities enables them to better comprehend and produce oral texts when interpreting. The study hypothesizes that failures in interpreting can be attributed to weaknesses in an interpreter's short-term and long-term memory, specifically the ability to store information from the source text and retrieve it later during target text production. The document outlines the different memory systems involved in oral text processing and proposes that interpreter training focus on educational approaches that emphasize the interpreting process rather than just the product.
A memory based processing approach to training student (Alexander Decker)
This document discusses the role of memory in interpreting and how training student interpreters' memory can improve their interpreting ability. It presents two hypotheses: 1) Adequately training students' memory allows them to enhance translation ability, overcome problems, and avoid failure; 2) Failure can be attributed to the role of memory in comprehending and producing oral texts, so the more memory is trained, the easier interpreting becomes. The study analyzes data from student and lecturer questionnaires and student interpreting tests to investigate how memory influences the interpreting process and product. It finds that while lecturers are aware of some memory-based strategies, greater focus on memory training could help students perform better.
Psychological processes in language acquisition (srnaz)
This document discusses various theories regarding language acquisition and representation, including the symbolic vs. connectionist contrast, nativist vs. non-nativist approaches, Universal Grammar (UG), and the poverty of stimulus argument. It also describes connectionist models like parallel distributed processing networks and the competition model. Key aspects of these theories are explained in detail, with discussion of their limitations and criticisms.
Association Rule Mining Based Extraction of Semantic Relations Using Markov L... (IJwest)
An ontology is a conceptualization of a domain into a human-understandable, yet machine-readable, format consisting of entities, attributes, relationships and axioms. Ontologies formalize the intensional aspects of a domain, whereas the extensional part is provided by a knowledge base that contains assertions about instances of concepts and relations. With semantic relations it would be possible to extract the whole family tree of a prominent personality using a resource like Wikipedia. In a way, relations describe the semantic relationships among the entities involved, which is beneficial for a better understanding of human language. Relations can be identified from the result of concept hierarchy extraction. The existing ontology learning process only produces the result of concept hierarchy extraction; it does not produce the semantic relations between the concepts. Here, we carry out the process of constructing predicates and first-order logic formulae, and find inference and learning weights using a Markov Logic Network. To improve the relations of every input, and the relations between contents, we propose the concept of ARSRE. This method can find frequent items between concepts, converting existing lightweight ontologies to formal ones. The experimental results produce good extraction of semantic relations compared to state-of-the-art methods.
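Association rule mining of the kind described rests on frequent itemsets: concept pairs whose co-occurrence support clears a threshold become candidate relations. A minimal sketch over hypothetical concept sets (not the paper's ARSRE pipeline, which goes on to learn Markov Logic Network weights):

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Concept pairs whose co-occurrence support meets the threshold."""
    n = len(transactions)
    counts =Ounter() if False else Counter()  # plain Counter of pair occurrences
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    # support = fraction of documents containing both concepts
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

# Hypothetical concept sets extracted from three documents.
docs = [{"king", "ruler"}, {"king", "ruler", "country"}, {"country", "city"}]
frequent = frequent_pairs(docs, min_support=0.5)
```

Only ("king", "ruler") co-occurs often enough here; pairs like ("country", "city") fall below the support threshold and are discarded.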
The Power of Natural Language Processing (NLP) | Enterprise Wired
This comprehensive guide delves into the intricacies of Natural Language Processing, exploring its foundational concepts, applications across diverse industries, challenges, and the cutting-edge advancements shaping the future of this dynamic field.
The document discusses the development of corpora at the University of Nottingham, including both mono-modal corpora containing one type of data (text-based) and multi-modal corpora containing multiple data types (text, video, audio). It describes the Nottingham Multi-Modal Corpus and Nottingham Learner Corpora as examples. The Nottingham eLanguage Corpus aims to collect diverse digital language data types from individuals, including SMS, email, social media, web browsing history, and location data. One challenge is modeling how language varies based on dynamic contextual factors. As a case study, the document outlines the Thrill corpus containing synchronized audio, video and sensor data from fairground rides, to examine linguistic patterns across different phases
Comparative Analysis of Existing and a Novel Approach to Topic Detection on C... (kevig)
Topic detection in dialogue datasets has become a significant challenge for unsupervised and unlabeled data when developing a cohesive and engaging dialogue system. In this paper, we propose unsupervised and semi-supervised techniques for topic detection in conversational dialogue datasets and compare them with existing topic detection techniques. The paper proposes a novel approach for topic detection which takes preprocessed data as input and performs similarity analysis with TF-IDF scores and the bag-of-words (BOW) technique to identify higher-frequency words from dialogue utterances. It then refines the higher-frequency words by integrating clustering and elbow methods, and uses the Parallel Latent Dirichlet Allocation (PLDA) model to detect the topics. The paper includes a comparative analysis of the proposed approach on the Switchboard, PersonaChat and MultiWOZ datasets. The experimental results show that the proposed topic detection approach performs significantly better on a semi-supervised dialogue dataset. We also performed topic quantification to check how accurately the extracted topics compare with manually annotated data: extracted topics from Switchboard are 92.72%, PersonaChat 87.31% and MultiWOZ 93.15% accurate with respect to manually annotated data.
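The TF-IDF scoring at the heart of the word-selection step can be sketched as follows; this is a bare-bones version (real pipelines add smoothing, stop-word removal, and the clustering/PLDA stages described above):

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF: term frequency times log inverse document frequency."""
    df = Counter()
    for d in docs:
        for w in set(d.split()):
            df[w] += 1          # document frequency: how many docs contain w
    n = len(docs)
    scores = []
    for d in docs:
        tf = Counter(d.split())
        total = sum(tf.values())
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return scores

docs = ["i like trains", "i like topics", "topics in dialogue"]
scores = tfidf(docs)
# Words shared across documents ("i", "like") score lower than rarer, topic-bearing words.
```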
This document provides an overview of natural language processing (NLP) research trends presented at ACL 2020, including shifting away from large labeled datasets towards unsupervised and data augmentation techniques. It discusses the resurgence of retrieval models combined with language models, the focus on explainable NLP models, and reflections on current achievements and limitations in the field. Key papers on BERT and XLNet are summarized, outlining their main ideas and achievements in advancing the state-of-the-art on various NLP tasks.
Natural Language Processing Theory, Applications and Difficulties (ijtsrd)
The promise of a powerful computing device that helps people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction. Research in this field has produced remarkable results, leading to many exciting expectations and new challenges. This field is known as Natural Language Processing. In this paper, natural language generation and natural language understanding are discussed. Difficulties in NLU, applications, and a comparison with structured programming languages are also discussed. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade, "Natural Language Processing Theory, Applications and Difficulties", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le... (ijtsrd)
This document presents a method for generating suggestions for specific erroneous parts of sentences in Indian languages like Malayalam using deep learning. The method uses recurrent neural networks with long short-term memory layers to train a model on input-output examples of sentences and their corrections. The model takes in preprocessed sentence data and generates a set of possible corrections for erroneous parts through multiple network layers. An analysis of the model shows that it can accurately generate suggestions for word length of three, but requires more data and study to handle the complex morphology and symbols of Malayalam. The performance of the method is limited by the hardware used and it could be improved with a more powerful system and additional training data.
French machine reading for question answering (Ali Kabbadj)
This paper proposes to unlock the main barrier to machine reading and comprehension of French natural language texts. This opens the way for a machine to find a precise answer to a question buried in a mass of unstructured French text, or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective, deep learning methods need very large training datasets. Until now these techniques could not actually be used for French text Question Answering (Q&A) applications, since there was no large Q&A training dataset. We produced a large (100,000+) French training dataset for Q&A by translating and adapting the English SQuAD v1.1 dataset, along with GloVe French word and character embedding vectors from a French Wikipedia dump. We trained and evaluated three different Q&A neural network architectures in French and obtained French Q&A models with F1 scores around 70%.
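The F1 score reported for SQuAD-style Q&A is a token-overlap measure between the predicted answer span and the gold answer. A minimal sketch (the example strings are hypothetical; standard implementations also lowercase and strip punctuation and articles first):

```python
from collections import Counter

def qa_f1(prediction, gold):
    """Token-level F1 between a predicted answer span and the gold answer."""
    p, g = prediction.split(), gold.split()
    common = sum((Counter(p) & Counter(g)).values())  # overlapping tokens
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

score = qa_f1("la tour eiffel", "tour eiffel")  # partial credit for overlapping spans
```

Unlike exact match, this metric gives partial credit when the predicted span is close but not identical to the gold answer, which is why it is the headline number for Q&A models.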
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY (ijaia)
In today’s world of digital media, connecting millions of users, large amounts of information are being generated. These are potential mines of knowledge and could give deep insights about trends of both social and scientific value. However, owing to the fact that most of this is highly unstructured, we cannot make any sense of it. Natural language processing (NLP) is a serious attempt in this direction to organise textual matter which is in a human-understandable form (natural language) in a meaningful and insightful way. In this, text entailment can be considered a key component in verifying or proving the correctness or efficiency of this organisation. This paper surveys various proposed text entailment methods, giving a comparative picture based on criteria such as robustness and semantic precision.
Using ontology for natural language processing can help mediate human-machine communication. Ontologies provide a formal representation of knowledge that can be used in natural language processing. The document discusses how general and specific ontologies can be combined and used with natural language processing techniques like machine translation and speech recognition. It also describes how ontologies can be extended over time using artificial neural networks to improve natural language understanding abilities.
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION (ijaia)
Complicated policy texts require a lot of effort to read, so there is a need for intelligent interpretation of Chinese policies. To better solve the Chinese text summarization task, this paper utilizes the mT5 model as the core framework and initial weights. Additionally, this paper reduces the model size through parameter clipping, uses the Gap Sentence Generation (GSG) method for unsupervised training, and improves the Chinese tokenizer. After training on a meticulously processed 30GB Chinese training corpus, the paper develops the enhanced mT5-GSG model. Then, when fine-tuning on Chinese policy text, this paper adopts the idea of “Dropout Twice”, innovatively combining the probability distributions of the two dropouts through the Wasserstein distance. Experimental results indicate that the proposed model achieves Rouge-1, Rouge-2, and Rouge-L scores of 56.13%, 45.76%, and 56.41% respectively on the Chinese policy text summarization dataset.
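ROUGE-1, one of the metrics reported above, measures unigram overlap between a candidate summary and a reference. A minimal sketch (the example sentences are hypothetical; real ROUGE implementations add tokenization rules, stemming, and multi-reference handling, and Chinese text is typically scored over characters or segmented words):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F-measure: unigram overlap between candidate and reference."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the policy reduces the tax", "the policy cuts the tax")
```

ROUGE-2 applies the same overlap computation to bigrams, and ROUGE-L to the longest common subsequence.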
Identification and Classification of Named Entities in Indian Languages (kevig)
The process of identifying Named Entities (NEs) in a given document and then classifying them into different categories of NEs is referred to as Named Entity Recognition (NER). Considerable effort is needed to perform NER in Indian languages and achieve the same or higher accuracy as that obtained for English and the European languages. In this paper, we present the results we have achieved by performing NER in Hindi, Bengali and Telugu using a Hidden Markov Model (HMM), along with performance metrics.
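HMM-based NER tags a sentence by finding the most probable tag sequence given the words, which is computed with the Viterbi algorithm. A compact sketch with hypothetical toy probabilities (the actual models are estimated from annotated Hindi, Bengali and Telugu corpora):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state (tag) sequence for an observation (word) sequence."""
    # table[t][s] = (best probability of reaching state s at step t, best predecessor)
    table = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-6), None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            prev = max(states, key=lambda p: table[t - 1][p][0] * trans_p[p][s])
            row[s] = (table[t - 1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs[t], 1e-6), prev)
        table.append(row)
    # backtrack from the best final state
    best = max(states, key=lambda s: table[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, table[t][path[0]][1])
    return path

states = ["O", "PER"]
start = {"O": 0.8, "PER": 0.2}
trans = {"O": {"O": 0.6, "PER": 0.4}, "PER": {"O": 0.9, "PER": 0.1}}
emit = {"O": {"met": 0.9}, "PER": {"ram": 0.9}}   # hypothetical toy probabilities
tags = viterbi(["met", "ram"], states, start, trans, emit)
```

The small 1e-6 floor stands in for proper smoothing of unseen words, which matters greatly for morphologically rich Indian languages.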
Effect of Query Formation on Web Search Engine Results (kevig)
A query in a search engine is generally expressed in natural language, and a query can be phrased in more
than one way without changing its meaning, depending on what the searcher has in mind at a particular moment.
The searcher's aim is to get the most relevant results regardless of how the query has been expressed. In the
present paper, we examine how search engine results change in coverage and in the similarity of the first
few results when a query is entered in two semantically equivalent but differently phrased forms. Searching was
done through the Google search engine, and fifteen pairs of queries were chosen for the study. A t-test was
used, and the results were checked on the basis of the total documents found and the
similarity of the first five and first ten documents returned for a query entered in the two different
formats. It was found that the total coverage is the same but the first few results differ significantly.
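The comparison described above can be sketched as follows; the result lists and the use of a paired t statistic on invented numbers are illustrative assumptions, not the paper's actual data:

```python
from math import sqrt

def top_k_overlap(results_a, results_b, k):
    """Fraction of shared URLs among the first k results of two queries."""
    return len(set(results_a[:k]) & set(results_b[:k])) / k

def paired_t(xs, ys):
    """Paired t statistic for two equal-length samples."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / sqrt(var / n)

# toy example: two phrasings of one query return overlapping top results
a = ["u1", "u2", "u3", "u4", "u5"]
b = ["u2", "u9", "u1", "u8", "u7"]
print(top_k_overlap(a, b, 5))  # → 0.4
print(round(paired_t([1, 4, 5], [0, 2, 3]), 2))  # → 5.0
```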
More related content
Similar to Understanding Chinese Moral Stories with Further Pre-Training
The document describes a study that used large language models (LLMs) like GPT-3 and GPT-4 to complete knowledge from Wikidata. The researchers developed a pipeline called LLMKE that combines knowledge probing and Wikidata entity mapping. They were able to achieve a macro-averaged F1-score of 0.701 on the ISWC 2023 LM-KBC Challenge, with scores ranging from 1.00 to 0.328 depending on the domain. The results show that LLMs have varying knowledge depending on the domain and more work is needed to determine when they can be used for automatic knowledge base completion and correction.
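The macro-averaged F1 quoted above is the unweighted mean of per-domain F1 scores. A minimal sketch; the domains and the (tp, fp, fn) counts are invented for illustration:

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(per_domain):
    """Unweighted mean of per-domain F1 scores."""
    scores = [f1(*counts) for counts in per_domain.values()]
    return sum(scores) / len(scores)

# toy per-domain (tp, fp, fn) counts for a knowledge-completion run
per_domain = {"music": (8, 2, 0), "sports": (3, 3, 3), "geo": (10, 0, 0)}
print(round(macro_f1(per_domain), 3))  # → 0.796
```

Because the mean is unweighted, an easy domain ("geo" here) and a hard one count equally, which is why per-domain scores can range widely around the macro average.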
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. It identifies relevant concepts from text using keyword extraction, clustering, and computing relevance weights. It then generalizes similar concepts using WordNet to group concepts and disambiguate word senses. Preliminary evaluations show promising initial results.
A memory based processing approach to training studentAlexander Decker
This document discusses the role of memory in student interpreter training. It argues that adequate training of student's memory abilities enables them to better comprehend and produce oral texts when interpreting. The study hypothesizes that failures in interpreting can be attributed to weaknesses in an interpreter's short-term and long-term memory. Specifically, their ability to store information from the source text and retrieve it later during target text production. The document outlines different memory systems involved in oral text processing and proposes that interpreter training focus on educational approaches that emphasize the interpreting process rather than just the product.
Psychological processes in language acquisitionsrnaz
This document discusses various theories regarding language acquisition and representation, including the symbolic vs. connectionist contrast, nativist vs. non-nativist approaches, Universal Grammar (UG), and the poverty of stimulus argument. It also describes connectionist models like parallel distributed processing networks and the competition model. Key aspects of these theories are explained in detail, with discussion of their limitations and criticisms.
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...IJwest
An ontology is a conceptualization of a domain into a human-understandable, yet machine-readable format consisting of entities, attributes, relationships and axioms. Ontologies formalize the intensional aspects of a domain, whereas the extensional part is provided by a knowledge base that contains assertions about instances of concepts and relations. Through semantic relations it would be possible to extract, for example, the whole family tree of a prominent personality using a resource like Wikipedia. Relations describe the semantic relationships among the entities involved, which is useful for a better understanding of human language. Relations can be identified from the result of concept hierarchy extraction, but the existing ontology learning process produces only the concept hierarchy; it does not produce the semantic relations between the concepts. Here, we construct predicates and first-order logic formulas, and find inference and learning weights using a Markov Logic Network. To improve the relations for every input and between the contents, we propose the concept of ARSRE. This method can find frequent items between concepts and convert existing lightweight ontologies to formal ones. Experimental results show good extraction of semantic relations compared to state-of-the-art methods.
The Power of Natural Language Processing (NLP) | Enterprise WiredEnterprise Wired
This comprehensive guide delves into the intricacies of Natural Language Processing, exploring its foundational concepts, applications across diverse industries, challenges, and the cutting-edge advancements shaping the future of this dynamic field.
The document discusses the development of corpora at the University of Nottingham, including both mono-modal corpora containing one type of data (text-based) and multi-modal corpora containing multiple data types (text, video, audio). It describes the Nottingham Multi-Modal Corpus and Nottingham Learner Corpora as examples. The Nottingham eLanguage Corpus aims to collect diverse digital language data types from individuals, including SMS, email, social media, web browsing history, and location data. One challenge is modeling how language varies based on dynamic contextual factors. As a case study, the document outlines the Thrill corpus containing synchronized audio, video and sensor data from fairground rides, to examine linguistic patterns across different phases
Comparative Analysis of Existing and a Novel Approach to Topic Detection on C...kevig
Topic detection in dialogue datasets has become a significant challenge for unsupervised
and unlabeled data in developing a cohesive and engaging dialogue system. In this paper, we
propose unsupervised and semi-supervised techniques for topic detection in conversational
dialogue datasets and compare them with existing topic detection techniques. The
paper proposes a novel approach for topic detection, which takes preprocessed data as
input and performs similarity analysis with TF-IDF scores and the bag-of-words (BOW)
technique to identify higher-frequency words from dialogue utterances. It then refines the
higher-frequency words by integrating the clustering and elbow methods and uses the Parallel Latent Dirichlet Allocation (PLDA) model to detect the topics. The paper presents a
comparative analysis of the proposed approach on the Switchboard, Personachat and MultiWOZ datasets. The experimental results show that the proposed topic detection approach
performs significantly better on a semi-supervised dialogue dataset. We also performed
topic quantification to check how accurately the extracted topics match manually
annotated data. For example, extracted topics from Switchboard are 92.72%, Personachat
87.31% and MultiWOZ 93.15% accurate with respect to manually annotated data.
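The TF-IDF scoring step described above can be sketched as follows; the smoothing-free weighting and whitespace tokenization are simplifying assumptions:

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Per-document TF-IDF scores over whitespace-tokenized utterances."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many utterances each word appears
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append({w: (c / len(toks)) * log(n / df[w]) for w, c in tf.items()})
    return scores

docs = ["book a hotel room", "book a table", "weather today"]
s = tf_idf(docs)
# 'book' occurs in two of three utterances, so it scores lower than 'hotel'
print(s[0]["hotel"] > s[0]["book"])  # → True
```

Words with high scores in each utterance would then feed the clustering/elbow refinement and the PLDA model.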
This document provides an overview of natural language processing (NLP) research trends presented at ACL 2020, including shifting away from large labeled datasets towards unsupervised and data augmentation techniques. It discusses the resurgence of retrieval models combined with language models, the focus on explainable NLP models, and reflections on current achievements and limitations in the field. Key papers on BERT and XLNet are summarized, outlining their main ideas and achievements in advancing the state-of-the-art on various NLP tasks.
Natural Language Processing Theory, Applications and Difficultiesijtsrd
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human machine interaction. Research in this field has produced remarkable results, leading to many exciting expectations and new challenges. This field is known as Natural language Processing. In this paper the natural language generation and Natural language understanding is discussed. Difficulties in NLU, applications and comparison with structured programming language are also discussed here. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade "Natural Language Processing Theory, Applications and Difficulties" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6 , October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
This document presents a method for generating suggestions for specific erroneous parts of sentences in Indian languages like Malayalam using deep learning. The method uses recurrent neural networks with long short-term memory layers to train a model on input-output examples of sentences and their corrections. The model takes in preprocessed sentence data and generates a set of possible corrections for erroneous parts through multiple network layers. An analysis of the model shows that it can accurately generate suggestions for word length of three, but requires more data and study to handle the complex morphology and symbols of Malayalam. The performance of the method is limited by the hardware used and it could be improved with a more powerful system and additional training data.
French machine reading for question answeringAli Kabbadj
This paper proposes to unlock the main barrier to machine reading and comprehension of French natural language texts. This opens the way for a machine to find a precise answer to a question buried in a mass of unstructured French text, or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective, deep learning methods need very large training datasets. Until now these techniques could not actually be used for French text Question Answering (Q&A) applications since there was no large Q&A training dataset. We produced a large (100,000+) French training dataset for Q&A by translating and adapting the English SQuAD v1.1 dataset, together with GloVe French word and character embedding vectors from a French Wikipedia dump. We trained and evaluated three different Q&A neural network architectures in French and obtained French Q&A models with an F1 score around 70%.
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYijaia
In today’s world of digital media, connecting millions of users, large amounts of information is being
generated. These are potential mines of knowledge and could give deep insights about the trends of both
social and scientific value. However, owing to the fact that most of this is highly unstructured, we cannot
make any sense of it. Natural language processing (NLP) is a serious attempt in this direction to organise
the textual matter which is in a human understandable form (natural language) in a meaningful and
insightful way. In this, text entailment can be considered a key component in verifying or proving the
correctness or efficiency of this organisation. This paper surveys the various text entailment
methods proposed, giving a comparative picture based on certain criteria such as robustness and semantic
precision.
Using ontology for natural language processing can help mediate human-machine communication. Ontologies provide a formal representation of knowledge that can be used in natural language processing. The document discusses how general and specific ontologies can be combined and used with natural language processing techniques like machine translation and speech recognition. It also describes how ontologies can be extended over time using artificial neural networks to improve natural language understanding abilities.
Similar to Understanding Chinese Moral Stories with Further Pre-Training (20)
Investigations of the Distributions of Phonemic Durations in Hindi and Dogrikevig
Speech generation is one of the most important areas of research in speech signal processing and is now gaining serious attention. Speech is a natural form of communication in all living things. Computers with the ability to understand speech and speak with a human-like voice are expected to contribute to the development of a more natural man-machine interface. However, in order to provide functions that are even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. The field so described, also called speech synthesis and more prominently known as text-to-speech synthesis, originated in the mid-eighties with the emergence of DSP and the rapid advancement of VLSI techniques. To understand this field of speech, it is necessary to understand the basic theory of speech production. Every language has different phonetic alphabets and a different set of possible phonemes and their combinations.
For the analysis of the speech signal, we carried out recordings of five speakers in Dogri (3 male and 5 female) and eight speakers in Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of vowels of each speaker in both languages was calculated. Investigations have shown that the two durational distributions differ significantly with respect to mean and standard deviation. The duration of a phoneme is speaker dependent. The whole investigation can be concluded with the result that almost all Dogri phonemes have shorter duration in comparison to Hindi phonemes. The duration in milliseconds of the same phonemes when uttered in Hindi was found to be longer than when they were spoken by a person with Dogri as their mother tongue. There are many applications directly or indirectly related to this research. For instance, the main application may be transforming Dogri speech into Hindi and vice versa; building on this, a speech aid could be developed to teach Dogri to children. The results may also be useful for synthesizing Dogri phonemes using the parameters of Hindi phonemes and for building large-vocabulary speech recognition systems.
Effect of Singular Value Decomposition Based Processing on Speech Perceptionkevig
Speech is an important biological signal and the primary mode of communication among human beings, as well as the most natural and efficient form of exchanging information. Speech processing is an important aspect of signal processing. In this paper the linear-algebra technique of singular value decomposition (SVD) is applied to the speech signal. SVD is a technique for deriving the important parameters of a signal. The parameters derived using SVD may be further reduced by perceptual evaluation of the synthesized speech using only perceptually important parameters, so that the speech signal can be compressed and the information transformed into compressed form without losing quality. This technique finds wide application in speech compression, speech recognition, and speech synthesis. The objective of this paper is to investigate the effect of SVD-based feature selection of the input speech on the perception of the processed speech signal. Speech signals consisting of the vowels /a/, /e/ and /u/ were recorded from each of six speakers (3 males and 3 females). The vowels for the six speakers were analyzed using SVD-based processing, and the effect of reducing the singular values was investigated on the perception of the vowels resynthesized from the reduced singular values. Investigations have shown that the number of singular values can be drastically reduced without significantly affecting the perception of the vowels.
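The SVD truncation described above can be sketched with NumPy (assumed available); the toy matrix stands in for framed vowel samples:

```python
import numpy as np

def truncate_svd(signal_matrix, k):
    """Keep only the k largest singular values of a framed signal matrix."""
    U, s, Vt = np.linalg.svd(signal_matrix, full_matrices=False)
    s[k:] = 0.0  # discard the perceptually less important components
    return U @ np.diag(s) @ Vt

# toy 4x3 "signal" matrix; real input would be framed vowel samples
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.0],
              [0.5, 1.0, 1.4],
              [3.0, 6.0, 9.2]])
X1 = truncate_svd(X, 1)  # rank-1 approximation
print(np.linalg.norm(X - X1))  # small: X is close to rank one
```

Resynthesis from `X1` uses far fewer parameters than `X`, which is the compression effect the paper evaluates perceptually.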
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Modelskevig
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4,
demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to
identify which specific terms in prompts positively or negatively impact relevance
evaluation with LLMs. We employed two types of prompts: those used in previous
research and generated automatically by LLMs. By comparing the performance of
these prompts in both few-shot and zero-shot settings, we analyze the influence of
specific terms in the prompts. We have observed two main findings from our study.
First, we discovered that prompts using the term ‘answer’ lead to more effective
relevance evaluations than those using ‘relevant.’ This indicates that a more direct
approach, focusing on answering the query, tends to enhance performance. Second,
we noted the importance of appropriately balancing the scope of ‘relevance.’ While
the term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments.
The inclusion of few-shot examples helps in more precisely defining this balance.
By providing clearer contexts for the term ‘relevance,’ few-shot examples contribute
to refining relevance criteria. In conclusion, our study highlights the significance of
carefully selecting terms in prompts for relevance evaluation with LLMs.
In recent years, great advances have been made in the speed, accuracy, and coverage of automatic word
sense disambiguator systems that, given a word appearing in a certain context, can identify the sense of
that word. In this paper we consider the problem of deciding whether the same words contained in different
documents are related to the same meaning or are homonyms. Our goal is to improve the estimate of the
similarity of documents in which some words may be used with different meanings. We present three new
strategies for solving this problem, which are used to filter out homonyms from the similarity computation.
Two of them are intrinsically non-semantic, whereas the other one has a semantic flavor and can also be
applied to word sense disambiguation. The three strategies have been embedded in an article document
recommendation system that one of the most important Italian ad-serving companies offers to its customers.
Genetic Approach For Arabic Part Of Speech Taggingkevig
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
Rule Based Transliteration Scheme for English to Punjabikevig
Machine transliteration has emerged as an important research area in the field of
machine translation. Transliteration basically aims to preserve the phonological structure of words, and proper
transliteration of named entities plays a very significant role in improving the quality of machine translation.
In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based
approach. We have constructed rules for syllabification, the process of extracting or
separating syllables from words. We calculate probabilities for named entities (proper
names and locations). For words which do not fall under the category of named entities, separate
probabilities are calculated by relative frequency using a statistical machine translation
toolkit known as MOSES. Using these probabilities we transliterate the input text from English to
Punjabi.
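The relative-frequency estimate mentioned above can be sketched as follows; the aligned source-target syllable pairs are invented for illustration:

```python
from collections import Counter, defaultdict

def relative_frequencies(pairs):
    """P(target syllable | source syllable) estimated by relative frequency."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: c / sum(tgts.values()) for tgt, c in tgts.items()}
        for src, tgts in counts.items()
    }

# toy aligned English -> romanized-Punjabi syllable pairs
pairs = [("ram", "raam"), ("ram", "raam"), ("ram", "raam"),
         ("ram", "ram"), ("pur", "pur")]
probs = relative_frequencies(pairs)
print(probs["ram"]["raam"])  # → 0.75
```

A transliterator would then pick, per syllable, the target with the highest conditional probability.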
Improving Dialogue Management Through Data Optimizationkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and computers through natural language stands as a critical advancement. Central to these systems is the dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue management has embraced a variety of methodologies, including rule-based systems, reinforcement learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset imperfections, especially mislabeling, on the challenges inherent in refining dialogue management processes.
Document Author Classification using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting the authorship of a text based on its statistical properties, such as the occurrence rates of non-contextual words. In previous work, these techniques have been used, for example, to determine the authorship of all of The Federalist Papers. Such methods may be useful in more modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted with a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist Papers and Sanditon, which have been used as test cases in previous authorship detection studies. Several features extracted from the statistical natural language parser were explored: all subtrees of some depth from any level; rooted subtrees of some depth; part of speech; and part of speech by level in the parse tree. It was found helpful to project the features into a lower-dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
Rag-Fusion: A New Take on Retrieval Augmented Generationkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query is insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context.
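The reciprocal rank fusion (RRF) step behind RAG-Fusion can be sketched as follows; the constant k = 60 follows the common RRF formulation, and the document ids are illustrative:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# retrieval results for three generated variants of one product query
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
print(rrf_fuse(rankings))  # → ['d2', 'd1', 'd3', 'd4']
```

Documents that appear near the top of several variant queries accumulate the most score, which is how RRF rewards consensus across the generated queries.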
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their performance. However, this often leads to overlooking other crucial aspects that should be given careful consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches (SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance parameters (standard indices), but also alternative measures such as timing, energy consumption and costs, which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of complex models (LLM and generative models), consuming much less energy and requiring fewer resources. These findings suggest that companies should consider additional considerations when choosing machine learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
Evaluation of Medium-Sized Language Models in German and English Languagekevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers. The open source community is quickly closing the gap to the best commercial models.
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and
computers through natural language stands as a critical advancement. Central to these systems is the
dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user
goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue
management has embraced a variety of methodologies, including rule-based systems, reinforcement
learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This
research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue
managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as
Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the
introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset
imperfections, especially mislabeling, on the challenges inherent in refining dialogue management
processes.
Document Author Classification Using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the
text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used,
for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern
times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of
using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship
using grammatical structural information extracted using a statistical natural language parser. This paper provides a
proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist
Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted
of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the
features into a lower dimensional space. Statistical experiments on these documents demonstrate that information
from a statistical parser can, in fact, assist in distinguishing authors.
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product
information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots,
but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines
RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal
scores and fusing the documents and scores. Through manually evaluating answers on accuracy,
relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and
comprehensive answers due to the generated queries contextualizing the original query from various
perspectives. However, some answers strayed off topic when the generated queries' relevance to the
original query is insufficient. This research marks significant progress in artificial intelligence (AI) and
natural language processing (NLP) applications and demonstrates transformations in a global and multiindustry context
Performance, energy consumption and costs: a comparative analysis of automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their
performance. However, this often leads to overlooking other crucial aspects that should be given careful
consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into
account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches
(SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance
parameters (standard indices), but also alternative measures such as timing, energy consumption and costs,
which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and
the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of
complex models (LLM and generative models), consuming much less energy and requiring fewer resources.
These findings suggest that companies should consider additional considerations when choosing machine
learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific
world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to
give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEkevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks
clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six
billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative
question answering, which requires models to provide elaborate answers without external document
retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results
show that combining the best answers from different MLMs yielded an overall correct answer rate of
82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B
parameters, which highlights the importance of using appropriate training data for fine-tuning rather than
solely relying on the number of parameters. More fine-grained feedback should be used to further improve
the quality of answers. The open source community is quickly closing the gap to the best commercial
models.
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
john krisinger-the science and history of the alcoholic beverage.pptx
Understanding Chinese Moral Stories with Further Pre-Training
International Journal on Natural Language Computing (IJNLC) Vol.12, No.2, April 2023
DOI: 10.5121/ijnlc.2023.12201
UNDERSTANDING CHINESE MORAL STORIES WITH FURTHER PRE-TRAINING
Jing Qian¹, Yong Yue¹, Katie Atkinson² and Gangmin Li³

¹School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China
²Department of Computer Science, University of Liverpool, Liverpool, UK
³School of Computer Science Technology, University of Bedfordshire, Luton, UK
ABSTRACT
The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by delving beyond the concrete occurrences and dynamic personas. Specifically, the narrative is compacted into a single statement without involving any characters from the original text, necessitating a more astute language model that can comprehend connotative morality and exhibit commonsense reasoning. The “pre-training + fine-tuning” paradigm is widely embraced in neural language models. In this paper, we propose an intermediary phase to establish an improved paradigm of “pre-training + further pre-training + fine-tuning”. Further pre-training generally refers to continual learning on task-specific or domain-relevant corpora before a model is applied to target tasks, and aims at bridging the gap in data distribution between the phases of pre-training and fine-tuning. Our work is based on a Chinese dataset named STORAL-ZH that comprises 4k human-written story-moral pairs. Furthermore, we design a two-step process of domain-adaptive pre-training in the intermediary phase. The first step relies on a newly collected Chinese dataset of Confucian moral culture, and the second step builds on the Chinese version of a frequently used commonsense knowledge graph (i.e. ATOMIC) to enrich the backbone model with inferential knowledge besides morality. In comparison with several advanced models, including BERT-base, RoBERTa-base and T5-base, experimental results on two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.
KEYWORDS
Moral Understanding, Further Pre-training, Knowledge Graph, Pre-trained Language Model
1. INTRODUCTION
Morality is one of the most intricate topics related to human behavior [1]. It overlaps with
commonsense, molded by cultural norms, and regulated by laws and rules. Children are often
introduced to ethical values and morals through fable stories, which teach them to differentiate
between right and wrong in their daily lives. Moral understanding is a new challenging task for
natural language processing, which aims to comprehend the abstract concepts that are hidden
within a narrative by observing the concrete events and vivid characters portrayed. Earlier works
about story understanding are mainly centered on story ending prediction [2], or controllable
story generation given specific constraints, such as storylines [3], emotions [4], and styles [5].
Most of them attach importance to the surface realization based on the concrete content, whereas
our work attempts to uncover the implied morals that are conveyed by the stories. Table 1 shows
one moral-story pair from STORAL-ZH [6], a recently released dataset of 4,209 Chinese moral
stories.
Self-supervised pre-training on massive amounts of unlabeled corpora from a general domain endows large language models with contextual knowledge and the capacity to recognize n-grams. Since the advent of the Transformer [7], plenty of Pre-trained Language Models (PLMs) have sprung up in succession. These models can be broadly classified into three groups based on their architecture: Transformer encoder (e.g., BERT [8], RoBERTa [9]), Transformer decoder (e.g., GPT2 [10], GPT3 [11]), and the full Transformer encoder-decoder network (e.g., T5 [12], MASS [13]). With distinct pre-training strategies, PLMs are capable of handling diverse downstream tasks. For example, RoBERTa’s masked language modeling produces advantageous contextual word representations that enhance natural language understanding, while the auto-regressive language modeling strategy exploited by GPT2 establishes the groundwork for natural language generation.
Table 1. A moral-story pair from STORAL-ZH.
STORY
晚饭后, 母亲和女儿一块儿洗碗盘, 父亲和儿子在客厅看电视。
One day after dinner, mother and daughter were washing dishes together,
father and son were watching TV in the living room.
突然, 厨房里传来打破盘子的响声, 然后一片沉寂。
Suddenly there was a sound of breaking plates in the kitchen, and then
there was silence.
这时儿子望着父亲说道: “一定是妈妈打破的。”
The son looked at his father and said, “Mom must have broken it.”
“你怎么知道?” “她没有骂人。”
“How do you know?” “She didn’t swear.”
MORAL
我们习惯以不同的标准来看人看己, 以致往往是责人以严, 待己宽。
We are used to looking at others and ourselves by different standards, so
that we tend to be strict with others and lenient with ourselves.
The pre-training phase enhances the capability of language models, particularly as the number of model parameters and the magnitude of unlabeled corpora continue to expand. This assertion is supported by the impressive performance achieved by prompt learning on various benchmark tasks [14]. Prompt learning [15] wraps the input sequence with a template containing masked tokens and handles downstream tasks by imitating the pre-training objectives, thereby better stimulating the great potential of PLMs. Therefore, further pre-training on in-domain data (Domain-Adaptive Pre-Training, DAPT) or task-relevant data (Task-Adaptive Pre-Training, TAPT) is a highly recommended option when the downstream scenarios involve specific domains and no relevant data is available in the unlabeled corpora. Figure 1 exhibits the advantages of further pre-training for pre-trained language models as listed by ChatGPT, the most remarkable PLM of recent days.
Apart from the domain-specific knowledge and task-relevant information attained from unlabeled unstructured text, equipping PLMs with the capability of commonsense reasoning can sometimes further enhance their efficacy. In this regard, the phase of further pre-training can be extended to heterogeneous data, such as Knowledge Graphs (KGs). A typical KG is composed of RDF triples (h, r, t), where h and t represent the head entity and tail entity respectively, and r represents their relationship. There are various kinds of KGs, including linguistic [16], encyclopedic [17], commonsense [18] and domain-specific [19] ones. For instance, ATOMIC [20], a frequently used commonsense KG, contains triples that encode inferential knowledge about everyday life, such as (PersonX applies to jobs, xEffect, gets hired) and (PersonX asks PersonY for money, xWant, to go pay bills).
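Our method later transforms such triples into readable textual sequences for further pre-training. As a minimal sketch of the idea, the snippet below verbalizes ATOMIC-style (h, r, t) triples with hand-written templates; the template wordings and the `verbalize` helper are illustrative assumptions, not the actual mapping used in the experiments.

```python
# Minimal sketch: turning ATOMIC-style (head, relation, tail) triples into
# natural-language sentences usable as a further pre-training corpus.
# The templates below are illustrative assumptions.

TEMPLATES = {
    "xEffect": "{h}. As a result, PersonX {t}.",
    "xWant":   "{h}. As a result, PersonX wants {t}.",
    "xIntent": "{h}, because PersonX intends {t}.",
}

def verbalize(head: str, relation: str, tail: str) -> str:
    """Render one RDF-style triple (h, r, t) as a readable sentence."""
    template = TEMPLATES.get(relation)
    if template is None:
        raise KeyError(f"no template for relation {relation!r}")
    return template.format(h=head, t=tail)

triples = [
    ("PersonX applies to jobs", "xEffect", "gets hired"),
    ("PersonX asks PersonY for money", "xWant", "to go pay bills"),
]
corpus = [verbalize(h, r, t) for h, r, t in triples]
# corpus[0] == "PersonX applies to jobs. As a result, PersonX gets hired."
```

Each rendered sentence can then be treated like any other line of unlabeled text during continual pre-training, so no special machinery for heterogeneous graph data is needed.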
Figure 1. The advantages of further pre-training for PLMs, according to ChatGPT.
This work proposes a three-phase paradigm that builds upon the conventional two-phase
paradigm of “pre-training + fine-tuning”. The proposed paradigm involves an intermediary phase
of further pre-training, which will be evaluated using two downstream tasks related to moral
understanding. STORAL-ZH [6], a dataset of Chinese human-written moral stories, is employed for the target tasks, and LongLM-base [21], a T5-architecture language model pre-trained on 120G of Chinese long novels, is selected as our backbone model. TAPT and DAPT constitute the
newly added phase of further pre-training. For TAPT, the language model is further trained on
unlabeled STORAL-ZH to possess task-awareness knowledge. For DAPT, we design a two-step
process to involve two domains, namely moral culture and commonsense knowledge. Inspired by
[22], we utilize triples of a commonsense KG and transform them into readable textual sequences
for domain-adaptive further pre-training. To summarize, our contributions are reflected in the
following three aspects: (1) different from the conventional two-phase paradigm, we add an
intermediary phase of further pre-training before fine-tuning to boost the model’s performance on
moral understanding, (2) our model is equipped with commonsense knowledge through continual
pre-training on the template-based transformation of triples in ATOMIC-zh [18] to facilitate
moral perception beyond concrete characters and events, and (3) a new corpus centered on
Chinese traditional moral culture (i.e. the Four Books and Five Classics) is collected to support
domain-adaptive pre-training about moral culture.
2. RELATED WORK
2.1. Story Understanding
A wide range of tasks related to story understanding and generation have been proposed in the
literature, which includes story ending prediction [2], commonsense story continuation [22] and
story ending generation with fine-grained sentiment [23]. To improve story understanding,
various factors have been taken into account, such as storylines [3], emotions [4] and styles [5]. Whereas storylines guide the writing of a story, emotions portray the characters’ mental and emotional states, and styles determine the overall tone of the narrative, moral understanding emphasizes the ability to bridge story content and implied morals, which is more complicated. To address this challenge, [6] introduces a new dataset of 4k Chinese and 2k English moral stories, and proposes two moral understanding tasks, namely concept understanding and preference alignment, to facilitate the modeling of commonsense and discourse relations among abstract concepts and concrete events.
2.2. Further Pre-training
Pre-training is a crucial initial phase for language model implementation, serving to initialize the
model and expedite parameter convergence in downstream tasks. With the rapid expansion in
model size, larger unlabeled corpora are necessary to fully pre-train the model and prevent over-
fitting. To bridge the gap between pre-training and fine-tuning data distributions, further pre-
training exhibits positive effects and brings improved model performance [24, 25]. Additionally,
[26] introduces two forms of further pre-training: task-adaptive pre-training (TAPT) and domain-
adaptive pre-training (DAPT). To be specific, TAPT refers to continual pre-training on task-
specific unlabeled data prior to the phase of fine-tuning, which helps to incorporate task-
awareness knowledge with the language model in advance and brings consistent performance
improvements [27]. On the other hand, DAPT supports better adaptability to relevant domains
and acquires domain-awareness knowledge, which is particularly useful in low-resource cases
[28].
2.3. Joint Knowledge and PLMs
In recent times, there has been a growing interest in integrating knowledge into PLMs. Although
self-supervised pre-training over large-scale corpora has proven useful in providing PLMs with
ample contextual semantics, it lacks domain-specific [19, 29], factual [30, 31] or commonsense
knowledge [22, 32]. To address this limitation, various methods have been proposed. For
instance, K-BERT [19] inserts domain-specific knowledge triples into the input sequence
explicitly and employs a visible matrix to govern the mutual influence among tokens. BERT-MK
[29] integrates graph contextualized knowledge from a medical KG into language models.
KEPLER [30] encodes entity descriptions as their embeddings and jointly optimizes knowledge
embedding and masked language modeling objectives in the same PLM. ERNIE [31] utilizes the
informative entities in KGs to enhance language representation by putting forward a new pre-training objective. KG-BART [32] captures the complex relations between concepts over a
commonsense KG for generative commonsense reasoning. Furthermore, [22] conducts
incremental pre-training on two commonsense knowledge bases to generate more reasonable
stories without spending additional efforts on considering heterogeneous information fusion or
sub-graph aggregation, which implicitly and effectively incorporates commonsense knowledge
with GPT-2 [10].
3. METHODOLOGY
This section elaborates on the key constituents of our method, encompassing the transformer-based language model, the implementation details of task-adaptive and domain-adaptive pre-training, as well as the phase of fine-tuning.
3.1. Transformer-Based Language Model
This work employs a language model based on the full Transformer architecture [7], in which the
encoder accepts an input sequence and uses fully-visible masking, while the decoder produces the
target sequence through causal masking and cross-attention. This text-to-text framework is adept
at handling both understanding and generation tasks. T5 [12] is a representative encoder-decoder
language model trained on the Colossal Clean Crawled Corpus (C4) composed of English,
French, Romanian, and German. T5 employs an unsupervised pre-training objective of replacing
corrupted spans, which proves to be the best choice after a comprehensive comparison.
LongLM [21], a Chinese counterpart of T5, is pre-trained on 120GB of Chinese novels with two
generative tasks: text infilling [12] and conditional continuation [10]. Text infilling, inspired by
SpanBERT [33], replaces text spans in the input sequence with special tokens at a
corruption rate of 15%, with span lengths drawn from a Poisson distribution with λ = 3. The pre-
training objective is to reconstruct, via greedy decoding, the original text spans that were replaced
by special tokens. The second task, conditional continuation, generates the
back half of a text conditioned on its front half using top-k sampling [34] with k = 40 and a softmax
temperature of 0.7 [35]. In this work, we leverage the pre-trained checkpoint of LongLM-base,
which has 223M model parameters and is available on Hugging Face [36].
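Top-k sampling with a temperature can be sketched as follows. This is an illustrative NumPy implementation using LongLM's reported hyper-parameters (k = 40, temperature 0.7), not LongLM's actual decoding code; the function name is our own.

```python
import numpy as np

def top_k_sample(logits, k=40, temperature=0.7, rng=None):
    """Sample a next-token id from the k highest logits, using the
    hyper-parameters of LongLM's conditional continuation (k=40, T=0.7)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top_idx = np.argsort(logits)[-k:]                 # indices of the k largest logits
    shifted = logits[top_idx] - logits[top_idx].max() # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()   # softmax over the top-k only
    return int(rng.choice(top_idx, p=probs))
```

Sampling is restricted to the k most probable tokens, and the temperature below 1 sharpens the distribution before the softmax is applied.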
3.2. Further Pre-training
3.2.1. Task-Adaptive Pre-Training (TAPT)
It is worthwhile to conduct further pre-training on the unlabeled data of downstream tasks to
enhance the adaptability of the language model before fine-tuning. TAPT offers advantages such
as higher efficiency and improved task-specific performance. Whereas pre-training from scratch is
computationally expensive, further pre-training on the far smaller task-specific dataset is
efficient and equips the model with task-aware knowledge in advance. [6] has already
post-trained LongLM [21] on the unlabeled version of STORAL [6], naming the result T5-Post,
which serves as a baseline in the original paper. Table 2 provides an example of the text-infilling
pre-training objective.
Table 2. An example showing the pre-training objective of text infilling.

Original Story: I was sitting in my room and was busy with my usual things. Knowing through the news of social media the carnage of seven civilians, I was afflicted with a heart trouble and great care.
Inputs: I was sitting in my room and was busy with <X>. Knowing through the news of social media <Y>, I was afflicted with a heart trouble and great care.
Targets: <X> my usual things <Y> the carnage of seven civilians <Z>
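The construction of input-target pairs like the one in Table 2 can be sketched as below. The helper is our own illustrative code: it uses T5's <extra_id_i> sentinel convention and the stated corruption rate and Poisson span lengths, but the exact masking procedure of LongLM may differ.

```python
import numpy as np

def corrupt_spans(tokens, rng, corruption_rate=0.15, mean_span_len=3):
    """Replace roughly 15% of tokens with sentinel tokens; span lengths
    are drawn from a Poisson distribution with lambda = 3."""
    sentinel = (f"<extra_id_{i}>" for i in range(100))
    budget = max(1, int(len(tokens) * corruption_rate))  # tokens left to mask
    inputs, targets, i = [], [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < corruption_rate:
            span = max(1, rng.poisson(mean_span_len))
            s = next(sentinel)
            inputs.append(s)                    # one sentinel replaces the whole span
            targets.append(s)
            targets.extend(tokens[i:i + span])  # the span is recovered in the target
            budget -= span
            i += span
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(next(sentinel))              # closing sentinel (the <Z> in Table 2)
    return inputs, targets
```

Every original token appears exactly once across the inputs and targets, so the model can reconstruct the story by filling each sentinel with its target span.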
3.2.2. Domain-Adaptive Pre-Training (DAPT)
With the objective of boosting the performance of the language model on downstream tasks, it is
reasonable to conduct further pre-training on unlabeled corpora specific to relevant
domains. DAPT enables PLMs to acquire domain-aware knowledge that improves their
ability to understand text in the target domain. To better grasp the abstract morals
conveyed by concrete events and characters in Chinese narratives, the background domains
involve traditional moral culture and commonsense knowledge.
Moral Knowledge. Confucianism represents the mainstream moral culture of China, and its
authoritative books, the Four Books and Five Classics, provide a comprehensive record of the
political, economic, diplomatic, cultural, and philosophical developments during the most active
period in the history of Chinese ideology. The morals and ethics contained in these books have
exerted a significant and lasting influence on Chinese culture. Since the Four Books and Five
Classics were originally written in classical Chinese, we have collected translated versions in
written vernacular Chinese as the corpus for further pre-training, referred to as 4+5. Table 3
provides several examples from the Analects, one of the Four Books, to illustrate the type of
content included in 4+5.
Table 3. Three examples from the Analects, with translations in Vernacular Chinese and English.

Example 1: 巧言令色,鲜仁矣。
Vernacular Chinese: 花言巧语,装出和颜悦色的样子,这种人的仁心就很少了。
English Translation: Talking plausibly with feigned amiable looks, people of this sort are scarcely benevolent.

Example 2: 君子不器。
Vernacular Chinese: 君子不像器具那样,只有某方面的用途。
English Translation: Gentlemen are not like tools, each of which only has a certain use.

Example 3: 学如不及,犹恐失之。
Vernacular Chinese: 学习如同赛跑,总怕赶不上,可赶上了,又怕被超过。
English Translation: Study is just like a race. You are always afraid of being unable to catch up; yet when you catch up, you will be afraid of being overtaken.
Commonsense Knowledge. Integrating commonsense knowledge equips PLMs with the ability
to perform commonsense reasoning in downstream tasks. To this end, we leverage the commonsense
knowledge from a frequently-used knowledge graph, ATOMIC [20]. It consists of 877K if-then
triples (h, r, t), in which the head h and the tail t are two events and the relation r describes their if-
then relationship. For example, (PersonX accomplishes PersonY's work, xAttr, helpful) means
that if X accomplishes Y's work, then X is helpful, while (PersonX accomplishes PersonY's work,
oWant, to thank PersonX) tells that if X accomplishes Y's work, then Y will want to thank X. Here
xAttr denotes a persona attribute of X, and oWant denotes what others want after the event. There
are three if-then types: If-Event-Then-Mental-State, If-Event-Then-Event, and If-Event-Then-Persona.
The inferential knowledge in ATOMIC [20] facilitates language comprehension, especially of the
commonsense relations between concrete events and abstract concepts in moral stories. Since our
work targets Chinese, we utilize the translated dataset ATOMIC-ZH [18] instead.
Inspired by [22] and [37], we linearize KG triples into textual sequences through template-based
transformation, as illustrated in Table 4. Different from previous works that explicitly
introduce partial commonsense knowledge into PLMs, further pre-training directly on all
linearized triples integrates commonsense knowledge into LongLM [21] implicitly and more
conveniently.
Table 4. Examples of template-based transformation of KG triples.

KG Triple: (某人完全放弃某物, xEffect, 翻开新的一页) / (PersonX abandons ____ altogether, xEffect, turns over a new leaf)
Transformed Sentence: 汤姆完全放弃某物,结果他翻开新的一页。/ Tom abandons something altogether; as a result, he turns over a new leaf.

KG Triple: (有人接受事实, xAttr, 勇敢的) / (PersonX accepts the fact, xAttr, brave)
Transformed Sentence: 汤姆接受事实,他是勇敢的。/ Tom accepts the fact; he is brave.

KG Triple: (有人接受了挑战, xIntent, 以证明他能做到) / (PersonX accepts the challenge, xIntent, to prove he can do it)
Transformed Sentence: 汤姆接受了挑战,因为他想以证明他能做到。/ Tom accepts the challenge because he wants to prove he can do it.
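A minimal sketch of this template-based transformation follows. The template strings and the placeholder-replacement rule are our illustrative assumptions mirroring the rows of Table 4; the actual templates of [22, 37] may differ.

```python
# Illustrative relation templates mirroring the rows of Table 4.
TEMPLATES = {
    "xEffect": "{h},结果他{t}。",    # "..., as a result he ..."
    "xAttr":   "{h},他是{t}。",      # "..., he is ..."
    "xIntent": "{h},因为他想{t}。",  # "..., because he wants to ..."
}

def linearize(head, relation, tail, name="汤姆"):
    """Turn an ATOMIC-style (h, r, t) triple into a natural-language
    sentence: substitute a concrete name ('Tom') for the PersonX
    placeholder, then fill the relation-specific template."""
    h = head.replace("某人", name).replace("有人", name)
    return TEMPLATES[relation].format(h=h, t=tail)
```

Pre-training on sentences produced this way lets the PLM absorb the graph's if-then knowledge through its ordinary language-modeling objective, with no graph-specific architecture.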
3.3. Fine-tuning
Following the standard "pre-training + fine-tuning" paradigm, we fine-tune our model on two
moral understanding tasks after task-adaptive pre-training on unlabeled STORAL-ZH [6] and
domain-adaptive pre-training on 4+5 and ATOMIC-ZH [18]. Both tasks are designed by [6]; each
requires selecting the correct moral from several choices given a story, but they test the
PLM from two different aspects: one is concept understanding, the other is preference alignment.
ConcePT understanding (CPT). This task requires choosing the correct moral from five candidates
for each story, testing the ability to understand the abstract concepts behind the concrete
events in the story. Apart from the moral paired with the story, the other four candidates are true
negative samples selected from the morals of stories on irrelevant topics.
PREFerence alignment (PREF). Simpler than CPT, PREF requires telling the right moral from a
single wrong one, so there are only two moral candidates for each story in the constructed task
dataset [6]. The incorrect candidate is obtained by replacing one random token in the correct
moral with its antonym. As some words have no antonym, the training data for PREF is slightly
smaller than that for CPT.
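The distractor construction for PREF can be sketched as follows. The toy antonym lexicon here is our own invention for illustration; [6] presumably draws on a full antonym dictionary.

```python
import random

# Toy antonym lexicon for illustration only.
ANTONYMS = {"honest": "dishonest", "kindness": "cruelty", "reward": "punish"}

def make_pref_negative(moral_tokens, rng=None):
    """Build the wrong PREF candidate by replacing one random token
    that has an antonym; return None when no token is replaceable
    (such morals are dropped, which is why PREF has slightly less
    training data than CPT)."""
    rng = rng or random.Random()
    replaceable = [i for i, t in enumerate(moral_tokens) if t in ANTONYMS]
    if not replaceable:
        return None
    i = rng.choice(replaceable)
    return moral_tokens[:i] + [ANTONYMS[moral_tokens[i]]] + moral_tokens[i + 1:]
```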
To handle both CPT and PREF, we first concatenate the story and its candidate morals, insert a
unique special token before the story and before each candidate, and feed the sequence into
the tested language model. Following the default settings of T5 [12], the special tokens are
<extra_id_i>, where i indicates the ordinal position. Inspired by [6], we take the hidden states of
the corresponding special tokens as the representations of the story and of each candidate, then
normalize the dot-product scores between the story representation and each candidate
representation to obtain a probability distribution over all candidates. We optimize the language
model by minimizing the cross-entropy loss.
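Given the hidden states at the special tokens, the scoring head reduces to a dot-product softmax. A NumPy sketch (the function name and vector shapes are illustrative, not the paper's code):

```python
import numpy as np

def candidate_loss(story_vec, cand_vecs, gold):
    """Score each candidate moral by its dot product with the story
    representation, softmax-normalize into a probability distribution,
    and return the cross-entropy loss for the gold candidate."""
    scores = cand_vecs @ story_vec                 # one score per candidate
    scores = scores - scores.max()                 # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # distribution over candidates
    return -np.log(probs[gold]), probs
```

At inference time, the predicted moral is simply the candidate with the highest probability.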
4. EXPERIMENTS
4.1. Datasets
Corpus for Further Pre-training. We adopt corpora from two different domains, moral culture
and commonsense knowledge, for domain-adaptive pre-training. For moral culture,
the corpus is composed of vernacular versions of the Four Books and Five Classics. The Four
Books are the Great Learning, the Doctrine of the Mean, the Analects, and the Mencius, while the
Five Classics are the Classic of Poetry, the Book of Documents, the Book of Rites, the I Ching,
and the Spring and Autumn Annals. We collect the vernacular writings of each work from public
web resources and integrate them into the unlabeled corpus named "4+5".
To enrich our model with commonsense knowledge, we transform the triples in ATOMIC-ZH
[18] into readable textual sequences using a template-based method [37] for further pre-training.
ATOMIC-ZH [18] is a translation of ATOMIC [20] for Chinese tasks. [18] applies Regular
Replacement to handle the special tokens (i.e., PersonX and PersonY) and the blanks contained in
some triples. To facilitate translation, [18] first transforms the triples into natural language
sentences, translates them with an automatic translation system, and then splits the results back
into (h, r, t) form to make up ATOMIC-ZH. Since the Chinese commonsense knowledge graph
provided by [18] is enlarged with other resources, we only select the triples with the nine
relations defined in [20] for our further use.
Corpus for Fine-tuning. The corpora for the downstream tasks are constructed from STORAL-ZH
[6], which consists of 4,209 Chinese story-moral pairs. This dataset was collected by [6]
from multiple web pages of moral stories and cleansed with de-duplication and decoupling.
On average, a story contains 322 words in 18 sentences, and a moral contains 25 words in 1.5
sentences. For fine-tuning, the labeled data are randomly split 8:1:1 into
training/validation/test sets.
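The 8:1:1 split can be reproduced along these lines; the random seed is an arbitrary assumption of ours, as is the flooring convention for the split sizes.

```python
import random

def split_dataset(pairs, seed=42):
    """Shuffle the story-moral pairs and split them 8:1:1 into
    train/validation/test sets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # seeded for reproducibility
    n_train = int(len(pairs) * 0.8)
    n_val = int(len(pairs) * 0.1)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])
```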
4.2. Compared Baselines
BERT. The BERT-architectured model used in our work is the bert-base-chinese checkpoint [8].
It has been pre-trained for Chinese with the masked language modeling objective.
RoBERTa. The RoBERTa-architectured model used in our work is the hfl/chinese-roberta-wwm-
ext checkpoint [38]. It is essentially a Chinese pre-trained BERT model with whole word
masking.
T5. The T5-architectured model used in our work is the thu-coai/LongLM-base checkpoint
[21]. It has been pre-trained on 120GB of Chinese novels with two pre-training tasks: text
infilling [12] and conditional continuation [10].
4.3. Experiment Settings
Our experiments are based on LongLM-base [21], a Chinese pre-trained T5 model. All
language models are implemented with the code and pre-trained checkpoints from Hugging Face
[36], and the model configurations follow their respective base versions. For all models, we set
the batch size to 16, the maximum sequence length to 1,024, and the learning rate to 3e-5. For
tokenization, a SentencePiece vocabulary of 32,000 word pieces [39] is applied. We use accuracy
as the evaluation metric for the two understanding tasks.
5. RESULTS AND ANALYSIS
This section specifies and analyzes the experimental results. Building on the previous work
of [6], we conduct further domain-adaptive pre-training on two relevant domains,
moral culture and commonsense knowledge. [6] has post-trained RoBERTa [38] and T5 [21] on
the unlabeled data, naming them RoBERTa-Post and T5-Post in the original paper. Such post-
training is what we call task-adaptive pre-training, so we rename them RoBERTa-T and T5-T in
Table 5 to distinguish them from our methods. In the model names, T means task-adaptive
pre-training; TD means both task- and domain-adaptive pre-training, where the domain is moral
culture; TD+ means further pre-training on the commonsense domain on top of TD. Human
denotes human performance on the two tasks, as measured by [6]. #Para is the approximate
number of model parameters. For each task, the best performance is highlighted in bold and the
second best is underlined, excluding human performance.
Table 5. Accuracy (%) for CPT and PREF with different pre-training strategies.
Models CPT PREF #Para
BERT [8] 59.62 82.97 110M
RoBERTa [38] 62.71 89.54 110M
RoBERTa-T [6] 64.61 87.59 110M
T5 [21] 69.60 82.00 220M
T5-T [6] 70.07 81.75 220M
T5-TD 70.42 82.68 220M
T5-TD+ 71.86 82.41 220M
Human [6] 95.00 98.00 N/A
Analyzing the accuracy results in Table 5, we summarize our findings on the two moral
understanding tasks as follows: (1) T5 performs better than BERT and RoBERTa on CPT
but worse on PREF, which suggests that the encoder-only architecture might be better at
aligning preferences. (2) Further pre-training does not always improve performance on
target tasks: comparing RoBERTa-T with RoBERTa and T5-TD+ with T5-TD on PREF
shows degradation, which suggests that a better way of using these data is required,
especially for tasks similar to PREF. (3) Different pre-training corpora bring different
degrees of benefit, possibly depending on the target task. T5-TD makes smaller progress
than T5-TD+ on CPT, but the reverse happens on PREF, indicating that the commonsense
corpus is more needed by CPT to enhance commonsense reasoning, while PREF requires
more moral data to capture value preferences. (4) Although a large gap remains between
our models and human performance, further pre-training has proved its effectiveness.
Zero-shot and few-shot learning, supported by PLMs with strong generalization
capability, remain an important trend.
6. CONCLUSIONS
In this paper, we propose leveraging a three-phase paradigm ("pre-training + further pre-training
+ fine-tuning") instead of the traditional two-phase paradigm ("pre-training + fine-tuning"). The
effects of the intermediate phase are tested on two downstream tasks of moral understanding.
Specifically, further pre-training is categorized into two types, task-adaptive and domain-
adaptive, with the aim of enriching the language model with task- and domain-aware
knowledge. Task-adaptive pre-training refers to further pre-training on the unlabeled training
corpus of the target tasks before fine-tuning on the labeled corpus. As for domain-adaptive pre-training, we
utilize corpora from two different domains: moral culture and commonsense
knowledge. To be specific, the moral-culture corpus is composed of Vernacular Chinese
translations of the Confucian classics. Furthermore, we linearize the triples of a Chinese
commonsense knowledge graph into readable natural language sentences for incremental
domain-adaptive pre-training. Experimental results reveal the effectiveness of our method and
call for attention to the properties of specific tasks and to the relevance between the pre-training
domains and the target task. Further pre-training performs better when the language model is
more adaptable to the downstream tasks, or when the content of the further pre-training corpus is
more supportive of them. Larger-scale pre-training over multiple tasks and domains incurs high
computational cost but remains necessary, especially in low-resource settings. For future work,
we will seek better ways to exploit the further pre-training corpora, such as novel pre-training
strategies and better data preparation.
ACKNOWLEDGEMENTS
This research was supported by the Research Development Fund (RDFSP16) at Xi’an Jiaotong-
Liverpool University (XJTLU), XJTLU Key Programme Special Fund (KSF-A-17) and the
Suzhou Municipal Key Laboratory for Intelligent Virtual Engineering (SZS2022004).
REFERENCES
[1] L. Jiang, C. Bhagavatula, J. Liang, J. Dodge, K. Sakaguchi, M. Forbes, J. Borchardt, S. Gabriel, Y.
Tsvetkov, R. A. Rini, and Y. Choi, “Can machines learn morality? the delphi experiment,” 2022.
[2] Z. Li, X. Ding, and T. Liu, “Story ending prediction by transferable bert,” in Proceedings of the
Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint
Conferences on Artificial Intelligence Organization, 7 2019.
[3] L. Yao, N. Peng, R. Weischedel, K. Knight, D. Zhao, and R. Yan, “Plan-and-write: Towards better
automatic storytelling,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no.
01, pp. 7378–7385, Jul. 2019.
[4] F. Brahman and S. Chaturvedi, “Modeling protagonist emotions for emotion-aware storytelling,” in
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
(EMNLP), Nov. 2020.
[5] X. Kong, J. Huang, Z. Tung, J. Guan, and M. Huang, “Stylized story generation with style-guided
planning,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug.
2021.
[6] J. Guan, Z. Liu, and M. Huang, “A corpus for understanding and generating moral stories,” in
Proceedings of the 2022 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Jul. 2022.
[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I.
Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, vol.
30, 2017.
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional
transformers for language understanding,” in Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long and Short Papers), Jun. 2019.
[9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V.
Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” ArXiv, vol. abs/1907.11692,
2019.
[10] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are
unsupervised multitask learners,” 2019.
[11] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G.
Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.
Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C.
Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot
learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020.
[12] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu,
“Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine
Learning Research, vol. 21, no. 140, pp. 1–67, 2020.
[13] K. Song, X. Tan, T. Qin, J. Lu, and T.-Y. Liu, “MASS: Masked sequence to sequence pre-training for
language generation,” in Proceedings of the 36th International Conference on Machine Learning,
2019.
[14] A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman,
“Superglue: A stickier benchmark for general purpose language understanding systems,” in Advances
in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., 2019.
[15] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, prompt, and predict: A
systematic survey of prompting methods in natural language processing,” ACM Computing Surveys,
vol. 55, pp. 1 – 35, 2021.
[16] K. D. Bollacker, C. Evans, P. K. Paritosh, T. Sturge, and J. Taylor, “Freebase: a collaboratively
created graph database for structuring human knowledge,” in SIGMOD Conference, 2008.
[17] D. Vrandečić and M. Krötzsch, “Wikidata: A free collaborative knowledgebase,”
Communications of the ACM, 2014.
[18] D. Li, Y. Li, J. Zhang, K. Li, C. Wei, J. Cui, and B. Wang, “C3KG: A Chinese commonsense
conversation knowledge graph,” in Findings of the Association for Computational Linguistics: ACL
2022, May 2022.
[19] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, and P. Wang, “K-bert: Enabling language
representation with knowledge graph,” in AAAI Conference on Artificial Intelligence, 2019.
[20] M. Sap, R. L. Bras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, and Y.
Choi, “Atomic: An atlas of machine commonsense for if-then reasoning,” in AAAI Conference on
Artificial Intelligence, 2019.
[21] J. Guan, Z. Feng, Y. Chen, R. He, X. Mao, C. Fan, and M. Huang, “LOT: A story-centric benchmark
for evaluating Chinese long text understanding and generation,” Transactions of the Association for
Computational Linguistics, vol. 10, 2022.
[22] J. Guan, F. Huang, Z. Zhao, X. Zhu, and M. Huang, “A knowledge-enhanced pretraining model for
commonsense story generation,” Transactions of the Association for Computational Linguistics, vol.
8, 2020.
[23] F. Luo, D. Dai, P. Yang, T. Liu, B. Chang, Z. Sui, and X. Sun, “Learning to control the fine-grained
sentiment for story ending generation,” in Proceedings of the 57th Annual Meeting of the Association
for Computational Linguistics, Jul. 2019.
[24] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, “BioBERT: a pre-trained
biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4,
pp. 1234–1240, 09 2019.
[25] Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, and H. Wang, “ERNIE 2.0: A continual pre-
training framework for language understanding,” in The Thirty-Fourth AAAI Conference on
Artificial Intelligence. AAAI Press, 2020.
[26] S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith,
“Don’t stop pretraining: Adapt language models to domains and tasks,” in Proceedings of the 58th
Annual Meeting of the Association for Computational Linguistics, Jul. 2020.
[27] C. Sun, X. Qiu, Y. Xu, and X. Huang, “How to fine-tune bert for text classification?” in Chinese
Computational Linguistics, M. Sun, X. Huang, H. Ji, Z. Liu, and Y. Liu, Eds. Springer International
Publishing, 2019.
[28] L. Zhuang, L. Wayne, S. Ya, and Z. Jun, “A robustly optimized BERT pre-training approach with
post-training,” in Proceedings of the 20th Chinese National Conference on Computational Linguistics,
Aug. 2021.
[29] B. He, D. Zhou, J. Xiao, X. Jiang, Q. Liu, N. J. Yuan, and T. Xu, “BERT-MK: Integrating graph
contextualized knowledge into pre-trained language models,” in Findings of the Association for
Computational Linguistics: EMNLP 2020, Nov. 2020.
[30] X. Wang, T. Gao, Z. Zhu, Z. Zhang, Z. Liu, J. Li, and J. Tang, “KEPLER: A unified model for
knowledge embedding and pre-trained language representation,” Transactions of the Association for
Computational Linguistics, vol. 9, 2021.
[31] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “ERNIE: Enhanced language representation
with informative entities,” in Proceedings of the 57th Annual Meeting of the Association for
Computational Linguistics, Jul. 2019.
[32] Y. Liu, Y. Wan, L. He, H. Peng, and P. S. Yu, “Kg-bart: Knowledge graph-augmented bart for
generative commonsense reasoning,” ArXiv, vol. abs/2009.12677, 2020.
[33] M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy, “SpanBERT: Improving pre-
training by representing and predicting spans,” Transactions of the Association for Computational
Linguistics, vol. 8, 2020.
[34] A. Fan, M. Lewis, and Y. Dauphin, “Hierarchical neural story generation,” in Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul.
2018.
[35] I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” MIT Press, 2016.
[36] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M.
Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S.
Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-art natural language
processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language
Processing: System Demonstrations, Oct. 2020.
[37] P. Hosseini, D. A. Broniatowski, and M. Diab, “Knowledge-augmented language models for cause-
effect relation classification,” in Proceedings of the First Workshop on Commonsense Representation
and Reasoning (CSRR 2022), May 2022.
[38] Y. Cui, W. Che, T. Liu, B. Qin, S. Wang, and G. Hu, “Revisiting pre-trained models for Chinese
natural language processing,” in Findings of the Association for Computational Linguistics: EMNLP
2020, Nov. 2020.
[39] T. Kudo and J. Richardson, “SentencePiece: A simple and language independent subword tokenizer
and detokenizer for neural text processing,” in Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, Nov. 2018.