These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
2. Objectives
● Provide an overview of NLP tasks for RE (NLP4RE)
● Present the results of a mapping study on NLP4RE
● Point to relevant tools and resources to practice NLP4RE
● Introduce you to transfer learning for NLP4RE
3. What is Natural Language Processing?
Technologies enabling extraction and manipulation of information
from natural language (NL) - English, Italian, Swedish, etc.
4. Wait...What is a Requirement?
● Jackson and Zave: Condition over phenomena of the environment that we
want to make true by developing the system
● Lamsweerde: Goal under the responsibility of a single agent of the
software-to-be
● ISO/IEC/IEEE 29148 Standard: Statement which translates or expresses a
need and its associated constraints and conditions
● Wikipedia: Singular documented physical or functional Need that a particular
design, product or process aims to satisfy
● No agreed INTENSIONAL definition
● Some confusion on the types of requirements (e.g., user, system, software,
business, functional, non-functional), the concept of specification, etc.
● So, let us give some EXAMPLES, and give an EXTENSIONAL definition
7. Why are requirements so special?
● Requirements are heterogeneous (specs, app reviews, legal docs)
● Requirements specifications use a more restricted vocabulary (about
half that of generic texts) and longer sentences
● App reviews are unstructured and informal
● 62% of the words used in requirements do not appear in generic texts
This suggests that NLP tools trained on generic texts
may need to be tailored for requirements
Ferrari, A., Spagnolo, G. O., & Gnesi, S. PURE: A dataset
of public requirements documents. IEEE RE 2017.
17. Observations
● Most RE problems could be solved top-down
● I can enforce tracing when writing requirements
● I can use constrained natural languages to improve quality
● I can tag classes in advance
● I can write a glossary in advance
● Unfortunately, this does not happen; that is why we need NLP
● We also need NLP to recover from errors when RE problems
are addressed top-down by fallible humans
18. NLP4RE
Techniques, Tools &
Resources
Zhao, L., Alhoshan, W., Ferrari, A., et al. Natural Language Processing for Requirements
Engineering: A Systematic Mapping Study. ACM Comput. Surv. 54, 3, Article 55 (April 2021),
41 pages. DOI: https://doi.org/10.1145/3444689
20. Publication Status of NLP4RE Literature
Key takeaways:
● 404 relevant papers in 4 decades
● 1st papers published in 1983
● Fast growth since 2004
● NLP4RE an active and thriving
research area in RE
21. State of Empirical Research in NLP4RE
The majority of NLP4RE studies (> 67%) are solution proposals, typically involving the
development of a novel solution, a new method, or a new technique
Only a small number of studies (7%) are conducted in an industrial setting via a
case study or field study
About a third (35%) of the solution proposals are not evaluated but only illustrated
using examples, discussion or simulation
The remaining solution proposals are evaluated in a lab experiment using either
students or software subjects
Limited evidence of industrial uptake of NLP4RE results
A typical NLP4RE study is a solution proposal, possibly evaluated internally
through an experiment or example, but without evaluation in the real world
22. State of Tool Development in NLP4RE Research
Key takeaways:
● 130 tools developed over 30 years
● Only 17 (13%) of them available online
● No evidence that these 17 tools are still in use
● The state of tool development is very poor: ready-to-use
tools for NLP4RE essentially don’t exist!
23. NLP Technologies Used in NLP4RE Research
● NLP technique: an underlying technique
for performing a basic NLP task (e.g.,
POS tagging, parsing or tokenization)
● NLP resource: a linguistic data resource
for supporting NLP tools
- Lexical resources (e.g., WordNet,
VerbNet)
- Text corpus/dataset (e.g., British
National Corpus and Brown Corpus)
● NLP tool: a software system or library
supporting NLP pipelines (e.g., Stanford
CoreNLP, NLTK or OpenNLP)
24. Frequently Used NLP Techniques in NLP4RE Research
Key takeaways:
● Only 32/140 (< 1 in 4) NLP techniques
in frequent use
● 90% of these frequently used NLP
techniques are word/syntactic based
● Baseline techniques, e.g., POS
tagging, tokenization, syntactic
parsing most in use
(Figure: NLP techniques ranked by frequency of use)
25. A Typical NLP Pipeline for Processing Requirements Text
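The pipeline diagram from this slide is not reproduced in this text version. As an illustration only, here is a minimal Python sketch (assuming spaCy and its small English model are installed; the requirement sentence is invented) that runs the baseline steps mentioned above, namely tokenization, POS tagging, and dependency parsing, on a single requirement.

    # Minimal sketch of a typical NLP pipeline applied to a requirement sentence.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    requirement = "The system shall encrypt all user data before transmission."

    doc = nlp(requirement)  # runs tokenization, POS tagging, dependency parsing, NER
    for token in doc:
        # surface form, part-of-speech tag, dependency relation, syntactic head
        print(token.text, token.pos_, token.dep_, token.head.text)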
26. Frequently Used NLP Tools in NLP4RE Research
(Figure: NLP tools ranked by frequency of use)
Key takeaways:
● 1 in 5 (14/66) NLP tools in frequent
use
● The top 5 most used tools –
Stanford CoreNLP, Gate, NLTK,
OpenNLP, and Weka - are toolkits,
supporting multiple NLP pipelines
27. Top Most Used NLP Tools in NLP4RE Research
● Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/): supports
common core NLP tasks; open source; datasets not available
● OpenNLP (https://opennlp.apache.org/): supports both lower-level (POS
tagging, chunking) and higher-level NLP tasks (language detection, classification);
open source; datasets available
● NLTK (https://www.nltk.org): supports a wide range of NLP tasks; open
source, with over 50 datasets available (a short usage sketch follows this list)
● GATE (https://gate.ac.uk): supports text processing tasks such as
information extraction and text annotation
● WEKA (https://www.cs.waikato.ac.nz/ml/weka/): supports data mining tasks
such as data pre-processing, classification, regression, clustering,
association rules, and visualization; open source
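As a small hands-on illustration of one of the toolkits above, the sketch below uses NLTK for sentence splitting, tokenization, and POS tagging. The two requirement sentences are invented for the example, and the nltk.download calls fetch the required models on first use.

    # Minimal NLTK sketch: sentence splitting, word tokenization, POS tagging.
    # Assumes: pip install nltk
    import nltk
    nltk.download("punkt")                       # sentence/word tokenizer models
    nltk.download("averaged_perceptron_tagger")  # POS tagger model

    text = ("The user shall be able to reset the password. "
            "The system shall lock the account after three failed attempts.")

    for sentence in nltk.sent_tokenize(text):
        tokens = nltk.word_tokenize(sentence)
        print(nltk.pos_tag(tokens))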
28. Frequently Used NLP Resources in NLP4RE Research
(Figure: NLP resources ranked by frequency of use)
Key takeaways:
● 1 in 2 (13/25) NLP resources in
frequent use
● Lexical resources used most
(i.e., WordNet, VerbNet)
● Only two RE related datasets:
MODIS and CM-1
29. Other RE Related NLP Resources
● PROMISE Software Engineering Repository
(http://promise.site.uottawa.ca/SERepository/datasets-page.html)
○ Created by Sayyad and Menzies from University of Ottawa, Canada in 2005
○ Contains 20 publicly available datasets (including MODIS and CM-1)
● PURE Dataset (https://zenodo.org/record/1414117)
○ Created by Alessio Ferrari, et al. in 2017
○ Contains 79 publicly available requirements documents collected from the Web
● User Stories (https://data.mendeley.com/datasets/7zbk8zsd8y/1)
○ Created by Fabiano Dalpiaz in 2018
○ Contains 22 datasets, each with 50+ requirements, expressed as user stories
● FN-RE (https://zenodo.org/record/1291660)
○ Created by Waad Alhoshan et al. in 2018
○ Contains a dataset of requirements documents annotated with FrameNet semantics
● App Reviews
○ 13 annotated datasets are reported in this paper: Dabrowski, Jacek, et al. "App Review Analysis for
Software Engineering: A Systematic Literature Review." University College London, Tech. Rep (2020).
○ Mobile App Market (https://sites.google.com/site/appsimilarity/)
30. Emerging Trends in NLP4RE Research – Some Observations
● 2007 onwards: consistent rise in developing ML-based approaches (using ML
algorithms such as SVM, DT, NB, KNN) for automatic requirements classification
(see the sketch after this list)
● 2013 onwards: increase in using non-traditional requirements texts (e.g., app
reviews and user stories) in NLP4RE research
● 2020: upsurge in developing DL-based approaches (e.g., BERT and Bi-LSTM) for
automatic requirements classification
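To make the ML-based classification trend concrete, here is a minimal, self-contained sketch of the classical pipeline referred to above: TF-IDF features fed to a linear SVM that separates functional from non-functional requirements. The four labelled requirements are invented toy data, not taken from any benchmark.

    # Toy sketch of classical ML-based requirements classification: TF-IDF + linear SVM.
    # The requirements and labels below are invented for illustration only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    requirements = [
        "The system shall export reports as PDF.",        # functional
        "The user shall be able to search by keyword.",   # functional
        "The system shall respond within two seconds.",   # non-functional
        "All passwords shall be stored encrypted.",       # non-functional
    ]
    labels = ["F", "F", "NFR", "NFR"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(requirements, labels)
    print(clf.predict(["The app shall encrypt all network traffic."]))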
35. Transfer Learning & Machines
Train a model on one type of problem and leverage that model (i.e., the learned
knowledge) to solve a new BUT related problem.
Pan, Sinno Jialin, and Qiang Yang. "A survey on transfer learning." IEEE
Transactions on knowledge and data engineering 22.10 (2009): 1345-1359.
Requires less training data and less computing power overall.
36. Transfer Learning for NLP: Language Model
A language model is a way to represent the relations between words in a language.
Illustration from the slide: for the word “Woman”, the language model assigns
probabilities to related words, e.g., daughter (45%), mother (25%), queen (8%),
princess (5%), king (3%), father (2%), prince (0.2%), son (0.1%).
Neural language models (contextual embeddings) are designed to overcome
ambiguity and language variation.
E.g., Transformer-based models such as BERT, GPT-2 & GPT-3, XLNet, and more!
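A rough way to reproduce this kind of word-probability behaviour yourself is the Hugging Face fill-mask pipeline with a pretrained BERT model, as in the hedged sketch below; the probe sentence is invented and the probabilities will differ from those shown on the slide.

    # Sketch: ask a pretrained masked language model for word probabilities.
    # Assumes: pip install transformers torch
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for pred in fill("The woman said goodbye to her [MASK]."):
        print(f'{pred["token_str"]}: {pred["score"]:.2%}')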
37. BERT Language Model
Bi-directional Encoder Representations from Transformers
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional
transformers for language understanding." arXiv preprint
arXiv:1810.04805 (2018).
A pretrained LM originally developed by Google AI, later applied to help the Google
search engine better interpret users’ queries
BERT-Base → 12 encoder layers
BERT-Large → 24 encoder layers
Vaswani, Ashish, et al. "Attention is all you need." arXiv
preprint arXiv:1706.03762 (2017).
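As a quick sanity check of the layer counts above, the following sketch (assuming the Hugging Face transformers library) loads the configuration of the two public checkpoints and prints their number of encoder layers.

    # Sketch: confirm the number of encoder layers in BERT-Base vs. BERT-Large.
    # Assumes: pip install transformers
    from transformers import AutoConfig

    for name in ("bert-base-uncased", "bert-large-uncased"):
        config = AutoConfig.from_pretrained(name)
        print(name, "->", config.num_hidden_layers, "encoder layers")  # 12 and 24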
38. BERT Language Model
Bi-directional Encoder Representations from Transformers
So, what can we do with the BERT model?
Use the pre-trained BERT model in one of three ways:
● Use it directly as a classifier (see the sketch below)
● Extract features and train a new classifier
● Fine-tune the BERT model
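A minimal sketch of the first option: using a pre-trained model directly as a zero-shot classifier via the Hugging Face pipeline API. Note that the zero-shot-classification pipeline loads an NLI-tuned model (BART-MNLI by default) rather than BERT itself, and the example requirement and labels are invented.

    # Sketch: zero-shot classification of a requirement with a pretrained model.
    # Assumes: pip install transformers torch
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification")
    result = classifier(
        "The system shall log out the user after 10 minutes of inactivity.",
        candidate_labels=["usability", "security"],
    )
    print(result["labels"], result["scores"])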
39. NLP Transfer Learning for RE: Getting Started
Problem to Solve:
Classifying a set of non-functional requirements, distinguishing usability from security
aspects.
The dataset can be downloaded from https://github.com/tobhey/NoRBERT
Part 1: Using the pre-trained BERT model with a zero-shot learning classifier
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
Part 2: Using the pre-trained BERT model to extract features and train an LR classifier
(a minimal sketch follows below)
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
Part 3: Fine-tuning the BERT model and using it in a new classifier
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
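As a preview of Part 2, here is a minimal sketch (assuming the transformers, torch, and scikit-learn packages) that uses pre-trained BERT as a feature extractor and trains a logistic regression classifier on the resulting [CLS] embeddings. The two training requirements are invented; the actual notebook uses the NoRBERT dataset linked above.

    # Sketch of Part 2: pre-trained BERT as feature extractor + logistic regression.
    # The tiny training set below is invented; the notebook uses the NoRBERT data.
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = bert(**batch)
        return out.last_hidden_state[:, 0, :].numpy()  # [CLS] token embedding

    train_reqs = [
        "The interface shall be usable without prior training.",  # usability
        "All stored data shall be encrypted with AES-256.",       # security
    ]
    train_labels = ["usability", "security"]

    clf = LogisticRegression(max_iter=1000).fit(embed(train_reqs), train_labels)
    print(clf.predict(embed(["Passwords shall never be stored in plain text."])))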
40. Transfer Learning for NLP4RE Tasks: Key Takeaways
● On-the-fly requirement categorization/classification
● Auto-completing requirements at the requirements elicitation phase
● Enabling re-usability of requirements from large repositories
Language models assist in identifying contextual information, which is very useful for
NLP4RE tasks.
“Unfortunately, LMs trained on unfiltered text corpora suffer from degenerate and
biased behaviour.”
Using the pre-trained model directly might not bring out the best of these language
models ⇒ fine-tuning is highly encouraged for domain-specific tasks (see the
fine-tuning sketch at the end)
https://github.com/tobhey/NoRBERT
Schramowski, Patrick, et al. "Language Models have a Moral
Dimension." arXiv preprint arXiv:2103.11790 (2021).