1. Test Routine Automation through Natural Language Processing Techniques
Voulivasi Triantafyllia
Guided by:
A. Symeonidis, Aristotle University of Thessaloniki
E. Gómez, European Space Operations Centre
11 July 2017
ARISTOTLE UNIVERSITY OF THESSALONIKI
Faculty of Engineering
Department of Electrical and Computer Engineering
Information Processing Laboratory
6. Objectives
1. Automation in the creation of tests
● Time-consuming procedure: Senior SE → Tester
● Difficult to work with for new testers
2. Requirements Tracing
● Tests originate from SPRs (Software Problem Reports) instead of requirements
● Lack of consistency in evaluation of software requirements
15. Automated test
Sequence of test building blocks
Test Block Attributes:
● Name
● Description
● Parameters
● Precondition
● Postcondition
Natural Language
28. Natural Language Representation
Natural Language → Vector Space Model → Machine Representation
One-hot encoding
● Does not capture semantics
● Huge length: equal to the number of unique words in the corpus vocabulary
capital 195
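The two drawbacks of one-hot encoding can be seen in a minimal sketch. The toy sentence and the helper names `one_hot` and `dot` are assumptions made for illustration; they do not come from the presented system.

```python
def one_hot(word, vocabulary):
    """Return a one-hot list for `word` over the ordered `vocabulary`."""
    index = {w: i for i, w in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    vector[index[word]] = 1
    return vector


def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy corpus; a real vocabulary has tens of thousands of entries.
corpus = "open the manual stack and disable the checks".split()
vocabulary = sorted(set(corpus))

v_open = one_hot("open", vocabulary)
v_disable = one_hot("disable", vocabulary)

# The vector length equals the vocabulary size, and any two distinct
# words are orthogonal, so no notion of similarity is encoded.
print(len(v_open), dot(v_open, v_disable))
```

Every vector grows with the vocabulary, and the dot product between any two different words is always zero, which is why such vectors carry no semantic information.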
30. Word Embeddings
● State-of-the-art word embedding methods (Word2Vec, GloVe and FastText) have completely changed NLP
● reduce dimensionality
● capture semantics
Word2Vec vector:
[0.12, 0.23, 0.56]
31. Word2vec: simplified structure
● Shallow feed-forward neural network with one hidden layer and a linear activation function
● Input and output are one-hot encoded vectors of word pairs: drink | juice, New | York
● The word vectors are taken from the first (left) weight matrix
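As a simplified sketch of how such word pairs could be produced from running text, the function below pairs every word with its neighbors inside a fixed window. The function name `training_pairs`, the `window` argument, and the example sentence are assumptions for illustration; real Word2Vec implementations add further steps that are omitted here.

```python
def training_pairs(tokens, window=2):
    """Pair each target word with every neighbor at most `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not paired with itself
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sentence built around the word pairs shown on the slide.
pairs = training_pairs(["drink", "juice", "in", "New", "York"], window=1)
for target, context in pairs:
    print(target, "|", context)
```

Each pair would then be one-hot encoded, with the first word as network input and the second as the expected output.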
35. Word2Vec Parameters
● size: dimension of the hidden layer, typically a value of 100 - 1000
● window: maximum distance between the target word and a neighbor word
● min_count: minimum frequency count of words
● workers: number of threads to use
● sg: whether to use the skip-gram or the CBOW architecture
● negative: whether to use negative sampling
● corpus: the relevant documents
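The effect of a minimum frequency threshold can be illustrated without any library. This is a minimal sketch: the helper `build_vocabulary` and the three toy sentences are hypothetical, and only the parameter name `min_count` is taken from the slide.

```python
from collections import Counter


def build_vocabulary(sentences, min_count=2):
    """Keep only the words that occur at least `min_count` times in the corpus."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word: count for word, count in counts.items() if count >= min_count}


# Hypothetical tokenized corpus.
sentences = [
    ["open", "the", "manual", "stack"],
    ["close", "the", "manual", "stack"],
    ["disable", "the", "checks"],
]

vocab = build_vocabulary(sentences, min_count=2)
print(sorted(vocab))
```

Words below the threshold receive no embedding, which keeps rare tokens and typos out of the model at the cost of losing infrequent but valid terms.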
38. Word2Vec Corpus: Software Documentation
● Glossary
● Technical Notes
● Software Development Plan
● Software User Manual
● Software Requirements Specification
● Software Unit and Integration Test Plan
● Configuration and Installation Guide
● Software Problem Report
● Software Design Document
● Kick Off Meeting Minutes
● Final Report
● Software Validation Specification
48. Spell Checker
Levenshtein distance (LD)
s = "test"
t = "test" → LD(s,t) = 0
no transformations are needed
s = "test"
t = "tent" → LD(s,t) = 1
one substitution transforms s into t
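The two examples above can be reproduced with the standard dynamic-programming formulation of the Levenshtein distance. This is a sketch, not the presented implementation; the `suggest` helper and its small vocabulary are assumptions added to show how the distance could drive a spell checker.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary):
    """Hypothetical helper: return the vocabulary entry closest to `word`."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("test", "test"))  # 0
print(levenshtein("test", "tent"))  # 1
print(suggest("tesst", ["block", "step", "test"]))
```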
49. Presenter & Parser
Presenter
● Present the processed information to the UI
● Communicate with the other components to trigger data processing in the system
Parser
● Identify how many blocks match a sentence
● Identify a test step’s category
1. Informative
“Open the Manual Stack and disable dynamic PTV checks.” → 2 test blocks
2. Repetitive
“Repeat steps 1 to 4.” → 4 test blocks
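One way the Repetitive category could be detected is with a regular expression over the step text. The pattern, the function name `classify_step`, and the return format are all assumptions for illustration; the slides do not say how the Parser is implemented, and counting the blocks of an Informative step is not attempted here.

```python
import re

# Hypothetical pattern covering phrasings such as "Repeat steps 1 to 4."
REPEAT_PATTERN = re.compile(r"repeat steps (\d+) to (\d+)", re.IGNORECASE)


def classify_step(text):
    """Return (category, referenced step numbers) for one test step."""
    match = REPEAT_PATTERN.search(text)
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        return "repetitive", list(range(first, last + 1))
    return "informative", []


print(classify_step("Repeat steps 1 to 4."))
print(classify_step("Open the Manual Stack and disable dynamic PTV checks."))
```

A repetitive step then resolves to the blocks already chosen for the referenced steps, so no new recommendation is needed for it.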
53. Recommender - Parameters
Step Parameters    Block Parameters    Parameter Score
a, b, c            a, b, c             1.0
-                  a, b, c             0.0
a, b, c            -                   0.0
a, b, c            a, c                0.6667
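The four scores in the table are consistent with a Jaccard similarity (size of the intersection divided by size of the union) between the two parameter sets. That the recommender uses exactly this formula is an assumption; the sketch below only shows that it reproduces the listed values. The function name `parameter_score` is hypothetical.

```python
def parameter_score(step_params, block_params):
    """Jaccard similarity between the step's and the block's parameter sets."""
    step, block = set(step_params), set(block_params)
    union = step | block
    if not union:
        # Both sides empty: not covered by the table, 0.0 is an assumption.
        return 0.0
    return len(step & block) / len(union)


rows = [
    (["a", "b", "c"], ["a", "b", "c"]),
    ([], ["a", "b", "c"]),
    (["a", "b", "c"], []),
    (["a", "b", "c"], ["a", "c"]),
]
for step_params, block_params in rows:
    print(step_params, block_params, round(parameter_score(step_params, block_params), 4))
```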
54. Recommender - Association Rules Mining
● Re-ranking based on Association Rules Mining
Itemsets of interest: the previous blocks and the block in question
● Count, for each of those itemsets, the number of times its items occur together (σ: support count)
● Calculate the confidence scores c of X → Y
○ X : the previous blocks
○ Y : each test block in question
ID Items
1 {1, 2}
2 {1, 2, 3, 4}
3 {1, 2, 3, 5}
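Using the standard definition of confidence, c(X → Y) = σ(X ∪ Y) / σ(X), the three itemsets above can be worked through in a few lines. This is a sketch under that definition; the choice of X = {1, 2} and of the candidate blocks 3, 4 and 5 is made here for illustration.

```python
# The three itemsets from the table above.
transactions = [{1, 2}, {1, 2, 3, 4}, {1, 2, 3, 5}]


def support_count(itemset, transactions):
    """σ(itemset): number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y, transactions):
    """c(X → Y) = σ(X ∪ Y) / σ(X); defined as 0.0 when X never occurs."""
    sigma_x = support_count(x, transactions)
    if sigma_x == 0:
        return 0.0
    return support_count(x | y, transactions) / sigma_x


previous_blocks = {1, 2}   # X: blocks already placed in the test
candidates = [3, 4, 5]     # Y: blocks under consideration

ranked = sorted(candidates,
                key=lambda block: confidence(previous_blocks, {block}, transactions),
                reverse=True)
for block in ranked:
    print(block, round(confidence(previous_blocks, {block}, transactions), 4))
```

Block 3 follows {1, 2} in two of the three itemsets, so it is ranked above blocks 4 and 5, which each appear only once.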
56. Flow Checker & Data Container
Flow Checker Data Container
● data import
● data export
● data distribution to other components
(i.e. Repetitive category test step)
58. Dataset
Word2Vec Training Corpus
ESA Mission Control System Infrastructure
● Files: ~200
● Requirements: 5569
● Test Scenarios: 5040
Word count: 3,006,330
Word embeddings: 31580
Testing Dataset
● Requirements: 5569
● Test Blocks
○ Libraries: 21
○ Extracted Test Blocks: 2160
○ Filtered - High Level Test Blocks: 685
● Test Scenarios
○ Automated by a test engineer: 8
○ Total Test Steps: 181
○ Associated Test Blocks: 260
○ Linked Requirements: 36
66. Conclusion
● This work is a first step towards applying AI to software engineering (SWE) data at ESOC
● Retrieve software documentation information
● Increase productivity of the Testing team
● Hidden purpose: gather labeled data
67. Future Work
● Use of a Deep Learning Model for recommendations
● Embed pre-trained word vectors
● On-site experiments measuring time and effort
● Incorporate Software Testing metrics
68. Thanks to:
● Assoc. Professor Andreas Symeonidis
● Eduardo Gómez
● ESOC Data Analytics team
● ISSEL Labgroup