Automatic question generation (AQG) has many diverse applications in educational contexts. To bring these benefits to as many students as possible, it is prudent to expand AQG capabilities in as many languages as possible. However, English remains the dominant language in AQG research, and the required natural language processing tools for other languages are often under-resourced relative to English, which can make developing AQG pipelines difficult or impractical altogether. An approach called parallel construction has been developed to leverage existing English AQG systems for AQG in other languages. The benefits of this parallel construction approach are described, and examples of questions generated from Spanish and Brazilian Portuguese textbooks using the parallel construction method are presented and discussed.
Parallel Construction: A Parallel Corpus Approach for Automatic Question Generation in Non-English Languages
1. c
Parallel Construction: A Parallel Corpus Method for
Automatic Question Generation in Non-English
Languages
Benny G. Johnson, Jeffrey S. Dittel, Rachel Van Campenhout,
Rodrigo Bistolfi, Aida Maeda, and Bill Jerome
VitalSource Technologies, Research and Development
AIED2022 iTextbooks
2. c
Problem
• English is the dominant language in automatic question
generation (AQG) research.
• NLP tools needed for AQG are often under-resourced in
non-English languages.
It would be desirable to leverage the research and existing
AQG systems in English for other languages.
4. c
Automatic Question
Generation
Research has found no
difference in how
students use AI-
generated versus human-
authored questions.
Van Campenhout, R., Brown, N., Jerome, B., Dittel, J. S., & Johnson, B. G. (2021).
Toward Effective Courseware at Scale: Investigating Automatically Generated
Questions as Formative Practice. Learning at Scale. pp. 295–298.
https://doi.org/10.1145/3430895.3460162
Van Campenhout, R., Dittel, J. S., Jerome, B., & Johnson, B. G. (2021). Transforming
textbooks into learning by doing environments: an evaluation of textbook-based
automatic question generation. In: Third Workshop on Intelligent Textbooks at the
22nd International Conference on Artificial Intelligence in Education. CEUR Workshop
Proceedings, ISSN 1613-0073, pp. 1–12. Retrieved from:
http://ceur-ws.org/Vol-2895/paper06.pdf
Johnson, B. G., Dittel, J. S., Van Campenhout, R., & Jerome, B. (2022). Discrimination of
automatically generated questions used as formative practice. Proceedings of the
Ninth ACM Conference on Learning@Scale. pp. 325–329.
https://doi.org/10.1145/3491140.3528323
5. c
Method
Parallel construction uses machine translation (MT) and a
parallel corpus approach.
1. Translate the textbook to English using MT, e.g.,
Google Translate.
2. Align the sentences and words in the parallel corpus.
3. Perform English AQG exactly as usual.
4. For each question generation (QG) step in English,
perform the equivalent manipulation directly on the
original text using the alignment.
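The four steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `AlignedSentence`, `english_aqg_select`, and `parallel_construct` are hypothetical names, the English AQG system is replaced by a trivial stand-in that picks the longest word, and the toy sentence pair and its word alignment are written by hand rather than produced by MT and an aligner.

```python
from dataclasses import dataclass

BLANK = "______"


@dataclass
class AlignedSentence:
    source_tokens: list[str]        # tokens of the original-language sentence
    english_tokens: list[str]       # tokens of its machine translation
    word_alignment: dict[int, int]  # English token index -> source token index


def english_aqg_select(english_tokens: list[str]) -> list[int]:
    """Hypothetical stand-in for the English AQG system: pick the longest token."""
    return [max(range(len(english_tokens)), key=lambda i: len(english_tokens[i]))]


def blank_out(tokens: list[str], indices: set[int]) -> str:
    """Replace the tokens at the given positions with a blank."""
    return " ".join(BLANK if i in indices else tok for i, tok in enumerate(tokens))


def parallel_construct(pair: AlignedSentence) -> tuple[str, str]:
    # Step 3: the English system makes every question-generation decision.
    english_idx = english_aqg_select(pair.english_tokens)
    # Step 4: mirror the same manipulation on the original text via the alignment.
    source_idx = {pair.word_alignment[i] for i in english_idx}
    return (
        blank_out(pair.english_tokens, set(english_idx)),
        blank_out(pair.source_tokens, source_idx),
    )


# Hand-made toy pair (assumption: not taken from the textbooks in the talk).
PAIR = AlignedSentence(
    source_tokens=["La", "inflación", "reduce", "el", "poder", "adquisitivo", "."],
    english_tokens=["Inflation", "reduces", "purchasing", "power", "."],
    word_alignment={0: 1, 1: 2, 2: 5, 3: 4, 4: 6},
)

english_question, source_question = parallel_construct(PAIR)
print(english_question)
print(source_question)
```

The source-language question is assembled only from the original tokens; the machine translation is used to decide what to blank, never as output text.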
6. c
Questions
Why not simply implement AQG directly in Spanish?
This can be done, but it’s much more work. In our case, the
English AQG system had already been developed, validated, and
tested. Parallel construction enables its reuse for other
languages too.
7. c
Questions
Why not simply use MT to translate the textbook to English, do
AQG, and then translate the questions back to the original
language?
There is still a large gap in quality between MT and human
translation. The errors and noise in MT make this approach
insufficient for educational applications.
8. c
Method
Source-language questions are built in parallel with the
English questions as they are generated, hence the name
parallel construction.
Advantages:
• All AQG decisions are made by the English system.
• The linguistic quality of the source text is preserved.
• Much less development work than direct AQG.
9. c
Example
Cloze matching question, Spanish-language macroeconomics
textbook.
Step 1: English system selects sentence for question creation.
However, during the 1980s many borrowing LDCs were unable to cope with the burden
of their foreign debt - a situation known as the LDC debt crisis - and, perhaps as a
consequence, their economic growth. countries experienced a serious decline.
Corresponding Spanish sentence retrieved using alignment.
Sin embargo, durante la década de 1980 muchos PMD prestatarios no pudieron hacer
frente a la carga de su deuda exterior –situación que se conoce con el nombre de crisis
de la deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de
estos países experimentó una grave disminución.
11. c
Example
Step 2: English system selects answer words.
borrowing, crisis, decline
Corresponding Spanish words retrieved using alignment.
prestatarios, crisis, disminución
12. c
Example
Step 3: Final question in English.
However, during the 1980s many ______ LDCs were unable to cope with the burden of
their foreign debt - a situation known as the LDC debt ______ - and, perhaps as a
consequence, their economic growth. countries experienced a serious ______.
Choices: borrowing, crisis, decline
Final question in Spanish.
Sin embargo, durante la década de 1980 muchos PMD ______ no pudieron hacer frente
a la carga de su deuda exterior –situación que se conoce con el nombre de ______ de la
deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de estos
países experimentó una grave ______.
Opciones: crisis, disminución, prestatarios
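The example can be reproduced with a short sketch. It assumes the word alignment is available as a simple English-to-Spanish mapping (written by hand here from the slides), that each answer word is blanked at its first whole-word occurrence, and that the choices are listed alphabetically, as they appear to be on the slides. `make_cloze` and `WORD_ALIGNMENT` are hypothetical names, not part of the authors' system.

```python
import re

BLANK = "______"


def make_cloze(sentence: str, answers: list[str]) -> tuple[str, list[str]]:
    """Blank the first whole-word occurrence of each answer; return stem and sorted choices."""
    stem = sentence
    for word in answers:
        stem, count = re.subn(r"\b" + re.escape(word) + r"\b", BLANK, stem, count=1)
        if count != 1:
            raise ValueError(f"answer word not found in sentence: {word!r}")
    return stem, sorted(answers)


ENGLISH = (
    "However, during the 1980s many borrowing LDCs were unable to cope with the burden "
    "of their foreign debt - a situation known as the LDC debt crisis - and, perhaps as a "
    "consequence, their economic growth. countries experienced a serious decline."
)
SPANISH = (
    "Sin embargo, durante la década de 1980 muchos PMD prestatarios no pudieron hacer "
    "frente a la carga de su deuda exterior –situación que se conoce con el nombre de crisis "
    "de la deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de "
    "estos países experimentó una grave disminución."
)

# Step 2: answer words chosen by the English system, and their aligned Spanish words.
ENGLISH_ANSWERS = ["borrowing", "crisis", "decline"]
WORD_ALIGNMENT = {"borrowing": "prestatarios", "crisis": "crisis", "decline": "disminución"}
spanish_answers = [WORD_ALIGNMENT[word] for word in ENGLISH_ANSWERS]

# Step 3: the same blanking operation is applied to both sentences.
english_stem, english_choices = make_cloze(ENGLISH, ENGLISH_ANSWERS)
spanish_stem, spanish_choices = make_cloze(SPANISH, spanish_answers)

print(spanish_stem)
print("Opciones:", ", ".join(spanish_choices))
```

The noisy English sentence is only an intermediate: its errors never reach the Spanish question, because the Spanish stem is cut from the original textbook sentence.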
13. c
Example
The translated English sentence is noisy.
However, during the 1980s many borrowing LDCs were unable to cope with the burden
of their foreign debt - a situation known as the LDC debt crisis - and, perhaps as a
consequence, their economic growth. countries experienced a serious decline.
The back-translated Spanish question is unacceptable.
Sin embargo, durante la década de 1980, muchos PMA ______ no pudieron
hacer frente a la carga de su deuda externa, una situación conocida como la
______ de la deuda de los PMA, y, tal vez, como consecuencia, su crecimiento
económico. Los países experimentaron un grave ______.
Opciones: crisis, declive, prestatarios
PMA = países menos avanzados
PMD = países menos desarrollados
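One way to make the problem with back-translation concrete is to list the words in a question that never occur in the original textbook sentence. The sketch below is only an illustrative check, not something described in the talk; `words` and `foreign_words` are hypothetical helper names.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words; blanks made of underscores are ignored."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def foreign_words(source_sentence: str, stem: str, options: list[str]) -> list[str]:
    """Words used in the question that do not occur in the source sentence."""
    question_words = words(stem) | words(" ".join(options))
    return sorted(question_words - words(source_sentence))


SOURCE = (
    "Sin embargo, durante la década de 1980 muchos PMD prestatarios no pudieron hacer "
    "frente a la carga de su deuda exterior –situación que se conoce con el nombre de crisis "
    "de la deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de "
    "estos países experimentó una grave disminución."
)
BACK_TRANSLATED_STEM = (
    "Sin embargo, durante la década de 1980, muchos PMA ______ no pudieron "
    "hacer frente a la carga de su deuda externa, una situación conocida como la "
    "______ de la deuda de los PMA, y, tal vez, como consecuencia, su crecimiento "
    "económico. Los países experimentaron un grave ______."
)
BACK_TRANSLATED_OPTIONS = ["crisis", "declive", "prestatarios"]
PARALLEL_STEM = (
    "Sin embargo, durante la década de 1980 muchos PMD ______ no pudieron hacer frente "
    "a la carga de su deuda exterior –situación que se conoce con el nombre de ______ de la "
    "deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de estos "
    "países experimentó una grave ______."
)
PARALLEL_OPTIONS = ["crisis", "disminución", "prestatarios"]

back_translated_drift = foreign_words(SOURCE, BACK_TRANSLATED_STEM, BACK_TRANSLATED_OPTIONS)
parallel_drift = foreign_words(SOURCE, PARALLEL_STEM, PARALLEL_OPTIONS)
print(back_translated_drift)
print(parallel_drift)
```

For the back-translated question the list includes the changed acronym (PMA) and the changed answer word (declive); for the parallel construction question it is empty, since every word comes from the textbook itself.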
15. c
Example
The parallel construction Spanish question is correct.
Sin embargo, durante la década de 1980 muchos PMD ______ no pudieron hacer frente
a la carga de su deuda exterior –situación que se conoce con el nombre de ______ de la
deuda de los PMD– y, quizá como consecuencia, el crecimiento económico de estos
países experimentó una grave ______.
Opciones: crisis, disminución, prestatarios