19BCE1367_Capstone_Review 2_Final.pdf
1. School of Computer Science and Engineering Register No: 19BCE1367
Deep Neural Network-based Limerick
Generation for an Image
Name: Divyanshi Thapa
Register No: 19BCE1367
Programme and Specialization: B.Tech CSE
CAPSTONE PROJECT
REVIEW 2
Guide Name:
Dr. Praveen Joe I R
2. School of Computer Science and Engineering Register No: 19BCE1367
Outline
01 Introduction
02 Problem Statement
03 Research Challenges
04 Research Objectives
05 Proposed System
06 What to be done next
07 Research Paper Status
08 Guide Approval
09 References
3. School of Computer Science and Engineering Register No: 19BCE1367
Introduction
01
4. School of Computer Science and Engineering Register No: 19BCE1367
● Creative writing with artificial intelligence (AI) is
a popular and rapidly growing research field. It is
intriguing but challenging, particularly when the
generated text must read as human-written while
satisfying formal constraints, as poems do.
● Among creative writing tasks, paraphrasing and
story writing are easier than writing poetry
because poems carry many restrictions, such as
rhyme scheme, number of lines, and type of
language.
Introduction
5. School of Computer Science and Engineering Register No: 19BCE1367
● Several poetry generation frameworks have been developed to help AI produce
human-like poems despite these restrictions.
Introduction
● Poems in literature can be broadly
classified into nine categories depending
on their rhyming structure and number of
lines. Of these, the limerick is among the
most challenging to generate with deep
learning, because it is a five-line poem
with a strict AABBA rhyme scheme.
6. School of Computer Science and Engineering Register No: 19BCE1367
● Image captioning automatically generates
well-formed sentences from a given image and
is widely used in tasks such as visual question
answering (VQA).
● Language models based on neural networks
have improved the state of the art with regard to
predictive language modeling, while topic
models are successful at capturing clear-cut,
semantic dimensions.
● NLP + DL = a system that can understand and
analyze an image and generate a creative,
human-like poem based on the theme of the
image.
NLP + Deep learning
7. School of Computer Science and Engineering Register No: 19BCE1367
Problem Statement
02
8. School of Computer Science and Engineering Register No: 19BCE1367
● For a poem to be meaningful, both linguistic and literary
aspects need to be taken into account.
● With advances in image captioning, NLP tasks such as
question answering have moved into a second phase:
visual question answering.
● “To create a deep learning model which can create
limericks (a form of poem) for the given input image in
the English language.”
Problem Statement
9. School of Computer Science and Engineering Register No: 19BCE1367
Current approaches to generating rhyming English poetry with a neural network
constrain the output to enforce rhyme.
The generated poem should be:
● Consistent with the context or theme of the given input image
● Error-free
● Coherent
● Faithful to the rhyming structure of the limerick (AABBA)
Problem Statement
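The AABBA requirement listed above can be checked mechanically. The following is a minimal sketch, assuming a crude spelling-based notion of rhyme (two words rhyme if their last two letters match). The function names and the two-letter rule are illustrative assumptions, not the checker used in this project; a phonetic dictionary would be more reliable.

```python
import re


def last_word(line: str) -> str:
    """Return the final alphabetic word of a line, lowercased."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def rough_rhyme(a: str, b: str, tail: int = 2) -> bool:
    """Crude rhyme test (assumption): the words share their last `tail` letters."""
    return len(a) >= tail and len(b) >= tail and a[-tail:] == b[-tail:]


def follows_aabba(limerick: str) -> bool:
    """Check that a five-line poem has line endings in the pattern AABBA."""
    lines = [ln for ln in limerick.splitlines() if ln.strip()]
    if len(lines) != 5:
        return False
    w = [last_word(ln) for ln in lines]
    a_ok = rough_rhyme(w[0], w[1]) and rough_rhyme(w[0], w[4])
    b_ok = rough_rhyme(w[2], w[3])
    # The A and B rhymes must differ, otherwise the scheme is AAAAA.
    distinct = not rough_rhyme(w[0], w[2])
    return a_ok and b_ok and distinct
```

A letter-based test misses rhymes spelled differently (for example "blue" and "too"), so it is only suitable as a first filter.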
10. School of Computer Science and Engineering Register No: 19BCE1367
Research Challenges
03
11. School of Computer Science and Engineering Register No: 19BCE1367
1. Mapping the theme of the image to the topic of the
poem.
2. Taking both linguistic and literary aspects into
account so that the poem is meaningful.
3. Syntactic well-formedness and topical coherence
throughout the poem.
4. Rhyming constraint (maintaining the rhyme scheme).
5. A certain amount of literary creativity to make the
poem interesting.
Research Challenges
12. School of Computer Science and Engineering Register No: 19BCE1367
Research Objectives
04
13. School of Computer Science and Engineering Register No: 19BCE1367
1. Mimic human creative writing by building a simple framework
for image-to-poem generation in English.
2. Use transformer models for better image captioning and
limerick generation.
3. Generate poems (limericks) efficiently so that the framework
can be deployed as a public application after post-processing.
4. Focus on maintaining the coherence and rhyming structure
of the limerick and the efficiency of the framework.
Research Objectives
14. School of Computer Science and Engineering Register No: 19BCE1367
Proposed System
05
15. School of Computer Science and Engineering Register No: 19BCE1367
● The framework should also be fast, so transformer models are used for both
image analysis and limerick generation. Image features are extracted and a
description is generated by a vision encoder-decoder model, which combines a
vision transformer (ViT) encoder for image feature extraction with a GPT-2
decoder for generating human-like captions.
● This caption is treated as the first line of the limerick and is fed to another
GPT-2 model to generate a pool of 20 limericks.
● The best limerick is selected as the final output after post-processing.
Proposed System Introduction
16. School of Computer Science and Engineering Register No: 19BCE1367
Proposed System Diagram
17. School of Computer Science and Engineering Register No: 19BCE1367
Module 1 (M1): Image Captioning
Module 2 (M2): GPT-2 reverse language modeling
Module 3 (M3): Post-processing
Module 3.1 (M3.1): Grammar and spelling error detection
Module 3.2 (M3.2): BERT based word embeddings
Module 4 (M4): Evaluation
List of Modules
18. School of Computer Science and Engineering Register No: 19BCE1367
● The vision encoder-decoder model is used via the Hugging Face API.
It has ViT as its vision encoder and GPT-2 as its text decoder, and
it is trained on the popular Common Objects in Context (COCO)
dataset, which contains more than 120 thousand images with their
descriptions.
● The PyTorch version is used to generate captions for the
given input image.
M1: Image Captioning
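The captioning step can be sketched with the Hugging Face `transformers` library. The slides do not name a checkpoint, so `nlpconnect/vit-gpt2-image-captioning` (a public ViT + GPT-2 captioning model) is an assumption here, as are the function name and the generation settings `max_length=16` and `num_beams=4`.

```python
def caption_image(
    image_path: str,
    checkpoint: str = "nlpconnect/vit-gpt2-image-captioning",  # assumed checkpoint
) -> str:
    """Generate one caption for an image with a ViT encoder and GPT-2 decoder."""
    # Imported inside the function so that defining it does not require
    # the heavy dependencies (Pillow, PyTorch, transformers) to be installed.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The image processor resizes and normalizes the image into a tensor.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Beam search decoding; the settings here are illustrative, not tuned.
    output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Calling `caption_image("photo.jpg")` downloads the checkpoint on first use and returns a single caption string, which the next module would treat as the seed sentence.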
19. School of Computer Science and Engineering Register No: 19BCE1367
Problem: GPT-2 is a forward language model: it is fine-tuned on the standard
left-to-right order of tokens in a limerick. This helps maintain subject
continuity and coherence, but it cannot maintain the rhyming structure of the
poem, since the rhyming word comes last in each line and is generated only
after the rest of the line.
M2: GPT-2 reverse language modeling
20. School of Computer Science and Engineering Register No: 19BCE1367
● Solution: The GPT-2 model can be fine-tuned on a corpus in which the tokens
of each limerick are in reverse (right-to-left) order. This technique helps the
GPT-2 model learn the rhyming structure.
● The caption generated by the image captioning model is fed into this
fine-tuned reverse GPT-2 model as a seed sentence, and a pool of 20 limericks
is generated.
M2: GPT-2 reverse language modeling
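Preparing a reversed fine-tuning corpus can be sketched as follows. This assumes reversal is applied to whitespace-separated words within each line, so that the rhyming word becomes the first token of its line. The slides do not say whether reversal is per line or over the whole poem, and GPT-2 itself operates on byte-pair tokens, so both choices here are illustrative assumptions.

```python
def reverse_line(line: str) -> str:
    """Reverse the word order of one line (assumed whitespace tokenization)."""
    return " ".join(reversed(line.split()))


def reverse_limerick(limerick: str) -> str:
    """Reverse every line of a limerick while keeping the line order."""
    return "\n".join(reverse_line(ln) for ln in limerick.splitlines())


if __name__ == "__main__":
    poem = "There once was a cat in a hat\nWho sat on a very soft mat"
    print(reverse_limerick(poem))
```

Applying `reverse_limerick` a second time to generated text restores the normal reading order, so the same helper can serve before fine-tuning and after generation.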
21. School of Computer Science and Engineering Register No: 19BCE1367
M3.1: Grammar and spelling error detection
- The generated limerick should be syntactically
correct. To check this, an open-source spelling
and grammar checker is used to assign a score to
each limerick. Limericks with no errors are chosen
for further processing.
M3: Post-processing
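The error-free filter in M3.1 can be sketched with a pluggable scoring function. The real system uses an open-source spelling and grammar checker; `toy_count_errors` below is a hypothetical stand-in that only flags a few hard-coded misspellings, so the example runs without external tools.

```python
from typing import Callable, List

# Hypothetical stand-in data; a real checker would cover grammar as well.
KNOWN_MISSPELLINGS = {"teh", "recieve", "wich"}


def toy_count_errors(text: str) -> int:
    """Count words that appear in a small list of known misspellings."""
    words = text.lower().split()
    return sum(1 for w in words if w.strip(".,!?;:") in KNOWN_MISSPELLINGS)


def keep_error_free(
    limericks: List[str], count_errors: Callable[[str], int]
) -> List[str]:
    """Keep only the limericks for which the checker reports zero errors."""
    return [poem for poem in limericks if count_errors(poem) == 0]
```

Passing the checker as a function keeps the filter independent of any particular tool: a wrapper around a real grammar checker can replace `toy_count_errors` without changing `keep_error_free`.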
22. School of Computer Science and Engineering Register No: 19BCE1367
● The Bidirectional Encoder Representations from Transformers (BERT) model can be
used to generate in-context embeddings.
● Subject continuity throughout the limerick is quantified as the average noun
centroid distance in the embedding space [5].
● If:
○ the mean is high, the nouns are far from the average subject of the limerick.
○ the standard deviation is high, many subjects are present in the limerick.
● The limerick with the lowest mean and standard deviation is selected as the final output.
M3: Post-processing
M3.2: BERT based word embeddings
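The noun centroid distance can be sketched on plain lists of floats standing in for BERT noun embeddings. Two details are assumptions, since the slides do not specify them: the distance is Euclidean, and candidates are ranked by mean first with standard deviation as a tie-breaker.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise average of the noun embeddings."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def centroid_distance_stats(noun_vectors: List[Vector]) -> Tuple[float, float]:
    """Mean and standard deviation of each noun's distance to the centroid."""
    c = centroid(noun_vectors)
    dists = [math.dist(v, c) for v in noun_vectors]  # Euclidean (assumption)
    return mean(dists), pstdev(dists)


def pick_most_coherent(candidates: Dict[str, List[Vector]]) -> str:
    """Return the limerick whose (mean, std) pair is smallest.

    Tuples compare element by element, so the mean decides first and the
    standard deviation breaks ties (an assumed way to combine the two).
    """
    return min(candidates, key=lambda k: centroid_distance_stats(candidates[k]))
```

In the full pipeline the vectors would come from BERT's contextual embeddings of the nouns in each candidate limerick; the selection logic stays the same.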
23. School of Computer Science and Engineering Register No: 19BCE1367
Automatic evaluation methods:
- BLEU (Bilingual Evaluation Understudy) score
- Cosine similarity
- Semantic similarity (using Sentence-BERT)
The MultiM-Poem dataset is a collection of 8,292 images scraped from Flickr, with
each image mapped to a related human-written poem. The image serves as the
user input image and the related poem as the ground truth.
M4: Evaluation
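Of the metrics listed, cosine similarity is the simplest to illustrate. The sketch below compares a generated poem with a ground-truth poem using bag-of-words count vectors. The slides do not say which vector representation is used, so the bag-of-words choice and the function names are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(generated: str, reference: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(generated), bag_of_words(reference)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Word-count vectors only reward exact word overlap, which is why a sentence-embedding measure is listed alongside it: two poems on the same theme can share few words.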
24. School of Computer Science and Engineering Register No: 19BCE1367
What to be done next?
06
25. School of Computer Science and Engineering Register No: 19BCE1367
1. Compilation of the results.
2. Research paper completion.
What to be done next?
26. School of Computer Science and Engineering Register No: 19BCE1367
Research Paper Status
07
27. School of Computer Science and Engineering Register No: 19BCE1367
1. Abstract.
2. Introduction.
3. Related work.
4. Approach.
a. Architecture
b. Image captioning
c. Language model
5. Experiment.
6. Result
7. Conclusion and Future work.
Research Paper Status
28. School of Computer Science and Engineering Register No: 19BCE1367
Guide Approval
08
29. School of Computer Science and Engineering Register No: 19BCE1367
Guide Approval mail screenshot
30. School of Computer Science and Engineering Register No: 19BCE1367
[1] Wang, H., Zhang, Y., & Yu, X. (2020). An overview of image caption generation
methods. Computational intelligence and neuroscience, 2020.
[2] Van de Cruys, T. (2020, July). Automatic poetry generation from prosaic text. In
Proceedings of the 58th annual meeting of the association for computational linguistics
(pp. 2471-2480).
[3] Beheitt, M. E. G., & Hmida, M. B. H. (2022). Automatic Arabic Poem Generation with
GPT-2. In ICAART (2) (pp. 366-374).
[4] Liu, D., Guo, Q., Li, W., & Lv, J. (2018, July). A multi-modal chinese poetry generation
model. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8).
IEEE.
[5] Lo, K. L., Ariss, R., & Kurz, P. (2022). GPoeT-2: A GPT-2 Based Poem Generator. arXiv
preprint arXiv:2205.08847.
References
(Reference papers)
31. School of Computer Science and Engineering Register No: 19BCE1367
[6] Meyer, J. B. (2019). Generating Free Verse Poetry with Transformer Networks (Doctoral
dissertation, Reed College).
[7] Talafha, S., & Rekabdar, B. (2019, January). Arabic poem generation with hierarchical
recurrent attentional network. In 2019 IEEE 13th International Conference on Semantic
Computing (ICSC) (pp. 316-323). IEEE.
[8] Gao, L., Fan, K., Song, J., Liu, X., Xu, X., & Shen, H. T. (2019, July). Deliberate attention
networks for image captioning. In Proceedings of the AAAI conference on artificial
intelligence (Vol. 33, No. 01, pp. 8320-8327).
[9] Jhamtani, H., Mehta, S. V., Carbonell, J., & Berg-Kirkpatrick, T. (2019). Learning rhyming
constraints using structured adversaries. arXiv preprint arXiv:1909.06743.
[10] Lau, J. H., Cohn, T., Baldwin, T., Brooke, J., & Hammond, A. (2018). Deep-speare: A joint
neural model of poetic language, meter and rhyme. arXiv preprint arXiv:1807.03491.
References
(Reference papers)
32. School of Computer Science and Engineering Register No: 19BCE1367
[11] Talafha, S., & Rekabdar, B. (2021, January). Poetry generation model via deep learning
incorporating extended phonetic and semantic embeddings. In 2021 IEEE 15th
International Conference on Semantic Computing (ICSC) (pp. 48-55). IEEE.
[12] Min, K., Dang, M., & Moon, H. (2021). Deep Learning-Based Short Story Generation for an
Image Using the Encoder-Decoder Structure. IEEE Access, 9, 113550-113557.
[13] Zhang, D., Ni, B., Zhi, Q., Plummer, T., Li, Q., Zheng, H., ... & Wang, D. (2019, August).
Through the eyes of a poet: Classical poetry recommendation with visual input on social
media. In 2019 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining (ASONAM) (pp. 333-340). IEEE.
[14] Ghazvininejad, M., Shi, X., Priyadarshi, J., & Knight, K. (2017, July). Hafez: an interactive
poetry generation system. In Proceedings of ACL 2017, System Demonstrations (pp. 43-48).
[15] Liu, Z., Fu, Z., Cao, J., de Melo, G., Tam, Y. C., Niu, C., & Zhou, J. (2019, July). Rhetorically
controlled encoder-decoder for modern chinese poetry generation. In Proceedings of the
57th Annual Meeting of the Association for Computational Linguistics (pp. 1992-2001).
References
(Reference papers)
33. School of Computer Science and Engineering Register No: 19BCE1367
1. https://scottmduda.medium.com/generating-an-edgar-allen-poe-styled-poem-using-gpt-2-289801ded82c
2. https://timesofindia.indiatimes.com/readersblog/newtech/artificial-intelligence-in-education-39512/
3. https://news.climate.columbia.edu/2022/04/22/haiku-ai-generated-poetry/
4. https://towardsdatascience.com/transformers-89034557de14
5. https://github.com/minimaxir/gpt-2-simple
6. https://languagetool.org/
References
(Websites and articles)
34. School of Computer Science and Engineering Register No: 19BCE1367
THANK YOU