
Proposal.pptx

  1. Arabic
  2. Most of the developed techniques for topic modeling are language agnostic: the models can be trained on any vocabulary. Once trained, however, they can only be applied to documents drawn from the same fixed vocabulary used during training. A trained model cannot handle unknown tokens and cannot easily be transferred to other languages. Moreover, for Neural Topic Models that represent the input corpus with embeddings, performance depends on the quality of the embeddings, which are specific to the training language. This language dependence creates different challenges for different languages, each requiring different handling.
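The fixed-vocabulary limitation described above can be illustrated with a minimal sketch. The vocabulary and helper below are hypothetical, not taken from any particular library: once a bag-of-words model is fit on one vocabulary, every out-of-vocabulary token is simply lost.

```python
# A trained topic model only knows the vocabulary seen during training.
# Tokens outside that vocabulary are dropped (or mapped to an <UNK> slot),
# so a model trained on English tokens has nothing to work with on Arabic input.

trained_vocab = {"topic": 0, "model": 1, "corpus": 2, "document": 3}  # toy vocabulary

def to_bow(tokens, vocab):
    """Map tokens to a bag-of-words count vector; unknown tokens are discarded."""
    counts = [0] * len(vocab)
    dropped = []
    for tok in tokens:
        if tok in vocab:
            counts[vocab[tok]] += 1
        else:
            dropped.append(tok)
    return counts, dropped

# An Arabic document: every token is out-of-vocabulary for this model.
counts, dropped = to_bow(["نموذج", "موضوع"], trained_vocab)
print(counts)   # all zeros: nothing the model can use
print(dropped)  # both tokens were discarded
```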
  3. Recent advancements in applying topic modeling to Arabic texts include: •Improved pre-processing techniques •Advanced models •Incorporation of sentiment analysis •Incorporation of Named Entity Recognition •Incorporation of word embeddings •Handling unstructured data •Handling dialectal Arabic
  4. •The first stage is dataset acquisition. •The second stage is preprocessing the datasets. Preprocessing includes tokenization, removing punctuation, removing stopwords, tagging, and constructing n-grams. Text normalization is essential. •In this study, and for simplicity, only 1-gram tokens were included, and all but noun tokens were removed. •This preprocessing yields fewer tokens and slightly shorter documents, which challenges the neural models, whose performance is impacted by the corpus's vocabulary size and document length [19]. The preprocessing steps were implemented using CAMEL tools [15].
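The preprocessing stage above can be sketched as a small pipeline. The study uses CAMEL tools [15]; this sketch is generic plain Python instead, and the stopword list and POS lookup table below are toy, hypothetical stand-ins for real Arabic resources and a real tagger.

```python
import re

# Sketch of the preprocessing pipeline: tokenize, strip punctuation,
# remove stopwords, then keep only noun 1-grams.
# The tag assignments here are illustrative only; a real pipeline would
# use a morphological tagger (e.g. the one in CAMEL tools).

STOPWORDS = {"في", "من", "على"}                                  # tiny toy stopword list
TOY_TAGS = {"الكتاب": "noun", "قرأ": "verb", "الطالب": "noun"}   # hypothetical tagger output

def preprocess(text):
    tokens = re.findall(r"\w+", text)                    # tokenize, dropping punctuation
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [t for t in tokens if TOY_TAGS.get(t) == "noun"]  # keep noun 1-grams only

print(preprocess("قرأ الطالب الكتاب في المكتبة."))  # only the tagged nouns survive
```

Note how each step shrinks the document, which is exactly the vocabulary-size and document-length pressure on the neural models mentioned above.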

Editor's notes

  • Recent advancements in topic modeling for Arabic texts have focused on several areas, including:
    Improved pre-processing techniques: Researchers have developed new pre-processing techniques specifically designed for Arabic text that can improve the effectiveness of topic modeling algorithms. These techniques include text normalization, stemming, and removing diacritical marks.
    Advanced models: Researchers have developed advanced topic modeling algorithms specifically designed for Arabic text, such as the Arabic Latent Dirichlet Allocation (LDA) model and BERT-based models, which can effectively identify topics in Arabic texts.
    Incorporation of sentiment analysis: Some recent research incorporates sentiment analysis techniques into topic modeling algorithms to gain a better understanding of the topics and of the authors' attitudes toward them.
    Incorporation of Named Entity Recognition: With the help of Named Entity Recognition techniques, researchers have been able to extract important entities from the text and use them in topic modeling to obtain a more specific understanding of the topics.
    Handling unstructured data: Some recent studies focus on handling unstructured data such as social media text, which is highly relevant for topic modeling of Arabic texts.
    Handling dialectal Arabic: Some recent works focus on handling dialectal Arabic, which differs from formal Arabic and has its own vocabulary, grammar, and syntax.
    Incorporation of word embeddings: Researchers have been using word embeddings to improve topic modeling performance. Word embeddings map words to high-dimensional vectors and can capture semantic and syntactic information about words, which helps improve the accuracy of topic modeling algorithms.
