The document discusses a new algorithm for topic mining over asynchronous text sequences. The algorithm aims to explore correlations between multiple related text sequences that may have different time stamps. It consists of two alternating steps: 1) extracting common topics from sequences based on adjusted time stamps, and 2) adjusting time stamps according to the discovered topic time distributions. The approach is evaluated on research papers and news articles, demonstrating effectiveness in identifying topics across asynchronously published documents.
Impulse Technologies
Beacons U to World of technology
044-42133143, 98401 03301, 9841091117 ieeeprojects@yahoo.com www.impulse.net.in
Topic Mining over Asynchronous Text Sequences
Abstract
Time-stamped texts, or text sequences, are ubiquitous in real-world
applications. Multiple text sequences are often related to each other by sharing
common topics. The correlation among these sequences provides more meaningful
and comprehensive clues for topic mining than any individual
sequence does. However, exploring this correlation is nontrivial in the presence of
asynchronism among multiple sequences, i.e., documents from different sequences
about the same topic may have different time stamps. In this paper, we formally
address this problem and put forward a novel algorithm based on the generative
topic model. Our algorithm consists of two alternating steps: the first step extracts
common topics from multiple sequences based on the adjusted time stamps
provided by the second step; the second step adjusts the time stamps of the
documents according to the time distribution of the topics discovered by the first
step. We perform these two steps alternately, and the objective function is
guaranteed to converge monotonically over the iterations. The effectiveness and
advantage of our approach are demonstrated through extensive empirical studies on
two real data sets consisting of six research paper repositories and two news article
feeds, respectively.
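
The alternating scheme described in the abstract can be illustrated with a small sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration: a PLSA-style mixture in which each time slot t has a topic distribution p(z|t) and each topic z has a word distribution p(w|z), a log-likelihood objective, and a time adjustment that lets each document move at most max_shift slots from its original time stamp. The function names (em_step, sync_step, mine_topics) and all parameters are hypothetical, and the bounded-shift rule is a simplification rather than the adjustment procedure of the paper.

```python
import math
import random

def doc_loglik(counts, t, p_w_z, p_z_t):
    """Log-likelihood of one document's word counts if placed in time slot t."""
    total = 0.0
    for w, c in counts.items():
        p = sum(p_w_z[z][w] * p_z_t[t][z] for z in range(len(p_w_z)))
        total += c * math.log(max(p, 1e-300))  # floor avoids log(0)
    return total

def objective(docs, times, p_w_z, p_z_t):
    """Assumed objective: total log-likelihood under the current time stamps."""
    return sum(doc_loglik(d, t, p_w_z, p_z_t) for d, t in zip(docs, times))

def em_step(docs, times, p_w_z, p_z_t):
    """Step 1 (topic extraction): one EM update with time stamps held fixed."""
    K, V, T = len(p_w_z), len(p_w_z[0]), len(p_z_t)
    new_wz = [[0.0] * V for _ in range(K)]
    new_zt = [[0.0] * K for _ in range(T)]
    for counts, t in zip(docs, times):
        for w, c in counts.items():
            post = [p_w_z[z][w] * p_z_t[t][z] for z in range(K)]
            norm = sum(post)
            if norm == 0:
                continue
            for z in range(K):
                r = c * post[z] / norm  # expected count of word w from topic z
                new_wz[z][w] += r
                new_zt[t][z] += r
    for z in range(K):
        s = sum(new_wz[z])
        new_wz[z] = [x / s for x in new_wz[z]] if s > 0 else p_w_z[z]
    for t in range(T):
        s = sum(new_zt[t])
        # Slots that currently hold no documents keep their previous distribution.
        new_zt[t] = [x / s for x in new_zt[t]] if s > 0 else p_z_t[t]
    return new_wz, new_zt

def sync_step(docs, orig_times, p_w_z, p_z_t, max_shift):
    """Step 2 (time adjustment): move each document to its best nearby slot."""
    T = len(p_z_t)
    new_times = []
    for counts, t0 in zip(docs, orig_times):
        window = range(max(0, t0 - max_shift), min(T, t0 + max_shift + 1))
        new_times.append(max(window, key=lambda t: doc_loglik(counts, t, p_w_z, p_z_t)))
    return new_times

def mine_topics(docs, orig_times, n_topics, n_slots, vocab_size,
                max_shift=1, n_rounds=10, em_iters=5, seed=0):
    """Alternate the two steps; return parameters, times, and objective history."""
    rng = random.Random(seed)

    def rand_dist(n):
        xs = [rng.random() + 0.1 for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    p_w_z = [rand_dist(vocab_size) for _ in range(n_topics)]
    p_z_t = [rand_dist(n_topics) for _ in range(n_slots)]
    times = list(orig_times)
    history = []
    for _ in range(n_rounds):
        for _ in range(em_iters):
            p_w_z, p_z_t = em_step(docs, times, p_w_z, p_z_t)
        times = sync_step(docs, orig_times, p_w_z, p_z_t, max_shift)
        history.append(objective(docs, times, p_w_z, p_z_t))
    return p_w_z, p_z_t, times, history

# Toy data (made up): documents are {word_id: count} maps over a 4-word
# vocabulary; the second sequence lags the first by one time slot.
docs = [{0: 3, 1: 2}, {2: 3, 3: 2}, {0: 2, 1: 3}, {2: 2, 3: 3}]
orig_times = [0, 2, 1, 3]
p_w_z, p_z_t, times, history = mine_topics(
    docs, orig_times, n_topics=2, n_slots=4, vocab_size=4)
print("adjusted time stamps:", times)
print("objective per round:", history)
```

In this sketch neither step can lower the assumed objective: an EM update with fixed time stamps does not decrease the log-likelihood, and the time adjustment picks, for each document, the best slot from a window that always contains its current slot. That is the sense in which the recorded objective values are expected to be non-decreasing, up to floating-point error.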