Advanced topic analysis for online documents in Web 2.0

With the development of Web 2.0, online applications such as Facebook, Twitter and Weibo generate an enormous number of messages that are updated in real time. Topic modeling, a technique for discovering the topics of a document, has become increasingly important in analyzing the meaning of online messages. However, conventional topic modeling methods suffer from problems such as high computational complexity and limited scalability. To tackle these issues, the research team of Professor Philips Fu Lee Wang, Dean of the School of Science and Technology at HKMU, obtained funding from the Research Grants Council's Faculty Development Scheme to develop a new method for topic modeling.
 
The new method features parallel processing of new documents in real time, an online learning framework for topic discovery that reduces computational complexity, and hierarchical updating of topics to improve efficiency and accuracy.
 
In the first phase of the study, the research team developed two efficient and scalable topic models that can be trained three times faster than the existing method.
 
The topic modeling method developed in this study can identify different words that express similar sentiments as well as a single word that conveys different sentiments; overcome the problem of word ambiguity and improve the performance of information retrieval tasks; and address the problem of data sparsity. With these advantages, the research outcomes are useful for a broad range of online applications, such as sentiment prediction in social media, automatic information retrieval, and recommender systems.
 
For more information about the study, please refer to the following article arising from it:
‘Neural mixed counting models for dispersed topic discovery’, The 58th Annual Meeting of the Association for Computational Linguistics.