A Hierarchical Topic Modelling Approach for Tweet Clustering

نویسندگان

  • Bo Wang
  • Maria Liakata
  • Arkaitz Zubiaga
  • Rob Procter
چکیده

While social media platforms such as Twitter can provide rich and up-to-date information for a wide range of applications, manually digesting such large volumes of data is difficult and costly. Therefore it is important to automatically infer coherent and discriminative topics from tweets. Conventional topic models and document clustering approaches fail to achieve good results due to the noisy and sparse nature of tweets. In this paper, we explore various ways of tackling this challenge and finally propose a two-stage hierarchical topic modelling system that is efficient and effective in alleviating the data sparsity problem. We present an extensive evaluation on two datasets, and report our proposed system achieving the best performance in both document clustering performance and topic coherence.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modelling Techniques for Twitter Contents: A Step beyond Classification based Approaches

In this paper we present our first participation at RepLab Campaign. Our work is focused in two contributions. The first one is the use of an IR method to address Polarity and Filtering tasks. These two tasks can be seen as the same problem: to find the most relevant class to annotate a given tweet. For that, we applied a classical IR approach, using the tweet content as query against an index ...

متن کامل

Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization

Tweet are being created short text message and shared for both users and data analysts. Twitter which receive over 400 million tweets per day has emerged as an invaluable source of news, blogs, opinions and more. our proposed work consists three components tweet stream clustering to cluster tweet using k-means cluster algorithm and second tweet cluster vector technique to generate rank summariz...

متن کامل

PKUICST at TREC 2014 Microblog Track: Feature Extraction for Effective Microblog Search and Adaptive Clustering Algorithms for TTG

This paper describes our approaches to temporally-anchored ad hoc retrieval task and tweet timeline generation (TTG) task in the TREC 2014 Microblog track. In the ad hoc search, we apply a learning to rank framework which utilizes not only the various content relevance of a tweet, but also the quality of a tweet. External evidences are well incorporated in our approach with Web-based query expa...

متن کامل

به کارگیری روش‌های خوشه‌بندی در ریزآرایه DNA

Background: Microarray DNA technology has paved the way for investigators to expressed thousands of genes in a short time. Analysis of this big amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering technique in microarray DNA analysis. Materials and methods: We analyzed data of Van’t Veer et al study dealing with BRCA1...

متن کامل

Clustering and Topic Modelling: A New Approach for Analysis of National Cyber security Strategies

The consequences of cybersecurity attacks can be severe for nation states and their people. Recently many nations have revisited their national cybersecurity strategies (NCSs) to ensure that their cybersecurity capabilities is sufficient to protect their citizens and cyberspace. This study is an initial attempt to compare NCSs by using clustering and topic modelling methods to investigate the s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017