Evaluating topic coherence measures

Authors

  • Frank Rosner
  • Alexander Hinneburg
  • Michael Röder
  • Martin Nettling
  • Andreas Both
Abstract

Topic models extract representative word sets, called topics, from word counts in documents without requiring any semantic annotations. Topics are not guaranteed to be interpretable; therefore, coherence measures have been proposed to distinguish between good and bad topics. Studies of topic coherence have so far been limited to measures that score pairs of individual words. For the first time, we include coherence measures from scientific philosophy that score pairs of more complex word subsets and apply them to topic scoring.

Similar resources

Evaluating Topic Coherence Using Distributional Semantics

This paper introduces distributional semantic similarity methods for automatically measuring the coherence of a set of words generated by a topic model. We construct a semantic space to represent each topic word by making use of Wikipedia as a reference corpus to identify context features and collect frequencies. Relatedness between topic words and context features is measured using variants of...

Measuring the coherence of writing using topic-based analysis

Among the many possible aspects to assess in writing, one of the most problematic is coherence. The problems with marking coherence arise because it is by nature subjective. However, the reasonable probability of several readers reaching a consensus concerning the coherence of a text suggests that it may be possible to assig...

The Sensitivity of Topic Coherence Evaluation to Topic Cardinality

When evaluating the quality of topics generated by a topic model, the convention is to score topic coherence, either manually or automatically, using the top-N topic words. This hyper-parameter N, or the cardinality of the topic, is often overlooked and selected arbitrarily. In this paper, we investigate the impact of this cardinality hyper-parameter on topic coherence evaluation. For two au...

Distributional Lexical Entailment by Topic Coherence

Automatic detection of lexical entailment, or hypernym detection, is an important NLP task. Recent hypernym detection measures have been based on the Distributional Inclusion Hypothesis (DIH). This paper assumes that the DIH sometimes fails, and investigates other ways of quantifying the relationship between the cooccurrence contexts of two terms. We consider the top features in a context vecto...

A Novel Measure for Coherence in Statistical Topic Models

Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help understand the underlying patterns, or “topics”, in data. Researchers often read these topics in order to gain an understanding of the underlying corpus. It is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to use ...

Journal:
  • CoRR

Volume abs/1403.6397

Publication date 2013