Leveraging Textual Features for Best Answer Prediction in Community-based Question Answering

Authors

  • George Gkotsis
  • Maria Liakata
  • Carlos Pedrinaci
  • John Domingue
Abstract

One of the intriguing problems in Community-based Question Answering (CQA) research is the automatic identification of the best answer, which is expected to benefit various stakeholders. First, since several answers are provided for each question, readers of these websites will be able to process the candidate answers more efficiently and mitigate the “information overload” phenomenon. Second, a mechanism that identifies high-quality answers will increase awareness within the community and help direct more effort towards questions that remain poorly answered. For instance, in StackOverflow (SO) alone, as of September 2013, we found that approximately 33% of the questions have yet to be marked as resolved (i.e., out of the 5 million questions, 1.7 million have no answer marked as “accepted”).

Researchers in related fields have used lexical, syntactic, and discourse features to produce a predictive model of readers’ judgments [3]. In several cases, the use of shallow features, i.e., features such as sentence length or word length that do not employ semantic or syntactic parsing, has been shown to be effective in assessing properties such as ease of reading or usefulness. However, with respect to CQA, research efforts towards the exploitation of shallow features report relatively low results. To improve the efficacy of their models, researchers refer to more contextual information, such as the score of each answer, the comments received, or the reputation of the user [1]. However, these features may not be readily available, since a) comments and scores introduce an inherent delay, and b) features based on reputation may not be applicable to a newly formed community, or may pose a threat to its development (i.e., preferential attachment) and result in the reinforcement of the pre-existing community hierarchy. In our approach, we revisit the case of shallow linguistic features and use features found in [3].

Figure 1 shows the average feature values for the accepted answers together with the non-accepted ones of SO using a one-month time window. As seen from the figure, the linguistic features clearly differentiate the accepted from the non-accepted answers. More specifically, accepted answers tend to be longer, use a less common vocabulary, and contain longer words, more words per sentence, and lengthier longest sentences. Even though the above remarks look promising concerning best answer prediction, when training a binary classifier, prediction remains weak (58% precision and 0.56 F-Measure
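The shallow features named in the abstract (answer length, word length, words per sentence, length of the longest sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `shallow_features`, the regex-based sentence and word splitting, and the feature names are all assumptions made for illustration, and the vocabulary-commonness feature is omitted because it would require a reference word-frequency list.

```python
import re

# Hypothetical helper, not taken from the paper: computes surface-level
# features of an answer without any syntactic or semantic parsing.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def shallow_features(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word count of each sentence, used for the "longest sentence" feature.
    words_per_sentence = [len(WORD_RE.findall(s)) for s in sentences]
    words = WORD_RE.findall(text)
    n_words = len(words)
    return {
        "length_chars": len(text),
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "longest_sentence_words": max(words_per_sentence, default=0),
    }


features = shallow_features("Use a set. It removes duplicates in linear time!")
print(features)
```

A dictionary of this kind, computed per answer, could serve as the input row for a binary classifier (accepted vs. non-accepted); the choice of classifier is left open here, since the abstract does not specify it.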


Similar resources

Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization

We present a novel answer summarization method for community Question Answering services (cQAs) to address the problem of “incomplete answer”, i.e., the “best answer” of a complex multi-sentence question misses valuable information that is contained in other answers. In order to automatically generate a novel and non-redundant community answer summary, we segment the complex original multi-sent...


Deceptive Answer Prediction with User Preference Graph

In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We furthe...


Incorporate Credibility into Context for the Best Social Media Answers

In this paper, we focus on the task of identifying the best answer for a user-generated question in Collaborative Question Answering (CQA) services. Given that most existing research on CQA has focused on non-textual features such as click-through counts, which are relatively difficult to access, we examine the effectiveness of diverse content-based features for the task. Specifically, we propose to...


Towards Predicting the Best Answers in Community-based Question-Answering Services

Community-based question-answering (CQA) services contribute to solving many difficult questions we have. For each question in such services, one best answer can be designated, among all answers, often by the asker. However, many questions on typical CQA sites are left without a best answer even when good candidates are available. In this paper, we attempt to address the problem of predictin...


Fora: Leveraging the Power of Internet Communities for Question Answering

This paper introduces a system for searching question answer pairs automatically extracted from the discussions in internet communities. The system, named Fora, aggregates discussions from multiple forums and newsgroups in the same domain, automatically extracts question answer pairs from the data, and provides searches of the question answer pairs. The system also offers expert search, query s...



Journal:
  • CoRR

Volume abs/1506.02816

Publication date 2015