Conversation Extraction in Dynamic Text Message Stream

Authors

  • Le Wang
  • Yan Jia
  • Yingwen Chen

Abstract

A text message stream, such as one produced by Instant Messenger and Internet Relay Chat, poses interesting and challenging problems for information technologies. Extracting the conversations from this kind of chat message stream benefits information management and knowledge discovery. However, messages in a text message stream are usually very short and incomplete, and monitoring thousands of ongoing chat sessions requires efficient processing, so many existing text mining methods struggle with this data. This paper focuses on conversation extraction in dynamic text message streams. We design a dynamic message representation that combines text content information with the linguistic features of the message stream. We develop a memory structure of reversed maximal similarity relationships that allows message assignments to be revised when grouping messages into conversations. Finally, we propose a double time window algorithm, built on these methods, to extract conversations from dynamic text message streams. Experiments on a real dataset show that our method outperforms two baseline methods introduced in a recent related paper by about 47% and 15%, respectively, in terms of F measure.
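The abstract does not specify the dynamic representation, the reversed maximal similarity memory structure, or the double time window, so the following is only a rough illustration of the general task, not the authors' algorithm. It is a minimal sketch, assuming bag-of-words term-frequency vectors, cosine similarity, and a single sliding time window: each incoming message joins the conversation that holds its most similar recent message, or opens a new conversation if no recent message is similar enough. The names `extract_conversations`, `window`, and `threshold`, and their default values, are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Message:
    timestamp: float  # seconds since the stream started (assumed unit)
    text: str


@dataclass
class Conversation:
    messages: list = field(default_factory=list)


def tokenize(text):
    # Lowercase word tokens; a real system would add stopword removal, stemming, etc.
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_conversations(stream, window=60.0, threshold=0.2):
    """Hypothetical single-window, single-pass grouping (not the paper's method)."""
    conversations = []
    recent = []  # (timestamp, vector, conversation index) of messages inside the window
    for msg in stream:
        vec = Counter(tokenize(msg.text))
        # Forget messages that have fallen out of the time window.
        recent = [r for r in recent if msg.timestamp - r[0] <= window]
        best_idx, best_sim = None, 0.0
        for _, other_vec, conv_idx in recent:
            sim = cosine(vec, other_vec)
            if sim > best_sim:
                best_idx, best_sim = conv_idx, sim
        if best_idx is None or best_sim < threshold:
            # No sufficiently similar recent message: start a new conversation.
            conversations.append(Conversation())
            best_idx = len(conversations) - 1
        conversations[best_idx].messages.append(msg)
        recent.append((msg.timestamp, vec, best_idx))
    return conversations


if __name__ == "__main__":
    stream = [
        Message(0, "anyone tried the new python release"),
        Message(5, "what time is the football match tonight"),
        Message(9, "yes the python release fixed my bug"),
        Message(12, "the match starts at eight tonight"),
    ]
    # The two interleaved threads are separated into two conversations.
    for conv in extract_conversations(stream):
        print([m.text for m in conv.messages])
```

Because each message is assigned once and never moved, this baseline cannot revise an early wrong assignment; the abstract's memory structure for revisable assignments is aimed at exactly that limitation.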


Similar resources

Classifying Short Text in Social Media: Twitter as Case Study

With the huge growth of social media, especially with 500 million Twitter messages being posted per day, analyzing these messages has caught intense interest of researchers. Topics of interest include micro-blog summarization, breaking news detection, opinion mining and discovering trending topics. In information extraction, researchers face challenges in applying data mining techniques due to ...


A Study on Mining Approach under Cyber Crime Analysis

Today, the use of social networks in users' daily lives has increased. But this increased use of social media has also increased the associated threats. A number of criminal activities are possible over the web or in social network conversations. These conversations include spamming, blackmailing, cyber threatening, etc. The identification of these kinds of messages or text segments i...


An Information Retrieval Approach to Short Text Conversation

Human computer conversation is regarded as one of the most difficult problems in artificial intelligence. In this paper, we address one of its key sub-problems, referred to as short text conversation, in which given a message from human, the computer returns a reasonable response to the message. We leverage the vast amount of short conversation data available on social media to study the issue....


Similar Section Extraction for the Analysis of the Stream Data Structure

The present paper proposes a new algorithm for discovering similar sections between two time sequence data sets. The algorithm, called Partial Matching Discovery, or PMD, is based on Dynamic Programming. PMD realizes fast matching between arbitrary sections in reference stream data and input stream data and enables the extraction of similar sections in a synchronous manner with the input data. ...


A Probabilistic Model for Determining Text Coherence in Interactive Question Answering Systems

As in many fields of computational linguistics, evaluation plays an important role in interactive question answering (IQA) systems. The coherence between the questions and answers exchanged by the user and the system is one of the important criteria for evaluating these systems. In this paper, a new approach to determining the degree of coherence of text generated by IQA systems is presented. Th...



Journal title:
  • JCP

Volume 3

Publication date: 2008