New metrics for blog mining

Authors

  • Brian Ulicny
  • Kenneth Baclawski
  • Amy L. Magnus
Abstract

Blogs represent an important new arena for knowledge discovery in open-source intelligence gathering. Bloggers form a vast network of human (and sometimes non-human) information sources that monitor important local and global events, as well as other blogs, for items of interest on which they comment. Increasingly, issues erupt from the blog world into the real world. To monitor blogging about important events, we must develop models and metrics that represent blogs correctly. The structure of blogs requires new techniques for evaluating metrics such as the relevance, specificity, credibility and timeliness of blog entries. Techniques developed for standard information retrieval purposes (e.g. Google's PageRank) are suboptimal when applied to blogs because of blogs' high degree of exophoricity, quotation, brevity, and rapidity of update. In this paper, we offer new metrics for blog entry relevance, specificity, timeliness and credibility that we are implementing in a blog search and analysis tool for international blogs. This tool uses the new blog-specific metrics, together with techniques for extracting the necessary information from blog entries automatically, using shallow natural language processing supported by background knowledge captured in domain-specific ontologies.
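One reason a link-based score such as PageRank struggles with rapidly updated blogs can be seen in a few lines of code. A post that has just been published has no inbound links yet, so PageRank assigns it only the teleportation baseline (1 - d)/N, whatever its content. The sketch below is a plain power-iteration PageRank, not the authors' proposed metrics. The graph and the post names in it (`old_post_a`, `new_post`, and so on) are hypothetical and serve only as illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each node to its outbound links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share (1 - d) / N.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's damped rank evenly among its outbound links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its damped rank uniformly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical blog graph: "new_post" was just published and cites an older
# post, but nobody has linked to it yet.
links = {
    "old_post_a": ["old_post_b"],
    "old_post_b": ["old_post_a"],
    "old_post_c": ["old_post_a", "old_post_b"],
    "new_post": ["old_post_a"],
}

ranks = pagerank(links)
for post, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{post}: {score:.4f}")
```

Because `new_post` has no inbound links, its score stays at 0.15 / 4 = 0.0375 on every iteration. It can only rise after other blogs link to it, and by then the post may already be stale. This lag is one way to read the abstract's point about rapidity of update.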


Similar sources

Uses of Ontologies in Open Source Blog Mining

The blogosphere provides a novel window into public opinion, but its dynamic nature makes it an elusive medium to analyze and interpret in the aggregate, where it is most informative. We are developing new technology employing ontologies to solve this problem by fusing the signals of the blogosphere and zeroing in on issues that are most likely to migrate offline, enabling analysts to anticipat...

Overlapping Community Detection in Temporal Text Networks

Network is a powerful language to represent relational data. One way to understand network is to analyze groups of nodes which share same properties or functions. The task of discovering such groups is known as community detection. Generally, two types of information can be utilized to fulfill this task, i.e., the link structures and the node attributes. The temporal text network is a special k...

New Metrics for Newsblog Credibility

The blogosphere is an invaluable source of insight into attitudes towards significant world and local events. Traditional measures of topical relevance, timeliness, specificity and credibility are inadequate when it comes to blogs, however, due to their short length, high degree of quotation, exophoricity, and the short life cycle of blog postings. In this paper, we motivate a novel metric for ...

Pbm: A new dataset for blog mining

Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Now Researchers have growing interest in text mining methods for discovering knowledge. Text mining researchers come from variety of areas like: Natural Language Processing, Computational Linguistic, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemm...

Customer Segmentation and Classification from blogs by Using Data Mining: an Example of VoIP Phone

Blogs have been considered the 4 th Internet application which can cause radical change in the world, after E-mail, Instant Message, and Bulletin Board System (BBS). Lots of Internet users heavily rely on them to express their emotions and personal comments on whatever topics interest them. Nowadays, blogs have become the popular media and could been viewed as new marketing channels. Depending ...

Publication date: 2007