Constructing Time Decompositions for Analyzing Time-Stamped Documents

Authors

  • Parvathi Chundi
  • Daniel J. Rosenkrantz
Abstract

Extraction of sequences of events from news and other documents based on the publication times of these documents has been shown to be extremely effective in tracking past events. This paper addresses the issue of constructing an optimal decomposition of the time period associated with a given document set, i.e., a decomposition with the smallest number of subintervals, subject to no or limited loss of information. We introduce the notion of the compressed interval decomposition, where each subinterval consists of consecutive time points having identical information content. We define optimality, and show that any optimal information-preserving decomposition of the time period is a refinement of the compressed interval decomposition. We define several special classes of measure functions (functions that compute the significant information from document sets), based on their effect on the information computed as document sets are combined. These classes are used in developing algorithms for computing an optimal information-preserving decomposition of the time period of a given document set. We also define the notion of information loss of a time decomposition of a given document set and give an efficient algorithm for computing an optimal lossy decomposition. We discuss the effectiveness of our algorithms on the Reuters-21578, Distribution 1.0 data set and a subset of Medline abstracts.
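
The compressed interval decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes that the information content at each time point has already been computed and is represented as a set of significant keywords, and the function name `compressed_interval_decomposition` and the sample data are hypothetical.

```python
from itertools import groupby


def compressed_interval_decomposition(info_per_point):
    """Merge consecutive time points with identical information content.

    info_per_point: list whose i-th entry is the information content at
    time point i (assumed here to be a frozenset of significant keywords).
    Returns a list of (start, end) index pairs, both inclusive.
    """
    intervals = []
    start = 0
    # groupby yields one group per run of consecutive equal entries.
    for _, run in groupby(info_per_point):
        length = sum(1 for _ in run)
        intervals.append((start, start + length - 1))
        start += length
    return intervals


# Hypothetical information content for five consecutive time points.
points = [
    frozenset({"oil"}),
    frozenset({"oil"}),
    frozenset({"oil", "opec"}),
    frozenset({"trade"}),
    frozenset({"trade"}),
]
print(compressed_interval_decomposition(points))  # [(0, 1), (2, 2), (3, 4)]
```

Only adjacent time points are merged, so a later time point whose content matches an earlier, non-adjacent one starts a new subinterval.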

Similar resources

Introducing and Analyzing Two Historical Documents about the Development of Tehran at the Reign of Nasiruddin Shah

Tehran changed considerably during the reign of Nasiruddin Shah. The changes began with the destruction of the Tahmāsbī fortress, the construction of a new one, and the development of the city in 1867. Nasiruddin Shah ordered these changes because the town was small. Very little information exists about how the development took shape or about its details. The existing data can be ext...

Entropy Based Measure Functions for Analyzing Time Stamped Documents

Measure functions that assign numeric values to keywords to capture their significance in a document set play a crucial role in the construction of a time decomposition of a document set. In this paper, we define two measure functions based on the notion of entropy. The interval entropy measure function identifies time intervals that have non-uniform keyword distributions and assigns high measu...

A Two-level Time-Stamping System

electronic document existed at a certain point in time and that it has not been modified since then. Different time-stamping schemes have already been proposed. Most of them use the concept of trusted Time-Stamping Authority (TSA). A TSA is in charge of time-stamping documents and delivering a timestamping certificate for each time-stamped document. The purpose of this paper is to propose a new...

Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents

Identifying keyword associations from text and search sources is often used to facilitate many tasks such as understanding relationships among concepts, extracting relevant documents, matching advertisements to web pages, expanding user queries, etc. However, these keyword associations change as the underlying content changes with time. Two keywords that are associated with each other during on...

Publication date: 2004