Subspace Representations of Unstructured Text

Author

  • F. B. Holt
Abstract

Vector-space models have been used for information retrieval from unstructured text since 1970. The initial simple vector-space models suffered from the same problems encountered today in searching the internet. These difficulties were significantly relieved by Latent Semantic Indexing (LSI), introduced in 1990 and improved through 1995. Starting with the simple vector-space model’s sparse term-by-document matrix, LSI used a truncated singular-value decomposition to obtain a low-rank approximation, reinforcing similarities between documents. This approach stalled, owing primarily to the lack of an interpretation for the low-rank approximation and, consequently, a lack of controls for accomplishing specific tasks in information retrieval. The text mining team in Boeing Phantom Works has taken a broad, systematic approach to vector-space models for unstructured text. The two pillars of our emerging technology, Trust, are the handling of subspaces to capture latent semantics and the assignment of natural-language labels to elements in the subspace. Built on these, Trust is developing into a consistent, natural system for addressing a variety of information-retrieval tasks. Our focus on subspaces enables us to compute with flexible and efficient algorithms. The natural-language interpretation in the subspace allows us to cast the tasks conceptually within linear algebra while retaining the association with natural language. Thus the display, controls and parameters of the system can all be revealed to the user in a familiar natural-language setting. Our broad approach to subspace representations is described here.
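
The LSI step mentioned in the abstract (a truncated singular-value decomposition of a term-by-document matrix) can be illustrated with a minimal sketch. It is not the Trust system and not code from the paper. The vocabulary, the five toy documents, the rank `k = 2`, and the helper names `lsi_document_coords` and `cosine` are all assumptions made for illustration. A dense NumPy SVD is used for brevity; a genuinely sparse matrix would more likely call for a sparse solver such as `scipy.sparse.linalg.svds`.

```python
import numpy as np

def lsi_document_coords(A, k):
    """Document coordinates in the rank-k subspace of a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of diag(s_k) @ Vt_k are the documents expressed in the basis
    # of the k leading left singular vectors (the columns of U_k).
    return s[:k, None] * Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy term-by-document matrix (rows: terms, columns: documents).
# Documents d0..d2 are about aviation, d3..d4 about baking.
#             d0 d1 d2 d3 d4
A = np.array([
    [1, 0, 0, 0, 0],   # aircraft
    [1, 1, 0, 0, 0],   # wing
    [0, 1, 1, 0, 0],   # engine
    [0, 0, 1, 0, 0],   # plane
    [0, 0, 0, 1, 0],   # recipe
    [0, 0, 0, 1, 1],   # flour
    [0, 0, 0, 0, 1],   # oven
], dtype=float)

docs = lsi_document_coords(A, k=2)

# d0 and d2 share no terms, so their raw similarity is zero ...
raw_sim = cosine(A[:, 0], A[:, 2])
# ... but both are linked through d1, and the rank-2 subspace pulls them together.
latent_sim = cosine(docs[:, 0], docs[:, 2])
# Documents from unrelated topics stay dissimilar in the subspace.
cross_sim = cosine(docs[:, 0], docs[:, 3])

print("raw d0~d2:", raw_sim)
print("latent d0~d2:", latent_sim)
print("latent d0~d3:", cross_sim)
```

The toy data is deliberately clean, with two topics that share no vocabulary, so the effect is far more pronounced than it would be on real text. The sketch also makes the abstract's criticism concrete: the two coordinates in `docs` carry no label indicating what each direction means.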

Similar resources

Learning Representations with Joint Models for Information Extraction

Unstructured natural language text contains vast quantities of human knowledge, yet this knowledge is mostly inaccessible to computers. Computers rely on structured representations (e.g. databases) for knowledge organization and retrieval, and cannot easily understand the ambiguity and nuance of human language. Dramatically increasing the accessibility of knowledge through search engines, inter...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

A New Implicit Dissipation Term for Solving 3D Euler Equations on Unstructured Grids by GMRES+LU-SGS Scheme

Due to improvements in computational resources, interest has recently increased in using implicit schemes for solving flow equations on 3D unstructured grids. However, most implicit schemes produce greater numerical diffusion error than their corresponding explicit schemes. This stems from the fact that in linearizing implicit fluxes, it is conventional to replace the Jacobian matrix in t...

Mental Representations of Lyrical Prose

The article analyzes mental representations of Russian lyrical prose texts. The texts demonstrate collective memory engrams that are defined by cultural and historical legacy of the nation and authors’ creative world perception. In architectonics of a lyrical prose text, sense perception reveals itself in accumulated underlying meanings and wisdom conveyed by expressive means. The author’s inte...

Publication date: 2007