retrieval speed of collocations

INFO256 Project Report: Implementation and Evaluation of Xtract in WordSeer

2013

Mosharaf Chowdhury

Natural languages are full of word collocations that frequently co-occur and correspond to arbitrary word usages. They appear in both technical and non-technical textual corpora and often have specific significance in individual contexts. Accurately retrieving and identifying collocations from a given corpus in an unsupervised manner is imperative to understanding and automatically generating t...

Investigating the Relationship between Word Segmentation Performance and Retrieval Performance in Chinese IR

2002

Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick Cercone

It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. In this paper we show that, for Chinese, the relationship between segmentation and retrieval performance is in fact nonmonotonic; that is, at around 70% word segmentation accuracy an over-segmentation phenomenon begins to occur which leads to a reduction in...

Domain Collocation Identification

2009

Jiří Materna

In this paper we present a new method of automatic collocation identification. Collocation is an important relation between words, which is widely used, among others, in information retrieval tasks. Over the last years, many methods of automatic collocation acquisition from text corpora have been proposed. The approach described in this paper differs from the others by focusing on domain colloc...

The effect of frequency of exposure on the processing and learning of collocations: A comparison of first and second language readers’ eye movements

Journal: Applied Psycholinguistics 2022

This study examined the processing and acquisition of novel words and their collocates (i.e., words that frequently co-occur with other words) from reading, and the effect of frequency of exposure on this process. First and second language speakers of English read a story with 1) eight exposures to adjective-pseudoword collocations, 2) four exposures to the same collocations, or 3) control collocations. Results of recall and recognition tests showed that participants...

Information Retrieval from Unstructured Web Text Document Based on Automatic Learning of the Threshold

Journal: IJIRR 2012

Fethi Fkih, Mohamed Nazih Omri

A collocation is defined as a sequence of lexical tokens that habitually co-occur. This type of information is widely used in applications such as information retrieval, document indexing, machine translation, and lexicography. Many techniques have therefore been developed for the automatic retrieval of collocations from textual documents. These techniques use statistical measures based on a...

Integration of Collocation Statistics into the Probabilistic Retrieval Model

2000

Olga Vechtomova, Stephen Robertson

The paper presents a method of combining corpus information on word collocations with the probabilistic model of information retrieval. Corpus term dependencies are used to modify the probabilistic retrieval based on the term independence assumption. Collocates are derived from windows around term occurrences in the corpus. Statistical measures of mutual information and Z score are applied to s...
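
The window-based scoring mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `collocate_scores`, the symmetric window, and the particular formulas (pointwise mutual information as log2 of observed over expected, and Z score as (observed - expected) / sqrt(expected)) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def collocate_scores(tokens, node, window=2):
    """Score words that co-occur with `node` within +/- `window` tokens.

    Returns {word: (mi, z)}. The formulas are illustrative assumptions,
    not the exact measures used in the paper.
    """
    n = len(tokens)
    freq = Counter(tokens)   # corpus frequency of every word
    joint = Counter()        # frequency of each word inside the node's windows
    span_total = 0           # total number of window positions examined
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                joint[tokens[j]] += 1
                span_total += 1
    scores = {}
    for word, observed in joint.items():
        # Expected count if the word were spread uniformly over the corpus.
        expected = span_total * freq[word] / n
        mi = math.log2(observed / expected)
        z = (observed - expected) / math.sqrt(expected)
        scores[word] = (mi, z)
    return scores


tokens = ["strong", "tea", "and", "strong", "tea", "or", "weak", "coffee"]
scores = collocate_scores(tokens, "tea", window=1)
```

Words whose scores exceed a chosen cutoff would then be kept as collocates of the node term; the cutoff itself is left open here.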

Reflections of Accomplishments in Natural Language Based Detection and Summarization

1998

Susan R. Viscuso

The common tie among these lines of research is that natural language processing techniques offer a way of overcoming the weaknesses inherent to purely statistical approaches. GE pioneered the large-scale use of natural language processing techniques in information retrieval. Standard statistical search methods use words, word fragments, and simple collocations to index documents. The GE work i...

A Recursive Treatment of Collocations

2010

Luka Nerima, Eric Wehrli, Violeta Seretan

This article discusses the treatment of collocations in the context of a long-term project on the development of multilingual NLP tools. Besides “classical” two-word collocations, we will focus on the case of complex collocations (3 words or more) for which a recursive design is presented in the form of collocation of collocations. Although comparatively less numerous than two-word collocations...
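
One way to picture a "collocation of collocations" is a recursive data structure in which each part of a collocation is either a word or another collocation. The sketch below is only an illustration of that idea: the class name `Collocation`, its binary `left`/`right` shape, and the example phrase are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Collocation:
    """A collocation whose parts are plain words or nested collocations."""
    left: "Union[str, Collocation]"
    right: "Union[str, Collocation]"

    def words(self) -> List[str]:
        """Flatten the nested structure into its words, left to right."""
        out: List[str] = []
        for part in (self.left, self.right):
            if isinstance(part, Collocation):
                out.extend(part.words())
            else:
                out.append(part)
        return out

    def depth(self) -> int:
        """Nesting depth: 1 for a plain two-word collocation."""
        child_depths = [
            part.depth() if isinstance(part, Collocation) else 0
            for part in (self.left, self.right)
        ]
        return 1 + max(child_depths)


# Hypothetical three-word example: a verb combined with an
# adjective-noun collocation.
inner = Collocation("informed", "decision")
outer = Collocation("make", inner)
```

Under this representation a two-word collocation is the base case, and collocations of three or more words are built by reusing already identified ones as parts.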

A Computationally Efficient Algorithm for Learning Topical Collocation Models

2015

Zhendong Zhao, Lan Du, Benjamin Börschinger, John K. Pate, Massimiliano Ciaramita, Mark Steedman, Mark Johnson

Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora...

Enhancing English/Arabic CLIR Using Word Collocations and Statistical Translation and Transliteration Resources

2008

Tarek A. Elghazaly, Aly A. Fahmy

In Cross Language Information Retrieval (CLIR), queries in one language retrieve documents in other language(s). This can be done through Query Translation, which faces translation and transliteration challenges, with ambiguity as the main problem. In this paper, a comprehensive solution is introduced for these challenges. 1, 4 powerful English/Arabic Machine Readable Dictionaries (M...
