Search results for: post text

Number of results: 564440

Journal: Pattern Recognition, 2002
Mingjing Li Zheng Chen HongJiang Zhang

A statistical correlation model for image retrieval is proposed. This model captures the semantic relationships among images in a database from simple statistics of user-provided relevance feedback information. It is applied in the post-processing of image retrieval results such that more semantically related images are returned to the user. The algorithm is easy to implement and can be efficien...

2008
Wouter Weerkamp Maarten de Rijke

We describe the participation of the University of Amsterdam’s ILPS group in the blog track at TREC 2008. We mainly explored different ways of using external corpora to expand the original query. In the blog post retrieval task we did not succeed in improving over a simple baseline (equal weights for both the expanded and original query). Obtaining optimal weights for the original and the expan...

2013
Rubén San-Segundo-Hernández Juan Manuel Montero-Martínez Mircea Giurgiu Ioana Muresan Simon King

This paper describes the text normalization module of a text to speech fully-trainable conversion system and its application to number transcription. The main target is to generate a language independent text normalization module, based on data instead of on expert rules. This paper proposes a general architecture based on statistical machine translation techniques. This proposal is composed of...

2011
Vit Suchomel Jan Pomikálek

SpiderLing, a web spider for linguistics, is new software for creating text corpora from the web, which we present in this article. Many documents on the web only contain material which is not useful for text corpora, such as lists of links, lists of products, and other kinds of text not comprised of full sentences. In fact such pages represent the vast majority of the web. Therefore, by doing unr...

Journal: Pattern Recognition, 1997
Gary Geunbae Lee Jong-Hyeok Lee JinHee Yoo

Most of the post-processing methods for character recognition rely on contextual information of character and word-fragment levels. However, due to linguistic characteristics of Korean, such low-level information alone is not sufficient for high-quality character-recognition applications, and we need much higher-level contextual information to improve the recognition results. This paper present...

2015
Benjamin Marie Alexandre Allauzen Franck Burlot Quoc-Khanh Do Julia Ive Elena Knyazeva Matthieu Labeau Thomas Lavergne Kevin Löser Nicolas Pécheux François Yvon

This paper describes LIMSI’s submissions to the shared WMT’15 translation task. We report results for French-English, Russian-English in both directions, as well as for Finnish-into-English. Our submissions use NCODE and MOSES along with continuous space translation models in a post-processing step. The main novelties of this year’s participation are the following: for Russian-English, we inves...

2010
Helena Blancafort

In this paper we present preliminary work conducted on semi-automatic induction of inflectional paradigms from non-annotated corpora using the open-source tool Linguistica (Goldsmith 2001), which can be used without any prior knowledge of the language. The aim is to induce morphology information from corpora so as to compare languages and foresee the difficulty to develop morphosyntactic le...

2016
Johannes Jurgovsky Michael Granitzer Christin Seifert

Skip-Gram word embeddings, estimated from large text corpora, have been shown to improve many NLP tasks through their high-quality features. However, little is known about their robustness against parameter perturbations and about their efficiency in preserving word similarities under memory constraints. In this paper, we investigate three post-processing methods for word embeddings to study their...

2014
Nikola Ljubesic Darja Fiser Tomaz Erjavec

This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages. Using the Twitter search API and a set of seed terms, the tool identifies users tweeting in the language of interest together with their friends and followers. By running the tool for 235 days we tested it on the task of collecting two monitor corpora, one for Croatian ...
