Free/Open Source Shallow-Transfer Based Machine Translation for Spanish and Aragonese
Abstract
s_an.nt.bz2 for Spanish and Aragonese, respectively. Available from https://github.com/jimregan/es-an-sentences

Table 1: Example first sentences from the Wikipedia articles “Mar”.
es: Un mar es una masa de agua salada de tamaño inferior al océano
an: A mar u o mar ye una masa d’augua salada de grandaria inferior a l’ocián
Gloss: A sea is a body of salt water smaller than an ocean

Table 2: Example extract from the DBpedia abstract dataset.
es: John Joseph Nicholson es un actor, productor, guionista y director de cine estadounidense doce veces nominado y tres veces ganador del Premio de la Academia. En activo como actor desde 1958.
an: Jack Nicholson (nombre artistico de John Joseph Nicholson) ye un actor y director cinematografico estatounitense, naixito o 22 d’abril de 1937 en Nueva York.
Gloss es: John Joseph Nicholson is an actor, producer, screenwriter and director of American cinema twelve times nominated and three times winner of the Academy Award. Active as an actor since 1958.
Gloss an: Jack Nicholson (artistic name of John Joseph Nicholson) is an American actor and director, born 22 April 1937 in New York.

tions, both for normalisation of non-standard forms, and for cognate induction; and to build a set of equivalent suffixes, which, as well as functioning as transformations, also served as a means of filtering words for equivalence, and for assigning categories. For example, -dá, -dat, -daz and -datz all refer to the same feminine noun suffix (-dat, -datz in the standard orthography), which typically has cognates with the suffix -dad, -dades in Spanish: for example, uniformidat, uniformidatz (“uniformity”, “uniformities”) in Aragonese and uniformidad, uniformidades in Spanish. Cognate induction was then performed by filtering words by suffix, applying the extracted transformations, and com

Table 3: SPARQL query used to extract abstracts.
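The suffix-based filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The suffix pairs come from the example in the text. The mapping of the non-standard -dá and -daz onto -dat and -datz, the function names, and the final exact-match comparison against a Spanish word list are all assumptions made for illustration, since the extract breaks off before describing the comparison step.

```python
# Assumed normalisation of non-standard spellings to the standard orthography.
NORMALISE = {"dá": "dat", "daz": "datz"}

# Aragonese suffix -> Spanish suffix, taken from the example in the text.
SUFFIX_MAP = {"dat": "dad", "datz": "dades"}


def normalise(word):
    """Rewrite a non-standard Aragonese suffix to its standard form."""
    for variant, standard in NORMALISE.items():
        if word.endswith(variant):
            return word[: -len(variant)] + standard
    return word


def candidate(word):
    """Return a Spanish cognate candidate, or None if no known suffix matches."""
    word = normalise(word)
    # Try longer suffixes first so that -datz is checked before -dat.
    for an_suffix in sorted(SUFFIX_MAP, key=len, reverse=True):
        if word.endswith(an_suffix):
            return word[: -len(an_suffix)] + SUFFIX_MAP[an_suffix]
    return None


def induce_cognates(an_words, es_words):
    """Pair Aragonese words with Spanish words (exact match is an assumption)."""
    es_set = set(es_words)
    pairs = {}
    for word in an_words:
        guess = candidate(word)
        if guess in es_set:
            pairs[word] = guess
    return pairs


if __name__ == "__main__":
    print(induce_cognates(
        ["uniformidat", "uniformidatz", "mar"],
        ["uniformidad", "uniformidades", "mar"],
    ))
```

Words without a listed suffix are filtered out before any comparison, which mirrors the role the text gives the suffix set as both a transformation and a filter.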
Similar resources
Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium
In this paper, we describe two methods developed for sharing linguistic data between two free and open source rule based machine translation systems: Apertium, a shallow-transfer system; and Grammatical Framework (GF), which performs a deeper syntactic transfer. In the first method, we describe the conversion of lexical data from Apertium to GF, while in the second one we automatically extract ...
An Open-Source Shallow-Transfer Machine Translation Engine for the Romance Languages of Spain
We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician) as part of a larger government-funded project which includes non-Romance languages such as Basque and involving both universities and linguistic technology companies. The machine translation architecture...
Open-Source Portuguese-Spanish Machine Translation
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for str...
An open-source shallow-transfer machine translation toolbox: consequences of its release and availability
By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license includes (a) the open-source engine i...
Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions whic...
Publication date: 2012