Translating noun compounds using semantic relations
Authors: R. Balyan, N. Chatterjee
Abstract
relations, such as Agent, Location and Instrument, suggested in Barker and Szpakowicz (1998), Finin (1980), Girju et al. (2005), Kim and Baldwin (2005, 2008), Moldovan et al. (2004), Rosario and Hearst (2001).
Generalized prepositions, such as Of, For and In, proposed by Lauer (1995).
Recoverably deletable predicates (RDPs) to interpret semantic relations in the NCs (Levi, 1978). Semantic relations such as Have, Make, Be, From, For and In are examples of this kind.

For the present work we have used a set of 20 semantic relations taken from the existing literature. These are Agent, Beneficiary, Cause, Container, Content, Equative, Instrument, Location, Material, Possessor, Product, Purpose, Result, Source, Time, Topic, Experiencer, Specialization, Attribute-Transfer and Use. We excluded some of the semantic relations found in the literature (e.g. Extent, Probability, Frequency, Influence, Synonymy, Possibility) due to their lack of instances. Another important semantic relation, viz. Property, has also been ignored, as it is satisfied primarily by “adjective + noun” or “proper noun + common noun” pairs, for example “blue car” and “Delhi city”.

Section 4 describes the scheme used for semantic relation identification and translation pattern generation for 2-word noun compounds. Section 5 discusses the bracketing issues, and how this 2-word scheme can be used recursively for generating translation patterns for 3-word and 4-word noun compounds.

4. Semantic relation identification and translation patterns for 2-word NCs

In order to find the translation pattern for a 2-word noun compound we first need to find the semantic relation between the two nouns of the noun compound.

4.1. Semantic relation identification

As a semantic relation can be represented by, and is dominated by, a set of verbs (Nakov and Hearst, 2006; Nakov, 2008), the proposed scheme tries to uncover the relationship between the two nouns by rewriting or paraphrasing the noun compound as a phrase that contains a verb and one or more preposition(s). For illustration, the noun compound “family car” can be represented by the following paraphrases: “car owned by family”, “car possessed by family”, “car belonging to family”. The verbs ‘own’, ‘possess’ and ‘belong’, along with the prepositions ‘by’ and ‘to’, provide evidence for the presence of the semantic relation Possessor. Similarly, the noun compound “olive oil” can be represented by the following paraphrases: “oil obtained from olive”, “oil made from olive”, “oil coming from olive”. These constructs indicate the presence of the semantic relation Material between the two nouns.

Thus, for each semantic relation a group of verbs, called seed verbs, is used. These seed verbs have been taken from Nakov and Hearst (2006) and the shared Task⁷ data 2008. Each semantic relation is represented by a group of verbs, and a verb assigned to one semantic relation may also belong to other semantic relations. A set of 728 seed verbs and 30 prepositions has been identified for the purpose of paraphrasing. Table 2 presents some examples of the seed verbs assigned to the semantic relations.

⁷ http://multiword.sourceforge.net/

R. Balyan, N. Chatterjee / Computer Speech and Language 32 (2015) 91–108

Table 2. Seed verbs associated with the semantic relations.
S. No. | Semantic relation | Seed verbs with suitable prepositions
1 | Cause | Cause, promote, lead to, result in, generate, create, carry, spread, transmit, bring, infect, responsible for, give, pass
2 | Experiencer | Spread, acquire, suffer from, die of, develop, contract, catch, diagnosed of, have, beat, infected by, survive from, get, pass, fall, transmit, avoid
3 | Possessor | Own, owned by, possess, possessed by, have, belong to, related to, borrow, take, grant, request
4 | Product | Produce, make, manufacture, build, assemble, create
5 | Time | Arrive in, leave at, conducted in, occur in, happen during, experience in
6 | Material | Made of, made from, contain, originate from, composed of, produced from
7 | Source | Come from, caused by, induced by, relate to, arise from, result from, generated by
8 | Purpose | Cure, relieve, treat, help with, reduce, heal, prevent, prescribed for, block, control, end, intended for
9 | Container | Contained in, created in, built in, built for, provided in, experienced in, included by
10 | Location | Live in, work on, come from, work in, reside in, located in, bred in, kept in, made in, born from

[Fig. 2. Paraphrase generation. The noun compound N1N2 (anthrax death), together with the verbs and prepositions, is passed to the paraphrase generator, which outputs paraphrase candidates: death act as anthrax, death aid as anthrax, death arise from anthrax, death caused by anthrax, death caused from anthrax, death consist of anthrax, death result from anthrax, death supported by anthrax.]

For a 2-word noun compound the paraphrases are formed using these seed verbs and prepositions. Paraphrase generation for a noun compound, viz. anthrax death, is shown in Fig. 2. To identify the semantic relation between the nouns of a noun compound, paraphrases are generated and the web frequency of each paraphrase is found using search engines and the Netspeak⁸ web service (Potthast et al., 2010). The top 15 paraphrases in terms of web frequency are identified, and the verb parts of these paraphrases are extracted.
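The paraphrase generation and ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `frequency` callback (a stand-in for a search engine or Netspeak lookup, which is not called here) and the toy counts are all assumptions made for illustration, and verb inflection is assumed to be handled before the verbs are passed in.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List


def generate_paraphrases(n1: str, n2: str,
                         verbs: Iterable[str],
                         prepositions: Iterable[str]) -> Dict[str, str]:
    """Map each candidate paraphrase 'N2 verb prep N1' to the verb it uses."""
    return {f"{n2} {verb} {prep} {n1}": verb
            for verb, prep in product(verbs, prepositions)}


def top_verbs(paraphrases: Dict[str, str],
              frequency: Callable[[str], int],
              k: int = 15) -> List[str]:
    """Return the verbs of the k most frequent paraphrases, best first."""
    ranked = sorted(paraphrases, key=frequency, reverse=True)[:k]
    return [paraphrases[p] for p in ranked]


# Toy frequencies, invented purely for illustration (not real web counts).
toy_counts = {
    "death caused by anthrax": 120,
    "death result from anthrax": 45,
    "death arise from anthrax": 7,
}

candidates = generate_paraphrases(
    "anthrax", "death", ["caused", "result", "arise"], ["by", "from"])
verbs_found = top_verbs(candidates, lambda p: toy_counts.get(p, 0), k=2)
print(verbs_found)
```

Passing the frequency source as a callback keeps the sketch independent of any particular web service. A real lookup would also need to decide how to treat paraphrases with zero hits, which the text does not specify.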
The semantic relation that contains the maximum number of these extracted verbs is selected. The selected relation is the one best represented by this group of verbs, and it is therefore taken as the semantic relation existing between the two nouns of the noun compound. The procedure for identifying the semantic relation between the two nouns of a 2-word noun compound is described in Algorithm 1.

Algorithm 1 (Semantic relation identification for a 2-word English noun compound).
Input: A 2-word English noun compound, seed verbs, prepositions.
Output: The semantic relation(s) for the noun compound.
1. Split the 2-word NC (N1N2) into its two nouns (N1 and N2).
2. Using the two nouns (N1 and N2) do:
   2.1 Form the paraphrases using the seed verbs, the prepositions and the nouns.
   2.2 (a) Find the web frequency of all the paraphrases using a search engine and Netspeak.
       (b) Find the top 15 paraphrases having the highest frequencies.
       (c) Find the verbs forming the 15 paraphrases obtained in (b).
       (d) Find the semantic relation(s) having the maximum number of verbs extracted in (c).
   2.3 Return the semantic relation(s).

⁸ http://www.netspeak.eu/

Table 3. Semantic relations and the Hindi translation patterns. N1T and N2T are the translations of N1 and N2.
Semantic relation | Definition | Examples | Hindi translation pattern
Possessor | N1 has N2 – N1 is owner | Company car, family estate, girl mouth, child foot | N1T + kaa/ke/ki + N2T
Possessor | N1 has N2 – N1 is borrower | Student loan, national debt | N1TN2T
Source | N1 is the source of N2 and N1 is not a body part | Northern wind, foreign capital | N1TN2T
Source | N1 is the source of N2 but N1 is a body part | Chest pain, stomach ache, heart attack | N1T + mein + N2T
Equative⁹ | N1 is also head – N1, N2 are human | Composer arranger, player coach, lady doctor | N1TN2T
Experiencer | N2 experiences N1 (an animated entity experiencing a state/feeling) | Heart patient, cancer patient | N1T + kaa/ke/ki + N2T
Specialization | N1 is specialization of N2 but N1 and N2 are human | Boy child, girl child, baby boy, baby girl | Single word
Specialization | NC is specialization of N2 | Fighter planes, war ships | N1TN2T
Attribute-Transfer | Salient attribute of N1 is transferred to N2 | Iron will, crescent wrench, lion heart, doe eye, chicken heart | N1T + jaise + N2T + vaalaa or single word
Use | N2 uses N1 | Laser printer, water gun, electron microscope | N1T + vaalaa/vaale/vaali + N2T
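As a rough illustration of step 2.2(d) of Algorithm 1 and of how a Table 3 pattern might then be applied, consider the Python sketch below. It is a sketch under stated assumptions, not the authors' code. The seed-verb sets are a small excerpt of Table 2, the names `identify_relations` and `apply_pattern` and the key "Source/body-part" are hypothetical, counting every occurrence of an extracted verb is an assumed reading of "maximum number of verbs", and N1TN2T is assumed to mean the two translations written side by side.

```python
from collections import Counter
from typing import Dict, List, Set

# Small excerpt of Table 2 (lower-cased); the full resource has 728 seed verbs.
SEED_VERBS: Dict[str, Set[str]] = {
    "Cause": {"cause", "lead to", "result in", "generate"},
    "Source": {"come from", "caused by", "arise from", "result from"},
    "Material": {"made of", "made from", "composed of"},
}

# A few Table 3 patterns. The key "Source/body-part" is a hypothetical label
# for the Source case in which N1 is a body part.
TRANSLATION_PATTERNS: Dict[str, str] = {
    "Source/body-part": "{n1t} mein {n2t}",
    "Experiencer": "{n1t} kaa/ke/ki {n2t}",
    "Use": "{n1t} vaalaa/vaale/vaali {n2t}",
    "Equative": "{n1t} {n2t}",  # assumed reading of N1TN2T
}


def identify_relations(extracted_verbs: List[str],
                       seed_verbs: Dict[str, Set[str]] = SEED_VERBS) -> List[str]:
    """Return the relation(s) whose seed verbs cover most extracted verbs."""
    votes = Counter()
    for relation, seeds in seed_verbs.items():
        # Assumption: each occurrence of a verb counts as one vote.
        votes[relation] = sum(1 for verb in extracted_verbs if verb in seeds)
    best = max(votes.values(), default=0)
    return [rel for rel, n in votes.items() if n == best and n > 0]


def apply_pattern(relation: str, n1t: str, n2t: str) -> str:
    """Fill a translation pattern with the translations of N1 and N2."""
    return TRANSLATION_PATTERNS[relation].format(n1t=n1t, n2t=n2t)


relations = identify_relations(["caused by", "result from", "arise from", "cause"])
print(relations)
print(apply_pattern("Use", "N1T", "N2T"))
```

Ties are returned as a list because Algorithm 1 allows more than one relation in its output. Choosing among kaa/ke/ki or vaalaa/vaale/vaali depends on Hindi agreement and is left unresolved in this sketch.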
Journal: Computer Speech & Language
Volume: 32
Publication date: 2015