Automatic Learning for Semantic Collocation
Authors
Abstract
The real difficulty in developing practical NLP systems comes from the fact that we do not have effective means for gathering "knowledge". In this paper, we propose an algorithm which automatically acquires knowledge of semantic collocations among "words" from sample corpora. The algorithm uses a statistical approach to discover semantic collocations that are useful for disambiguating structurally ambiguous sentences. It requires a corpus and minimum linguistic knowledge (parts-of-speech of words, simple inflection rules, and a small number of general syntactic rules). We conducted two experiments, applying the algorithm to different corpora to extract different types of semantic collocations. Though some problems remain unsolved, the results showed the effectiveness of the proposed algorithm.

*SEKINE is now a visitor at C.C.L., U.M.I.S.T. sekine@ccl.umist.ac.uk

1 Introduction

Quite a few grammatical formalisms have been proposed by computational linguists, which are claimed to be "good" (declarative, highly modular, etc.) for practical application systems in NLP. It has also been claimed that extra-linguistic, domain-specific knowledge is indispensable in most NLP applications, and computational frameworks for representing and using such domain knowledge have also been developed. However, the real difficulty in developing practical NLP systems is that we do not have effective means for gathering the "knowledge", whether linguistic or extra-linguistic. In particular, it has been reported [Ananiadou, 1990] that not only the extra-linguistic domain knowledge but also the linguistic knowledge required for application systems varies with text type (technical reports, scientific papers, manuals, etc.), subject domain, and type of application (MT, automatic abstraction, etc.).
This means that we need effective and efficient methods either for adapting existing knowledge to a specific "sublanguage" or for acquiring knowledge automatically, for example from sample corpora of given applications.

In this paper, we propose an algorithm which automatically acquires knowledge of semantic collocations among "words". "Semantic" here means that the collocations the algorithm discovers are not collocations among words in the sense of traditional linguistics but collocations that reflect ontological relations among entities in given subject domains. We expect that the extracted knowledge will not only be useful for disambiguating sentences but will also contribute to discovering ontological classes in given subject domains.

Though several studies with similar objectives have been reported [Church, 1988], [Zernik and Jacobs, 1990], [Calzolari and Bindi, 1990], [Garside and Leech, 1985], [Hindle and Rooth, 1991], [Brown et al., 1990], they require that sample corpora be correctly analyzed or tagged in advance. That is, they need either a training corpus that has been tagged or parsed by humans, or a correspondence between corpora in two languages. Because preparing such corpora needs a lot of manual assistance or an unerring tagger or parser, this requirement makes their algorithms troublesome in actual application environments. The algorithm in this paper, on the other hand, has no such requirement; it requires only a minimum of linguistic knowledge, including parts-of-speech of words, simple inflection rules, and a small number of general syntactic rules which lexicon-based syntactic theories like HPSG CC etc. normally assume. The parser is not deterministic; it produces all possible analyses. All of the results are used for calculation ant