Extraction of Type-Logical Supertags from the Spoken Dutch Corpus
نویسنده
چکیده
The Spoken Dutch Corpus assigns 1 million of its 9 million total words a syntactic annotation in the form of dependency graphs. We will look at strategies for automatically extracting a lexicon of type-logical supertags from these dependency graphs and investigate how different levels of lexical detail affect the size of the resulting lexicon as well as the performancewith respect to supertag disambiguation.
منابع مشابه
Using the Spoken Dutch Corpus for type-logical grammar induction
Abstract The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these framewo...
متن کاملTransferring PoS-tagging and lemmatization tools from spoken to written Dutch corpus development
Abstract We describe a case study in the reuse and transfer of tools in language resource development, from a corpus of spoken Dutch to a corpus of written Dutch. Once tools for a particular language have been developed, it is logical, but not trivial to reuse them for other types or registers of the language than the tools were originally designed for. This paper reviews the decisions and adap...
متن کاملThe Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
متن کاملUsing Large Multi-purpose Corpora for Specific Research Questions: Discourse Phenomena Related to Wh-questions in the Spoken Dutch Corpus
In this paper, we investigate whether a dataset derived from a multi-purpose corpus such as the Spoken Dutch Corpus may be considered appropriate for developing a taxonomy of wh-questions, and a model of the way in which these questions are integrated in spoken discourse. We compare the results obtained from the Spoken Dutch Corpus with a similar analysis of a large random collection of FAQs fr...
متن کاملSpontaneous Speech in the Spoken Dutch Corpus
In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a corpus of 1,000 hours of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of (computational) linguistics and language and speech technology. Although the corpus will contain a fair amount of read speech...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010