Automatic Procedures in Tectogrammatical Tagging

Author

  • Alena Böhmová
Abstract

The semi-automatic syntactic annotation of a part of the Czech National Corpus in the Prague Dependency Treebank (PDT) has among its aims the possibility of checking the chosen theoretical approach (Functional Generative Description, see [2]). The first phases of the annotation of PDT, i.e. the morphemic representations and the dependency trees on an intermediate analytic level (AL), i.e. analytic tree structures (ATSs, see [1]), have been discussed elsewhere. The present paper is devoted to the second, basic phase: the transduction from AL to (underlying) syntax itself, i.e. to tectogrammatical representations, which should be provided for 10 000 sentences during the year 2000 (at its start, 100 000 sentences had obtained their ATS annotations).

Similar resources

Syntactic Tagging: Procedure for the Transition from the Analytic to the Tectogrammatical Tree Structures

The syntactic tagging of the Prague Dependency Treebank (PDT) is divided into two steps, the first resulting in analytic tree structures (ATS) and the second in tectogrammatical tree structures (TGTS). The present paper describes the transition procedures, automatic and manual, from ATS to TGTS and illustrates these procedures on two Czech sentences. Syntactic tagging in The Prague Dependency Tree...

Coreferential Relations In The Prague Dependency Treebank

The corpus annotation of PDT is performed on several levels and in several steps. The annotation of coreference relations is carried out on underlying (tectogrammatical) tree structures assigned to the sentences in the text on independent (and theoretically based) grounds, which makes it possible to systematically include into the annotation the superficially “null” (unrealized) anaphors and o...

Czech-English Dependency-based Machine Translation

We present some preliminary results of a Czech-English translation system based on dependency trees. The fully automated process includes: morphological tagging, analytical and tectogrammatical parsing of Czech, tectogrammatical transfer based on lexical substitution using word-to-word translation dictionaries enhanced by the information from the English-Czech parallel corpus of WSJ, and a simp...

English-Czech Machine Translation Using TectoMT

English to Czech machine translation as it is implemented in the TectoMT system consists of three phases: analysis, transfer and synthesis. The system uses tectogrammatical (deep-syntactic dependency) trees as the transfer medium. Each phase is divided into so-called blocks, which are processing units that solve linguistically interpretable tasks (e.g., statistical part-of-speech tagging or rul...

Prague Dependency Treebank: From analytic to tectogrammatical annotations

The Prague Dependency Treebank is conceived of as an annotated corpus of written Czech, comprising three layers of annotations. In the present paper, we focus on a more detailed description of the structure and contents of the tectogrammatical syntactic trees (underlying sentence representations) and a specification of the transition from the analytic syntactic tree to the tectogrammatical one....

Journal:
  • Prague Bull. Math. Linguistics

Volume 76

Publication date: 2001