Robust Processing of Natural Language

Author

  • Wolfgang Menzel
Abstract

Previous approaches to robustness in natural language processing usually treat deviant input by relaxing grammatical constraints whenever a successful analysis cannot be obtained by “normal” means. This schema implies that error detection always precedes error handling, a behaviour which can hardly compete with its human model, where many erroneous situations are handled without even being noticed. The paper analyses the necessary preconditions for achieving a higher degree of robustness in natural language processing and suggests a quite different approach based on a procedure for structural disambiguation. It not only offers the possibility to cope with robustness issues in a more natural way but eventually might be suited to accommodate quite different aspects of robust behaviour within a single framework.

1 Robustness in Natural Language Processing

The notion of robustness in natural language processing is a rather broad one and lacks a precise definition. Usually, it is taken to describe a kind of monotonic behaviour which should be guaranteed whenever a system is exposed to some sort of non-standard input data: a comparatively small deviation from a predefined ideal should lead to no or only minor disturbances in the system’s response, whereas a total failure might only be accepted for sufficiently distorted input. Under this informal notion, robustness may well be interpreted as a system’s indifference to a wide range of external disruptive factors, including

– the inherent uncertainty of real-world input, e.g. speech or handwriting,
– noisy environments,
– the variance between speakers, for instance idiolectal, dialectal or sociolectal,
– “erroneous” input with respect to some normative standard,
– an insufficient competence of the processing system, e.g. when exposed to a non-native language or new terminology,
– highly varying speech rates and
– resource limitations due to the parallel execution of several mental activities.

One of the most impressive features of human language processing is the ability to retain its basic capabilities even when exposed to a combination of adverse factors. Technical solutions, on the other hand, are likely to have serious problems when confronted with only a single type of distortion, quite apart from the fundamental difficulty of supplying the desired monotonic behaviour at all. Accordingly, problems of robustness in NLP have almost never been considered from a unifying perspective so far. A number of very specific techniques for some of these different aspects have been developed, which can hardly be related to each other. Robustness, for instance, is a key issue in speech recognition, where reliable recognition results for a variety of speakers and speaking conditions are desired. Two basic technologies attempt to support this goal:

– robust stochastic modelling techniques which are able to capture generalizations across the individual variety and
– sophisticated search procedures which select among huge numbers of competing recognition hypotheses by comparing probability estimates for signal segments of increasing length.

Special signal enhancement techniques are used to suppress stationary environmental noise. There are other aspects of robustness which have not been treated at all, including the flexible adaptation to external time constraints or internal resource limitations.
Traditionally, the notion of robustness has been strongly connected to the processing of ill-formed input, where ill-formedness can be defined both in terms of human standards of grammaticality and in terms of unexpected input. Most of the work has approached the problem from a purely syntactic point of view and usually relied on two basic techniques: error anticipation and constraint relaxation.

Error anticipation identifies a number of common mistakes and tries to integrate them into the existing grammar by devising dedicated extensions to its coverage. The method is therefore limited to a few selected types of deviant constructions which are notorious and therefore predictable, namely

– stereotypical spelling mistakes (*comittee, *rigth, etc.),
– performance phenomena in spoken language, like restarts (cf. [6]) and
– interference-based competence errors in early phases of second language learning (cf. [1]).

Obviously, the complete “innovative” potential and the individual creativity for producing ill-formed input cannot be adequately captured by such means alone.

Constraint relaxation techniques, on the other hand, rely on a systematic variation of existing grammar rules written for standard input. Initially, the idea was restricted to the stepwise retraction of, e.g., agreement conditions in syntactic rules. It can easily be extended to incorporate arbitrary rule transformations in order to allow for the insertion, deletion, substitution and transposition of elements. The difference vanishes completely within modern constraint-based formalisms [26] [2], where a transposition of constituents can be interpreted equally well as a relaxation of linear precedence constraints. Furthermore, constraints can be annotated with their degree of vulnerability, which allows aspects of error anticipation to be included in the relaxation framework.

Footnote 1: The difficulties with a straightforward generalization of this approach to, e.g., syntactic or semantic anomalies are obvious: it would require huge amounts of sufficiently deviant utterances to be available as training data. This renders the approach technically infeasible and cognitively implausible. For similar reasons, connectionist approaches are not considered here: at the moment they seem to be limited to approximate solutions for flat representations (cf. [27]).

Footnote 2: For a good overview see [25].

Since both error anticipation and constraint relaxation considerably enlarge the generative capacity of the original grammar, they lead to spurious ambiguities and serious search problems. This restricts their application to a kind of post mortem analysis: only if a failure of the standard analysis procedure indicates the presence of non-standard input are error rules or relaxation techniques activated to integrate the fragmentary results obtained so far.

Even a superficial comparison with human processing principles shows the fundamental deficit of these approaches. A human reader or listener accepts ill-formed input to a wide degree, often without noticing an error at all. This is particularly true if strong expectations concerning the content of the utterance are involved or if heavy time constraints restrict the processing depth. Obviously, there is a fundamental parallelism between robustness issues and time considerations which syntactically oriented solutions have lacked so far.
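To make the relaxation idea concrete, the following sketch shows how annotating constraints with a degree of vulnerability turns hard rejection into graded penalisation. It is a minimal illustration only: the dictionary-based representation of analyses, the constraint names and all weights are invented for this example, since the paper does not commit to a concrete formalism.

```python
# Minimal sketch of constraint relaxation with vulnerability weights.
# The feature representation, constraint names and weights are
# hypothetical; violated constraints add a penalty instead of
# rejecting the analysis outright.

RELAXATION_COST = {
    "agreement": 1.0,   # often violated by learners: cheap to relax
    "word_order": 2.0,  # linear precedence: relaxed at a higher cost
    "category": 5.0,    # category constraints: almost never relaxed
}

def score_analysis(analysis, constraints):
    """Sum the relaxation costs of all violated constraints."""
    cost = 0.0
    for name, predicate in constraints:
        if not predicate(analysis):
            cost += RELAXATION_COST[name]
    return cost

# A singular subject with a plural verb violates agreement but
# nothing else, so the analysis survives with a small penalty.
constraints = [
    ("agreement", lambda a: a["subj_num"] == a["verb_num"]),
    ("word_order", lambda a: a["subj_pos"] < a["verb_pos"]),
]
analysis = {"subj_num": "sg", "verb_num": "pl", "subj_pos": 0, "verb_pos": 1}
print(score_analysis(analysis, constraints))  # -> 1.0
```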
Robustness in human language processing does not amount to an additional effort; instead it facilitates both insensitivity to ill-formed input and a flexible adaptation to temporal restrictions. This basic pattern is much better modelled by semantically oriented approaches based on the slot-and-filler principle. Here, highly domain-specific expectations are coded by means of frame-like structures and checked against the input for satisfaction. The schema can be successfully extended to a kind of skimming understanding, bringing together the question of robustness against syntactically ill-formed input and some simple considerations concerning resource limitations. This advantage of a semantically guided analysis, however, is won at the cost of excluding another important robustness feature, namely the ability to cope with unexpected input (e.g. a change of topic beyond the narrow limitations of the domain, or the violation of selectional restrictions in metaphorical expressions).

2 Observations from Human Language Processing

Psycholinguistic evidence provides a contradictory picture of human language processing. Some observations clearly support a rather strong modular organization with processing units of great autonomy, like syntax and semantics [4] [5]. On the other hand, there is considerable semantic influence on the assignment of syntactic structure [20], which suggests a highly integrated processing architecture.

Footnote 3: There are exceptions to every rule: for language learning purposes, [17] propose an initial analysis based on a moderately weak grammar, followed by a more rigid second pass.

Robust behaviour in natural language understanding seems to require both

– the autonomy between parallel lines of processing, which embodies redundancy and allows partial insufficiencies to be compensated, and
– the interactive nature of informational exchange, which allows partial structures on different levels of granularity to be related.

Functional autonomy is undoubtedly of fundamental importance for robustness. It allows an at least vague interpretation to be obtained even in cases of extremely distorted input:

1. A semantically almost empty sentence can be analysed quite well by syntactic means alone, delivering a hypothetical interpretation in terms of a possible world with highly underspecified referential object descriptions and possibly ambiguous thematic roles:

   “... und grausig gutzt der Golz.” [23] (1)

2. Syntactically ill-formed utterances are interpreted on the basis of semantic and background knowledge, even if subcategorization regularities or other grammatical constraints are violated.

Although both processing units are – at least partially – able to generate some useful interpretation independently of each other, the best results, of course, are to be expected if they combine their efforts in a systematic way. Parallel and autonomous structures in language processing have not only evolved between syntactic and semantic aspects of language. They can be observed equally well at the level of speech comprehension, where auditory (hearing) and visual (lip-reading) cues are usually combined to achieve a reliable recognition result. Again, both systems are – in principle – able to work independently, but synergy occurs if both are activated concurrently.
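This redundancy pattern can be sketched as follows. The two stand-in analysers and the combination rule are entirely hypothetical (the domain check on the nonsense word from example (1) merely simulates a semantic failure); the sketch only demonstrates the fall-back behaviour that autonomy buys.

```python
# Hypothetical illustration of two autonomous processing streams.
# Neither analyser is taken from the paper; they only demonstrate
# that either stream alone yields a (partial) reading, while the
# combination of both is preferred whenever it is available.

def syntactic_analysis(sentence):
    """Syntax-only stand-in: assigns structure even to nonsense words."""
    return {"structure": "S(NP, VP)", "roles": "possibly ambiguous"}

def semantic_analysis(sentence):
    """Frame-based stand-in: fails on words outside its tiny domain."""
    if "gutzt" in sentence:       # nonsense predicate: no frame fits
        return None
    return {"frame": "PERCEPTION", "filler": "dog"}

def interpret(sentence):
    syn = syntactic_analysis(sentence)
    sem = semantic_analysis(sentence)
    if syn and sem:               # synergy: merge both partial results
        return {**syn, **sem}
    return syn or sem             # autonomy: either one is a fall-back

print(interpret("und grausig gutzt der Golz"))  # syntax-only reading
print(interpret("Bill saw the little dog"))     # combined reading
```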
A second group of observations related to the question of robustness concerns the expectation-driven nature of human language understanding. Here, expectations come into play along two different dimensions:

– syntactic, semantic and pragmatic predictions about future input, derived from previous parts of the utterance or dialogue, and
– expectations exchanged between the parallel and autonomous processing structures for syntax and semantics.

The role of dynamic expectations has mostly been investigated from the viewpoint of a possible search space reduction in prediction-based parsing strategies (namely left-corner or head-corner algorithms). If used to select between competing hypotheses in speech recognition, the predictive capacity of a grammar can additionally contribute to an enhanced robustness of the overall system [10] [7]. Although the importance of predictions for robustness is beyond question, the second type of expectations shall be examined here as a matter of priority, since they are expected to establish the attempted informational coupling between parallel processing units.

As the simple examples above have shown, no predefined direction for this exchange of information can be assumed. Certain syntactic constructions may trigger specific semantic interpretations, a view which is strongly supported by the traditional perspective on the relation between syntax and semantics. In the opposite direction, semantic relations, e.g. those derived from background knowledge, can not only be used to disambiguate between pre-established syntactic readings but are moreover able to actively propose suitable syntactic structures. This bidirectionality of interaction seems to be of great importance for the ability to provide the mutual compensation necessary to treat deviant constructions of different kinds.

Of course, the expectation-based nature of natural language processing cannot guarantee a failure-proof performance under all circumstances. There certainly are situations in which strong expectations may override even sensory data. Such a situation can easily be studied in everyday conversation whenever, e.g., pragmatic expectations are predominant. A similar problem occurs in experimental settings using intentionally desynchronised video input, where lip-reading information sometimes overrides even the auditory stimulus. The problem is witnessed as well by the difficulties usually encountered in proof-reading one’s own text: extremely strong expectations concerning the content usually cause minor mistakes to pass unnoticed.

Typically, expectations are contradictory and will have a different impact on the progress of the analysis procedure. Hence, there is a third principle of robust language processing upon which the human model builds. It concerns the preference-based selection between competing interpretations as well as between different expectations [12]. Expectations have to be ranked according to their particular strength and weighted against each other.
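A minimal sketch of such preference-based selection follows, again with invented expectation sources, weights and feature encoding: competing readings are scored by the summed strength of the weighted expectations they satisfy, so contradictory expectations are weighed against each other rather than acting as hard filters.

```python
# Hypothetical sketch of preference-based selection between readings.
# Expectation sources, weights and the feature encoding are invented.

# Each expectation: (description, strength, predicate over a reading).
expectations = [
    ("verb expects an animate subject", 2.0, lambda r: r["subj_animate"]),
    ("topic continuity with dialogue", 1.5, lambda r: r["topic"] == "dog"),
    ("canonical word order", 1.0, lambda r: r["order"] == "SVO"),
]

def preference_score(reading):
    """Sum the strengths of all satisfied expectations; no hard filters."""
    return sum(w for _, w, holds in expectations if holds(reading))

readings = [
    {"id": 1, "subj_animate": True, "topic": "dog", "order": "OVS"},
    {"id": 2, "subj_animate": False, "topic": "dog", "order": "SVO"},
]
best = max(readings, key=preference_score)
print(best["id"])   # reading 1 wins: 2.0 + 1.5 beats 1.5 + 1.0
```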
Recently, linguistic research has shown a remarkable trend towards the development of integrated models of language structure. One of the more popular examples surely is Head-Driven Phrase Structure Grammar (HPSG [24]), where syntactic and semantic descriptions are uniquely related to each other by coreferential pointers within the framework of typed feature structures. This strong coupling on the level of representation and on the level of processing (i.e. within unification) completely lacks autonomy. The construction of a logical form is always mediated by syntactic descriptions, taken e.g. from subcategorization information. Since syntactic and semantic restrictions are conjunctively combined, the overall vulnerability to arbitrary impairment of the input utterances even increases: an analysis may now fail for syntactic as well as for semantic reasons.

A quite similar conclusion can be drawn for construction grammar [3], another integrated approach. It combines syntactic, semantic and even pragmatic information in a single representation named a construction. Again, the autonomy of individual description levels is missing, and even if constructions are supplied with preferential weightings derived from their frequency of use (as realized in SAL [13]), robustness does not increase.

Footnote 4: Note that perfect performance is not necessarily covered by the informal notion of robustness introduced earlier.

A clear-cut separation of representational levels has actually been realized in the cognitively motivated parser COMPERE [11] [18]. The system aims at modelling error recovery techniques for garden-path sentences. It uses an arbitration mechanism to decide, in case of a conflict, which alternative reading should be backed up. This allows early commitment decisions to be combined with the possibility of switching to another interpretation later on if necessary. Although the parser is guided in its decisions by different kinds of preferences, the mapping between syntactic and semantic representations seems to be a strict one. Accordingly, it does not provide the necessary means for conflict resolution in all those cases of non-standard input for which no interpretation can be established. In particular, three different cases can be distinguished:

1. failure on a single level (syntax or semantics),
2. failure on both levels (syntax and semantics),
3. no consistent mapping between the levels.

Whereas the first case might easily be accommodated by the arbitration mechanism, the latter two require the abandonment of the strict mapping and its replacement by a preference-based module interaction.

3 Disambiguation by Constraint Propagation

A suitable combination of the three principles discussed above might in fact provide the foundation for an effective use of redundancy in parallel processing structures:

– autonomy guarantees a fall-back behaviour for failures of a single module,
– expectancy-oriented analysis facilitates the informational exchange and
– preference-based processing guides the analysis towards a promising interpretation and establishes a loose coupling between modules.

These principles, even if taken together, do not explain the almost unconscious treatment of errors in everyday communication. To simulate a similar behaviour, a selective constraint invocation strategy becomes necessary. Parsing is then understood as a disambiguation procedure which activates only specific parts of the grammar, if this is deemed unavoidable for solving a particular disambiguation problem. The procedure can be terminated once a sufficiently reliable disambiguation has been achieved, even if certain conditions of the grammar have never been checked. Robustness is not introduced by a post mortem retraction of constraints but rather by their careful invocation.

Along these lines, a rudimentary kind of robustness has been achieved in the Constraint Grammar framework [15], a system for parsing large amounts of unrestricted text. Constraint Grammar (CG) attempts to establish a dependency description which is underspecified with respect to the precise identity of modifiees.
Initially, it assigns a set of morphologically justified syntactic labels to each word form in the input sentence. Possible labels are, among others:

  @+FMAINV   the finite verb of a clause
  @SUBJ      a grammatical subject
  @OBJ       a direct object
  @DN>       a determiner modifying a noun to the right
  @NN>       a noun modifying a noun to the right

The initial set of labels is successively reduced by applying compatibility and surface-ordering constraints until a unique interpretation has been reached or the set of available constraints is exhausted. In the latter case, a total disambiguation cannot be achieved by purely syntactic means, as in the following attachment example:

  Bill    saw         the    little   dog    in   the   park
  @SUBJ   @+FMAINV    @DN>   @AN>     @OBJ   @          @<P
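A schematic rendering of the elimination procedure may help here. The two constraints below are invented stand-ins rather than actual CG rules from [15]; they only show how the label sets shrink monotonically and how residual ambiguity is simply left underspecified instead of causing a failure.

```python
# Schematic sketch of Constraint-Grammar-style disambiguation: every
# word starts with a set of candidate labels, and invented constraints
# (not actual rules from [15]) eliminate labels until the sets are
# singletons or no constraint applies any more.

sentence = ["Bill", "saw", "the", "dog"]
labels = {
    0: {"@SUBJ", "@OBJ"},
    1: {"@+FMAINV"},
    2: {"@DN>"},
    3: {"@SUBJ", "@OBJ"},
}

def discard(position, label, context_holds):
    """Remove a candidate label, but never empty a label set."""
    if context_holds and len(labels[position]) > 1:
        labels[position].discard(label)

# Invented constraints: no direct object before the finite verb,
# no subject after it.
verb_pos = next(i for i, s in labels.items() if "@+FMAINV" in s)
for i in labels:
    discard(i, "@OBJ", i < verb_pos)
    discard(i, "@SUBJ", i > verb_pos)

for i, word in enumerate(sentence):
    print(word, sorted(labels[i]))
# Bill ['@SUBJ']  saw ['@+FMAINV']  the ['@DN>']  dog ['@OBJ']
```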




Publication date: 1995