Spatial Representation of Classifier Predicates for Machine Translation into American Sign Language

Author

  • Matt Huenerfauth

Abstract

The translation of English text into American Sign Language (ASL) animation tests the limits of traditional machine translation (MT) approaches. The generation of spatially complex ASL phenomena called “classifier predicates” motivates a new representation for ASL based on virtual reality modeling software, and previous linguistic research provides constraints on the design of an English-to-Classifier-Predicate translation process operating on this representation. This translation design can be incorporated into a multi-pathway architecture to build English-to-ASL MT systems capable of producing classifier predicates.

Introduction and Motivations

Although Deaf students in the U.S. and Canada are taught written English, the challenge of acquiring a spoken language for students with hearing impairments results in the majority of Deaf U.S. high school graduates reading at a fourth-grade level (Holt, 1991). Unfortunately, many strategies for making elements of the hearing world accessible to the Deaf (e.g. television closed captioning or teletype telephone services) assume that the user has strong English literacy skills. Since many Deaf people who have difficulty reading English possess stronger fluency in American Sign Language (ASL), an automated English-to-ASL machine translation (MT) system can make more information and services accessible in situations where English captioning text is at too high a reading level or a live interpreter is unavailable.

Previous English-to-ASL MT systems have used 3D graphics software to animate a virtual human character that performs the ASL output. Generally, a script written in a basic animation instruction set controls the character’s movement, so MT systems must translate English text into a script directing the character to perform ASL.
Previous projects have either used word-to-sign dictionaries to produce English-like manual signing output, or they have incorporated analysis grammars and transfer rules to produce ASL output (Huenerfauth, 2003; Sáfár and Marshall, 2001; Speers, 2001; Zhao et al., 2000). While most of this ASL MT work is still preliminary, there is promise that an MT system will one day be able to translate many kinds of English sentences into ASL; however, some particular ASL phenomena – those involving complex use of the signing space – have proven difficult for traditional MT approaches. This paper presents a design for generating these expressions.

ASL Spatial Phenomena

ASL signers use the space around them for several grammatical, discourse, and descriptive purposes. During a conversation, an entity under discussion (whether concrete or abstract) can be “positioned” at a point in the signing space. Subsequent pronominal reference to this entity can be made by pointing to this location (Neidle et al., 2000). Some verb signs will move toward or away from these points to indicate (or show agreement with) their arguments (Liddell, 2003a; Neidle et al., 2000). Generally, the locations chosen for this use of the signing space are not topologically meaningful; that is, one imaginary entity being positioned to the left of another in the signing space does not necessarily indicate that the entity is to the left of the other in the real world. Other ASL expressions are more complex in their use of space: they position invisible objects around the signer to topologically indicate the arrangement of entities in a 3D scene being discussed.

Footnote 1: Students who are age eighteen and older are reading English text at a level more typical of a ten-year-old student.
Constructions called “classifier predicates” allow signers to use their hands to position, move, trace, or re-orient an imaginary object in the space in front of them to indicate the location, movement, shape, contour, physical dimension, or some other property of a corresponding real-world entity under discussion. Classifier predicates consist of a semantically meaningful handshape and a 3D hand movement path. A handshape is chosen from a closed set based on characteristics of the entity described (whether it is a vehicle, human, animal, etc.) and on what aspect of the entity the signer is describing (surface, position, motion, etc.). For example, the sentence “the car drove down the bumpy road past the cat” could be expressed in ASL using two classifier predicates. First, a signer would move a hand in a “bent V” handshape (index and middle fingers extended and bent slightly) forward and slightly downward to a point in space in front of his or her torso where an imaginary miniature cat could be envisioned. Next, a hand in a “3” handshape (thumb, index, and middle fingers extended, with the thumb pointing upward) could trace a path in space past the “cat” in an up-and-down fashion, as if it were a car bouncing along a bumpy road. Generally, “bent V” handshapes are used for animals, and “3” handshapes for vehicles.

Generating Classifier Predicates

As the “bumpy road” example suggests, translation involving classifier predicates is more complex than most English-to-ASL MT because of the highly productive and spatially representational nature of these signs. Previous ASL MT systems have dealt with this problem by omitting these expressions from their linguistic coverage; however, many English concepts lack a fluent ASL translation without them. Further, these predicates are common in ASL: in many genres, signers produce a classifier predicate on average once per 100 signs (approximately once per minute at typical signing rates) (Morford and MacFarlane, 2003).
So, systems that cannot produce classifier predicates can produce ASL of only limited fluency and are not a viable long-term solution to the English-to-ASL MT problem. Classifier predicates challenge traditional definitions of what constitutes linguistic expression, and they often incorporate spatial metaphor and scene visualization to such a degree that there is debate as to whether they are paralinguistic spatial gestures, non-spatial polymorphemic constructions, or compositional yet spatially parameterized expressions (Liddell, 2003b). Whatever their true nature, an ASL MT system must somehow generate classifier predicates. While MT designs are not required to follow linguistic models of human language production in order to be successful, it is worthwhile to consider linguistic models that account well for the ASL classifier predicate data but minimize the computational or representational overhead required to implement them.

Design Focus and Assumptions

This paper focuses on the generation of classifier predicates of movement and location (Supalla, 1982; Liddell, 2003a). Most of the discussion concerns generating individual classifier predicates; an approach for generating multiple interrelated predicates is proposed toward the end of the paper. This paper assumes that English input sentences that should be translated into ASL classifier predicates can be identified. Some of the MT designs proposed below will be specialized for the task of generating these phenomena. Since a complete English-to-ASL MT system would need to generate more than just classifier predicates, the designs discussed below would need to be embedded within an MT system that had other processing pathways for handling non-spatial English input sentences. The design of such multi-pathway MT architectures is another focus of this research project (Huenerfauth, 2004).
These other pathways could handle most inputs by employing traditional MT technologies (like the ASL MT systems mentioned above). A sentence could be “identified” (or intercepted) for special processing in the classifier predicate pathway if it fell within the pathway’s implemented lexical (and – for some designs – spatial) resources. In this way, a classifier predicate generation component could actually be built on top of an existing ASL MT system that did not previously support classifier predicate expressions. We will first consider a classifier predicate MT approach requiring little linguistic processing and no novel ASL representations, namely a fully lexicalized approach. As engineering limitations are identified and additional linguistic analyses are considered, the design will be modified, and progressively more sophisticated representations and processing architectures will emerge.

Footnote 2: A later section of this paper describes how the decision of whether an input English sentence can be processed by the special classifier predicate translation pathway depends on whether a motif (introduced in that section) has been implemented for the semantic domain of that sentence.

Design 1: Lexicalize the Movement Paths

The task of selecting the appropriate handshape for a classifier predicate, while non-trivial, seems approachable with a lexicalized design. For example, by storing semantic features (e.g. +human, +vehicle, +animal, +flat-surface) in the English lexicon, possible handshapes can be identified for entities referred to by particular English nouns. Associating other features (e.g. +motion-path, +stationary-location, +relative-locations, +shape-contour) with particular verbs or prepositions in the English lexicon could help identify what kind of information the predicate must express – further narrowing the set of possible classifier handshapes.
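The feature-based handshape selection just described can be sketched as a simple lexicon lookup. The feature names, the nouns, and the handshape inventory below are simplified illustrative assumptions, not the paper's actual lexicon:

```python
# Illustrative sketch: selecting candidate classifier handshapes from
# semantic features stored in an English lexicon. Feature names and the
# handshape inventory are simplified assumptions for illustration.

NOUN_FEATURES = {
    "car": {"+vehicle"},
    "cat": {"+animal"},
    "person": {"+human"},
}

# Map each semantic feature to plausible classifier handshapes.
HANDSHAPES = {
    "+vehicle": ["3"],        # thumb, index, middle fingers extended
    "+animal": ["bent V"],    # index and middle fingers extended, bent
    "+human": ["1", "V"],
}

def candidate_handshapes(noun: str) -> list[str]:
    """Return the possible classifier handshapes for an English noun."""
    shapes = []
    for feature in NOUN_FEATURES.get(noun, set()):
        shapes.extend(HANDSHAPES.get(feature, []))
    return shapes

print(candidate_handshapes("car"))   # ['3']
print(candidate_handshapes("cat"))   # ['bent V']
```

Features attached to verbs and prepositions (e.g. +motion-path) would narrow these candidate sets further at translation time.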
To produce the 3D movement portion of the predicate using this lexicalized approach, we could store a set of 3D coordinates in the English lexicon for each word or phrase (piece of lexicalized syntactic structure) that may be translated as a classifier predicate.

Problems with This Design

Unfortunately, the highly productive and scene-specific nature of these signs makes them potentially infinite in number. For example, while it may seem possible simply to store a 3D path with the English phrase "driving up a hill," factors like the curve of the road, the steepness of the hill, how far up to drive, etc. would affect the final output. So, a naïve lexicalized 3D-semantics treatment of classifier movement would not be scalable.

Design 2: Compose the Movement Paths

Since the system may need to produce innumerable possible classifier predicates, we cannot merely treat the movement path as an unanalyzable whole. A more practical design would compose a 3D path based on some finite set of features or semantic elements from the English source text. This approach would need a library of basic animation components that could be combined to produce a single classifier predicate movement. Such an “animation lexicon” would contain common positions in space, relative orientations of objects in space (for concepts like above, below, across from), common motion paths, and common contours for such paths. Finally, these components would be associated with corresponding features or semantic elements of English so that the appropriate animation components could be selected and combined at translation time to produce a 3D path.

Problems with This Design

This design is analogous to the polymorphemic model of classifier predicate generation (Supalla 1978, 1982, 1986). This model describes ASL classifier predicates as categorical, and it characterizes their generation as a process of combining sets of spatially semantic morphemes.
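Design 2's composition of a path from reusable animation components can be sketched as follows; the component names (`straight`, `bumpy`) and their parameters are invented for illustration, not taken from the paper:

```python
# Illustrative sketch of Design 2: composing a classifier predicate movement
# from a small "animation lexicon" of reusable components, each a function
# from time t in [0, 1] to an (x, y, z) point in signing space.

import math

def straight(start, end):
    """Basic motion component: a linear path between two 3D points."""
    def path(t):
        return tuple(s + t * (e - s) for s, e in zip(start, end))
    return path

def bumpy(base_path, height=0.05, bumps=4):
    """Contour component: overlay vertical oscillation on an existing path."""
    def path(t):
        x, y, z = base_path(t)
        return (x, y + height * abs(math.sin(bumps * math.pi * t)), z)
    return path

# Compose "move forward along a bumpy contour" from the two components,
# as a car classifier bouncing along a bumpy road might require.
car_path = bumpy(straight((0.0, 0.0, 0.0), (0.4, 0.0, 0.3)))
samples = [car_path(t / 4) for t in range(5)]  # sample points for animation
```

The combinatorial problem discussed next is visible even here: every distinct contour, steepness, and spatial relationship would need its own component and composition rule.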
The difficulty is that every piece of spatial information we might express with a classifier predicate must be encoded as a morpheme. These phenomena can convey such a wide variety of spatial information – especially when used in combination to describe spatial relationships or comparisons between objects in a scene – that many morphemes are required. Liddell’s analysis (2003b) of the polymorphemic model indicates that in order to generate the variety of classifier predicates seen in ASL data, the model would need a tremendously large (and possibly infinite) number of morphemes. Using a polymorphemic analysis, Liddell (2003b) decomposes a classifier predicate of one person walking up to another, and he finds over 28 morphemes, including some for: two entities facing each other, being on the same horizontal plane, being vertically oriented, being freely moving, being a particular distance apart, moving on a straight path, etc. Liddell considers classifier predicates as being continuous and somewhat gestural in nature (2003a), and this partially explains his rejection of the model. (If there are not a finite number of possible sizes, locations, and relative orientations for objects in the scene, then the number of morphemes needed becomes infinite.) Whether classifier predicates are continuous or categorical and whether this number of morphemes is infinite or finite, the number would likely be intractably large for an MT system to process. We will see that the final classifier predicate generation design proposed in this paper will use a non-categorical approach for selecting its 3D hand locations and movements. This should not be taken as a linguistic claim about human ASL signers (who may indeed use the large numbers of morphemes required by the polymorphemic model) but rather as a tractable engineering solution to the highly productive nature of classifier predicates. 
Another reason why a polymorphemic approach to classifier predicate generation would be difficult to implement in a computational system is that the complex spatial interactions and constraints of a 3D scene would be difficult to encode in a set of compositional rules. For example, consider the two classifier predicates in the “the car drove down the bumpy road past the cat” example. To produce these predicates, the signer must know how the scene is arranged, including the locations of the cat, the road, and the car. A path for the car must be chosen with beginning/ending positions, and the hand must be articulated to indicate the contour of the path (e.g. bumpy, hilly, twisty). The proximity of the road to the cat, the plane of the ground, and the curve of the road must be selected. Other properties of the objects must be known: (1) cats generally sit on the ground, and (2) cars generally travel along the ground on roads. The successful translation of the English sentence into these two classifier predicates involves a great deal of semantic understanding, spatial knowledge, and reasoning.

A 3D Spatial Representation for ASL MT

ASL signers using classifier predicates handle these complexities using their own spatial knowledge and reasoning and by visualizing the elements of the scene. An MT system may also benefit from a 3D representation of the scene from which it could calculate the movement paths of classifier predicates. While Design 2 needed compositional rules (and associated morphemes) to cover every possible combination of object positions and spatial implications suggested by English texts, the third and final MT design (discussed in a later section) will use virtual reality 3D scene modeling software to simulate the movement and location of entities described by an English text (and to automatically manage their interactions).
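The idea of calculating movement paths from a 3D scene model, rather than from compositional rules, can be sketched in miniature. All class and method names here (`Scene`, `path_past`, the coordinates) are invented for illustration; the actual design uses virtual reality scene modeling software:

```python
# Illustrative sketch: a miniature 3D scene model from which a classifier
# predicate's movement path could be calculated, as in the "car drove down
# the bumpy road past the cat" example. All names/values are invented.

from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: tuple      # (x, y, z) in signing-space coordinates
    on_ground: bool = True  # world knowledge: cats sit and cars drive on the ground

class Scene:
    """Holds the entities an English text has introduced into a 3D scene."""
    def __init__(self):
        self.objects = {}

    def place(self, obj: SceneObject):
        self.objects[obj.name] = obj

    def path_past(self, mover: str, landmark: str, clearance=0.1):
        """Start/end points for a path that passes near a landmark object."""
        lx, ly, lz = self.objects[landmark].position
        # Begin before the landmark, end beyond it, offset by `clearance`.
        start = (lx - 0.3, ly, lz + clearance)
        end = (lx + 0.3, ly, lz + clearance)
        return start, end

scene = Scene()
scene.place(SceneObject("cat", (0.0, 0.0, 0.2)))
scene.place(SceneObject("car", (-0.3, 0.0, 0.3)))
start, end = scene.path_past("car", "cat")
```

The point of such a model is that spatial constraints (proximity to the cat, the ground plane) fall out of the scene geometry instead of being enumerated as morphemes or rules.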


Publication date: 2004