A Bilingual VOYAGER System

Authors

  • David Goodine
  • Michael S. Phillips
  • Shinsuke Sakai
  • Stephanie Seneff
  • Victor Zue
Abstract

This paper describes our initial efforts at porting the VOYAGER spoken language system to Japanese. In the process we have reorganized the structure of the system so that language-dependent information is separated from the core engine as much as possible. For example, this information is encoded in tabular or rule-based form for the natural language understanding and generation components. The internal system manager, discourse and dialogue component, and database are all maintained in language-transparent form. Once the generation component was ported, data were collected from 40 native speakers of Japanese using a wizard collection paradigm. A portion of these data was used to train the natural language and segment-based speech recognition components. The system obtained an overall understanding accuracy of 52% on the test data, which is similar to our earlier reported results for English [1].

INTRODUCTION

In the fall of 1989, our group first demonstrated VOYAGER, a system that can engage in verbal dialogues with users about a geographical region within Cambridge, Massachusetts [2]. The system can provide users with information about distances, travel times, or directions between objects located within this area (e.g., restaurants, hotels, post offices, subway stops), as well as information such as addresses or telephone numbers of the objects themselves. While VOYAGER is constrained both in its capabilities and domain of knowledge, it contains all the essential components of a spoken-language system, including discourse maintenance and language generation. The VOYAGER application provided us with our first experience with the development of spoken language systems, helped us understand the issues related to this endeavor, and provided a framework for our subsequent system development efforts [3, 4].

1 This research was supported by DARPA under Contract N00014-89-J-1332, monitored through the Office of Naval Research.
2 Currently a visiting scientist from NEC Corp, Kawasaki, Japan.
3 The authors are listed in alphabetical order.

Over the past few years, we have become increasingly interested in developing multilingual spoken language systems. There are several ongoing international spoken language translation projects whose goal is to enable humans to communicate with each other in their native tongues [5, 6]. Our objective, however, is somewhat different. Specifically, we are interested in developing multilingual human-computer interfaces, such that the information stored in the database can be accessed and received in multiple spoken languages. We believe that there is great utility in having such systems, since information is fast becoming globally accessible. Furthermore, we suspect that this type of multilingual system may be easier to develop than speech translation systems, since the system only needs to anticipate the diversity of one side of the conversation, i.e., the human side.

During the past year, we have begun to develop a multilingual version of VOYAGER. This paper will describe our work in extending VOYAGER's capability from English to Japanese. Since VOYAGER was originally designed only for English, a number of changes were necessary to accommodate multiple languages. In the next section, we describe our approach to developing multilingual systems, and the modifications made to the original system. A discussion of the specific implementation of the various components for Japanese will follow. Finally, performance evaluation of the Japanese VOYAGER system will be presented, followed by a brief description of future plans.

SYSTEM DESCRIPTION

Figure 1 shows a block diagram of a prototypical MIT spoken language system. The speech signal is converted to words using our SUMMIT segment-based speech recognition system [7].
Language understanding makes use of TINA, a probabilistic natural language system that interleaves syntactic and semantic information in the parse tree [8]. Data exchange between SUMMIT and TINA is currently achieved via an N-best interface, in which the recognizer produces the top-N sentence hypotheses, and TINA screens them for syntactic and semantic well-formedness within the domain [1]. The parse tree produced by TINA is subsequently converted to a semantic frame which is intended to capture the meaning of the input utterance in a language-independent form [4]. The semantic frame is passed to the system manager which uses it, along with contextual information stored in the discourse component, to access information stored in the database, and provide a response [2]. The VOYAGER application uses an object-oriented database, although we have also accessed data in SQL and other configurations [3]. Responses to the user consist of displays, text, and synthetic speech. The latter two are derived via a language generation component which generates noun phrases from the internal semantic representation and embeds them into context-dependent messages.

Figure 1: Schematic of prototypical MIT spoken-language system.

In order to develop a multilingual capability for our spoken language systems, we have adopted the approach that each component in the system be as language transparent as possible. In the VOYAGER system for instance, the system manager, discourse component, and the database are all structured so as to be independent of the input or output language. Where language-dependent information is required we have attempted to isolate it in the form of external tables or rules, as illustrated in Figure 1 for both the language understanding and generation components. As will be described in more detail in the next section, we trained a version of the basic SUMMIT system for both Japanese and English, using data recorded from native speakers for each language.
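
The flow described above (N-best screening, a language-independent semantic frame, and table-driven generation) can be illustrated with a minimal Python sketch. It is not the SUMMIT or TINA implementation: the table contents, the frame fields, the romanized Japanese strings, the address, and the function names `parse`, `understand`, and `respond` are all hypothetical, and a dictionary lookup stands in for real parsing.

```python
from typing import List, Optional

# Hypothetical language-dependent tables. In this sketch only these tables
# differ between languages; the functions below never name a language.
PARSE_TABLE = {
    "english": {"where is the library": {"clause": "locate", "object": "library"}},
    "japanese": {"toshokan wa doko desu ka": {"clause": "locate", "object": "library"}},
}
LEXICON = {
    "english": {"library": "library"},
    "japanese": {"library": "toshokan"},
}
TEMPLATES = {
    "english": {"locate": "The {object} is at {address}."},
    "japanese": {"locate": "{object} wa {address} ni arimasu."},
}

# Toy language-independent database (the address is invented).
DATABASE = {"library": "45 Example Street"}


def parse(sentence: str, language: str) -> Optional[dict]:
    """Return a semantic frame, or None if the sentence is not covered."""
    return PARSE_TABLE[language].get(sentence)


def understand(n_best: List[str], language: str) -> Optional[dict]:
    """Screen hypotheses in rank order and keep the first one that parses."""
    for hypothesis in n_best:
        frame = parse(hypothesis, language)
        if frame is not None:
            return frame
    return None


def respond(frame: dict, language: str) -> str:
    """Answer from the shared database, rendered with one language's tables."""
    address = DATABASE[frame["object"]]
    noun_phrase = LEXICON[language][frame["object"]]
    template = TEMPLATES[language][frame["clause"]]
    return template.format(object=noun_phrase, address=address)


# The top-ranked hypothesis contains a recognition error and is rejected.
n_best = ["where is the library e", "where is the library"]
frame = understand(n_best, "english")
print(respond(frame, "english"))
print(respond(frame, "japanese"))
```

Because the frame carries no language-specific content, the same frame can be rendered in either language, and adding a language in this sketch means adding table entries rather than changing `understand` or `respond`.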
The current user interface is very similar to that of the original VOYAGER system, except that a separate recording icon is used for each language. For text-to-speech synthesis we use a DECtalk system for English, and an NEC text-to-speech system for Japanese.

If we are to attain a multilingual capability within a single system framework, the task of porting to a new language should involve only adapting existing tables or models, without requiring any modification of the individual components. By incrementally porting the system to new languages we hope to slowly generalize the architecture of each component to achieve this result. The following sections provide more detailed descriptions of the work done in the different areas to achieve a bilingual status of VOYAGER.

JAPANESE IMPLEMENTATION

To allow VOYAGER to converse with a user in Japanese, the following steps were taken. We first converted the system so that it could generate responses in Japanese. This enabled us to collect data from native speakers of Japanese in a wizard mode whereby an experimenter would translate the subjects' spoken input and type the resulting English queries to the system [3, 9]. Once data were available we were able to port the speech recognition and language understanding components. In the process of augmenting the system components to handle Japanese, we made many changes to the system core structure, separating out the language-dependent aspects into external tables and rules.

Data Collection

One of the most time-consuming aspects of the porting process was the acquisition of appropriate user data capturing the many different ways users can ask questions within the VOYAGER domain. We started with translations from available English sentences, but these alone are not nearly adequate for closure on coverage of actual data.
Although in theory a grammar developer can use his/her innate knowledge of the language to write appropriate grammar rules, in practice such an approach falls far short of complete coverage of actual user utterances.

For data collection from Japanese subjects we recorded data from 40 native speakers, recruited from the general MIT community. In a manner similar to data collection techniques used for the ATIS domain [3], subjects were asked to solve four problem scenarios. At the end of the session subjects were also allowed to ask random questions of the system. The resulting corpus of 1426 utterances was partitioned into a 34-speaker training set and a 6-speaker test set which was subsequently used to evaluate system components.
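
A partition by speaker, as opposed to by utterance, keeps every test speaker unseen during training. The Python sketch below shows one way to build such a split. The text does not say how the six test speakers were chosen, so the seeded random selection, the function name `split_by_speaker`, and the synthetic corpus are assumptions made purely for illustration.

```python
import random


def split_by_speaker(utterances, n_test_speakers, seed=0):
    """Split (speaker_id, text) pairs so that no speaker is in both sets.

    Test speakers are picked at random with a fixed seed; this selection
    rule is an assumption, not the procedure used for the actual corpus.
    """
    speakers = sorted({speaker for speaker, _ in utterances})
    random.Random(seed).shuffle(speakers)
    test_speakers = set(speakers[:n_test_speakers])
    train = [(s, t) for s, t in utterances if s not in test_speakers]
    test = [(s, t) for s, t in utterances if s in test_speakers]
    return train, test


# Synthetic stand-in corpus: 40 speakers with 3 utterances each.
corpus = [(f"spk{i:02d}", f"utterance {j}") for i in range(40) for j in range(3)]
train, test = split_by_speaker(corpus, n_test_speakers=6)
print(len({s for s, _ in train}), "training speakers")
print(len({s for s, _ in test}), "test speakers")
```

Splitting at the speaker level means the utterance counts of the two sets depend on how much each speaker said, so only the speaker counts (34 and 6 here) are fixed in advance.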

Similar resources

Comparison of Information Retrieval Capabilities in Library Software of Payam, Voyager and Aleph

The purpose of this study was comparing Information Retrieval Capabilities in Web-based Library Software of Payam, with Voyager and ALEPH. A checklist designed and included six main trait for evaluation and comparing 73 scales. Data collected by experts' observing of the software's OPAC. Data analyzed by the descriptive statistics methods. Findings shows the preferences in search capabilities i...

Translation Quality Assessment of English Equivalents of Persian Proper Nouns: A case of bilingual tourist signposts in Isfahan

Abstract This study evaluated the translation quality of English equivalents of Persian proper nouns in the tourist signs and bilingual boards in Isfahan. To find different errors in the translations of the bilingual boards and tourist signs, the data were collected directly by taking picture or writing exactly from the available tourist signs and bilingual boards. Then, the errors were assesse...

Recent Progress on the VOYAGER System

Introduction The VOYAGER speech recognition system, which was described in some detail at the last DARPA meeting [9], is an urban exploration system which provides the user with help in locating various sites in the area of Cambridge, Massachusetts. The system has a limited database of objects such as banks, restaurants, and post offices and can provide information about these objects (e.g., ph...

Bilingual Education and Necessity to Differentiate Two Educational Challenges for Deaf Students

Background: Some obstacles and inefficiencies in deaf education system may be attributed to the fact that the right to education and equality of opportunities for national core curriculum, and the need for learning Farsi language are not met separately among deaf students. In fact, the distinction between these two educational challenges is not addressed to deaf pupils in particular. Based on t...

Publication date: 1993