Data and language documentation
Abstract
The topic of this chapter is the relationship between data and language documentation. Unlike in many fields of study, concerns regarding data collection and manipulation play a central role in our understanding of, and theorizing about, language documentation. In fact, the field to a large extent owes its existence to a shift in the goals of linguistic fieldwork, from a focus on outputs derived from primary data, like grammars and dictionaries, to the collection of the primary data itself.
When trying to understand the role of data in language documentation, the first question we must consider is what precisely we mean by data. Beginning with the work of Himmelmann (1998), it has become customary in language documentation to distinguish between primary data (recordings, notes on recordings, and transcriptions) and analytical resources (like descriptive grammars and dictionaries) constructed on the basis of, and via generalization over, primary data. While making this conceptual distinction is essential to the practice and theorizing of language documentation, most individuals or teams working on language documentation projects are ultimately interested in both collecting primary data and producing the kinds of analytical resources associated with traditional language description, most prominently grammars, dictionaries, and texts (whether oriented toward community or academic use). Therefore, both will be considered here. That is, the discussion will cover the collection, storage, and manipulation of primary data as well as the mobilization (see Holton (this volume)) of that data to create analytical resources. While it is also important to keep in mind that data is not synonymous with digital data, for the most part only digital data will be discussed in this chapter.
Digital, rather than analog, data has generally been the focus of work in language documentation, both because new data is now typically captured solely in digital form and because analog data is increasingly being digitized so that it can be manipulated and disseminated with digital tools. Discussion of important aspects of digitization (i.e., the process through which a digital representation of a non-digital object is created) can be found in the E-MELD School of Best Practices in Digital Language Documentation (Boynton et al. (2006)), and an exemplary case study of the digitization process can be found in Simons et al. (2007). This chapter will focus on conceptual issues rather than specific technical recommendations, though such recommendations may be discussed to provide illustrative examples. This is because our understanding of the conceptual issues evolves much more slowly than the technical recommendations, which change as the technologies we use for capturing and analyzing data change and, therefore, largely outpace the speed at which works like this one make their way into publication. At least for the time being, the best way to find answers to questions like What audio recording device should I use? or What
Publication date: 2010