Measures of dispersion for corpus data: an overview, a suggestion, and a research program II
Abstract
To adjust observed frequencies of occurrence, previous studies have suggested a variety of measures of dispersion and adjusted frequencies. In part I of this article, I first briefly reviewed many of these measures and their shortcomings and then suggested an alternative measure, DP (deviation of proportions), which I argued to be conceptually simpler yet more versatile than many competing measures. I then exemplified this measure on the basis of word frequency data and co-occurrence data of words and constructions/patterns. However, in spite of the advantages of DP and its relevance for virtually all corpus-linguistic work, dispersion is still a very under-researched topic: to the best of my knowledge, not a single study investigates how different measures compare to each other when applied to large datasets. The present article is therefore largely programmatic and exploratory: it sketches out a research program for the investigation of measures of dispersion and adjusted frequencies and takes some initial steps itself. More specifically, it addresses two issues: the integration of frequencies and dispersion measures, and the quantitative comparison of dispersion measures and adjusted frequencies. Finally, the paper makes available a few online resources that will hopefully stimulate more research in this central area of corpus-linguistic methodology and help other researchers go beyond the first programmatic steps taken here.
Similar resources
Measures of dispersion for corpus data: an overview, a suggestion, and a research program I
The most frequently used statistics in corpus linguistics are the frequency of occurrence of some linguistic variable and the frequency of co-occurrence of two or more linguistic variables. However, as has been pointed out correctly and repeatedly, frequencies of (co-)occurrence in isolation may sometimes be severely misleading given that they alone do not take into consideration the degree of d...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture
The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of 800 most recent pharmacy texts including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...
The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context
The present study was conducted to evaluate data driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature and translation. The purpose was to help language learners get familiar with DDL as a student-centered method taking advantage of a computer progr...
Publication date: 2007