Measures of dispersion for corpus data: an overview, a suggestion, and a research program II

Abstract

In order to adjust observed frequencies of occurrence, previous studies have suggested a variety of measures of dispersion and adjusted frequencies. In part I of this article, I first briefly reviewed many of these measures as well as a variety of their shortcomings and then suggested an alternative measure, DP (for deviation of proportions), which I argued to be conceptually simpler but at the same time more versatile than many competing measures. I then exemplified this measure on the basis of word frequency data and co-occurrence data of words and constructions/patterns. However, in spite of the advantages of DP and its relevance for virtually all corpus-linguistic work, dispersion is still very much an under-researched topic: to the best of my knowledge, there is not a single study investigating how different measures compare to each other when applied to large datasets. The present article is therefore largely programmatic and exploratory: it sketches a research program for the investigation of measures of dispersion and adjusted frequencies and takes some initial steps itself. More specifically, it addresses the integration of frequencies and dispersion measures and the quantitative comparison of dispersion measures and adjusted frequencies. Finally, it makes available a few online resources that will hopefully stimulate more research in this central area of corpus-linguistic methodology and help other researchers go beyond the first programmatic steps taken here.
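The abstract names DP but does not define it. The following is a minimal sketch that assumes the usual formulation of DP. Under that formulation, each corpus part has an expected share (its share of the corpus) and an observed share (its share of the item's occurrences), and DP is half the sum of the absolute differences between the two. For comparison, the sketch also computes Juilland's D and the adjusted frequency U = D × F, one older measure of the kind such comparisons involve. The function names `dp`, `juilland_d`, and `juilland_u` and the toy frequencies are hypothetical and for illustration only. This is not the author's code, and it is not one of the online resources the abstract mentions.

```python
from math import sqrt

# Hypothetical helper names; a sketch, not the author's implementation.


def dp(part_freqs, part_sizes):
    """Deviation of proportions (DP) for one item across corpus parts.

    part_freqs: occurrences of the item in each corpus part
    part_sizes: size of each corpus part (e.g., in words); parts may differ in size
    Returns 0.0 when the item is distributed exactly in proportion to the part
    sizes; values closer to 1 mean the item is concentrated in a few parts.
    """
    if len(part_freqs) != len(part_sizes):
        raise ValueError("need exactly one frequency per corpus part")
    total_freq = sum(part_freqs)
    corpus_size = sum(part_sizes)
    if total_freq == 0 or corpus_size == 0:
        raise ValueError("item frequency and corpus size must be positive")
    # Expected proportions: the share of the whole corpus each part makes up.
    expected = [size / corpus_size for size in part_sizes]
    # Observed proportions: the share of the item's occurrences found in each part.
    observed = [freq / total_freq for freq in part_freqs]
    # Half the sum of absolute differences between observed and expected shares.
    return sum(abs(o - e) for o, e in zip(observed, expected)) / 2


def juilland_d(part_freqs):
    """Juilland's D, assuming equally sized corpus parts: 1 - V / sqrt(n - 1)."""
    n = len(part_freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    mean = sum(part_freqs) / n
    if mean == 0:
        raise ValueError("item does not occur in the corpus")
    # Population standard deviation of the per-part frequencies.
    sd = sqrt(sum((freq - mean) ** 2 for freq in part_freqs) / n)
    variation = sd / mean  # coefficient of variation V
    return 1 - variation / sqrt(n - 1)


def juilland_u(part_freqs):
    """Juilland's usage coefficient U = D * F: frequency adjusted for dispersion."""
    return juilland_d(part_freqs) * sum(part_freqs)


if __name__ == "__main__":
    sizes = [1000] * 5            # five equally sized corpus parts
    even = [2, 2, 2, 2, 2]        # frequency 10, spread evenly
    clumped = [10, 0, 0, 0, 0]    # frequency 10, all in one part
    for label, freqs in (("even", even), ("clumped", clumped)):
        print(label, round(dp(freqs, sizes), 3),
              round(juilland_d(freqs), 3), round(juilland_u(freqs), 3))
    # even 0.0 1.0 10.0
    # clumped 0.8 0.0 0.0
```

Both toy items have the same raw frequency (10), yet DP separates them (0.0 vs. 0.8), and U reduces the clumped item's adjusted frequency to 0. DP takes the part sizes as input, so it does not require equally sized parts. The version of Juilland's D sketched here does assume equal sizes.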

Similar sources

Measures of dispersion for corpus data: an overview, a suggestion, and a research program I

The most frequently used statistics in corpus linguistics are the frequency of occurrence of some linguistic variable and the frequency of co-occurrence of two or more linguistic variables. However, as has been pointed out correctly and repeatedly, frequencies of (co-)occurrence in isolation may sometimes be severely misleading given that they alone do not take into consideration the degree of d...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture

The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL): a list of the most frequent words from a corpus of 3,458,445 tokens made up of the 800 most recent pharmacy texts, including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...

The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context

The present study was conducted to evaluate data-driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving the vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature, and translation. The purpose was to help language learners get familiar with DDL as a student-centered method taking advantage of a computer progr...

Publication date: 2007