Commentary/Chow: Statistical significance

Author

  • Robert W. Frick
Abstract

I disagree with several of Chow’s traditional descriptions and justifications of null hypothesis testing: (1) accepting the null hypothesis whenever p > .05; (2) random sampling from a population; (3) the frequentist interpretation of probability; (4) having the null hypothesis generate both a probability distribution and a complement of the desired conclusion; (5) assuming that researchers must fix their sample size before performing their study.

Critics of the null-hypothesis statistical-testing procedure (NHSTP) do not tend to criticize one another, despite differences in their positions. For example, NHSTP is criticized but power analyses are not, even though a power analysis assumes the existence of NHSTP. Researchers are advised to report effect size in statistical units such as Cohen’s d (e.g., Schmidt 1996) or to report confidence intervals (e.g., Loftus & Masson 1994), but they are not told to report confidence intervals for effect size reported in statistical units. Cohen (1994) criticized the underlying logic of NHSTP but then suggested that researchers report confidence intervals because that accomplished NHSTP for all possible null hypotheses. One might expect the defenders of NHSTP to ally, but this alliance too would be unnatural.

I agree with Chow that NHSTP plays an essential and irreplaceable role in science (Frick 1996). I agree with many of his points, especially that effect size is not relevant in the theory-corroboration experiment. However, I disagree with many of the justifications Chow provides for NHSTP. In this commentary, I will focus on ways that Chow is in a sense too traditional. In assessing NHSTP, the actual practice of researchers must be distinguished from the way it is described in textbooks and from the attempts to justify that practice logically. In each of the following criticisms, Chow has defended the traditional description or justification of NHSTP rather than the actual practice of researchers.
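The reporting practice the critics stop short of recommending, a confidence interval for an effect size expressed in statistical units, can be sketched as follows. This is a minimal illustration, assuming two independent groups, Cohen's d computed with the pooled standard deviation, and a common large-sample normal approximation to the variance of d, (n1 + n2)/(n1·n2) + d²/(2(n1 + n2)), rather than an exact interval. The function name `cohens_d_with_ci` and the example scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d_with_ci(group1, group2, confidence=0.95):
    """Return Cohen's d (pooled SD) and an approximate confidence interval for it.

    Assumes two independent groups; the interval uses a large-sample normal
    approximation to the sampling variance of d, not an exact interval.
    """
    n1, n2 = len(group1), len(group2)
    var1, var2 = stdev(group1) ** 2, stdev(group2) ** 2
    # Pooled standard deviation, weighting each group's variance by its df.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    d = (mean(group1) - mean(group2)) / pooled_sd
    # Approximate sampling variance of d for two independent groups.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(var_d)
    return d, (d - half_width, d + half_width)


# Hypothetical scores for two small groups (illustration only).
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
d, (low, high) = cohens_d_with_ci(treatment, control)
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With these made-up scores the script prints `d = 1.26, 95% CI [-0.09, 2.62]`: a large point estimate whose interval still includes zero, which is exactly the information a bare d would hide.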
First, Chow implies that the null hypothesis is accepted whenever p > .05. Good researchers do sometimes argue that their evidence supports a hypothesis of no effect or no difference, but they use more evidence than just p > .05 (e.g., Frick 1995).

Second, Chow uses random sampling from a population to justify the construction of the requisite probability distribution. This implies that researchers should sample randomly from populations and that the business of statistical testing is making claims about populations. I disagree. To make a claim about a pattern in the data, such as that one treatment is more effective than another, the researcher must address the possibility that this observed pattern occurred just by chance. As Chow notes, statistical testing accomplishes this, with p being a measure of the strength of the evidence against the just-by-chance hypothesis. The outcome of statistical testing and a lack of artifacts – which I call the finding – is a conclusion about the subjects tested. No assumption of random sampling is needed for this interpretation (Frick, in press b).

Third, Chow defends the frequentist interpretation of probability, in which probability is defined as the limiting ratio of an infinite sequence of trials. This definition confuses probabilities with the method of measuring probabilities. In other words, it is the operationalism Chow decries (p. 153). A propensity definition of probability better justifies the procedures of NHSTP (Frick, in press b).

Fourth, in the traditional justification of NHSTP, the null hypothesis plays two roles – it generates the probability distribution underlying the determination of p, and it is the complement of the researcher’s desired conclusion. These two roles are incompatible. To generate the probability distribution, a point hypothesis, for example, μ₁ = μ₂, is needed. However, the complement of this is μ₁ ≠ μ₂, which is not the claim researchers make and – as critics of NHSTP are fond of noting – not even a claim worth making. Researchers in practice make a directional claim, such as μ₁ < μ₂. To allow this claim, Chow describes the null hypothesis as being directional, for example, μ₁ ≥ μ₂. However, this leads Chow to the awkward position of primarily defending the use of a one-tailed test, which researchers rarely use. This definition also does not support the definition of p as the probability of achieving the observed results or larger given the null hypothesis. A solution is this: A point hypothesis is used to generate the probability distribution. Following the conventional rules of science, p < .05 allows rejection of this hypothesis, and it would also allow rejection of hypotheses that are even more discrepant from the observed data. Therefore, a directional conclusion can be made. This is exactly the process Chow describes (and Fisher before him), but it cannot be described with a single null hypothesis serving two roles.

Fifth, Chow equates NHSTP with the fixed-sample stopping rule, in which the number of subjects is determined in advance. Do researchers actually use the fixed-sample stopping rule? Do researchers never (a) give up partway through a study because the results were discouraging, (b) test fewer than the planned number of subjects because p was already less than .001, or (c) test more subjects than planned when p was slightly greater than .05? These actions seem rational to me, but they violate the fixed-sample stopping rule. Fortunately, the alternatives to the fixed-sample stopping rule – sequential stopping rules in which the number of subjects is not fixed in advance – are compatible with NHSTP. Because of their increased efficiency and practicality, sequential stopping rules should usually be preferred to the fixed-sample stopping rule (Frick, in press a).
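What is at stake in action (c) can be illustrated with a small simulation. It is a sketch under assumptions made here, not taken from the commentary: normally distributed scores with a known standard deviation of 1, a two-tailed z test of a zero mean, 20 planned subjects, and 10 extra subjects added whenever .05 ≤ p < .10, after which the ordinary fixed-sample test is simply rerun on the enlarged sample. Because the true mean is zero, every rejection is a Type I error. The helper names `z_test_p` and `simulate` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_test_p(sample, sigma=1.0):
    """Two-tailed p-value for H0: population mean = 0, with sigma assumed known."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_planned=20, n_extra=10, alpha=0.05, near_miss=0.10,
             reps=20000, seed=1):
    """Estimate Type I error rates when the null hypothesis is true.

    Returns (fixed_rate, extended_rate): the rejection rate with n fixed in
    advance, and the rate when extra subjects are added after a near miss and
    the fixed-sample test is rerun on the enlarged sample.
    """
    rng = random.Random(seed)
    fixed_rejections = 0
    extended_rejections = 0
    for _ in range(reps):
        # Scores drawn with a true mean of exactly 0.
        data = [rng.gauss(0.0, 1.0) for _ in range(n_planned)]
        p = z_test_p(data)
        if p < alpha:
            fixed_rejections += 1
            extended_rejections += 1
        elif p < near_miss:
            # Action (c): p was slightly above alpha, so test more subjects.
            data += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
            if z_test_p(data) < alpha:
                extended_rejections += 1
    return fixed_rejections / reps, extended_rejections / reps


fixed_rate, extended_rate = simulate()
print(f"fixed n: {fixed_rate:.3f}  extended after near miss: {extended_rate:.3f}")
```

The fixed-sample rate should come out near .05 and the extended rate somewhat higher. This is why a sequential stopping rule has to set its own decision criteria in advance rather than reuse fixed-sample cutoffs; the sketch does not implement any particular sequential rule, including those discussed in Frick (in press a).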

Similar resources

you’ve got a weak effect, do a meta-analysis

Statistical significance testing has its problems, but so do the alternatives that are proposed; and the alternatives may be both more cumbersome and less informative. Significance tests remain legitimate aspects of the rhetoric of scientific persuasion. I admit it: after more than 25 years of reading, writing, reviewing, and editing scientific research in psychology and related fields, I still...

Commentary on “Analysis of clinical data with breached blindness”

In their recent paper, Chow and Shao [1] proposed a method for analysing clinical data with breached blindness. As an incentive to their work they claimed that bias caused by the knowledge of the identity of the treatment can seriously distort statistical inference on the therapeutic effect. Thus they argued that when the integrity of blinding is doubtful, adjustments to statistical analyses sh...

What Statistical Significance Means

Sohn (1998) presents a good argument that neither statistical significance nor effect size is indicative of the replicability of research results. His objection to the Bayesian argument is also succinct. However, his solution of the ‘replicability belief’ issue is problematic, and his verdict that significance tests have no role to play in empirical research is debatable. The strengths and weak...

Issues in Statistical Inference

Being critical of using significance tests in empirical research, the Board of Scientific Affairs (BSA) of the American Psychological Association (APA) convened a task force "to elucidate some of the controversial issues surrounding applications of statistics including significance testing and its alternatives; alternative underlying models and data transformation; and newer methods made possib...

History and Philosophy of Psychology Bulletin Volume 15, No

Taking issue with Chow's (2002a) critique of Wilkinson and Task Force's (1999) report on statistical inference, Green (2002) raised several instructive issues, namely, (i) appealing to authority, (ii) theories for which there is no criterion of falsification, (iii) the distinction between experiment and meta-experiment, and (iv) the probability foundation of the null-hypothesis significance-tes...

Publication date: 1998