Too many digits: the presentation of numerical data
Abstract
As a statistical reviewer for Archives and BMJ I am interested in the presentation of numerical data. It concerns me that numbers are often reported to excessive precision, because too many digits can swamp the reader, overcomplicate the story and obscure the message.
A number’s precision relates to its decimal places or significant figures (or as preferred here, significant digits). The number of decimal places is the number of digits to the right of the decimal point, while the number of significant digits is the number of all digits ignoring the decimal point, and ignoring all leading zeros and some trailing zeros (for a fuller definition see ‘significant figures’ on Wikipedia).
Ideally data should be rounded appropriately, not too much and not too little (one might call it Goldilocks rounding). The European Association of Science Editors guidelines include the useful rule of thumb: “numbers should be given in (sic) 2–3 effective digits”. Take as an example the odds ratio (OR) of 22.68 (95% CI 7.51 to 73.67) comparing beta mimetics with placebo for side effects requiring a change of medication. Its two decimal places and four significant digits are excessive when the effect size and confidence interval (CI) are so large. Reporting it rounded to two significant digits, as 23 (7.5 to 74), or even as 23 (8 to 70), with one significant digit for the CI, would be simpler and clearer.
There are several published recommendations (or reporting rules) about rounding numbers, some of which relate to decimal places (eg, the Cochrane Style Guide or APA Style to round to two decimal places), some to significant digits (eg, the European Association of Science Editors guideline above) and some to a combination of the two (eg, setting the number of decimal places to ensure two significant digits for the standard deviation (SD)).
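The rounding of the odds ratio example above can be reproduced with a short Python sketch. The helper name `format_sig` is hypothetical and not part of the article; it simply rounds a value to a chosen number of significant digits and prints it with no surplus decimals.

```python
from math import floor, log10


def format_sig(x: float, sig: int = 2) -> str:
    """Format x rounded to `sig` significant digits (hypothetical helper)."""
    if x == 0:
        return "0"
    # Position of the leading digit: 1 for 22.68, 0 for 7.51, -2 for 0.03.
    exponent = floor(log10(abs(x)))
    rounded = round(x, sig - 1 - exponent)
    # Rounding can carry into a new leading digit (9.96 -> 10), so recompute.
    exponent = floor(log10(abs(rounded)))
    decimals = max(sig - 1 - exponent, 0)
    return f"{rounded:.{decimals}f}"


# Odds ratio example from the text: 22.68 (95% CI 7.51 to 73.67)
print(format_sig(22.68), format_sig(7.51), format_sig(73.67))        # 23 7.5 74
print(format_sig(22.68), format_sig(7.51, 1), format_sig(73.67, 1))  # 23 8 70
```

Python's built-in `round` works on binary floating point values and resolves exact ties to the nearest even digit, so values lying exactly on a rounding boundary may not follow the round-half-up convention.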
However, the message here is that rules of the first type, specifying the number of decimal places and ignoring the number of significant digits, are inherently unsatisfactory, as the following examples show.
Birth weight is usually reported in units of grams, for example, “birth weight ... resulting from blastocyst transfer was significantly greater than ... resulting from Day 3 transfer (3465.31±51.36 g vs 3319.82±10.04 g respectively, p=0.009)”. However, it is also reported in kilograms: “The mean birth weight of babies was 3.05±0.57 (95% CI 2.95 to 3.15) kg”. In both articles birth weight is reported to two decimal places, but because of the different units they correspond to six and three significant digits, respectively. The first is clearly excessive while the second is about right, giving the SD to two significant digits. By analogy, birth weight in grams ought to be rounded to the nearest 10 g.
A second example is the Cochrane Style Guide, which requires risk ratios to be reported to two decimal places. This is clearly unsatisfactory for ratios that are very large (see the example above) or very small, for example a hazard ratio (HR) of 0.03 (95% CI 0.01 to 0.05) for the updating of systematic review citations in Clinical Evidence versus Dynamed. If the direction of the HR were reversed, its true value could be anywhere between 29 and 40 because of the extreme rounding.
As a third example, it has been suggested that p values should be rounded to one or two decimal places. For p values above the conventional 0.05 cut-off there is little justification for quoting more than one decimal place, while for significant results three or even four decimals may be necessary. The better rule is to report p values rounded up to one significant digit, which works across the spectrum of values.
Thus a decimal places rule that ignores significant digits does not work. But equally, and perhaps surprisingly, a significant digits rule that ignores decimal places does not always work either.
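The claim about the reversed hazard ratio can be checked with a little arithmetic. A value reported as 0.03 to two decimal places could have been anywhere from 0.025 to 0.035 before rounding, so its reciprocal spans a wide range. The sketch below assumes ordinary rounding to the nearest reported unit; the function name `reciprocal_bounds` is hypothetical.

```python
def reciprocal_bounds(reported: float, decimals: int):
    """Range of 1/true_value when true_value was rounded to `decimals` places."""
    half_unit = 0.5 * 10 ** -decimals      # 0.005 for two decimal places
    lowest_true = reported - half_unit     # 0.025
    highest_true = reported + half_unit    # 0.035
    # Taking the reciprocal swaps the order of the bounds.
    return 1 / highest_true, 1 / lowest_true


low, high = reciprocal_bounds(0.03, 2)
print(round(low), round(high))  # 29 40
```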
Reporting risk ratios to three significant digits, for example, leads to the largest ratio below 1 being reported as 0.999 and the smallest above 1 as 1.01, with three and two decimal places, respectively. This is clearly unsatisfactory as they differ in precision by a factor of ten. In this instance a combination of significant digits and decimal places, the rule of four, works best: round the risk ratio to two significant digits if the leading non-zero digit is four or more, otherwise round to three. The rule of four gives three decimal places for risk ratios from 0.040 to 0.399, two from 0.40 to 3.99 and one from 4.0 to 39.9. Applying it to the example of 22.68 above gives 22.7 (95% CI 7.5 to 74). Alternatively, one can apply the rule with one fewer significant digit, giving 23 with CI 8 to 70.
Another example is the reporting of test statistics such as t or F. Specifying one decimal place would permit, say, t=30.1, where 30 is clearly sufficient as it is so highly significant. Conversely, specifying two significant digits would permit t=−0.13, where again the extra precision is irrelevant as it is far from significant. A suitable rule specifies up to one decimal place and up to two significant digits.
When comparing group means or percentages in tables, rounding should not blur the differences between them. This is the basis for the Hopkins two digits rule, whereby the mean has enough decimal places to ensure two significant digits for the SD. An analogous rule for percentages might be to use enough decimal places to ensure two significant digits for the range of values across groups, eg, if the range is 10% or more use whole numbers, if less than 1% use two decimal places, and otherwise one. In practice percentages are usually given along with their corresponding frequencies, so precision is less critical as the exact values can be calculated.
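The rule of four for risk ratios described above lends itself to a short Python sketch. The function name `rule_of_four` is hypothetical, and reading the leading digit from scientific notation is an implementation choice made here, not something the article prescribes.

```python
def rule_of_four(ratio: float) -> str:
    """Format a positive ratio using the rule of four (illustrative sketch)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Scientific notation exposes the leading digit and the power of ten,
    # e.g. 22.68 -> "2.268000e+01" and 0.04 -> "4.000000e-02".
    mantissa, exponent = f"{ratio:e}".split("e")
    leading_digit = int(mantissa[0])
    # Two significant digits if the leading digit is 4 or more, otherwise three.
    sig = 2 if leading_digit >= 4 else 3
    # Convert significant digits to decimal places; never go below zero,
    # so large ratios are simply shown as whole numbers.
    decimals = max(sig - 1 - int(exponent), 0)
    return f"{ratio:.{decimals}f}"


# Example from the text: 22.68 (95% CI 7.51 to 73.67)
print(rule_of_four(22.68), rule_of_four(7.51), rule_of_four(73.67))  # 22.7 7.5 74
# Boundaries of the ranges quoted in the text
print(rule_of_four(0.04), rule_of_four(0.399), rule_of_four(0.4),
      rule_of_four(3.99), rule_of_four(4.0), rule_of_four(39.9))
# 0.040 0.399 0.40 3.99 4.0 39.9
```

This sketch does not special-case values whose rounding carries across a boundary: 3.996, for instance, is formatted as 4.00 rather than 4.0.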
Recognising the fallibility of decimal places rules means that tables ought not to be restricted to columns of numbers with fixed decimal places, and this adds flexibility when deciding how many decimals to use. For example, measures of variability, eg, standard errors (SEs) or CIs, need not be as precise as the effect size, particularly if the CI is wide. A useful trick when formatting table columns is to align the numbers by decimal point, which highlights differences in the number of decimal places. This is particularly useful in columns of risk ratios or p values (see the examples in the table). It is important that any intermediate calculations are carried out to full precision.
Correspondence to Professor T J Cole, Population, Policy and Practice Programme, UCL Institute of Child Health, London WC1N 1EH, UK.
Volume: 100
Publication date: 2015