This paper argues that the sample sizes commonly used in clinical outcome research are insufficient to detect meaningful differences between treatments. Behavioral weight control is used to exemplify the problem. The sample sizes needed to detect differences of 5, 10, and 15 pounds between treatment conditions were computed from the attrition and the variability of treatment effects reported in the literature. The results show that sample sizes in behavioral weight control studies are usually too small to detect any but the largest differences between conditions. With typical sample sizes, a 10-pound difference between conditions at the end of treatment and a 15-pound difference at follow-up (effect size of 1.2-1.3) would be required to assure statistical significance. Recommendations are made for (a) greater attention to sample size calculation in study design, (b) attempts to reduce between-subject variability, and (c) consideration of relaxing standard criteria for statistical significance in exploratory studies.
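The kind of calculation described above can be sketched with the standard normal-approximation formula for comparing two group means, with the result inflated for expected dropout. This is an illustrative sketch only, not the authors' procedure: the function name `n_per_group`, the standard deviation of 12 pounds, and the 20% attrition rate are assumptions chosen for the example and are not taken from the paper.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, attrition=0.0):
    """Approximate subjects to enroll per group for a two-sided, two-sample comparison.

    delta     -- difference between conditions to detect (pounds)
    sd        -- assumed between-subject standard deviation (pounds)
    attrition -- assumed fraction of subjects lost before measurement
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    completers = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    # Enroll extra subjects so that enough remain after dropout.
    return math.ceil(completers / (1 - attrition))


# Hypothetical inputs: SD of 12 lb and 20% attrition are placeholders.
for delta in (5, 10, 15):
    print(delta, n_per_group(delta, sd=12, attrition=0.20))
```

Because the required sample size scales with the square of `sd / delta`, halving the detectable difference roughly quadruples the number of subjects needed, which is why small studies can detect only the largest differences.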
Bibliographical note
Funding Information:
Preparation of this manuscript was supported, in part, by Grant AM 29757-02 to Dr. Rena R. Wing from the National Institute of Arthritis, Metabolism and Digestive Diseases and, in part, by Grant AM 26542-03 to Dr. Robert W. Jeffery from the National Institute of Arthritis, Metabolism and Digestive Diseases. Requests for reprints should be sent to Rena R. Wing, Western Psychiatric Institute and Clinic, University of Pittsburgh School of Medicine, 3811 O'Hara Street, Pittsburgh, PA 15213.