Replication in Psychotherapy
I have argued in my book Cognitive Neuroscience and Psychotherapy: Network Principles for a Unified Theory that theories should not be expected to explain findings until those findings have been replicated. Makel, Plucker and Hegarty (2012) reported “… recent data indicate that replications are infrequently published and that when they are, they have success rates a little over 1%” (p. 305). Pashler and Wagenmakers (2012) published an introduction to a special issue concerning replication in which they characterized this inability to replicate psychological phenomena as a scientific crisis.
What I did not discuss in my book is what constitutes successful replication. Stanley and Spence (2014) addressed the question of how exact we can expect replications to be. More specifically, they asked “How should failed replication attempts be interpreted?” (p. 305). Their answer and mine is that replication should be evaluated using the quantitative literature review procedure known as meta-analysis. There are two types of meta-analysis: regular and psychometric. The regular form of meta-analysis has been correctly criticized because it combines low-quality studies with high-quality ones. This practice underestimates effect sizes and introduces illusory moderator variables that mislead readers into believing that relationships are both smaller and more complicated than they actually are. The psychometric form corrects for various psychometric and experimental design flaws, thereby avoiding this criticism.
All meta-analyses compute statistics and, as I have told the students in my statistics classes for decades, every statistic is always wrong. A main objective of statistical analysis is to determine how wrong specific statistics are. Confidence intervals are a statistical tool for quantifying the range of values that a particular statistic can be expected to take on 95% or 99% of the time given exact replication on a new random sample. As with everything in life, there is a tradeoff. The more confident one chooses to be, the less precise one can be. That is, a 99% confidence interval will be broader, covering a wider range of values, than a 95% confidence interval. Psychologists have generally agreed to compute 95% confidence intervals. I now discuss several factors that contribute to the width of confidence intervals, which is to say, reasons why exact replications should not be expected.
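The tradeoff between confidence and precision can be seen in a minimal sketch. It assumes a normal-approximation interval for a mean (with a sample this small a t-based interval would usually be preferred), and the scores are made up purely for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    centre = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% and 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical test scores, invented for this example only.
scores = [12, 15, 9, 14, 11, 13, 16, 10, 12, 14]

ci95 = mean_confidence_interval(scores, 0.95)
ci99 = mean_confidence_interval(scores, 0.99)
width95 = ci95[1] - ci95[0]
width99 = ci99[1] - ci99[0]

print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f} (width {width95:.2f})")
print(f"99% CI: {ci99[0]:.2f} to {ci99[1]:.2f} (width {width99:.2f})")
```

The 99% interval always contains the 95% interval computed from the same data, which is the price paid for the extra confidence.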
Reasons Why Replications are Not Exact
Sampling error arises because only a subset of all potential participants is actually studied; one can never study all of the people to whom one wishes to generalize the results. Ideally, participants are randomly sampled from the population of interest, but as we shall see below this rarely if ever happens. Computer simulation studies such as those presented by Stanley and Spence (2014), however, can sample randomly. They conclusively demonstrate that each sample generates a different value of the statistic in question, such as the mean. Replicating the study 1,000 times will yield 1,000 means, all with different values. The 95% confidence interval specifies the lower and upper values that bound 95% of these means. The width of this confidence interval shrinks in inverse proportion to the square root of the sample size. Hence, studies conducted on larger samples replicate better than studies conducted on smaller samples. Replication should be evaluated on the width of the combined meta-analytic confidence interval.
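A small simulation can make the role of sample size concrete. This is only a sketch in the spirit of the simulations described above, not a reproduction of Stanley and Spence's procedure. The population (normal, mean 100, standard deviation 15), the sample sizes, and the function name are all assumptions chosen for illustration.

```python
import random
from statistics import mean

def interval_width_of_means(sample_size, replications=1000, seed=0):
    """Width of the central 95% range of sample means across replications."""
    rng = random.Random(seed)
    # Each replication draws a fresh random sample and records its mean.
    means = sorted(
        mean(rng.gauss(100, 15) for _ in range(sample_size))
        for _ in range(replications)
    )
    lower = means[int(0.025 * replications)]
    upper = means[int(0.975 * replications) - 1]
    return upper - lower

width_25 = interval_width_of_means(25)
width_100 = interval_width_of_means(100)

print(f"n = 25:  95% of sample means span {width_25:.2f} points")
print(f"n = 100: 95% of sample means span {width_100:.2f} points")
print(f"ratio of widths: {width_25 / width_100:.2f}")
```

Quadrupling the sample size should roughly halve the width, because the standard error of the mean is the population standard deviation divided by the square root of the sample size.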
Participants in scientific psychological experiments are presumed to be randomly selected from the population to which the investigators wish to generalize, which generally means all people in the Western world or sometimes all people in the world. Psychologists sometimes like to think that their findings generalize to everyone. However, such generalizations are almost never warranted, because investigators almost never have access to a truly random sample of people from the population they wish to generalize to. If psychologists studied only random samples, the results of their studies would replicate more closely than they do. Because they do not, exact replications should not be expected.
There are several reasons why psychologists do not study random samples. To start with, participants have to volunteer. Not everyone in the population knows about the experiment and far fewer volunteer for scientific studies of any kind. Many participants in psychological experiments are college sophomores taking Introductory Psychology courses. A certain number of hours of participation in psychology experiments is often a course requirement. Consequently, many psychological studies are conducted with conscripted college sophomores. The results of these experiments are unrepresentative of the population at large because college sophomores are generally younger than average, more intelligent than average, and more affluent than average, among other differences. Experiments run on the Internet promise to select from a broader, more representative population, but not everyone in the world has a computer and an Internet connection. Once participants are fully informed about the study, some decline to participate. Once the study is underway, some participants drop out. The final sample is far from a random sample.
Scientific studies require that measurements be made. Psychologists frequently use tests. The psychometric reliability of these tests can vary substantially. Some tests are more reliable than others. Efforts to replicate findings that were obtained with more reliable measures by using less reliable tests are unlikely to succeed. Stanley and Spence (2014) conducted computer simulations that varied test reliability. Their results showed that possible correlations upon replication varied much more widely when test reliability was .70 rather than .90. These findings confirm the need to correct for test reliability when comparing studies.
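One way to see why reliability matters is the classical attenuation relationship from test theory, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is not Stanley and Spence's simulation. The true correlation of .50, the sample size, the number of replications, and the function names are all assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def observed_correlations(true_r, reliability, n=100, replications=500, seed=1):
    """Correlations from repeated studies in which both tests share one reliability."""
    rng = random.Random(seed)
    # With true-score variance 1, this error SD gives the requested reliability.
    error_sd = math.sqrt((1 - reliability) / reliability)
    results = []
    for _ in range(replications):
        xs, ys = [], []
        for _ in range(n):
            true_x = rng.gauss(0, 1)
            true_y = true_r * true_x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
            xs.append(true_x + rng.gauss(0, error_sd))
            ys.append(true_y + rng.gauss(0, error_sd))
        results.append(pearson_r(xs, ys))
    return results

rs_90 = observed_correlations(true_r=0.5, reliability=0.90)
rs_70 = observed_correlations(true_r=0.5, reliability=0.70)
mean_r_90 = sum(rs_90) / len(rs_90)
mean_r_70 = sum(rs_70) / len(rs_70)

print(f"reliability .90: mean observed r = {mean_r_90:.3f}, "
      f"range {min(rs_90):.3f} to {max(rs_90):.3f}")
print(f"reliability .70: mean observed r = {mean_r_70:.3f}, "
      f"range {min(rs_70):.3f} to {max(rs_70):.3f}")
```

Under these assumptions the average observed correlation should sit near .45 with reliability .90 and near .35 with reliability .70, even though the true correlation is .50 in both cases.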
Some tests are more valid than others. Efforts to replicate findings that were obtained with more valid measures by using less valid tests are unlikely to succeed. Hence, it is also important to correct for variation in test validity when comparing studies.
Experimental Design Issues
Investigators who choose to use a different experimental design than the one used in the study they are trying to replicate cannot expect to fully replicate the original results.
Dichotomization is problematic. For example, some investigators split their sample into high and low groups. They may give an anxiety test to a group of people and divide them at the median score into two groups of equal size labeled “high anxious” and “low anxious”. Then they compare the means of these two groups on a measure of interest, perhaps grade point average in this case. What they don’t realize is that such analyses are the same as correlating group status (1 = high anxious, 0 = low anxious) with grade point average. This choice of analytic method means that their results will not replicate the findings of investigators who simply correlate test anxiety scores with grade point average. The primary reason for this is that dichotomization loses information. Dichotomization essentially gives all members of the high anxious group the same test anxiety score by assigning a code of 1 to everyone in this group. This ignores all differences in their test anxiety scores. Likewise, dichotomization gives all members of the low anxious group the same test anxiety score by assigning a code of 0 to everyone in this group, again ignoring all differences among them. Discarding all of this information leads to a different result than that found by simply correlating test anxiety scores with grades. It is therefore important to correct for this study artifact when comparing studies.
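The information lost by a median split can be illustrated with simulated data. This is a minimal sketch, assuming normally distributed, standardized anxiety and grade scores with a hypothetical true correlation of -.40. None of these values come from a real study.

```python
import math
import random
from statistics import median

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)
true_r = -0.4  # hypothetical anxiety/grades correlation, for illustration only

anxiety, grades = [], []
for _ in range(20000):
    a = rng.gauss(0, 1)
    g = true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
    anxiety.append(a)
    grades.append(g)

# Median split: 1 = high anxious, 0 = low anxious.
cutoff = median(anxiety)
group = [1 if a > cutoff else 0 for a in anxiety]

r_continuous = pearson_r(anxiety, grades)
r_median_split = pearson_r(group, grades)

print(f"correlation using anxiety scores: {r_continuous:.3f}")
print(f"correlation using median split:   {r_median_split:.3f}")
print(f"ratio: {r_median_split / r_continuous:.3f}")
```

For normally distributed scores, a median split is expected to shrink the correlation to roughly 80% of its continuous value, so two investigators analysing the same phenomenon in these two ways should not expect matching results.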
Range restriction is another problem. Investigators at large state universities may have access to a large group of students who differ much more widely in test anxiety than do students at small private colleges. The greater range of anxiety in the state school sample makes for groups that contrast more than does the smaller range of test anxiety in the private school sample. It is important to control for range restriction in both the independent and dependent variables when comparing studies.
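The effect of range restriction can also be sketched with simulated data. The population correlation of -.50 and the cutoff that keeps only students within 0.8 standard deviations of the mean are assumptions chosen for illustration. The correction shown is the standard formula for direct range restriction (often labeled Thorndike's Case II), which the article does not name, so treat its use here as an illustrative choice.

```python
import math
import random
from statistics import pstdev

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)
true_r = -0.5  # hypothetical population correlation

anxiety, outcome = [], []
for _ in range(20000):
    a = rng.gauss(0, 1)
    anxiety.append(a)
    outcome.append(true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1))

# "Wide range" sample: everyone. "Narrow range" sample: anxiety near the mean only.
narrow = [(a, o) for a, o in zip(anxiety, outcome) if abs(a) < 0.8]
narrow_anxiety = [a for a, _ in narrow]
narrow_outcome = [o for _, o in narrow]

r_full = pearson_r(anxiety, outcome)
r_restricted = pearson_r(narrow_anxiety, narrow_outcome)

# Correction for direct range restriction: u is the ratio of unrestricted
# to restricted standard deviations on the selection variable.
u = pstdev(anxiety) / pstdev(narrow_anxiety)
r_corrected = r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))

print(f"full range:       r = {r_full:.3f}")
print(f"restricted range: r = {r_restricted:.3f}")
print(f"corrected:        r = {r_corrected:.3f}")
```

The restricted sample yields a visibly weaker correlation, and the correction moves it back toward the full-range value, which is why studies drawn from samples with different ranges should not be compared without adjustment.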
The regular type of meta-analysis is concerned with sample statistics rather than population parameters. It ignores the attenuating effects of low test reliability, low test validity, and study artifacts. Two common results occur. First, the resulting average effect size seriously underestimates the population value; the value that would have been reported had perfectly reliable and valid tests been used and no study artifacts occurred. This creates the false impression that effects are not as large as they truly are. Second, illusory moderator variables are introduced that create the false impression that the findings are more complicated than they really are. Psychometric meta-analysis is concerned with population parameters. It corrects for variations in test reliability, test validity, and study artifacts. Two common results occur. First, the resulting average effect size is often substantially greater than that reported by regular meta-analysis. Second, the illusory moderators identified by regular meta-analysis typically disappear when reliability, validity, and study artifacts are corrected.
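The contrast between the two approaches can be sketched in a few lines. This is a deliberately simplified illustration that corrects only for unreliability in the two measures. The four studies are invented, and weighting each corrected correlation by sample size times the product of the reliabilities is my understanding of the Hunter and Schmidt convention, so treat that as an assumption. A full psychometric meta-analysis also addresses validity, range restriction, dichotomization, and the variance of effect sizes.

```python
import math

# Hypothetical studies: (sample size, observed r, reliability of X, reliability of Y).
studies = [
    (50, 0.20, 0.70, 0.80),
    (120, 0.30, 0.90, 0.85),
    (80, 0.25, 0.75, 0.90),
    (200, 0.35, 0.90, 0.90),
]

# Regular ("bare-bones") estimate: sample-size-weighted mean of observed correlations.
total_n = sum(n for n, _, _, _ in studies)
mean_observed = sum(n * r for n, r, _, _ in studies) / total_n

# Psychometric estimate: correct each correlation for attenuation, then
# weight by sample size times the squared attenuation factor (rxx * ryy).
weighted_sum = 0.0
weight_total = 0.0
for n, r, rxx, ryy in studies:
    attenuation = math.sqrt(rxx * ryy)
    r_corrected = r / attenuation
    weight = n * attenuation ** 2
    weighted_sum += weight * r_corrected
    weight_total += weight
mean_corrected = weighted_sum / weight_total

print(f"mean observed correlation:  {mean_observed:.3f}")
print(f"mean corrected correlation: {mean_corrected:.3f}")
```

With these made-up inputs the corrected average is larger than the uncorrected one, which is the direction of difference the paragraph above describes.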
The regular form of meta-analysis obscures replicability by seriously underestimating effect size and by introducing illusory moderator variables. Hence, psychometric meta-analysis, pioneered by Hunter and Schmidt (1990), is the preferred method of establishing replicability.
About the Author
Warren W. Tryon received his undergraduate degree from Ohio Northern University in 1966. He was enrolled in the APA approved Doctoral Program in Clinical Psychology at Kent State University from 1966 – 1970. Upon graduation from Kent State, Dr. Tryon joined the Psychology Department faculty at Fordham University in 1970 as an Assistant Professor. He was promoted to Associate Professor in 1977 and to Full Professor in 1983. Licensed as a psychologist in New York State in 1973, he joined the National Register of Health Service Providers in Psychology in 1976, became a Diplomate in Clinical Psychology from the American Board of Professional Psychology (ABPP) in 1984, was promoted to Fellow of Division 12 (Clinical) of the American Psychological Association in 1994 and a fellow of the American Association of Applied and Preventive Psychology in 1996. Also in 1996 he became a Founder of the Assembly of Behavior Analysis and Therapy.
In 2003 he joined The Academy of Clinical Psychology. He was Director of Clinical Psychology Training from 1997 to 2003, and presently is in the third and final year of phased retirement. He will become Emeritus Professor of Psychology in May 2015 after 45 years of service to Fordham University. Dr. Tryon has published 179 titles, including 3 books, 22 chapters, and 140 articles in peer reviewed journals covering statistics, neuropsychology, and clinical psychology. He has reviewed manuscripts for 45 journals and book publishers and has authored 145 papers/posters that were presented at major scientific meetings. Dr. Tryon has mentored 87 doctoral dissertations to completion. This is a record number of completed dissertations at the Fordham University Graduate School of Arts and Sciences and likely elsewhere.
His academic lineage is as follows. His mentor was V. Edwin Bixenstein who studied with O. Hobart Mowrer at the University of Illinois who studied with Knight Dunlap at Johns Hopkins University who studied with Hugo Munsterberg at Harvard University who studied with Wilhelm Wundt at the University of Leipzig.
Cognitive Neuroscience and Psychotherapy: Network Principles for a Unified Theory is Dr. Tryon’s capstone publication. It is the product of more than a quarter of a century of scholarship. Additional material added after this book was printed is available at www.fordham.edu/psychology/tryon. This includes chapter supplements, a color version of Figure 5.6, and a thirteenth “Final Evaluation” chapter. He is on LinkedIn and Facebook. His email address is email@example.com.