The Data Complexity Barrier and the Surprising Importance of Rare Diseases
A funny thing happened on the way to “Precision Medicine”. It seems that many of the fundamental studies in the field of pharmacogenomics and personalized medicine are yielding irreproducible results. We find that we cannot depend on the data that we depend on. If you don’t believe me, consider these shocking headlines:
1) “Unreliable research: Trouble at the lab.” (1) In 2013, The Economist ran an article examining flawed biomedical research. The article referred to an NIH official who indicated that “researchers would find it hard to reproduce at least three-quarters of all published biomedical findings.” The article also described a study conducted at the pharmaceutical company Amgen, wherein 53 landmark studies were repeated. The Amgen scientists succeeded in reproducing the results of only 6 of the 53 studies. Another group, at Bayer HealthCare, repeated 67 studies. The Bayer group succeeded in reproducing the results of only one-fourth of the original studies.
2) “A decade of reversal: an analysis of 146 contradicted medical practices.” (2) The authors reviewed 363 journal articles, reexamining established standards of medical care. Among these articles were 146 manuscripts (40.2%) claiming that an existing standard of care had no clinical value.
3) “Cancer fight: unclear tests for new drug.” (3) This New York Times article examined whether a common test performed on breast cancer tissue (Her2) was repeatable. When patients who had tested positive for Her2 were retested, 20% of the original positive assays turned out to be negative (i.e., falsely positive on the initial test) (3).
4) “Why most published research findings are false.” (4) Modern scientists often search for small effect sizes, using a wide range of available analytic techniques and a flexible interpretation of outcome results. Under such conditions, the author argued, research conclusions are more likely to be false than true (4).
5) “Reproducibility crisis: Blame it on the antibodies.” (5) Biomarker developers are finding that they cannot rely on different batches of a reagent to react in a consistent manner from test to test. Hence, laboratory analytic methods developed using a controlled set of reagents may not have any diagnostic value when applied by other laboratories using different batches of the same reagents (5).
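The claim in item 4 can be made concrete with a little arithmetic. What follows is a minimal sketch, assuming the simplest (bias-free) form of the positive predictive value formula given in reference (4): PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is real, α is the false-positive rate, and 1 − β is the statistical power. The function name and the example parameter values are illustrative assumptions, not figures taken from this article.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Probability that a 'significant' finding is actually true.

    Bias-free form of the formula in reference (4):
        PPV = (1 - beta) * R / (R - beta * R + alpha)
    prior_odds (R): ratio of true to false relationships among those tested.
    alpha: false-positive rate of the statistical test.
    power: 1 - beta, the chance of detecting a relationship that is real.
    """
    if prior_odds < 0 or not (0 < alpha < 1) or not (0 < power <= 1):
        raise ValueError("prior_odds must be >= 0; alpha and power must be in (0, 1]")
    beta = 1.0 - power
    return (power * prior_odds) / (prior_odds - beta * prior_odds + alpha)


# Hypothetical scenarios, chosen only to illustrate the trend.
scenarios = [
    ("well-powered test of a plausible hypothesis", 1.0, 0.8),
    ("underpowered test of a long-shot hypothesis", 0.01, 0.2),
]
for label, prior_odds, power in scenarios:
    ppv = positive_predictive_value(prior_odds, alpha=0.05, power=power)
    print(f"{label}: PPV = {ppv:.3f}")
```

Under this formula, a finding is more likely true than false only when (1 − β)R exceeds α, which is why low-powered searches across many long-shot hypotheses tend to produce mostly false positives.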
Anyone who tries to stay current in biomedical research understands that much of the published literature is irreproducible (6), and that almost anything published today might be retracted tomorrow. This appalling truth applies to some of the most respected laboratories in the world (7), (8), (9), (10), (11), (12), (13). Those of us who have been involved in assessing the rate of progress in disease research are painfully aware of the numerous reports indicating a general slowdown in medical progress (14), (15), (16), (17), (18), (19), (20), (21).
For the optimists, it is tempting to assume that any problems that we may be experiencing today are par for the course, and temporary. It is the nature of science to stall for a while and lurch forwards in fits. Errors and retractions will always be with us so long as humans are involved in the scientific process.
For the pessimists, such as myself, there seems to be something going on that is really new and different; a game changer. This game changer is the “complexity barrier”, a term credited to Boris Beizer, who used it to describe the impossibility of managing increasingly complex software products (22). The “complexity barrier” applies equally well to biomedical research. Recent studies have shown that inherited behavior is not fully determined by the genetic sequence of DNA. There are many different elements that modify and control genetic expression, and these elements do not have simple functionality (23). When complex cells are perturbed from their normal, steady-state activities, the rules that define cellular behavior become complex, and impossible to predict (24).
Modern biomedical data is high-volume (e.g., gigabytes and larger), heterogeneous (i.e., derived from diverse sources), private (i.e., measured on human subjects), and multi-dimensional (e.g., containing thousands of different measurements for each data record). The complexities of handling such data are daunting. When we closely examine the kinds of systemic flaws that are cropping up in the recent biomedical literature, complexity always seems to play a supporting role. Here are a few examples:
1) Errors in sample selection, labeling, and measurement (25), (26), (27)
2) Misinterpretation of the data (28), (4), (29), (19), (30), (31), (32)
3) Data hiding and data obfuscation (33), (34)
4) Unverified and unvalidated data (35), (36), (37), (38), (30), (39)
5) Outright fraud (34), (40)
As biomedical data becomes increasingly complex, the complexity barrier will become impenetrable. It is ironic that just as we are coming to accept the limits of analyzing complex data, we are given President Obama’s gift of a new funding initiative in support of precision medicine (41). How can we use this money effectively when we know that human diseases reside on the far side of the complexity barrier?
As it happens, not all diseases are genetically complex. The rare genetic diseases of humans, with very few exceptions, involve a single mutation in a single gene. In the past decade, we have made remarkable advances in understanding the rare diseases. For example, the FDA approved a total of 44 drugs in 2014 (42). Of those 44 drugs, 21 (48%) were approved for the treatment of rare diseases, including: non-24-hour sleep-wake disorder, Morquio A syndrome, neurogenic orthostatic hypotension, generalized lipodystrophy, psoriatic arthritis, hemophilia B, multicentric Castleman’s disease, hemophilia A (two drugs), non-Hodgkin lymphoma, hereditary angioedema, leukemia (two drugs), multiple sclerosis, Gaucher disease, idiopathic pulmonary fibrosis (two drugs), gastric cancer, melanoma (two drugs), and ovarian cancer. In fact, most of the medical advances in the past two decades have occurred in the rare diseases, not the common diseases (43).
The rare diseases, being genetically simple, can be successfully treated with drugs targeted to a specific pathway. As it happens, all of the cancers that we can cure in an advanced stage of growth (i.e., with metastases) are rare cancers (43), (44): choriocarcinoma, acute lymphocytic leukemia of childhood, Burkitt lymphoma, Hodgkin lymphoma, acute promyelocytic leukemia, large follicular center cell (diffuse histiocytic) lymphoma, embryonal carcinoma of testis, hairy cell leukemia, and seminoma.
Too often, the rare diseases are casually dismissed as outliers, not representative of metabolic pathways that drive the common diseases. Bad decision. Historically, the most important advances in common diseases have come from studying the rare diseases (43). Pathways that are studied, understood, and treated in the rare cancers will likely apply to common (i.e., genetically complex) cancers that share some of the same pathogenic pathways. Hence, the rare diseases are not the exceptions to the general rules that apply to common diseases; the rare diseases are the exceptions upon which the general rules of common diseases are based (43).
In this new, highly funded era of precision medicine research, funding should flow to the rare diseases. If the common diseases are the genetic puzzles that modern medical researchers are mandated to solve, then the rare diseases are the pieces of the puzzles (43).
© 2015 Jules J. Berman
(1) Unreliable research: Trouble at the lab. The Economist October 19, 2013.
(2) Prasad V, Vandross A, Toomey C, Cheung M, Rho J, Quinn S, et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clin Proc 88:790-8, 2013.
(3) Kolata G. Cancer fight: unclear tests for new drug. The New York Times April 19, 2010.
(4) Ioannidis JP. Why most published research findings are false. PLoS Med 2:e124, 2005.
(5) Baker M. Reproducibility crisis: Blame it on the antibodies. Nature 521:274-276, 2015.
(6) Naik G. Scientists’ Elusive Goal: Reproducing Study Results. Wall Street Journal December 2, 2011.
(7) Zimmer C. A sharp rise in retractions prompts calls for reform. The New York Times April 16, 2012.
(8) Altman LK. Falsified data found in gene studies. The New York Times October 30, 1996.
(9) Weaver D, Albanese C, Costantini F, Baltimore D. Retraction: altered repertoire of endogenous immunoglobulin gene expression in transgenic mice containing a rearranged mu heavy chain gene. Cell 65:536 (inclusive), 1991.
(10) Chang K. Nobel winner in physiology retracts two papers. The New York Times September 23, 2010.
(11) Fourth paper retracted at Potti’s request. The Chronicle March 3, 2011.
(12) Whoriskey P. Doubts about Johns Hopkins research have gone unanswered, scientist says. The Washington Post March 11, 2013.
(13) Lin YY, Kiihl S, Suhail Y, Liu SY, Chou YH, Kuang Z, et al. Retraction: Functional dissection of lysine deacetylases reveals that HDAC1 and p300 regulate AMPK. Nature 482:251-255, retracted November, 2013.
(14) Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products. U.S. Department of Health and Human Services, Food and Drug Administration, 2004.
(15) Hurley D. Why Are So Few Blockbuster Drugs Invented Today? The New York Times November 13, 2014.
(16) Angell M. The Truth About the Drug Companies. The New York Review of Books Vol 51, July 15, 2004.
(17) Crossing the Quality Chasm: A New Health System for the 21st Century. Quality of Health Care in America Committee, editors. Institute of Medicine, Washington, DC, 2001.
(18) Wurtman RJ, Bettiker RL. The slowing of treatment discovery, 1965-1995. Nat Med 2:5-6, 1996.
(19) Ioannidis JP. Microarrays and molecular research: noise discovery? The Lancet 365:454-455, 2005.
(20) Weigelt B, Reis-Filho JS. Molecular profiling currently offers no more than tumour morphology and basic immunohistochemistry. Breast Cancer Research 12:S5, 2010.
(21) Personalised medicines: hopes and realities. The Royal Society, London, 2005. Available from: https://royalsociety.org/~/media/Royal_Society_Content/policy/publications/2005/9631.pdf, viewed Jan 1, 2015.
(22) Beizer B. Software Testing Techniques, 2nd edition. Van Nostrand Reinhold, Hoboken, NJ, 1990.
(23) Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics: ENCODE explained. Nature 489:52-55, 2012.
(24) Rosen JM, Jordan CT. The increasing complexity of the cancer stem cell paradigm. Science 324:1670-1673, 2009.
(25) Bandelt H, Salas A. Contamination and sample mix-up can best explain some patterns of mtDNA instabilities in buccal cells and oral squamous cell carcinoma. BMC Cancer 9:113, 2009.
(26) Knight J. Agony for researchers as mix-up forces retraction of ecstasy study. Nature 425:109, September 11, 2003.
(27) Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366:883-892, 2012.
(28) Ioannidis JP. Is molecular profiling ready for use in clinical decision making? The Oncologist 12:301-311, 2007.
(29) Ioannidis JP. Some main problems eroding the credibility and relevance of randomized trials. Bull NYU Hosp Jt Dis 66:135-139, 2008.
(30) Ioannidis JP, Panagiotou OA. Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses. JAMA 305:2200-2210, 2011.
(31) Ioannidis JP, Panagiotou OA. Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses. JAMA 305:2200-2210, 2011.
(32) Ioannidis JP. Excess significance bias in the literature on brain volume abnormalities. Arch Gen Psychiatry 68:773-780, 2011.
(33) Harris G. Diabetes drug maker hid test data, files indicate. The New York Times July 12, 2010.
(34) Berman JJ. Machiavelli’s Laboratory. Amazon Digital Services, Inc., 2010.
(35) Misconduct in science: an array of errors. The Economist September 10, 2011.
(36) Begley S. In cancer science, many ‘discoveries’ don’t hold up. Reuters Mar 28, 2012.
(37) Abu-Asab MS, Chaouchi M, Alesci S, Galli S, Laassri M, Cheema AK, et al. Biomarkers in the age of omics: time for a systems biology approach. OMICS 15:105-112, 2011.
(38) Moyer VA; on behalf of the U.S. Preventive Services Task Force. Screening for prostate cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med May 21, 2011.
(39) How science goes wrong. The Economist Oct 19, 2013.
(40) Shafer SL. Letter: To our readers. Anesthesia and Analgesia. February 20, 2009.
(41) Pear R. Obama to Request Research Funding for Treatments Tailored to Patients’ DNA. The New York Times January 24, 2015.
(42) Munos B. 2014 New Drug Approvals Hit 18-Year High. Forbes Jan 2, 2015. Available at: http://www.forbes.com/sites/bernardmunos/2015/01/02/the-fda-approvals-of-2014, viewed June 20, 2015.
(43) Berman JJ. Rare diseases and orphan drugs: Keys to Understanding and Treating Common Diseases. Academic Press, in press, 2014.
(44) Holland Frei Cancer Medicine. Kufe D, Pollock R, Weichselbaum R, Bast R, Gansler T, Holland J, Frei E, eds. BC Decker, Ontario, Canada, 2003.
About the Author
Jules Berman received two baccalaureate degrees from MIT, in Mathematics and in Earth and Planetary Sciences. He received his Ph.D. from Temple University and his M.D. from the University of Miami. He received post-doctoral training at NIH and residency training at George Washington University Medical Center. He is board certified in anatomic pathology and in cytopathology. He served as Chief of Anatomic Pathology, Surgical Pathology and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and the Johns Hopkins Medical Institutions.
In 1998, he became a Medical Officer at the U.S. National Cancer Institute and served as the Program Director for Pathology Informatics in the Institute’s Cancer Diagnosis Program. In 2006, Jules Berman was President of the Association for Pathology Informatics. In 2011 he received the Lifetime Achievement Award from the Association for Pathology Informatics. Today, Jules Berman is a free-lance writer. He has first-authored more than 100 articles and 13 book titles in science and medicine.
You can read more from Jules at his personal blog, Specified Life.