Meta-analyses suggest that the published literature represents only a small minority of the total data collected in biomedical research, with most becoming ‘dark data’ unreported in the literature. Dark data arise from publication bias toward novel results that confirm investigator hypotheses and from the omission of data that do not. Publication bias contributes to scientific irreproducibility and failures in bench-to-bedside translation. Sharing dark data by making them Findable, Accessible, Interoperable, and Reusable (FAIR) may reduce the burden of irreproducible science by increasing transparency, and may support data-driven discoveries beyond the lifecycle of the original study. We illustrate the feasibility of dark data sharing by recovering original raw data from the Multicenter Animal Spinal Cord Injury Study (MASCIS), an NIH-funded multi-site preclinical drug trial conducted in the 1990s that tested the efficacy of several therapies after spinal cord injury (SCI). The original drug treatments did not produce clear positive results, and MASCIS data were stored in boxes for more than two decades. The goal of the present study was to independently confirm published machine learning findings that perioperative blood pressure is a major predictor of SCI neuromotor outcome (Nielson et al., 2015). We recovered, digitized, and curated the data from 1125 rats from MASCIS. Analyses indicated that high perioperative blood pressure at the time of SCI is associated with poorer health and worse neuromotor outcomes in more severe SCI, whereas low perioperative blood pressure is associated with poorer health and worse neuromotor outcomes in moderate SCI. These findings confirm and expand prior results that a narrow window of blood-pressure control optimizes outcome, and demonstrate the value of recovering dark data for assessing reproducibility of findings with implications for precision therapeutic approaches.
Bibliographical note
Funding Information:
Supported by Wings for Life Foundation (ARF); NIH/NINDS: R01NS088475 (ARF); UG3/UH3NS106899 (ARF); JLN supported by NIH/NIMH: R01MH116156 and NIH/NCATS: UL1TR002494; Department of Veterans Affairs: 1I01RX002245 (ARF), I01RX002787 (ARF); and Craig H. Neilsen Foundation (ARF). Original MASCIS data collection was funded by NS032000 (WY). The authors would like to thank Hadi Askari, Sean P. O’Leary, and Patricia Morton for help with data recovery.
Despite limitations, our study shows that even legacy data from 25 years ago may yield important findings, which supports emerging standards that all NIH-funded research should follow FAIR data stewardship principles (Mueck; Wilkinson et al.). The first attempt to gather subject-level data from neurotrauma studies was VISION-SCI (Nielson et al.), but to our knowledge the present work represents the first targeted attempt to retrieve animal subject-level data at this scale. The MASCIS consortium was a large and expensive group, with a budget that exceeded $1 million annually between 1994 and 1996, and it used over 2000 animals in its experiments. Our inability to recover the original treatment conditions for rats from MASCIS is not unusual given the regulatory standards under which these data were collected. For the majority of grant-funded research, NIH historically mandated that data be maintained for 3–5 years after study completion (NIH Office of Extramural Research). We retrieved data for over 1000 animals at an estimated data recovery rate above 60%, and we consider the retrieval of this part of the dataset successful overall because it increased the retained value of the original investment. Additionally, as part of this paper we are adding these data to our previously recovered data from OSU in our public release of the MASCIS data, yielding a total of 1459 animal data records made FAIR through data archeology.
© 2021, The Author(s).
- Data science
- Motor recovery
- Spinal contusion
PubMed: MeSH publication types
- Journal Article