The launch of the US BRAIN and European Human Brain Projects coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, 'big science' efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data being generated daily and a scientific environment that is increasingly focused on collaboration. In this commentary, we consider the sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. We consider the utility of these data, the diversity of repositories and options available for sharing such data, and emerging best practices. We provide use cases in which aggregating and mining diverse long-tail data convert numerous small data sources into big data for improved knowledge about neuroscience-related disorders.
Bibliographical note
Funding Information:
Some funding bodies, such as the NIH, have successfully instituted targeted data-sharing requirements, requiring communities to deposit data in a shared repository as a condition of funding. Notable examples include the National Database on Autism Research (NDAR) and the Federal Interagency TBI Research (FITBIR) informatics system. These focused efforts have implemented standards and tools for tracking compliance and have sustained intramural support from the NIH, US Department of Defense Congressionally Directed Medical Research Program and the US Army Medical Research and Materiel Command, among others. Coupled with support mechanisms, this infrastructure provides a model for sustained long-tail data sharing.
The premise that neuroscience will benefit from routine and universal data sharing has been around since the early days of the Internet. Calls to develop shared data repositories similar to those developed for genomics and protein structure communities were instantiated through the US Human Brain Project in the early 1990s, funded by the US National Institutes of Health (NIH) [1]. Part of the motivation behind this was the idea that an understanding of the brain would require cooperative efforts to integrate information across scales and modalities [2], combining data generated with different techniques practiced across the various disciplines in neuroscience.
We thank the NIF staff, especially B. Ozyurt for his text mining expertise and tools that contributed substantially to Supplementary Table 1. The Neuroscience Information Framework is supported by a contract from the NIH Neuroscience Blueprint HHSN271200800035C via the National Institute on Drug Abuse. VISION-SCI is supported by NIH grants NS067092 (A.R.F.) and NS079030 (J.L.N.), and the Craig H. Neilsen Foundation (A.R.F.) and Wings for Life foundation (A.R.F.). This material is based on work supported while M.H.C. was serving at the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in
© 2014 Nature America, Inc.