Wrangling Galaxy's reference data

Daniel Blankenberg; James E. Johnson; James Taylor; Anton Nekrutenko

doi:10.1093/bioinformatics/btu119

Research Computing

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.

Original language	English (US)
Pages (from-to)	1917-1919
Number of pages	3
Journal	Bioinformatics
Volume	30
Issue number	13
DOIs	https://doi.org/10.1093/bioinformatics/btu119
State	Published - Jul 1 2014

Bibliographical note

Funding Information:
This work was supported through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133, HG004909 and HG006620 and NSF grant DBI 0543285. Additional funding is provided by Huck Institutes for the Life Sciences at Penn State and, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.

Cite this

@article{9649194ce67b416391e82db265150bb2,

title = "Wrangling Galaxy's reference data",

abstract = "Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.",

author = "Daniel Blankenberg and Johnson, {James E.} and James Taylor and Anton Nekrutenko",

note = "Funding Information: Funding: This work was supported through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133, HG004909 and HG006620 and NSF grant DBI 0543285. Additional funding is provided by Huck Institutes for the Life Sciences at Penn State and, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.",

year = "2014",

month = jul,

day = "1",

doi = "10.1093/bioinformatics/btu119",

language = "English (US)",

volume = "30",

pages = "1917--1919",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "13",

}

TY - JOUR

T1 - Wrangling Galaxy's reference data

AU - Blankenberg, Daniel

AU - Johnson, James E.

AU - Taylor, James

AU - Nekrutenko, Anton

N1 - Funding Information: Funding: This work was supported through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133, HG004909 and HG006620 and NSF grant DBI 0543285. Additional funding is provided by Huck Institutes for the Life Sciences at Penn State and, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.

PY - 2014/7/1

Y1 - 2014/7/1

N2 - Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.

AB - Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.

UR - http://www.scopus.com/inward/record.url?scp=84903703285&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903703285&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu119

DO - 10.1093/bioinformatics/btu119

M3 - Article

C2 - 24585771

AN - SCOPUS:84903703285

SN - 1367-4803

VL - 30

SP - 1917

EP - 1919

JO - Bioinformatics

JF - Bioinformatics

IS - 13

ER -
