Managing dbGaP data with Stratus, a research cloud for protected data

Evan F. Bollig, Graham T. Allan, Benjamin J. Lynch, Yectli Huerta, Mathew Mix, Brent Swartz, Edward A. Munsell, Joshua Leibfried, Naomi Hospodarsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Modern research computing needs at academic institutions are evolving. While traditional HPC has satisfied, and continues to satisfy, most workflows, a new generation of researchers has emerged looking for sophisticated, on-demand, and self-service control of compute infrastructure in a cloud-like environment. Furthermore, many also seek policy-compliant safe spaces to compute on sensitive or protected data. To cater to these modern users, the Minnesota Supercomputing Institute is deploying a cloud service for research computing called Stratus. In its initial iteration, Stratus is designed expressly to satisfy the requirements set forth by the NIH Genomic Data Sharing (GDS) Policy for data from the Database of Genotypes and Phenotypes (dbGaP) [8]. Stratus is powered by the Newton version of the OpenStack cloud platform and backed by Ceph storage. The subscription-based service is currently running in beta-test mode. In addition to data protection and compliance, the service offers three features not available on traditional HPC systems: a) on-demand availability of compute resources; b) long-running jobs (i.e., > 30 days); and c) container-based computing with Docker. This document surveys the design of Stratus with emphasis on security and compliance related to managing dbGaP data. Additionally, we highlight end-user workflows for processing large data in the presence of multi-tiered cloud storage (including a special "dbGaP Cache" for staged data).

Original language: English (US)
Title of host publication: PEARC 2017 - Practice and Experience in Advanced Research Computing 2017
Subtitle of host publication: Sustainability, Success and Impact
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450352727
State: Published - Jul 9 2017
Event: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017 - New Orleans, United States
Duration: Jul 9 2017 → Jul 13 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F128771

Other

Other: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017
Country: United States
City: New Orleans
Period: 7/9/17 → 7/13/17

Keywords

  • Ceph
  • Cloud Computing
  • dbGaP
  • Docker
  • OpenStack
  • Private Cloud
  • Protected Data
  • S3
