TY - GEN
T1 - A parallel input-output system for resolving spatial data challenges
T2 - ACM SIGSPATIAL 2nd International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2011
AU - Shook, Eric
AU - Wang, Shaowen
PY - 2011
Y1 - 2011
N2 - With recent advances in data collection technologies such as remote sensing and global positioning systems, the amount of spatial data being produced has been increasing at a staggering rate. Simultaneously, computing is shifting from single-core to multi-core processors. To effectively utilize the computational power afforded by this new generation of processors for serving data-intensive geospatial applications, parallel computing techniques need to be employed. Parallel computing, however, raises new challenges associated with handling the input and output of spatial data in parallel. This paper describes a Parallel Input/Output System (PIOS) to address challenges associated with handling large amounts of diverse spatial data. The PIOS is based on a hierarchical structure that uses a scalable file partitioning strategy and combines data and metadata to enable efficient handling of terabyte-scale data sets in parallel. A spatially explicit agent-based model is developed as a case study. Computational experiments were conducted on a supercomputer supported by the National Science Foundation. PIOS achieved a tenfold speedup in parallel input/output time, and was demonstrated to efficiently scale to over one thousand processing cores and handle multiple terabytes of data.
AB - With recent advances in data collection technologies such as remote sensing and global positioning systems, the amount of spatial data being produced has been increasing at a staggering rate. Simultaneously, computing is shifting from single-core to multi-core processors. To effectively utilize the computational power afforded by this new generation of processors for serving data-intensive geospatial applications, parallel computing techniques need to be employed. Parallel computing, however, raises new challenges associated with handling the input and output of spatial data in parallel. This paper describes a Parallel Input/Output System (PIOS) to address challenges associated with handling large amounts of diverse spatial data. The PIOS is based on a hierarchical structure that uses a scalable file partitioning strategy and combines data and metadata to enable efficient handling of terabyte-scale data sets in parallel. A spatially explicit agent-based model is developed as a case study. Computational experiments were conducted on a supercomputer supported by the National Science Foundation. PIOS achieved a tenfold speedup in parallel input/output time, and was demonstrated to efficiently scale to over one thousand processing cores and handle multiple terabytes of data.
UR - http://www.scopus.com/inward/record.url?scp=83455243210&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83455243210&partnerID=8YFLogxK
U2 - 10.1145/2070770.2070773
DO - 10.1145/2070770.2070773
M3 - Conference contribution
AN - SCOPUS:83455243210
SN - 9781450310406
T3 - Proceedings of the ACM SIGSPATIAL 2nd International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2011
SP - 18
EP - 25
BT - Proceedings of the ACM SIGSPATIAL 2nd International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2011
Y2 - 1 November 2011 through 1 November 2011
ER -