Given a collection of geo-distributed points, we aim to detect statistically significant clusters of varying shapes and densities. Spatial clustering has been widely used in many important societal applications, including public health and safety, transportation, and the environment. The problem is challenging because many application domains have a low tolerance for false positives (e.g., falsely claiming a crime cluster in a community can have serious negative impacts on the residents) and because clusters often have irregular shapes. In related work, the spatial scan statistic is a popular technique that can detect significant clusters, but it requires clusters to have certain predefined shapes (e.g., circles, rings). In contrast, density-based methods (e.g., DBSCAN) can find clusters of arbitrary shape efficiently but do not consider statistical significance, making them susceptible to spurious patterns. To address these limitations, we first propose a model of statistical significance for DBSCAN-based clustering. Then, we propose a baseline Monte Carlo method to estimate the significance of clusters and a Dual-Convergence algorithm to accelerate the computation. Experimental results show that significant DBSCAN is very effective in removing chance patterns and that the Dual-Convergence algorithm can greatly reduce execution time.
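To make the idea of Monte Carlo significance estimation for density-based clusters concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not specify the null model or the test statistic, so everything here is an assumption chosen for illustration: points under the null hypothesis are drawn uniformly from the unit square, the test statistic is the size of the largest DBSCAN cluster in each simulated dataset, and the function names (`dbscan`, `cluster_sizes`, `monte_carlo_p_values`) are hypothetical. The Dual-Convergence acceleration is not shown.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point (-1 means noise)."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # includes i itself
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def cluster_sizes(labels):
    """Map each cluster label (noise excluded) to its number of points."""
    sizes = {}
    for lab in labels:
        if lab >= 0:
            sizes[lab] = sizes.get(lab, 0) + 1
    return sizes


def monte_carlo_p_values(points, eps, min_pts, trials=99, seed=0):
    """Estimate a p-value per observed cluster.

    Assumed null model: the same number of points, uniform on the unit square.
    Assumed test statistic: size of the largest cluster in each simulation.
    """
    rng = random.Random(seed)
    n = len(points)
    observed = cluster_sizes(dbscan(points, eps, min_pts))
    null_max = []
    for _ in range(trials):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        sizes = cluster_sizes(dbscan(sim, eps, min_pts))
        null_max.append(max(sizes.values(), default=0))
    # Standard Monte Carlo p-value: rank of the observed statistic among nulls.
    return {lab: (1 + sum(m >= size for m in null_max)) / (1 + trials)
            for lab, size in observed.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    background = [(rng.random(), rng.random()) for _ in range(150)]
    hotspot = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(50)]
    for lab, p in monte_carlo_p_values(background + hotspot, 0.03, 5).items():
        print(f"cluster {lab}: p = {p:.2f}")
```

Under these assumptions, a cluster is kept only if its estimated p-value falls below a chosen significance level, so small clusters that arise by chance in uniform data tend to be filtered out. Each trial reruns DBSCAN on a full simulated dataset, which is why a baseline of this kind is costly and why an acceleration scheme is worthwhile.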
|Original language||English (US)|
|Title of host publication||Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019|
|Publisher||Association for Computing Machinery|
|Number of pages||10|
|State||Published - Aug 19 2019|
|Event||16th International Symposium on Spatial and Temporal Databases, SSTD 2019 - Vienna, Austria; Duration: Aug 19 2019 → Aug 21 2019|
|Name||ACM International Conference Proceeding Series|
|Conference||16th International Symposium on Spatial and Temporal Databases, SSTD 2019|
|Period||8/19/19 → 8/21/19|
Bibliographical note
Funding Information:
This work is supported by the US NSF under Grants No. 1737633, 1029711, IIS-1320580, 0940818, and IIS-1218168; the USDOD under Grant No. HM0210-13-1-0005; ARPA-E under Grant No. DE-AR0000795; the USDA under Grant No. 2017-51181-27222; the NIH under Grants No. UL1 TR002494, KL2 TR002492, and TL1 TR002493; and the OVPR U-Spatial and MSI at the University of Minnesota. We also thank Dr. Hans-Peter Kriegel for his encouragement in carrying out this research.
© 2019 Association for Computing Machinery.
- Statistical significance