Abstract
DBSCAN is a density-based clustering algorithm that is especially useful for finding clusters of arbitrary shape. Unlike other clustering techniques such as K-means, it does not require the number of clusters as an input parameter, and it is highly robust to outliers. However, DBSCAN has worst-case quadratic time complexity, which makes large datasets difficult to handle. To address this problem, several works have exploited the massive parallelism of GPUs for DBSCAN clustering. Nonetheless, these works have not been experimentally compared against each other. In this paper, we review the existing GPU algorithms for DBSCAN clustering and conduct the first experimental study comparing them on three real-world datasets to identify the best-performing algorithm. Our results show that CUDA-DClust is the best-performing GPU algorithm in terms of execution time and memory requirements.
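For readers unfamiliar with the algorithm, the properties named in the abstract can be illustrated with a minimal sequential sketch. This is not any of the GPU algorithms the paper evaluates, and the function names (`region_query`, `dbscan`) and the toy data are illustrative assumptions. It follows the textbook DBSCAN formulation with the usual parameters `eps` (neighborhood radius) and `min_pts` (density threshold), and uses a brute-force neighbor search, which is where the worst-case quadratic cost comes from.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1
UNVISITED = None

def region_query(points, i, eps):
    """Indices of all points within eps of points[i]; brute force, O(n) per call."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                frontier.extend(j_neighbors)
        cluster += 1
    return labels

# Toy data (made up for illustration): two dense squares and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in, and the isolated point is labeled as noise rather than forced into a cluster. Each call to `region_query` scans every point, so n calls cost O(n²) distance computations in total. This neighbor search is a natural candidate for parallelization.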
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3701-3710 |
Number of pages | 10 |
ISBN (Electronic) | 9781728108582 |
State | Published - Dec 2019 |
Event | 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States; Duration: Dec 9 2019 → Dec 12 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Conference
Conference | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Country/Territory | United States |
City | Los Angeles |
Period | 12/9/19 → 12/12/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- DBSCAN
- GPU
- clustering
- parallel computing