TY - JOUR
T1 - A K-main routes approach to spatial network activity summarization
AU - Oliver, Dev
AU - Shekhar, Shashi
AU - Kang, James M.
AU - Laubscher, Renee
AU - Carlan, Veronica
AU - Bannur, Abdussalam
PY - 2014/6
Y1 - 2014/6
N2 - Data summarization is an important concept in data mining for finding a compact representation of a dataset. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), and the goal is to find \(k\) shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways and train tracks. SNAS is computationally challenging because of the large number of \(k\)-subsets of shortest paths in a spatial network. Previous work has focused on either geometry-based or subgraph-based approaches (e.g., only one path) and cannot summarize activities using multiple paths. This paper proposes a K-Main Routes (KMR) approach that discovers \(k\) shortest paths to summarize activities. KMR generalizes K-means for network space but uses shortest paths instead of ellipses to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. We present a case study comparing KMR's network-based output (i.e., shortest paths) to geometry-based outputs (e.g., ellipses) on pedestrian fatality data. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage.
AB - Data summarization is an important concept in data mining for finding a compact representation of a dataset. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), and the goal is to find \(k\) shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways and train tracks. SNAS is computationally challenging because of the large number of \(k\)-subsets of shortest paths in a spatial network. Previous work has focused on either geometry-based or subgraph-based approaches (e.g., only one path) and cannot summarize activities using multiple paths. This paper proposes a K-Main Routes (KMR) approach that discovers \(k\) shortest paths to summarize activities. KMR generalizes K-means for network space but uses shortest paths instead of ellipses to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. We present a case study comparing KMR's network-based output (i.e., shortest paths) to geometry-based outputs (e.g., ellipses) on pedestrian fatality data. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage.
KW - activity summarization
KW - hot routes
KW - hot spots
KW - partitioning
KW - spatial network
UR - http://www.scopus.com/inward/record.url?scp=84902176387&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902176387&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2013.135
DO - 10.1109/TKDE.2013.135
M3 - Article
AN - SCOPUS:84902176387
SN - 1041-4347
VL - 26
SP - 1464
EP - 1478
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 6
M1 - 6574853
ER -