This paper presents GeoTrend+, a system approach to support scalable local trend discovery on recent microblogs, e.g., tweets, comments, online reviews, and check-ins, that arrive in real time. GeoTrend+ discovers the top-k trending keywords in arbitrary spatial regions from recent microblogs that arrive continuously at high rates, a significant portion of which have uncertain geolocations. GeoTrend+ distinguishes itself from existing techniques in several aspects: (1) Discovering trends in arbitrary spatial regions, e.g., city blocks. (2) Considering both exact geolocations, e.g., accurate latitude/longitude coordinates, and uncertain geolocations, e.g., district-level or city-level, that represent a significant portion of past years' microblogs. (3) Promoting recent microblogs as first-class citizens and optimizing different components to digest a continuous flow of fast data in main memory while removing old data efficiently. (4) Providing various main-memory optimization techniques that distinguish useful from useless data to effectively utilize tight memory resources while maintaining accurate query results on relatively large amounts of data. (5) Supporting various trending measures that effectively capture trending items under a variety of definitions that suit different applications. GeoTrend+ limits its scope to real-time data posted during the last T time units. To support its queries efficiently, GeoTrend+ employs an in-memory spatial index that efficiently digests incoming data and expires data older than the last T time units. The index also materializes top-k keywords in different spatial regions so that incoming queries can be processed with low latency. In peak times, the main-memory optimization techniques are employed to shed less important data and sustain high query accuracy with limited memory resources.
Experimental results based on real data and queries show that GeoTrend+ scales to support high arrival rates with low query response time, and achieves at least 90% query accuracy even under limited memory resources.
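To make the windowed index idea concrete, the following is a minimal sketch, not the GeoTrend+ implementation. It assumes a uniform grid in place of the paper's spatial index, plain keyword frequency as the trending measure, and exact point locations only; the class name `GridTrendIndex` and all of its parameters are hypothetical. It keeps per-cell keyword counts for microblogs from the last T time units and answers top-k queries at cell granularity.

```python
from collections import Counter, deque


class GridTrendIndex:
    """Toy uniform-grid index (an assumption, not the paper's index structure)."""

    def __init__(self, cell_size, window):
        self.cell_size = cell_size  # side length of a grid cell, in degrees
        self.window = window        # T: how long a microblog counts as recent
        self.counts = {}            # cell -> Counter of keyword frequencies
        self.log = deque()          # (timestamp, cell, keywords) in arrival order

    def _cell(self, lat, lon):
        # Floor division maps negative coordinates to the correct cell too.
        return (int(lat // self.cell_size), int(lon // self.cell_size))

    def insert(self, timestamp, lat, lon, keywords):
        """Digest one microblog; timestamps are assumed to be non-decreasing."""
        cell = self._cell(lat, lon)
        self.counts.setdefault(cell, Counter()).update(keywords)
        self.log.append((timestamp, cell, tuple(keywords)))
        self.expire(timestamp)

    def expire(self, now):
        """Remove microblogs that fall outside the last T time units."""
        while self.log and self.log[0][0] <= now - self.window:
            _, cell, keywords = self.log.popleft()
            counter = self.counts[cell]
            counter.subtract(keywords)
            for kw in set(keywords):
                if counter[kw] <= 0:
                    del counter[kw]
            if not counter:
                del self.counts[cell]

    def top_k(self, min_lat, min_lon, max_lat, max_lon, k):
        """Most frequent keywords in the cells overlapping the query rectangle."""
        lo = self._cell(min_lat, min_lon)
        hi = self._cell(max_lat, max_lon)
        total = Counter()
        for cell, counter in self.counts.items():
            if lo[0] <= cell[0] <= hi[0] and lo[1] <= cell[1] <= hi[1]:
                total.update(counter)
        return total.most_common(k)


index = GridTrendIndex(cell_size=1.0, window=10)
index.insert(0, 44.9, -93.2, ["snow", "traffic"])
index.insert(5, 44.5, -93.5, ["snow"])
print(index.top_k(44.0, -94.0, 45.0, -93.0, k=2))
```

Because the sketch aggregates whole cells, a query rectangle that only partly covers a cell still sees that cell's full counts. It also recomputes the top-k on every query, whereas the abstract describes materializing top-k keywords per region to keep query latency low.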
Bibliographical note
Funding Information:
Amr Magdy acknowledges the support of the National Science Foundation under Grant Numbers IIS-1849971, SES-1831615, and CNS-1837577. Mohamed Mokbel acknowledges the support of the National Science Foundation under Grant Numbers IIS-1525953, CNS-1512877, and IIS-1907855. Walid Aref acknowledges the support of the National Science Foundation under Grant Numbers III-1815796 and IIS-1910216.
© 2019, Springer Science+Business Media, LLC, part of Springer Nature.
- Adaptive memory optimization
- Query processing
- Uncertain location