TY - GEN
T1 - Exploiting geo-tagged tweets to understand localized language diversity
AU - Magdy, Amr
AU - Ghanem, Thanaa M.
AU - Musleh, Mashaal
AU - Mokbel, Mohamed F
PY - 2014
Y1 - 2014
N2 - Social media services are among the fastest-growing online communities of the last few years. Among them, Twitter has become the de facto microblogging service, with millions of tweets posted every day. In this paper, we present an analytical study of localized language usage and diversity in Twitter data using half a billion geotagged tweets. We first identify local Twitter communities at the country level. For the identified communities, we examine (1) language diversity, (2) language dominance within each community and how it differs between local and global views, (3) how representative tweet demographics are of real population demographics, and (4) the spatial distribution of different cultural groups within each country. To this end, we group the tweets at two levels. First, we group tweets by country to identify the local communities. Second, we group tweets within each local community by tweet language. Our study yields useful insights into language usage on Twitter that provide important information for language-based applications built on Twitter data, e.g., linguistic analysis and disaster management. In addition, we present an interactive exploration tool for the spatial distribution of cultural groups, which provides low-effort, high-precision localization of different cultural groups inside a given country.
AB - Social media services are among the fastest-growing online communities of the last few years. Among them, Twitter has become the de facto microblogging service, with millions of tweets posted every day. In this paper, we present an analytical study of localized language usage and diversity in Twitter data using half a billion geotagged tweets. We first identify local Twitter communities at the country level. For the identified communities, we examine (1) language diversity, (2) language dominance within each community and how it differs between local and global views, (3) how representative tweet demographics are of real population demographics, and (4) the spatial distribution of different cultural groups within each country. To this end, we group the tweets at two levels. First, we group tweets by country to identify the local communities. Second, we group tweets within each local community by tweet language. Our study yields useful insights into language usage on Twitter that provide important information for language-based applications built on Twitter data, e.g., linguistic analysis and disaster management. In addition, we present an interactive exploration tool for the spatial distribution of cultural groups, which provides low-effort, high-precision localization of different cultural groups inside a given country.
UR - http://www.scopus.com/inward/record.url?scp=84907016474&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907016474&partnerID=8YFLogxK
U2 - 10.1145/2619112.2619114
DO - 10.1145/2619112.2619114
M3 - Conference contribution
AN - SCOPUS:84907016474
SN - 9781450329781
T3 - 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014
SP - 7
EP - 12
BT - 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014
PB - Association for Computing Machinery
T2 - 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014
Y2 - 27 June 2014 through 27 June 2014
ER -