RegRocket: Scalable multinomial autologistic regression with unordered categorical variables using Markov logic networks

Ibrahim Sabek, Mashaal Musleh, Mohamed F. Mokbel

Research output: Contribution to journal › Article › peer-review

Abstract

Autologistic regression is one of the most popular statistical tools for predicting spatial phenomena in several applications, including epidemic disease detection, species occurrence prediction, earth observation, and business management. In general, autologistic regression divides the space into a two-dimensional grid, and the prediction is performed at each cell in the grid. The prediction at any location is based on a set of predictors (i.e., features) at this location and on the predictions from neighboring locations. In this article, we address the problem of building efficient autologistic models with multinomial (i.e., categorical) prediction and predictor variables, where the categories represented by these variables are unordered. Unfortunately, existing methods for building autologistic models are designed for binary variables and are also computationally expensive (i.e., they do not scale to large grid data such as fine-grained satellite images). Therefore, we introduce RegRocket: a scalable framework to build multinomial autologistic models for predicting large-scale spatial phenomena. RegRocket considers both accuracy and efficiency when learning the regression model parameters. To this end, RegRocket is built on top of Markov Logic Networks (MLN), a scalable statistical learning framework, with internals and data structures optimized to process spatial data. RegRocket provides an equivalent representation of the multinomial prediction and predictor variables using MLN, in which the dependencies between these variables are transformed into first-order logic predicates. Then, RegRocket employs an efficient framework that learns the model parameters from the MLN representation in a distributed manner. Extensive experimental results based on two large real datasets show that RegRocket can build multinomial autologistic models, in minutes, for 1 million grid cells with an average F1-score of 0.85.
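To make the grid-based setup described in the abstract concrete, the following is a minimal sketch of how a multinomial autologistic prediction at a single cell could be computed from that cell's predictor and the labels of its neighbors. It is not RegRocket's implementation and does not use MLN. The function names, the 4-connected neighborhood, the softmax form, and the parameters `beta` (per-category predictor weights) and `eta` (neighbor agreement weight) are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Grid = List[List[str]]


def neighbors(grid: Grid, row: int, col: int) -> List[str]:
    """Return the labels of the 4-connected neighbors inside the grid."""
    labels = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            labels.append(grid[r][c])
    return labels


def cell_probabilities(
    grid: Grid,
    row: int,
    col: int,
    predictor: str,
    categories: Sequence[str],
    beta: Dict[Tuple[str, str], float],
    eta: float,
) -> Dict[str, float]:
    """Illustrative conditional distribution over unordered categories.

    The score of category k is beta[(k, predictor)] (0.0 if absent) plus
    eta times the number of neighbors currently labeled k. Scores are
    turned into probabilities with a softmax. This form is an assumption.
    """
    neigh = neighbors(grid, row, col)
    scores = {}
    for k in categories:
        agreeing = sum(1 for label in neigh if label == k)
        scores[k] = beta.get((k, predictor), 0.0) + eta * agreeing
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


# Hypothetical 3x3 land-cover grid and made-up weights.
grid = [
    ["water", "water", "urban"],
    ["water", "forest", "urban"],
    ["forest", "forest", "urban"],
]
probs = cell_probabilities(
    grid, 1, 1,
    predictor="wet",
    categories=["water", "forest", "urban"],
    beta={("water", "wet"): 1.0},
    eta=0.5,
)
```

In this toy setup the center cell has two "water" neighbors and a predictor value that favors "water", so "water" receives the largest probability. A full model would learn `beta` and `eta` from data rather than fix them by hand.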

Original language: English (US)
Article number: 27
Journal: ACM Transactions on Spatial Algorithms and Systems
Volume: 5
Issue number: 4
DOIs: https://doi.org/10.1145/3366459
State: Published - Dec 2019

Bibliographical note

Funding Information:
This work is partially supported by the National Science Foundation, USA, under Grants IIS-1907855, IIS-1525953, and CNS-1512877. Authors’ addresses: I. Sabek and M. Musleh, University of Minnesota, Department of Computer Science and Engineering, 4-192 KHKH Building, 200 Union Street SE, Minneapolis, MN 55455, USA; emails: {sabek001, musle005}@umn.edu; M. F. Mokbel, Qatar Computing Research Institute, B103, Hamad Bin Khalifa Research Complex B1, Doha, Qatar; email: mmokbel@hbku.edu.qa. https://doi.org/10.1145/3366459

Publisher Copyright:
© 2019 Association for Computing Machinery.

Keywords

  • Autologistic models
  • Factor graph
  • First-order logic
  • Markov logic networks
  • Multinomial spatial regression
