DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2017.154
Predictive modeling of nested geospatial data is a challenging problem as the models must take into account potential interactions among variables defined at different spatial scales. These cross-scale interactions, as they are commonly known, are particularly important to understand relationships among ecological properties at macroscales. In this paper, we present a novel, multi-level multi-task learning framework for modeling nested geospatial data in the lake ecology domain. Specifically, we consider region-specific models to predict lake water quality from multi-scaled factors. Our framework enables distinct models to be developed for each region using both its local and regional information. The framework also allows information to be shared among the region-specific models through their common set of latent factors. Such information sharing helps to create more robust models especially for regions with limited or no training data. In addition, the framework can automatically determine cross-scale interactions between the regional variables and the local variables that are nested within them. Our experimental results show that the proposed framework outperforms all the baseline methods in at least 64% of the regions for 3 out of 4 lake water quality datasets evaluated in this study. Furthermore, the latent factors can be clustered to obtain a new set of regions that is more aligned with the response variables than the original regions that were defined a priori from the ecology domain.