Structured estimation in high dimensions: Applications in climate

André R. Goncalves; Arindam Banerjee; Vidyashankar Sivakumar; Soumyadeep Chatterjee

doi:10.4324/9781315371740

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Scopus citation

Abstract

One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional with each climate variable in each spatial grid or time period denoting a separate dimension. In fact, in many climate problems, the dimensionality, that is, the number of possible features or factors potentially affecting a response variable, is usually much larger than the number of samples that are typically reanalysis data sets over the past few decades. For example, in one of the problems considered in this chapter, one wants to predict climate variables like monthly temperature, precipitable water, etc. over land locations using information from six climate variables over oceans. We formulate it as a regression problem with the climate variable over a land location as the response variable. We consider 439 locations on oceans, so that there are a total of 6 × 439 = 2634 covariates in our regression problem. The data are the monthly means of the climate variables for 1948-2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], so that we have a total of 60 × 12 = 720 data samples.
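The arithmetic in the abstract can be checked with a short sketch. It only reproduces the counts stated above (6 variables, 439 ocean locations, monthly data for 1948-2007). The constant names, the `column_index` helper, and the variable-major column ordering are assumptions made for illustration and are not taken from the chapter.

```python
# Illustrative sketch only: all names here are hypothetical, not from the chapter.
N_OCEAN_VARIABLES = 6      # climate variables over oceans (stated in the abstract)
N_OCEAN_LOCATIONS = 439    # ocean locations (stated in the abstract)
FIRST_YEAR, LAST_YEAR = 1948, 2007
MONTHS_PER_YEAR = 12


def column_index(variable: int, location: int) -> int:
    """Map a (variable, location) pair to one column of the design matrix.

    Variable-major ordering is an assumption of this sketch.
    """
    if not 0 <= variable < N_OCEAN_VARIABLES:
        raise ValueError("variable out of range")
    if not 0 <= location < N_OCEAN_LOCATIONS:
        raise ValueError("location out of range")
    return variable * N_OCEAN_LOCATIONS + location


# Each (variable, location) pair is a separate covariate.
n_covariates = N_OCEAN_VARIABLES * N_OCEAN_LOCATIONS

# One sample per month; the year range is inclusive at both ends.
n_samples = (LAST_YEAR - FIRST_YEAR + 1) * MONTHS_PER_YEAR

print(n_covariates, n_samples)   # 2634 720
print(n_covariates > n_samples)  # True: more covariates than samples
```

With 2634 covariates and only 720 samples, the regression is high dimensional in the sense the abstract describes: there are more unknown coefficients than observations.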

Original language	English (US)
Title of host publication	Large-Scale Machine Learning in the Earth Sciences
Publisher	CRC Press
Pages	13-32
Number of pages	20
ISBN (Electronic)	9781498703888
ISBN (Print)	9781498703871
DOIs	https://doi.org/10.4324/9781315371740
State	Published - Jan 1 2017

Bibliographical note

Publisher Copyright:
© 2017 by Taylor & Francis Group, LLC.

Cite this

@inbook{1d15d88833064fca82476437f02df326,

title = "Structured estimation in high dimensions: Applications in climate",

abstract = "One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional with each climate variable in each spatial grid or time period denoting a separate dimension. In fact, in many climate problems, the dimensionality, that is, the number of possible features or factors potentially affecting a response variable, is usually much larger than the number of samples that are typically reanalysis data sets over the past few decades. For example, in one of the problems considered in this chapter, one wants to predict climate variables like monthly temperature, precipitable water, etc. over land locations using information from six climate variables over oceans. We formulate it as a regression problem with the climate variable over a land location as the response variable. We consider 439 locations on oceans, so that there are a total of 6 × 439 = 2634 covariates in our regression problem. The data are the monthly means of the climate variables for 1948-2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], so that we have a total of 60 × 12 = 720 data samples.",

author = "Goncalves, {Andr{\'e} R.} and Arindam Banerjee and Vidyashankar Sivakumar and Soumyadeep Chatterjee",

note = "Publisher Copyright: {\textcopyright} 2017 by Taylor \& Francis Group, LLC.",

year = "2017",

month = jan,

day = "1",

doi = "10.4324/9781315371740",

language = "English (US)",

isbn = "9781498703871",

pages = "13--32",

booktitle = "Large-Scale Machine Learning in the Earth Sciences",

publisher = "CRC Press",

}

TY - CHAP

T1 - Structured estimation in high dimensions

T2 - Applications in climate

AU - Goncalves, André R.

AU - Banerjee, Arindam

AU - Sivakumar, Vidyashankar

AU - Chatterjee, Soumyadeep

PY - 2017/1/1

Y1 - 2017/1/1

AB - One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional with each climate variable in each spatial grid or time period denoting a separate dimension. In fact, in many climate problems, the dimensionality, that is, the number of possible features or factors potentially affecting a response variable, is usually much larger than the number of samples that are typically reanalysis data sets over the past few decades. For example, in one of the problems considered in this chapter, one wants to predict climate variables like monthly temperature, precipitable water, etc. over land locations using information from six climate variables over oceans. We formulate it as a regression problem with the climate variable over a land location as the response variable. We consider 439 locations on oceans, so that there are a total of 6 × 439 = 2634 covariates in our regression problem. The data are the monthly means of the climate variables for 1948-2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], so that we have a total of 60 × 12 = 720 data samples.

UR - http://www.scopus.com/inward/record.url?scp=85051788997&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051788997&partnerID=8YFLogxK

U2 - 10.4324/9781315371740

DO - 10.4324/9781315371740

M3 - Chapter

AN - SCOPUS:85051788997

SN - 9781498703871

SP - 13

EP - 32

BT - Large-Scale Machine Learning in the Earth Sciences

PB - CRC Press

ER -
