The First International Workshop on Climate Informatics

FREE for Members

Friday, August 26, 2011

The New York Academy of Sciences

The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, we are forging and encouraging collaborations between machine learning (as well as data mining and statistics) and climate science in order to accelerate progress in answering pressing questions in climate science.

The goal of this workshop is to incubate this new field, climate informatics. Recent progress on climate informatics reveals that collaborations with climate scientists also open interesting new problems for machine learning. There are myriad collaborations possible at the intersection of these two fields. We hope that every workshop attendee leaves with a new collaboration in climate informatics. The format of the workshop will emphasize communication among the various fields, with a strong emphasis on brainstorming and break-out sessions, as well as a panel discussion. We will also generate a white paper on climate informatics as a result of the workshop.

Focusing on topics at the interface of climate science with machine learning, data mining, statistics, and related fields, the workshop's topics will include, but not be limited to:

  • Machine learning, data mining, or statistics as applied to climate science
  • Long-term climate prediction
  • Short-term climate prediction
  • Combining the predictions of climate model ensembles
  • Past climate reconstruction
  • Uncertainty quantification
  • Spatio-temporal methods applied to climate data
  • Time series methods applied to climate data
  • Methods for modeling and predicting climate extremes
  • Climate change attribution
  • Dependence and causality among climate variables
  • Data assimilation
  • Hybrid methods
  • Modeling of climate data

 

Networking reception to follow.

Additional information is available on the climate informatics workshop wiki.


Supported by:

  • The Climate Center of LDEO and GISS

  • Information Science and Technology Center at Los Alamos National Laboratory

  • NEC Laboratories America

  • Columbia University Department of Statistics

Presenting Partner

  • Columbia Center for Computational Learning Systems

Silver Partner

  • Yahoo! Labs

Image credit: Michael Tippett. Colors show deviations of sea-surface temperatures from their climatological values in the equatorial Pacific from January 1997 to April 2000, with time going counter-clockwise.

Agenda

* Presentation times are subject to change.


Friday, August 26, 2011

9:00 AM

Opening Remarks

9:15 AM

Invited Tutorial on Machine Learning for Climate Science
Arindam Banerjee, PhD, University of Minnesota

10:00 AM

Coffee Break (posters on view)

10:20 AM

Climate Science Tutorial
Gavin Schmidt, PhD, NASA Goddard Institute for Space Studies and Columbia University

10:50 AM

Breakout on Problems

11:50 AM

Keynote address
David Musicant, PhD, Carleton College

12:30 PM

Lunch (posters on view)

2:00 PM

Keynote address
Timothy DelSole, PhD, George Mason University and Center for Ocean-Land-Atmosphere Studies

2:40 PM

Breakout on Data

3:40 PM

Coffee Break (posters on view)

4:00 PM

Keynote address
Douglas Nychka, PhD, National Center for Atmospheric Research

4:40 PM

Summary Reports from All Breakout Groups

5:10 PM

Panel Discussion

6:10 PM

Closing Remarks

6:15 PM

Reception (posters on view)

Speakers

Arindam Banerjee, PhD

Computer Science and Engineering, University of Minnesota

Arindam Banerjee is an associate professor and a McKnight Land-Grant Professor in the Department of Computer Science and Engineering and a Resident Fellow at the Institute on the Environment at the University of Minnesota, Twin Cities. He received his PhD from the University of Texas at Austin in 2005, where his dissertation was nominated for the best dissertation award. His research interests are in machine learning, data mining, information theory, convex analysis and optimization, and their applications to complex real-world problems. He has won several awards, including the NSF CAREER award in 2010, the McKnight Land-Grant Professorship at the University of Minnesota, Twin Cities (2009–2011), the J. T. Oden Faculty Research Fellowship from the Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin (2006), and the prestigious IBM PhD Fellowship for the academic years 2003–2004 and 2004–2005. He has also won several awards for his publications, including the Best Paper Award at the SIAM International Conference on Data Mining (SDM) (2004), the Best Research Paper Award under the University Cooperative Society Research Excellence Awards, University of Texas at Austin (2005), and the Best of SIAM Data Mining (SDM) Award at the SIAM International Conference on Data Mining (2007).

Timothy DelSole, PhD

George Mason University and Center for Ocean-Land-Atmosphere Studies

Timothy DelSole is Associate Professor of Atmospheric, Oceanic, and Earth Sciences at George Mason University and a research scientist at the Center for Ocean-Land-Atmosphere Studies. His research focuses on distinguishing human and natural influences on climate and predicting seasonal changes in climate. He is noted particularly for his development of stochastic models of turbulence and application of multivariate statistics in climate and predictability research.

DelSole has published over 50 peer-reviewed papers since receiving his PhD in Applied Physics from Harvard University in 1993. He worked at NASA Goddard Space Flight Center as a Global Change Distinguished Postdoctoral Fellow for two years and as a National Research Council Associate for two years. DelSole received the Distinguished Teacher Award from the Derek Bok Center for Teaching and Learning at Harvard University and was a nominee for a Teaching Excellence Award at George Mason University. Since 2010 he has served as an editor of the Journal of Climate.

David Musicant, PhD

Computer Science, Carleton College

David Musicant is an associate professor of computer science at Carleton College in Northfield, MN. He received his PhD in Computer Sciences from the University of Wisconsin–Madison and his undergraduate degrees from Michigan State University. His research interests are in machine learning and data mining, and he has been involved with the EDAM project, applying these ideas to atmospheric data analysis. He has also recently been involved with the GroupLens group at the University of Minnesota in studying social-computing systems, particularly Wikipedia.

Douglas Nychka, PhD

Institute for Mathematics Applied to Geosciences, National Center for Atmospheric Research

Douglas Nychka is a statistical scientist with an interest in the problems posed by geophysical data sets. His PhD (1983) is from the University of Wisconsin, and he subsequently spent 14 years as a faculty member at North Carolina State University. His research background in fitting curves and surfaces led to an interest in the analysis of spatial and environmental data. Pursuing this area of application, he assumed leadership of the Geophysical Statistics Project at the National Center for Atmospheric Research (NCAR) in 1997, an NSF-funded program to build collaborative research and training between statistics and the geosciences. In 2004 he became Director of the Institute for Mathematics Applied to Geosciences, an interdisciplinary component at NCAR with a focus on transferring innovative mathematical models and tools to the geosciences. His current interests are in quantifying the uncertainty of numerical experiments that simulate the Earth's present and possible future climate.

Gavin Schmidt, PhD

NASA Goddard Institute for Space Studies and Columbia University

See biography below.

Co-Chairs

Claire Monteleoni, PhD

George Washington University and Columbia University

Claire Monteleoni is an assistant professor of computer science at George Washington University. She is also an adjunct research scientist at the Center for Computational Learning Systems, Columbia University, where she had been an associate research scientist since 2008. Prior to joining Columbia, she was a postdoc in computer science and engineering at the University of California, San Diego. She completed her PhD in 2006 and her master's degree in 2003, both in computer science, at the Massachusetts Institute of Technology. She did her undergraduate work in earth and planetary sciences at Harvard University. Her research focuses on machine learning theory and algorithms and on climate informatics, specifically accelerating discovery in climate science with machine learning. Her work in climate informatics has received a Best Application Paper Award and has been presented at an Expert Meeting of the Intergovernmental Panel on Climate Change (IPCC), a panel formed by the UN that shared the 2007 Nobel Peace Prize.

Gavin Schmidt, PhD

NASA Goddard Institute for Space Studies and Columbia University

Gavin Schmidt is a climatologist with NASA’s Goddard Institute for Space Studies in New York, where he models past, present, and future climate. He received a BA in Mathematics in 1989 from Oxford University and a PhD in Applied Mathematics in 1994 from University College London. He was a postdoctoral fellow at McGill University in Montreal until 1996, when he was awarded a Climate and Global Change Fellowship from the National Oceanic and Atmospheric Administration and moved to the Goddard Institute. Schmidt was cited by Scientific American as one of the 50 leading researchers of 2004 and was a contributing author for the 2007 Nobel Prize-winning report of the Intergovernmental Panel on Climate Change. He is a cofounder and contributing editor of RealClimate.org, which provides context and background on climate science issues that are missing from popular media coverage.

Organizers

Francis Alexander, PhD

Information Science and Technology Center at Los Alamos National Laboratory

Francis Alexander received a PhD in Physics from Rutgers University in 1991 and a BS degree in mathematics and physics from Ohio State University in 1987. He then joined CNLS at LANL as a postdoc, where he worked on problems in statistical physics and computational fluid dynamics. In 1993 Alexander moved to Lawrence Livermore National Laboratory, where he started work on hybrid numerical algorithms for multiscale problems. In 1995 he joined the research faculty at Boston University in the Center for Computational Science. In 1998 Alexander returned to LANL as a staff member in what was then the CIC division. In 2002 he became the Deputy Group Leader for CCS-3 and in 2007 the Group Leader. Alexander is currently Acting Deputy Division Leader for CCS Division, as well as the Information Science and Technology Center Leader.

Alexandru Niculescu-Mizil, PhD

NEC Laboratories America

Alexandru Niculescu-Mizil has been a research staff member at NEC Labs America since 2010. Before joining NECLA, he was a Herman Goldstine postdoctoral fellow at IBM T. J. Watson Research Center. He received his PhD (2008, under the supervision of Rich Caruana) and his MS degree in computer science from Cornell University, and his bachelor's degree in mathematics and computer science, magna cum laude, from the University of Bucharest. His research interests are in machine learning and data mining, particularly in inductive transfer, graphical model structure learning, probability estimation, empirical evaluations, ensemble methods, and online learning. He received an ICML Distinguished Student Paper Award in 2005 for his work on probability estimation and a COLT Best Student Paper Award in 2008 for his work on online learning. In 2009 he led the IBM Research team that won the KDD Cup.

Karsten Steinhaeuser, PhD

University of Minnesota

Karsten Steinhaeuser is a research associate in the Department of Computer Science and Engineering at the University of Minnesota. His primary responsibilities include two major research projects: an NSF Expeditions in Computing project on "Understanding Climate Change: A Data Driven Approach" and the Planetary Skin Institute. His research interests are broadly in data mining and machine learning, in particular the construction and analysis of complex networks with applications in diverse domains including (but not limited to) climate, ecology, and social networks. He is actively involved in shaping an emerging research area called climate informatics, which lies at the intersection of computer science and climate sciences. He co-organizes the IEEE ICDM Workshop on Knowledge Discovery from Climate Data and the International Workshop on Climate Informatics, among others, and is engaged in numerous other professional service activities. Steinhaeuser earned his PhD in Computer Science and Engineering at the University of Notre Dame in 2011; he previously received an MS in Computer Science and Engineering (2007) and a BS, summa cum laude, in Computer Science (2005), both from the University of Notre Dame.

Michael Tippett, PhD

The International Research Institute for Climate and Society, Columbia University

Michael Tippett is a research scientist at the International Research Institute for Climate and Society at Columbia University. He develops and implements methods for producing reliable, calibrated probabilistic seasonal forecasts of global temperature and precipitation. His interests include prediction, predictability, and analysis of climate variability. He received his PhD from the Courant Institute of Mathematical Sciences at New York University and undergraduate degrees in electrical engineering and mathematics from North Carolina State University.

Additional speaker biographies forthcoming.

Abstracts

Tutorial: Introduction to Machine Learning for Climate Science

Arindam Banerjee, PhD, Computer Science and Engineering, University of Minnesota

Over the past few decades, the field of machine learning has matured significantly, drawing ideas from several disciplines including artificial intelligence, optimization, and statistics. Application of machine learning has led to important advances in a wide variety of domains ranging from Internet applications to scientific problems. The talk will give a gentle tutorial introduction to machine learning with a focus on four families of models and methods, viz., predictive models, graphical models, online learning, and exploratory data analysis. The talk will discuss the main ideas behind some of the key approaches in each family and the problems where they are applicable. Examples of potential climate science applications will be discussed.

Climate Science Tutorial

Gavin Schmidt, PhD, NASA Goddard Institute for Space Studies and Columbia University

I will provide an overview of current issues in climate science. I will specifically discuss the role of computation, the types and sources of data available, and the grand challenges that might be amenable to a computational/machine learning approach. In particular, I will highlight three areas where I think there is significant scope for interdisciplinary collaboration: the development of sub-grid scale parameterizations for climate models, the use and interpretation of the multi-model ensemble of climate projections, and the role of data mining with observations and models to constrain uncertain physical processes. In each case, there are new and large data sets (petabyte size) being generated that are defying standard approaches to interpretation.

The Whole Enchilada: Environmental Chemistry through Intelligent Atmospheric Data Analysis

David Musicant, PhD, Carleton College

What happens when you cross two atmospheric chemists and two data miners? You get the EDAM project, which was designed to come up with new tools, algorithms, and research approaches for solving problems involving atmospheric chemistry datasets. During the span of this project, we applied clustering algorithms to mass spectra, researched algorithms for doing supervised learning on datasets with inconsistent time granularities, and developed Enchilada, a comprehensive software suite for combining and analyzing mass spectra with other forms of time series data. Working on a cross-disciplinary team such as this was fun, productive, and, at times, bewildering. In this talk I'll review the above work, recall challenges we faced, and look towards the future.

Outstanding Problems at the Interface of Climate Prediction and Data Mining

Timothy DelSole, PhD, George Mason University and Center for Ocean-Land-Atmosphere Studies

Climate prediction refers to the process of predicting the state of the atmosphere-ocean-land-ice-biosphere system based on antecedent observations, while data mining refers to the process of discovering new patterns from large data sets. I discuss three outstanding problems in climate prediction that can be framed as data mining problems. The first problem is to identify variables in one large data set that can be used to predict variables in another large data set. For instance, a wide body of scientific research shows that the average temperature during winter over some continents is influenced by the global sea surface temperature structure. Unfortunately, the historical record is too short to confidently identify the associated spatial structures. In some cases, such as the summer monsoon rainfall over India, a phenomenon that dramatically affects a billion people every year, it is unclear whether there is any basis for prediction at all. I discuss some of the (unsatisfactory) methods climate scientists use to deal with this problem. The second problem is to determine whether predictions by numerical weather and climate models are better than a random guess. While numerous statistical methods exist for deciding whether predictions of a single variable are better than a random guess, the problem of deciding whether a field of variables can be predicted better than random guesses is much harder. A third problem is how to combine climate predictions made by quasi-independent modeling centers. Outstanding questions in this topic include: is there a "best" model, is there a model that should be ignored, and when should the models be treated on equal footing?

Regional Climate Informatics: A Statistical Perspective

Douglas Nychka, PhD, National Center for Atmospheric Research

As attention shifts from broad global summaries of climate change to more specific regional impacts, there is a need for data sciences to quantify the uncertainty in regional predictions. This talk will provide an overview of regional climate experiments with an emphasis on the statistical and computational problems of interpreting these large and complex simulations. A regional climate model (RCM) is a physics-based computer code that simulates the detailed flow of the atmosphere in a particular region, driven by the large-scale information from a global climate model. One intent is to compare simulations under the current climate with future scenarios to infer the nature of the climate change expected at a location. Clearly this is an area where information and statistical science has an important role to play. Due to computational constraints, RCM experiments typically simulate only 20–30 years of weather, so a statistical treatment is needed to separate the underlying climate of the model from year-to-year variations. The output of these experiments includes surface variables, such as temperature and rainfall, that can be compared to observations, as well as more complex simulated variables, such as soil moisture or wind patterns, that are informative about underlying ecological or geophysical processes. Finally, it is important to attach measures of uncertainty to the model predictions in order to make these geophysical results relevant for forming local policy and making economic decisions. This talk will use the recent North American Regional Climate Change Assessment Program to illustrate some analyses of a multi-factor design where the response is one or more spatial fields of geophysical variables. As a technical anchor, an analysis of variance (ANOVA) for spatial fields will be described. This is a Bayesian approach that can incorporate prior information about the fields and also give measures of uncertainty in the results.

Travel & Lodging

Our Location

The New York Academy of Sciences

7 World Trade Center
250 Greenwich Street, 40th floor
New York, NY 10007-2157
212.298.8600

Directions to the Academy

Hotels Near 7 World Trade Center

Recommended partner hotel

Club Quarters, World Trade Center
140 Washington Street
New York, NY 10006
Phone: 212.577.1133

The New York Academy of Sciences is a member of the Club Quarters network, which offers significant savings on hotel reservations to member organizations. Located opposite Memorial Plaza on the south side of the World Trade Center, Club Quarters, World Trade Center is just a short walk to the Academy.

Use Club Quarters Reservation Password NYAS to reserve your discounted accommodations online.

Other nearby hotels

Millenium Hilton

212.693.2001

Marriott Financial Center

212.385.4900

Club Quarters, Wall Street

212.269.6400

Eurostars Wall Street Hotel

212.742.0003

Gild Hall, Financial District

212.232.7700

Wall Street Inn

212.747.1500

Ritz-Carlton New York, Battery Park

212.344.0800