11th Annual Machine Learning Symposium

11th Annual Machine Learning Symposium

Friday, March 3, 2017

The New York Academy of Sciences

Machine Learning, a subfield of computer science, involves the development of mathematical algorithms that discover knowledge from specific data sets, and then "learn" from the data in an iterative fashion that allows predictions to be made. Today, Machine Learning has a wide range of applications, including natural language processing, search engine optimization, medical diagnosis and treatment, financial fraud detection, and stock market analysis.

This symposium, the eleventh in an ongoing series presented by the Machine Learning Discussion Group at the New York Academy of Sciences, will feature Keynote Presentations from leading scientists in both applied and theoretical Machine Learning and Spotlight Talks, a series of short, early career investigator presentations across a variety of topics at the frontier of Machine Learning.


Two of the keynote addresses for this meeting are archived and available via Livestream. For full details, and to view the Livestreams, use the link below:


Registration Pricing

Member $60
Member (Student / Postdoc / Resident / Fellow) $25
Nonmember (Academia) $105
Nonmember (Corporate) $160
Nonmember (Non-profit) $105
Nonmember (Student / Postdoc / Resident / Fellow) $70


* Presentation titles and times are subject to change.

Friday, March 3, 2017

9:00 AM

Registration, Continental Breakfast, and Poster Set-up

10:00 AM

Welcome Remarks
Jennifer L. Costley, PhD, The New York Academy of Sciences
Mehryar Mohri, PhD, New York University

10:10 AM

Keynote Address 1: Challenges in Privacy-preserving Data Analysis
Kamalika Chaudhuri, PhD, University of California San Diego

10:50 AM

Audience Q&A

Spotlight Talks: Session 1

A series of short, early career investigator presentations across a variety of topics at the frontier of machine learning. Selected from Poster Abstracts.

11:05 AM

The Temporal Neural Coding Network: Towards Lifelong Language Learning
Alexander G. Ororbia II, Pennsylvania State University

11:10 AM

The Topology of Time Series Networks
Ben Cassidy, PhD, Columbia University

11:15 AM

Edward: A Library for Probabilistic Modeling, Inference, and Criticism
Dustin Tran, Columbia University

11:20 AM

Oracle-Efficient Learning and Auction Design
Nika Haghtalab, Carnegie Mellon University

11:25 AM

Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition
Furong Huang, PhD, Microsoft Research

11:30 AM

Networking Break and Poster Viewing

12:20 PM

Keynote Address 2: Machine Learning for Individualized Healthcare
Suchi Saria, PhD, Johns Hopkins University

1:00 PM

Audience Q&A

1:15 PM

Networking Lunch and Poster Viewing

Spotlight Talks: Session 2

2:30 PM

Learning with Rejection
Giulia DeSalvo, New York University

2:35 PM

Corralling a Band of Bandit Algorithms
Haipeng Luo, PhD, Microsoft Research

2:40 PM

Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic
Lorenzo De Stefani, PhD, Brown University

2:45 PM

Human Teaching by Demonstration: Showing versus Doing Reinforcement Learning Tasks
Mark K. Ho, Brown University

2:50 PM

AdaCluster: Adaptive Clustering for Heterogeneous Data
Mehmet Emin Basbug, PhD, Lion Cave Capital

2:55 PM

A New Theory of Exploration in Reinforcement Learning with Function Approximation
Nan Jiang, University of Michigan

3:00 PM

Generalized Topic Modeling
Nika Haghtalab, Carnegie Mellon University

3:05 PM

AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Scott Yang, Courant Institute of Mathematical Sciences

3:10 PM

Time Series Prediction and Online Learning: Model Selection and Ensemble Learning
Vitaly Kuznetsov, PhD, Google Research

3:15 PM

Networking Break and Poster Viewing

3:55 PM

Keynote Address 3: Applications of Learning Theory in Algorithmic Game Theory
Tim Roughgarden, PhD, Stanford University

4:35 PM

Audience Q&A

4:50 PM

Closing Remarks and Awards

Best Poster Presentation
IBM is the proud sponsor of the Best Poster Presenter awards

1. Nika Haghtalab, Carnegie Mellon University
2. Giulia DeSalvo, NYU Courant Institute
3. Andres Munoz Medina, Google Research
4. Mehmet Emin Basbug, Lion Cave Capital and Princeton University
5. Francisco J. R. Ruiz Columbia University and University of Cambridge

Spotlight Talk Award Presentation
Google is the proud sponsor of the Spotlight Talk awards

1. Scott Yang, Courant Institute of Mathematical Sciences
2. Haipeng Luo, Microsoft Research
3. Mark K. Ho, Brown University
4. Nan Jiang, University of Michigan
5. Nika Haghtalab, Carnegie Mellon University

5:00 PM

Networking Reception

6:00 PM

Symposium Adjourns

Keynote Speakers

Kamalika Chaudhuri, PhD

University of California San Diego

Kamalika Chaudhuri is an Assistant Professor in the Computer Science and Engineering Department at UC San Diego. Prior to joining the department, she received a PhD in Computer Science from UC Berkeley in 2007, and was a postdoctoral researcher at UC San Diego from 2007–2010. She is the recipient of a Hellman Faculty Fellowship and she received an NSF CAREER award in 2012. Kamalika's research is on learning theory, which deals with the theoretical foundations of machine learning. She is particularly interested in privacy-preserving machine learning—how to learn a good predictor from sensitive training data, while ensuring the privacy of individuals in the data set.

Tim Roughgarden, PhD

Stanford University

Tim Roughgarden is an Associate Professor of Computer Science and (by courtesy) Management Science and Engineering at Stanford University. He received a BS in Applied Mathematics from Stanford in 1997, and a PhD in Computer Science from Cornell in 2002. His research interests include the many connections between computer science and economics, as well as the design, analysis, and applications of algorithms. For his research, he has been awarded the ACM Grace Murray Hopper Award, the Presidential Early Career Award for Scientists and Engineers (PECASE), the Kalai Prize in Computer Science and Game Theory, the Shapley Lecturership of the Game Theory Society, the Social Choice and Welfare Prize, INFORM's Optimization Prize for Young Researchers, the Mathematical Programming Society's Tucker Prize, and the EATCS-SIGACT Gödel Prize.

Suchi Saria, PhD

Johns Hopkins University

Suchi Saria is an assistant professor of computer science, health policy and statistics at Johns Hopkins University. Her research interests are in statistical machine learning and "precision" healthcare. Specifically, her focus is in designing novel data-driven computing tools for optimizing care delivery. Her work is being used to drive electronic surveillance for reducing adverse events in the inpatient setting and to individualize disease management in complex, chronic diseases.

She received her PhD from Stanford University with Prof. Daphne Koller. Her work has received recognition in the form of two cover articles in Science Translational Medicine (2010, 2015), paper awards by the Association for Uncertainty in Artificial Intelligence (2007) and the American Medical Informatics Association (2011), an Annual Scientific Award by the Society of Critical Care Medicine (2014), a Rambus Fellowship (2004–2010), an NSF Computing Innovation fellowship (2011), and competitive awards from the Gordon and Betty Moore Foundation (2013), and Google Research (2014). In 2015, she was selected by the IEEE Intelligent Systems to the "AI's 10 to Watch" list. In 2016, she was selected as a DARPA Young Faculty awardee and to Popular Science's "Brilliant 10."

Scientific Organizing Committee

Naoki Abe, PhD

IBM Research

Corinna Cortes, PhD

Google Research

Corinna Cortes is the Head of Google Research, New York, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Cortes spent more than ten years at AT&T Labs–Research, formerly AT&T Bell Labs, where she held a distinguished research position. Her research work is well-known in particular for her contributions to the theoretical foundations of support vector machines (SVMs), for which she jointly with Vladimir Vapnik received the 2008 Paris Kanellakis Theory and Practice Award, and her work on data-mining in very large data sets for which she was awarded the AT&T Science and Technology Medal in the year 2000. She received her master's degree in physics from the University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her PhD in computer science from the University of Rochester in 1993. Cortes is also a competitive runner, and a mother of two.

Jennifer L. Costley, PhD

The New York Academy of Sciences

Patrick Haffner, PhD

Interactions Corporation

Tony Jebara, PhD

Columbia University

Tony Jebara is Associate Professor of Computer Science at Columbia University and Director of Machine Learning at Netflix. His research intersects computer science and statistics to develop new frameworks for learning from data with applications in recommendation, social networks, spatio-temporal data, vision and text. Jebara has founded and advised several startups including Sense Networks (acquired by yp.com), Evidation Health, Agolo, Ufora, MagikEye, and Bookt (acquired by RealPage NASDAQ:RP). He has published over 100 peer-reviewed papers in conferences, workshops and journals (such as NIPS, ICML, UAI, COLT, JMLR, CVPR, ICCV, and AISTAT) as well as the book Machine Learning: Discriminative and Generative. Jebara was the recipient of the Career award from the National Science Foundation, a best paper award at the 26th International Conference on Machine Learning, a best student paper award at the 20th International Conference on Machine Learning as well as an outstanding contribution award from the Pattern Recognition Society in 2001. Jebara's research has been featured on ABC, BBC, the New York Times, Slash Dot, Wired, Businessweek, IEEE Spectrum and more. He obtained his PhD in 2002 from MIT. Esquire magazine named him one of their Best and Brightest of 2008.

John Langford, PhD

Microsoft Research

John Langford is a machine learning research scientist, a field which he says "is shifting from an academic discipline to an industrial tool." He is the author of the weblog hunch.net and the principal developer of Vowpal Wabbit. John works at Microsoft Research New York, of which he was one of the founding members, and was previously affiliated with Yahoo! Research, Toyota Technological Institute, and IBM's Watson Research Center. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his PhD in Computer Science from Carnegie Mellon University in 2002. He was the program co-chair for the 2012 International Conference on Machine Learning.

Mehryar Mohri, PhD

Courant Institute of Mathematical Sciences, New York University

Mehryar Mohri is a Professor of Computer Science at the Courant Institute of Mathematical Sciences and a Research Consultant at Google Research. He graduated from Ecole Polytechnique (Paris), received an MS degree in Applied Mathematics and Computer Science from Ecole Normale Superieure (ENS Ulm, Paris), and a PhD in Computer Science from the University of Paris 7 in 1993. One year after his PhD, he joined AT&T Bell Labs, later AT&T Labs–Research, where he worked for ten years (1994–2004), in the last four years serving as a Department Head and a Technology Leader. His research interests cover a broad range of problems in machine learning, automata theory, and speech and natural language processing. He is co-author of the textbook Foundations of Machine Learning and several widely used software libraries.

Alexander Rakhlin, PhD

University of Pennsylvania

Alexander (Sasha) Rakhlin is an Associate Professor at the Department of Statistics at the University of Pennsylvania. He received his bachelor's degrees in Mathematics and Computer Science from Cornell University, and a doctoral degree from MIT. He was a postdoc at UC Berkeley before joining UPenn. His research is in machine learning, with an emphasis on online methods, statistics, and computation. Alexander is a recipient of the NSF CAREER award, IBM Research Best Paper award, Machine Learning Journal award, and COLT Best Paper Award.

Gunnar Rätsch, PhD

ETH Zürich

Dr. Gunnar Rätsch studied computer science and physics and obtained his PhD degree in computer science in 2001 with his work in Machine Learning (Boosting and Support Vector Machines) at the National Institute for Data Analysis in Berlin. He was a postdoctoral fellow at the Research School of Information Sciences and Engineering of the Australian National University in Canberra (Australia) and at the Max Planck Institute for Biological Cybernetics in Tübingen (Germany). In 2002, he received the Michelson award for his PhD work and in 2007 he was awarded the Olympus prize from the German Association for Pattern Recognition for his work on Boosting. Between 2005 and 2011 he led a research group at the Friedrich Miescher Laboratory of the Max Planck Society in Tübingen (Germany). In January 2012 he and his group moved to the Memorial Sloan Kettering Cancer Center in New York City (USA) to work on cancer genomics and applications of machine learning in healthcare. He recently moved to the Computer Science Department of ETH Zürich to form the new Biomedical Informatics group.

Robert Schapire, PhD

Microsoft Research

Robert Schapire is a Principal Researcher at Microsoft Research in New York City. He received his PhD from MIT in 1991. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991. In 2002, he became a Professor of Computer Science at Princeton University. He joined Microsoft Research in 2014. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). He is a fellow of the AAAI, and a member of both the National Academy of Engineering and the National Academy of Sciences. His main research interest is in theoretical and applied machine learning, with particular focus on boosting, online learning, game theory, and maximum entropy. For more information, see http://rob.schapire.net/


Keynote Abstracts

Challenges in Privacy-preserving Data Analysis
Kamalika Chaudhuri, PhD, University of California San Diego

Machine learning algorithms increasingly work with sensitive information on individuals, and hence the problem of privacy-preserving data analysis—how to design data analysis algorithms that operate on the sensitive data of individuals while still guaranteeing the privacy of individuals in the data—has achieved great practical importance. In this talk, we address two problems in privacy-preserving data analysis.
First, we address the problem of privacy-preserving classification, and present an efficient classifier which is private in the differential privacy model of Dwork et al. Our classifier works in the ERM (empirical loss minimization) framework, and includes privacy preserving logistic regression and privacy preserving support vector machines.
We next address the question of learning from sensitive correlated data, such as private information on users connected together in a social network, and measurements of physical activity of a single user across time. We consider a recent generalization of differential privacy, called Pufferfish, that can be used to address privacy in correlated data and present new privacy mechanisms in this framework.
Joint work with Claire Monteleoni (George Washington University), Anand Sarwate (Rutgers), Yizhen Wang (UCSD) and Shuang Song (UCSD).

Machine Learning for Individualized Healthcare
Suchi Saria, PhD, Johns Hopkins University

Ever increasing quantities of health-related data are being collected in electronic health records (EHRs), by clinicians and through personal monitoring devices. I develop statistical learning algorithms for leveraging this data in optimizing decision-making in healthcare. In this talk, I plan to focus on the following overarching research question: how can we individualize prognosis tools using the diverse clinical data in EHRs? I will introduce challenges associated with personalization from observational clinical data. And, on the technical front, we will tackle questions such as:

i) how can diversity-in baseline health and in disease presentation-across individuals be modeled and estimated from data to individualize prognostic tools?

ii) how can we do joint analysis of heterogeneous data sources-demographics, history, biological markers, time-varying tests, treatments, and

iii) how do we develop models that generalize well to shifts in provider practice patterns. 

Applications of Learning Theory in Algorithmic Game Theory
Tim Roughgarden, PhD, Stanford University

Algorithmic game theory is a field that uses and extends tools from economics and game theory to reason about fundamental computer science problems. The field is important both for its applications, which span the gamut from network routing to online advertising, and for its remarkably diverse and rich connections to other areas of theoretical computer science, including complexity theory and approximation algorithms. In this talk, we survey two ways in which definitions and tools from learning theory have been crucial to recent advances in algorithmic game theory. First, we outline a theory of robust bounds on the "price of anarchy"—meaning approximation guarantees for game-theoretic equilibria—that apply to all outcome sequences generated by no-regret learners playing a multi-player game. Second, we explain how to use concepts from learning theory to make traditional (Bayesian) optimal auction theory operational, replacing the practically problematic "common prior" assumption with a data-driven approach. 

Spotlight Abstracts

Corralling a Band of Bandit Algorithms
Haipeng Luo, PhD, Microsoft Research, New York, New York

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own. The main challenge is that when run with a master, base algorithms unavoidably receive much less feedback and it is thus critical that the master not starve a base algorithm that might perform uncompetitively initially but would eventually outperform others if given enough feedback. We address this difficulty by devising a version of Online Mirror Descent with a special mirror map together with a sophisticated learning rate scheme. We show that this approach manages to achieve a more delicate balance between exploiting and exploring base algorithms than previous works yielding superior regret bounds.
Our results are applicable to many settings, such as multi-armed bandits, contextual bandits, and convex bandits. As examples, we present two main applications. The first is to create an algorithm that enjoys worst-case robustness while at the same time performing much better when the environment is relatively easy. The second is to create an algorithm that works simultaneously under different assumptions of the environment, such as different priors or different loss structures.
Coauthors: Alekh Agarwal1, Behnam Neyshabur2, and Robert E. Schapire1.
1. Microsoft Research, New York, New York.
2. Toyota Technological Institute, Chicago, Illinois. 

Edward: A Library for Probabilistic Modeling, Inference, and Criticism
Dustin Tran, MS, Columbia University, New York, New York

Probabilistic modeling is a powerful approach for analyzing empirical information. In this talk, I will provide an overview of Edward, a software library for probabilistic modeling. Formally, Edward is a probabilistic programming system built on computational graphs, supporting compositions of both models and inference for flexible experimentation. For example, Edward makes it easy to fit the same model using a variety of composable inferences, ranging from point estimation, to variational inference, to MCMC. Edward is also integrated into TensorFlow, providing significant speedups over existing probabilistic systems. As examples, I will show how Edward can be leveraged for expanding the frontier of variational inference and deep generative models.
Coauthors: Matthew D. Hoffman2,3, Kevin Murphy2, Eugene Brevdo2, Rif A. Saurous2, and David M. Blei1.
1. Columbia University, New York, New York.
2. Google Brain, Mountain View, California.
3. Adobe Research, Mountain View, California. 

Generalized Topic Modeling
Nika Haghtalab, MMath, Carnegie Mellon University, Pittsburgh, Pennsylvania

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In standard topic models, a topic (such as sports, business, or politics) is viewed as a probability distribution over words, and a document is generated by first selecting a mixture over topics, and then generating words i.i.d. from the associated mixture . Given a large collection of such documents, the goal is to recover the topic vectors and then to correctly classify new documents according to their topic mixture.
In this work we consider a broad generalization of this framework in which words are no longer assumed to be drawn i.i.d. and instead a topic is a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a document classifier. That is, we aim to learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this efficiently and discuss issues such as noise tolerance and sample complexity in this model. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.
Coauthor: Avrim Blum, PhD, Carnegie Mellon University. 

A New Theory of Exploration in Reinforcement Learning with Function Approximation
Nan Jiang, MSc, University of Michigan, Ann Arbor, Michigan

In reinforcement learning, autonomous agents solve sequential decision-making problems by (1) actively exploring the environment to collect data, and (2) improving behavior by learning from the collected data. The recent success of reinforcement learning can largely be attributed to the use of advanced function approximation techniques (e.g., deep neural networks) for the learning component. In contrast, advances in exploration techniques have been rather limited: most existing algorithms that perform systematic exploration only apply to simple problems with small state spaces and cannot accommodate sophisticated function approximation schemes that are critical for solving real-world problems with rich observations.
In this work, we take a unified view of reinforcement learning with rich observations and function approximation through a new model called the Contextual Decision Process (CDP), which encompasses existing models such as MDPs and POMDPs. Several special cases of CDPs are, however, provably intractable in terms of the amount of data needed to find the near-optimal behavior. To overcome this challenge, we identify a structural property, called the Bellman rank, which is naturally small in a range of important reinforcement learning scenarios. We propose a new algorithm, whose sample complexity scales polynomially in the Bellman rank and other relevant parameters, and crucially has no dependence on the number of unique observations. The algorithm uses Bellman error minimization with optimistic exploration, and provides new insights into exploration in complex reinforcement learning problems.
Coauthors: Akshay Krishnamurthy1, Alekh Agarwal2, John Langford2, and Robert E. Schapire2.
1. University of Massachusetts, Amherst, Massachusetts.
2. Microsoft Research, New York, New York. 

Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition
Furong Huang, PhD, Microsoft Research, New York, New York

Unsupervised text embedding extraction is crucial for text understanding in machine learning. Word2Vec and its variations are successful in mapping words with similar syntactic or semantic meaning to vectors close to each other. However, extracting context-aware word-sequence embedding remains a challenging task. Training over large corpus is difficult as labels are difficult to get. More importantly, it is challenging for pre-trained models to obtain word-sequence embeddings that are universally good for all downstream tasks or for any new datasets.
We propose a two-phased ConvDic+DeconvDec framework to solve the problem by combining a word-sequence dictionary learning model with a word-sequence embedding decode model. We propose a convolutional tensor decomposition mechanism to learn good word-sequence phrase dictionary in the learning phase. It is proved to be more accurate and much more efficient than the popular alternating minimization method. In the decode phase, we introduce a deconvolution framework that is immune to the problem of varying sentence lengths. The word-sequence embeddings we extracted using ConvDic+DeconvDec are universally good for a few downstream tasks we test on. The framework requires neither pre-training nor prior/outside information.
Coauthor: Anima Anandkumar, University of California, Irvine. 

Oracle-Efficient Learning and Auction Design
Nika Haghtalab, MMath, Carnegie Mellon University, Pittsburgh, Pennsylvania

We consider the design of online no-regret algorithms that are computationally efficient, given access to an offline optimization oracle. Our first main contribution is introducing an oracle-based online algorithm and conditions under which it achieves vanishing regret. Our learning algorithm is a generalization of the Follow-The-Perturbed-Leader algorithm of Kalai and Vempala that at every step plays the best-performing action subject to some independent random perturbations. Our design uses a shared source of randomness across all actions that can be efficiently implemented by adding to the history of the game a set of adversary's actions from a carefully designed distribution. Our work extends to oracle-efficient algorithms for contextual learning, learning with Maximal-in-Range approximation algorithms, and no-regret bidding in simultaneous auctions, answering an open problem of Daskalakis and Syrgkanis in the latter case.
Our second main contribution is introducing a new adversarial auction-design framework for revenue maximization and applying our oracle-efficient learning results to adaptive auction design. We give oracle-efficient learning results for: (1) VCG auctions with bidder-specific reserves in single-parameter settings, (2) envy-free item pricing in multi-item auctions, and (3) s-level auctions of Morgenstern and Roughgarden for single-item settings. The last result leads to an approximation of the optimal Myerson auction for the stationary distribution of a Markov process, extending prior work that only gave such guarantees for the i.i.d. setting. We also extend our framework to allow the auctioneer to use side information about the bidders in the design of the optimal auction (contextual learning).
Coauthors: Miroslav Didík1, Haipeng Luo1, Robert E. Schapire1, Vasilis Syrgkanis2, and Jennifer Wortman Vaughan1.
1. Microsoft Research, New York, New York.
2. Microsoft Research, Cambridge, Massachusetts. 

AdaCluster: Adaptive Clustering for Heterogeneous Data
Mehmet E. Basbug, PhD, Lion Cave Capital, Chatham, New Jersey, and Princeton University

Clustering algorithms start with a fixed divergence, which captures the possibly asymmetric distance between a sample and a centroid. In the mixture model setting, the sample distribution plays the same role. When all attributes have the same topology and dispersion, the data are said to be homogeneous. If the prior knowledge of the distribution is inaccurate or the set of plausible distributions is large, an adaptive approach is essential. The motivation is more compelling for heterogeneous data, where the dispersion or the topology differs among attributes. We propose an adaptive approach to clustering using classes of parametrized Bregman divergences. We first show that the density of a steep exponential dispersion model (EDM) can be represented with a Bregman divergence. We then propose AdaCluster, an expectation-maximization (EM) algorithm to cluster heterogeneous data using classes of steep EDMs. We compare AdaCluster with EM for a Gaussian mixture model on synthetic data and nine UCI data sets. We also propose an adaptive hard clustering algorithm based on Generalized Method of Moments. We compare the hard clustering algorithm with k-means on the UCI data sets. We empirically verified that adaptively learning the underlying topology yields better clustering of heterogeneous data.
Coauthor: Barbara E. Engelhardt, Princeton University. 

Time Series Prediction and Online Learning: Model Selection and Ensemble Learning
Vitaly Kuznetsov, PhD, Google Research, New York, New York

We present a series of theoretical and algorithmic results for time series prediction leveraging recent advances in the statistical learning analysis of this problem and on-line learning. We prove the first generalization bounds for a hypothesis derived by online-to-batch conversion of the sequence of hypotheses output by an online algorithm, in the general setting of a non-stationary non-mixing stochastic process. Our guarantees hold for adapted sequences of hypotheses both for convex and non-convex losses. We give generalization bounds for sequences of hypotheses that may not be adapted but that admit a stability property. Our bounds are given in terms of a discrepancy measure, which we show can be accurately estimated from data under a mild assumption.
We also highlight the application of our results to two related problems: model selection in time series prediction, and the design of accurate ensembles of time series predictors. Model selection for time series prediction appears to be a difficult task: in contrast with the i.i.d. scenario, in time series analysis, there is no straightforward method for splitting a sample into a training and validation sets. Using the most recent data for validation may result in models that ignore the most recent information. Validating over the most distant past may lead to selecting sub-optimal parameters. Any other split of the sample may result in the destruction of important statistical patterns and correlations across time. We show that, remarkably, our on-line-to-batch conversions enable us to use the same time series for both training and model selection.
Coauthors: Mehryar Mohri, Google Research and Courant Institute, New York, New York. 

The Temporal Neural Coding Network: Towards Lifelong Language Learning
Alexander G. Ororbia II, BS, Pennsylvania State University

We present a novel lifelong neural architecture, the Temporal Neural Coding Network (TNCN), and its learning algorithm, for uncovering multiple levels of distributed representations for language data streams. The TNCN model adapts its parameters iteratively as new samples are observed without resorting to the popular, but expensive back-propagation through time procedure needed to calculate gradients for recurrent neural networks, notably requiring no unrolling of internally defined recurrence relations. Furthermore, we integrate concepts from variational encoder-decoder models, such as those in [1], to easily demonstrate how uncertainty can be characterized in the model's next step predictions. We discuss how the proposed TNCN works specifically on language-based tasks formulated in the streaming setting, compared to various neural models, as well as how the TNCN may be easily adapted to other real-time tasks, such as video sequence modeling, yielding a natural framework for online multi-modal and online semi-supervised learning.
Coauthors: David Reitter and C. Lee Giles, Pennsylvania State University.
References: Serban, Iulian; Ororbia II, Alexander G.; Pineau, Joelle; Courville, Aaron. Multi-modal Variational Encoder-Decoders. 2017. arXiv:1612.00377 [cs.CL] 

The Topology of Time Series Networks
Ben Cassidy, PhD, Columbia University, New York, New York

Large high-dimensional networks constructed from time series arise naturally in many areas of machine learning such as genomics, neuroimaging, econometrics, and social networks. But the interpretation and comparison of differently sized networks remains problematic, since many measures assume the networks to be compared are defined on the same node set, or those measures scale with network (graph) size. A further problem is to meaningfully compare between dense and sparse time series networks.
In this work we introduce a new network comparison method based on Persistent Homology, an approach from Topological Signal Processing. Persistent homology studies the mesoscopic architecture within networks, to find distributed patterns that appear or disappear over a range of scales.
We apply the new method to compare networks from neuroimaging (brain activity networks) and econometrics datasets; we show which fundamental topological features are equivalent or different between networks of different sizes, and between sparse and dense networks. In the process we explain why many existing time series network identification methods fail in real world circumstances.
Coauthors: DuBois Bowman1, Goran Marjanovic2, and Victor Solo2,3.
1. Columbia University, New York, New York.
2. The University of New South Wales, Sydney.
3. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts. 

AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Scott Yang, BS, Courant Institute, New York, New York

We present a new theoretical framework for building and learning artificial neural networks. Formulating the configuration of a neural network as an effective parametrized model for supervised learning, we design a method for training neural networks that adapts model structure and complexity to the difficulty of the particular problem, with no pre-defined architecture.
Starting from a simple single layer network, we add additional neurons and layers to the model using rigorous estimates of the generalization ability from statistical learning theory. Specifically, we determine whether to add new nodes to the network by measuring the incremental reduction in empirical error against the additional model complexity.
This structural risk minimization-type approach is in contrast to existing work on structure learning, which has been either based on heuristics for growing and pruning the network or focused on designing more efficient methods for large-scale hyperparameter tuning. The former is unable to provide concrete theoretical guarantees, and the latter is wasteful of data and statistically equivalent to standard grid search.
Our structure learning algorithm is based on directly optimizing for generalization performance. This allows us to not only enforce model sparsity but to also provide strong accompanying generalization bounds, in contrast to previous work in this area. Remarkably, the resulting method is also convex and hence more stable than many of the current deep learning methodologies employed. Preliminary experiments support and validate our theory and framework.
Coauthors: Corinna Cortes1, Xavi Gonzalvo1, Vitaly Kuznetsov1, and Mehryar Mohri1,2.
1. Google Research, New York, New York.
2. Courant Institute, New York, New York. 

Learning with Rejection
Giulia DeSalvo, BS, New York University Courant Institute, New York, New York

We consider a flexible binary classification scenario where the learner is given the option to reject an instance instead of returning its prediction, thereby incurring a cost. We introduce a novel framework for this setting that consists of simultaneously learning two functions: a classifier along with a rejection function. We present a full theoretical analysis of this framework including new data-dependent learning bounds in terms of the Rademacher complexities of the classifier and rejection families as well as consistency and calibration results. These theoretical guarantees guide us in designing new algorithms that can exploit different kernel-based hypothesis sets for the two sets of classifier and rejection functions. We also present a boosting-style algorithm where at each round, it selects a pair of functions, a base classifier and a base rejection function. We give its convergence guarantees along with a linear-time weak-learning algorithm for rejection stumps. We compare and contrast to the special case of confidence-based rejection, devising alternative loss functions and algorithms for this setting as well. We report the results of several experiments showing that our algorithms can yield a notable improvement over the best existing confidence-based rejection algorithm.
Coauthors: Corinna Cortes2 and Mehryar Mohri1.
1. New York University Courant Institute, New York, New York.
2. Google Research, New York, New York. 

Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic
Lorenzo De Stefani, PhD, Brown University, Providence, Rhode Island

We study the problem of learning probabilistic models for permutations, where the order between highly ranked items in the observed permutations is more reliable (i.e., consistent in different rankings) than the order between lower ranked items, a typical phenomena observed in many applications such as web search results and product ranking. We introduce and study a variant of the Mallows model where the distribution is a function of the widely used Average-Precision (AP) Correlation statistic, instead of the standard Kendall's tau distance.
We present a generative model for constructing samples from this distribution and prove useful properties of that distribution. Using these properties we develop an efficient algorithm that provably computes an asymptotically unbiased estimate of the center permutation, and a faster algorithm that learns with high probability the hidden central permutation for a wide range of the parameters of the model.
We show experimentally the accuracy of our estimators in the reconstruction of the hidden permutation with a limited number of samples. We also show with real data that unsupervised methods based on our model can precisely (and efficiently) identify ground-truth clusters of rankings. Compared to the Kendall's tau based methods, our methods are less affected by noise in low-rank items.
Finally, we show that supervised classification algorithms based on the AP statistic outperform Kendall's tau distance based algorithms by comparing their performance in classifying large genomic expressions from ten publicly available datasets from human cancer research.
Appeared in the proceedings of the Thirtieth (AAAI) Conference on Artificial Intelligence.
Coauthors: Alessandro Epasto2, Eli Upfal1, and Fabio Vandin3.
1. Brown University, Providence, Rhode Island.
2. Google Research, New York, New York.
3. Univeristà degli Studi di Padova, Padova, Italy. 

Human Teaching by Demonstration: Showing versus Doing Reinforcement Learning Tasks
Mark K. Ho, ScM, Brown University, Providence, Rhode Island

What is the relationship between doing a task and showing another agent how to do a task? In inverse reinforcement learning, an algorithm attempts to learn a policy or task representation by observing example demonstrations. Typically, these example demonstrations are sampled from the optimal policy for a task. That is, they are samples of an expert "doing" a task. However, when teaching by demonstration, people often do not simply do a task but actively attempt to show others how to do a task. A better understanding of this showing behavior can be used to improve inverse reinforcement learning algorithms.
In this work, we present a computational account of showing as a form of Bayesian pedagogy: optimally selecting sequences of behavior that lead an observer to infer a particular underlying reward function. This formulation can be understood as "planning in a learner's belief space" and extends previous accounts of teaching by example to sequential domains. In several human experiments, we demonstrate that people's behavior when "showing" a task closely matches this model, while their "doing" a task does not. Similarly, we show that these human-generated showing trajectories are more effective for training standard inverse reinforcement learning algorithms. Moreover, this work provides a basis for developing inverse reinforcement learning algorithms that can benefit from reasoning about intentional teaching.
Coauthors: Michael Littman1, James MacGlashan1, Fiery Cushman2, and Joseph L. Austerweil3.
1. Brown University, Providence, Rhode Island.
2. Harvard University, Cambridge, Massachusetts.
3. University of Wisconsin–Madison, Madison, Wisconsin.

Travel & Lodging

Our Location

The New York Academy of Sciences

7 World Trade Center
250 Greenwich Street, 40th floor
New York, NY 10007-2157

Directions to the Academy

Hotels Near 7 World Trade Center

Recommended partner hotel 

Club Quarters, World Trade Center
140 Washington Street
New York, NY 10006
Phone: 212.577.1133

The New York Academy of Sciences is a member of the Club Quarters network, which offers significant savings on hotel reservations to member organizations. Located opposite Memorial Plaza on the south side of the World Trade Center, Club Quarters, World Trade Center is just a short walk to the Academy.

Use Club Quarters Reservation Password NYAS to reserve your discounted accommodations online.

Other nearby hotels

Conrad New York


Millenium Hilton


Marriott Financial Center


Club Quarters, Wall Street


Eurostars Wall Street Hotel


Gild Hall, Financial District


Wall Street Inn


Ritz-Carlton New York, Battery Park