9th Annual Machine Learning Symposium

Friday, March 13, 2015

The New York Academy of Sciences

Presented By

The Machine Learning Discussion Group at the New York Academy of Sciences


In our current digital age, a wealth of data is available at our fingertips. Often, the value of this 'Big Data' is not in the data itself, but in the ability to learn from it in order to make predictions. Machine Learning, a branch of artificial intelligence, involves the development of mathematical algorithms that discover knowledge from specific data sets, and then "learn" from the data in an iterative fashion that allows predictions to be made. Today, Machine Learning has a wide range of applications, including natural language processing, search engine functionality, medical diagnosis, credit card fraud detection, and stock market analysis.

This symposium — part of an ongoing series presented by the Machine Learning Discussion Group at the New York Academy of Sciences — will feature Keynote Presentations from leading scientists in both applied and theoretical Machine Learning. Keynote Speakers include Pedro Domingos, Yoshua Bengio, and Elad Hazan.

2015 Spotlight Talk Awards

The New York Academy of Sciences congratulates the winners of the 2015 Spotlight Talk Awards, which recognized the best oral research presentations delivered by early career investigators during the Symposium.

Regret Minimization in Posted Price Auctions against Strategic Buyers
Andres Muñoz Medina
Courant Institute

Large-Scale Clustering of Sentences and Patients based on Electronic Health Records
Stefan Stark
Memorial Sloan-Kettering Cancer Center

Learning With Deep Cascades
Giulia DeSalvo
Courant Institute

Achieving All with No Parameters: Adaptive NormalHedge
Haipeng Luo
Princeton University

Approximate Kernel Methods for Speech Recognition and Computer Vision
Avner May
Columbia University

Anchored Factor Analysis
Yonatan Halpern
New York University

On-line Learning Approach to Ensemble Methods for Structured Prediction
Vitaly Kuznetsov
Courant Institute

Probabilistic Bayesian Analysis of Genetic Associations with Clinical Features in Cancer
Melanie Pradier
Memorial Sloan-Kettering Cancer Center

Theoretical Foundations for Learning Kernels in Supervised Kernel PCA
Dmitry Storcheus
Courant Institute (Currently Google)

Finding a Sparse Vector in a Subspace: Linear Sparsity Using Alternating Directions
Qing Qu
Columbia University

Google is the proud sponsor of the Spotlight Talk awards.

Scientific Organizing Committee

Naoki Abe, IBM Research
Corinna Cortes, Google
Patrick Haffner, Interactions Corporation
Tony Jebara, Columbia University
John Langford, Microsoft Research
Mehryar Mohri, Courant Institute of Mathematical Sciences, New York University
Gunnar Rätsch, Memorial Sloan Kettering Cancer Center
Greg Recine, The New York Academy of Sciences
Robert Schapire, Microsoft Research and Princeton University
Di Xu, American Express

Mission Partner support for the Frontiers of Science program provided by Pfizer


* Presentation titles and times are subject to change.

March 13, 2015

9:00 AM

Registration, Continental Breakfast, and Poster Set-up

10:00 AM

Welcome Remarks
Greg Recine, PhD, The New York Academy of Sciences

10:10 AM

Keynote Address 1
Sum-Product Networks: Deep Models with Tractable Inference
Pedro Domingos, University of Washington

10:50 AM

Audience Q&A

Spotlight Talks: Session 1

A series of short, early career investigator presentations across a variety of topics at the frontier of Machine Learning science. Selected from Poster Abstracts.

11:05 AM

Multi-Class Deep Boosting
Vitaly Kuznetsov, MS, Courant Institute of Mathematical Sciences, New York University

11:10 AM

Large-Scale Clustering of Sentences and Patients based on Electronic Health Records
Stefan Stark, Memorial Sloan-Kettering Cancer Center

11:15 AM

Achieving All with No Parameters: Adaptive NormalHedge
Haipeng Luo, BS, Princeton University

11:20 AM

Learning With Deep Cascades
Giulia DeSalvo, BA, Courant Institute of Mathematical Sciences, New York University

11:25 AM

Anchored Factor Analysis
Yonatan Halpern, New York University

11:30 AM

Networking Break and Poster Viewing

12:20 PM

Keynote Address 2
Overcoming Computational Hardness by Non-proper Learning
Elad Hazan, PhD, Princeton University

1:00 PM

Audience Q&A

1:15 PM

Networking Lunch and Poster Viewing

Spotlight Talks: Session 2

2:30 PM

Regret Minimization in Posted Price Auctions against Strategic Buyers
Andres Muñoz Medina, Courant Institute of Mathematical Sciences, New York University

2:35 PM

Probabilistic Bayesian Analysis of Genetic Associations with Clinical Features in Cancer
Melanie Pradier, MSc, Memorial Sloan-Kettering Cancer Center

2:40 PM

Finding a Sparse Vector in a Subspace: Linear Sparsity Using Alternating Directions
Qing Qu, MS, Columbia University

2:45 PM

Approximate Kernel Methods for Speech Recognition and Computer Vision
Avner May, MS, Columbia University

2:50 PM

Theoretical Foundations for Learning Kernels in Supervised Kernel PCA
Dmitry Storcheus, MSc, Courant Institute of Mathematical Sciences, New York University

2:55 PM

Keynote Address 3
Deep Generative Models
Yoshua Bengio, PhD, University of Montreal

3:35 PM

Audience Q&A

3:50 PM

Networking Break

Career Development Presentations

4:15 PM

Machine Learning at American Express
Alexander Statnikov, VP, Digital Modeling and Machine Learning, American Express

4:25 PM

Dataminr - Real-time Information Detection
Mike Myer (CPO & VP, Engineering) and Julian Pan (Chief Data Scientist), Dataminr

4:35 PM

Award Presentation

Best Early Career Investigator Research Presentation
The Scientific Organizing Committee will announce the winner(s), selected from Spotlight Talk presentations made throughout the day.

Google is the proud sponsor of the early career investigator Spotlight Talk awards.

4:45 PM

Closing Remarks

4:50 PM

Networking Reception

Company representatives will be available at information tables for students to speak with during this time. Refreshments will also be available.

5:50 PM

Symposium Adjourns


Keynote Speakers

Yoshua Bengio, PhD

Université de Montréal

Yoshua Bengio received a PhD in Computer Science from McGill University, Canada, in 1991. After two postdoctoral years, one at MIT with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun and Vladimir Vapnik, he became a professor in the Department of Computer Science and Operations Research at Université de Montréal. He is the author of two books and around 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing, and manifold learning. He is among the most cited Canadian computer scientists and is or has been an associate editor of the top journals in machine learning and neural networks. He has held a Canada Research Chair in Statistical Learning Algorithms since 2000 and an NSERC Industrial Chair since 2006, has been a Fellow of the Canadian Institute for Advanced Research (CIFAR) since 2005, and has co-directed the CIFAR NCAP program since 2014. He is on the board of the NIPS Foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the new International Conference on Learning Representations. His current interests are centered on a quest for AI through machine learning and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning.

Pedro Domingos, PhD

University of Washington

Pedro Domingos is Professor of Computer Science and Engineering at the University of Washington. His research interests are in machine learning, artificial intelligence, and data science. He received a PhD in Information and Computer Science from the University of California at Irvine and is the author or co-author of over 200 technical publications. He is a winner of the SIGKDD Innovation Award, the highest honor in data science. He is an AAAI Fellow and has received a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at several leading conferences. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009 and has served on numerous program committees.

Elad Hazan, PhD

Princeton University

Elad Hazan joined the faculty at Princeton in 2015 from the Technion, where he had been an associate professor of operations research. His research focuses on the design and analysis of algorithms for basic problems in machine learning and optimization. He is a two-time recipient of the IBM Goldberg Best Paper Award, in 2012 for contributions to sublinear time algorithms for machine learning and in 2008 for decision making under uncertainty; he has also received a European Research Council grant, a Marie Curie fellowship, and two Google Research Awards. He serves on the steering committee of the Association for Computational Learning and is a program co-chair for COLT 2015.


Scientific Organizing Committee

Naoki Abe

IBM Research

Naoki Abe has been a Research Staff Member in the Data Analytics Research group since May 2001 and is engaged in research in machine learning, data mining, and their applications to problems in business analytics. Abe obtained his BS and MS in computer science from MIT in 1984 and a PhD in Computer and Information Sciences from the University of Pennsylvania in 1989. From 1984 to 1985, he worked as a researcher at the IBM T.J. Watson Research Center. From 1989 to 1990, he was a postdoctoral researcher at U.C. Santa Cruz, where he conducted research in computational learning theory. During the 1990s, he was with NEC research laboratories in Japan, where he worked on machine learning and its applications to various areas, including data mining, e-commerce, natural language processing, and bioinformatics. During this period he was also involved with the MITI-sponsored Real World Computing project and the MEXT-sponsored Discovery Science project. From 1998 to 2000, he was an adjunct Associate Professor at the Tokyo Institute of Technology. Naoki has served on the program committees of ICML, ALT, and COLT, and is currently on the editorial board of the Journal of Machine Learning Research.

Corinna Cortes

Google

Corinna Cortes is the Head of Google Research, NY, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs - Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna's research is well known in particular for her contributions to the theoretical foundations of support vector machines (SVMs), for which she and Vladimir Vapnik jointly received the 2008 Paris Kanellakis Theory and Practice Award, and for her work on data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in 2000. Corinna received her MS degree in Physics from the University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her PhD in computer science from the University of Rochester in 1993.

Corinna is also a competitive runner.

Patrick Haffner

Interactions Corporation

Patrick Haffner is a Lead Scientist at Interactions, a company that combines machine learning and human intelligence to build better virtual assistants. He was with AT&T Research until 2014. In the 1990s, he pioneered neural networks for speech and image recognition, resulting in the first industrial deployment of deep learning by AT&T in 1996 (with Yann LeCun). He was also one of the lead inventors of the DjVu compression technology. Since 2000, his main focus has been large-scale algorithms for speech, natural language, and network applications.

Tony Jebara

Columbia University

Tony Jebara is Associate Professor of Computer Science at Columbia University. He chairs the Center on Foundations of Data Science and directs the Columbia Machine Learning Laboratory. In 2004, Jebara received the CAREER award from the National Science Foundation. His work has been recognized with a best paper award at the 26th International Conference on Machine Learning, a best student paper award at the 20th International Conference on Machine Learning, and an outstanding contribution award from the Pattern Recognition Society in 2001. He obtained his PhD from MIT in 2002.

John Langford

Microsoft Research

John Langford is a Doctor of Learning at Microsoft Research. His work includes research in machine learning, game theory, steganography, and CAPTCHAs. He was previously a Research Associate Professor at the Toyota Technological Institute in Chicago. He has also worked at IBM's Watson Research Center in Yorktown, NY, under the Goldstine Fellowship. He earned a PhD in computer science from Carnegie Mellon University in 2002 and completed a Physics/Computer Science double major at Caltech in 1997.

Mehryar Mohri

Courant Institute of Mathematical Sciences, New York University

Mehryar Mohri spent about 10 years at AT&T Bell Labs and AT&T Labs–Research (1995–2004), where, in the last four years, he served as Head of the Speech Algorithms Department and as a Technology Leader, overseeing research projects in machine learning, text and speech processing, and the design of general algorithms. He joined the Courant Institute in the summer of 2004 as a Professor of Computer Science. In 2004, he was a Visiting Professor at Google Research for a full semester, where he worked on several machine learning and algorithmic research projects. Since then, he has continued to work at Google as a Research Consultant. His current topics of interest are machine learning, computational biology, and text and speech processing.

Gunnar Rätsch

Memorial Sloan-Kettering Cancer Center

Data scientist Gunnar Rätsch develops and applies advanced data analysis and modeling techniques to data from genomics, high-throughput sequencing, clinical records and images.

He earned his PhD at the German National Laboratory for Information Technology under the supervision of Klaus-Robert Müller. His thesis was on iterative algorithms related to Boosting and Support Vector Machines. He was a postdoc with Bob Williamson and Bernhard Schölkopf. Gunnar Rätsch received the Max Planck Young and Independent Investigator award and led the group on Machine Learning in Genome Biology at the Friedrich Miescher Laboratory in Tübingen (2005-2011). In 2012 he joined Memorial Sloan-Kettering Cancer Center as Associate Faculty.

The Rätsch laboratory advances computational methods for analyzing the big data common in the biomedical sciences. The group uses, develops, and integrates ideas from machine learning, operations research, sequence analysis, statistical genetics, text mining, and computer vision with the aim of discovering relationships in complex biomedical data.

Robert Schapire

Microsoft Research and Princeton University

Robert Schapire received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short postdoc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991. Since 2002, he has been on the faculty of Princeton University, where he is currently the David M. Siegel '83 Professor in Computer Science. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanellakis Theory and Practice Award (the last two both shared with Yoav Freund). He is a Fellow of the AAAI and a member of the National Academy of Engineering. His main research interest is in theoretical and applied machine learning.

Di Xu

American Express

Di Xu is Vice President, Risk and Information Management, at American Express. Di has been with American Express since 2001 in positions of increasing responsibility in decision science, including acquisition, underwriting, fraud, and customer management modeling functions. He currently heads the global underwriting decision science team, which supports new-account underwriting and line management models for both US and international markets as well as acquisition and targeting models for the U.S. market. Di and his team are actively exploring cutting-edge machine learning research and its application in financial services. He earned a doctorate in Industrial Engineering and a Master of Science in Statistics, both from Rutgers University, and a bachelor's degree in engineering (control theory) from Shanghai Jiao Tong University.

Spotlight Speakers

Giulia DeSalvo

New York University, Courant Institute of Mathematical Sciences

Giulia DeSalvo is a third-year mathematics PhD student at NYU's Courant Institute of Mathematical Sciences, funded by the NSF. Her research interests span both the theory and applications of machine learning, including computational learning theory, decision trees, ensemble methods, and online learning. She received a BA in applied mathematics and Italian studies from UC Berkeley with Highest Honors in Applied Mathematics and Highest Distinction in General Scholarship. She has worked and lived in several countries, namely Italy, the US, Japan, France, Germany, and Switzerland; most notably, she completed a Fulbright in Italy and an NSF-funded REU in Japan. You can reach her at desalvo (at) cims (dot) nyu (dot) edu.

Yonatan Halpern

New York University

Yoni Halpern is a 4th year PhD student at New York University studying machine learning. His research interests include efficient and provable methods for learning latent variable models and applications for medical diagnosis and informatics.

Vitaly Kuznetsov

New York University, Courant Institute of Mathematical Sciences

Vitaly Kuznetsov is a PhD candidate in applied mathematics at the Courant Institute of Mathematical Sciences, advised by Professor Mehryar Mohri. Before coming to the Courant Institute, Vitaly received Bachelor's and Master's degrees in mathematics and computer science from the University of Toronto. His current research interests are the theory and applications of machine learning and a wide range of related topics, including probability theory, statistics, optimization, and algorithms. Within machine learning, his focus is on ensemble methods, structured prediction, time series, and learning theory.

Haipeng Luo

Princeton University

Haipeng Luo is a fourth-year PhD student in the Computer Science Department at Princeton University, where he works with Professor Rob Schapire. His main research interest is theoretical and applied machine learning, with a focus on adaptive and robust online learning algorithms. He received his bachelor's degree in computer science, with a double major in mathematics, from Peking University in 2011.

Avner May, MS

Columbia University

Avner May is a PhD candidate in the Computer Science Department at Columbia University, advised by Professor Michael Collins. Before his PhD, he spent two years as a software engineer at Microsoft in Redmond, WA. He graduated from Harvard University with a BA in Mathematics and a minor in Computer Science. His main research interests are in large-scale machine learning, with a focus on speech recognition and computer vision. His recent work has focused on scaling kernel methods to compete in domains where deep learning has prevailed.

Andres Muñoz Medina

Courant Institute of Mathematical Sciences

Andres Muñoz Medina is a PhD student at the Courant Institute of Mathematical Sciences in New York, advised by Mehryar Mohri. His research interests include the theory and design of algorithms for domain adaptation as well as the use of machine learning techniques for the optimization of markets. He has received the Dean's Dissertation Award at NYU as well as the Harold Grad Memorial Prize for the most promising graduate student. For the past three years Andres has also been a Software Engineer intern at Google Research, working on the efficient implementation of large-scale machine learning algorithms.

Melanie F. Pradier

University Carlos III

Melanie is a PhD student at the University Carlos III in Madrid and a member of the European "Machine Learning for Personalized Medicine" ITN network. She is currently a research visitor at Memorial Sloan Kettering Cancer Center, working on Bayesian nonparametric models for biomedical applications. Melanie studied Telecommunication Engineering at the Technical University of Madrid and obtained her MSc in Information Technology from the University of Stuttgart in 2011. Afterwards, she spent two years working in industry at the Sony Research Center in Stuttgart and Sony Corporation R&D in Tokyo. Her interests include dependent nonparametric processes, fast DP/IBP extensions, MCMC methods, variational inference, clustering, and topic modeling.

Qing Qu

Columbia University

Qing Qu is a second-year PhD student in the Electrical Engineering Department of Columbia University, working with Prof. John Wright. Before that, he obtained his bachelor's degree from Tsinghua University in July 2011 and his master's degree from Johns Hopkins University in December 2012 with Prof. Trac Tran, both in Electrical Engineering. From 2012 to 2013, he interned at the U.S. Army Research Laboratory with Dr. Nasser Nasrabadi. His current research focuses on developing practical algorithms and provable guarantees for signal processing and machine learning problems with low intrinsic data structure.

Stefan Stark

Memorial Sloan-Kettering Cancer Center

Stefan Stark graduated from NYU in 2014 with a bachelor's degree in Mathematics. There he worked with Theoretical Chemistry Prof. Mark E. Tuckerman on free energy calculation algorithms and Computational Biologist Prof. Richard Bonneau on gene network inference. He is now an analyst at the Rätsch Lab of Memorial Sloan Kettering Cancer Center's Computational Biology department where he works with genomic and clinical text data sets.

Dmitry Storcheus, MSc

Google Research

Dmitry is an engineer at Google Research NY. He specializes in the research and implementation of scalable machine learning algorithms. He received his MSc in Mathematics from the Courant Institute at NYU, where he wrote a thesis on Supervised Kernel PCA with advisor Mehryar Mohri. His recent research contributions include deriving generalization guarantees for supervised dimensionality reduction, and he is currently working on implementing matrix approximation algorithms.


Keynote Presentations

Sum-Product Networks: Deep Models with Tractable Inference
Pedro Domingos, PhD, University of Washington, Seattle, Washington, United States

Big data makes it possible in principle to learn very rich probabilistic models, but inference in them is prohibitively expensive. Since inference is typically a subroutine of learning, in practice learning such models is very hard. Sum-product networks (SPNs) are a new model class that squares this circle by providing maximum flexibility while guaranteeing tractability. In contrast to Bayesian networks and Markov random fields, SPNs can remain tractable even in the absence of conditional independence. SPNs are defined recursively: an SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. It's easy to show that the partition function, all marginals and all conditional MAP states of an SPN can be computed in time linear in its size. SPNs have most tractable distributions as special cases, including hierarchical mixture models, thin junction trees, and nonrecursive probabilistic context-free grammars. I will present generative and discriminative algorithms for learning SPN weights, and an algorithm for learning SPN structure. SPNs have achieved impressive results in a wide variety of domains, including object recognition, image completion, collaborative filtering, and click prediction. Our algorithms can easily learn SPNs with many layers of latent variables, making them arguably the most powerful type of deep learning to date. (Joint work with Rob Gens and Hoifung Poon.)
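The recursive definition above translates almost directly into code. Below is a minimal Python sketch, assuming Bernoulli leaves and a tree-shaped network; the node classes `Leaf`, `Product`, `Sum` and the function `value` are hypothetical names for illustration, not code from the talk. Summing out a variable amounts to letting its leaf return 1, so the partition function and any marginal each come from one bottom-up pass. Weight and structure learning, and MAP inference, are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    """Univariate Bernoulli distribution over one binary variable."""
    var: str
    p_true: float  # P(var = 1)


@dataclass
class Product:
    """Product node; children must be over disjoint sets of variables."""
    children: List["Node"]


@dataclass
class Sum:
    """Weighted sum node; children must be over the same variables."""
    children: List["Node"]
    weights: List[float]  # non-negative and summing to 1


Node = Union[Leaf, Product, Sum]


def value(node: Node, evidence: Dict[str, Optional[int]]) -> float:
    """Evaluate the SPN bottom-up for the given evidence.

    A variable absent from `evidence` (or mapped to None) is summed out by
    letting its leaf return 1, so the same pass yields joint probabilities,
    marginals, and (with no evidence) the partition function. In a tree each
    node is visited once; a DAG-shaped SPN would need memoization to keep the
    cost linear in its size.
    """
    if isinstance(node, Leaf):
        x = evidence.get(node.var)
        if x is None:
            return 1.0
        return node.p_true if x == 1 else 1.0 - node.p_true
    if isinstance(node, Product):
        return math.prod(value(child, evidence) for child in node.children)
    return sum(w * value(child, evidence) for w, child in zip(node.weights, node.children))


# Example: a two-component mixture of fully factorized distributions over X1, X2.
spn = Sum(
    children=[
        Product([Leaf("X1", 0.9), Leaf("X2", 0.8)]),
        Product([Leaf("X1", 0.2), Leaf("X2", 0.3)]),
    ],
    weights=[0.6, 0.4],
)

print(value(spn, {}))                  # partition function
print(value(spn, {"X1": 1}))           # marginal P(X1 = 1), about 0.62
print(value(spn, {"X1": 1, "X2": 1}))  # joint P(X1 = 1, X2 = 1), about 0.456
```

The example mixture is exactly the kind of "weighted sum of products over disjoint variables" the definition describes; deeper SPNs just nest these nodes further.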

Overcoming Computational Hardness by Agnostic Non-Proper Learning
Elad Hazan, PhD, Princeton University Department of Computer Science and Microsoft Research, Herzliya, Israel

Numerous learning problems are naturally modelled by NP-hard formulations. Examples include predicting outcomes in a sports tournament, learning the preferences of users in media recommendation systems, and learning the optimal ranking of web search results. Recent literature overcomes the computational hardness of these and other problems through convex optimization and regret minimization techniques, giving rise to agnostic non-proper learning algorithms that are both efficient and come with provable generalization guarantees. We describe a recent example: the problem of learning from low-rank missing data arising in recommendation systems.
Coauthors: Roi Livni and Yishay Mansour, Tel Aviv University Department of Computer Science and Microsoft Research, Herzliya, Israel.

Deep Generative Models
Yoshua Bengio, PhD, Université de Montréal, Quebec, Canada

Boltzmann machines and their variants (restricted or deep) have long been the dominant generative neural network models, and they are appealing, among other things, because of their relative biological plausibility (compared with back-propagation, say). We start this presentation by discussing some of the difficulties we encountered in training them, and undirected graphical models in general, and ask whether there exist credible alternatives that avoid the issues with the partition function gradient and with mixing between modes in MCMC methods. We review advances of recent years in training deep unsupervised models that capture the data distribution, all related to auto-encoders, and that avoid the partition function and MCMC issues. In particular, recent theoretical and empirical work on denoising auto-encoders can be extended to train deep generative models with latent variables. The presentation will end with an introduction to an ongoing exploration of a framework that is meant to be more biologically plausible and to serve as an alternative to both Boltzmann machines and back-propagation.
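For readers unfamiliar with denoising auto-encoders, the sketch below shows only their basic training criterion: corrupt the input, then train the network to reconstruct the clean input. It is an illustration under stated assumptions (masking noise, one sigmoid hidden layer, a linear decoder, squared error, full-batch gradient descent in NumPy; `train_dae` and its parameters are hypothetical names) and does not implement the deep generative extension or the sampling procedure discussed in the talk.

```python
import numpy as np


def train_dae(X: np.ndarray, n_hidden: int = 8, corruption: float = 0.3,
              lr: float = 0.1, epochs: int = 200, seed: int = 0):
    """Train a one-hidden-layer denoising auto-encoder by full-batch gradient descent.

    Each epoch, a random fraction `corruption` of input entries is set to zero
    (masking noise); the network encodes the corrupted input and is trained to
    reconstruct the clean input under squared error.
    Returns the parameters and the per-epoch training loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        keep = rng.random(X.shape) >= corruption        # True where the input survives
        X_tilde = X * keep                              # corrupted input
        H = 1.0 / (1.0 + np.exp(-(X_tilde @ W1 + b1)))  # sigmoid encoder
        X_hat = H @ W2 + b2                             # linear decoder
        err = X_hat - X                                 # error against the clean input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagate the loss: mean over examples of the summed squared error.
        g_out = 2.0 * err / n
        g_W2, g_b2 = H.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * H * (1.0 - H)          # through the sigmoid
        g_W1, g_b1 = X_tilde.T @ g_pre, g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


# Usage on toy binary data drawn from two prototypes.
rng = np.random.default_rng(1)
protos = np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]], dtype=float)
X = protos[rng.integers(0, 2, size=200)]
params, losses = train_dae(X)
print(losses[0], losses[-1])
```

Because the target is the uncorrupted input, the hidden layer is pushed to capture dependencies among inputs (here, which prototype a row came from) rather than to copy its input.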

Spotlight Presentations

Achieving All with No Parameters: Adaptive NormalHedge
Haipeng Luo, Department of Computer Science, Princeton University

The problem of prediction with expert advice was pioneered about two decades ago [1, 2, 3, 4], yet theory and practice continue to pose new challenges, and there have been various advances on different aspects of the problem in recent years. In this work, we address the challenging problem of competing with any unknown competitor given zero prior information, even when the competitor changes over time. Our main contribution is a novel expert algorithm that is truly parameter-free and adaptive to the environment.
Roughly speaking, in the expert problem, a player tries to lose as little as possible by cleverly spreading a fixed amount of money to bet on a set of experts each day. His goal is to have small regret, i.e., a total loss that is not much worse than that of any single expert or, more generally, of any fixed and unknown convex combination of experts that we want to compare to. When this competitor is known, the well-known exponential weights algorithm already gives optimal results [2, 5]. However, when the competitor is unknown beforehand, existing algorithms fail to do the job, let alone when the unknown competitor is allowed to vary over time.
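To make the setting concrete, here is a minimal Python sketch of the exponential weights (Hedge) algorithm referenced above, for losses in [0, 1]. The function names `hedge` and `regret` are hypothetical. Note that the learning rate `eta` must be fixed in advance: the standard regret bound against the best single expert is ln(N)/eta + eta*T/8, and the tuning eta = sqrt(8 ln N / T) that balances it requires knowing the horizon T. Removing this kind of tuning is what a parameter-free method aims for.

```python
import math
from typing import List, Sequence


def hedge(loss_rounds: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Exponential weights (Hedge) over N experts with per-round losses in [0, 1].

    Returns the probability vector played in each round; expert i gets
    weight proportional to exp(-eta * cumulative loss of expert i so far).
    """
    n = len(loss_rounds[0])
    cum_loss = [0.0] * n
    plays: List[List[float]] = []
    for losses in loss_rounds:
        # Subtracting the minimum rescales all weights equally and avoids underflow.
        m = min(cum_loss)
        w = [math.exp(-eta * (c - m)) for c in cum_loss]
        z = sum(w)
        plays.append([wi / z for wi in w])
        for i, loss in enumerate(losses):
            cum_loss[i] += loss
    return plays


def regret(loss_rounds: Sequence[Sequence[float]],
           plays: Sequence[Sequence[float]]) -> float:
    """Player's expected total loss minus the total loss of the best single expert."""
    player = sum(sum(pi * li for pi, li in zip(p, losses))
                 for p, losses in zip(plays, loss_rounds))
    n = len(loss_rounds[0])
    best = min(sum(losses[i] for losses in loss_rounds) for i in range(n))
    return player - best


# Usage: with T rounds and N experts, eta = sqrt(8 ln N / T) balances the
# bound ln(N)/eta + eta*T/8, giving regret at most sqrt(T ln N / 2).
T, N = 100, 3
rounds = [[0.0, 0.5, 1.0] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
print(regret(rounds, hedge(rounds, eta)))
```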
We address this problem by proposing an improved version of the NormalHedge.DT algorithm [6], called adaptive NormalHedge (or AdaNormalHedge for short). On one hand, this new algorithm is completely parameter-free and able to compete with any convex combination of experts with a regret in terms of the relative entropy of the prior and the competitor. On the other hand, it ensures a new regret bound in terms of the cumulative magnitude of the instantaneous regrets, which is always at most the bound for NormalHedge.DT. More importantly, this new form of regret implies 1) a small regret when the loss of the competitor is small and 2) an almost constant regret when the losses are stochastically generated. This resolves an open problem proposed by [7] and [8]. In fact, our results are even better and more general.
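The weighting rule behind a parameter-free method of this kind can be sketched as follows. The abstract does not state the potential function; the form used below, Φ(R, C) = exp([R]_+^2 / (3C)) with weight w(R, C) = (Φ(R+1, C+1) − Φ(R−1, C+1)) / 2, is our reading of the published AdaNormalHedge paper and should be treated as an assumption, as should the function names. Each expert's weight depends only on its cumulative instantaneous regret R and the cumulative magnitude C of those regrets, so there is no learning rate to tune.

```python
import math
from typing import List, Sequence


def _phi(r: float, c: float) -> float:
    # Assumed potential Phi(R, C) = exp([R]_+^2 / (3C)), where [R]_+ = max(R, 0).
    return math.exp(max(r, 0.0) ** 2 / (3.0 * c))


def _weight(r: float, c: float) -> float:
    # Assumed weight w(R, C) = (Phi(R + 1, C + 1) - Phi(R - 1, C + 1)) / 2.
    # Phi is nondecreasing in R, so this is never negative; it is 0 once R <= -1.
    return 0.5 * (_phi(r + 1.0, c + 1.0) - _phi(r - 1.0, c + 1.0))


def ada_normal_hedge(loss_rounds: Sequence[Sequence[float]],
                     prior: Sequence[float]) -> List[List[float]]:
    """Parameter-free expert weighting; losses in [0, 1], prior sums to 1.

    Each expert i tracks R_i, its cumulative instantaneous regret (player loss
    minus expert loss), and C_i, the cumulative magnitude of those regrets.
    exp() can overflow on very long runs (the exponent can reach about
    (C + 1) / 3); a production version would work in log space.
    """
    n = len(prior)
    R = [0.0] * n
    C = [0.0] * n
    plays: List[List[float]] = []
    for losses in loss_rounds:
        raw = [prior[i] * _weight(R[i], C[i]) for i in range(n)]
        z = sum(raw)
        # If every weight has vanished, fall back to the prior.
        p = [x / z for x in raw] if z > 0 else list(prior)
        plays.append(p)
        player_loss = sum(pi * li for pi, li in zip(p, losses))
        for i in range(n):
            r = player_loss - losses[i]
            R[i] += r
            C[i] += abs(r)
    return plays


# Usage: expert 0 never loses and expert 1 always does; without any tuning,
# the player's total expected loss (its regret to expert 0) stays small.
rounds = [[0.0, 1.0] for _ in range(50)]
plays = ada_normal_hedge(rounds, [0.5, 0.5])
print(sum(p[1] for p in plays))
```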
We then extend the results to the so-called sleeping expert setting and provide two applications to illustrate the power of AdaNormalHedge: 1) competing with time-varying unknown competitors and 2) predicting almost as well as the best pruning tree. Our results on these applications significantly improve previous work from different aspects, and a special case of the first application resolves another open problem on whether one can simultaneously achieve optimal shifting regret for both adversarial and stochastic losses [9].
Coauthor: Robert E. Schapire, Department of Computer Science, Princeton University and Microsoft Research, New York City.
[1] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.
[2] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.
[3] Nicolo Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, May 1997.
[4] V. G. Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2):153–173, April 1998.
[5] Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.
[6] Haipeng Luo and Robert E. Schapire. A drifting-games analysis for online learning and applications to boosting. In Advances in Neural Information Processing Systems 27, 2014.
[7] Kamalika Chaudhuri, Yoav Freund, and Daniel Hsu. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems 22, 2009.
[8] Alexey Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010.
[9] Manfred K. Warmuth and Wouter M. Koolen. Open problem: Shifting experts on easy data. In Proceedings of the 27th Annual Conference on Learning Theory, 2014.

Approximate Kernel Methods for Speech Recognition
Avner May, Department of Computer Science, Columbia University

In this work, we investigate how to scale kernel methods to large-scale problems in speech recognition and computer vision, and compare their performance with deep neural networks (DNNs). We show that on these tasks, our methods are competitive with well-engineered neural networks.
The difficulty in scaling kernel methods is that their running times are generally at least quadratic in the size of the training set. Simply computing the kernel matrix takes O(N²d) time and O(N²) space, where N is the training set size and d is the dimensionality of the data. This makes standard algorithms such as kernel SVMs intractable for large-scale problems. In this work, we investigate whether approximation techniques can help scale kernel methods. We study the acoustic modeling problem in speech recognition, as well as the digit and object recognition problems in computer vision. Our work builds on the kernel approximation method proposed by Rahimi and Recht [1], which passes random projections of the input through a cosine nonlinearity to generate feature representations. We use these features to train a multinomial logistic regression model. We leverage GPUs extensively during training, both to generate the random features and to perform model updates.
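The Rahimi and Recht construction can be illustrated with a plain-Python sketch that assumes a Gaussian (RBF) kernel with bandwidth sigma. Frequencies are drawn from N(0, σ⁻²I) and phases from U[0, 2π], so the inner product of two feature vectors approximates the kernel value. The function names and parameters are hypothetical. The sketch runs on the CPU rather than GPUs and omits the multinomial logistic regression trained on top of the features. Computing D features for N points costs O(NDd), which grows linearly rather than quadratically in N.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_rks(d: int, n_features: int, sigma: float, seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Draw frequency rows W ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi] (Gaussian kernel)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return W, b


def rks_features(x: Sequence[float], W: List[List[float]], b: List[float]) -> List[float]:
    """z(x) = sqrt(2/D) * cos(W x + b), so that z(x) . z(y) ~ k(x, y)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj) for w, bj in zip(W, b)]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact kernel exp(-||x - y||^2 / (2 sigma^2)), for comparison."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


x, y, sigma = [0.3, -1.0, 0.5], [0.1, -0.6, 0.9], 1.0
W, b = sample_rks(d=3, n_features=20000, sigma=sigma)
zx, zy = rks_features(x, W, b), rks_features(y, W, b)
approx = sum(a * c for a, c in zip(zx, zy))
exact = gaussian_kernel(x, y, sigma)
```

The approximation error shrinks roughly as 1/√D. This is why scaling to very large numbers of random features matters.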
Our contributions are as follows. First, we propose ways of incorporating ideas from multiple kernel learning (MKL) into the "Random Kitchen Sink" (RKS) features of Rahimi and Recht. Second, we show how to scale training to very large numbers of random features by training separate models on disjoint subsets of the features in parallel, and then combining the models. Third, we provide extensive experiments comparing these kernel methods with deep neural networks on two speech recognition datasets and two computer vision tasks, and show that the kernel methods match or surpass the DNNs in many cases. Lastly, we show that the representations learned by DNNs appear to be complementary to the RKS features, as combining these models achieves better performance than any single model.
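The abstract does not specify how the separately trained models are combined. The minimal sketch below assumes that each sub-model is trained on a contiguous block of feature indices and outputs class probabilities. It combines them with a renormalized geometric mean, a product-of-experts rule; plain averaging of probabilities would be an equally plausible choice. All names are hypothetical.

```python
import math
from typing import List, Sequence


def split_feature_blocks(n_features: int, n_blocks: int) -> List[range]:
    """Partition feature indices 0..n_features-1 into disjoint contiguous blocks."""
    size, extra = divmod(n_features, n_blocks)
    blocks, start = [], 0
    for k in range(n_blocks):
        stop = start + size + (1 if k < extra else 0)  # spread the remainder over the first blocks
        blocks.append(range(start, stop))
        start = stop
    return blocks


def combine_geometric(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Renormalized geometric mean of per-model class probabilities (assumed combination rule)."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg_log = [sum(math.log(p[c]) for p in model_probs) / n_models for c in range(n_classes)]
    shift = max(avg_log)  # subtract the max log-score for numerical stability
    unnorm = [math.exp(v - shift) for v in avg_log]
    total = sum(unnorm)
    return [u / total for u in unnorm]


blocks = split_feature_blocks(10, 3)
combined = combine_geometric([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

Each sub-model sees only its own block of features, so the sub-models can be trained independently and in parallel. Only their per-class outputs need to be brought together at prediction time.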
Our experiments are as follows. For the speech recognition problem, we worked with the IARPA Babel Program Cantonese (IARPA-babel101-v0.4c) and Bengali (IARPA-babel103b-v0.4b) limited language packs. Each pack contains a 20-hour training set and a 20-hour test set. DNN models often attain lower perplexity ("Perp") and higher frame-level state accuracy ("Accuracy"). However, the kernel models are quite competitive on these metrics, and sometimes beat the DNNs in token error rate ("TER"), which is measured after speech recognition is performed. See the table below for details.
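The abstract names these metrics without defining them. The following sketch uses common conventions, which may differ from the authors' exact definitions. Perplexity is taken as the exponential of the average negative log-probability assigned to the correct state per frame. Frame accuracy is the fraction of frames whose most probable state is correct. Token error rate is the Levenshtein distance between the reference and hypothesis token sequences, divided by the reference length. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_metrics(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (perplexity, accuracy) of frame-level state posteriors against true states."""
    nll = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
    correct = sum(max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels))
    return math.exp(nll), correct / len(labels)


def token_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via dynamic-programming edit distance."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # delete r
                         cur[j - 1] + 1,           # insert h
                         prev[j - 1] + (r != h))   # substitute (or match at no cost)
        prev = cur
    return prev[-1] / len(ref)


perp, acc = frame_metrics([[0.8, 0.2], [0.6, 0.4]], [0, 1])
ter = token_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])  # one substitution, one deletion
```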
For the computer vision problems, we worked with the MNIST-8M dataset for digit recognition and the CIFAR-10 dataset for object recognition. We observe comparable performance between the two approaches: kernel methods outperform DNNs on CIFAR-10, while DNNs win on MNIST-8M. See the table below for detailed results. It is important to acknowledge that convolutional neural networks (CNNs) can significantly outperform DNNs on these tasks, so these results are not state of the art. However, they do indicate comparable performance between DNNs and kernel methods in a challenging domain.
These results call into question the commonly held assumption that deep architectures are necessary to achieve state-of-the-art results in speech recognition and computer vision. They open the door to exploring new kernel-based methods for speech recognition and computer vision, as well as for other domains where deep architectures are accepted as the de facto standard.
Coauthors: Zhiyun Lu¹†, Avner May²†*, Kuan Liu¹‡, Alireza Bagheri Garakani¹‡, Dong Guo¹‡, Aurélien Bellet⁴‡, Linxi Fan²‡, Michael Collins², Brian Kingsbury³, Michael Picheny³, and Fei Sha¹

1 Department of Computer Science, University of Southern California, Los Angeles, CA
2 Department of Computer Science, Columbia University, New York, NY
3 IBM T.J. Watson Research Center, Yorktown Heights, NY
4 LTCI UMR 5141, Télécom ParisTech & CNRS, France
† and ‡ denote shared first and shared second authorship, respectively.

Probabilistic Bayesian Analysis of Genetic Associations with Clinical Features in Cancer
Melanie F. Pradier, University Carlos III in Madrid and Memorial Sloan-Kettering Cancer Center

Understanding key genotype-phenotype relationships remains one of the central challenges in biomedical research [1]. Such studies make it possible to identify genetic risk factors in patients and to improve current clinical diagnosis. They also give more insight into the diseases at hand, which may be valuable for marker discovery or treatment personalization. This work addresses the task of finding associations between genetic variants and clinical features in cancer. The features considered are extracted directly from the patients' Electronic Health Records (EHR). To the best of our knowledge, no previous work has considered such high-level features.
We first present a classical approach that finds pairwise associations using Linear Mixed Models (LMMs) [2]. Such models are often used because they can handle confounding effects, such as population structure, which can cause false-positive associations if ignored. Known covariates such as age, gender, or cancer type are accounted for as structured noise in the model. We estimate additional hidden confounders using PANAMA (Probabilistic ANAlysis of GenoMic Data), a statistical model often used in expression quantitative trait loci (eQTL) studies [3]. This model combines a Gaussian Process Latent Variable Model of the phenotypes with an LMM, iteratively treating the inferred latent projections as either noise or signal.
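The abstract gives no equations. The following is one generic way to write a single-variant LMM of the kind described, and the notation is ours, not the authors'. It assumes one clinical feature y (across N patients) is tested against one genetic variant g. C holds the known covariates, treated as structured noise, and H holds hidden factors such as those inferred by PANAMA. The variance parameters θ = (σ_c², σ_h², σ_e²) are re-estimated under each hypothesis.

```latex
\begin{aligned}
\mathbf{y} &= \mu\,\mathbf{1} + \beta\,\mathbf{g} + \mathbf{u} + \boldsymbol{\varepsilon},\\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_c^2\,\mathbf{C}\mathbf{C}^{\top} + \sigma_h^2\,\mathbf{H}\mathbf{H}^{\top}\right),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_e^2\,\mathbf{I}_N\right),\\
\Lambda &= 2\left[\log p(\mathbf{y} \mid \hat{\mu}, \hat{\beta}, \hat{\theta}) - \log p(\mathbf{y} \mid \hat{\mu}_0, \beta = 0, \hat{\theta}_0)\right]
\;\dot\sim\; \chi^2_1 \quad \text{under } H_0\colon \beta = 0.
\end{aligned}
```

Repeating this likelihood-ratio test for every pair of variant and clinical feature gives the pairwise association scan.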
The first approach considers neither epistasis, i.e., complex interactions between genetic variants, nor pleiotropy, i.e., multiple traits being influenced by the same mutation. Moreover, complex diseases such as breast cancer often present very heterogeneous phenotypes, which may cause some associations to remain hidden [4]. We therefore propose an alternative approach in which multiple clinical features and multiple genetic markers are tested jointly. Two infinite mixture models are used to identify genetic and clinical patterns independently [5]. The LMM approach described above can then be applied with the mixture assignment variables as input. This accounts for groups of clinical features and groups of genetic variants while identifying sub-phenotypes in the population.
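Infinite mixture models let the number of groups grow with the data. The sketch below illustrates only the simpler Dirichlet-process prior behind such models, written as a Chinese restaurant process. It does not implement the nested partition models of [5], and it samples from the prior rather than fitting to clinical or genetic data. The concentration parameter alpha and all function names are hypothetical. The one-hot encoding shows how the resulting assignment variables could be turned into inputs for an LMM.

```python
import random
from typing import List


def sample_crp(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw cluster labels for n items from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    labels: List[int] = []
    counts: List[int] = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n):
        # Existing cluster k has weight counts[k]; opening a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels


def one_hot(labels: List[int]) -> List[List[int]]:
    """Encode cluster labels as indicator columns (e.g., as inputs to an LMM)."""
    n_clusters = max(labels) + 1
    return [[1 if lab == k else 0 for k in range(n_clusters)] for lab in labels]


labels = sample_crp(200, alpha=1.0)
Z = one_hot(labels)
```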
The LMM also assumes that the features are Gaussian-distributed, an assumption that does not hold for most clinical features. A common practice is to apply a rank transformation that makes the features Gaussian-distributed, despite the resulting information loss. We additionally propose an alternative approach that uses a joint Bayesian partition model to generate both the discrete clinical features and the genetic information. Bayesian modeling has already proved useful in epistasis [6], pleiotropy [6,7], and sub-phenotyping [8,9] applications. Our model combines ideas from these previous contributions. It is the first model to handle clinical text data and genetic information together, and it captures interactions among multiple markers beyond additive effects.
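The abstract does not say which rank transformation is meant. One common choice is the rank-based inverse normal transform sketched below. It replaces each value by the standard-normal quantile of its rank, and tied values share their average rank. The offset parameter c is an assumption: c = 0.5 and Blom's c = 3/8 are both in use. The example also makes the information loss concrete, since only the ordering of the values survives, not their spacing.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normal(values: Sequence[float], c: float = 0.5) -> List[float]:
    """Map values to N(0, 1) quantiles of their tie-averaged ranks, using (r - c) / (n - 2c + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


z = rank_inverse_normal([3.2, 0.1, 7.5, 0.1, 2.0])
```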
Coauthors: Fernando Perez-Cruz, University Carlos III in Madrid; Julia E. Vogt, Stefan Stark, and Gunnar Rätsch, Memorial Sloan-Kettering Cancer Center
[1] M. I. McCarthy, G. R. Abecasis, L. R. Cardon, D. B. Goldstein, J. Little, J. P. A. Ioannidis, and J. N. Hirschhorn. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet, vol. 9, no. 5, pp. 356–369, May 2008.
[2] C. Lippert, F. P. Casale, B. Rakitsch, and O. Stegle. LIMIX: genetic analysis of multiple traits. bioRxiv, 2014.
[3] N. Fusi, O. Stegle, and N. D. Lawrence. Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput Biol, vol. 8, no. 1, p. e1002330, Jan. 2012.
[4] M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, and J. H. Moore. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet., vol. 69, no. 1, pp. 138–147, Jul. 2001.
[5] A. Rodriguez and K. Ghosh. Nested partition models. Jack Baskin School of Engineering, 2009.
[6] Y. Zhang and J. S. Liu. Bayesian inference of epistatic interactions in case-control studies. Nature Genetics, vol. 39, no. 9, pp. 1167–1173, Sep. 2007.
[7] W. Zhang, J. Zhu, E. E. Schadt, and J. S. Liu. A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules. PLoS Comput Biol, vol. 6, no. 1, Jan. 2010.
[8] D. Warde-Farley, M. Brudno, Q. Morris, and A. Goldenberg. Mixture model for subphenotyping in GWAS. In Pac. Symp. Biocomput., 2012, vol. 17, pp. 363–374.
[9] L. Parts, O. Stegle, J. Winn, and R. Durbin. Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genetics, vol. 7, no. 1, p. e1001276, Jan. 2011.

Learning With Deep Cascades
Giulia DeSalvo, Courant Institute of Mathematical Sciences, New York University

Anchored Factor Analysis
Yonatan Halpern, New York University

On-line Learning Approach to Ensemble Methods for Structured Prediction
Vitaly Kuznetsov, Courant Institute of Mathematical Sciences, New York University

Regret Minimization in Posted Price Auctions against Strategic Buyers
Andrés Muñoz Medina, Courant Institute of Mathematical Sciences, New York University

Finding a Sparse Vector in a Subspace: Linear Sparsity Using Alternating Directions
Qing Qu, Columbia University

Large-Scale Clustering of Sentences and Patients Based on Electronic Health Records
Stefan Stark, Memorial Sloan-Kettering Cancer Center

Theoretical Foundations for Learning Kernels in Supervised Kernel PCA
Dmitry Storcheus, Google Research

Travel & Lodging

Our Location

The New York Academy of Sciences

7 World Trade Center
250 Greenwich Street, 40th floor
New York, NY 10007-2157

Directions to the Academy

Hotels Near 7 World Trade Center

Recommended partner hotel

Club Quarters, World Trade Center
140 Washington Street
New York, NY 10006
Phone: 212.577.1133

The New York Academy of Sciences is a member of the Club Quarters network, which offers significant savings on hotel reservations to member organizations. Located opposite Memorial Plaza on the south side of the World Trade Center, Club Quarters, World Trade Center is a short walk from the Academy.

Use Club Quarters Reservation Password NYAS to reserve your discounted accommodations online.

Other nearby hotels

Conrad New York
Millenium Hilton
Marriott Financial Center
Club Quarters, Wall Street
Eurostars Wall Street Hotel
Gild Hall, Financial District
Wall Street Inn
Ritz-Carlton New York, Battery Park