
3rd Annual Machine Learning Symposium
Friday, October 10, 2008
This is the third symposium on Machine Learning at the New York Academy of Sciences. The aim of this series of symposia is to build a community of scientists in machine learning from the NYC area's academic, government, and industrial institutions by convening and promoting the exchange of ideas in a neutral setting.
Steering Committee
- Corinna Cortes, PhD, Google, Inc.
- Tony Jebara, PhD, Columbia University
- John Langford, PhD, Yahoo Research
- Michael L. Littman, PhD, Rutgers University
- Mehryar Mohri, PhD, Courant Institute of Mathematical Sciences
- Robert Schapire, PhD, Princeton University
- David Waltz, PhD, Columbia University
Schedule
9:30 AM Coffee and Poster Set-Up
10:00 AM Opening Remarks
10:15 AM Edo Airoldi, Troyanskaya Lab, Princeton University
11:00 AM Dana Angluin, Yale University
11:45 AM Graduate Student Talks
Corinna Cortes, Mehryar Mohri, Michael Riley and Afshin Rostamizadeh, Google Research
Koby Crammer, Eyal Even-Dar, Yishay Mansour and Jennifer Wortman, University of Pennsylvania
Sina Jafarpour, Princeton University
Lihong Li and Thomas J. Walsh, Rutgers University
Piotr W. Mirowski, Yann LeCun, Deepak Madhavan and Ruben Kuzniecky, Courant Institute of Mathematical Sciences
Mehryar Mohri and Ameet Talwalkar, Courant Institute of Mathematical Sciences
Indraneel Mukherjee and Robert E. Schapire, Princeton University
Anil Raj and Chris H. Wiggins, Columbia University
Victor S. Sheng, Foster Provost and Panagiotis G. Ipeirotis, New York University
Pannagadatta K. Shivaswamy and Tony Jebara, Columbia University
12:45 PM Lunch & Poster Session
2:30 PM Tony Jebara, Columbia University
3:15 PM Robert Kleinberg, Cornell University
4:00 PM Student Award Winner Announcement & Closing Remarks
Speaker Abstracts
A Statistical Perspective on Cellular Growth
Edo Airoldi, Princeton University
Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation is not yet known, portions of it are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this talk, we will introduce statistical and computational methods to identify quantitative aspects of the regulatory mechanisms underlying cell proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes accurately predict the instantaneous growth rate of any cellular culture, robust to changing biological conditions, experimental methods, and technological platforms. Our model also predicts growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution.
We investigate the biological significance of the identified gene expression signature from multiple perspectives: by perturbing the regulatory network through the Ras/cAMP/PKA pathway, observing strong up-regulation of growth rate even in the absence of appropriate nutrients, and by discovering potential transcription factor binding sites enriched in growth-correlated genes. Most importantly, statistical and computational methods enable substantive biological insights about growth at instantaneous time scales inaccessible by direct experimental methods.
Value Injection Queries for Circuit Learning
Dana Angluin, Yale University
We survey results on algorithms to learn Boolean, analog and probabilistic circuits using value injection queries. A value injection query is a kind of enhanced membership query, in which we may control the values on interior wires, as well as on input wires of the circuit, but still may only observe the values on output wire(s) of the circuit. This type of query is inspired by the capabilities of gene suppression and gene over-expression in studying the structure of gene regulatory networks.
We consider the theoretical power of such queries in learning Boolean circuits, where we give polynomial time algorithms to learn circuits with bounded fan-in and logarithmic depth, as well as unbounded fan-in constant depth circuits over AND, OR and NOT. For analog circuits, a topological parameter, the shortcut width of the circuit, turns out to be a key to its efficient learnability. Finally, for probabilistic circuits (equivalently, Bayesian networks) we can generalize the Boolean case for 0/1 values, but we also encounter novel phenomena. This talk describes joint work with James Aspnes, Jiang Chen, David Eisenstat, Lev Reyzin, and Yinghua Wu; relevant papers may be found on the webpage of James
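As an illustration only, the value injection query described in this abstract can be sketched on a toy Boolean circuit. The circuit representation, the wire names, and the function `value_injection_query` below are hypothetical choices made for this sketch; they are not taken from the speaker's papers, and the sketch shows only what a single query observes, not any learning algorithm.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy representation: each non-input wire maps to
# (gate function, list of wires feeding that gate).
Gate = Tuple[Callable[..., bool], List[str]]


def value_injection_query(
    circuit: Dict[str, Gate],
    order: List[str],
    output: str,
    injection: Dict[str, bool],
) -> bool:
    """Return the value observed on the output wire for one query.

    Wires listed in `injection` are forced to the given value, whether
    they are input wires or interior wires. Every other wire takes the
    value computed by its gate. Only the output wire is observed.
    `order` must list the wires in topological order.
    """
    values: Dict[str, bool] = {}
    for wire in order:
        if wire in injection:
            # Injected wire: its gate (if any) is overridden.
            values[wire] = injection[wire]
        elif wire in circuit:
            fn, inputs = circuit[wire]
            values[wire] = fn(*(values[i] for i in inputs))
        else:
            raise ValueError(f"input wire {wire!r} needs an injected value")
    return values[output]


# Toy circuit: g = x1 AND x2, out = g OR x3.
toy_circuit: Dict[str, Gate] = {
    "g": (lambda a, b: a and b, ["x1", "x2"]),
    "out": (lambda a, b: a or b, ["g", "x3"]),
}
toy_order = ["x1", "x2", "x3", "g", "out"]

inputs_only = {"x1": True, "x2": False, "x3": False}
# Ordinary membership query: only input wires are set.
plain = value_injection_query(toy_circuit, toy_order, "out", inputs_only)
# Value injection: additionally force the interior wire g to True.
forced = value_injection_query(
    toy_circuit, toy_order, "out", {**inputs_only, "g": True}
)
```

With the same inputs, forcing the interior wire `g` changes what is observed on `out`, which is the extra information a value injection query provides over a plain membership query.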
Embedding, Clustering and Matching with Graphs of GPS Data
Tony Jebara, Columbia University
Many machine learning tasks can naturally be framed as problems on graphs. These tasks include dimensionality reduction, clustering and classification. I will describe matching algorithms that recover graphs from data, minimum volume embedding algorithms that recover low dimensional visualizations from graphs and new spectral algorithms that partition graphs into pieces.
At Sense Networks, we have been building graphs from spatio-temporal location data from many GPS equipped phones and devices. One example is a graph or network of places in the city that shows similarity between different locations and how active they are right now. Sense also builds a network of users showing how similar person X is to person Y by comparing their movement trails or histories. Embedding and clustering these graphs reveals interesting trends in behavior and tribes of people that are far more detailed than traditional census demographics. With machine learning algorithms applied to these human activity graphs, it becomes possible to make predictions for advertising, marketing and collaborative recommendation.
Multi-Armed Bandit Problems in Metric Spaces
Robert Kleinberg, Cornell University
Multi-armed bandit problems constitute a well-studied abstraction of the exploration/exploitation tradeoffs inherent in many sequential decision making problems. A broad range of computing applications require bandit algorithms with a large but structured set of alternatives. Often this structure takes the form of a metric: a distance function expressing the decision-maker's prior knowledge that certain alternatives will have similar payoffs. This talk focuses on two such applications, one in electronic commerce and the other in web advertising. We will show how both applications can be formulated as special cases of a general problem, the "Lipschitz multi-armed bandit problem," which generalizes the classical multi-armed bandit problem by allowing for a large (possibly uncountable) decision set comprising the points of a metric space. We will define an invariant that precisely determines the performance of the best possible algorithm for this problem in a given metric, and we will describe an algorithm that meets this bound. This is joint work with Alex Slivkins and Eli Upfal.
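For readers new to the setting, the following is a minimal sketch of a bandit problem on the metric space [0, 1] with a Lipschitz payoff function. It uses a simple baseline, a fixed uniform grid of arms with the standard UCB1 index, and is not the algorithm presented in the talk. The payoff function, the grid size, and all names here are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def ucb1_on_grid(
    payoff: Callable[[float, random.Random], float],
    num_arms: int,
    rounds: int,
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Run UCB1 over a uniform grid of points in [0, 1].

    Returns the grid points and how many times each was played.
    Assumes rounds >= num_arms so every arm is tried once.
    """
    arms = [(i + 0.5) / num_arms for i in range(num_arms)]
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for t in range(1, rounds + 1):
        if t <= num_arms:
            i = t - 1  # play each arm once before using the index
        else:
            # UCB1 index: empirical mean plus confidence radius.
            i = max(
                range(num_arms),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = payoff(arms[i], rng)
        counts[i] += 1
        sums[i] += reward
    return arms, counts


def noisy_payoff(x: float, rng: random.Random) -> float:
    """Bernoulli reward whose mean 1 - |x - 0.3| is 1-Lipschitz in x."""
    mean = 1.0 - abs(x - 0.3)
    return 1.0 if rng.random() < mean else 0.0


arms, counts = ucb1_on_grid(noisy_payoff, num_arms=10, rounds=2000,
                            rng=random.Random(0))
```

The Lipschitz condition is what makes the grid meaningful: nearby points have similar expected payoffs, so a finite sample of arms can stand in for the whole interval. A fixed grid spends its resolution evenly, whereas the abstract concerns what the best possible algorithm can achieve in a given metric.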
Posters
Hierarchical Bayesian Models of Categorical Data Annotation
Bob Carpenter, Alias-I Inc.
Sparse Regression and Model Degeneracy in fMRI
Melissa K. Carroll, Guillermo A. Cecchi, Irina Rish, Rahul Garg and A. Ravi Rao, Princeton University
Automatically Extracting Social Networks from Unstructured Text
Jonathan Chang, Jordan Boyd-Graber and David M. Blei, Princeton University
Sample Selection Bias Correction Theory
Corinna Cortes, Mehryar Mohri, Michael Riley and Afshin Rostamizadeh, Google Research
Regret Minimization with Concept Drift
Koby Crammer, Eyal Even-Dar, Yishay Mansour and Jennifer Wortman, University of Pennsylvania
Ranking Electrical Feeders of the New York Power Grid
Philip Gross, Ansaf Salleb-Aouissi, Haimonti Dutta and Albert Boulanger, Columbia University
Automatically Marking Houses in Rural Satellite Images of UN Millennium Villages in Africa
Roy Han, Columbia University
Large Margin Transformation Learning
Andrew G. Howard and Tony Jebara, Columbia University
Learning Directly from Compressed Sensed Data, Machine Learning and Compressed Sensing Benefits
Sina Jafarpour, Princeton University
Scaling Up Linear SVM Classifiers Using Confidence-Based Boosting, A Theoretical Analysis Based on Rademacher Complexity
Sina Jafarpour, Princeton University
Learning Animal Movement Models and Location Estimates Using HMMs
Berk Kapicioglu, Robert E. Schapire, Martin Wikelski and Tamara Broderick, Princeton University
High-Performance Analysis of Sequences
Pavel Kuksa, Pai-Hsi Huang and Vladimir Pavlovic, Rutgers University
Fast Feature Selection for Reinforcement-Learning-Based Spoken Dialog Management: A Case Study
Lihong Li, Jason D. Williams and Suhrid Balakrishnan, Rutgers University
Knows What It Knows: A Framework for Self-Aware Learning
Lihong Li and Thomas J. Walsh, Rutgers University
Learning Regulatory Motifs from Gene Expression Trajectories Using Graph-Regularized Partial Least Square Regression
Xuejing Li, Chris H. Wiggins, Valerie Reinke and Christina Leslie, Columbia University
Reducing Statistical Dependencies in Natural Images Using Radial Gaussianization
Siwei Lyu and Eero P. Simoncelli, University at Albany, SUNY
Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction from EEG
Piotr W. Mirowski, Yann LeCun, Deepak Madhavan and Ruben Kuzniecky, Courant Institute of Mathematical Sciences
A Dynamical Factor Graph with Latent Variables for Time Series Prediction
Piotr W. Mirowski and Yann LeCun, Courant Institute of Mathematical Sciences
Improved Bounds for the Nyström Method
Mehryar Mohri and Ameet Talwalkar, Courant Institute of Mathematical Sciences
Learning with Continuous Experts Using Drifting Games
Indraneel Mukherjee and Robert E. Schapire, Princeton University
PAC-MDP Reinforcement Learning with Bayesian Priors
Ali Nouri and Lihong Li, Rutgers University
An Information-Theoretic Derivation of Min-Cut Based Graph Partitioning
Anil Raj and Chris H. Wiggins, Columbia University
Mining Retail Data for Targeting Customers with Headroom
Madhu Shashanka and Michael Giering, Mars Inc.
Graph Embedding with Global Structure Preserving Constraints
Blake Shaw and Tony Jebara, Columbia University
Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
Victor S. Sheng, Foster Provost and Panagiotis G. Ipeirotis, New York University
A Heuristic to Enable Auditing Decisions in Travel & Entertainment Expense Management
Anshul Sheopuri, Jose Gomes, Sai Zeng, Paolina Centonze and Ioana Boier-Martin, IBM T. J. Watson Research Center
Relative Margin Machines
Pannagadatta K. Shivaswamy and Tony Jebara, Columbia University
Efficient Learning of Action Schemas and Web-Service Descriptions
Thomas J. Walsh, Rutgers University