Abstracts
Lessons from the Netflix Prize
Robert Bell, PhD, AT&T Labs Research
In October 2006, the DVD rental company Netflix released more than 100 million user ratings of movies for a competition to predict users’ ratings based on prior ratings. One allure to data analysts around the world was a $1,000,000 prize for a team achieving a ten percent reduction in root mean squared prediction error relative to Netflix’s current algorithm. The size of the data (over 17,000 movies and 480,000 users) and the nature of human-movie interactions produced many modeling challenges. After describing some of the techniques in use and advances spurred by the competition, I will offer lessons and raise some questions about building and regularizing massive prediction models, the role of statistics versus computer science in such endeavors, and prizes as a way to advance science. This is joint work with Chris Volinsky and Yehuda Koren, current and former colleagues at AT&T Labs-Research.
A New Theoretical Framework for Clustering
Avrim Blum, PhD, Carnegie Mellon University
Problems of clustering data come up in many different areas throughout science. Theoretical treatments of clustering have generally been of two types: either on algorithms for (approximately) optimizing various distance-based quantities such as k-median, k-means, and min-sum objectives, or on clustering under probabilistic "generative model"
assumptions such as mixtures of Gaussian or related distributions.
In this work we propose a new approach to analyzing the problem of clustering. Building on models used in computational learning theory, we consider the goal of approximately recovering an unknown target clustering from given pairwise similarity information, without making generative-model assumptions about the data. Instead, we ask: what relations between the similarity information and the desired clustering are sufficient to cluster well -- these relations taking the role of the "concept class" in learning theory. We find that if we are willing to relax our goals a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if it contains a pruning that is close to the correct
answer) then this leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient for an algorithm to succeed. We show we can also produce accurate clusterings under implicit assumptions made when considering approximation algorithms for distance-based objectives such as k-median, k-means, and min-sum.
In fact, in this case we can do as well as if we had been able to approximate such objectives to values that are NP-hard to achieve.
This talk is based on work joint with Maria-Florina Balcan, Anupam Gupta, and Santosh Vempala.
Support Vector Machines for Predicting Structured Outputs
Thorsten Joachims, PhD, Cornell University
Over the last decades, much of the research on discriminative learning has focused on problems like classification and regression, where the prediction is a single univariate variable. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise, for example, when a natural language parser needs to predict the correct parse tree for a given sentence, when one needs to optimize a multivariate performance measure like the F1-score, or when predicting the alignment between two proteins.
This talk discusses a support vector approach and algorithm for
predicting such complex objects. It generalizes conventional
classification SVMs to a large range of structured outputs and
multivariate loss functions. While the resulting training problems
have exponential size, there is a simple algorithm that allows
training in polynomial (or in many cases linear) time. The algorithm
is implemented in the SVM-Struct software and empirical results will
be given for several examples.
On Noise-Tolerant Learning using Linear Classifiers
Phil Long, PhD, Google
This talk is about learning using linear hypotheses in the
presence of noise, including the following topics:
* New algorithms that tolerate a lot of "malicious noise" given
constraints on a probability distribution generating the examples.
* The ability of linear classifiers to approximate the Bayes optimal
error rate for some tree-structured two-layer sources with the
class designation at the root, the observed variables at the leaves,
and some hidden variables in between.
* Limitations on the noise-tolerance of some boosting algorithms
based on convex optimization.
(This is joint work with Nader Bshouty, Adam Klivans and Rocco Servedio.)