Probabilistic Topic Models of Text and Users
Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms can discover the latent themes that underlie the documents, and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents.
Traditional topic modeling algorithms take a document collection as input and analyze the texts to estimate its latent thematic structure. But for many collections, we have an additional kind of data: how people use the documents. (As examples, consider weblog data or purchase histories.) In this talk, I will describe our recent research on simultaneously analyzing texts and the corresponding user data.
First I will describe collaborative topic models for document recommendation. Unlike classical matrix factorization, these models give interpretable dimensions to user interests and can form recommendations about sparsely rated or previously unrated items.
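As a rough illustration of how such a model could score a previously unrated item, here is a minimal sketch. It assumes each item is represented by its topic proportions plus an optional learned offset, and that a rating is predicted by an inner product with a user vector. The function name, the offset term, and all numbers are hypothetical and are not taken from the talk.

```python
def predict_rating(user_vec, topic_props, offset=None):
    """Score an item as the inner product of a user vector and an item vector.

    Assumption: the item vector is the document's topic proportions plus an
    offset learned from ratings. A previously unrated item has no offset yet,
    so its score comes from the text alone.
    """
    if offset is None:
        offset = [0.0] * len(topic_props)
    item_vec = [t + o for t, o in zip(topic_props, offset)]
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical example with three topics.
user_vec = [0.9, 0.1, 0.0]        # user mostly interested in topic 0
new_doc_topics = [0.7, 0.2, 0.1]  # unrated document, mostly about topic 0
score = predict_rating(user_vec, new_doc_topics)
```

Because the user vector lives in the same space as the topics, each of its coordinates can be read as interest in a theme, which is one way to understand the interpretability claim above.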
Then I will describe a model of legislative history. (In this data we consider lawmakers' votes on bills as a kind of "user data.") Issue-adjusted ideal point models capture how a lawmaker's vote can deviate from her usual voting pattern, using the text of the bill to encode the issue under discussion.
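The idea of a text-dependent deviation can be sketched as follows. This assumes a logistic vote model in which a lawmaker's ideal point is shifted by per-issue adjustments weighted by the bill's topic proportions. The parameter names (polarity, popularity) and the numbers are illustrative assumptions, not details given in the talk.

```python
import math


def vote_probability(ideal_point, issue_adjustments, bill_topics,
                     polarity, popularity):
    """Probability of a 'yes' vote under an assumed logistic model.

    The lawmaker's usual position (ideal_point) is shifted by her per-issue
    adjustments, weighted by how much the bill's text is about each issue.
    """
    shift = sum(z * t for z, t in zip(issue_adjustments, bill_topics))
    adjusted = ideal_point + shift
    return 1.0 / (1.0 + math.exp(-(adjusted * polarity + popularity)))


# Hypothetical lawmaker who deviates strongly on the second issue only.
adjustments = [0.0, -2.0]
p_usual = vote_probability(1.0, adjustments, [1.0, 0.0], 1.0, 0.0)
p_issue = vote_probability(1.0, adjustments, [0.0, 1.0], 1.0, 0.0)
```

In this sketch the same lawmaker is more likely to vote yes on a bill about the first issue than on one about the second, even though her ideal point is unchanged.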
With these two models I will demonstrate how texts can help us make better predictions of what users will do and how user data can give us information about what the texts are about.
This is joint work with Chong Wang and Sean Gerrish.
Bio: David Blei is an associate professor of Computer Science at Princeton University. He received his PhD in 2004 at U.C. Berkeley and was a postdoctoral fellow at Carnegie Mellon University. His research focuses on probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, and scientific data.
The Colloquium is supported by generous contributions from Bloomberg, Information Builders, Inc., and Netlogic, Inc.