Beyond Convexity: Submodularity in Machine Learning

Description

Convex optimization has become a main workhorse for many machine learning algorithms during the past ten years. When minimizing a convex loss function for, e.g., training a Support Vector Machine, we can rest assured to efficiently find an optimal solution, even for large problems. In recent years, another fundamental problem structure, which has similar beneficial properties, has emerged as very useful in a variety of machine learning applications: Submodularity is an intuitive diminishing returns property, stating that adding an element to a smaller set helps more than adding it to a larger set. Similarly to convexity, submodularity allows one to efficiently find provably (near-)optimal solutions.
In this tutorial, we will give an introduction to the concept of submodularity, discuss algorithms for optimizing submodular functions and – as the main focus – illustrate their usefulness in solving difficult machine learning problems, such as active learning [6,11] and sparse experimental design [23, 14, 13, 3], informative path planning [2, 24, 17], structure learning [19, 1], clustering [20], influence maximization [9] and ranking [15].

Materials

Outline

  1. Introduction
    • Motivating Applications
    • Why should the Machine Learning community care about submodularity?
  2. Submodular set functions
    • Definitions
    • Intuition: Why are submodular functions useful
    • Examples of submodular functions (entropy [8] and mutual information [10, 14], influence in graphs [9], etc.)
    • Examples of functions which are not submodular
    • Operations preserving submodularity
    • Relationship between submodularity and convexity [16]
  3. Minimizing submodular functions
    • Overview about known results for minimization [22, 7]
    • Queyranne’s algorithm for minimizing symmetric submodular functions [22], applications to factorizing distributions and clustering [20]
    • Learning structure of graphical models [19, 1]
    • The submodular-supermodular procedure with applications to feature selection and structure learning [18]
  4. Maximizing submodular functions
    • The Greedy algorithm [21,32]
    • Lazy evaluations and scaling up to large data sets [28]
    • Applications of maximizing submodular functions
    • Constrained maximization of submodular functions [25]
    • Robust maximization of submodular functions (with applications to Gaussian Process regression) [13]
  5. Conclusions (5 min)
    • Current research directions / Open questions
    • Other pointers (submodularity in games / allocation problems [4,35] / online algorithms [5])

Who should attend

Since submodularity arises in so many different areas of machine learning, this tutorial should be of theoretical and practical interest to a large part of the machine learning community. The tutorial will not require prior knowledge beyond the basic concepts covered in an introductory machine learning class.

Presenters

Andreas Krause is a Ph.D. Candidate at the Computer Science Department of Carnegie Mellon University. He is a recipient of a Microsoft Research Graduate Fellowship, and his research on sensor placement and information acquisition received awards at several conferences (KDD 2007, IPSN 2006, ICML 2005 and UAI 2005). He obtained his Diplom in Computer Science and Mathematics from the Technische Universität München, where his research received the NRW Undergraduate Science Award.

Carlos Guestrin is an assistant professor in the Machine Learning and in the Computer Science Departments at Carnegie Mellon University. Previously, he was a senior researcher at the Intel Research Lab in Berkeley. Carlos received his MSc and PhD in Computer Science from Stanford University in 2000 and 2003, respectively, and a Mechatronics Engineer degree from the Polytechnic School of the University of Sao Paulo, Brazil, in 1998. Carlos' work received awards at a number of conferences and a journal: KDD 2007, IPSN 2005 and 2006, VLDB 2004, NIPS 2003 and 2007, UAI 2005, ICML 2005, and JAIR in 2007. He is also a recipient of the ONR Young Investigator Award, the NSF Career Award, the Alfred P. Sloan Fellowship, the IBM Faculty Fellowship, the Siebel Scholarship and the Stanford Centennial Teaching Assistant Award. Carlos is currently a member of the Information Sciences and Technology (ISAT) advisory group for DARPA.

References

The following references are used in the tutorial. A short high level summary is given, without any claim of completeness.
Last updated: July 8 2008