Select Lab Publications

Turning Down the Noise in the Blogosphere (2009)

By: Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin

Abstract: In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere.

We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a user's preferences from limited feedback.
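To make the coverage idea concrete, here is a minimal sketch of greedy selection under a weighted, probabilistic coverage objective. It is an illustration under stated assumptions, not the paper's implementation: the function names (coverage, greedy_select), the feature names, the weights, and the per-post scores are all hypothetical. For a monotone submodular objective with a limit of k items, greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the sense in which a near-optimal solution can be computed efficiently.

```python
from typing import Dict, List

# Hypothetical representation: each post maps a feature (e.g. a topic or
# named entity) to a score in [0, 1] saying how well the post covers it.
Post = Dict[str, float]


def coverage(selected: List[Post], weights: Dict[str, float]) -> float:
    """Weighted probabilistic coverage of a set of posts.

    A feature counts as missed only if every selected post misses it,
    so adding more posts about the same feature has diminishing returns.
    """
    total = 0.0
    for feature, weight in weights.items():
        miss = 1.0
        for post in selected:
            miss *= 1.0 - post.get(feature, 0.0)
        total += weight * (1.0 - miss)
    return total


def greedy_select(posts: Dict[str, Post], weights: Dict[str, float], k: int) -> List[str]:
    """Pick up to k post ids, each time adding the post with the largest marginal gain."""
    chosen: List[str] = []
    remaining = set(posts)
    for _ in range(min(k, len(posts))):
        base = coverage([posts[p] for p in chosen], weights)
        # sorted() makes tie-breaking deterministic (alphabetical by id).
        best = max(
            sorted(remaining),
            key=lambda p: coverage([posts[q] for q in chosen] + [posts[p]], weights) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data (assumed for illustration only).
posts = {
    "a": {"election": 0.9, "economy": 0.1},
    "b": {"election": 0.8},
    "c": {"sports": 0.7},
}
weights = {"election": 0.5, "economy": 0.3, "sports": 0.2}

print(greedy_select(posts, weights, 2))  # ['a', 'c']
```

In this toy example, post "b" has the second-highest score on its own, but once "a" is chosen it adds little because the election story is already covered, so the greedy step prefers "c". Personalization could be sketched by adjusting the entries of weights in response to user feedback, though the update rule is left out here.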

We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.

Download Information
Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin (2009). "Turning Down the Noise in the Blogosphere." ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
BibTeX citation

title = {Turning Down the Noise in the Blogosphere},
author = {Khalid El-Arini and Gaurav Veda and Dafna Shahaf and Carlos Guestrin},
booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},
year = 2009,
month = {June},
address = {Paris, France},
wwwfilebase = {kdd2009-elarini-veda-shahaf-guestrin},
wwwtopic = {Observation Selection},
