GraphLab

A New Parallel Framework for Machine Learning

The first GraphLab Workshop is coming up! Don't miss early registration. Mark your calendar: Monday, July 9th, in San Francisco. 60 companies and 12 universities have already confirmed their participation. What about you?

Overview

Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance.

The popular MapReduce abstraction is defined in two parts: a Map stage, which performs computation on independent problems that can be solved in isolation, and a Reduce stage, which combines the results.
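
As a toy illustration of this two-stage pattern, here is a minimal sketch in plain C++. It is not tied to any particular MapReduce implementation; the squaring-and-summing example is invented purely for this sketch.

    // Minimal sketch of the MapReduce pattern: a Map stage over independent
    // inputs followed by a Reduce stage that combines the results.
    // Plain C++ for illustration only; not the API of any MapReduce system.
    #include <iostream>
    #include <numeric>
    #include <vector>

    // Map: each input element is processed in isolation.
    int map_stage(int x) { return x * x; }

    int main() {
        std::vector<int> inputs = {1, 2, 3, 4};
        std::vector<int> mapped;
        for (int x : inputs) mapped.push_back(map_stage(x));  // independent problems

        // Reduce: combine the independent results into a single value.
        int result = std::accumulate(mapped.begin(), mapped.end(), 0);
        std::cout << "sum of squares: " << result << "\n";    // prints 30
        return 0;
    }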

GraphLab provides an analog to the Map stage in the form of an Update Function. The Update Function, however, can read and modify overlapping sets of data (program state) in a controlled fashion, as defined by the user-provided data graph. The data graph represents the program state, with arbitrary blocks of memory associated with each vertex and edge. In addition, update functions can be triggered recursively, with one update function spawning the application of update functions to other vertices in the graph, enabling dynamic iterative computation. GraphLab uses powerful scheduling primitives to control the order in which update functions are executed.
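
The sketch below illustrates this idea in plain, sequential C++. The vertex type, update function, and queue-based scheduler are all invented for the example and are not the GraphLab API; GraphLab itself executes updates in parallel while enforcing a chosen consistency model.

    // Illustrative only: a toy data graph whose update function reads
    // neighboring vertex data and reschedules further updates when its
    // value changes enough (dynamic iterative computation).
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <queue>
    #include <vector>

    struct Vertex {
        double value = 1.0;                  // arbitrary per-vertex program state
        std::vector<std::size_t> neighbors;  // edges of the data graph
    };

    // Analog of an update function: reads and modifies data in its
    // neighborhood and may trigger updates on other vertices.
    void update(std::size_t vid, std::vector<Vertex>& graph,
                std::queue<std::size_t>& scheduler) {
        Vertex& v = graph[vid];
        double sum = 0.0;
        for (std::size_t n : v.neighbors) sum += graph[n].value;
        std::size_t deg = std::max<std::size_t>(1, v.neighbors.size());
        double new_value = 0.15 + 0.85 * sum / deg;
        if (std::fabs(new_value - v.value) > 1e-3) {              // changed enough?
            for (std::size_t n : v.neighbors) scheduler.push(n);  // reschedule neighbors
        }
        v.value = new_value;
    }

    int main() {
        std::vector<Vertex> graph(3);
        graph[0].neighbors = {1, 2};
        graph[1].neighbors = {0};
        graph[2].neighbors = {0, 1};

        std::queue<std::size_t> scheduler;
        for (std::size_t i = 0; i < graph.size(); ++i) scheduler.push(i);

        // A trivial sequential "engine"; GraphLab runs updates in parallel
        // while guaranteeing data consistency.
        while (!scheduler.empty()) {
            std::size_t vid = scheduler.front();
            scheduler.pop();
            update(vid, graph, scheduler);
        }
        return 0;
    }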

The GraphLab analog to Reduce is the Sync Operation. The Sync Operation can also perform reductions in the background while other computation is running. Like update functions, sync operations can examine multiple records simultaneously, which allows them to operate on larger dependent contexts.
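
Here is a minimal sketch of such a reduction, again in plain C++ with invented names (fold/apply) rather than the GraphLab API: a fold step accumulates over every vertex's data and an apply step finalizes the global quantity, which GraphLab can recompute periodically in the background.

    // Illustrative only: a "sync"-style reduction over all vertex data
    // that maintains a global aggregate (here, the average value).
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Vertex { double value; };

    // Fold: combine one vertex's data into the running accumulator.
    void fold(const Vertex& v, double& acc) { acc += v.value; }

    // Apply: turn the accumulator into the final global quantity.
    double apply(double acc, std::size_t n) { return n ? acc / n : 0.0; }

    int main() {
        std::vector<Vertex> graph = {{1.0}, {2.0}, {4.0}};
        double acc = 0.0;
        for (const Vertex& v : graph) fold(v, acc);  // reduction over all records
        std::cout << "global average: " << apply(acc, graph.size()) << "\n";
        return 0;
    }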

For more details on the GraphLab abstraction see:

News

13 March 2012

The first GraphLab workshop will be held in early July in the Bay Area. 35 companies and 10 universities have already confirmed their participation.

31 Dec 2011

In its first year, GraphLab was downloaded 2,425 times and is currently used by 57 universities worldwide! Read more.

15 Dec 2011

GraphLab receives an additional NSF grant of 200,000 CPU hours on the BlackLight/Kraken supercomputers. Read more.

8 Sept 2011

The GraphLab project received an operating grant from Amazon Elastic Compute Cloud (EC2). Read more.

30 August 2011

We will be hosting the Big Learning NIPS’11 Workshop on Algorithms, Systems, and Tools for Learning at Scale. We are excited to see some of the latest work on systems like GraphLab, as well as other software, tools, algorithms, and models. Please consider presenting your recent work (see the submission guidelines).

29 June 2011

We are happy to announce that we have completed a license change from the LGPL to the Apache license. This takes effect as of repository version 1238 (June 22nd 2011).

The license switch was performed to provide greater flexibility for our users. Unlike the LGPL, the Apache license is not a "copyleft" license and does not require all source modifications to be open sourced. In particular, this allows companies to modify and make use of GraphLab without worrying about intricate GPL compliance issues.

16 June 2011

GraphLab-based matrix factorization code is in 5th place (out of more than 1,000 participants) in Yahoo! KDD Cup 2011, track 1. Read more.

6 June 2011

The GraphLab user feedback form is up. Please take 5 minutes and tell us about your experience with GraphLab! Be a part of the open source community!

2 June 2011

The GraphLab collaborative filtering library has been downloaded by more than 100 unique users. Its main use is for competing in Yahoo! KDD Cup 2011, track 1. Learn more.

25 Mar 2011

Good news, everyone! After lots of coffee, we have implemented a simple solution to your installation troubles. The configure script now takes an additional argument:

 ./configure --bootstrap 
which checks whether you have properly installed Boost and CMake (our two dependencies) and, if necessary, automatically downloads and installs them locally. While you still need to have make and gcc, this will hopefully simplify installation.

There are also a good number of improvements, especially with regard to the Shared Data system. The documentation will be updated shortly.

14 April 2011

GraphLab was named one of the viable solutions when MapReduce fails to scale in the opening lecture by Prof. Bryant, Dean of the School of Computer Science at Carnegie Mellon University, at the 2011 TeraGrid Supercomputing Symposium. Read more.

21 Mar 2011

A lot of people are finding that GraphLab is quite difficult to set up. We are now trying to simplify the installation process. The release should be updated in a few weeks.

11 Dec 2010

We have released version v1.470 of GraphLab. See the Download page for details.

20 Nov 2010

We will be releasing a major update in 2-3 weeks. The goal of this release is to significantly improve usability and internal code quality. We are also working on a Java interface using JNI, as well as a Matlab interface which uses EMLC to compile Matlab code to C. Watch this space for updates!

Contacts


Funding

This work was supported by:

The views and conclusions of the research performed in the Select Lab are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of the funders, the U.S. Government, or any of its agencies.