Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance.
The popular MapReduce abstraction is defined in two parts: a Map stage, which performs computation on independent problems that can be solved in isolation, and a Reduce stage, which combines the results.
GraphLab provides an analog to Map in the form of an Update Function. The Update Function, however, can read and modify overlapping sets of data (program state) in a controlled fashion, as defined by the user-provided data graph. The data graph represents the program state, with arbitrary blocks of memory associated with each vertex and edge. In addition, update functions can be triggered recursively: one update function can spawn the application of update functions to other vertices in the graph, enabling dynamic iterative computation. GraphLab uses powerful scheduling primitives to control the order in which update functions are executed.
The GraphLab analog to Reduce is the Sync Operation. The Sync Operation can also perform reductions in the background while other computation is running. Like update functions, sync operations can read multiple records simultaneously, allowing them to operate on larger dependent contexts.
For more details on the GraphLab abstraction see:
The GraphLab project received an operating grant from Amazon Elastic Compute Cloud (EC2).
We will be hosting the Big Learning NIPS’11 Workshop on Algorithms, Systems, and Tools for Learning at Scale. We are excited to see some of the latest work on systems like GraphLab, as well as other software, tools, algorithms, and models. Please consider presenting your recent work (see the submission guidelines).
We are happy to announce that we have completed a license change from the LGPL to the Apache license. This takes effect as of repository version 1238 (June 22nd 2011).
The license switch was made to give our users greater flexibility. Unlike the LGPL, the Apache license is not a "copyleft" license and does not require source modifications to be open-sourced. In particular, this allows companies to modify and make use of GraphLab without worrying about intricate LGPL compliance issues.
Good news everyone! After lots of coffee we have implemented a simple solution to your installation troubles. The configure script now takes an additional argument:
./configure --bootstrap

which checks whether you have properly installed boost and cmake (our two dependencies) and, if necessary, automatically downloads and installs them locally. While you still need make and gcc, this will hopefully simplify installation.
There are also a good number of improvements, especially with regard to the Shared Data system. The documentation will be updated shortly.
A lot of people are finding that GraphLab is quite difficult to set up. We are now trying to simplify the installation process. The release should be updated in a few weeks.
We have released version v1.470 of GraphLab. See the Download page for details.
We will be releasing a major update in 2-3 weeks. The goal of this release is to significantly improve usability and internal code quality. We are also working on a Java interface using JNI, as well as a Matlab interface that uses EMLC to compile Matlab code to C. Watch this space for updates!