GraphLab

A New Parallel Framework for Machine Learning

Downloading and Installing GraphLab


Detailed Ubuntu 11.04 (Natty 64 bit) instructions

Detailed Mac OS X 10.6 instructions

Detailed Amazon EC2 instructions

Detailed BlackLight instructions

Detailed Instructions for Gentoo Linux

Detailed instructions for RedHat Enterprise Server 6.1

Detailed instructions for Fedora Core 16

Detailed instructions for CentOS 6.0

Detailed instructions for FreeBSD 8.2

Detailed instructions for Ubuntu Maverick 10.10 32 bit

Downloading GraphLab

The current version of GraphLab has been tested on Ubuntu Linux 9.04, 10.04, 10.10, 11.04 (Natty), and 11.10 (Oneiric); Mac OS X 10.5, 10.6, and 10.7 (Lion); CentOS 5; Linode; RedHat Enterprise 6; Amazon Linux; and Gentoo Linux.

There are three ways to install GraphLab: compile the source tarball (tgz) of the stable release, check out the code from Mercurial, or run a precompiled GraphLab EC2 AMI.

If you have any trouble installing or compiling, please use our support page.

You can find support for GraphLab in the GraphLab support group.
User announcements and design discussions are found in the GraphLab API group.

Installing GraphLab

Run ./configure --bootstrap to create the build directories. If you want to change the default installation location (currently /usr/local), pass the additional argument --prefix, for example --prefix=/home/me. For more options see ./configure --help.

Once you have run the configure script, change to the release/ directory, run make, and then run make install. This will install the following:

include/graphlab.hpp
The primary GraphLab header
include/graphlab
The folder containing the headers for the rest of the GraphLab library
lib/libgraphlab.a
The main GraphLab static library
Note: If you are interested in trying out the collaborative filtering library, you should install itpp as well, so follow the instructions here instead (under the section "Installation").
Note: Mac OS X users need to have CMake installed. Follow the instructions here.
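
The configure, make, and make install sequence above can be sketched as a small shell helper. This is an illustration only, not a script shipped with GraphLab: the names run, install_graphlab, and DRY_RUN are made up here, and the helper assumes it is started from the base of the source tree. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of GraphLab): print or run the install steps.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# install_graphlab [PREFIX]
# PREFIX defaults to /usr/local, the default location named above.
install_graphlab() {
    prefix="${1:-/usr/local}"
    run ./configure --bootstrap --prefix="$prefix" &&
    run cd release &&
    run make &&
    run make install
}
```

Calling install_graphlab /home/me prints the four commands; after setting DRY_RUN=0 the same call runs them and stops at the first step that fails.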

Dependencies

GraphLab currently has a few required dependencies as well as several optional dependencies, listed below:

LibBoost (>= 1.37) [Required]
Boost is used for program options parsing, convenient range concepts, foreach macros, and random number generation.
CMake (>= 2.6) [Required]
We rely on the CMake build system to manage the library dependency search and generation of Makefiles.
gcc (>= 4.2) [Required]
Required for compiling GraphLab.
Kyoto Cabinet [Optional]
Used for the disk graph format.
Open MPI or MPICH2 [Optional]
Required for RPC / Distributed GraphLab
Google Performance Tools (>= 1.4) [Optional]
The Google performance tools provide the TCMalloc memory manager, which improves upon the standard memory manager by reducing thread contention on allocation. Standard memory managers typically incur a relatively large penalty when multiple threads try to allocate memory simultaneously.
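
To check the minimum versions listed above before configuring, something like the following sketch may help. The function names version_ge, report, and check_build_tools are illustrative, not GraphLab tools. The sketch assumes that cmake --version prints the version as the third word of its first line and that gcc -dumpversion prints a dotted version number. It does not check Boost, whose location varies by system.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B, comparing each
# dot-separated field numerically (so 1.46 >= 1.37, but 1.9 < 1.37).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# report NAME FOUND MINIMUM: print whether FOUND satisfies MINIMUM.
report() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (need >= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# Hypothetical check of the two build tools named in the list above.
check_build_tools() {
    if command -v cmake >/dev/null 2>&1; then
        report cmake "$(cmake --version | awk 'NR == 1 { print $3 }')" 2.6
    else
        echo "cmake: not found"
    fi
    if command -v gcc >/dev/null 2>&1; then
        report gcc "$(gcc -dumpversion)" 4.2
    else
        echo "gcc: not found"
    fi
}
```

Running check_build_tools prints one line per tool. The comparison is field by field, so Boost-style versions such as 1.46 compare correctly against 1.37.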

GraphLab is released under the Apache license. We only ask that, if you use GraphLab in your research, you cite our paper.

@inproceedings{Low+al:uai10graphlab,
  title = {GraphLab: A New Parallel Framework for Machine Learning},
  author = {Yucheng Low and 
            Joseph Gonzalez and 
            Aapo Kyrola and 
            Danny Bickson and 
            Carlos Guestrin and 
            Joseph M. Hellerstein},
  booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
  month = {July},
  year = {2010}
}

Source Tree Organization

src/
The GraphLab library source.
demoapps/
Some demo applications.
extern/
Any external dependencies that we modified to include in GraphLab
cmake/
The cmake configuration files

Creating your first application (optional)

GraphLab has many implemented algorithms. You can find them here.

The best way to start writing your own application is to take a close look at one of the tutorials: here.
You can take one of the sample algorithms, like PageRank (found under demoapps/pagerank), and use the code as a template for your own application.
Once you have tried running one of the implemented methods, you can start building your own project. In the demoapps/ directory, create your own sub-directory, for example demoapps/my_app. Let's assume that this folder contains the program hello_world.cpp. To build this program you will need to create demoapps/my_app/CMakeLists.txt and add your target:

# Contents of demoapps/my_app/CMakeLists.txt
project(GraphLab)

# Add an executable
add_executable(hello_world hello_world.cpp)

# Attach extra external libraries like
# tcmalloc (optional) for improved performance
# TARGET_LINK_LIBRARIES(hello_world tcmalloc)

You then need to run ./configure --bootstrap in the base of the source tree. The configure script will create the debug/, release/, and profile/ directories, each with a different build configuration.

Once you have run the configure script you can switch to any of the build directories and run make which will build your app in the location <build_dir>/apps/my_app .
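
As a sketch of the steps above, the following helper creates the application directory and writes a minimal CMakeLists.txt with the project and add_executable lines from the example. The function name scaffold_app and its arguments are hypothetical, and the demoapps/ parent directory is taken from the example above. The source file itself (hello_world.cpp) is left for you to write.

```shell
#!/bin/sh
# scaffold_app SOURCE_ROOT APP_NAME [PROGRAM]
# Hypothetical helper: create demoapps/APP_NAME under the GraphLab source
# tree and write a CMakeLists.txt that builds PROGRAM from PROGRAM.cpp.
# PROGRAM defaults to hello_world. Prints the path of the file it wrote.
scaffold_app() {
    root=$1
    app=$2
    prog="${3:-hello_world}"
    dir="$root/demoapps/$app"
    mkdir -p "$dir" || return 1
    cat > "$dir/CMakeLists.txt" <<EOF
# Contents of demoapps/$app/CMakeLists.txt
project(GraphLab)

# Add an executable
add_executable($prog $prog.cpp)
EOF
    echo "$dir/CMakeLists.txt"
}
```

For example, run scaffold_app . my_app from the base of the source tree, add demoapps/my_app/hello_world.cpp, rerun ./configure --bootstrap, and then run make in one of the build directories.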

Installing and Linking against GraphLab

Once you have set up your hello world application as part of the GraphLab source tree, you can compile your program by running make in either the release/ or debug/ folder.
NOTE: Your application folder must reside inside the apps folder, for example apps/myapps/, or else the include and link paths will not be set correctly.

Alternatively, if you would like to create an application outside the GraphLab source tree, you can compile using
g++ hello_world.cpp -o hello -lboost_program_options -lgraphlab -I/path/to/graphlab/include -L/path/to/graphlab/release/src/graphlab
Here -I points to the include folder (where graphlab.hpp is found), and -L points to the folder where libgraphlab.a is found.
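
A wrong -I or -L path usually shows up only as a long compiler or linker error. The sketch below checks both folders first and then prints the flags from the command above. The function name graphlab_flags is hypothetical, and the sketch assumes the paths contain no spaces.

```shell
#!/bin/sh
# graphlab_flags INCLUDE_DIR LIB_DIR
# Hypothetical helper: print the g++ flags used in the command above, after
# confirming that INCLUDE_DIR contains graphlab.hpp and LIB_DIR contains
# libgraphlab.a. Fails with a message on stderr otherwise.
graphlab_flags() {
    inc=$1
    lib=$2
    if [ ! -f "$inc/graphlab.hpp" ]; then
        echo "graphlab.hpp not found in $inc" >&2
        return 1
    fi
    if [ ! -f "$lib/libgraphlab.a" ]; then
        echo "libgraphlab.a not found in $lib" >&2
        return 1
    fi
    echo "-lboost_program_options -lgraphlab -I$inc -L$lib"
}
```

It could be used as g++ hello_world.cpp -o hello $(graphlab_flags /path/to/graphlab/include /path/to/graphlab/release/src/graphlab).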

Known Installation Issues

Boost configuration

If running ./configure fails because the Boost libraries are not found, you might need to set the environment variable BOOST_ROOT so that it points to the installation directory of Boost. You can do this using:

env BOOST_ROOT=[location of boost installation] ./configure

Alternatively, you can try using our automatic dependency installing tools by running

./configure --bootstrap
Please let us know if you have trouble configuring GraphLab on your system.

Mac: on Mac OS X, you also need to add the Boost library directory to the dynamic library search path defined in the environment variable DYLD_LIBRARY_PATH. Here is an example:

declare -x DYLD_LIBRARY_PATH=":${BOOST_ROOT}/stage/lib:${DYLD_LIBRARY_PATH}"
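
A quick sanity check on BOOST_ROOT can save a failed configure run. The sketch below assumes that a valid Boost directory contains boost/version.hpp, either directly (the source-tree layout that also has stage/lib) or under include/. The function names boost_root_ok and boost_version are illustrative only.

```shell
#!/bin/sh
# boost_root_ok DIR: succeed when DIR looks like a Boost installation, that
# is, when boost/version.hpp exists directly under DIR (source-tree layout)
# or under DIR/include (installed layout).
boost_root_ok() {
    [ -f "$1/boost/version.hpp" ] || [ -f "$1/include/boost/version.hpp" ]
}

# boost_version DIR: print the BOOST_LIB_VERSION string from version.hpp.
boost_version() {
    for h in "$1/boost/version.hpp" "$1/include/boost/version.hpp"; do
        if [ -f "$h" ]; then
            sed -n 's/^#define BOOST_LIB_VERSION "\(.*\)"/\1/p' "$h"
            return 0
        fi
    done
    return 1
}
```

For example, boost_root_ok "$BOOST_ROOT" && env BOOST_ROOT="$BOOST_ROOT" ./configure runs configure only when the directory looks plausible, and boost_version "$BOOST_ROOT" shows which Boost release it holds.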

Acknowledgements