Spawning and Initialization

Spawning is the process of starting an instance of GraphLab RPC on separate machines. GraphLab RPC supports two spawning methods: MPI or rpcexec.py (a script in the tools/ directory). The MPI method is strongly recommended, though it does require all machines to have shared access to a common file system.

Spawning with MPI

GraphLab was tested with MPICH2, but should also work with OpenMPI. Refer to the documentation for MPICH2 or OpenMPI to set up MPI, and make sure that you can run the basic MPI test programs (MPICH2 comes with an mpdringtest).

No additional configuration is necessary to spawn a GraphLab RPC program with MPI.

The GraphLab RPC program should begin with:

#include <graphlab/rpc/dc.hpp>
#include <graphlab/rpc/dc_init_from_mpi.hpp>
using namespace graphlab;

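int main(int argc, char ** argv) {
  // Initialize MPI before using it to obtain the RPC parameters.
  mpi_tools::init(argc, argv);

  dc_init_param param;
  // set additional param options here
  // Fill in param through MPI; stop if the parameters cannot be obtained.
  if (init_param_from_mpi(param) == false) {
    return 0;
  }
  // Start the RPC layer with the collected parameters.
  distributed_control dc(param);
  ...
}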

In this case, init_param_from_mpi uses the MPI ring to exchange port numbers and set up the RPC communication layer. See dc_init_param for details about additional configuration options.

Spawning with rpcexec.py

rpcexec.py provides a simple way to run a process on a collection of machines, using ssh to communicate between them. rpcexec.py --help provides some basic help.

You will first need to create a host file, which is simply a list of host names or IP addresses, one per line. A host may appear more than once:

localhost
192.168.1.5
node2
node3
localhost
192.168.1.5
node2
node3

Running rpcexec.py -n [num to start] -f [hostsfile] `command` will execute the command on the first N hosts in the host file. For instance, with the host file above, running

rpcexec.py -n 5 -f hostsfile ls

will run the ls command twice on localhost, and once on each of the other three hosts: 192.168.1.5, node2 and node3.
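The "first N hosts" rule can be illustrated with a small C++ sketch. It is not part of GraphLab or rpcexec.py: the function names parse_hosts and launches_per_host are hypothetical, and skipping blank lines is an assumption about how the host file is read. It only counts how many processes each host would receive.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a host file body, one host name or IP per line.
// Blank lines are skipped (an assumption; rpcexec.py may behave differently).
std::vector<std::string> parse_hosts(const std::string& text) {
  std::vector<std::string> hosts;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // Trim surrounding whitespace, including a trailing '\r'.
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;
    const std::size_t last = line.find_last_not_of(" \t\r");
    hosts.push_back(line.substr(first, last - first + 1));
  }
  return hosts;
}

// Hypothetical helper: count the processes each host receives when only
// the first n entries of the host list are used.
std::map<std::string, int> launches_per_host(
    const std::vector<std::string>& hosts, std::size_t n) {
  std::map<std::string, int> counts;
  for (std::size_t i = 0; i < n && i < hosts.size(); ++i) {
    ++counts[hosts[i]];
  }
  return counts;
}
```

With the eight-line host file above and n = 5, the selected entries are localhost, 192.168.1.5, node2, node3 and localhost again, so localhost is counted twice and every other host once.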

rpcexec.py also supports a 'screen' (GNU Screen) mode. Running

rpcexec.py -s lsscreen -n 3 -f hostsfile ls

will create a `screen` session with 3 windows: one window runs `ls` on localhost, while the other two ssh into 192.168.1.5 and node2 and run `ls` on each of them.

rpcexec.py will terminate immediately after creating the screen session.

screen -r lsscreen

will display and resume the screen session.

If rpcexec.py will be used to spawn the program, the GraphLab RPC program should begin with:

#include <graphlab/rpc/dc.hpp>
#include <graphlab/rpc/dc_init_from_env.hpp>
using namespace graphlab;

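int main(int argc, char ** argv) {
  dc_init_param param;
  // set additional param options here
  // Fill in param from the environment; stop if the parameters cannot be obtained.
  if (init_param_from_env(param) == false) {
    return 0;
  }
  // Start the RPC layer with the collected parameters.
  distributed_control dc(param);
  ...
}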

Unlike MPI spawning, there is no existing channel for communicating port information between the machines. rpcexec.py therefore uses environment variables to pass information to the GraphLab RPC process. The following two environment variables are used:

A machine will listen on the port 10000 + SPAWNID.
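As a rough illustration of the fixed port rule, the following C++ sketch derives the listening port from the SPAWNID environment variable. The names port_from_spawn_id, port_from_env and kBasePort are hypothetical, and the validation shown here is an assumption; GraphLab's own init_param_from_env may parse and check the value differently.

```cpp
#include <cstdlib>

// Base port from the rule above: a machine listens on 10000 + SPAWNID.
constexpr int kBasePort = 10000;

// Hypothetical helper: convert a SPAWNID string into a port number.
// Returns -1 if the value is missing, is not a non-negative integer,
// or would produce a port above 65535.
int port_from_spawn_id(const char* value) {
  if (value == nullptr || *value == '\0') return -1;
  char* end = nullptr;
  const long id = std::strtol(value, &end, 10);
  if (*end != '\0' || id < 0 || id > 65535 - kBasePort) return -1;
  return kBasePort + static_cast<int>(id);
}

// Hypothetical wrapper: read SPAWNID from the process environment.
int port_from_env() {
  return port_from_spawn_id(std::getenv("SPAWNID"));
}
```

Under this rule, the process with SPAWNID 0 listens on port 10000 and the process with SPAWNID 3 on port 10003. Because the port depends only on SPAWNID, two programs spawned on the same machine with the same SPAWNID would compete for the same port.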

See dc_init_param for details about additional configuration options.

This spawning system is less flexible due to the fixed port numbering. For instance, a crashed process will leave its port in the TCP TIME_WAIT state for a few minutes, preventing the next RPC process from binding to it. The fixed numbering also prevents multiple different GraphLab RPC programs from running on the same set of machines at the same time.

The MPI spawner is therefore the recommended method for starting the RPC system.