Serialization

We have a custom serialization scheme which is designed for performance rather than compatibility. It does not perform type checking or pointer tracking, and it has only limited cross-platform support. It has been tested and should be compatible across x86 platforms.

There are two serialization classes: graphlab::oarchive, which handles output, and graphlab::iarchive, which handles input. To include all serialization headers, include <graphlab/serialization/serialization_includes.hpp>.

Basic serialize/deserialize

To serialize data to disk, create an output archive and associate it with a file stream.

std::ofstream fout("file.bin", std::fstream::binary);
graphlab::oarchive oarc(fout);

The stream operators are then used to write data into the archive.

int i = 10;
double j = 20;
std::vector<double> v(10, 1.0);

oarc << i << j << v;

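To read the data back, create an iarchive on an input file stream and read the variables in the same order in which they were written. Since the archive performs no type checking, the order and types must match exactly: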

std::ifstream fin("file.bin", std::fstream::binary);
graphlab::iarchive iarc(fin);
  
int i;
double j;
std::vector<double> v;

iarc >> i >> j >> v;

All basic datatypes are supported. The four STL containers std::map, std::set, std::list and std::vector are supported as long as the contained type can be serialized. For example, a vector of sets of integers will serialize correctly.
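To illustrate why the read order must match the write order, and how nested containers can compose, here is a minimal self-contained sketch. It does not use GraphLab: toy_oarchive, toy_iarchive and nested_round_trip are hypothetical names, and the byte layout (raw bytes for basic types, a 64-bit element count followed by the elements for containers) is an assumption made for illustration, not GraphLab's actual format.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for graphlab::oarchive / graphlab::iarchive.
struct toy_oarchive {
  std::ostream& out;
  explicit toy_oarchive(std::ostream& o) : out(o) {}
};
struct toy_iarchive {
  std::istream& in;
  explicit toy_iarchive(std::istream& i) : in(i) {}
};

// Basic types: raw bytes with no type tag, so the reader must know the type.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, toy_oarchive&>::type
operator<<(toy_oarchive& a, const T& value) {
  a.out.write(reinterpret_cast<const char*>(&value), sizeof(T));
  return a;
}
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, toy_iarchive&>::type
operator>>(toy_iarchive& a, T& value) {
  a.in.read(reinterpret_cast<char*>(&value), sizeof(T));
  return a;
}

// Containers: element count first, then each element in turn.
template <typename T>
toy_oarchive& operator<<(toy_oarchive& a, const std::set<T>& s) {
  a << static_cast<std::uint64_t>(s.size());
  for (const T& x : s) a << x;
  return a;
}
template <typename T>
toy_iarchive& operator>>(toy_iarchive& a, std::set<T>& s) {
  std::uint64_t n = 0;
  a >> n;
  s.clear();
  for (std::uint64_t k = 0; k < n; ++k) { T x{}; a >> x; s.insert(x); }
  return a;
}
template <typename T>
toy_oarchive& operator<<(toy_oarchive& a, const std::vector<T>& v) {
  a << static_cast<std::uint64_t>(v.size());
  for (const T& x : v) a << x;  // recurses into the element serializer
  return a;
}
template <typename T>
toy_iarchive& operator>>(toy_iarchive& a, std::vector<T>& v) {
  std::uint64_t n = 0;
  a >> n;
  v.clear();
  for (std::uint64_t k = 0; k < n; ++k) { T x{}; a >> x; v.push_back(x); }
  return a;
}

// Writes an int, a double and a vector of sets, then reads them back
// in the same order from an in-memory binary stream.
bool nested_round_trip() {
  std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
  int i = 10;
  double j = 20;
  std::vector<std::set<int> > v = {{1, 2}, {3}};
  toy_oarchive oarc(buf);
  oarc << i << j << v;

  int i2 = 0;
  double j2 = 0;
  std::vector<std::set<int> > v2;
  toy_iarchive iarc(buf);
  iarc >> i2 >> j2 >> v2;
  return i2 == i && j2 == j && v2 == v;
}
```

Because nothing in the stream records which type was written, reading j before i in this sketch would silently reinterpret the bytes rather than report an error.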

User Structs and Classes

To serialize a struct/class, all you need to do is define public save and load member functions. For instance:

class TestClass{
 public:
  int i, j;
  std::vector<int> k;

  void save(oarchive& oarc) const {
    oarc << i << j << k;
  }

  void load(iarchive& iarc) {
    iarc >> i >> j >> k;
  }
};

After which, the standard stream operators as described in the previous section will work fine. STL containers of TestClass will work as well.
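One way such save/load dispatch could work is sketched below. This is a self-contained illustration and not GraphLab's implementation: mini_oarchive, mini_iarchive, mini_serializer and user_type_round_trip are hypothetical names, and the sketch assumes that any type which is neither a basic type nor a std::vector provides save() and load() members.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the GraphLab archives.
struct mini_oarchive { std::ostream& out; };
struct mini_iarchive { std::istream& in; };

// Declared up front so that save()/load() bodies can chain the operators.
template <typename T> mini_oarchive& operator<<(mini_oarchive& a, const T& v);
template <typename T> mini_iarchive& operator>>(mini_iarchive& a, T& v);

// Default case: a user type with public save()/load() members.
template <typename T, typename Enable = void>
struct mini_serializer {
  static void save(mini_oarchive& a, const T& v) { v.save(a); }
  static void load(mini_iarchive& a, T& v) { v.load(a); }
};

// Basic types are written as raw bytes.
template <typename T>
struct mini_serializer<
    T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
  static void save(mini_oarchive& a, const T& v) {
    a.out.write(reinterpret_cast<const char*>(&v), sizeof(T));
  }
  static void load(mini_iarchive& a, T& v) {
    a.in.read(reinterpret_cast<char*>(&v), sizeof(T));
  }
};

// std::vector: element count, then each element through its own serializer.
template <typename T>
struct mini_serializer<std::vector<T>, void> {
  static void save(mini_oarchive& a, const std::vector<T>& v) {
    a << static_cast<std::uint64_t>(v.size());
    for (const T& x : v) a << x;
  }
  static void load(mini_iarchive& a, std::vector<T>& v) {
    std::uint64_t n = 0;
    a >> n;
    v.clear();
    for (std::uint64_t k = 0; k < n; ++k) { T x{}; a >> x; v.push_back(x); }
  }
};

template <typename T>
mini_oarchive& operator<<(mini_oarchive& a, const T& v) {
  mini_serializer<T>::save(a, v);
  return a;
}
template <typename T>
mini_iarchive& operator>>(mini_iarchive& a, T& v) {
  mini_serializer<T>::load(a, v);
  return a;
}

// Same shape as the TestClass above, written against the toy archives.
struct TestClass {
  int i = 0, j = 0;
  std::vector<int> k;
  void save(mini_oarchive& oarc) const { oarc << i << j << k; }
  void load(mini_iarchive& iarc) { iarc >> i >> j >> k; }
};

// Round-trips a vector of TestClass through an in-memory binary stream.
bool user_type_round_trip() {
  std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
  std::vector<TestClass> items(2);
  items[0].i = 1; items[0].j = 2; items[0].k = {3, 4};
  items[1].i = 5; items[1].k = {6};
  mini_oarchive oarc{buf};
  oarc << items;

  std::vector<TestClass> back;
  mini_iarchive iarc{buf};
  iarc >> back;
  return back.size() == 2 && back[0].i == 1 && back[0].j == 2 &&
         back[0].k == std::vector<int>{3, 4} && back[1].i == 5 &&
         back[1].j == 0 && back[1].k == std::vector<int>{6};
}
```

The container code never needs to know about TestClass: it forwards each element to the stream operator, which is why containers of user types work once save and load exist.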

POD (Plain Old Data) types

POD types are data types which occupy a contiguous region in memory: for instance, basic types (double, int, etc.) or structs which contain only basic types. Such types can be copied or replicated using a simple mem-copy operation, and are great candidates for acceleration during serialization / deserialization.

Unfortunately, there is no simple way to detect or test whether a given type is POD. C++ TR1 defines an is_pod feature, but this feature is not yet implemented in some compilers. We therefore defined our own gl_is_pod<...> feature, which allows the user to explicitly declare that a particular type is a POD type. gl_is_pod can be further extended in the future when compilers provide better support for std::is_pod.

To use gl_is_pod, we consider the following Coordinate struct.

struct Coordinate{
  int x, y, z;
};

This struct can be declared to be a POD type, which enables the accelerated serializer, by specializing gl_is_pod:

namespace graphlab {
  template <>
  struct gl_is_pod<Coordinate> {
    BOOST_STATIC_CONSTANT(bool, value = true);
  };
}

Now, Coordinate variables, or even vector<Coordinate> variables will serialize/deserialize faster, making use of direct mem-copies.
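The idea behind the mem-copy fast path can be sketched as follows. This is a standalone illustration under assumptions, not GraphLab code: sketch_is_pod plays the role of gl_is_pod, and pack/unpack are hypothetical helpers that write a size_t element count followed by the raw element bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical trait in the spirit of gl_is_pod: basic types are POD by
// default, and users opt in for their own structs.
template <typename T>
struct sketch_is_pod {
  static const bool value = std::is_arithmetic<T>::value;
};

struct Coordinate {
  int x, y, z;
};

// Explicit opt-in: the user asserts that Coordinate is safe to mem-copy.
template <>
struct sketch_is_pod<Coordinate> {
  static const bool value = true;
};

// Serializes the whole vector with one bulk copy instead of one write
// per element.
template <typename T>
std::vector<char> pack(const std::vector<T>& v) {
  static_assert(sketch_is_pod<T>::value, "T must be flagged as POD");
  const std::size_t n = v.size();
  std::vector<char> bytes(sizeof(n) + n * sizeof(T));
  std::memcpy(bytes.data(), &n, sizeof(n));
  if (n > 0) std::memcpy(bytes.data() + sizeof(n), v.data(), n * sizeof(T));
  return bytes;
}

// Inverse of pack: reads the count, then copies the elements back in bulk.
template <typename T>
std::vector<T> unpack(const std::vector<char>& bytes) {
  static_assert(sketch_is_pod<T>::value, "T must be flagged as POD");
  std::size_t n = 0;
  std::memcpy(&n, bytes.data(), sizeof(n));
  std::vector<T> v(n);
  if (n > 0) std::memcpy(v.data(), bytes.data() + sizeof(n), n * sizeof(T));
  return v;
}
```

In this sketch a type that has not been flagged fails to compile in pack/unpack; a real serializer would more likely fall back to the slower element-by-element path.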

A caveat of this fast serialization mechanism is that the resulting archive may not be cross-platform, since it bakes in the particular length of the integers inside the struct. There may also be issues when sharing the same archive between compilers or between builds with different alignment options.
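One possible way to reduce these risks, offered here as a suggestion rather than something GraphLab provides, is to use fixed-width integer types in mem-copied structs and to check the layout at compile time. PortableCoordinate and host_is_little_endian are hypothetical names used only for this sketch; the byte-order probe is included because raw copies also depend on endianness.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Fixed-width fields keep the field sizes identical on every platform
// that provides std::int32_t.
struct PortableCoordinate {
  std::int32_t x, y, z;
};

// Fail the build if the type cannot be safely mem-copied, or if the
// compiler or alignment options introduced padding.
static_assert(std::is_trivially_copyable<PortableCoordinate>::value,
              "PortableCoordinate must be trivially copyable");
static_assert(sizeof(PortableCoordinate) == 3 * sizeof(std::int32_t),
              "unexpected padding in PortableCoordinate");

// Returns true when the lowest-addressed byte of a multi-byte integer
// holds its least significant bits.
inline bool host_is_little_endian() {
  const std::uint16_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

An archive writer could record the result of host_is_little_endian() in a header so that a reader on a different machine can at least detect a mismatch.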

Any

graphlab::any is a variant type derived from Boost Any, but extended to support the GraphLab serialization system. See graphlab::any for details.