R Binary File Format

R's save() function writes objects from your R workspace to a binary file, which the load() function can later read back. This page has a little information about the format save() uses when writing files.

Number conversion

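R uses the XDR (External Data Representation) format to save numeric data: integers are stored as 4 bytes and doubles as 8 bytes, both in big-endian byte order. The GNU C Library (libc) includes functions that write this format. See the XDR man page for more information.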
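To make the encoding concrete, here is a minimal sketch that produces the same bytes by hand, without the libc XDR functions. The names put_xdr_int and put_xdr_double are hypothetical helpers invented for this illustration, and the double version assumes the host stores doubles as IEEE 754 binary64 with the same byte order it uses for integers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of libc or R): store a 32-bit signed
   integer as 4 big-endian bytes, which is how XDR lays out an "int". */
void put_xdr_int(int32_t value, unsigned char out[4])
{
    uint32_t u = (uint32_t)value;

    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: store a double as 8 big-endian bytes.
   Assumption: the host double is IEEE 754 binary64 and shares its
   byte order with 64-bit integers. */
void put_xdr_double(double value, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &value, sizeof bits);   /* reinterpret the bit pattern */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}
```

For example, put_xdr_int(133633, buf) should yield the bytes 00 02 0a 01, and put_xdr_double(1.0, buf) should yield 3f f0 followed by six zero bytes.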

Example Program

Here's an example C program that writes out the variable x, which is a vector of reals: c(1,2,3). See main() to get started. It is a simplified version of what R does when you run:

x <- c(1,2,3)
save(x, file="test.rda")

Nearly all of the relevant code for this can be found in saveload.c (see the do_save function) and serialize.c (see the R_Serialize function) in the R source tree.

#include <stdio.h>
#include <stdlib.h>   // for exit()
#include <string.h>
#include <rpc/xdr.h>

#define R_XDR_INTEGER_SIZE 4
#define R_XDR_DOUBLE_SIZE 8

// Pack an object's type and attribute bits into a single flags word.
int pack_flags(int type, int levels, int is_object, int has_attr, int has_tag)
{
    int flags;

    if (type == 9) {
        // scalar string type, used for symbol names
        levels &= (~((1 << 5) | 1));
    }

    flags = type | (levels << 12);
    if (is_object) flags |= (1 << 8);
    if (has_attr) flags |= (1 << 9);
    if (has_tag) flags |= (1 << 10);

    return flags;
}

// XDR-encode an integer into buf (R_XDR_INTEGER_SIZE bytes).
void encode_integer(int i, char *buf)
{
    XDR xdrs;
    int success;

    xdrmem_create(&xdrs, buf, R_XDR_INTEGER_SIZE, XDR_ENCODE);
    success = xdr_int(&xdrs, &i);
    xdr_destroy(&xdrs);
    if (!success) {   // xdr_int returns TRUE on success
        printf("encode_integer failed\n");
        exit(1);
    }
}

// XDR-encode a double into buf (R_XDR_DOUBLE_SIZE bytes).
void encode_double(double d, char *buf)
{
    XDR xdrs;
    int success;

    xdrmem_create(&xdrs, buf, R_XDR_DOUBLE_SIZE, XDR_ENCODE);
    success = xdr_double(&xdrs, &d);
    xdr_destroy(&xdrs);
    if (!success) {   // xdr_double returns TRUE on success
        printf("encode_double failed\n");
        exit(1);
    }
}

// Write len raw bytes from buf to fp, exiting on a short write.
void write_data(const char *buf, int len, FILE *fp)
{
    size_t res;

    res = fwrite(buf, sizeof(char), len, fp);
    if (res != (size_t)len) {
        printf("Write failed\n");
        exit(1);
    }
}

void write_integer(int i, char *buf, FILE *fp)
{
    encode_integer(i, buf);
    write_data(buf, R_XDR_INTEGER_SIZE, fp);
}

void write_double(double d, char *buf, FILE *fp)
{
    encode_double(d, buf);
    write_data(buf, R_XDR_DOUBLE_SIZE, fp);
}

int main(void)
{
    FILE *fp;
    char buf[128];

    fp = fopen("test.rda", "wb");   // binary mode, so newlines are written as-is
    if (fp == NULL) {
        printf("Couldn't open file for writing\n");
        return 1;
    }

    // Write magic: XDR_V2
    write_data("RDX2\n", 5, fp);

    // Write format
    write_data("X\n", 2, fp);

    // Write R version information
    write_integer(2, buf, fp);        // serialization version: 2
    write_integer(133633, buf, fp);   // current R version (2.10.1 in this case)
    write_integer(131840, buf, fp);   // oldest R version that can read this format (2.3.0)

    // The saved R objects are wrapped in a list of dotted pairs before saving.
    // Next we write out flags needed for this list.
    write_integer(pack_flags(2, 0, 0, 0, 1), buf, fp);

    // Write the name of the variable we're storing
    write_integer(1, buf, fp);                            // symbol type
    write_integer(pack_flags(9, 33, 0, 0, 0), buf, fp);   // symbol flags
    write_integer(1, buf, fp);                            // length of name
    write_data("x", 1, fp);                               // actual name

    // Now write the actual variable data
    write_integer(pack_flags(14, 0, 0, 0, 0), buf, fp);   // vector of reals
    write_integer(3, buf, fp);                            // length of vector
    write_double(1.0, buf, fp);                           // first value
    write_double(2.0, buf, fp);                           // second value
    write_double(3.0, buf, fp);                           // third value

    // Tell R we're done (254 stands for NULL, which ends the list)
    write_integer(254, buf, fp);
    fclose(fp);

    return 0;
}
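The flags word that pack_flags() builds can be taken apart again with the inverse bit operations, which helps when reading a file by hand. The sketch below is an illustration only: struct r_flags and unpack_flags are hypothetical names, and the layout (type in the low 8 bits, levels from bit 12 upward) is inferred from pack_flags() in the program above.

```c
/* Hypothetical container for the decoded fields; not part of R's sources. */
struct r_flags {
    int type;       /* R object type, e.g. 2, 9 or 14 in the example */
    int levels;
    int is_object;
    int has_attr;
    int has_tag;
};

/* Hypothetical inverse of pack_flags(): split a flags word into fields.
   Assumes the flags value is non-negative. */
struct r_flags unpack_flags(int flags)
{
    struct r_flags f;

    f.type      = flags & 0xFF;        /* low 8 bits hold the type */
    f.is_object = (flags >> 8) & 1;
    f.has_attr  = (flags >> 9) & 1;
    f.has_tag   = (flags >> 10) & 1;
    f.levels    = flags >> 12;         /* everything from bit 12 upward */
    return f;
}
```

Applied to 1026, the value written for the list of dotted pairs, this gives type 2 with only has_tag set, matching the call pack_flags(2, 0, 0, 0, 1).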

Compiling

To compile this example program, you only need to run:
gcc -o r-save <filename>

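You DO NOT need to link to R for this program. In Ubuntu, you need the libc6-dev package installed for the XDR headers. On newer Linux distributions, where glibc no longer ships the XDR headers, they are typically provided by the libtirpc development package instead; in that case you would likely need to add -I/usr/include/tirpc and -ltirpc to the gcc command.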

Running

Simply run:
./r-save

This will create a file called test.rda which you can load() in R.

Tips

The ascii parameter in R's save() function is useful for figuring out the binary file format:

x <- c(1,2,3)
save(x, file="ascii.rda", ascii=TRUE)

Writes the following to ascii.rda:

RDA2
A
2
133633
131840
1026
1
9
1
x
14
3
1
2
3
254

Each line corresponds to one write call in the binary version, with a few differences. In binary mode the first two lines are RDX2 and X rather than RDA2 and A. Those two lines are the only ones terminated by newlines in binary mode; everything after them is written back to back. In addition, the numbers in the ASCII format are written as decimal text rather than XDR encoded.
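The two version numbers in the listing (133633 and 131840) pack the major, minor and patch level into one integer as major * 65536 + minor * 256 + patch, the same arithmetic as R's R_Version macro. A small sketch of that arithmetic follows; pack_version and unpack_version are hypothetical names chosen for this example.

```c
/* Hypothetical helper: combine a version triple into one integer,
   using major * 65536 + minor * 256 + patch. */
int pack_version(int major, int minor, int patch)
{
    return major * 65536 + minor * 256 + patch;
}

/* Hypothetical helper: split a packed version back into its parts.
   Assumes minor and patch each fit in 8 bits. */
void unpack_version(int packed, int *major, int *minor, int *patch)
{
    *major = packed / 65536;
    *minor = (packed / 256) % 256;
    *patch = packed % 256;
}
```

With these, pack_version(2, 10, 1) gives 133633 and pack_version(2, 3, 0) gives 131840, the two values that appear in the file.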

Here's a hexdump of the binary version of the same data:

00000000  52 44 58 32 0a 58 0a 00  00 00 02 00 02 0a 01 00  |RDX2.X..........|
00000010  02 03 00 00 00 04 02 00  00 00 01 00 00 00 09 00  |................|
00000020  00 00 01 78 00 00 00 0e  00 00 00 03 3f f0 00 00  |...x........?...|
00000030  00 00 00 00 40 00 00 00  00 00 00 00 40 08 00 00  |....@.......@...|
00000040  00 00 00 00 00 00 00 fe                           |........|
00000048
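As a cross-check, the 72 bytes of the hexdump can be walked field by field. The sketch below is not R's loader: it only understands the exact layout of this example (one variable holding a vector of reals), and get_int, get_double and read_example are hypothetical names. It also assumes the host uses IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* The 72 bytes shown in the hexdump. */
const unsigned char rda_bytes[72] = {
    0x52,0x44,0x58,0x32,0x0a,0x58,0x0a,0x00, 0x00,0x00,0x02,0x00,0x02,0x0a,0x01,0x00,
    0x02,0x03,0x00,0x00,0x00,0x04,0x02,0x00, 0x00,0x00,0x01,0x00,0x00,0x00,0x09,0x00,
    0x00,0x00,0x01,0x78,0x00,0x00,0x00,0x0e, 0x00,0x00,0x00,0x03,0x3f,0xf0,0x00,0x00,
    0x00,0x00,0x00,0x00,0x40,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x40,0x08,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xfe
};

/* Read a 4-byte big-endian integer at *pos and advance *pos. */
int32_t get_int(const unsigned char *buf, size_t *pos)
{
    uint32_t u = ((uint32_t)buf[*pos] << 24) | ((uint32_t)buf[*pos + 1] << 16)
               | ((uint32_t)buf[*pos + 2] << 8) | (uint32_t)buf[*pos + 3];
    *pos += 4;
    return (int32_t)u;
}

/* Read an 8-byte big-endian double at *pos and advance *pos. */
double get_double(const unsigned char *buf, size_t *pos)
{
    uint64_t bits = 0;
    double d;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | buf[*pos + i];
    *pos += 8;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Walk the example layout. Returns the number of doubles stored in out,
   or -1 if the buffer does not match the layout described on this page. */
int read_example(const unsigned char *buf, size_t len, double *out, int max)
{
    size_t pos = 7;
    int32_t n, i;

    /* magic + format, then seven integers up to the name length */
    if (len < 7 + 7 * 4 || memcmp(buf, "RDX2\nX\n", 7) != 0) return -1;
    pos += 3 * 4;                                       /* three version integers */
    if ((get_int(buf, &pos) & 0xFF) != 2) return -1;    /* list of dotted pairs */
    if (get_int(buf, &pos) != 1) return -1;             /* symbol */
    if ((get_int(buf, &pos) & 0xFF) != 9) return -1;    /* scalar string */
    n = get_int(buf, &pos);                             /* length of name */
    if (n < 0 || (size_t)n > len - pos) return -1;
    pos += (size_t)n;                                   /* skip the name */
    if (len - pos < 8) return -1;
    if ((get_int(buf, &pos) & 0xFF) != 14) return -1;   /* vector of reals */
    n = get_int(buf, &pos);                             /* length of vector */
    if (n < 0 || n > max || (size_t)n > (len - pos) / 8) return -1;
    for (i = 0; i < n; i++)
        out[i] = get_double(buf, &pos);
    if (len - pos < 4 || get_int(buf, &pos) != 254) return -1;  /* terminator */
    return (int)n;
}
```

Calling read_example(rda_bytes, sizeof rda_bytes, v, 3) with a three-element array v should return 3 and fill v with 1, 2 and 3.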

Caveats

By default, R's save() function compresses the resulting file with gzip. If you want to compare your file byte for byte with R's file, either decompress R's file first with gunzip, or call save() with compress = FALSE.
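A quick way to tell which kind of file you have is to look at its first bytes: gzip data starts with the two bytes 1f 8b, while an uncompressed file starts with the RDX2 or RDA2 magic described above. The function below is a rough sketch of that check; classify_rda and its return codes are hypothetical, and it does not try to recognise any other compression format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inspect the first bytes of a saved file.
   Returns 1 for gzip data, 0 for uncompressed binary ("RDX2\n"),
   2 for uncompressed ASCII ("RDA2\n"), and -1 for anything else. */
int classify_rda(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return 1;                       /* gzip magic number */
    if (len >= 5 && memcmp(buf, "RDX2\n", 5) == 0)
        return 0;
    if (len >= 5 && memcmp(buf, "RDA2\n", 5) == 0)
        return 2;
    return -1;
}
```

Read the first few bytes of the file with fread() and pass them in; a result of 1 suggests the file needs gunzip before it can be compared with the output of the example program.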

-- JeremyStephens - 05 Mar 2010