R Internals
This page is about how R stores objects internally. I've had to re-figure this out several times, so I thought a TWiki page about this might be prudent.
SEXP
All R objects are accessible in C via the SEXP type.
SEXP is just a pointer defined by a typedef. You get at the actual data inside an SEXP object through macros that expand into other macros, which in turn rely on more typedefs and macros (basically, a real pain to understand). Here's an attempted trace of what SEXP actually is:
SEXP is a pointer to an SEXPREC object:
typedef struct SEXPREC *SEXP;
SEXPREC is a struct:
typedef struct SEXPREC {
    SEXPREC_HEADER;
    union {
        struct primsxp_struct primsxp;
        struct symsxp_struct symsxp;
        struct listsxp_struct listsxp;
        struct envsxp_struct envsxp;
        struct closxp_struct closxp;
        struct promsxp_struct promsxp;
    } u;
} SEXPREC, *SEXP;
SEXPREC_HEADER is a preprocessor macro that expands to the fields every node begins with:
#define SEXPREC_HEADER \
struct sxpinfo_struct sxpinfo; \
struct SEXPREC *attrib; \
struct SEXPREC *gengc_next_node, *gengc_prev_node
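Because SEXPREC_HEADER is pasted into both SEXPREC and VECTOR_SEXPREC, every node begins with the same fields in the same order, which is what lets R read the header of any SEXP before it knows which struct it really is. Here is a minimal sketch of the pattern using hypothetical, cut-down names (MOCK_HEADER, mock_sexprec, mock_vector_sexprec) rather than R's real definitions; the static_asserts check that the shared fields land at the same offsets in both structs, as they do on the usual compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SEXPREC_HEADER: sxpinfo is reduced to one
   word, but the pointer fields mirror the real macro. */
#define MOCK_HEADER                                      \
    unsigned int sxpinfo;                                \
    struct mock_sexprec *attrib;                         \
    struct mock_sexprec *gengc_next_node, *gengc_prev_node

/* Analogous to SEXPREC: header followed by a type-dependent payload. */
typedef struct mock_sexprec {
    MOCK_HEADER;
    void *payload[3];
} mock_sexprec;

/* Analogous to VECTOR_SEXPREC: same header, different tail. */
typedef struct mock_vector_sexprec {
    MOCK_HEADER;
    int length;
} mock_vector_sexprec;

/* The macro-expanded header fields sit at the same offsets in both. */
static_assert(offsetof(mock_sexprec, attrib) ==
              offsetof(mock_vector_sexprec, attrib),
              "attrib offset differs");
static_assert(offsetof(mock_sexprec, gengc_prev_node) ==
              offsetof(mock_vector_sexprec, gengc_prev_node),
              "GC link offset differs");
```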
sxpinfo_struct looks like this:
struct sxpinfo_struct {
SEXPTYPE type : 5;
unsigned int obj : 1;
unsigned int named : 2;
unsigned int gp : 16;
unsigned int mark : 1;
unsigned int debug : 1;
unsigned int trace : 1;
unsigned int fin : 1; /* has finalizer installed */
unsigned int gcgen : 1; /* old generation number */
unsigned int gccls : 3; /* node class */
}; /* Tot: 32 */
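The widths add up to exactly 32 bits, so on common compilers the whole struct packs into a single 4-byte word (bit-field layout is implementation-defined, so that is typical rather than guaranteed). A side effect of the 5-bit type field is that only codes 0 to 31 fit; anything larger is reduced modulo 32 when stored. A small sketch using a copy of the struct under a hypothetical name (mock_sxpinfo), with mock_make_info as an illustrative helper:

```c
#include <assert.h>

/* Stand-in for SEXPTYPE, which R declares as an unsigned int. */
typedef unsigned int mock_sexptype;

/* Same field widths as sxpinfo_struct: 5+1+2+16+1+1+1+1+1+3 = 32 bits. */
struct mock_sxpinfo {
    mock_sexptype type : 5;
    unsigned int obj   : 1;
    unsigned int named : 2;
    unsigned int gp    : 16;
    unsigned int mark  : 1;
    unsigned int debug : 1;
    unsigned int trace : 1;
    unsigned int fin   : 1;
    unsigned int gcgen : 1;
    unsigned int gccls : 3;
};

/* Store a type code. Assigning to an unsigned 5-bit field keeps only
   the low 5 bits, so the stored value is the code modulo 32. */
struct mock_sxpinfo mock_make_info(unsigned int type)
{
    struct mock_sxpinfo info = {0};
    info.type = type;
    return info;
}

/* Self-check: a real type code survives, an oversized one wraps. */
void mock_sxpinfo_check(void)
{
    assert(mock_make_info(14).type == 14);  /* 14 is REALSXP */
    assert(mock_make_info(99).type == 3);   /* 99 % 32 == 3 */
}
```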
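SEXPTYPE is an unsigned integer type, and the 5-bit type field holds a code that says what this object is (list, string vector, numeric vector, etc.); the TYPEOF macro reads it. The other fields in sxpinfo_struct are flags used by the interpreter and the garbage collector (the comments hint at a few of them, e.g. gcgen and gccls), which I don't really care to understand at this time.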
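The type code also determines how big each data element is once you reach the data (see DATAPTR below). A sketch of that mapping, using a hypothetical helper mock_element_size; the constants are re-declared with the values from R's Rinternals.h as I understand them (REALSXP is 14, INTSXP is 13, and so on), so check them against your R version, and don't compile this next to R's headers, which already define these names.

```c
#include <assert.h>
#include <stddef.h>

/* A few SEXPTYPE codes with R's values, re-declared so this compiles
   without R's headers. */
enum {
    NILSXP  = 0,
    CHARSXP = 9,
    LGLSXP  = 10,
    INTSXP  = 13,
    REALSXP = 14,
    CPLXSXP = 15,
    STRSXP  = 16,
    VECSXP  = 19,
    RAWSXP  = 24
};

/* Hypothetical helper: bytes per element stored after the header, i.e.
   how far indexing through CHAR/RAW/LOGICAL/INTEGER/REAL/COMPLEX steps.
   Returns 0 for types not read through those macros. */
size_t mock_element_size(unsigned int type)
{
    assert(type < 32);  /* codes must fit sxpinfo's 5-bit type field */
    switch (type) {
    case CHARSXP: return sizeof(char);           /* CHAR(x): char *        */
    case RAWSXP:  return sizeof(unsigned char);  /* RAW(x): Rbyte *        */
    case LGLSXP:  return sizeof(int);            /* LOGICAL(x): int *      */
    case INTSXP:  return sizeof(int);            /* INTEGER(x): int *      */
    case REALSXP: return sizeof(double);         /* REAL(x): double *      */
    case CPLXSXP: return 2 * sizeof(double);     /* Rcomplex: r and i      */
    default:      return 0;  /* e.g. STRSXP, VECSXP hold SEXPs; NILSXP none */
    }
}
```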
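The union in SEXPREC (called u) holds one of several structs, and which one is meaningful depends on the object's type: symsxp for symbols, listsxp for pairlists (and language objects), closxp for closures, envsxp for environments, promsxp for promises, and primsxp for built-in primitives. Vectors don't use this union at all; they use VECTOR_SEXPREC, described below.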
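To see why the type matters, here is a sketch that walks a pairlist: only for LISTSXP nodes is the listsxp member of u meaningful, so the code checks the type before following the cdr pointer. All names are hypothetical stand-ins modelled on listsxp_struct and symsxp_struct; in real R a list ends at R_NilValue (a NILSXP node), which the mock imitates with a node whose type is NILSXP.

```c
#include <assert.h>
#include <stddef.h>

enum { NILSXP = 0, LISTSXP = 2 };  /* R's codes for these two types */

typedef struct mock_sexprec *mock_sexp;

/* Payload structs modelled on symsxp_struct and listsxp_struct. */
struct mock_symsxp  { mock_sexp pname, value, internal; };
struct mock_listsxp { mock_sexp carval, cdrval, tagval; };

/* Header reduced to the type code; the union works as in SEXPREC. */
struct mock_sexprec {
    unsigned int type;
    union {
        struct mock_symsxp  symsxp;
        struct mock_listsxp listsxp;
    } u;
};

/* Count the cells of a pairlist. u.listsxp is only valid for LISTSXP
   nodes, so the type is checked before following cdrval. */
int mock_pairlist_length(mock_sexp x)
{
    int n = 0;
    while (x != NULL && x->type == LISTSXP) {
        n++;
        x = x->u.listsxp.cdrval;
    }
    /* A well-formed list ends in a NILSXP node (R_NilValue in real R);
       NULL is also accepted as a convenience of this mock. */
    assert(x == NULL || x->type == NILSXP);
    return n;
}
```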
DATAPTR
The DATAPTR macro is used to access the actual data of a vector SEXP object (numeric, integer, logical, raw, and so on). Here's what I found by tracing DATAPTR.
DATAPTR is used by lots of macros such as REAL and INTEGER, where x is an SEXP object:
#define CHAR(x) ((char *) DATAPTR(x))
#define LOGICAL(x) ((int *) DATAPTR(x))
#define INTEGER(x) ((int *) DATAPTR(x))
#define RAW(x) ((Rbyte *) DATAPTR(x))
#define COMPLEX(x) ((Rcomplex *) DATAPTR(x))
#define REAL(x) ((double *) DATAPTR(x))
DATAPTR is also a preprocessor-defined macro:
#define DATAPTR(x) (((SEXPREC_ALIGN *) (x)) + 1)
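Putting those definitions together, all of these accessors return the same address (the one just past the header) and differ only in the pointer type, which decides how indexing steps through the data; nothing in the macros quoted above checks that x actually has the matching type. A sketch with hypothetical MOCK_ copies of the macros over a cut-down header:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down vector header (hypothetical); only its size matters here. */
typedef struct { unsigned int sxpinfo; void *attrib; int length; } mock_vector_header;
typedef union { mock_vector_header s; double align; } mock_align;

/* Same shape as R's macros, under hypothetical MOCK_ names. */
#define MOCK_DATAPTR(x) (((mock_align *) (x)) + 1)
#define MOCK_LOGICAL(x) ((int *) MOCK_DATAPTR(x))
#define MOCK_INTEGER(x) ((int *) MOCK_DATAPTR(x))
#define MOCK_REAL(x)    ((double *) MOCK_DATAPTR(x))
#define MOCK_RAW(x)     ((unsigned char *) MOCK_DATAPTR(x))  /* Rbyte is unsigned char */

/* Self-check: every accessor yields the same address; only the element
   type (and therefore the indexing stride) differs. */
void mock_check_accessors(void *x)
{
    void *base = MOCK_DATAPTR(x);
    assert((void *) MOCK_LOGICAL(x) == base);
    assert((void *) MOCK_INTEGER(x) == base);
    assert((void *) MOCK_REAL(x) == base);
    assert((void *) MOCK_RAW(x) == base);
}
```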
SEXPREC_ALIGN is a typedef'd union:
typedef union { VECTOR_SEXPREC s; double align; } SEXPREC_ALIGN;
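The double member is what makes this union useful: a union is at least as large as its largest member, and its size is always a multiple of its alignment, which here is at least that of a double. So whatever sits right after one SEXPREC_ALIGN is correctly aligned for a double, even if the header on its own would not end on such a boundary. A sketch with a deliberately awkward, hypothetical 5-byte header to make the padding visible:

```c
#include <assert.h>
#include <stddef.h>

/* A deliberately awkward 5-byte header (hypothetical, not R's). */
typedef struct { unsigned char bytes[5]; } odd_header;

/* Pairing it with a double, as SEXPREC_ALIGN pairs VECTOR_SEXPREC with
   one, makes the union at least as big as the header and a multiple of
   double's alignment. */
typedef union { odd_header s; double align; } odd_header_align;

static_assert(sizeof(odd_header_align) % _Alignof(double) == 0,
              "union size must be a multiple of double's alignment");

/* Where data stored right after one header would begin, in bytes. */
size_t payload_offset_unaligned(void) { return sizeof(odd_header); }
size_t payload_offset_aligned(void)   { return sizeof(odd_header_align); }
```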
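VECTOR_SEXPREC is a struct that is a mini-version of SEXPREC for use with vectors: it has the same header, but the union u is replaced by a vecsxp_struct that holds the vector's length information.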
typedef struct VECTOR_SEXPREC {
SEXPREC_HEADER;
struct vecsxp_struct vecsxp;
} VECTOR_SEXPREC, *VECSEXP;
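The payoff of the mini-version is size: each member of SEXPREC's union is as big as three pointers (listsxp, for instance, holds car, cdr and tag), while a vector header only needs its length information. A sketch comparing the two layouts with hypothetical cut-down structs; the vecsxp fields are assumed here to be length and truelength:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down header shared by both node kinds (hypothetical). */
#define MOCK_HEADER                        \
    unsigned int sxpinfo;                  \
    void *attrib;                          \
    void *gengc_next_node, *gengc_prev_node

/* Like SEXPREC: the union is as large as its largest member, here
   three pointers. */
typedef struct {
    MOCK_HEADER;
    union {
        struct { void *carval, *cdrval, *tagval; } listsxp;
        struct { int offset; } primsxp;
    } u;
} mock_sexprec;

/* Like VECTOR_SEXPREC: the union is replaced by the length fields
   (assumed: length and truelength). */
typedef struct {
    MOCK_HEADER;
    struct { int length; int truelength; } vecsxp;
} mock_vector_sexprec;

/* Bytes saved per vector node by using the smaller struct. */
size_t mock_bytes_saved(void)
{
    assert(sizeof(mock_vector_sexprec) <= sizeof(mock_sexprec));
    return sizeof(mock_sexprec) - sizeof(mock_vector_sexprec);
}
```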
Getting Data
So how do you get at the actual data? It depends on the type of R object you're dealing with.
Reals and Integers
double *myArray = REAL(some_SEXP_object);
which is equivalent to:
double *myArray = ((double *) (((SEXPREC_ALIGN *) (some_SEXP_object)) + 1));
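What this code does is take an SEXP object, which is really a pointer to an SEXPREC struct, and cast it to an SEXPREC_ALIGN pointer. SEXPREC_ALIGN is a union of a VECTOR_SEXPREC (the vector header) and a double, so it is at least as large as the header, and its size is a multiple of the alignment of a double. Adding 1 to an SEXPREC_ALIGN pointer therefore advances it by sizeof(SEXPREC_ALIGN) bytes, i.e. to the first byte after the header. R stores a vector's elements in the same block of memory, directly after the header, so that address is where the data starts, and casting it to double * gives you the array of values. The double member never holds anything; it is only there to make sure that this address is suitably aligned for doubles. So there's no magic after all, just pointer arithmetic on a carefully sized type.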
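The whole path, from allocating a vector to summing its values, can be sketched with hypothetical stand-ins (mock_vector, mock_alloc_real, MOCK_REAL and friends); with R's headers the real counterparts would be allocVector(REALSXP, n), REAL(x) and LENGTH(x). mock_data_offset confirms that the data starts exactly sizeof(mock_align) bytes after the start of the node. R's own allocator is more involved than a single malloc (it has to cooperate with the garbage collector), so treat this only as a model of the memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down VECTOR_SEXPREC (hypothetical). */
typedef struct {
    unsigned int type;  /* would live in sxpinfo in R */
    void *attrib;
    int length;         /* would be vecsxp.length */
} mock_vector;

typedef union { mock_vector s; double align; } mock_align;

#define MOCK_DATAPTR(x) (((mock_align *) (x)) + 1)
#define MOCK_REAL(x)    ((double *) MOCK_DATAPTR(x))
#define MOCK_LENGTH(x)  (((mock_vector *) (x))->length)

enum { REALSXP = 14 };

/* Header and elements in one block, the layout DATAPTR relies on. */
mock_vector *mock_alloc_real(int n)
{
    assert(n >= 0);
    mock_vector *v = malloc(sizeof(mock_align) + (size_t) n * sizeof(double));
    if (v != NULL) {
        v->type = REALSXP;
        v->attrib = NULL;
        v->length = n;
    }
    return v;
}

/* "+ 1" on a mock_align pointer moves sizeof(mock_align) bytes. */
size_t mock_data_offset(mock_vector *x)
{
    return (size_t) ((char *) MOCK_DATAPTR(x) - (char *) x);
}

/* Read the data the usual way: fetch the pointer once, then index it. */
double mock_sum_real(mock_vector *x)
{
    double *p = MOCK_REAL(x);
    double total = 0.0;
    for (int i = 0; i < MOCK_LENGTH(x); i++)
        total += p[i];
    return total;
}
```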
Relevant files
Here is a list of R files I waded through to get this information (relative to an unpacked R source tree):
The attached files are taken from the R-2.2.1 source tree.