Generating data structures in a memory-efficient manner

We often generate data structures, such as vectors or data frames, in for loops or in our own user-defined functions. Unfortunately, if you're not careful, generating a data structure can be very memory intensive. Specifically, in each iteration of a for loop we often concatenate the next element onto the existing vector to build up the final vector. However, every time you do this, R implicitly copies the existing vector and then adds the additional element. Therefore, you are using memory for about 2n + 1 elements (the existing vector of length n plus the new copy of length n + 1), where n is the length of the vector during a specific iteration of the for loop, just to add a single element to the vector. Depending on the length n, this can be all of your memory.
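To make the cost concrete, the sketch below counts how many already-stored elements are copied when a vector is grown one element at a time. It is a simplified model, assuming that each call to c() copies every existing element exactly once and ignoring R's internal allocation details. The variable names are illustrative only.

```r
# Simplified model: count the elements copied while growing a vector
# of final length n one element at a time with c().
n <- 100

# At iteration i the vector already holds i - 1 elements, and c()
# copies all of them into the new, longer vector.
copied_when_growing <- sum(seq_len(n) - 1)

# Closed form of the same sum: 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
closed_form <- n * (n - 1) / 2

copied_when_growing   # 4950 for n = 100
closed_form           # 4950
```

Under this model the copying work grows quadratically with n, even though the final vector holds only n elements.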

A much more efficient way of generating a vector is to define an "empty" vector of the final length up front, if you know what this final length will be.

For example, suppose we want to generate a numeric vector of 100 elements. Instead of,

a <- NULL
for (i in 1:100) {
   a <- c(a, rnorm(1))
}

we can do the following:

a <- numeric(100)
a[] <- NA

for (i in 1:100) {
   a[i] <- rnorm(1)
}

The numeric() function generates a numeric vector of the specified length, where each element has the value 0. We could also have used the character() function to generate a character vector of the specified length, where each element has the value "". In either case, we can easily replace all of the elements of the vector with NA using the code a[] <- NA.
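The two loops above can be compared side by side with a rough timing sketch. The helper names grow_vector and prealloc_vector and the size n are hypothetical choices made for illustration, and the timings will differ by machine and R version.

```r
# Hypothetical helper: grow the vector one element at a time with c().
grow_vector <- function(n) {
  a <- NULL
  for (i in seq_len(n)) {
    a <- c(a, rnorm(1))  # builds a new, longer vector on every iteration
  }
  a
}

# Hypothetical helper: allocate the full vector once, then fill it.
prealloc_vector <- function(n) {
  a <- numeric(n)  # n zeros
  a[] <- NA        # overwrite with NA; the vector stays numeric
  for (i in seq_len(n)) {
    a[i] <- rnorm(1)
  }
  a
}

n <- 10000  # illustrative size; adjust for your machine

time_grow     <- system.time(grown  <- grow_vector(n))
time_prealloc <- system.time(filled <- prealloc_vector(n))

# Elapsed seconds for each approach
print(c(grow = time_grow[["elapsed"]], prealloc = time_prealloc[["elapsed"]]))
```

Both helpers return a numeric vector of length n; only the way the vector is built differs. Using seq_len(n) instead of 1:n is a small safeguard for the case n = 0.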

We can use a similar process to efficiently generate a data frame, i.e., define the dimensions of the data frame up front. For example,

c <- data.frame(a = numeric(100), b = character(100))
c[ , ] <- NA
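As a variant of the same idea, the sketch below preallocates the data frame with the typed missing-value constants NA_real_ and NA_character_, so that each column's type is stated explicitly, and then fills it row by row. The name df (chosen to avoid confusion with the base function c()) and the filler values are assumptions made for illustration.

```r
n <- 100  # number of rows, known in advance

# Preallocate with typed missing values so each column's type is explicit.
df <- data.frame(
  a = rep(NA_real_, n),
  b = rep(NA_character_, n),
  stringsAsFactors = FALSE  # keep b as character in older R versions too
)

# Fill one row per iteration; the data frame never changes size.
for (i in seq_len(n)) {
  df$a[i] <- rnorm(1)
  df$b[i] <- sample(letters, 1)
}

str(df)
```

Because the number of rows is fixed before the loop starts, no iteration has to extend the data frame.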

If you want a completely numeric (or completely character) data frame, you could also do the following:

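b <- numeric(100)
attr(b, "dim") <- c(10, 10)   # reshape the vector into a 10 x 10 matrix
c <- as.data.frame(b)         # convert the matrix to a data frame (as.data.frame() is an assumed choice)
c[ , ] <- NA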
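A more compact way to reach a similar result, sketched here as an alternative and not as the method used above, is to build the matrix of missing values directly with matrix() and convert it. The names m and df and the 10 x 10 size are illustrative assumptions.

```r
# Build a 10 x 10 numeric matrix of missing values in one step.
m <- matrix(NA_real_, nrow = 10, ncol = 10)

# Convert the matrix to a data frame; every column is numeric.
df <- as.data.frame(m)

dim(df)               # 10 10
sapply(df, class)     # "numeric" for every column
```

For a completely character data frame, NA_character_ can be used in place of NA_real_.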
Topic revision: r1 - 21 Nov 2006, TheresaScott
