Generating data structures in a memory-efficient manner

We often generate data structures, such as vectors or data frames, in for loops or in our own user-defined functions. Unfortunately, if you're not careful, generating a data structure can be very memory intensive. Specifically, in each iteration of a for loop we often concatenate the next element onto the existing vector to build up the final vector. However, every time you do this, R implicitly copies the existing vector and then adds the additional element. While that copy is being made, you are therefore using memory for roughly 2n + 1 elements, where n is the length of the vector during a specific iteration of the for loop, just to add a single element to the vector. If n is large enough, this can use up all of your available memory.
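To get a feel for how this adds up, the sketch below counts how many existing elements are copied in total when a vector is grown one element at a time. The helper name count_copied is hypothetical, and the count is simple arithmetic based on the description above (each concatenation copies all existing elements), not a measurement of R's actual memory use.

```r
# Hypothetical helper: total number of existing elements copied when a
# vector is grown to length n by repeated a <- c(a, x).
count_copied <- function(n) {
  copied <- 0
  len <- 0                  # current length of the vector
  for (i in seq_len(n)) {
    copied <- copied + len  # c(a, x) copies the len existing elements
    len <- len + 1          # the vector is now one element longer
  }
  copied
}

count_copied(100)    # 4950, i.e. 100 * 99 / 2
count_copied(10000)  # 49995000
```

The total grows with the square of the final length, which is why the cost is easy to overlook for short vectors and painful for long ones.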

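A much more efficient way of generating a vector is to define an "empty" vector of the final length, if you know what this final length will be. Each iteration of the for loop then assigns a value to an existing element instead of building a longer copy of the vector.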

For example, suppose we want to generate a numeric vector of 100 elements. Instead of

    a <- NULL
    for (i in 1:100) {
      a <- c(a, rnorm(1))
    }

we can do the following:

    a <- numeric(100)   # numeric vector of length 100, all elements 0
    a                   # print the vector
    a[] <- NA           # replace every element with NA

    for (i in 1:100) {
      a[i] <- rnorm(1)
    }

The numeric() function generates a numeric vector of the specified length, where each element has the value 0. We could also have used the character() function to generate a character vector of the specified length, where each element has the value "". In either case, we can easily replace all of the elements of the vector with NA using the code a[] <- NA.
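The two approaches can be compared directly. The following is a minimal sketch, assuming a vector length of 10000; the function names grow and prefill are hypothetical wrappers around the two loops shown above, and the timings will vary by machine and R version.

```r
# Grow the vector one element at a time (a new copy on every iteration).
grow <- function(n) {
  a <- NULL
  for (i in seq_len(n)) {
    a <- c(a, rnorm(1))
  }
  a
}

# Define a vector of the final length first, then fill it element by element.
prefill <- function(n) {
  a <- numeric(n)
  a[] <- NA
  for (i in seq_len(n)) {
    a[i] <- rnorm(1)
  }
  a
}

n <- 10000  # assumed length, chosen only for illustration
t_grow <- system.time(x <- grow(n))
t_prefill <- system.time(y <- prefill(n))

length(x) == length(y)  # both produce a numeric vector of length n
t_grow["elapsed"]       # elapsed seconds for the growing version
t_prefill["elapsed"]    # elapsed seconds for the preallocated version
```

Wrapping each loop in a function keeps the comparison tidy and makes it easy to rerun with a different n.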

We can use a similar process to efficiently generate a data frame, i.e., one whose dimensions are defined in advance. For example,

    c <- data.frame(a = numeric(100), b = character(100))
    c[ , ] <- NA
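As a variation on the same idea, the columns can be created with typed missing values from the start, so that each column is explicitly numeric or character before the loop fills it. This is a sketch under assumptions: the data frame is named d here (rather than c) to avoid confusion with the c() function, and the values written in the loop are made up for illustration.

```r
n <- 100

# Preallocate with typed missing values so each column has a known type.
# stringsAsFactors = FALSE keeps b as character in older versions of R.
d <- data.frame(a = rep(NA_real_, n),
                b = rep(NA_character_, n),
                stringsAsFactors = FALSE)

# Fill the preallocated columns one row at a time.
for (i in seq_len(n)) {
  d$a[i] <- rnorm(1)
  d$b[i] <- paste0("id", i)  # illustrative label, not from the original
}

str(d)  # 100 obs. of 2 variables: a is num, b is chr
```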

If you wanted a completely numeric (or completely character) data frame, you could have also done the following:

    b <- numeric(100)
    attr(b, "dim") <- c(10, 10)   # reshape b into a 10 x 10 matrix
    c <- as.data.frame(b)         # convert the matrix to a data frame
    c[ , ] <- NA
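An alternative sketch of the same result uses matrix() to build the 10 x 10 structure directly, already filled with numeric missing values. The object names m and d are chosen here for illustration only.

```r
# A 10 x 10 numeric matrix filled with missing values.
m <- matrix(NA_real_, nrow = 10, ncol = 10)

# Convert to a data frame; the columns get the default names V1, ..., V10.
d <- as.data.frame(m)

dim(d)                      # 10 10
all(sapply(d, is.numeric))  # TRUE
```

For a completely character data frame, NA_character_ could be used in place of NA_real_.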

Topic revision: 21 Nov 2006, TheresaScott
