(21 Nov 2006,
TheresaScott
)
---+ Generating data structures in a memory efficient manner

We often generate data structures, such as vectors or data frames, in =for= loops or in our own user-defined functions. Unfortunately, if you're not careful, generating a data structure can be very memory intensive. Specifically, in each iteration of a =for= loop we often concatenate the next element onto the existing vector to build up the final vector. However, every time you do this, R implicitly copies the existing vector and then adds the additional element. Therefore, just to add a single element, R temporarily needs memory for 2=n=+1 elements (the existing vector of length =n= plus the new copy of length =n=+1), where =n= is the length of the vector at that iteration of the =for= loop. For large =n=, this can use up all of your memory.

A much more efficient way of generating a vector is to define an "empty" vector of the _final_ length, if you know what this final length will be. For example, suppose we want to generate a numeric vector of 100 elements. Instead of

<highlight>
a <- NULL
for (i in 1:100) {
  a <- c(a, rnorm(1))  # copies the whole vector on every iteration
}
</highlight>

we can do the following:

<highlight>
a <- numeric(100)  # numeric vector of the final length
a                  # print it: every element is 0
a[] <- NA          # replace every element with NA
for (i in 1:100) {
  a[i] <- rnorm(1)
}
</highlight>

The =numeric()= function generates a _numeric_ vector of the specified length, where each element has the value =0=. We could also have used the =character()= function to generate a _character_ vector of the specified length, where each element has the value =""=. In either case, we can easily replace all of the elements of the vector with =NA= using the code =a[] <- NA=.

We can use a similar process to efficiently generate a data frame, i.e., one whose dimensions are defined up front.
For example,

<highlight>
c <- data.frame(a = numeric(100), b = character(100))
c[ , ] <- NA
</highlight>

If you wanted a completely numeric (or completely character) data frame, you could also have done the following:

<highlight>
b <- numeric(100)
attr(b, "dim") <- c(10, 10)  # reshape the vector into a 10 x 10 matrix
c <- as.data.frame(b)        # convert the matrix to a data frame
c[ , ] <- NA
</highlight>
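The two ways of building a vector described above can be compared side by side. The following is only a minimal sketch: the helper names =grow_vector= and =prealloc_vector= and the length =n= are made up for illustration, and the timings will vary from machine to machine. It also shows one possible alternative for the data frame case, building the columns from the typed constants =NA_real_= and =NA_character_= so that each column's type is stated explicitly.

```r
# Hypothetical helpers for illustration; they are not part of the original page.

# Approach 1: grow the vector by concatenation.
grow_vector <- function(n) {
  a <- NULL
  for (i in seq_len(n)) {
    a <- c(a, rnorm(1))  # builds a new, longer vector on every iteration
  }
  a
}

# Approach 2: define the vector at its final length, then fill it.
prealloc_vector <- function(n) {
  a <- rep(NA_real_, n)  # numeric vector of length n, all NA
  for (i in seq_len(n)) {
    a[i] <- rnorm(1)     # assign into element i
  }
  a
}

n <- 1e4  # assumed length, chosen only to make the difference measurable
t_grow <- system.time(x <- grow_vector(n))
t_pre  <- system.time(y <- prealloc_vector(n))
print(rbind(grow = t_grow, preallocate = t_pre))

# Assumed alternative for the data frame case: typed NA columns.
d <- data.frame(a = rep(NA_real_, 100),
                b = rep(NA_character_, 100),
                stringsAsFactors = FALSE)
str(d)
```

Both helpers return a numeric vector of length =n=; only the way the vector is built differs. Increasing =n= should make the gap between the two timings more visible.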
Topic revision: r1 - 21 Nov 2006,
TheresaScott
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.