R YAML package

The R YAML package implements the Syck YAML parser for R, along with some R methods for converting R objects to YAML.

You can see the development history of this package here.

What is YAML?

YAML is a human-readable data serialization language. With it, you can create easily readable documents that can be consumed by a variety of programming languages. It's used frequently with Ruby and Ruby on Rails.

Examples

Hash of baseball teams per league:
american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

Data dictionary specification:
- field: ID
  description: primary identifier
  type: integer
  primary key: yes
- field: DOB
  description: date of birth
  type: date
  format: yyyy-mm-dd
- field: State
  description: state of residence
  type: string

Installation

CRAN

You can install this package directly from CRAN by running (from within R): install.packages('yaml')

Zip/Tarball

  1. Download the appropriate zip file or tar.gz file from Github: https://github.com/viking/r-yaml/downloads
  2. Run R CMD INSTALL followed by the name of the file you downloaded (as root if necessary)

Git

  1. Download the source via git: git clone git://github.com/viking/r-yaml yaml
  2. Run R CMD check yaml to make sure everything is OK.
  3. Run R CMD INSTALL yaml (as root if necessary).

Usage

The yaml package has two main functions: yaml.load and as.yaml.

yaml.load

The yaml.load function is the YAML parsing function. It accepts a YAML document as a string. Here's a simple example that parses a YAML sequence:
x <- "
- 1
- 2
- 3
"
yaml.load(x)  #=> [1] 1 2 3

Strings

A YAML string is the basic building block of YAML documents. Example of a YAML document with one element:
1.2345

In this case, the string "1.2345" is typed as a float (or numeric) by the parser. yaml.load would return a numeric vector of length 1 for this document.
yaml.load("1.2345")  #=> [1] 1.2345
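
As a rough illustration of how single-element documents are typed, the sketch below parses a few scalars and inspects the class of each result. The variable names are made up for this example, and the behaviour shown assumes a reasonably recent version of the yaml package.

```r
library(yaml)

# Each document holds one scalar; the parser picks the R type.
as.float  <- yaml.load("1.2345")
as.int    <- yaml.load("123")
as.string <- yaml.load("hello")
as.bool   <- yaml.load("true")

# Print the class chosen for each scalar.
print(c(class(as.float), class(as.int), class(as.string), class(as.bool)))
```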

Sequences

A YAML sequence is a list of elements. Here's an example of a simple YAML sequence:
- this
- is
- a
- simple
- sequence
- of
- strings

If you pass a YAML sequence to yaml.load, one of two things happens. If all of the elements in the sequence have the same type, yaml.load returns a vector of that type (character, integer, real, or logical). Otherwise, yaml.load returns a list of the elements. No coercion is done by default.
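
The difference between the two cases can be sketched as follows. The names uniform and mixed are illustrative only.

```r
library(yaml)

# Every element parses as an integer, so the result is an atomic vector.
uniform <- yaml.load("
- 1
- 2
- 3
")
str(uniform)

# The elements have different types, so the result is a list
# and each element keeps its own type.
mixed <- yaml.load("
- 1
- two
- 3.5
")
str(mixed)
```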

Maps

A YAML map, or hash, is a collection of key-value pairs. Here's an example of a simple YAML map:
one: 1
two: 2
three: 3
four: 4

Passing a map to yaml.load will produce a named list by default. That is, keys are coerced to strings. Since it is possible for the keys of a YAML map to be almost anything (not just strings), you might not want yaml.load to return a named list. If you want to preserve the data type of keys, you can pass as.named.list = FALSE to yaml.load. If as.named.list is FALSE, yaml.load will create a keys attribute for the list it returns instead of coercing the keys into strings.
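
A minimal sketch of the two modes, using a map with integer keys. The document string and variable names are invented for this example.

```r
library(yaml)

doc <- "
1: one
2: two
"

# Default: keys are coerced to strings and become the list names.
named <- yaml.load(doc)
print(names(named))

# as.named.list = FALSE: the keys are stored in a "keys" attribute instead.
keyed <- yaml.load(doc, as.named.list = FALSE)
str(attr(keyed, "keys"))
```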

Handlers

yaml.load has the capability to accept custom handler functions. With handlers, you can customize yaml.load to do almost anything you want. Example of handler usage:

integer.handler <- function(x) { as.integer(x) + 123 }
yaml.load("123", handlers = list(int = integer.handler))  #=> [1] 246

Handlers are passed to yaml.load through the handlers argument. The handlers argument must be a named list of functions, where each name is the YAML type that you want to be handled by your function. The functions you provide must accept one argument and must return an R object.

Most of the time, handler functions will be passed a string. In the example above, integer.handler was passed the string "123". However, you can also provide custom handler functions to deal with sequences and maps.
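
To see what a scalar handler actually receives, the following sketch records its argument before converting it. The inspecting.handler function and the received variable are hypothetical names used only here.

```r
library(yaml)

# Hypothetical handler: remembers its input, then converts it to an integer.
received <- NULL
inspecting.handler <- function(x) {
  received <<- x
  as.integer(x)
}

value <- yaml.load("123", handlers = list(int = inspecting.handler))

# The handler saw a string; yaml.load returned the handler's result.
print(class(received))
print(class(value))
```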

Sequence handlers

Custom sequence handlers will be passed a list of objects. You can then convert the list into whatever you want and return it. Example:

sequence.handler <- function(x) {
  tmp <- as.numeric(x)
  tmp / 5
}
string <- "
- foo
- bar
- 123
- 4.567
"
yaml.load(string, handlers = list(seq = sequence.handler))  #=> [1]      NA      NA 24.6000  0.9134

Map handlers

Custom map handlers work in much the same way as custom sequence handlers. A map handler function is passed a named list, or a list with a keys attribute (depending on the value of as.named.list). Example:

string <- "
a: 
  - 1
  - 2
b: 
  - 3
  - 4
"
yaml.load(string, handlers = list(map = function(x) { as.data.frame(x) }))

Returns:
  b a
1 3 1
2 4 2

An interesting thing to note in this example is that the b column appears before the a column in the resulting data frame. This is because YAML maps are considered to be hashes, and therefore, order is not preserved. If you want an ordered map, you can use a combination of maps and sequences like so:

- a:
    - 1
    - 2
- b:
    - 3
    - 4
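
One possible way to turn such a sequence of single-entry maps back into a data frame with a fixed column order is sketched below. The approach (flattening with do.call) is a suggestion made here, and the variable names are illustrative.

```r
library(yaml)

doc <- "
- a:
    - 1
    - 2
- b:
    - 3
    - 4
"

# The sequence loads as a list of one-entry named lists, in document order.
entries <- yaml.load(doc)

# Concatenate the entries into a single named list, keeping that order.
columns <- do.call(c, entries)

df <- as.data.frame(columns)
print(df)
```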

as.yaml

as.yaml is an S3 generic function used to convert R objects into YAML strings. Example as.yaml usage:
x <- as.yaml(1:5)
cat(x, "\n")

Output from above example:
- 1
- 2
- 3
- 4
- 5

Arguments

Here's the list of as.yaml arguments:
Name          Description                                            Default
x             the object to convert                                  (none)
line.sep      line separator to use ("\n" or "\r\n")                 "\n"
indent        number of spaces to use for indenting                  2
pre.indent    number of spaces to shift document                     0
column.major  determines if data.frames are output as column major   TRUE
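
As a small sketch of the indent and line.sep arguments, the example below converts a nested list with non-default settings. The config object is hypothetical, and the exact layout of the printed YAML may vary between package versions.

```r
library(yaml)

# Hypothetical nested object, used only for illustration.
config <- list(outer = list(inner = 1:2))

# Indent nested levels by four spaces instead of the default two.
wide <- as.yaml(config, indent = 4)
cat(wide)

# Use "\r\n" as the line separator instead of the default "\n".
crlf <- as.yaml(config, line.sep = "\r\n")

# Either string can be parsed back with yaml.load.
str(yaml.load(wide))
```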

pre.indent

The pre.indent option will shift the entire document to the right by the number of spaces you specify. For example:
x <- as.yaml(1:5, pre.indent = 4)
cat(x, "\n")

Outputs:
    - 1
    - 2
    - 3
    - 4
    - 5

column.major

The column.major option determines how a data frame is converted into YAML. By default, column.major is TRUE.

Example of as.yaml when column.major is TRUE:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = TRUE)
cat(y, "\n")

Outputs:
a:
  - 1
  - 2
  - 3
  - 4
  - 5
b:
  - 6
  - 7
  - 8
  - 9
  - 10

Whereas:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = FALSE)
cat(y, "\n")

Outputs:
- a: 1
  b: 6
- a: 2
  b: 7
- a: 3
  b: 8
- a: 4
  b: 9
- a: 5
  b: 10
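
Both layouts can be read back into a data frame. The following is one way to do it, not something the package prescribes; the variable names are illustrative.

```r
library(yaml)

df <- data.frame(a = 1:5, b = 6:10)

# Column-major YAML loads as a named list of vectors,
# which as.data.frame accepts directly.
by.column <- as.data.frame(yaml.load(as.yaml(df, column.major = TRUE)))

# Row-major YAML loads as a list of one-row records;
# convert each record to a data frame and stack them.
rows <- yaml.load(as.yaml(df, column.major = FALSE))
by.row <- do.call(rbind, lapply(rows, as.data.frame))

print(by.column)
print(by.row)
```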

Additional documentation

For more information, run help(package='yaml'). For some examples, run example('yaml-package').

To Do

  • Include named vector support (instead of only named lists)
  • Add date support
Topic revision: r20 - 16 Jun 2011, JeremyStephens