R YAML package
The R YAML package implements the Syck YAML parser for R, along with some R methods for converting R objects to YAML.
You can see the development history of this package here.
What is YAML?
YAML is a human-readable data serialization language. With it, you can create easily readable documents that can be consumed by a variety of programming languages. It's used frequently with Ruby and Ruby on Rails.
Examples
Hash of baseball teams per league:
american:
- Boston Red Sox
- Detroit Tigers
- New York Yankees
national:
- New York Mets
- Chicago Cubs
- Atlanta Braves
Data dictionary specification:
- field: ID
  description: primary identifier
  type: integer
  primary key: yes
- field: DOB
  description: date of birth
  type: date
  format: yyyy-mm-dd
- field: State
  description: state of residence
  type: string
Installation
CRAN
You can install this package directly from CRAN by running (from within R):
install.packages('yaml')
Zip/Tarball
- Download the appropriate zip file or tar.gz file from Github: https://github.com/viking/r-yaml/downloads
- Run R CMD INSTALL followed by the name of the file you downloaded (as root if necessary)
Git
- Download the source via git:
git clone git://github.com/viking/r-yaml yaml
- Run R CMD check yaml to make sure everything is OK.
- Run R CMD INSTALL yaml (as root if necessary).
Usage
The yaml package has two main functions: yaml.load and as.yaml.
yaml.load
The yaml.load function is the YAML parsing function. It accepts a YAML document as a string. Here's a simple example that parses a YAML sequence:
x <- "
- 1
- 2
- 3
"
yaml.load(x) #=> [1] 1 2 3
Strings
A YAML string is the basic building block of YAML documents. Example of a YAML document with one element:
1.2345
In this case, the string "1.2345" is typed as a float (or numeric) by the parser. yaml.load would return a numeric vector of length 1 for this document.
yaml.load("1.2345") #=> [1] 1.2345
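The same typing applies to other scalars. As a minimal sketch, assuming the yaml package is installed (the input values are made up for illustration), each one-element document comes back as a length-1 vector of the matching R type:

```r
library(yaml)

# Each document holds a single scalar; the parser picks the R type.
float.value  <- yaml.load("1.2345")  # typed as a float (numeric)
int.value    <- yaml.load("42")      # typed as an integer
string.value <- yaml.load("hello")   # stays a character string

# Inspect the types rather than relying on printed values.
print(class(float.value))
print(class(int.value))
print(class(string.value))
```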
Sequences
A YAML sequence is a list of elements. Here's an example of a simple YAML sequence:
- this
- is
- a
- simple
- sequence
- of
- strings
If you pass a YAML sequence to yaml.load, a couple of things can happen. If all of the elements in the sequence are uniform, yaml.load will return a vector of that type (i.e. character, integer, real, or logical). If the elements are not uniform, yaml.load will return a list of the elements. No coercion is done by default.
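The two outcomes described above can be sketched as follows. This is an illustrative example only; the sequences are invented and the yaml package is assumed to be installed:

```r
library(yaml)

# Every element is an integer, so the result is an integer vector.
uniform <- yaml.load("
- 1
- 2
- 3
")

# Element types differ, so the result is a list and nothing is coerced.
mixed <- yaml.load("
- 1
- two
- 3.5
")

print(is.integer(uniform))
print(is.list(mixed))
print(sapply(mixed, class))  # one class per element
```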
Maps
A YAML map is a collection of key/value pairs, also known as a hash. Here's an example of a simple YAML map:
one: 1
two: 2
three: 3
four: 4
Passing a map to yaml.load will produce a named list by default. That is, keys are coerced to strings. Since it is possible for the keys of a YAML map to be almost anything (not just strings), you might not want yaml.load to return a named list. If you want to preserve the data type of keys, you can pass as.named.list = FALSE to yaml.load. If as.named.list is FALSE, yaml.load will create a keys attribute for the list it returns instead of coercing the keys into strings.
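A short sketch of both modes, assuming the yaml package is installed. The documents are invented for illustration; the second one uses integer keys to show why preserving key types can matter:

```r
library(yaml)

# Default behaviour: keys are coerced to strings and become list names.
named <- yaml.load("
one: 1
two: 2
")
print(names(named))

# With as.named.list = FALSE the keys keep their parsed type and are
# stored in the "keys" attribute instead of in names().
int.keys <- yaml.load("
1: one
2: two
", as.named.list = FALSE)
print(attr(int.keys, "keys"))
```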
Handlers
yaml.load has the capability to accept custom handler functions. With handlers, you can customize yaml.load to do almost anything you want. Example of handler usage:
integer.handler <- function(x) { as.integer(x) + 123 }
yaml.load("123", handlers = list(int = integer.handler)) #=> [1] 246
Handlers are passed to yaml.load through the handlers argument. The handlers argument must be a named list of functions, where each name is the YAML type that you want to be handled by your function. The functions you provide must accept one argument and must return an R object.
Most of the time, handler functions will be passed a string. In the example above, integer.handler was passed the string "123". However, you can also provide custom handler functions to deal with sequences and maps.
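To see what a scalar handler actually receives, one option is to record its argument. The sketch below assumes the yaml package is installed; recording.handler and received are hypothetical names introduced only for this illustration:

```r
library(yaml)

# Hypothetical handler: remember the raw argument, then convert it.
received <- NULL
recording.handler <- function(x) {
  received <<- x   # store whatever yaml.load passed in
  as.integer(x)    # a handler must return an R object
}

result <- yaml.load("123", handlers = list(int = recording.handler))

print(class(received))  # type of the argument the handler saw
print(result)
```

Because the handler receives the unconverted text, any conversion (here as.integer) is the handler's own responsibility.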
Sequence handlers
Custom sequence handlers will be passed a list of objects. You can then convert the list into whatever you want and return it. Example:
sequence.handler <- function(x) {
  tmp <- as.numeric(x)
  tmp / 5
}
string <- "
- foo
- bar
- 123
- 4.567
"
yaml.load(string, handlers = list(seq = sequence.handler)) #=> [1] NA NA 24.6000 0.9134
Map handlers
Custom map handlers work in much the same way as custom sequence handlers. A map handler function is passed a named list, or a list with a keys attribute (depending on the value of as.named.list). Example:
string <- "
a:
- 1
- 2
b:
- 3
- 4
"
yaml.load(string, handlers = list(map = function(x) { as.data.frame(x) }))
Returns:
b a
1 3 1
2 4 2
An interesting thing to note in this example is that the b column appears before the a column in the resulting data frame. This is because YAML maps are considered to be hashes, and therefore, order is not preserved. If you want an ordered map, you can use a combination of maps and sequences like so:
- a:
    - 1
    - 2
- b:
    - 3
    - 4
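Loading that document gives a sequence of one-key maps, and the sequence keeps document order. A minimal sketch, assuming the yaml package is installed (the flattening step with do.call is just one possible way to rebuild a single named list):

```r
library(yaml)

ordered <- yaml.load("
- a:
    - 1
    - 2
- b:
    - 3
    - 4
")

# The outer sequence preserves order; each element is a one-key map.
keys <- unlist(lapply(ordered, names))
print(keys)

# Flatten into one named list whose order matches the document,
# then build the data frame from it.
columns <- do.call(c, ordered)
print(as.data.frame(columns))
```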
as.yaml
as.yaml is an S3 method used to convert R objects into YAML strings. Example as.yaml usage:
x <- as.yaml(1:5)
cat(x, "\n")
Output from above example:
- 1
- 2
- 3
- 4
- 5
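Since as.yaml produces a string and yaml.load accepts one, the two functions can be chained. The following round trip is a small sketch, assuming the yaml package is installed:

```r
library(yaml)

original <- 1:5
text     <- as.yaml(original)   # R object -> YAML string
restored <- yaml.load(text)     # YAML string -> R object

# Compare the restored object with the one we started from.
print(identical(original, restored))
```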
Arguments
Here's the list of as.yaml arguments:
Name | Description | Default
x | the object to convert |
pre.indent | number of spaces to shift document | 0
line.sep | line separator to use ("\n" or "\r\n") | "\n"
indent | number of spaces to use for indenting | 2
column.major | determines if data.frames are output as column major | TRUE
pre.indent
The pre.indent option will shift the entire document to the right by the number of spaces you specify. For example:
x <- as.yaml(1:5, pre.indent = 4)
cat(x, "\n")
Outputs:
    - 1
    - 2
    - 3
    - 4
    - 5
column.major
The column.major option determines how a data frame is converted into YAML. By default, column.major is TRUE.
Example of as.yaml when column.major is TRUE:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = TRUE)
cat(y, "\n")
Outputs:
a:
- 1
- 2
- 3
- 4
- 5
b:
- 6
- 7
- 8
- 9
- 10
Whereas:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = FALSE)
cat(y, "\n")
Outputs:
- a: 1
  b: 6
- a: 2
  b: 7
- a: 3
  b: 8
- a: 4
  b: 9
- a: 5
  b: 10
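The difference between the two layouts also shows up when the YAML is parsed back. This is a sketch only, assuming the yaml package is installed; by.column and by.row are names chosen for illustration:

```r
library(yaml)

x <- data.frame(a = 1:5, b = 6:10)

# Column major: a single map whose values are whole columns.
by.column <- yaml.load(as.yaml(x, column.major = TRUE))
print(length(by.column))   # one entry per column

# Row major: a sequence holding one small map per row.
by.row <- yaml.load(as.yaml(x, column.major = FALSE))
print(length(by.row))      # one entry per row
```

The column-major form maps back to a data frame more directly, since each entry is already a full column.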
Additional documentation
For more information, run help(package='yaml') or example('yaml-package') for some examples.
To Do
- Include named vector support (instead of only named lists)
- Add date support