How To Open Unicode Data Files

While some programs open Unicode and ASCII data files equally well, some statistical packages and programming languages require a little extra effort to work with Unicode. Here are some solutions if you have a Unicode dataset you need to work with.

Convert Unicode to ASCII in Windows

From the Windows command line, you can convert a Unicode-encoded file to an ASCII-encoded file using the TYPE command.
  1. Click Start, click Run, type "cmd", and click OK.
  2. Use the TYPE command as follows:
    TYPE "path and file of Unicode file" > "path and file of ASCII file to create"

Here's an example:
TYPE "C:\Documents and Settings\Robert\Desktop\ExampleUnicode.csv" > "C:\Documents and Settings\Robert\Desktop\ExampleASCII.csv"

To avoid typing the full path, you can cd into the appropriate folder first and then refer to the files by name only.

Naturally, you cannot convert a Unicode file that contains characters not available in the ASCII encoding. The TYPE command essentially reads the input file and prints it to the output file. In the process, it converts the input to the format it would use if it were just printing to the screen. If you leave off the "> out file" part, the command prints directly to the screen. Note that this trick may not work for UTF-8 encoding, which is backwards compatible with ASCII.
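For comparison, the same read-and-re-encode idea can be sketched in Python. This is a minimal sketch, not part of the TYPE procedure itself: the function name unicode_to_ascii is hypothetical, and it assumes the input file is UTF-16 (what Windows programs usually mean by "Unicode"). By default it refuses to convert a file that contains characters ASCII cannot represent, which makes the limitation described above explicit.

```python
def unicode_to_ascii(src, dst, src_encoding="utf-16", errors="strict"):
    """Re-encode a text file as ASCII (hypothetical helper for illustration).

    src_encoding is an assumption: "utf-16" uses the byte-order mark at the
    start of the file if one is present. With errors="strict", a character
    outside ASCII raises UnicodeEncodeError and no output file is written.
    With errors="replace", such characters become "?".
    """
    # Read raw bytes and decode them, so line endings are preserved as-is.
    with open(src, "rb") as fin:
        text = fin.read().decode(src_encoding)

    # Encode before opening the output, so a failure leaves no partial file.
    data = text.encode("ascii", errors=errors)

    with open(dst, "wb") as fout:
        fout.write(data)
    return len(data)  # number of bytes written
```

Calling unicode_to_ascii("ExampleUnicode.csv", "ExampleASCII.csv") from the folder that holds the file would then mirror the TYPE example above. Pass errors="replace" if losing the non-ASCII characters is acceptable.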

Convert Unicode to ASCII in Linux

(This section needs creating. If you know how to do this, please create this section.)

Open the Unicode Data File Directly in R

(This section needs creating. If you know how to do this, please create this section.)

(See also this May 2008 R-help thread.)

Open the Unicode Data File Directly in Stata

(This section needs creating. If you know how to do this, please create this section.)

Open the Unicode Data File Directly in SAS

(This section needs creating. If you know how to do this, please create this section.)

Sample Data to Play With

The sample files contain the following dataset, delimited by commas.

Name Age EyeColor
Andrea 71 Green
Bobby 72 Hazel
Charles 73 Brown
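
If the sample files are not at hand, a test file with the same contents can be generated. The following Python sketch writes the dataset above as a comma-delimited UTF-16 file and reads it back. The function names and the choice of UTF-16 are assumptions made for illustration, not a description of how the original sample files were produced.

```python
import csv

# The sample dataset shown above, one list per row.
ROWS = [
    ["Name", "Age", "EyeColor"],
    ["Andrea", "71", "Green"],
    ["Bobby", "72", "Hazel"],
    ["Charles", "73", "Brown"],
]


def write_sample(path, encoding="utf-16"):
    """Write the sample rows as a comma-delimited file (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself.
    # The "utf-16" codec writes a byte-order mark at the start of the file.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(ROWS)


def read_sample(path, encoding="utf-16"):
    """Read a comma-delimited file back into a list of rows."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

Writing a file with write_sample and reading it with read_sample should return the same rows, which gives a quick check that an encoding setting is being handled correctly before trying a real dataset.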

Topic revision: r1 - 11 Apr 2009, RobertGreevy
 
