---+ How To Open Unicode Data Files

While some programs open [[http://en.wikipedia.org/wiki/Unicode][Unicode]] and [[http://en.wikipedia.org/wiki/ASCII][ASCII]] data files equally well, some statistical packages and programming languages require a little extra effort to work with Unicode. Here are some solutions if you have a Unicode dataset you need to work with.

---++ _Convert_ Unicode to ASCII in Windows

Naturally, you can only convert Unicode characters that have ASCII equivalents. Most data files in English use only characters available in both encodings, so conversion is often an option. There are at least two options for converting Unicode files to ASCII files in Windows.

---+++ !WordPad

   1 Open the file with !WordPad.
   1 Go to File -> Save As. In the drop-down menu just below the file name field, change the file type from _Unicode Text Document_ to _Text Document_.
   1 Enter the file name you want, remembering to specify the suffix you want, such as .csv. The default is .txt.

The !WordPad option is convenient, but it may not work for very large files and it requires a lot of pointing and clicking. The following command line option solves those problems.

---+++ TYPE command

From the Windows command line, you can _convert_ a Unicode-encoded file to an ASCII-encoded file using the TYPE command.

   1 Click Start, click Run, type "cmd", and click OK.
   1 Use the TYPE command as follows: <br><highlight>TYPE "path and file of unicode file" > "path and file of ascii file to create"</highlight>

Here's an example: <br><highlight>TYPE "C:\Documents and Settings\Robert\Desktop\ExampleUnicode.csv" > "C:\Documents and Settings\Robert\Desktop\ExampleASCII.csv"</highlight>

_If you wish to avoid specifying the path, you can cd into the appropriate folder first._

The TYPE command essentially reads the input file and prints it to the output file. In the process, it converts the input to the format it would use if it were just printing to the screen.
If you leave off the "> out file" part, this command prints directly to the screen. Note that this trick may not work for [[http://en.wikipedia.org/wiki/UTF-8][UTF-8]] encoding, which is backwards compatible with ASCII.

---++ Convert Unicode to ASCII in Linux

(This section needs creating. If you know how to do this, please create this section.)

---++ Open the Unicode Data File Directly in R

(This section needs creating. If you know how to do this, please create this section.) (See also this May 2008 [[https://stat.ethz.ch/pipermail/r-help/2008-May/163469.html][R-help thread]].)

---++ Open the Unicode Data File Directly in Stata

(This section needs creating. If you know how to do this, please create this section.)

---++ Open the Unicode Data File Directly in SAS

(This section needs creating. If you know how to do this, please create this section.)

---++ Sample Data to Play With

Each of the following files contains the dataset shown in the table below, delimited by commas.<br>
   * [[%ATTACHURL%/ExampleUnicode.csv][ExampleUnicode.csv]]
   * [[%ATTACHURL%/ExampleUTF8.csv][ExampleUTF8.csv]]
   * [[%ATTACHURL%/ExampleASCII.csv][ExampleASCII.csv]]

|Name|Age|!EyeColor|
|Andrea|71|Green|
|Bobby|72|Hazel|
|Charles|73|Brown|
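
For readers who would rather script the conversion, here is a minimal Python sketch of the same Unicode-to-ASCII re-encoding, applied to the sample dataset. It is not part of the original instructions and rests on assumptions: that the "Unicode" file is UTF-16 with a byte order mark (which is what Windows usually writes under that label), that the helper name =unicode_to_ascii= is purely illustrative, and that temporary files stand in for the attachments listed above. Unlike a silent conversion, this sketch stops with an error if it meets a character that has no ASCII equivalent.

```python
import os
import tempfile


def unicode_to_ascii(src_path, dst_path, src_encoding="utf-16"):
    """Re-encode a text file as ASCII (hypothetical helper, not a library call).

    Assumes the input is UTF-16 with a byte order mark unless src_encoding
    says otherwise. Raises UnicodeEncodeError if any character has no ASCII
    equivalent, and in that case the output file is not created.
    """
    # newline="" keeps the original line endings (e.g. \r\n) untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Encode before opening the output so a failure leaves no partial file.
    data = text.encode("ascii")
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return len(data)


def demo():
    """Round-trip the sample dataset through temporary stand-in files."""
    rows = (
        "Name,Age,EyeColor\r\n"
        "Andrea,71,Green\r\n"
        "Bobby,72,Hazel\r\n"
        "Charles,73,Brown\r\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ExampleUnicode.csv")
        dst = os.path.join(tmp, "ExampleASCII.csv")
        # Write a UTF-16 file with a byte order mark to act as the input.
        with open(src, "w", encoding="utf-16", newline="") as f:
            f.write(rows)
        unicode_to_ascii(src, dst)
        with open(dst, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

For a UTF-8 file that starts with a byte order mark, passing =src_encoding="utf-8-sig"= should strip the mark while decoding. A UTF-8 file that contains only ASCII characters and no byte order mark is already a valid ASCII file and needs no conversion.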
Topic revision: r3 - 16 Apr 2009, RobertGreevy
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.