
It might be a tad hard to believe, but I am new … well … somewhat new, to R. I have been using R for a while, but mostly by copying the proper commands from somewhere, understanding what they did, and then putting them into a program of mine, so that my program would generate the commands and then run them too. Oh, let’s not forget: my program would then parse the results of the R run. So, in the end, I would forget what the commands looked like.

Anyway, anybody working in genomics knows how precious hard drive space can become, so I compress most of the files I use. When I produce a table, I compress it. Some time ago I discovered that bzip2 achieved better compression ratios than gzip, and I have used bzip2 ever since. However, whenever a student was going to work with these tables, two things would happen:

  1. the student would copy the files, and
  2. the student would decompress the file before use.

Two questions here: Why copy the file? Was it necessary to decompress it? I often found that decompression was not necessary, and therefore copying the file was not necessary either.
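As a small illustration, taking a quick look at a bzipped table in R does not require unpacking it first. The sketch below assumes a hypothetical file, expression.tsv.bz2, created in a temporary directory just for the example. R's bzfile() connection decompresses in memory while readLines() reads, so no uncompressed copy ever lands on disk.

```r
# Hypothetical example file: a small tab-separated table compressed with bzip2.
tbl_path <- file.path(tempdir(), "expression.tsv.bz2")
con <- bzfile(tbl_path, open = "w")
writeLines(c("gene\tsample1\tsample2",
             "geneA\t10\t3",
             "geneB\t0\t8",
             "geneC\t5\t5"), con)
close(con)

# Peek at the first two lines. readLines() opens the bzfile() connection,
# decompresses on the fly, and closes it again; nothing is unpacked to disk.
head_lines <- readLines(bzfile(tbl_path), n = 2)
print(head_lines)
```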

In the case of R, I was once sending some tables to a colleague, and his technician complained that he did not have space for the files … yes, you guessed it, he was decompressing them. I asked why, and he answered: because I’m using R! What did I do? I asked how he was calculating what he was calculating, added the calculation to my own programs, and we solved that problem, but I was left with some noise in my head. So I googled “read compressed files in R,” “read bzipped files in R,” etc., and found a marvelous command in R: bzfile(), which lets us work with bzipped files in R. I have been using it for years to read tables and! [pause for dramatic effect] to write results to compressed files! Anyway, a few days ago I thought of sharing this little piece of wisdom on Twitter:

Don’t bunzip2 that table!
read.table(bzfile("file.bz2"))
Yes, gzfile does what you think it does.

and Hadley Wickham (@hadleywickham) informed me that, actually, read.table("file.bz2") works too. What did I do? I tested it, and yes, R can read compressed files directly and transparently. But we can still write bzipped files with R using bzfile(), right?
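Putting both halves together, here is a minimal sketch with a hypothetical data frame and hypothetical file names. Writing goes through an explicit bzfile() (or, for gzip, gzfile()) connection, while reading back works with a plain path passed to read.table(), which is the behaviour described above.

```r
# Hypothetical results table (names and values are made up for the example).
res <- data.frame(gene = c("geneA", "geneB", "geneC"),
                  pvalue = c(0.01, 0.2, 0.003),
                  stringsAsFactors = FALSE)

bz_path <- file.path(tempdir(), "results.tsv.bz2")
gz_path <- file.path(tempdir(), "results.tsv.gz")

# Writing: wrap the path in bzfile() for bzip2 or gzfile() for gzip;
# write.table() opens the connection, writes compressed output, and closes it.
write.table(res, bzfile(bz_path), sep = "\t", quote = FALSE, row.names = FALSE)
write.table(res, gzfile(gz_path), sep = "\t", quote = FALSE, row.names = FALSE)

# Reading: a plain path is enough; R detects the compression when opening
# the file for text-mode reading.
from_bz <- read.table(bz_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
from_gz <- read.table(gz_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(from_bz)
```

The same read.table() call works for both files, so code that consumes these tables does not need to know which compressor produced them.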

Have fun,

–Gabo


One thought on “Learning R: R can read compressed files!”

  1. Nice trick. I’m taking the Coursera Data Science track and couldn’t figure out how to decompress a bzip2’d file. Turns out, I don’t need to decompress before reading. Thanks!
