I’ve been writing a lot lately (papers, presentations, late blog posts), and I wanted to talk briefly about the way we process data. I’ve used many different data processing tools in the past, from Excel to Matlab to Igor and Origin. This post is about the one I’m getting used to right now: R. R is a programming language created in the early 90s. It’s open source, for all of you open source aficionados out there. This means that no matter where you are in the world, as long as you are able to install software on your computer, you can download and install it for free. That gives us the flexibility to process data and produce high-quality images on many different machines. Getting familiar with the basics of R is a good first step towards conducting science professionally.
I say professional science because making the images for a paper is often very different from producing an image for a lab report. As an undergraduate student you can get away with pretty poor graphs as long as they display the data you need to show. Take the simple bar graph in Figure 1, for instance. It shows all of the data that you want, Percentage vs. Type with the correct amounts, but it’s pretty ugly to look at. The lines don’t give a good sense of where the values fall or how large the error bars are, but it communicates the basic idea. Take a look at Figure 2 instead. This was made in R with a package called ggplot2 (the “gg” stands for Grammar of Graphics). All I have to do is load my data from a .txt file, and I can copy and paste pre-written code to make a very clear-looking graph. It takes a little bit of initial setup to write the code (see below), but after that I can just paste it into R and it will make my image for me! This speeds up graph production and also helps standardize it across labs. Instead of emailing Excel files back and forth, one can email a short script and get the same graphs.
Figure 1 Figure 2
There is one pro and one con of this program that I’d like to discuss. The pro comes from the last line of the code below. It specifies where to save the image, but also its DPI (“dots per inch”). Publications will not accept a blurry, pixelated image that you copied and pasted into a Word document. There are minimum levels of image quality that have to be met. If you really love Excel, you can control the DPI with a separate image processing program, or you can switch to a different data processing program. With R, however, it’s nice to be able to do it all in one shot. The major downside to R, which I haven’t mentioned yet, is that it is NOT a spreadsheet program. You don’t keep your data in an editable spreadsheet as you would in Excel, so you need to save .txt files (or other types of data files) alongside the R code that processes them. This can be both a help and a hindrance, as most instruments put out data as a .txt file or in some exotic file format. The nice thing about R is that there is a large community of people constantly creating new add-ons. So don’t despair! There is a good chance that somebody has already written code to put your exotic data file into a form that R can read!
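To make the “.txt file plus R code” workflow a little more concrete, here is a minimal sketch of loading a text file into a data frame. The file contents are made up for illustration, and I am assuming a tab-separated file with a header row and the column names Type, Num and Error; your instrument’s output may well look different.

```r
# Hypothetical example data written to a temporary file.
# In practice this file would come from your instrument.
path <- tempfile(fileext = ".txt")
writeLines(c("Type\tNum\tError",
             "A\t42.0\t3.1",
             "B\t35.5\t2.4",
             "C\t22.5\t1.8"), path)

# read.delim() reads tab-separated text and uses the first row as column names
df <- read.delim(path, stringsAsFactors = FALSE)

# Check that Num and Error were read in as numbers rather than text
str(df)
```

If your file uses commas or spaces instead of tabs, read.csv() or read.table() would be the base R functions to try instead.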
Thanks for reading! Once things quiet down I’ll start talking a bit more about some new methods that I’m looking into for these combinatorial chemistry projects! Please feel free to send me comments here or on any other posts!
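library(ggplot2)  # provides ggplot(), geom_bar(), geom_errorbar() and ggsave()

# df is assumed to be a data frame with the columns Type, Num and Error
limits <- aes(ymax = Num + Error, ymin = Num - Error)
p <- ggplot(data = df, aes(x = Type, y = Num, fill = Type)) + theme(text = element_text(size = 8))
p + geom_bar(position = "dodge", stat = "identity")  # quick preview without error bars
dodge <- position_dodge(width = 0.9)
fig <- p + geom_bar(stat = "identity", position = dodge) + geom_errorbar(limits, position = dodge, width = 0.25) + labs(x = "Type", y = "Percent")
ggsave(filename = "Example.png", plot = fig, width = 3.2, height = 2.4, dpi = 500)  # width and height are in inches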