Friday, June 07, 2013

Automatic Graphing with Lua and R

Outline

This post will show you how to use my Lua script, grapher, to automatically identify CSV files in a folder and graph them using R.

This project was inspired by some benchmarking tests I did. Each test I ran had a set of results and I often wanted to run many tests and compare the results. The easiest way to do this was to write a script to produce graphs of the data.

Grapher works in three steps. First, it identifies the .csv files in a folder. Next, it creates an R file for each CSV file, based on the options it is given. Finally, it executes each R file with Rscript, producing the desired graph.
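The three steps can be sketched roughly as follows. This is a minimal illustration and not the actual contents of grapher.lua: the function names (filter_csv, build_r_script, run_r_script) are hypothetical, the option names are borrowed from the walkthrough later in this post, and the generated R code (read.csv, matplot, legend) is only one plausible way to draw a line graph from a vertical CSV file.

```lua
-- Hypothetical sketch of grapher's three steps; not the real grapher.lua.

-- Step 1: keep only the names that end in ".csv" (case-insensitive).
local function filter_csv(names)
  local found = {}
  for _, name in ipairs(names) do
    if name:lower():match("%.csv$") then
      found[#found + 1] = name
    end
  end
  return found
end

-- Step 2: build the name and text of an R script for one vertical CSV file.
-- opts.outputType is used as the R device function, e.g. pdf(...) or png(...).
local function build_r_script(csv_name, opts)
  local base = csv_name:gsub("%.[cC][sS][vV]$", "")
  local lines = {
    string.format('data <- read.csv("%s")', csv_name),
    string.format('%s("%s.%s")', opts.outputType, base, opts.outputType),
    string.format('matplot(data, type = "l", xlab = "%s", ylab = "%s", main = "%s")',
      opts.xaxis, opts.yaxis, opts.title),
    'legend("topright", legend = colnames(data), col = 1:ncol(data), lty = 1:ncol(data))',
    "dev.off()",
  }
  return base .. ".r", table.concat(lines, "\n") .. "\n"
end

-- Step 3: write the script to disk and hand it to Rscript.
-- Rscript must be on the system path for this call to succeed.
local function run_r_script(r_name, script)
  local file = assert(io.open(r_name, "w"))
  file:write(script)
  file:close()
  return os.execute('Rscript "' .. r_name .. '"')
end

-- Show the generated R code for one file without running Rscript.
local opts = { outputType = "pdf", xaxis = "Test", yaxis = "Execution Speed (s)", title = "Tests" }
for _, csv_name in ipairs(filter_csv({ "example.csv", "notes.txt" })) do
  local r_name, script = build_r_script(csv_name, opts)
  print(r_name)
  print(script)
end
```

Calling run_r_script(r_name, script) inside the loop would write the R file and produce the graph; it is left out here so the sketch can be run without R installed.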

For some flexibility, grapher can interpret two types of data: 'horizontal' CSV files and 'vertical' CSV files, where the name refers to the orientation of the headers. For example, a vertical CSV file may be:

Lua,Perl,Ruby
1,4,7
2,5,8
3,6,9

The same data could also be represented in a horizontal format like this:

Lua,1,2,3
Perl,4,5,6
Ruby,7,8,9
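The two layouts are transposes of each other, so one can be converted into the other. The sketch below is only an illustration of that relationship, not how grapher itself handles orientation; the helper names are hypothetical and the comma splitting does not handle quoted fields.

```lua
-- Split one CSV line on commas (simplification: no quoted-field handling).
local function split_line(line)
  local fields = {}
  for field in (line .. ","):gmatch("([^,]*),") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Turn horizontal CSV lines (header first in each row) into vertical ones
-- (headers in the first row) by transposing rows and columns.
local function horizontal_to_vertical(lines)
  local rows = {}
  for i, line in ipairs(lines) do
    rows[i] = split_line(line)
  end
  if #rows == 0 then
    return {}
  end
  local out = {}
  for col = 1, #rows[1] do
    local cells = {}
    for r = 1, #rows do
      cells[r] = rows[r][col] or ""  -- pad short rows with empty cells
    end
    out[col] = table.concat(cells, ",")
  end
  return out
end

local vertical = horizontal_to_vertical({ "Lua,1,2,3", "Perl,4,5,6", "Ruby,7,8,9" })
for _, line in ipairs(vertical) do
  print(line)
end
```

Run on the horizontal example above, this prints the four lines of the vertical example.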

Prerequisites

  • Lua
    • Tested on Lua 5.1.4.
  • LuaFileSystem
    • To automatically gather all the CSV files in the folder. Without this you can still use the script, but you'll have to pass in the name of each CSV file you want graphed.
  • R (including Rscript)
    • Rscript must be available on the command line, which means adding its directory to your system's path.
  • Windows
    • Tested on Windows 7 64-bit.
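Since LuaFileSystem is optional, a script can probe for it and fall back to an explicit list of names. The sketch below shows one way to do that, assuming the explicit names arrive as a Lua table; how grapher actually accepts them is not shown here, and is_csv and collect_csv are hypothetical helpers.

```lua
-- pcall keeps the script running if LuaFileSystem is not installed.
local has_lfs, lfs = pcall(require, "lfs")

-- True when the name ends in ".csv" (case-insensitive).
local function is_csv(name)
  return name:lower():match("%.csv$") ~= nil
end

-- List CSV files in `directory` via lfs.dir when LuaFileSystem is available;
-- otherwise filter the names that were passed in explicitly.
local function collect_csv(directory, explicit_names)
  local files = {}
  if has_lfs then
    for name in lfs.dir(directory) do
      if is_csv(name) then
        files[#files + 1] = name
      end
    end
  else
    for _, name in ipairs(explicit_names or {}) do
      if is_csv(name) then
        files[#files + 1] = name
      end
    end
  end
  table.sort(files)
  return files
end

for _, name in ipairs(collect_csv(".", { "example.csv" })) do
  print(name)
end
```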

Use

There are two files, grapher.lua and options.lua. Once you have configured the options file, all you need to do is run grapher.lua.
> lua grapher.lua
This will show you the options you've selected. To confirm them, press enter and the script will begin graphing your files.

Step-by-step guide

We have a CSV file, example.csv, with contents:

Lua,Ruby,Python
0.2234,0.2814,0.3067
0.7471,0.7744,0.1373
0.3520,0.9374,0.3733
0.2940,0.2261,0.4777
0.1928,0.8558,0.0805
0.1828,0.4750,0.5826
0.6042,0.2155,0.6299
0.5172,0.2763,0.9875
0.8191,0.6965,0.8450
0.4532,0.3463,0.7834

The numbers here are random, but let's imagine each number represents how long a particular language took to run a particular test, where Row 2 corresponds to Test 1 and so on. We'll put this file into a folder on our desktop called Test (C:\Users\Odhran\Desktop\Test\example.csv).

Now, open up your options file wherever you saved it. We will edit this to get the output we want. The default options are:



The first thing to do is change our directory (the double backslashes are required for R). We'll change this to:
directory = [[C:\\Users\\Odhran\\Desktop\\Test\\]]
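The reason the doubled backslashes survive is that Lua's long-bracket strings do not process escape sequences, so the value keeps both backslashes and R can later read each pair as a single backslash inside a string literal. The short sketch below illustrates this; the r_line variable and the way the path is spliced into R code are assumptions for illustration, not taken from grapher.lua.

```lua
-- Long brackets keep the text verbatim: both backslashes stay in the string.
local directory = [[C:\\Users\\Odhran\\Desktop\\Test\\]]
print(directory)

-- The same value written as an ordinary quoted string needs four
-- backslashes per separator, because Lua itself unescapes "\\" to "\".
local quoted = "C:\\\\Users\\\\Odhran\\\\Desktop\\\\Test\\\\"
print(quoted == directory)

-- Hypothetical use: splice the path into a line of R code. R then treats
-- each "\\" inside its own string literal as one backslash.
local r_line = 'data <- read.csv("' .. directory .. 'example.csv")'
print(r_line)
```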
The default legend location seems fine, although you may want to test it on a single file before producing all your graphs, in case the legend ends up covering the data.

We're running tests, so we'll change the axis labels accordingly:
xaxis = [[Test]]
yaxis = [[Execution Speed (s)]]
The headers in our CSV file are in the first row, so the file is vertically oriented and no change to the orientation is needed.

I want my graph to be called "Tests", so I change title:
title = [[Tests]]
And finally I want my output to be a PDF, so I change png to pdf:
outputType = [[pdf]]
After saving the changes, run grapher.lua. It will ask you to confirm your settings:



These are correct, so we press Enter and grapher reports that 1 of 1 graph(s) have been produced. To view the new graph, go to the Test folder where the CSV file is located; you'll see two new files, example.r and example.pdf. Grapher generated the R file in order to produce the graph, so you can tweak it and regraph with Rscript if you want to change an individual graph. Below is the content of the generated R file and the resulting graph.





The repo can be found here.