I’m a fan of literate programming. When I’m writing a paper, I typically use LaTeX to draft the paper, Java to generate the data, and R to create figures and do basic analysis. With Sweave (or the more recent Knitr package), it is possible to easily add R code to LaTeX documents. However, I had previously been running the Java code separately and then reading the results into R separately for analysis. But with the rJava package, it is now possible to run Java code directly from R, making sure that the whole document is up-to-date at all times.

rJava Intro

The rJava package lets you call java code directly from within R. There are two parallel APIs provided from the package. A low-level API, consisting of functions starting with ‘.j’ (e.g. .jnew, .jcall, etc.) is faster but requires knowing about JNI type strings. For some reason the rJava documentation focuses on this API, but there is also a high-level API which is much easier to use (but a little slower when calling functions).

First, initialize the JVM:

[splus]
require(rJava)
.jinit() # this starts the JVM
[/splus]

The J() call is quite flexible. Given a java class name, it can return a S4 class wrapping the java Class. Instances can then be created from this class using the new function. J() can also be used to call instance methods of java objects, or to invoke static functions.

[splus]x = new(J("java.lang.String"), "Example String")
J(x,"toUpperCase")[/splus]

## [1] "EXAMPLE STRING"

[splus]J("java.lang.Double", "parseDouble", "2.718282" )[/splus]

## [1] 2.718282

You can also use the $ operator, which works like the java ., to get class members or call functions.

[splus]x$substring(as.integer(10))[/splus]

## [1] "ring"

[splus]J("java.lang.Double")$parseDouble("3.141593")[/splus]

## [1] 3.141593

BioJava

Now lets use BioJava! We need to reinitialize the JVM with the full biojava classpath. Then we import some java packages to R’s namespace to reduce typing.

[splus]classpath = path.expand(c(
"~/.m2/repository/java3d/vecmath/1.3.1/vecmath-1.3.1.jar",
"~/.m2/repository/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar",
"~/.m2/repository/org/biojava/biojava-alignment/4.0.0/biojava-alignment-4.0.0.jar",
"~/.m2/repository/org/biojava/biojava-core/4.0.0/biojava-core-4.0.0.jar",
"~/.m2/repository/org/biojava/biojava-structure/4.0.0/biojava-structure-4.0.0.jar",
"~/.m2/repository/org/biojava/biojava-structure-gui/4.0.0/biojava-structure-gui-4.0.0.jar",
"~/.m2/repository/net/sourceforge/jmol/jmol/13.0.14/jmol-13.0.14.jar"
))
.jinit(classpath,force.init=T) # this starts the JVM

attach( javaImport(c( "java.lang",
"java.util",
"org.biojava.nbio.structure",
"org.biojava.nbio.structure.align",
"org.biojava.nbio.structure.align.ce",
"org.biojava.nbio.structure.align.util"
)),
pos = 2 , name = "java:imports" )[/splus]

Now we’ll calculate a simple structural alignment. The code looks remarkably similar to the Java code to do the same task.

[splus]cache = new(AtomCache)
ca1 = cache$getAtoms("4hhb.A")
ca2 = cache$getAtoms("4hhb.B")
ce = StructureAlignmentFactory$getAlgorithm(CeMain$algorithmName)
afpChain = ce$align(ca1,ca2)
cat(afpChain$toFatcat(ca1,ca2))[/splus]

## Align 4hhb.A.pdb 141 with 4hhb.B.pdb 146
## Twists 0 ini-len 141 ini-rmsd 1.54 opt-equ 139 opt-rmsd 1.49 chain-rmsd 1.49 Score 366.29 align-len 147 gaps 8 (5.44%)
## Z-score 6.47 Afp-num 3 Identity 40.82% Similarity 57.82%
## Block  0 afp  0 score  0.00 rmsd  1.49 gap 0 (NaN%)
##
##                   .    :    .    :    .    :    .    :    .    :    .    :    .    :
## Chain 1:    1 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDL------SHGSAQVKGHGKKVAD
##               11111111111111111  11111111111111111111111111111      1111111111111111
## Chain 2:    2 HLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
##
##                   .    :    .    :    .    :    .    :    .    :    .    :    .    :
## Chain 1:   65 ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVST
##               1111111111111111111111111111111111111111111111111111111111111111111111
## Chain 2:   70 AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
##
##                   .
## Chain 1:  135 VLTSKYR
##               1111111
## Chain 2:  140 ALAHKYH
##
## Note: positions are from PDB; | means alignment of identical amino acids, : of similar amino acids

Leave a Reply

Your email address will not be published. Required fields are marked *