Wednesday, March 24, 2010


I spent the last couple of days developing a quick application in Griffon and have to say it is a rather impressive framework, especially since it is still young at version 0.3.

What do I like?

  • using the Grails approach
  • very simple to use
  • saves a lot of time in the initial setup of the project
  • clear separation into controller/model/views

What am I missing?

  • better support in IDEs
  • SWT support
  • GORM (it will come sooner or later)
  • documentation; with the current documentation I can't use it for production code

Thursday, March 11, 2010

I just love Groovy for tasks like this...

Sometimes you have these annoying little tasks on your hands. I always forget the bash syntax, and it's been a while since I seriously played with Python.

So I thought, hey let’s try groovysh.


  • read wordlist
  • convert to lowercase
  • remove parts of words

new File("en_US.dic").text.split("\n").each{ new File("result.txt").append "${it.split("/")[0].toLowerCase()}\n" }

OK, after a bit more coding I actually wrote a Groovy script:

  • read file
  • read file with words we do not want in word list
  • convert all values to lowercase
  • remove duplicated values
  • save to output file
Really nothing fancy, but it gets the job done, took about 3 minutes to write and won't be used again.

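Set<String> cache = new HashSet<String>()
Set<String> result = new HashSet<String>()

// every column after the first one goes into the cache
new File("whitelist.txt").text.split("\n").each{ it.split("\t").eachWithIndex{

        String s, int index ->

                if(index > 0){
                        // assumed: this line was lost in the post
                        cache.add(s.toLowerCase())
                }
        }
}

new File("blacklist.txt").text.split("\n").each{
        if(cache.contains(it.toLowerCase()) == false){
                // assumed: this line was lost in the post
                result.add(it.toLowerCase())
        }
}

File out = new File("wordlist.txt")

// assumed: the loop header was lost in the post
result.each{
        out.append it
        out.append "\n"
}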
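
For comparison, here is a minimal Python sketch of the steps from the list above. It is only an illustration and not the script from this post: the function name `build_wordlist` and the sample words are made up, it assumes one word per line with an optional `/flags` suffix as in `en_US.dic`, and it sorts the result only to make the output predictable.

```python
def build_wordlist(words, unwanted):
    """Lowercase, drop unwanted words, and remove duplicates.

    `words` and `unwanted` are iterables of lines; anything after a "/"
    (the affix flags in a Hunspell-style .dic file) is cut off.
    """
    blocked = {line.split("/")[0].strip().lower() for line in unwanted}
    result = set()
    for line in words:
        word = line.split("/")[0].strip().lower()
        # skip empty lines and anything on the unwanted list
        if word and word not in blocked:
            result.add(word)
    return sorted(result)


if __name__ == "__main__":
    words = ["Apple/S", "apple", "Banana/M", "cherry", ""]
    unwanted = ["banana"]
    for word in build_wordlist(words, unwanted):
        print(word)
```

Writing the result to a file is then a single `write("\n".join(...))` on an opened output file.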

Wednesday, March 10, 2010

calculating exact molar masses using the CDK 1.3.1

I needed a simple way to calculate the exact mass for a couple of million compounds and so decided to give the CDK another try.

After googling a bit I found something that pointed me in the right direction and worked with the current version of the CDK, so it was rather simple.

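import org.openscience.cdk.DefaultChemObjectBuilder
import org.openscience.cdk.interfaces.IMolecularFormula
import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator

class ConvertInchi {

    /**
     * converts an InChI code to its exact mass
     * @param inchi the InChI string
     * @return the total exact mass of the formula contained in the InChI
     */
    public static double convertInchiToMolareMass(String inchi) {

        IMolecularFormula moleculeFormula = MolecularFormulaManipulator.getMolecularFormula(convertInchiToMolecularFormula(inchi), DefaultChemObjectBuilder.getInstance())

        return MolecularFormulaManipulator.getTotalExactMass(moleculeFormula)
    }

    /**
     * converts an InChI to a molecular formula
     * @param inchi the InChI string
     * @return the formula layer, i.e. the text between the first and the second "/"
     */
    public static String convertInchiToMolecularFormula(String inchi) {
        return inchi.split("/")[1]
    }
}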
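
To show what the split on "/" actually picks out, here is a small Python sketch that is independent of the CDK. The function names and the tiny mass table are made up for illustration; it only handles simple single-component formulas like C2H6O (no charges, no "." separated components, no leading multipliers), so treat it as a sanity check and not as a replacement for the CDK call.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; only a few
# elements are listed here, extend the table as needed.
MONOISOTOPIC = {
    "H": 1.00782503,
    "C": 12.0,
    "N": 14.00307400,
    "O": 15.99491462,
}


def formula_layer(inchi):
    """Return the text between the first and second "/" of an InChI."""
    return inchi.split("/")[1]


def exact_mass(formula):
    """Sum monoisotopic masses for a simple formula such as C2H6O."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # a missing count means a single atom of that element
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    formula = formula_layer(ethanol)
    print(formula, round(exact_mass(formula), 4))
```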


Friday, March 5, 2010

Postgres - partial indexes and sort indexes

This morning was a sad moment. Normally people ask me all the time about SQL optimization, SQL tuning, when to create an index and so on.

And this morning I had another DOOOH moment of my own.

Basically while trying to optimize BinBase even further I noticed that some queries take an absurd amount of time to finish. Like 500 seconds to query the bin table.

So I started to investigate and discovered that some of my queries are using seq scans over huge tables. For no apparent reason.

It turned out that the sort column was not indexed...

CREATE INDEX bin_retention_index ON bin (retention_index ASC NULLS LAST);

Query time improved greatly after executing this statement.

Now there was still a huge wait time for a certain query

explain select * from BIN where bin_id not in ( SELECT bin_id from SPECTRA where sample_id = 1630733 and bin_id is not null ) ORDER BY retention_index ASC

It turned out that indexes were never used for the 'is not null' part. After some research and a lot of head scratching I found that Postgres supports partial indexes for exactly this case.

create index spectra_bin_id_partial_is_not_null on spectra (bin_id) where bin_id is not null

Afterwards we saw some improvement, but the actual slow data access is caused by something else: we always need the complete table minus a few compounds, and this operation takes some time.

Time to optimize the internal database lookup...

Thursday, March 4, 2010

rocks linux cluster - adding a new parallel environment

By default Rocks ships with a couple of parallel environments, which execute stuff on different nodes. But sometimes you just want to have a node all to yourself and take over all its slots.

To do this you can just create a new parallel environment, which gives you a defined number of CPUs for a specified job.

  1. create a file which describes the parallel environment like this

    pe_name threaded
    slots 999
    user_lists NONE
    xuser_lists NONE
    start_proc_args /bin/true
    stop_proc_args /bin/true
    allocation_rule $pe_slots
    control_slaves FALSE
    job_is_first_task TRUE
    urgency_slots min
    accounting_summary FALSE

  2. register this on the head node

    qconf -Ap file.txt

  3. add it to the list of available environments

    qconf -mq all.q
    pe_list make mpich mpi orte threaded

  4. test it with qlogin

    qlogin -pe threaded 4
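
Inside a batch job that was granted a parallel environment, Grid Engine exports the number of granted slots as the NSLOTS environment variable. Here is a minimal Python sketch of a job that sizes its worker pool from it. The helper name `granted_slots` and the toy workload are made up for illustration, and falling back to 1 slot when NSLOTS is missing is an assumption for runs outside the scheduler.

```python
import os
from multiprocessing import Pool


def granted_slots(env=None):
    """Return the slot count Grid Engine put in NSLOTS, or 1 if unset."""
    env = os.environ if env is None else env
    try:
        return max(1, int(env.get("NSLOTS", "1")))
    except ValueError:
        # NSLOTS was set to something that is not a number
        return 1


def square(x):
    return x * x


if __name__ == "__main__":
    # One worker process per granted slot.
    with Pool(processes=granted_slots()) as pool:
        print(pool.map(square, range(8)))
```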

Wednesday, March 3, 2010

SetupX - tuning and the missing indexes

After analyzing a couple of hundred SQL statements in SetupX I noticed that, for some reason, there is no real use of indexes. Why there are no indexes escapes me, but since we prefer higher query speed I suggest creating the following indexes (which are far from perfect):

create index pubdata_relatedExperimentID_index on pubdata(relatedExperimentID)
create index NCBIClassifierID_ncbiID_index on NCBIClassifierID(ncbiID)
create index formobject_question_index on formobject(question)
create index formobject_discriminator_index on formobject(discriminator)
create index formobject_value_index on formobject(value(100))
create index query_querytype_index on query(querytype)
create index query_userid_index on query(userID)
create index datafile_source_index on datafile (source)
create index cache_experiment_id_index on cache (experimentID)
create index user_password_index on user (passwd)
create index user_username_index on user (username)
create index BlockedIP_blockedIp_index on BlockedIP(blockedIP)

This should improve the performance nicely and is currently being applied to our production system.

SetupX - rawdata access

Ever had the desire to access the raw data in SetupX because the GUI gives you no easy access to it, or no access at all?

First you need to understand what's happening in the background. The approach used in SetupX looks like a mapping from a tree to a table structure. That is a valid approach and keeps it very flexible in theory, but it is not a really practical solution, especially since it's a real pain to write queries for.

Get all species:

select distinct value from formobject where question = 'species' and value != ''

Get all organs:

select distinct lower(value) from formobject where question in ( 'organ','organ name','Organs','Organ specification' ) and value != ''

We need to use lower() because people have a 'strange' way of spelling things.
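
To see the effect of lower() without touching the real database, here is a small Python sketch using an in-memory SQLite table as a stand-in. Only the table name `formobject` and the columns `question` and `value` come from the queries above; the sample rows and the function name `distinct_organs` are invented for illustration.

```python
import sqlite3


def distinct_organs(conn):
    """Run the organ query and return the normalized values, sorted."""
    query = """
    select distinct lower(value) from formobject
    where question in ('organ', 'organ name', 'Organs', 'Organ specification')
      and value != ''
    """
    return sorted(row[0] for row in conn.execute(query))


# In-memory stand-in for the SetupX table with a few invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table formobject (question text, value text)")
conn.executemany(
    "insert into formobject (question, value) values (?, ?)",
    [
        ("organ", "Leaf"),
        ("organ name", "leaf"),
        ("Organs", "LEAF"),
        ("Organ specification", "root"),
        ("organ", ""),
        ("species", "Arabidopsis thaliana"),
    ],
)

organs = distinct_organs(conn)
print(organs)
```

Without lower() the three spellings of "leaf" would come back as three separate organs.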

work in progress!

Tuesday, March 2, 2010

jboss 4.2.1GA and Java6 and jboss based webservice

Thanks to this report I was able to fix this issue. Well, it's not a fix, it's a workaround...

Ariela Hui - 13/Aug/08 04:23 PM
I was able to solve this problem. This is what environment I have:
JDK 1.6.0
JBoss 4.2.1

In the [jboss_home]/lib/endorsed add:

copy them from [jboss_home]/server/default/lib

I've also added to endored jboss-logging-spi.jar
This I copied from jboss-5.0.0 client folder. 

Monday, March 1, 2010

scala/groovy on rocks linux

Well, since there is no Scala/Groovy roll for Rocks, we need to install it the traditional way.

  • go into the directory /shared/apps on the frontend
  • if apps doesn't exist create it
  • copy your scala/groovy tgz there
  • gunzip and untar it
  • edit your extend-compute.xml as shown here
  • add a new file modification section like this

<file name="/etc/profile" mode="append">




export PATH
</file>


  • rebuild your dist as shown here
  • reinstall your nodes as shown here