Coding and More - keeping track of annoying things I run into while coding: March 2010

Wednesday, March 24, 2010

griffon

I spent the last couple of days developing a quick application in griffon and have to say this is a rather impressive framework, especially since it's a young framework at version 0.3.

what I like?

using the grails approach
very simple to use
saves a lot of time in the initial setup of the project
clear separation into controller/model/views

what I'm missing?

better support in IDEs
no SWT support
no gorm (it will come sooner or later)
documentation: with the current documentation I can't use it for production code

Thursday, March 11, 2010

i just love groovy for tasks like this...

Sometimes you have these annoying little tasks on your hands. I always forget the bash syntax, and it's been a while since I seriously played with python.

So I thought, hey let’s try groovysh.

task?

read wordlist
convert to lowercase
remove parts of words

solution

new File("en_US.dic").text.split("\n").each{ new File("result.txt").append "${it.split("/")[0].toLowerCase()}\n" }
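To make the "remove parts of words" step concrete, here is a minimal Python sketch of the same per-line transformation. It assumes the dictionary uses a `word/FLAGS` layout, which is what the split on "/" in the one-liner implies; the sample entries and the helper name `strip_flags` are made up for illustration.

```python
def strip_flags(line: str) -> str:
    """Keep only the text before the first '/' and lowercase it."""
    return line.split("/")[0].lower()


# Made-up sample lines in the assumed word/FLAGS layout.
sample = ["Aachen/M", "abandon/LGDRS", "zebra"]

cleaned = [strip_flags(entry) for entry in sample]
print(cleaned)
```

Lines without a "/" pass through unchanged apart from the lowercasing, which matches what the groovy one-liner does.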

ok after a bit more coding I actually wrote a groovy script

read file
read file with words we do not want in word list
convert all values to lowercase
remove duplicated values
save to output file

really nothing fancy, but it gets the job done, took like 3 minutes to write and won't be used again.

// words to exclude, collected from every tab-separated column except the first
Set<String> cache = new HashSet<String>()
// unique, lower-cased words that end up in the output file
Set<String> result = new HashSet<String>()

new File("whitelist.txt").text.split("\n").each {
        it.split("\t").eachWithIndex { String s, int index ->
                if (index > 0) {
                        cache.add(s.toLowerCase())
                }
        }
}

// keep every word that is not in the cache
new File("blacklist.txt").text.split("\n").each {
        if (!cache.contains(it.toLowerCase())) {
                result.add(it.toLowerCase())
        }
}

File out = new File("wordlist.txt")

result.each {
        out.append it
        out.append "\n"
}

Wednesday, March 10, 2010

calculating exact molar masses using the cdk 1.3.1

I needed a simple way to calculate the exact mass for a couple of million compounds and so decided to give it another try with the CDK.

After googling a bit I found something that pointed me in the right direction and worked with the current version of the cdk, so it was rather simple
 
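import org.openscience.cdk.DefaultChemObjectBuilder
import org.openscience.cdk.interfaces.IMolecularFormula
import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator

class ConvertInchi {

        /**
         * converts an inchi code to its exact mass
         * @param inchi the inchi code
         * @return the total exact mass of the molecular formula
         */
        public static double convertInchiToMolareMass(String inchi) {
                IMolecularFormula moleculeFormula = MolecularFormulaManipulator.getMolecularFormula(convertInchiToMolecularFormula(inchi), DefaultChemObjectBuilder.getInstance())

                return MolecularFormulaManipulator.getTotalExactMass(moleculeFormula)
        }

        /**
         * converts an inchi to a molecular formula
         * @param inchi the inchi code
         * @return the formula layer, i.e. the text between the first and second "/"
         */
        public static String convertInchiToMolecularFormula(String inchi) {
                return inchi.split("/")[1]
        }
}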
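To show what the two steps amount to without the CDK, here is a rough Python sketch: take the formula layer of the InChI, then sum the masses of the most abundant isotopes. This is not how the CDK does it internally; the small mass table, the function names and the regular expression are assumptions made for illustration, and the sketch does not handle charges, multi-component formulas (such as a leading multiplier or a "." separator) or elements missing from the table.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope.
# Deliberately tiny table, only meant for this illustration.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def formula_layer(inchi: str) -> str:
    """Return the text between the first and second '/' of an InChI."""
    return inchi.split("/")[1]


def exact_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C2H6O'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(formula_layer(ethanol), exact_mass(formula_layer(ethanol)))
```

An element that is not in the table raises a `KeyError`, which is preferable to silently returning a mass that is too low.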

Friday, March 5, 2010

Postgres - partial indexes and sort indexes

This morning was a sad moment. Normally people ask me all the time about SQL optimization or SQL tuning or when to create an index and so on.

This morning I had another D'oh moment.

Basically while trying to optimize BinBase even further I noticed that some queries take an absurd amount of time to finish. Like 500 seconds to query the bin table.

So I started to investigate and discovered that some of my queries are using seq scans over huge tables. For no apparent reason.

It turned out that the sort column was not indexed...



CREATE INDEX bin_retention_index ON bin (retention_index ASC NULLS LAST);

The query time improved greatly after executing this statement.

Now there was still a huge wait time for a certain query



explain select * from BIN where bin_id not in ( SELECT bin_id from SPECTRA where sample_id = 1630733 and bin_id is not null ) ORDER BY retention_index ASC

and it turned out that indexes were never used for the 'is not null' part. After some research and a lot of head scratching it turned out that postgres supports partial indexes for exactly this case.



create index spectra_bin_id_partial_is_not_null on spectra (bin_id) where bin_id is not null

Afterwards we saw some improvement, but the really slow data access is caused by something else. We always need the complete table minus a few compounds, and this operation takes some time.

Time to optimize the internal database lookup...
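For anyone who wants to try the partial index idea without a Postgres server at hand, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here: it also accepts a WHERE clause on CREATE INDEX, but its planner is not the Postgres planner, so the printed plan says nothing about BinBase. The table layout and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectra (sample_id INTEGER, bin_id INTEGER)")

# Invented rows: most spectra have no bin assigned (NULL bin_id).
rows = [(1, 10), (1, None), (2, 11), (2, None), (3, None)]
conn.executemany("INSERT INTO spectra VALUES (?, ?)", rows)

# The partial index only covers rows where bin_id is not null.
conn.execute(
    "CREATE INDEX spectra_bin_id_partial_is_not_null "
    "ON spectra (bin_id) WHERE bin_id IS NOT NULL"
)

# Read the stored definition back to confirm the predicate is part of it.
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'index' AND name = ?",
    ("spectra_bin_id_partial_is_not_null",),
).fetchone()[0]
print(index_sql)

# Show what the SQLite planner does with a matching predicate.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT bin_id FROM spectra WHERE bin_id IS NOT NULL"
).fetchall()
for step in plan:
    print(step)

non_null = conn.execute(
    "SELECT count(*) FROM spectra WHERE bin_id IS NOT NULL"
).fetchone()[0]
```

The point of the partial index is its size: rows with a NULL bin_id never enter it, so it stays small when most rows are unassigned.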

Thursday, March 4, 2010

rocks linux cluster - adding a new parallel environment

By default rocks ships with a couple of environments, which execute stuff on different nodes. But sometimes you just want to have a node all to yourself and take over all its slots.

To do this you can just create a new environment which gives you a defined number of cpus for a specified job.

create a file which describes the parallel environment like this and register it with qconf -Ap




pe_name            threaded
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE




qconf -Ap file.txt

add it to the list of available environments by appending it to the pe_list of the queue




qconf -mq all.q


pe_list               make mpich mpi orte threaded

test it with qlogin




qlogin -pe threaded 4

Wednesday, March 3, 2010

SetupX - tuning and the missing indexes

After analyzing a couple of hundred sql statements in setupx I noticed that there is no real use of indexes. Why there are no indexes is beyond me, but since we prefer higher query speed I suggest the creation of the following indexes (which are far from perfect)

create index pubdata_relatedExperimentID_index on pubdata(relatedExperimentID)
create index NCBIClassifierID_ncbiID_index on NCBIClassifierID(ncbiID)
create index formobject_question_index on formobject(question)
create index formobject_discriminator_index on formobject(discriminator)
create index formobject_value_index on formobject(value(100))
create index query_querytype_index on query(querytype)
create index query_userid_index on query(userID)
create index datafile_source_index on datafile (source)
create index cache_experiment_id_index on cache (experimentID)
create index user_password_index on user (passwd)
create index user_username_index on user (username)
create index BlockedIP_blockedIp_index on BlockedIP(blockedIP)

this should improve the performance nicely and is currently being applied to our production system.

SetupX - rawdata access

Ever had the desire to access the rawdata in SetupX, because the gui gives you no easy access to it, or no access at all?

First you need to understand what's happening in the background. The approach used for setupx looks like a mapping from a tree to a table structure. That is a valid approach and keeps it very flexible in theory, but it is not a really practical solution, especially since it's a real pain to write queries for.

Get all species:

select distinct value from formobject where question = 'species' and value != ''

Get all organs:

select distinct lower(value) from formobject where question in ( 'organ','organ name','Organs','Organ specification' ) and value != ''

we need to use lower, because people have a 'strange' way of spelling things
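To see why the lower() call matters, here is a small sketch using Python's built-in sqlite3 module with a made-up two-column formobject table. The real SetupX table has more columns and lives in a different database; the rows below are invented purely to show the effect of inconsistent spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE formobject (question TEXT, value TEXT)")

# Invented rows: the same organ entered with different capitalisation.
conn.executemany(
    "INSERT INTO formobject VALUES (?, ?)",
    [
        ("organ", "Leaf"),
        ("Organs", "leaf"),
        ("organ name", "LEAF"),
        ("organ", "root"),
        ("organ", ""),
        ("species", "Arabidopsis thaliana"),
    ],
)

where = (
    "from formobject where question in "
    "('organ','organ name','Organs','Organ specification') and value != ''"
)

# Without lower() every spelling variant counts as its own organ.
raw = [r[0] for r in conn.execute("select distinct value " + where)]

# With lower() the variants collapse into one entry.
folded = [r[0] for r in conn.execute("select distinct lower(value) " + where)]

print(sorted(raw))
print(sorted(folded))
```

Lowercasing only fixes capitalisation; misspellings or extra whitespace would still show up as separate values.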

work in progress!

Tuesday, March 2, 2010

jboss 4.2.1GA and Java6 and jboss based webservice

thanks to this report I was able to fix this issue. Well it's not a fix, it's a workaround...

Ariela Hui - 13/Aug/08 04:23 PM
I was able to solve this problem. This is what environment I have:
winXP
JDK 1.6.0
JBoss 4.2.1

In the [jboss_home]/lib/endorsed add:
jaxb-api.jar
jboss-jaxrpc.jar
jboss-jaxws.jar
jboss-saaj.jar

copy them from [jboss_home]/server/default/lib

I've also added to endored jboss-logging-spi.jar
This I copied from jboss-5.0.0 client folder.

Monday, March 1, 2010

scala/groovy on rocks linux

well, since there is no scala/groovy roll for rocks we need to install it the traditional way.

go into the directory /share/apps on the frontend
if apps doesn't exist create it
copy your scala/groovy tgz there
gunzip and untar it
edit your extend-compute.xml as shown here
add a new file modification section like this





<file name="/etc/profile" mode="append">
GROOVY_HOME=/share/apps/groovy
SCALA_HOME=/share/apps/scala

export GROOVY_HOME
export SCALA_HOME

PATH=$GROOVY_HOME/bin:$PATH
PATH=$SCALA_HOME/bin:$PATH

export PATH
</file>

rebuild your dist as shown here
reinstall your nodes as shown here
