Wednesday, November 18, 2009

SQL - small mistakes, big impact

the last couple of days I was fighting with a database which became slower and slower and slower and now needed close to 24h to execute a rather simple query...

After working for hours on different parameters and tuning the database, including setting up a spare server to see if a dump and restore fixes the issues I got the time down to 5 minutes.

Sounds good?

well not really since I needed an execution in the range of a couple of seconds. So I actually looked at the sql statement which caused issues and found a gigantic error in it...

the statement:


select * from BIN where bin_id not in ( SELECT bin_id from SPECTRA where sample_id = 424105 AND SPECTRA.bin_id = BIN.bin_id)


and the correct statement would be:


select * from BIN a where a.bin_id not in ( SELECT bin_id from SPECTRA where sample_id = 424105 and bin_id is not null) ORDER BY retention_index ASC


the problem?


SPECTRA.bin_id = BIN.bin_id


this little part is totally wrong in the whole statement and causes a seq scan over a table with 70 Million entries against a table with 6000 entries. It does nothing for the queries and I guess it was left over from a copy/paste accident 6 years ago when I wrote this script originally.

let's compare some numbers:

explain before:

Seq Scan on bin (cost=0.00..459513.42 rows=3044 width=1290)
SubPlan
Filter: (NOT (subplan))
-> Bitmap Heap Scan on spectra (cost=75.20..75.42 rows=2 width=8)
Recheck Cond: ((sample_id = 424105::double precision) AND (bin_id = ($0)::double precision))
-> BitmapAnd (cost=75.20..75.20 rows=2 width=0)
-> Bitmap Index Scan on spectra_sample_id (cost=0.00..18.99 rows=5696 width=0)
-> Bitmap Index Scan on spectra_binid (cost=0.00..56.18 rows=17658 width=0)
Index Cond: (sample_id = 424105::double precision)
Index Cond: (bin_id = ($0)::double precision)

explain after:

Sort (cost=1765.47..1766.23 rows=3044 width=1290)
Sort Key: retention_index
-> Seq Scan on bin a (cost=647.21..1747.86 rows=3044 width=1290)
Filter: (NOT (hashed subplan))
SubPlan
-> Index Scan using spectra_sample_id on spectra (cost=0.00..646.42 rows=3138 width=8)
Index Cond: (sample_id = 424105::double precision)
Filter: (bin_id IS NOT NULL)


in real world number this translates to 24 hours before statement compared to 4 seconds after statement...
...

Tuesday, November 17, 2009

maven2 - choose dependcies based on platform

today I run into an old friend. I was compiling my stuff happily under oxs and building eclipse plugins with maven2 and the psteclipse plugin, which I by now more or less rewrote and adapted to my project to perform as good as possible. Anyway, basically I encountered the issue that eclipse ships with different dependencies under osx/win32 and linux.

So promptly it fails to compile under linux...

solution for this problem?

maven2 profiles, your best friend for cross platform development. They actually allow you define dependencies based on the current platform.

Example for out case would be:


<profiles>
<profile>
<id>dev-windows</id>

<dependencies>
</dependencies>

<activation>
<activeByDefault>true</activeByDefault>
<os>
<family>windows</family>
</os>
</activation>
</profile>
<profile>
<id>dev-mac</id>
<dependencies>
<dependency>
<groupId>psteclipse</groupId>
<artifactId>org.eclipse.swt.cocoa.macosx</artifactId>
<version>3.5.1</version>
<scope>provided</scope>
</dependency>

</dependencies>

<activation>
<activeByDefault>false</activeByDefault>
<os>
<family>mac</family>
</os>
</activation>
</profile>
<profile>
<id>dev-linux</id>

<dependencies>
<dependency>
<groupId>psteclipse</groupId>
<artifactId>org.eclipse.swt.gtk.linux.x86</artifactId>
<version>3.5.1</version>
<scope>provided</scope>
</dependency>
</dependencies>

<activation>
<activeByDefault>false</activeByDefault>
<os>
<name>linux</name>
</os>
</activation>
</profile>
</profiles>


And sets a different library for osx and linux. Sweet!

Wednesday, November 11, 2009

hibernate - configure caching

since the hibernate tools for eclipse need to have the caching enabled, something I'm not the biggest fan off for several reason.

Well


<property name="hibernate.cache.provider_class">
org.hibernate.cache.EhCacheProvider
</property>

<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_query_cache">true</property>


so this makes the binbase configuration run in the eclipse tools.

Monday, November 2, 2009

Log4J and eclipse

sometimes you want to use the log4j system instead of the internal eclipse provide logging framework.

The use of this is rather simple and straight forwarded. You only need to register an ILogListener at the platform object and implement it.

An example would be:


package edu.ucdavis.genomics.metabolomics.binbase.gui.swt.logging;

import org.apache.log4j.Logger;
import org.eclipse.core.runtime.ILogListener;
import org.eclipse.core.runtime.IStatus;

/**
* used to forward eclipse logging to log4j
*
* @author wohlgemuth
*
*/
public class Log4JListener implements ILogListener {

static Logger LOGGER = Logger.getLogger(Log4JListener.class);

public void logging(IStatus status, String plugin) {
if (status.getSeverity() == IStatus.WARNING) {
if (status.getException() == null) {
LOGGER.warn(status.getMessage() + "(" + status.getCode() + ")");
} else {
LOGGER.warn(status.getMessage() + "(" + status.getCode() + ")",
status.getException());
}
} else if (status.getSeverity() == IStatus.ERROR) {
if (status.getException() == null) {
LOGGER
.error(status.getMessage() + "(" + status.getCode()
+ ")");
} else {
LOGGER.error(
status.getMessage() + "(" + status.getCode() + ")",
status.getException());
}

} else if (status.getSeverity() == IStatus.INFO) {
if (status.getException() == null) {
LOGGER.info(status.getMessage() + "(" + status.getCode() + ")");
} else {
LOGGER.info(status.getMessage() + "(" + status.getCode() + ")",
status.getException());
}

}
}

}



and to register it


this.listener = new Log4JListener();
Platform.addLogListener(this.listener);