Tuesday, February 2, 2010

Regular expressions for common chemical identifiers

This is a small collection of regular expressions that I use from time to time to recognize chemical identifiers. Please feel free to add more to the list so that it grows and becomes more complete.


For each entry, the first line is the name and the second is the valid Groovy/Java version. All have been validated against at least a thousand examples.

std inchi 

InChI=1S/([^/]+)(?:/[^/]+)*\\S

std inchiKey 

[A-Z]{14}-[A-Z]{10}-[A-Z0-9]


CAS

\\d{1,7}-\\d\\d-\\d

KEGG

C\\d{5}

LipidMaps

LMFA[0-9]{8}

HMDB

HMDB[0-9]*
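
As a rough sketch of how these patterns could be used from Java, the snippet below compiles each one and reports which identifier type a string matches in full. The class name ChemicalIdPatterns and the classify method are made up for illustration, and the sample identifiers (all for water, plus one LipidMaps-style ID) are only there to exercise the patterns. It uses matches(), so the whole string has to fit the pattern; use find() instead if you want to pull identifiers out of longer text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the classify method are illustrative only.
final class ChemicalIdPatterns {

    // LinkedHashMap keeps insertion order, so patterns are tried in the order of the list.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("std inchi", Pattern.compile("InChI=1S/([^/]+)(?:/[^/]+)*\\S"));
        // Assumption: the final character class is written as [A-Z0-9], without a comma.
        PATTERNS.put("std inchiKey", Pattern.compile("[A-Z]{14}-[A-Z]{10}-[A-Z0-9]"));
        PATTERNS.put("CAS", Pattern.compile("\\d{1,7}-\\d\\d-\\d"));
        PATTERNS.put("KEGG", Pattern.compile("C\\d{5}"));
        PATTERNS.put("LipidMaps", Pattern.compile("LMFA[0-9]{8}"));
        PATTERNS.put("HMDB", Pattern.compile("HMDB[0-9]*"));
    }

    // Returns the name of the first pattern that matches the whole (trimmed) string,
    // or "unknown" if none of them does.
    static String classify(String candidate) {
        String trimmed = candidate.trim();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(trimmed).matches()) {
                return entry.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String[] samples = {
            "InChI=1S/H2O/h1H2",
            "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
            "7732-18-5",
            "C00001",
            "LMFA01010001",
            "HMDB02111"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

Note that these patterns only check the shape of an identifier. A CAS number, for example, also carries a check digit that this sketch does not verify.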
