Tuesday, July 13, 2010

Finding all external reference names in PubChem SDF files

I'm currently working on a simple way to find all external reference names in the PubChem SDF files. The approach is rather trivial, but time-consuming:



grep -h -A 1 PUBCHEM_EXT_DATASOURCE_NAME *.sdf | grep -v -e PUBCHEM_EXT_DATASOURCE_NAME -e '--' | sort -u



Once this has finished, it should give us a unique, sorted list of the names of all data sources found in PubChem:


...
Ambinter
Burnham Center for Chemical Genomics
Calbiochem
CC_PMLSC
ChEBI
ChemSpider
DiscoveryGate
Emory University Molecular Libraries Screening Center
InFarmatik
KUMGM
LipidMAPS
MICAD
MLSMR
MMDB
MTDP
Nature Chemical Biology
NCGC
NIAID
NMMLSC
NMRShiftDB
ORST SMALL MOLECULE SCREENING CENTER
PCMD
ProbeDB
Prous Science Drugs of the Future
R&D Chemicals
Sigma-Aldrich
Specs
SRMLSC
Structural Genomics Consortium
The Scripps Research Institute Molecular Screening Center
Thomson Pharma
UM-BBD
UPCMLD
...
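
As a variation on the pipeline above, the same list could probably be produced in a single awk pass, which avoids the separate grep stages. This is only a sketch: the function name list_datasource_names is made up for illustration, and it assumes, like the -A 1 in the original command, that the value always sits on the line directly after the PUBCHEM_EXT_DATASOURCE_NAME tag line.

```shell
# Hypothetical helper (the name is not part of any PubChem tooling).
# Prints each distinct data source name found in the given SDF files.
list_datasource_names() {
    awk '
        # On the tag line, read the next line, which is assumed to hold the value.
        /PUBCHEM_EXT_DATASOURCE_NAME/ {
            if ((getline name) > 0) print name
        }
    ' "$@" | sort -u
}

# Usage: list_datasource_names *.sdf
```

Whether this is actually faster on the full PubChem dump is untested here; most of the time likely goes into reading the files either way.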
