<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ChemHack &#187; Uncategorized</title>
	<atom:link href="http://chemhack.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://chemhack.com</link>
	<description>Hacking the chemistry world.</description>
	<lastBuildDate>Sat, 18 Dec 2010 18:07:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Chemistry Software Everywhere: Handheld Calculator</title>
		<link>http://chemhack.com/2009/03/chemistry-software-everywhere/</link>
		<comments>http://chemhack.com/2009/03/chemistry-software-everywhere/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 04:48:56 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[calculator]]></category>
		<category><![CDATA[TI-Nspire]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=293</guid>
		<description><![CDATA[In the Google Group tinspire, I saw a very interesting Chemistry Library for the TI-Nspire, a TI calculator. I installed it on my handheld, and it looks nice.]]></description>
			<content:encoded><![CDATA[<p>In the Google Group <a href="http://groups.google.com/group/tinspire">tinspire</a>, I saw a very interesting <a href="http://nelsonsousa.pt/index.php?lang=en&amp;cat=2&amp;subcat=3&amp;article=33">Chemistry Library</a> for the TI-Nspire, a TI calculator. I installed it on my handheld, and it looks nice.</p>
<p><img class="screenshots" src="http://nelsonsousa.pt/images/chemistry1.en.jpg" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/03/chemistry-software-everywhere/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Does the CDK Fingerprint Work? Something Went Wrong.</title>
		<link>http://chemhack.com/2009/03/does-the-cdk-fingeprint-works-something-went-wrong/</link>
		<comments>http://chemhack.com/2009/03/does-the-cdk-fingeprint-works-something-went-wrong/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 16:11:38 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CDK]]></category>
		<category><![CDATA[Fingerprint]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=282</guid>
		<description><![CDATA[In my last post, I doubted the accuracy of fingerprint-based substructure search and pointed out that fingerprints sometimes lost hits. In fact, something was wrong in my code. I was reading molecules directly from an SDF file, and IteratingMDLReader does not perceive atom types or detect aromaticity automatically. This caused UniversalIsomorphismTester to produce incorrect matches; sorry for the incorrect post. I&#8217;ve run the test again, [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://chemhack.com/archives/2009/03/255/">last post</a>, I doubted the accuracy of fingerprint-based substructure search and pointed out that fingerprints sometimes lost hits. In fact, something was wrong in my code. I was reading molecules directly from an SDF file, and IteratingMDLReader does not perceive atom types or detect aromaticity automatically. This caused UniversalIsomorphismTester to produce incorrect matches; sorry for the incorrect post. I&#8217;ve run the test again, using the <a href="http://chemhack.com/wp-content/uploads/2009/03/junk.smi">SMILES</a> provided by <a href="http://blog.rguha.net/?p=133" target="_blank">Guha</a> as input. The Groovy script is attached <a href="http://chemhack.com/wp-content/uploads/2009/03/substructure.groovy">here</a>.</p>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td valign="top"><strong>#</strong></td>
<td valign="top"><strong>Query</strong></td>
<td valign="top"><strong>Subgraph Isomorphism</strong></td>
<td valign="top"><strong>Extended CDK</strong></td>
<td valign="top"><strong>Missing</strong></td>
<td valign="top"><strong>Extra</strong></td>
</tr>
<tr>
<td valign="top">1</td>
<td valign="top"><img class="size-thumbnail wp-image-258 alignnone" title="e59bbee78987-101" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-101-150x150.png" alt="e59bbee78987-101" width="120" height="120" /></td>
<td valign="top">20</td>
<td valign="top">24</td>
<td valign="top">0</td>
<td valign="top">4</td>
</tr>
<tr>
<td valign="top">2</td>
<td valign="top"><img class="size-full wp-image-256 alignnone" title="e59bbee78987-9" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-9.png" alt="e59bbee78987-9" width="150" height="121" /></td>
<td valign="top">7</td>
<td valign="top">103</td>
<td valign="top">0</td>
<td valign="top">96</td>
</tr>
<tr>
<td valign="top">3</td>
<td valign="top"><img class="alignnone size-thumbnail wp-image-259" title="e59bbee78987-11" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-11-150x150.png" alt="e59bbee78987-11" width="135" height="135" /></td>
<td valign="top">69</td>
<td valign="top">100</td>
<td valign="top">0</td>
<td valign="top">31</td>
</tr>
<tr>
<td valign="top">4</td>
<td valign="top"><img class="alignnone size-thumbnail wp-image-260" title="e59bbee78987-12" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-12-150x150.png" alt="e59bbee78987-12" width="135" height="135" /></td>
<td valign="top">6</td>
<td valign="top">10</td>
<td valign="top">0</td>
<td valign="top">4</td>
</tr>
<tr>
<td valign="top">5</td>
<td valign="top"><img class="alignnone size-full wp-image-261" title="e59bbee78987-13" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-13.png" alt="e59bbee78987-13" width="151" height="81" /></td>
<td valign="top">31</td>
<td valign="top">41</td>
<td valign="top">0</td>
<td valign="top">10</td>
</tr>
<tr>
<td valign="top">6</td>
<td valign="top"><img class="alignnone size-full wp-image-262" title="e59bbee78987-15" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-15.png" alt="e59bbee78987-15" width="85" height="112" /></td>
<td valign="top">23</td>
<td valign="top">23</td>
<td valign="top">0</td>
<td valign="top">0</td>
</tr>
<tr>
<td valign="top">7</td>
<td valign="top"><img class="alignnone size-full wp-image-263" title="e59bbee78987-16" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-16.png" alt="e59bbee78987-16" width="134" height="102" /></td>
<td valign="top">7</td>
<td valign="top">75</td>
<td valign="top">0</td>
<td valign="top">68</td>
</tr>
</tbody>
</table>
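<p>So the CDK fingerprint is OK: it misses no hits found by subgraph isomorphism, and the extra hits are only false positives of the screen.</p>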
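<p>The &#8220;Missing&#8221; and &#8220;Extra&#8221; columns follow directly from the two hit counts. As a minimal sketch of that arithmetic, the Python below recomputes the extra hits and the share of fingerprint candidates that are real matches, using the counts from the table above. The function name <code>screen_stats</code> is only illustrative and is not part of CDK or of the attached Groovy script.</p>

```python
# Counts copied from the table above:
# query number -> (subgraph isomorphism hits, extended CDK fingerprint hits).
RESULTS = {1: (20, 24), 2: (7, 103), 3: (69, 100), 4: (6, 10),
           5: (31, 41), 6: (23, 23), 7: (7, 75)}


def screen_stats(exact_hits, fingerprint_hits, missing=0):
    """Return (extra, precision) for one query.

    extra     = fingerprint candidates that are not real substructure matches
    precision = share of fingerprint candidates that are real matches
    """
    true_positives = exact_hits - missing
    extra = fingerprint_hits - true_positives
    precision = true_positives / fingerprint_hits if fingerprint_hits else 0.0
    return extra, precision


for query, (exact, fp) in sorted(RESULTS.items()):
    extra, precision = screen_stats(exact, fp)
    print(f"query {query}: extra={extra}, precision={precision:.2f}")
```

<p>A low precision, as in queries 2 and 7, only means more candidates are handed to the exact matcher; it does not affect correctness as long as nothing is missing.</p>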
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/03/does-the-cdk-fingeprint-works-something-went-wrong/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Do the CDK Fingerprints Work? Substructure Search</title>
		<link>http://chemhack.com/2009/03/does-the-cdk-fingerprints-work-substructure-search/</link>
		<comments>http://chemhack.com/2009/03/does-the-cdk-fingerprints-work-substructure-search/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 17:41:14 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Fingerprint]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[substructure]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=255</guid>
		<description><![CDATA[The test code used in this post has a fatal error, which caused the test results to be completely incorrect. Please see details here. Abstract: Doubting the accuracy of fingerprint-based molecule substructure search, I ran a test comparing the different fingerprints implemented in CDK against subgraph isomorphism. The result is very interesting: CDK fingerprints should [...]]]></description>
			<content:encoded><![CDATA[<p><span style="color: #ff0000;">The test code used in this post has a fatal error, which caused the test results to be completely incorrect. Please see details <a href="http://chemhack.com/archives/2009/03/282/">here</a>.</span></p>
<p>Abstract: Doubting the accuracy of fingerprint-based molecule substructure search, I ran a test comparing the different fingerprints implemented in CDK against subgraph isomorphism. The result is very interesting: CDK fingerprints should never be used alone in substructure search, but by combining CDK fingerprints with subgraph isomorphism we can strike a balance between speed and accuracy.</p>
<h2>Background</h2>
<p><a href="http://blog.rguha.net/">Guha</a> wrote a <a href="http://blog.rguha.net/?p=29">post</a> about benchmarking different types of fingerprints with the benchmarking strategies described by <a href="http://dx.doi.org/10.1021/ci0500177">Bender &amp; Glen</a> and <a href="http://dx.doi.org/10.1021/ci050276w">Godden</a> et al. The benchmark is based on Tanimoto similarity, which is the foundation of similar-structure search in most chemistry databases. Another important feature of molecule structure search is substructure search; currently, both subgraph isomorphism and fingerprints are used for it. Adel Golovin and Kim Henrick’s article <a href="http://pubs.acs.org/doi/abs/10.1021/ci8003013">Chemical Substructure Search in SQL</a> provides a pure SQL subgraph isomorphism strategy, and <a href="http://depth-first.com">Rich Apodaca</a>’s <a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases">post</a> on fingerprint-based substructure search in MySQL found a solution that works within MySQL’s limited binary operations.</p>
<p>Fingerprint search is obviously faster, as comparing fingerprints is much cheaper than subgraph isomorphism. But the real question is: does the fingerprint method really find all the substructure matches? And are any hits wrongly reported as matches?</p>
<h2>Method</h2>
<p>Using <a href="http://www.drugbank.ca">DrugBank</a> small-molecule drugs as the test dataset and several hand-drawn structures of different types as queries, I performed substructure searches using subgraph isomorphism and the fingerprints implemented in CDK. If a hit from the fingerprint method is also found by the subgraph isomorphism method, I count it as a correct hit; otherwise it is counted as an incorrect hit.</p>
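<p>The counting rule above amounts to two set operations. Here is a minimal Python sketch of it; the function name <code>score_hits</code> and the molecule identifiers are made up for illustration and are not taken from the actual test script.</p>

```python
def score_hits(fingerprint_hits, isomorphism_hits):
    """Split fingerprint hits into (correct, incorrect) counts.

    A fingerprint hit is correct when subgraph isomorphism also found it,
    and incorrect otherwise.
    """
    fingerprint_hits = set(fingerprint_hits)
    isomorphism_hits = set(isomorphism_hits)
    correct = len(fingerprint_hits & isomorphism_hits)
    incorrect = len(fingerprint_hits - isomorphism_hits)
    return correct, incorrect


# Hypothetical molecule identifiers, for illustration only.
isomorphism = {"mol-1", "mol-2", "mol-3"}
fingerprint = {"mol-2", "mol-3", "mol-4", "mol-5"}

correct, incorrect = score_hits(fingerprint, isomorphism)
print(f"{correct}/{incorrect}")
```

<p>Hits that subgraph isomorphism found but the fingerprint missed would be <code>isomorphism - fingerprint</code>; they do not appear in either count.</p>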
<p>All fingerprints implemented in CDK were tested, generated using Fingerprinter, ExtendedFingerprinter, GraphOnlyFingerprinter, SubstructureFingerprinter, MACCSFingerprinter and EStateFingerprinter. All parameters were kept at their defaults.</p>
<p>To test the accuracy on queries of different complexity, I drew several structures, shown below.</p>
<p><img class="size-full wp-image-256 alignnone" title="e59bbee78987-9" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-9.png" alt="e59bbee78987-9" width="150" height="121" /><img class="size-thumbnail wp-image-258 alignnone" title="e59bbee78987-101" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-101-150x150.png" alt="e59bbee78987-101" width="120" height="120" /><img class="alignnone size-thumbnail wp-image-259" title="e59bbee78987-11" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-11-150x150.png" alt="e59bbee78987-11" width="135" height="135" /><img class="alignnone size-thumbnail wp-image-260" title="e59bbee78987-12" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-12-150x150.png" alt="e59bbee78987-12" width="135" height="135" /><img class="alignnone size-full wp-image-261" title="e59bbee78987-13" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-13.png" alt="e59bbee78987-13" width="151" height="81" /><img class="alignnone size-full wp-image-262" title="e59bbee78987-15" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-15.png" alt="e59bbee78987-15" width="85" height="112" /><img class="alignnone size-full wp-image-263" title="e59bbee78987-16" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-16.png" alt="e59bbee78987-16" width="134" height="102" /></p>
<h2>Result</h2>
<p>The results are listed below in the format &#8220;A/B&#8221;. A is the number of correct hits, i.e. hits also found by the subgraph isomorphism method. B is the number of incorrect hits. For example, 31/49 means 31 correct hits and 49 incorrect hits. A higher A and a lower B are better.</p>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td valign="top"><strong>#</strong></td>
<td valign="top"><strong>Query</strong></td>
<td valign="top"><strong>Subgraph Isomorphism</strong></td>
<td valign="top"><strong>EState</strong></td>
<td valign="top"><strong>MACCS</strong></td>
<td valign="top"><strong>Standard CDK</strong></td>
<td valign="top"><strong>Extended CDK</strong></td>
<td valign="top"><strong>Graph-only CDK Fingerprint</strong></td>
<td valign="top"><strong>Substructure Fingerprint</strong></td>
</tr>
<tr>
<td valign="top">1</td>
<td valign="top"><img class="size-thumbnail wp-image-258 alignnone" title="e59bbee78987-101" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-101-150x150.png" alt="e59bbee78987-101" width="120" height="120" /></td>
<td valign="top">31</td>
<td valign="top">0/0</td>
<td valign="top">0/43</td>
<td valign="top"><span style="color: red;">31/49</span></td>
<td valign="top">31/54</td>
<td valign="top">31/574</td>
<td valign="top">0/0</td>
</tr>
<tr>
<td valign="top">2</td>
<td valign="top"><img class="size-full wp-image-256 alignnone" title="e59bbee78987-9" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-9.png" alt="e59bbee78987-9" width="150" height="121" /></td>
<td valign="top">54</td>
<td valign="top">0/0</td>
<td valign="top">0/1</td>
<td valign="top"><span style="color: red;">21/95</span></td>
<td valign="top">18/84</td>
<td valign="top">54/1512</td>
<td valign="top">11/30</td>
</tr>
<tr>
<td valign="top">3</td>
<td valign="top"><img class="alignnone size-thumbnail wp-image-259" title="e59bbee78987-11" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-11-150x150.png" alt="e59bbee78987-11" width="135" height="135" /></td>
<td valign="top">29</td>
<td valign="top">0/0</td>
<td valign="top">0/20</td>
<td valign="top"><span style="color: red;">29/74</span></td>
<td valign="top">29/83</td>
<td valign="top">29/1793</td>
<td valign="top">0/5</td>
</tr>
<tr>
<td valign="top">4</td>
<td valign="top"><img class="alignnone size-thumbnail wp-image-260" title="e59bbee78987-12" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-12-150x150.png" alt="e59bbee78987-12" width="135" height="135" /></td>
<td valign="top">3</td>
<td valign="top">0/0</td>
<td valign="top">0/85</td>
<td valign="top">3/6</td>
<td valign="top"><span style="color: red;">3/3</span></td>
<td valign="top">3/36</td>
<td valign="top">0/4</td>
</tr>
<tr>
<td valign="top">5</td>
<td valign="top"><img class="alignnone size-full wp-image-261" title="e59bbee78987-13" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-13.png" alt="e59bbee78987-13" width="151" height="81" /></td>
<td valign="top">31</td>
<td valign="top">0/0</td>
<td valign="top">29/93</td>
<td valign="top">31/14</td>
<td valign="top"><span style="color: red;">31/13</span></td>
<td valign="top">31/1593</td>
<td valign="top">27/53</td>
</tr>
<tr>
<td valign="top">6</td>
<td valign="top"><img class="alignnone size-full wp-image-262" title="e59bbee78987-15" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-15.png" alt="e59bbee78987-15" width="85" height="112" /></td>
<td valign="top">23</td>
<td valign="top">0/0</td>
<td valign="top">0/0</td>
<td valign="top"><span style="color: red;">23/0</span></td>
<td valign="top"><span style="color: red;">23/0</span></td>
<td valign="top">23/23</td>
<td valign="top">0/0</td>
</tr>
<tr>
<td valign="top">7</td>
<td valign="top"><img class="alignnone size-full wp-image-263" title="e59bbee78987-16" src="http://chemhack.com/wp-content/uploads/2009/03/e59bbee78987-16.png" alt="e59bbee78987-16" width="134" height="102" /></td>
<td valign="top">9</td>
<td valign="top">0/0</td>
<td valign="top">0/0</td>
<td valign="top">8/83</td>
<td valign="top"><span style="color: red;">8/63</span></td>
<td valign="top">9/237</td>
<td valign="top">5/40</td>
</tr>
</tbody>
</table>
<h2>Conclusions &amp; Questions</h2>
<p>As we can see from the table, the MACCS, EState and Substructure fingerprints perform very badly: they find very few hits, sometimes none at all. They are not designed for this task, so the result is not surprising.</p>
<p>Between the standard and extended CDK fingerprints, the standard one sometimes works better. (Maybe I should use a longer extended fingerprint rather than the default length, as discussed in Guha&#8217;s <a href="http://blog.rguha.net/?p=29">post</a>.) On queries with complex ring systems, the extended CDK fingerprint works better, but the advantage is not obvious.</p>
<p>But I wonder why the hashed fingerprints still miss some results (see query 2 and query 7). Why doesn&#8217;t the superstructure share the same bits as the substructure? Is it because the structure is too simple?</p>
<p>As many incorrect hits are found, CDK fingerprints should never be used alone in substructure search. But consider combining CDK fingerprints with subgraph isomorphism: by doing the fingerprint search first, we avoid running the expensive subgraph isomorphism match on every target, and so get a balance between speed and accuracy.</p>
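<p>The combined approach can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not CDK code: the &#8220;fingerprint&#8221; hashes 2-character fragments of a string into 64 bits, and plain substring matching stands in for subgraph isomorphism (a substring of a SMILES string is not a chemical substructure). The names <code>toy_fingerprint</code>, <code>passes_screen</code> and <code>two_stage_search</code> are hypothetical. What carries over is the shape of the search: a cheap bit test first, and the expensive exact test only on the survivors.</p>

```python
import zlib


def toy_fingerprint(text, n_bits=64):
    """Hash every 2-character fragment of `text` into an n_bits-wide bitset (an int)."""
    bits = 0
    for i in range(len(text) - 1):
        fragment = text[i:i + 2].encode()
        bits |= 1 << (zlib.crc32(fragment) % n_bits)
    return bits


def passes_screen(query_bits, target_bits):
    """A target can only contain the query if it has every bit the query has."""
    return (query_bits & target_bits) == query_bits


def two_stage_search(query, targets, exact_match):
    """Run the cheap fingerprint screen, then the exact check on the survivors."""
    query_bits = toy_fingerprint(query)
    candidates = [t for t in targets if passes_screen(query_bits, toy_fingerprint(t))]
    hits = [t for t in candidates if exact_match(query, t)]
    return candidates, hits


targets = ["CCO", "CCN", "c1ccccc1O", "CCCCO", "OCC"]
# Stand-in exact matcher: substring test instead of subgraph isomorphism.
candidates, hits = two_stage_search("CO", targets, lambda q, t: q in t)
print(len(candidates), "candidates,", len(hits), "confirmed hits")
```

<p>In a real index the target fingerprints would be computed once and stored, so only the query fingerprint is generated at search time.</p>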
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/03/does-the-cdk-fingerprints-work-substructure-search/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Building Search Engine for All RDBMSs: Data Synchronization</title>
		<link>http://chemhack.com/2009/02/building-search-engine-for-all-rdbmss-data-synchronization/</link>
		<comments>http://chemhack.com/2009/02/building-search-engine-for-all-rdbmss-data-synchronization/#comments</comments>
		<pubDate>Fri, 27 Feb 2009 19:18:06 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Synchronize]]></category>
		<category><![CDATA[trigger]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=246</guid>
		<description><![CDATA[As discussed in my last post, Structure Search Engine for All Major RDBMSs, if you want to build a structure search engine for all RDBMSs, the best way is to build it outside the RDBMS. If the structure search index is stored outside the RDBMS, you have to find a way to synchronize the SQL [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://chemhack.com/wp-content/uploads/2009/02/sync_how_to-300x290.jpg" alt="Sync" title="Sync" width="300" height="290" class="alignright size-medium wp-image-251" />As discussed in my last post, <a href="http://chemhack.com/archives/2009/02/240/">Structure Search Engine for All Major RDBMSs</a>, if you want to build a structure search engine for all RDBMSs, the best way is to build it outside the RDBMS. If the structure search index is stored outside the RDBMS, you have to find a way to synchronize the SQL table and the search index. Most modern RDBMSs provide triggers to monitor modifications of an SQL table, so we can easily implement a one-way synchronization from the SQL table to the structure search index.</p>
<p>OK. Let’s see how this works.</p>
<p>NOTE: I chose MySQL as the development platform; the SQL statements may need small changes to work on other RDBMSs. SQL statements for the other platforms will be included in the final release, as my goal is to support all major RDBMSs including MySQL, PostgreSQL, Microsoft SQL Server, Oracle and IBM DB2 (if possible).</p>
<p>Suppose we have a database named “a_chemical_database” and a table named “molecules”.</p>
<pre lang="SQL">mysql> use a_chemical_database;
mysql> SELECT * FROM molecules;
+----+-----------+-----------+-----------+
| id | smiles    | property1 | property2 |
+----+-----------+-----------+-----------+
|  1 | Cc1ccccc1 | 84        | liquid    |
|  2 | CCC       | 36        | gas       |
+----+-----------+-----------+-----------+
2 rows in set (0.00 sec)
</pre>
<p>The “molecules” table holds the structure information (the “smiles” column) along with other properties. Nothing distinguishes it from an ordinary SQL table, i.e. you don’t need to design your SQL tables specially to do structure search.</p>
<p>We want our search engine to know where a modification occurred when someone changes the data. It’s possible to poll the “molecules” table directly at intervals, but that is a very expensive task on a really big table. With triggers, we know exactly which kind of modification (INSERT, UPDATE or DELETE) was performed on which row. Before we can create the triggers, we need to create a table to log the modifications.</p>
<pre lang="SQL">mysql> CREATE TABLE `syncs` (
    ->   `id` int(11) NOT NULL auto_increment,
    ->   `mod_action` varchar(10) default NULL,
    ->   `prim_key` int(11) default NULL,
    ->   PRIMARY KEY  (`id`)
    -> );
</pre>
<p>The “syncs” table stores the type of modification (column “mod_action”) and the affected row (column “prim_key”).</p>
<p>When data in the “molecules” table changes, we expect a new record to be inserted into the “syncs” table, for example:</p>
<pre lang="SQL">mysql> SELECT * FROM syncs ;
+----+------------+----------+
| id | mod_action | prim_key |
+----+------------+----------+
|  1 | INSERT     |        2 |
|  2 | UPDATE     |        2 |
|  3 | DELETE     |        1 |
+----+------------+----------+
</pre>
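<p>Now we create the triggers.</p>
<pre lang="SQL">
-- Log every inserted row.
CREATE TRIGGER molecules_insert AFTER INSERT ON molecules
FOR EACH ROW INSERT INTO syncs(mod_action,prim_key) VALUES('INSERT',NEW.id);
-- Log an update only when the structure itself changed.
-- The body holds several statements, so switch the mysql client delimiter first.
DELIMITER //
CREATE TRIGGER molecules_update AFTER UPDATE ON molecules
FOR EACH ROW
BEGIN
  -- Plain comparison instead of LIKE: "%" and "\" are legal SMILES characters.
  IF OLD.smiles != NEW.smiles THEN
    INSERT INTO syncs(mod_action,prim_key) VALUES('UPDATE',NEW.id);
  END IF;
END//
DELIMITER ;
-- Log every deleted row.
CREATE TRIGGER molecules_delete AFTER DELETE ON molecules
FOR EACH ROW INSERT INTO syncs(mod_action,prim_key) VALUES('DELETE', OLD.id);
</pre>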
<p>And we’re done. Let’s change something in the “molecules” table and see what happens.</p>
<pre lang="SQL">mysql> INSERT INTO molecules(smiles) VALUES("CC=CCN");
mysql> UPDATE molecules SET smiles='CC1CCC1' WHERE id=2;
mysql> DELETE FROM molecules WHERE id=3;

mysql> SELECT * FROM syncs;
+----+------------+----------+
| id | mod_action | prim_key |
+----+------------+----------+
|  6 | DELETE     |        3 |
|  5 | UPDATE     |        2 |
|  4 | INSERT     |        3 |
+----+------------+----------+
3 rows in set (0.00 sec)
</pre>
<p>The triggers successfully recorded the modifications.</p>
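<p>To try the whole loop without a MySQL server, here is a self-contained Python sketch. It deliberately swaps MySQL for SQLite (through the standard <code>sqlite3</code> module), so the trigger syntax differs slightly from the statements above. The <code>apply_changes</code> function and the plain dictionary used as the &#8220;index&#8221; are assumptions for illustration: the post does not yet describe how the search engine consumes the “syncs” table.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecules (id INTEGER PRIMARY KEY, smiles TEXT);
CREATE TABLE syncs (id INTEGER PRIMARY KEY AUTOINCREMENT,
                    mod_action TEXT, prim_key INTEGER);

CREATE TRIGGER molecules_insert AFTER INSERT ON molecules
BEGIN INSERT INTO syncs(mod_action, prim_key) VALUES ('INSERT', NEW.id); END;

-- SQLite form of the update trigger: log only when the SMILES changed.
CREATE TRIGGER molecules_update AFTER UPDATE ON molecules
WHEN OLD.smiles IS NOT NEW.smiles
BEGIN INSERT INTO syncs(mod_action, prim_key) VALUES ('UPDATE', NEW.id); END;

CREATE TRIGGER molecules_delete AFTER DELETE ON molecules
BEGIN INSERT INTO syncs(mod_action, prim_key) VALUES ('DELETE', OLD.id); END;
""")


def apply_changes(conn, index, last_seen=0):
    """Replay unprocessed rows of `syncs` onto `index` (a dict: id -> SMILES).

    Returns the id of the last processed log row, to pass in on the next call.
    """
    rows = conn.execute(
        "SELECT id, mod_action, prim_key FROM syncs WHERE id > ? ORDER BY id",
        (last_seen,),
    ).fetchall()
    for sync_id, action, key in rows:
        if action == "DELETE":
            index.pop(key, None)
        else:  # INSERT or UPDATE: re-read the current structure
            row = conn.execute(
                "SELECT smiles FROM molecules WHERE id = ?", (key,)
            ).fetchone()
            if row is not None:  # the row may have been deleted by a later change
                index[key] = row[0]
        last_seen = sync_id
    return last_seen


index = {}
conn.execute("INSERT INTO molecules(smiles) VALUES ('Cc1ccccc1')")  # becomes id 1
conn.execute("INSERT INTO molecules(smiles) VALUES ('CCC')")        # becomes id 2
conn.execute("UPDATE molecules SET smiles = 'CC1CCC1' WHERE id = 2")
conn.execute("DELETE FROM molecules WHERE id = 1")
last_seen = apply_changes(conn, index)
print(index, last_seen)
```

<p>Replaying the log in <code>id</code> order and remembering <code>last_seen</code> keeps the synchronization one-way and incremental; a real engine would store fingerprints in the index rather than raw SMILES.</p>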
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/02/building-search-engine-for-all-rdbmss-data-synchronization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Structure Search Engine for All Major RDBMSs</title>
		<link>http://chemhack.com/2009/02/structure-search-engine-for-all-major-rdbmss/</link>
		<comments>http://chemhack.com/2009/02/structure-search-engine-for-all-major-rdbmss/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 14:40:18 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=240</guid>
		<description><![CDATA[Many methods of doing substructure search directly in SQL have been reported recently: Adel Golovin and Kim Henrick’s Chemical Substructure in SQL, Rich Apodaca’s fingerprint-based substructure search in MySQL, and Charlie Zhu’s Microsoft SQL Server based substructure search with SMARTS support. Doing this in an RDBMS does have a number of advantages, “including platform independency, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-241" title="FingerPrint" src="http://chemhack.com/wp-content/uploads/2009/02/finger-print-256x300.jpg" alt="FingerPrint" width="230" height="270" /></p>
<p>Many methods of doing substructure search directly in SQL have been reported recently: Adel Golovin and Kim Henrick’s <a href="http://pubs.acs.org/doi/abs/10.1021/ci8003013" target="_blank">Chemical Substructure in SQL</a>, <a href="http://depth-first.com" target="_blank">Rich Apodaca</a>’s <a href="http://depth-first.com/articles/2008/10/02/fast-substructure-search-using-open-source-tools-part-1-fingerprints-and-databases">fingerprint-based substructure search in MySQL</a>, and <a href="http://blog.charliezhu.com/">Charlie Zhu</a>’s <a href="http://code.google.com/p/sqlmol/" target="_blank">Microsoft SQL Server based substructure search with SMARTS support</a>.</p>
<p><span>Doing this inside an RDBMS does have a number of advantages, “including platform independency, simplicity, flexibility, integrity, robustness and single point of failure”, as Adel and Kim describe. But some lightweight RDBMSs such as MySQL and PostgreSQL, the most widely used open source ones, provide very limited SQL programming functions, so a pure SQL solution may be impossible. </span></p>
<p><span>Plugins have been developed to extend the functionality. For MySQL, there’s an open source project called <a href="http://sourceforge.net/projects/mychem">mychem</a>. For PostgreSQL, there’s <a href="http://pgfoundry.org/projects/pgchem">pgchem:tigress</a>, which is also open source. Both of them are based on <a href="http://openbabel.org/">OpenBabel</a>, a C++ chemoinformatics library. </span></p>
<p><span>On the Oracle platform, there are <a href="http://www.cambridgesoft.com/solutions/details/?es=2">CambridgeSoft Oracle Cartridge</a>, <a href="http://www.symyx.com/products/software/cheminformatics/direct/index.jsp">Symyx Direct</a>, <a href="http://www.chemaxon.com/product/jc_cart.html">JChem Cartridge</a> (may be free for academic or non-commercial use), etc. As Oracle is a commercial platform, none of the above is free.</span></p>
<p><span>When I was developing <a href="http://chemsoso.com">chemsoso.com</a>, a Chinese chemical supplier database, structure search was an important problem to solve. The database contains 90,000 different chemicals in total and is still growing, so performance needs to be handled carefully. </span></p>
<p><span>For speed, fingerprints are obviously the best choice. It takes time to generate fingerprints, but in the search stage, bit operations are much cheaper than graph matching. My initial idea was to generate fingerprints in Java and do the bit operations in MySQL. Unfortunately, MySQL restricts bit operations to a maximum width of 64 bits. In Rich’s solution, the fingerprint is split across multiple fields to satisfy this restriction, which makes substructure search possible. But similarity search, where the Tanimoto coefficient needs to be calculated, is still impossible, as the additional bit operation functions it needs are missing in MySQL.</span></p>
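<p>To make the two operations concrete, here is a small Python sketch. It uses Python integers as stand-in fingerprints, and the 1024-bit length is only an example value. <code>split_words</code> shows the idea of cutting a long fingerprint into 64-bit pieces, one per field, and <code>tanimoto</code> shows the calculation a similarity search needs. All function names are illustrative; this is not Rich’s code or my engine’s code.</p>

```python
WORD_BITS = 64


def split_words(fingerprint, n_bits=1024):
    """Split an n_bits fingerprint (a Python int) into 64-bit words, lowest bits first."""
    mask = (1 << WORD_BITS) - 1
    return [(fingerprint >> shift) & mask for shift in range(0, n_bits, WORD_BITS)]


def is_candidate(query_words, target_words):
    """Word-by-word screen: (query AND target) must equal query in every word."""
    return all((q & t) == q for q, t in zip(query_words, target_words))


def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by on-bits set in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


# Two made-up fingerprints: the target has every query bit plus one more.
query = (1 << 3) | (1 << 70)
target = query | (1 << 500)

print(len(split_words(query)))                                  # number of 64-bit fields
print(is_candidate(split_words(query), split_words(target)))    # substructure screen
print(tanimoto(query, target))                                  # similarity score
```

<p>The substructure screen only needs AND and equality on each word, which is why it survives the 64-bit limit, while Tanimoto additionally needs to count on-bits across the whole fingerprint.</p>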
<p><span>In my final solution, an in-memory fingerprint index is created outside MySQL. Molecule structure information (SMILES or mol file) is stored in MySQL, and my search engine synchronizes data between the MySQL table and the in-memory index. Structure searches run directly on the in-memory index, which guarantees the performance. On a MacBook with a 1.83GHz CPU, a substructure search over chemsoso.com’s 90,000 structures takes only about 50ms. A similar-structure search takes about 300ms, and a full-structure search takes less than 10ms. If a search boundary is set according to the similarity requirement, we get another 4X to 100X performance improvement, depending on the complexity of the query molecule structure.</span></p>
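<p>One well-known way to set such a search boundary is a bound on the number of on-bits, and the Python sketch below assumes that technique; it is not necessarily what the engine does. Because the Tanimoto coefficient can never exceed the ratio of the smaller on-bit count to the larger one, a target can only reach a threshold <em>t</em> against a query with <em>n</em> on-bits if its own on-bit count lies between <em>n·t</em> and <em>n/t</em>. The class name <code>FingerprintIndex</code> and its methods are hypothetical.</p>

```python
import math
from collections import defaultdict


def popcount(bits):
    """Number of on-bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def tanimoto(a, b):
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


class FingerprintIndex:
    """In-memory index that buckets fingerprints by their number of on-bits."""

    def __init__(self):
        self.buckets = defaultdict(list)  # on-bit count -> list of (id, fingerprint)

    def add(self, mol_id, fingerprint):
        self.buckets[popcount(fingerprint)].append((mol_id, fingerprint))

    def similar(self, query, threshold):
        """Return (id, score) pairs with Tanimoto >= threshold, best first."""
        n = popcount(query)
        eps = 1e-9  # widen the bounds slightly so float rounding never drops a bucket
        low = math.ceil(n * threshold - eps)
        if threshold > 0:
            high = math.floor(n / threshold + eps)
        else:
            high = max(self.buckets, default=0)
        hits = []
        for count in range(low, high + 1):  # buckets outside the bound are skipped
            for mol_id, fp in self.buckets.get(count, []):
                score = tanimoto(query, fp)
                if score >= threshold:
                    hits.append((mol_id, score))
        return sorted(hits, key=lambda hit: -hit[1])


index = FingerprintIndex()
index.add("a", 0b1111)
index.add("b", 0b0111)
index.add("c", 0b0001)
index.add("d", 0b11111111)
print(index.similar(0b1111, 0.7))
```

<p>The higher the threshold, the narrower the range of buckets that has to be scanned, which would fit the observation that the speed-up varies with the query.</p>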
<p><span>Several days ago, Charlie Zhu talked to me, wondering how chemsoso.com’s structure search engine works. As the engine was mainly built from open source libraries, I decided to make the search engine open source to ease the work of building chemistry-related databases. With the search engine released, developers can focus on their database’s own functionality instead of dealing with structure search.</span></p>
<p style="text-align: center;"><img class="size-medium wp-image-242 aligncenter" title="Engine Structure" src="http://chemhack.com/wp-content/uploads/2009/02/engine-254x300.png" alt="Engine Structure" /></p>
<p><span>Before I can release the search engine, I have to find a way to plug it into users’ systems. Including the source code directly in a user’s project may be the fastest way to add structure search functionality, but the fact that my code is in Java does not mean everyone’s project is in Java. I want the search engine to work not only with all major RDBMSs, but also with all major OSs and all major programming languages. Besides the Java API, a command-line API and an HTTP API will also be provided, to make sure the search engine works across programming languages and in network environments with server clusters. </span></p>
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/02/structure-search-engine-for-all-major-rdbmss/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Share: Open and Closed</title>
		<link>http://chemhack.com/2009/02/share%e5%bc%80%e6%94%be%e5%92%8c%e5%b0%81%e9%97%ad/</link>
		<comments>http://chemhack.com/2009/02/share%e5%bc%80%e6%94%be%e5%92%8c%e5%b0%81%e9%97%ad/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 08:37:38 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=237</guid>
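		<description><![CDATA[Many people say Apple is closed. But look closely: many of Apple's important systems are closely tied to the open source world. Microsoft, by comparison, falls far short of Apple on this point. Closed and open should really be viewed from different angles. Microsoft is very closed about its low-level core technology, but appears fairly open toward third-party partners. Apple's closedness seems to show more in its strict control over the end-user experience. In Jobs's words: if I don't pin down every final detail, I don't know how to guarantee an excellent user experience. That's how the world is; everyone just takes what they need. Thanks to bitcowboy for the insightful remarks]]></description>
			<content:encoded><![CDATA[<blockquote><p>Many people say Apple is closed. But look closely: many of Apple's important systems are closely tied to the open source world. Microsoft, by comparison, falls far short of Apple on this point.</p>
<p>Closed and open should really be viewed from different angles. Microsoft is very closed about its low-level core technology, but appears fairly open toward third-party partners. Apple's closedness seems to show more in its strict control over the end-user experience. In Jobs's words: if I don't pin down every final detail, I don't know how to guarantee an excellent user experience.</p>
<p>That's how the world is; everyone just takes what they need.</p></blockquote>
<p><a href="http://www.elesson.com.cn/forum/space.php?uid=468">Thanks to bitcowboy for the insightful remarks</a></p>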
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/02/share%e5%bc%80%e6%94%be%e5%92%8c%e5%b0%81%e9%97%ad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>jsMolEditor preview 0.2 is available for download now.</title>
		<link>http://chemhack.com/2009/02/jsmoleditor-preview-02-is-available-for-download-now/</link>
		<comments>http://chemhack.com/2009/02/jsmoleditor-preview-02-is-available-for-download-now/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 16:20:03 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=234</guid>
		<description><![CDATA[I&#8217;m very glad to tell you that the world&#8217;s first molecule structure editor in pure JavaScript has now released its first workable preview. jsMolEditor is different from all the normal input plugins. It&#8217;s written in pure JavaScript, so you don&#8217;t need any runtime installed. It simply works on all modern web browsers, including Firefox, Safari, Chrome, Opera [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m very glad to tell you that the world&#8217;s first molecule structure editor in pure JavaScript has now released its first workable preview.</p>
<p>jsMolEditor is different from all the normal input plugins. It&#8217;s written in pure JavaScript, so you don&#8217;t need any runtime installed. It simply works on all modern web browsers, including Firefox, Safari, Chrome, Opera and Internet Explorer (workable, though slow).</p>
<p>The online demo and download are at http://chemhack.com/jsmoleditor/</p>
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/02/jsmoleditor-preview-02-is-available-for-download-now/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>jsMolEditor: Click is OK</title>
		<link>http://chemhack.com/2009/01/jsmoleditorclick-is-ok/</link>
		<comments>http://chemhack.com/2009/01/jsmoleditorclick-is-ok/#comments</comments>
		<pubDate>Mon, 19 Jan 2009 14:57:41 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=231</guid>
		<description><![CDATA[You no longer need to drag to draw a new bond: the latest version of jsMolEditor supports clicking directly on an atom to draw bonds. I&#8217;m also thinking about charge support and a better toolbox UI.]]></description>
			<content:encoded><![CDATA[<p>You no longer need to drag to draw a new bond: the latest version of jsMolEditor supports clicking directly on an atom to draw bonds. I&#8217;m also thinking about charge support and a better toolbox UI.</p>
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/01/jsmoleditorclick-is-ok/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Progress of jsMolEditor</title>
		<link>http://chemhack.com/2009/01/progress-of-jsmoleditor/</link>
		<comments>http://chemhack.com/2009/01/progress-of-jsmoleditor/#comments</comments>
		<pubDate>Thu, 15 Jan 2009 16:32:09 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=229</guid>
		<description><![CDATA[The latest development version of jsMolEditor is here. I implemented zoom in/out, a single/double/triple bond toggle, erasing, etc. So, have a look.]]></description>
			<content:encoded><![CDATA[<p>The latest development version of jsMolEditor is <a href="http://chemhack.com/gwt/com.chemhack.jsMolEditor.Editor/Editor.html" target="_blank">here</a>. I implemented zoom in/out, a single/double/triple bond toggle, erasing, etc.</p>
<p>So, have a look.</p>
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/01/progress-of-jsmoleditor/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>JavaScript for Cheminformatics: JavaScript Molecule Editor and 3D Structure Viewer</title>
		<link>http://chemhack.com/2009/01/javascript-for-cheminformaticsjavascript-molecule-editor-and-3d-structure-viewer/</link>
		<comments>http://chemhack.com/2009/01/javascript-for-cheminformaticsjavascript-molecule-editor-and-3d-structure-viewer/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 09:15:01 +0000</pubDate>
		<dc:creator>Duan Lian</dc:creator>
				<category><![CDATA[Chemoinformatics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[jsMolEditor]]></category>

		<guid isPermaLink="false">http://chemhack.com/?p=217</guid>
		<description><![CDATA[I was very happy to read Rich&#8217;s post. It was a month ago that I released a demo of the molecule structure renderer. Sorry for disappearing from the Internet for so long, but I had to cope with final exams for 9 courses. However, I passed all of them and am now enjoying my winter vacation for 35 [...]]]></description>
			<content:encoded><![CDATA[<p>I was very happy to read <a href="http://depth-first.com/articles/2009/01/06/javascript-for-cheminformatics-cross-compiling-java-to-javascript-with-gwt-revisited">Rich&#8217;s post</a>. It was a month ago that I released a <a href="http://chemhack.com/mx-gwt/demo-molecule-structure-rendering/">demo</a> of the molecule structure renderer. Sorry for disappearing from the Internet for so long, but I had to cope with final exams for 9 courses. However, I passed all of them and am now enjoying my winter vacation for 35 days, which should be enough to make the title of this post come true.</p>
<p>Today, I have something to show you.</p>
<p>Have a look at the renderer and editor <a href="http://chemhack.com/gwt/com.chemhack.jsMolEditor.Editor/Editor.html" target="_blank">demo here</a>.</p>
<p>You can click the first two buttons to load a demo structure at different sizes, and you can also read in a demo molecule. Move your mouse over atoms, or drag in the editor, on atoms and in white space, to see what happens. If you&#8217;re using IE, especially IE6, dragging may not be very smooth, as IE is notorious for its very slow JavaScript engine.</p>
<p>OK, so what happens after you click the buttons? The buttons for loading the editor have an onclick attribute with the JavaScript below.<br />
<code>initEditor('editor1',500,300);</code><br />
And this is the function behind it:<br />
<code>    function initEditor(divID, width, height) {<br />
        if (window.__initEditor) {<br />
            // The GWT module has loaded: clear the placeholder and build the editor.<br />
            document.getElementById(divID).innerHTML = "";<br />
            __initEditor(divID, width, height);<br />
        } else {<br />
            // Not loaded yet: size the div, show a notice, and retry in one second.<br />
            document.getElementById(divID).style.width = width + "px";<br />
            document.getElementById(divID).style.height = height + "px";<br />
            document.getElementById(divID).innerHTML = "Loading...";<br />
            setTimeout(function () { initEditor(divID, width, height); }, 1000);<br />
        }<br />
    }<br />
</code><br />
And this is the GWT Java code:<br />
<code><br />
    private static native void injectJSMethods() /*-{<br />
        $wnd.__readMolFile = function(divID, fileContent) {<br />
            @com.chemhack.jsMolEditor.client.Editor::readMolFile(Ljava/lang/String;Ljava/lang/String;)(divID, fileContent);<br />
        };<br />
        <br />
        $wnd.__initEditor = function(divID, width, height) {<br />
            @com.chemhack.jsMolEditor.client.Editor::initEditor(Ljava/lang/String;II)(divID, width, height);<br />
        };<br />
    }-*/;<br />
</code></p>
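<p>The retry loop in initEditor could be pulled out into a small reusable helper. What follows is only a sketch, not code from jsMolEditor: the name callWhenReady and its schedule parameter are hypothetical, and it assumes the GWT module publishes its hooks as plain functions on a known object such as window.</p>
<pre>
```javascript
// Hypothetical helper (not part of jsMolEditor): call scope[hookName] with
// args as soon as the hook exists, retrying through `schedule` until then.
function callWhenReady(scope, hookName, args, schedule, delayMs) {
  // Fall back to the real timer and a one-second delay when none is given.
  var wait = schedule || setTimeout;
  var delay = delayMs === undefined ? 1000 : delayMs;

  if (typeof scope[hookName] === "function") {
    // The hook has been injected, so forward the call right away.
    return scope[hookName].apply(scope, args);
  }

  // Not injected yet: try again later with the same arguments.
  wait(function () {
    callWhenReady(scope, hookName, args, schedule, delayMs);
  }, delay);
  return undefined;
}
```
</pre>
<p>In a page this might be called as <code>callWhenReady(window, "__initEditor", ["editor1", 500, 300]);</code>. The scheduler is a parameter only so that the helper can be exercised without real timers.</p>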
<p>I think Rich&#8217;s problem is partly solved, since the injectJSMethods code shows how we can cross the boundary between hand-written JavaScript and GWT-generated JavaScript. If you&#8217;d like to expose the whole library to the JavaScript world, just write a wrapper for each Java method you&#8217;d like to call. A GWT code generator may be a good helper.</p>
<p>I call this molecule editor jsMolEditor, and I plan to release its first fully functional Alpha version in two or three weeks. jsMolEditor will be released under the GPL, as its code mainly comes from MX-GWT and JChemPaint.</p>
<p>So how about JMol in JavaScript? This <a href="http://www.redbrick.dcu.ie/~noel/blog/molecproc/">demo</a> shows that it&#8217;s not mission impossible, but we have a long way to go.</p>
<p>I promise to release the first Alpha in two or three weeks, and to keep you informed of how the work is going with at least two posts per week.  <img src='http://chemhack.com/wp-includes/images/smilies/icon_lol.gif' alt=':dsadsad:' class='wp-smiley' /> </p>
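<p>On the hand-written JavaScript side, the per-method wrappers could be gathered behind one object. This is a rough sketch under assumptions: buildFacade is a hypothetical name, and it assumes every exposed Java method has been injected as a function with a double-underscore prefix, as __initEditor and __readMolFile are.</p>
<pre>
```javascript
// Hypothetical helper (not part of jsMolEditor): collect "__"-prefixed hooks
// from `scope` behind a single facade object with unprefixed method names.
function buildFacade(scope, methodNames) {
  var facade = {};
  methodNames.forEach(function (name) {
    facade[name] = function () {
      // Look the hook up at call time, so the facade can be built before
      // the GWT module has finished loading.
      var hook = scope["__" + name];
      if (typeof hook !== "function") {
        throw new Error("GWT hook __" + name + " is not loaded yet");
      }
      return hook.apply(scope, arguments);
    };
  });
  return facade;
}
```
</pre>
<p>With this in place, <code>var editor = buildFacade(window, ["initEditor", "readMolFile"]);</code> would let a page call <code>editor.initEditor("editor1", 500, 300);</code> instead of using the prefixed globals directly.</p>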
]]></content:encoded>
			<wfw:commentRss>http://chemhack.com/2009/01/javascript-for-cheminformaticsjavascript-molecule-editor-and-3d-structure-viewer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

