msgbartop
Hacking the chemistry world.
msgbarbottom

28 Feb 09 Building Search Engine for All RDBMSs: Data Synchronization

SyncIt has been discussed in my last post Structure Search Engine for All Major RDBMSs, if you want to build a structure search engine for all RDBMSs, the best way to do it is do it outside RDBMSs. If the structure search index is stored outside RDBSs, you’ve to find a way to synchronize SQL table and the search index. Most modern RDBMSs provide TRIGGER to monitor modification of SQL table, so we can easily implement a one-way synchronization from SQL table to the structure search index.

OK. Let’s go and see how does this works.

NOTE: I choose MySQL as the development platform, SQL statements may need a little change to work on other RDBMSs. But I’m sure SQL statements for other platform will be included in the final release, as my goal is support all major RDBMSs including MySQL, PostgreSQL, Microsoft SQL Server, Oracle and IBM DB2(if possible).

If we had a database named “a_chemical_database” and a table named “molecules”.

mysql> use a_chemical_database;
mysql> SELECT * FROM molecules;
+----+-----------+-----------+-----------+
| id | smiles    | property1 | property2 |
+----+-----------+-----------+-----------+
|  1 | Cc1ccccc1 | 84        | liquid    |
|  2 | CCC       | 36        | gas       |
+----+-----------+-----------+-----------+
2 rows in set (0.00 sec)

In the table “molecules” structure information, the “smiles” column, and other properties exist. There’s nothing different between this table and common SQL tables, i.e. you don’t need to specially design your SQL tables to do structure search.

We want our search engine to know where the modification occurs if someone changed the data. It’s possible to monitor the “molecules” data directly intermittently, but this will be a very consuming task if you have a really big table. With triggers, we can know which kind of modification(INSERT, UPDATE or DELETE) is performed on which row exactly. Before we can create triggers, a table to log the modifications needs to be created.

mysql> CREATE TABLE `syncs` (
    ->   `id` int(11) NOT NULL auto_increment,
    ->   `mod_action` varchar(10) default NULL,
    ->   `prim_key` int(11) default NULL,
    ->   PRIMARY KEY  (`id`)
    -> );

In the table “syncs”, we can store which the type of modification(column “mod_action”) and the row (column “prim_key”).

If data in the “molecules” table is changed, we expect a new record inserted into the “syncs” table, for example,

mysql> SELECT * FROM syncs ;
+----+------------+----------+
| id | mod_action | prim_key |
+----+------------+----------+
|  1 | INSERT     |        2 |
|  2 | UPDATE     |        2 |
|  3 | DELETE     |        1 |
+----+------------+----------+

Now we create triggers.

CREATE TRIGGER molecules_insert AFTER INSERT ON molecules
FOR EACH ROW INSERT INTO syncs(mod_action,prim_key) VALUES('INSERT',NEW.id);
CREATE TRIGGER molecules_update AFTER UPDATE ON molecules
FOR EACH ROW
BEGIN
IF  NOT(OLD.smiles LIKE NEW.smiles)  THEN
INSERT INTO syncs(mod_action,prim_key) VALUES('UPDATE',NEW.id));
END IF;
END;
CREATE TRIGGER molecules_delete AFTER DELETE ON molecules
FOR EACH ROW INSERT INTO syncs(mod_action,prim_key) VALUES('DELETE', OLD.id);

Here we’ve done. Let’s do something on the “molecules” table and see what happens.

mysql> INSERT INTO molecules(smiles) VALUES("CC=CCN");
mysql> UPDATE molecules SET smiles='CC1CCC1' WHERE id=2;
mysql> DELETE FROM molecules WHERE id=3;

mysql> SELECT * FROM syncs;
+----+------------+----------+
| id | mod_action | prim_key |
+----+------------+----------+
|  6 | DELETE     |        3 |
|  5 | UPDATE     |        2 |
|  4 | INSERT     |        3 |
+----+------------+----------+
3 rows in set (0.00 sec)

The trigger successfully monitored the modifications.

Tags: , , , ,

Leave a Comment

buy arimidex australia kamagra uk cheapest buy accutane online viagra professional 100 mg order mifepristone and misoprostol online buy cheap accutane buy nolvadex and clomid