MySQL :: A Chinese Fulltext Search Plugin based on Friso Tokenizer

MySQL 5.7 supports external fulltext parser plugins for tables in the InnoDB storage engine, so you can build a real-time search solution in a few minutes. In a test about two years ago (when the plugin was written for MySQL 5.6 and 5.7.3), MySQL delivered about 2000 QPS for fulltext search queries on a table of 1 million documents. We evaluated several Chinese tokenizer libraries and finally chose the friso project for its quality and performance. Now that MySQL 5.7 is GA, I have updated the fulltext plugin with the latest friso code.

You can download the plugin and extract it to the MySQL home directory, then copy the “ft_friso.so” file to the plugin directory. After that, run the following statement as root in MySQL.

mysql> INSTALL PLUGIN friso SONAME 'ft_friso.so';
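
If the statement cannot find the library, or you want to confirm that the parser was registered, the standard statements below can help. This is only a sketch: the first shows which directory the server loads plugins from, and the second should list the plugin named friso as ACTIVE with type FTPARSER once the installation has succeeded.

mysql> show variables like 'plugin_dir';

mysql> select plugin_name, plugin_status, plugin_type
    ->   from information_schema.plugins
    ->  where plugin_name = 'friso';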

Make sure that the following two parameters are set to “1”. With the default values (4 for ft_min_word_len, 3 for innodb_ft_min_token_size), words shorter than that many characters are not indexed, which is not suitable for a Chinese tokenizer.

mysql> show variables like '%min_word_len';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| ft_min_word_len | 1     |
+-----------------+-------+
1 row in set (0.00 sec)

mysql> show variables like '%min_token_size';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_ft_min_token_size | 1     |
+--------------------------+-------+
1 row in set (0.00 sec)
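
Neither variable can be changed at runtime, so both have to be set at server startup. A minimal sketch of the option-file fragment, assuming the server reads a my.cnf with a [mysqld] section (the file location depends on your installation):

[mysqld]
ft_min_word_len = 1
innodb_ft_min_token_size = 1

Restart the server after editing the file. Any fulltext index created before the change has to be rebuilt to pick up the new value.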

Now we can create a table for testing. The character set of the test table must be either “UTF8” or “GBK”.

mysql> create table t_fulltext (
    ->   id int not null primary key,
    ->   doc varchar(100)) charset utf8;
Query OK, 0 rows affected (0.33 sec)

mysql> create fulltext index idx_t_fulltext
    ->    on t_fulltext(doc)
    ->    with parser friso;
Query OK, 0 rows affected, 1 warning (1.19 sec)
Records: 0  Duplicates: 0  Warnings: 1

Then insert a few rows.

mysql> insert into t_fulltext values (1, '中文Search');
Query OK, 1 row affected (0.46 sec)
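
To see which tokens the parser produced for the newly inserted row, one option is to look at the InnoDB fulltext index cache. This is a sketch that assumes the table lives in a schema named test (the schema name is an assumption; substitute your own) and that your account may set global variables:

mysql> set global innodb_ft_aux_table = 'test/t_fulltext';

mysql> select word, doc_id, position
    ->   from information_schema.innodb_ft_index_cache;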

Now you can run fulltext queries with the “MATCH (…) AGAINST (…)” syntax; see the MySQL documentation for more information.

mysql> select * from t_fulltext
    ->   where match(doc) against ('search');
+----+------------+
| id | doc        |
+----+------------+
|  1 | 中文Search |
+----+------------+
1 row in set (0.00 sec)

mysql> select * from t_fulltext
    ->   where match(doc) against ('中文');
+----+------------+
| id | doc        |
+----+------------+
|  1 | 中文Search |
+----+------------+
1 row in set (0.00 sec)
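
MATCH (…) AGAINST (…) can also be used in the select list to return a relevance score, which is useful for ranking once the table holds more rows. A sketch using standard MySQL syntax against the test table above (the alias score is just an illustrative name):

mysql> select id, doc,
    ->        match(doc) against ('中文' in natural language mode) as score
    ->   from t_fulltext
    ->  where match(doc) against ('中文' in natural language mode)
    ->  order by score desc;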

If you have lots of documents to search in MySQL, you can shard the tables across multiple MySQL databases with OneProxy, or build multiple slaves for load balancing with OneProxy.