昨天在测试Spark over MySQL/OneProxy时,发现网上有人使用Spark来加速(通过并行)MySQL上单个大表的查询速度,马上想到OneProxy里也有并行功能,是不是也可以用来加速MySQL上单个大表的查询速度呢?毕竞Spark的任务下发过程中时延还是比较大的,可能OneProxy更适合这个场景。假设在MySQL数据库中有一张比较大的表,如下所示:
mysql> select count(*) from my_hash2; +----------+ | count(*) | +----------+ | 8698317 | +----------+ 1 row in set (2.17 sec)
可以通过视图来将数据分成多个分片,如下所示:
create view my_hash2_p0 as select * from my_hash2 where tid < 1000000; create view my_hash2_p1 as select * from my_hash2 where tid >= 1000000 and tid < 2000000; ...... create view my_hash2_p8 as select * from my_hash2 where tid >= 8000000 and tid < 9000000;
接下来只需要在OneProxy里配置一个分区表,就可以使用OneProxy的并行查询(企业版特有功能)来加速了。分区表的配置如下所示:
[ { "table" : "my_hash2", "pkey" : "tid", "type" : "int", "method" : "range", "partitions": [ { "suffix" : "_p0", "group": "shixun1", "value" : 1000000 }, { "suffix" : "_p1", "group": "shixun1", "value" : 2000000 }, { "suffix" : "_p2", "group": "shixun1", "value" : 3000000 }, { "suffix" : "_p3", "group": "shixun1", "value" : 4000000 }, { "suffix" : "_p4", "group": "shixun1", "value" : 5000000 }, { "suffix" : "_p5", "group": "shixun1", "value" : 6000000 }, { "suffix" : "_p6", "group": "shixun1", "value" : 7000000 }, { "suffix" : "_p7", "group": "shixun1", "value" : 8000000 }, { "suffix" : "_p8", "group": "shixun1", "value" : 9000000 } ] } ]
接下来通过OneProxy来查询表的记录数,看看需要多长时间,如下所示:
mysql> select /* parallel */ count(*) from my_hash2; +----------+ | count(*) | +----------+ | 8698317 | +----------+ 1 row in set (0.49 sec)
可以看到,查询速度提升了将近4倍多,MySQL 5.7对于视图合并的SQL优化好象进步比较大,不再是将视图先生成临时表来处理了,那时这个并行加速的方法将相当实用,Good Luck!