MySQL排序后分页错乱问题分析

从问题出发

一位叫军磊的开发,每次说到order by的时候,都要提到一个事,那就是要带上一个有唯一键的列,否则会出现问题。具体是啥问题,他也遗忘了,我来帮他回忆一下😁

1.建个表

1
2
3
4
5
6
7
CREATE TABLE `student` (
`id` varchar(50) NOT NULL COMMENT '学生ID',
`name` varchar(20) DEFAULT NULL COMMENT '学生姓名',
`create_time` datetime DEFAULT NULL COMMENT '创建时间',
`modify_time` datetime DEFAULT NULL COMMENT '修改时间',
UNIQUE KEY `student_id_IDX` (`id`) USING BTREE
) COMMENT='学生表';

2.插入一些数据

1
2
3
4
5
6
7
8
9
10
11
insert into
student
values
("STU000001","001","2012-12-01 12:12:12","2012-12-01 12:12:12"),
("STU000002","002和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12"),
("STU000003","003和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12"),
("STU000004","004和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12"),
("STU000005","005和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12"),
("STU000006","006","2012-12-06 12:12:12","2012-12-02 12:12:12"),
("STU000007","007和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12"),
("STU000008","008和002时间一致","2012-12-02 12:12:12","2012-12-02 12:12:12");

3.执行查询语句得到结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select * from student order by create_time limit 0,2; 
预期:
STU000001
STU000002
结果:
STU000001
STU000002
select * from student order by create_time limit 2,2;
预期:
STU000003
STU000004
结果:
STU000003
STU000004
select * from student order by create_time limit 4,2;
预期:
STU000005
STU000007
结果: 不符合预期
STU000005
STU000008
select * from student order by create_time limit 6,2;
预期:
STU000008
STU000006
结果:
STU000008
STU000006

​ 可以看到在执行limit 4,2这条语句时,查询的结果不符合预期

产生问题的原因

《MySQL5.7官方文档-8.2.1.17 LIMIT Query Optimization》:If an index is not used for ORDER BY but a LIMIT clause is also present, the optimizer may be able to avoid using a merge file and sort the rows in memory using an in-memory filesort operation.

order by对一个非索引列使用时,还使用了limit语句,优化器会使用内存对行排序

《Quota-Sachin Khanna》:MySQL has multiple filesort algorithms for sorting and retrieving results. The original algorithm uses only the ORDER BY columns. The modified algorithm uses not just the ORDER BY columns, but all columns referenced by the query. There is also an algorithm for small result sets that sorts in memory using the sort buffer as a priority queue without a merge file.

filesortpriority queue主要使用了堆排序,而堆排序属于不稳定的排序算法,造成了结果的不确定性

如何解决问题

解决order by的二义性,增加一个唯一字段,如下:

select * from student order by create_time,id limit 4,2;

附录

MySQL5.7字符编码修改

Windows的MYSQL5.7环境

修改字符编码:

在目录C:\ProgramData\MySQL\MySQL Server 5.7\my.ini最后增加:

1
2
3
4
5
6
7
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

重启本地MYSQL(Windows10可以通过任务管理器,最右边服务下面有MYSQL57可以鼠标控制重启)

查询字符环境(SHOW VARIABLES LIKE 'character%';):

1
2
3
4
5
6
7
8
character_set_client	utf8mb4
character_set_connection utf8mb4
character_set_database latin1
character_set_filesystem binary
character_set_results
character_set_server utf8
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 5.7\share\charsets\