Descending indexing and loose index scan

Comments on my previous posts, especially this one by Gokhan, inspired me to write a bit about descending indexes and about loose index scan, or what Gokhan calls “better range” support. Neither of these is actually specific to InnoDB tables; these are features MySQL should get for all storage engines at some point.

Descending indexes - This is something MySQL does not have at this point, but it has gone missing for so long largely because it is less of a problem than many people think. First, an ascending index can still be scanned in reverse order, and it will be when that helps. This is how MySQL optimizes indexed ORDER BY col DESC queries, for example. A reverse scan can be as fast as a forward scan; this is, however, where storage engines and operating systems come into play. For example, certain operating systems might not do backward read-ahead, which may slow it down a bit. And some storage engines, such as MyISAM (for packed indexes), may have reverse scans that are much slower than forward scans.
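A minimal sketch of the reverse-scan case, assuming a made-up table t_order (the table name, the id column and the index name are invented for illustration; only the ORDER BY col DESC query comes from the text):

```sql
-- Hypothetical table; only "ORDER BY col DESC" is taken from the text.
CREATE TABLE t_order (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  col INT NOT NULL,
  PRIMARY KEY (id),
  KEY col_idx (col)   -- ordinary ascending index
);

-- Can be resolved by reading col_idx forward.
SELECT * FROM t_order ORDER BY col ASC LIMIT 10;

-- Can be resolved by reading the same index backward.
SELECT * FROM t_order ORDER BY col DESC LIMIT 10;

-- If the index is used for sorting, the Extra column should not
-- contain "Using filesort".
EXPLAIN SELECT * FROM t_order ORDER BY col DESC LIMIT 10;
```

Whether the optimizer actually picks the index depends on the data and the server version, so it is worth checking the EXPLAIN output on a table with realistic contents.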

So when do you really need descending indexes? The most typical case is when you want to order by two columns in different directions: … ORDER BY price ASC, date DESC LIMIT 10. If you have an index on (price,date) in ascending order you will not be able to optimize this query well; an external sort (“filesort”) will be needed. If you were able to build an index on price ASC, date DESC, the same query could retrieve data in already sorted order.
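To make the mixed-direction case concrete, here is a rough sketch. The items table, its id column and the index name are assumptions made for illustration; only the price and date columns and the (price,date) index come from the text:

```sql
-- Hypothetical table; only price, date and the (price, date) index
-- are taken from the text.
CREATE TABLE items (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  price DECIMAL(10,2) NOT NULL,
  date  DATETIME NOT NULL,
  PRIMARY KEY (id),
  KEY price_date (price, date)   -- both key parts ascending
);

-- Same direction on both columns: index order matches the requested
-- order, reading the index forward ...
SELECT * FROM items ORDER BY price ASC, date ASC LIMIT 10;

-- ... or backward.
SELECT * FROM items ORDER BY price DESC, date DESC LIMIT 10;

-- Mixed directions: neither scan direction of price_date yields this
-- order, so expect "Using filesort" in the Extra column.
EXPLAIN SELECT * FROM items ORDER BY price ASC, date DESC LIMIT 10;
```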

This is, however, something you can work around by having something like a “reverse_date” column and using it for the sort. With MySQL 5.0 you can even use triggers to update it whenever the real date changes, so it becomes less ugly. In fact this is, for example, why you would see a “reverse_timestamp” field in the Wikipedia table structure.
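One possible shape of this workaround is sketched below. Everything except the idea of a reverse_date column kept up to date by triggers is an assumption: the items_rev table, the trigger names, and the choice to store 4294967295 minus the Unix timestamp are made up for illustration, and this is not necessarily how Wikipedia computes its field.

```sql
-- Hypothetical table; reverse_date grows as date shrinks.
CREATE TABLE items_rev (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
  price        DECIMAL(10,2) NOT NULL,
  date         DATETIME NOT NULL,
  reverse_date INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  KEY price_rdate (price, reverse_date)
);

-- Keep reverse_date in sync when rows are inserted ...
CREATE TRIGGER items_rev_before_insert BEFORE INSERT ON items_rev
FOR EACH ROW SET NEW.reverse_date = 4294967295 - UNIX_TIMESTAMP(NEW.date);

-- ... and when the real date is updated.
CREATE TRIGGER items_rev_before_update BEFORE UPDATE ON items_rev
FOR EACH ROW SET NEW.reverse_date = 4294967295 - UNIX_TIMESTAMP(NEW.date);

-- Ascending reverse_date is descending date, so both key parts are
-- now read in the same direction and price_rdate can supply the order.
SELECT * FROM items_rev ORDER BY price ASC, reverse_date ASC LIMIT 10;
```

The cost is an extra column and index to maintain, which is the "ugly" part the triggers only hide.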

Loose index scan - A number of years ago, when I had just started using MySQL, I thought it would have every optimization that could come to my mind. For example, if I had an (A>0 and B>6) clause and an index on (A,B), I expected it to look at all values where A>0 and, for each one, jump straight to the entries with B>6 by using the index. It is possible. So I was shocked and upset to find out it did not. And this optimization is still not implemented. This is a very important item to remember when you are designing new applications or porting ones from other databases. When designing indexes for MySQL, make sure queries use “=” for all key parts of the index except the last one they use. Only that last key part is allowed to have “range” clauses, such as >, IN etc. Clauses on key parts that follow the range key part will not use the index.
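The difference can be sketched with two queries against the same index. The t_ab table, the payload column and the index name are assumptions for illustration; the (A,B) index and the A>0 and B>6 clause are from the text:

```sql
-- Hypothetical table with the (A, B) index from the text.
CREATE TABLE t_ab (
  A INT NOT NULL,
  B INT NOT NULL,
  payload VARCHAR(32) NOT NULL DEFAULT '',
  KEY ab (A, B)
);

-- Range on the first key part: the index locates A > 0, but per the
-- text every entry with A > 0 is then read and B > 6 is checked
-- afterwards, without jumping inside each A value.
SELECT * FROM t_ab WHERE A > 0 AND B > 6;

-- Equality on A, range on B: both key parts narrow the index range.
SELECT * FROM t_ab WHERE A = 5 AND B > 6;

-- On a table with realistic data, compare the rows estimates.
EXPLAIN SELECT * FROM t_ab WHERE A > 0 AND B > 6;
EXPLAIN SELECT * FROM t_ab WHERE A = 5 AND B > 6;
```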

Let me give one more example: KEY (A,B,C) and A=5 and B>6 and C=7. In this case the index will be used to check the A=5 and B>6 clauses. C=7 will not use the index (rows with all C values will be retrieved from the index), and if this is not an index-covered query you might rather shorten your key to KEY(A,B) to keep the index smaller.
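As a sketch of that trade-off, assuming a made-up t_abc table with an extra payload column that is not part of the index (table, column and index names are invented; the key and the WHERE clause are from the text):

```sql
-- Hypothetical table with the KEY (A, B, C) from the text.
CREATE TABLE t_abc (
  A INT NOT NULL,
  B INT NOT NULL,
  C INT NOT NULL,
  payload VARCHAR(32) NOT NULL DEFAULT '',
  KEY abc (A, B, C)
);

-- Covered query: every referenced column is in the index, so keeping
-- C in the key lets the query be answered from the index alone
-- ("Using index" in the Extra column).
EXPLAIN SELECT A, B, C FROM t_abc WHERE A = 5 AND B > 6 AND C = 7;

-- Not covered: payload has to be read from the row anyway, and per
-- the text only A = 5 AND B > 6 limit the index range.
EXPLAIN SELECT payload FROM t_abc WHERE A = 5 AND B > 6 AND C = 7;

-- If only queries of the second kind exist, a shorter key may do.
ALTER TABLE t_abc DROP INDEX abc, ADD INDEX ab (A, B);
```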

The good news is that a loose index scan implementation is finally on the way. In fact it was even implemented in MySQL 5.0, but only for a very narrow case of aggregate queries.
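One query shape that I understand to fall into that narrow case is MIN() or MAX() grouped by the leading key part. The t_agg table and index name below are assumptions for illustration:

```sql
-- Hypothetical table with an (A, B) index.
CREATE TABLE t_agg (
  A INT NOT NULL,
  B INT NOT NULL,
  KEY ab (A, B)
);

-- For each distinct A, only the first index entry of the group is
-- needed to answer MIN(B), so the scan can skip from group to group.
SELECT A, MIN(B) FROM t_agg GROUP BY A;

-- When the loose scan is chosen, the Extra column shows
-- "Using index for group-by".
EXPLAIN SELECT A, MIN(B) FROM t_agg GROUP BY A;
```

Whether the optimizer picks this plan depends on the data distribution, so treat the EXPLAIN output as the authority for a given table.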

In general, a complete loose index scan implementation is one of my most wanted features for the MySQL optimizer.
P.S. If you post queries in your comments, please also explain which indexes you have on the table. SHOW CREATE TABLE output is the best. Otherwise I may misunderstand you.
