大家好,我是只谈技术不剪发的 Tony 老师。
Microsoft SQL Server 过滤索引(筛选索引)是指基于满足特定条件的数据行进行索引。与全表索引(默认创建)相比,设计良好的筛选索引可以提高查询性能、减少索引维护开销并可降低索引存储开销。本文就给大家介绍一下 Microsoft SQL Server 中的过滤索引功能。
在创建过滤索引之前,我们需要了解它的适用场景。
我们在创建索引时可以通过一个 WHERE 子句指定需要索引的数据行,从而创建一个过滤索引。例如,对于以下订单表 orders:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER,
status VARCHAR(10)
);
BEGIN
DECLARE @counter INT = 1
WHILE @counter <= 1000000
BEGIN
INSERT INTO orders
SELECT @counter, (rand() * 100000),
CASE
WHEN (rand() * 100)<1 THEN 'pending'
WHEN (rand() * 100)>99 THEN 'shipped'
ELSE 'completed'
END
SET @counter = @counter + 1
END
END;
CREATE INDEX full_idx ON orders (customer_id, status);
然后我们查看以下查询语句的执行计划:
SET STATISTICS PROFILE ON
SELECT *
FROM orders
WHERE customer_id = 5043
AND status != 'completed';
id |customer_id|status |
------+-----------+-------+
743436| 5043|pending|
947848| 5043|shipped|
RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions
21SELECT * FROM [orders] WHERE [customer_id]=@1 AND [status]<>@2110NULLNULLNULLNULL1.405213NULLNULLNULL0.003283546NULLNULLSELECT0NULL
21 |--Index Seek(OBJECT:([hrdb].[dbo].[orders].[full_idx]), SEEK:([hrdb].[dbo].[orders].[customer_id]=(5043) AND [hrdb].[dbo].[orders].[status] < 'completed' OR [hrdb].[dbo].[orders].[customer_id]=(5043) AND [hrdb].[dbo].[orders].[status] > 'completed') ORDERED FORWARD)121Index SeekIndex SeekOBJECT:([hrdb].[dbo].[orders].[full_idx]), SEEK:([hrdb].[dbo].[orders].[customer_id]=(5043) AND [hrdb].[dbo].[orders].[status] < 'completed' OR [hrdb].[dbo].[orders].[customer_id]=(5043) AND [hrdb].[dbo].[orders].[status] > 'completed') ORDERED FORWARD[hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status]1.4052130.0031250.0001585457270.003283546[hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status]NULLPLAN_ROW01
SELECT ix.name AS "Index name",
SUM(sz.used_page_count) * 8/1024.0 AS "Index size (MB)"
FROM sys.dm_db_partition_stats AS sz
INNER JOIN sys.indexes AS ix ON sz.object_id = ix.object_id
AND sz.index_id = ix.index_id
INNER JOIN sys.tables tn ON tn.OBJECT_ID = ix.object_id
WHERE tn.name = 'orders'
GROUP BY ix.name;
Index name |Index size (MB)|
----------------------------+---------------+
full_idx | 26.171875|
PK__orders__3213E83F1E3B8A3B| 29.062500|
接下来我们再创建一个部分索引,只包含未完成的订单数据,从而减少索引的数据量:
CREATE INDEX partial_idx ON orders (customer_id)
WHERE status != 'completed';
索引 partial_idx 中只有 customer_id 字段,不需要 status 字段。同样可以查看一下索引 partial_idx 占用的空间大小:
SELECT ix.name AS "Index name",
SUM(sz.used_page_count) * 8/1024.0 AS "Index size (MB)"
FROM sys.dm_db_partition_stats AS sz
INNER JOIN sys.indexes AS ix ON sz.object_id = ix.object_id
AND sz.index_id = ix.index_id
INNER JOIN sys.tables tn ON tn.OBJECT_ID = ix.object_id
WHERE tn.name = 'orders'
GROUP BY ix.name;
Index name |Index size (MB)|
----------------------------+---------------+
full_idx | 26.171875|
partial_idx | 0.289062|
PK__orders__3213E83F1E3B8A3B| 29.062500|
SELECT *
FROM orders WITH ( INDEX ( partial_idx ) )
WHERE customer_id = 5043
AND status != 'completed';
RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions
21SELECT * FROM orders WITH ( INDEX ( partial_idx ) ) WHERE customer_id = 5043 AND status != 'completed'110NULLNULLNULLNULL1.124088NULLNULLNULL0.03279812NULLNULLSELECT0NULL
21 |--Nested Loops(Inner Join, OUTER REFERENCES:([hrdb].[dbo].[orders].[id]))121Nested LoopsInner JoinOUTER REFERENCES:([hrdb].[dbo].[orders].[id])NULL1.12408804.15295E-05240.03279812[hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status]NULLPLAN_ROW01
21 |--Index Seek(OBJECT:([hrdb].[dbo].[orders].[partial_idx]), SEEK:([hrdb].[dbo].[orders].[customer_id]=(5043)) ORDERED FORWARD)132Index SeekIndex SeekOBJECT:([hrdb].[dbo].[orders].[partial_idx]), SEEK:([hrdb].[dbo].[orders].[customer_id]=(5043)) ORDERED FORWARD, FORCEDINDEX[hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id]9.9352870.0031250.0001679288150.003292929[hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id]NULLPLAN_ROW01
22 |--Clustered Index Seek(OBJECT:([hrdb].[dbo].[orders].[PK__orders__3213E83F1E3B8A3B]), SEEK:([hrdb].[dbo].[orders].[id]=[hrdb].[dbo].[orders].[id]) LOOKUP ORDERED FORWARD)152Clustered Index SeekClustered Index SeekOBJECT:([hrdb].[dbo].[orders].[PK__orders__3213E83F1E3B8A3B]), SEEK:([hrdb].[dbo].[orders].[id]=[hrdb].[dbo].[orders].[id]) LOOKUP ORDERED FORWARD, FORCEDINDEX[hrdb].[dbo].[orders].[status]10.0031250.0001581160.02946366[hrdb].[dbo].[orders].[status]NULLPLAN_ROW09.935287
我们比较通过 full_idx 和 partial_idx 执行以下查询的时间:
-- 300 ms
SELECT count(*)
FROM orders WITH ( INDEX ( full_idx ) )
WHERE status != 'completed';
-- 10 ms
SELECT count(*)
FROM orders WITH ( INDEX ( partial_idx ) )
WHERE status != 'completed';
DROP INDEX partial_idx ON orders;
TRUNCATE TABLE orders;
CREATE UNIQUE INDEX partial_idx ON orders (customer_id)
WHERE status != 'completed';
INSERT INTO orders(id, customer_id, status) VALUES (1, 1, 'pending');
INSERT INTO orders(id, customer_id, status) VALUES (2, 1, 'pending');
SQL 错误 [2601] [23000]: 不能在具有唯一索引“partial_idx”的对象“dbo.orders”中插入重复键的行。重复键值为 (1)。
用户必须完成一个订单之后才能继续生成新的订单。
通过以上介绍可以看出,过滤索引是一种经过优化的非聚集索引,尤其适用于从特定数据子集中选择数据的查询。
到此这篇关于利用 SQL Server 过滤索引提高查询语句的性能分析的文章就介绍到这了,更多相关SQL Server索引提高语句性能内容请搜索脚本之家以前的文章或继续浏览下面的相关文章希望大家以后多多支持脚本之家!