Oracle 11G - 插入时索引的性能影响
验证是否插入不带 PK/index 的记录加上稍后创建时间比插入带 PK/Index 快.
Verify if it is true that insert records without PK/index plus create thme later is faster than insert with PK/Index.
注意
这里的重点不是索引需要更多时间(很明显),而是总成本(Insert without index + create index)高于(Insert with index).因为我被教导在没有索引的情况下插入并稍后创建索引,因为它应该更快.
Note
The point here is not about indexing takes more time (it is obvious), but the total cost (Insert without index + create index) is higher than (Insert with index). Because I was taught to insert without index and create index later as it should be faster.
DELL Latitude core i7 2.8GHz 8G 内存和 Windows 7 64 位固态硬盘
Oracle 11G R2 64 位
Windows 7 64 bit on DELL Latitude core i7 2.8GHz 8G memory & SSD HDD
Oracle 11G R2 64 bit
我被告知插入没有 PK/Index 的记录并在插入后创建它们会比使用 PK/Index 插入更快.
I was taught that insert records without PK/Index and create them after insert would be faster than insert with PK/Index.
但是,使用 PK/Index 插入 100 万条记录实际上比稍后创建 PK/Index 快,大约 4.5 秒对 6 秒,下面的实验.通过将记录增加到 300 万条(999000 -> 2999000),结果是一样的.
However 1 million record inserts with PK/Index was actually faster than creating PK/Index later, approx 4.5 seconds vs 6 seconds, with the experiments below. By increasing the records to 3 million (999000 -> 2999000), the result was the same.
- 下表是 DDL.一个用于数据和数据的大文件表空间指数.
(测试了一个单独的索引表空间,结果相同,整体性能较差) - 每次运行前刷新缓冲区/线轴.
- 每次运行 3 次实验并确保结果是相似的.
要刷新的 SQL:
ALTER SYSTEM CHECKPOINT;
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;
问题
在没有 PK/Index + PK/Index 创建之后插入"实际上是否比使用 PK/Index 插入"更快?
Question
Would it be actually true that "insert witout PK/Index + PK/Index creation later" is faster than "insert with PK/Index"?
我是否在实验中犯了错误或遗漏了某些条件?
Did I make mistakes or missed some conditions in the experiment?
TRUNCATE TABLE TBL2;
ALTER TABLE TBL2 DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;
SET timing ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
WITH DATA2(j) AS (
SELECT 0 j FROM DUAL
UNION ALL
SELECT j+1000 FROM DATA2 WHERE j < 999000
)
SELECT j FROM DATA2
),
(
WITH DATA1(i) AS (
SELECT 1 i FROM DUAL
UNION ALL
SELECT i+1 FROM DATA1 WHERE i < 1000
)
SELECT i FROM DATA1
);
commit;
1,000,000 rows inserted.
Elapsed: 00:00:04.328 <----- Insert records with PK/Index
插入没有 PK/Index 的记录并在之后创建它们
TRUNCATE TABLE TBL2;
ALTER TABLE &TBL_NAME DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
SET TIMING ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
WITH DATA2(j) AS (
SELECT 0 j FROM DUAL
UNION ALL
SELECT j+1000 FROM DATA2 WHERE j < 999000
)
SELECT j FROM DATA2
),
(
WITH DATA1(i) AS (
SELECT 1 i FROM DUAL
UNION ALL
SELECT i+1 FROM DATA1 WHERE i < 1000
)
SELECT i FROM DATA1
);
commit;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;
1,000,000 rows inserted.
Elapsed: 00:00:03.454 <---- Insert without PK/Index
table TBL2 altered.
Elapsed: 00:00:02.544 <---- Create PK/Index
表 DDL
CREATE TABLE TBL2 (
"COL1" NUMBER,
"COL2" VARCHAR2(100 BYTE),
CONSTRAINT "PK_TBL2_COL1" PRIMARY KEY ("COL1")
) TABLESPACE "TBS_BIG" ;
推荐答案
确实,如果您不必同时修改一个或多个索引并且可能还执行约束检查,那么修改表会更快,但它是如果您必须添加这些索引,也很大程度上无关紧要.您必须考虑对您希望实现的系统进行全面更改,而不仅仅是其中的一个部分.
It's true that it is faster to modify a table if you do not also have to modify one or more indexes and possibly perform constraint checking as well, but it is also largely irrelevant if you then have to add those indexes. You have to consider the complete change to the system that you wish to effect, not just a single part of it.
显然,如果您要向已经包含数百万行的表中添加一行,那么删除和重建索引将是愚蠢的.
Obviously if you are adding a single row into a table that already contains millions of rows then it would be foolish to drop and rebuild indexes.
但是,即使您有一个完全空的表,您要在其中添加几百万行,将索引推迟到之后仍然会比较慢.
However, even if you have a completely empty table into which you are going to add several million rows it can still be slower to defer the indexing until afterwards.
这样做的原因是这样的插入最好使用直接路径机制执行,当您使用直接路径插入到带有索引的表中时,会构建临时段,其中包含构建索引所需的数据(数据加上rowids).如果这些临时段比您刚刚加载的表小得多,那么它们的扫描和构建索引也会更快.
The reason for this is that such an insert is best performed with the direct path mechanism, and when you use direct path inserts into a table with indexes on it, temporary segments are built that contain the data required to build the indexes (data plus rowids). If those temporary segments are much smaller than the table you have just loaded then they will also be faster to scan and to build the indexes from.
如果表上有五个索引,则另一种方法是在加载后进行五次全表扫描以构建索引.
the alternative, if you have five index on the table, is to incur five full table scans after you have loaded it in order to build the indexes.
显然这里涉及到巨大的灰色区域,但做得很好:
Obviously there are huge grey areas involved here, but well done for:
- 质疑权威和一般经验法则,以及
- 运行实际测试以确定您自己案例中的事实.
其他注意事项 - 您在删除索引时运行备份.现在,在紧急恢复之后,您必须有一个脚本来验证所有索引是否都已到位,而此时您需要竭尽全力恢复系统.
Further considerations -- you run a backup while the indexes are dropped. Now, following an emergency restore, you have to have a script that verifies that all indexes are in place, when you have the business breathing down your neck to get the system back up.
此外,如果您确定在批量加载期间不维护索引,请不要删除索引 - 而是禁用它们.这保留了索引存在和定义的元数据,并允许更简单的重建过程.请注意不要通过截断表意外地重新启用索引,因为这会使禁用的索引再次启用.
Also, if you absolutely were determined to not maintain indexes during a bulk load, do not drop the indexes -- disable them instead. This preserves the metadata for the indexes existence and definition, and allows a more simple rebuild process. Just be careful that you do not accidentally re-enable indexes by truncating the table, as this will render disabled indexes enabled again.
相关文章