SUBSTR 在 CLOB 上的表现

2021-12-24 00:00:00 sql oracle11g oracle plsql clob

我有一个 PL/SQL 过程,它在 VARCHAR2 参数上执行了很多 SUBSTR 操作.我想去掉长度限制,所以我试着把它改成CLOB.

I have a PL/SQL procedure that does a lot of SUBSTRs on a VARCHAR2 parameter. I would like to remove the length limit, so I tried to change it to CLOB.

工作正常,但性能受到影响,所以我做了一些测试(基于 这些2005 年的测试).

Works fine, but performance suffers, so I did some tests (based on these tests from 2005).

更新:我可以在具有不同 Oracle 版本和不同硬件的多个不同实例上重现这一点,dbms_lob.substr 总是明显比 substr(CLOB) 慢,比 SUBSTR(VARCHAR2) 慢很多.

UPDATE: I can reproduce this on several different instances with different Oracle versions and different hardware, dbms_lob.substr is always noticeable slower than substr(CLOB), and a lot slower than SUBSTR(VARCHAR2).

Bob 的结果和上面链接中的测试讲述了一个不同的故事.

Bob's results and the tests in the link above tell a different story.

谁能解释一下,或者至少重现鲍勃或我的结果?谢谢!

Can anyone explain this, or at least reproduce either Bob's or my results? Thanks!

测试结果:

+000000000 00:00:00.004000000 (VARCHAR2)
+000000000 00:00:00.298000000(CLOB SUBSTR)
+000000000 00:00:00.356000000 (DBMS_LOB.SUBSTR)

+000000000 00:00:00.004000000 (VARCHAR2)
+000000000 00:00:00.298000000 (CLOB SUBSTR)
+000000000 00:00:00.356000000 (DBMS_LOB.SUBSTR)

测试代码:

DECLARE
  l_text   VARCHAR2(30) := 'This is a test record';
  l_clob   CLOB := l_text;
  l_substr VARCHAR2(30);
  t TIMESTAMP;
BEGIN
  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := SUBSTR(l_text,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (VARCHAR2)');

  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := SUBSTR(l_clob,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (CLOB SUBSTR)');

  t := SYSTIMESTAMP;
  FOR i IN 1..100000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14,1);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR)');
END;

推荐答案

(谎言,该死的谎言,和基准...)

(Lies, damn lies, and benchmarks...)

我重新运行了您的测试 10 次,将字符串扩展到整整 30 个字符,并得到以下平均结果:

I re-ran your test 10 times, expanding the string so it was a full 30 characters long, and got the following averaged results:

+000000000 00:00:00.011694200 (VARCHAR2)
+000000000 00:00:00.901000600 (CLOB SUBSTR)
+000000000 00:00:00.013169200 (DBMS_LOB.SUBSTR)

然后我将子字符串范围更改为 5,14(DBMS_LOB.SUBSTR 为 14,5)并得到:

I then changed the substring range to 5,14 (14,5 for DBMS_LOB.SUBSTR) and got:

+000000000 00:00:00.011731000 (VARCHAR2)
+000000000 00:00:01.010840000 (CLOB SUBSTR)
+000000000 00:00:00.011427000 (DBMS_LOB.SUBSTR)

然后我将范围更改为 17,14(DBMS_LOB.SUBSTR 为 14,17)并得到

I then changed the range to 17,14 (14,17 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.013578900 (VARCHAR2)
+000000000 00:00:00.964527400 (CLOB SUBSTR)
+000000000 00:00:00.011416800 (DBMS_LOB.SUBSTR)

最后,我将范围更改为 25,14(DBMS_LOB.SUBSTR 为 14,25)并得到

Finally, I changed the range to 25,14 (14,25 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.011210200 (VARCHAR2)
+000000000 00:00:00.916439800 (CLOB SUBSTR)
+000000000 00:00:00.013781300 (DBMS_LOB.SUBSTR)

我的结论是,当针对 CLOB 工作时,最好使用 DBMS_LOB.SUBSTR,因为与针对正常"VARCHAR2 使用 SUBSTR 相比,它似乎实际上没有性能损失.针对 CLOB 的 SUBSTR 似乎遭受了显着的性能损失.作为记录 - 操作系统 = HP/UX(Unix 变体),Oracle 版本 = 11.1,处理器 =HP Itanium 2-plex.天啊.

My conclusion is that when working against CLOB's it's best to use DBMS_LOB.SUBSTR as it appears to have effectively no performance penalty compared to using SUBSTR against a "normal" VARCHAR2. SUBSTR against a CLOB seems to suffer from a significant performance penalty. For the record - OS = HP/UX (Unix variant), Oracle version=11.1, processor=HP Itanium 2-plex. YMMV.

分享和享受.

而且因为如果值得做就值得过度做,这里有更多的结果,字符串扩展到 32767 个字符.每组结果给出的子串范围:

And because if it's worth doing it's worth over-doing, here's some more results with the strings expanded to 32767 characters. Substring ranges given with each set of results:

1, 25000
+000000000 00:00:00.198466400 (VARCHAR2)
+000000000 00:00:02.870958700 (CLOB SUBSTR)
+000000000 00:00:00.174490100 (DBMS_LOB.SUBSTR)

1000, 25000
+000000000 00:00:00.253447900 (VARCHAR2)
+000000000 00:00:02.491790500 (CLOB SUBSTR)
+000000000 00:00:00.193560100 (DBMS_LOB.SUBSTR)

10000, 25000
+000000000 00:00:00.217812000 (VARCHAR2)
+000000000 00:00:02.268794800 (CLOB SUBSTR)
+000000000 00:00:00.222200200 (DBMS_LOB.SUBSTR)

同一天,同样的结论.

克苏鲁 fhtagn.

Cthulhu fhtagn.

(再次违反,亲爱的朋友们,再次......)

(Once more unto the breach, dear friends, once more...)

重新运行基准测试,将 CLOB 的大小更改为 3276700,并从中间取子串,从 2475000 开始,长度为 25000,我得到:

Re-ran the benchmarks, changing the size of the CLOB to 3276700, and taking the substring from the middle starting at 2475000 for length 25000 I get:

+000000000 00:00:00.176883200 (VARCHAR2)
+000000000 00:00:02.069482600 (CLOB SUBSTR)
+000000000 00:00:00.175341500 (DBMS_LOB.SUBSTR)

(请注意,更改仅影响最后两个测试).

(Note that changes only affect the last two tests).

AND...相同的结果,不同的日子.

AND...same results, different day.

天啊.

相关文章