Oracle:是否有任何合乎逻辑的理由不对 SELECT 列表中的子查询使用并行执行?

2021-12-30 00:00:00 sql parallel-processing oracle

对于 SELECT 列表中的标量子查询，Oracle 不使用并行执行是否有任何合乎逻辑的原因?为什么不应该使用它们?

<块引用>

SELECT 语句只有在以下情况下才能并行化条件满足:

查询包含并行提示规范(PARALLEL 或PARALLEL_INDEX) 或查询中引用的架构对象具有与它们关联的 PARALLEL 声明.
至少查询中指定的表之一需要以下:
- 全表扫描
- 跨越多个分区的索引范围扫描
SELECT 列表中没有标量子查询.

解决方案

列表错误.

(至少对于 Oracle 11gR2，可能还有 10g.对于某些过时的 Oracle 版本，该列表可能是准确的.)

我建议尽可能使用 Oracle 官方文档，但并行执行章节不是很准确.

而且即使手册没有错，也常常会产生误导，因为并行执行非常复杂.如果您浏览所有文档，您会发现大约有 30 个不同的变量可以确定并行度.如果您曾经看到一个简短的项目清单，您应该非常怀疑.这些清单通常只是在非常特定的上下文中需要考虑的最相关的项目.

<小时>

示例:

SQL>--创建一个没有任何并行设置的表SQL>创建表parallel_test(a number主键，b number);表已创建.SQL>--创建一些测试数据SQL>插入parallel_test2 选择level，level from dual connect by level <= 100000;创建了 100000 行.SQL>犯罪;提交完成.SQL>--强制会话并行运行查询SQL>更改会话强制并行查询；会话已更改.SQL>--生成解释计划SQL>解释计划2 选择一个3 ,(4 选择一个5 来自 parallel_test parallel_test26 其中 parallel_test2.a = parallel_test.a7)8 来自 parallel_test；解释了.SQL>从表中选择*(dbms_xplan.display)；PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------------------计划哈希值:3823224058---------------------------------------------------------------------------------------------------------------------|身份证 |操作 |姓名 |行 |字节 |成本 (%CPU)|时间 |TQ |输入-输出|PQ 分销 |---------------------------------------------------------------------------------------------------------------------|0 |选择语句 ||116K|1477K|9 (0)|00:00:01 |||||* 1 |索引唯一扫描 |SYS_C0028894 |1 |13 |1 (0)|00:00:01 |||||2 |PX 协调员 ||||||||||3 |PX 发送 QC(随机) |:TQ10000 |116K|1477K|9 (0)|00:00:01 |Q1,00 |P->S |QC (兰德) ||4 |PX 块迭代器 ||116K|1477K|9 (0)|00:00:01 |Q1,00 |PCWC |||5 |索引快速全扫描|SYS_C0028894 |116K|1477K|9 (0)|00:00:01 |Q1,00 |PCWP ||---------------------------------------------------------------------------------------------------------------------谓词信息(由操作 id 标识):--------------------------------------------------——1 - 访问(PARALLEL_TEST2".A"=:B1)笔记------ 用于此语句的动态采样(级别 = 2)已选择 21 行.SQL>

没有并行提示，没有并行对象，没有全表扫描，没有跨越多个分区的索引范围扫描，以及一个标量子查询.

没有满足单个条件，但查询仍然使用并行性.(我还验证了 v$px_process 以确保查询确实使用了并行性，而不仅仅是解释计划失败.)

<小时>

这意味着您对其他问题的回答是错误的.>

我不确定在这种情况下到底发生了什么，但我认为这与 FAST DUAL 优化有关.在某些情况下，DUAL 不用作表，因此无需并行化.这可能是一个错误"，但是如果您使用的是 DUAL，那么无论如何您真的不想要并行性.(虽然我假设您将 DUAL 用于演示目的，并且您的实际查询更复杂.如果是这样，您可能需要使用更实际的示例更新查询.)

Is there any logical reason for Oracle not to use parallel execution with scalar subqueries in the SELECT list? Why it shouldn't use them?

A SELECT statement can be parallelized only if the following conditions are satisfied:

The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLEL declaration associated with them.

At least one of the tables specified in the query requires one of the following:

A full table scan

An index range scan spanning multiple partitions

No scalar subqueries are in the SELECT list.

解决方案

Every item in that list is wrong.

(At least for Oracle 11gR2, and probably10g as well. The list may be accurate for some obsolete versions of Oracle.)

I recommend using the official Oracle documentation whenever possible, but the parallel execution chapter is not very accurate.

And even when the manual isn't wrong, it is often misleading, because parallel execution is very complicated. If you go through all the documentation you'll find there are about 30 different variables that determine the degree of parallelism. If you ever see a short checklist of items, you should be very skeptical. Those checklists are usually just the most relevant items to consider in a very specific context.

Example:

SQL> --Create a table without any parallel settings
SQL> create table parallel_test(a number primary key, b number);

Table created.

SQL> --Create some test data
SQL> insert into parallel_test
  2  select level, level from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> --Force the session to run the query in parallel
SQL> alter session force parallel query;

Session altered.
SQL> --Generate explain plan
SQL> explain plan for
  2  select a
  3     ,(
  4             select a
  5             from parallel_test parallel_test2
  6             where parallel_test2.a = parallel_test.a
  7     )
  8  from parallel_test;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3823224058

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |              |   116K|  1477K|     9   (0)| 00:00:01 |        |      |            |
|*  1 |  INDEX UNIQUE SCAN      | SYS_C0028894 |     1 |    13 |     1   (0)| 00:00:01 |        |      |            |
|   2 |  PX COORDINATOR         |              |       |       |            |          |        |      |            |
|   3 |   PX SEND QC (RANDOM)   | :TQ10000     |   116K|  1477K|     9   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |    PX BLOCK ITERATOR    |              |   116K|  1477K|     9   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |     INDEX FAST FULL SCAN| SYS_C0028894 |   116K|  1477K|     9   (0)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARALLEL_TEST2"."A"=:B1)

Note
-----
   - dynamic sampling used for this statement (level=2)

21 rows selected.

SQL>

No parallel hint, no parallel objects, no full table scans, no index range scans spanning multiple partitions, and a scalar subquery.

Not a single condition met, yet the query still uses parallelism. (I also verified v$px_process to make sure that the query really does use parallelism, and it's not just an explain plan failure.)

This means the answer to your other question is wrong.

I'm not sure exactly what's going on in that case, but I think it has to do with the FAST DUAL optimization. In some contexts, DUAL isn't used as a table, so there's nothing to parallelize. This is probably a "bug", but if you're using DUAL then you really don't want parallelism anyway. (Although I assume you used DUAL for demonstration purposes, and your real query is more complicated. If so, you may need to update the query with a more realistic example.)

相关文章