在没有 REGEXP_REPLACE 和 PL/SQL 的情况下格式化 UUID 字符串

2022-01-15 00:00:00 string format uuid oracle

我想格式化 sys_guid() 函数的结果,例如 这个答案

I'd like to format the result of the sys_guid() function such as proposed in this answer

select regexp_replace(rawtohex(sys_guid())
       , '([A-F0-9]{8})([A-F0-9]{4})([A-F0-9]{4})([A-F0-9]{4})([A-F0-9]{12})'
       , '1-2-3-4-5') 
         as FORMATTED_GUID 
 from dual

出于性能原因,我想避免使用 regexp_replace(因为我处理大量记录).

From performance reasons I'd like to avoid the usage of regexp_replace (as I process large number of records).

我的场景可以简化为这个用例:

My scenario can be simplified to this use case:

 select rawtohex(sys_guid()) GUID
 from dual connect by level <= 2;

显然我不能使用 substr 和串联,因为每个 SUBSTR 会处理不同的 SYS_GUID.我也想留在 SQL 中,无需上下文切换到 PL/SQL 函数.

Obviously I can't use substr and concatenation as each SUBSTR would process a different SYS_GUID. I would also like to stay in SQL, without a context switch to PL/SQL function.

知道如何在 SQL 中使用掩码格式化字符串,类似于日期或数字:

Any idea how to format string in SQL similar to date or number using a mask:

 to_char(rawtohex(sys_guid(),'CCCCCCCC-CCCC-CCCC-CCCC-CCCCCCCCCCCC') /* note, this is clear illegal */

推荐答案

不幸的是,您不能在数字格式中包含字符串文字,否则您可以将十六进制字符串转换为数字然后再返回,在格式掩码中插入文字在正确的地方 - 但你只能在约会时这样做.

You can't include string literals in number formats unfortunately, otherwise you could convert the hex string to a number and then back again, inserting literals in the format mask in the right places - but you can only do that for dates.

你可以使用 substr() 因为位置是固定的.你担心

You can use substr() since the positions are fixed. You were concerned that

显然我不能使用 substr 和串联,因为每个 SUBSTR 会处理不同的 SYS_GUID.

Obviously I can't use substr and concatenation as each SUBSTR would process a different SYS_GUID.

使用子查询分解(又名公用表表达式/CTE)意味着 substr() 调用该 CTE 中的一行都看到相同的 GUID;此方法不会为每个生成一个新的 SYS_GUID.

Using subquery factoring (a.ka. a common table expression/CTE) means the substr() calls for a row from that CTE all see the same GUID; this method doesn't generate a new SYS_GUID for each one.

with t as (
  select rawtohex(sys_guid()) guid from dual
  connect by level <= 2
)
select guid, substr(guid, 1, 8)
  ||'-'|| substr(guid, 9, 4)
  ||'-'|| substr(guid, 13, 4)
  ||'-'|| substr(guid, 17, 4)
  ||'-'|| substr(guid, 21, 12) as formatted_guid
from t;

GUID                             FORMATTED_GUID                         
-------------------------------- ----------------------------------------
2F6BA62518F926D0E0534D49E50ABB46 2F6BA625-18F9-26D0-E053-4D49E50ABB46    
2F6BA62518FA26D0E0534D49E50ABB46 2F6BA625-18FA-26D0-E053-4D49E50ABB46    

这比处理大量数据的正则表达式快得多.在循环中有 100000 个值(在 PL/SQL 块中,在循环内做最少的工作以使其实际正确评估,并使用 dbms_utility.get_cpu_time 检查经过的时间)正则表达式版本大约需要 2.51 秒,而子字符串版本大约需要 0.29 秒.你的系统当然会得到不同的数字,但它应该仍然是相同的数量级.

That's a lot faster than the regex on a larger amount of data. With 100000 values in a loop (in a PL/SQL block, doing a minimal amount of work inside the loop to make it actually evaluate properly, and using dbms_utility.get_cpu_time to check the elapsed time) the regex version takes about 2.51 seconds, while the substring version takes about 0.29 seconds. Your system will get different numbers of course, but it should still be about the same order of magnitude.

相关文章