ORA_HASH函数使用的算法是什么?

我在我正在处理的应用程序中遇到了一些代码,这些代码调用数据库只是为了调用 ORA_HASH 函数 (文档) 上的 UUID 字符串.它这样做的原因是它需要该值来向另一个似乎使用 ORA_HASH 进行分区的系统发出服务调用.

I've come across some code in the application I'm working on that makes a database call merely to call the ORA_HASH function (documentation) on a UUID string. The reason it's doing this is that it needs the value to make a service call to another system that appears to use ORA_HASH for partitioning.

我想知道 ORA_HASH 使用的算法,以便我可以重新实现它,为无法访问真实数据库的应用程序进行类似的服务调用,更不用说 Oracle.到目前为止,我只能找到相当于 Oracle API 文档的内容.

I would like to know the algorithm ORA_HASH uses so that I can re-implement it to make a similar service call for an application that won't have access to a real database, let alone Oracle. I've only been able to find what amounts to Oracle API documentation so far.

特别清楚:我需要克隆ORA_HASH,因为这是我无法控制的另一个系统使用的,我需要与该系统集成.是的,如果可以使用真正标准算法就好了,比如 MD5,但我不能,除非这就是 ORA_HASH 的隐藏内容.

Just to be super clear: I need to clone ORA_HASH because that's what another system that's outside of my control uses, and I need to integrate with that system. Yes, it would nice if could use a really standard algorithm, like MD5, but I can't, unless that's what ORA_HASH is under the covers.

建议使用除 ORA_HASH 之外的散列算法的答案或评论没有帮助.这个问题专门针对 ORA_HASH,而不是一般的散列或分区.

Answers or comments that propose the use of a hash algorithm besides ORA_HASH are not helpful. This question is specifically about ORA_HASH, not hashing or partitioning in general.

推荐答案

另一个似乎使用 ORA_HASH 的系统

another system that appears to use ORA_HASH

好吧,如果它似乎可以使用",那么进行一些逆向工程并检查究竟调用了什么并反汇编该函数的代码是有意义的.

Well, if it "appears to use" then it makes sense to do a bit of reverse engineering and check what exactly is called and disassemble code of the function.

但是,如果您想深入了解 Oracle 内部结构,那么以下内容可能会有所帮助.

If you, however, want to dive into Oracle internals then following may help.

首先,你要弄清楚内部调用的是什么C函数.为此,您可以在一个会话中执行一些长时间运行的代码.我确实运行了这个

First of all, you have to figure out what internal C function is called. To do that you can execute some long running code in one session. I did run this

select avg(ora_hash(rownum)) id from
(select rownum from dual connect by rownum <= 1e4),
(select rownum from dual connect by rownum <= 1e4);

它也可以是 PL/SQL 代码,你只需要确保你不断调用 ora_hash.

It can be PL/SQL code as well, you just need to make sure that you constantly call ora_hash.

运行时

  • 如果您使用的是 Windows,那么您可以使用 TANEL PODER 的 ostackprof(https://blog.tanelpoder.com/2008/10/31/advanced-oracle-troubleshooting-guide-part-9-process-stack-profiling-from-sqlplus-using-ostackprof/)

如果你在 *nix 上,那么你可以使用 dtrace(http://www.oracle.com/technetwork/articles/servers-storage-dev/dtrace-on-linux-1956556.html),火焰图(使用场景 https://blog.dbi-services.com/oracle-database-multilingual-engine-mle/)

If you on *nix then you can use dtrace (http://www.oracle.com/technetwork/articles/servers-storage-dev/dtrace-on-linux-1956556.html), Flame Graph (usage scenario https://blog.dbi-services.com/oracle-database-multilingual-engine-mle/)

我在 Windows 上测试过,看起来 ora_hash 是 ...->evaopn2()->evahash()->...

I tested on Windows and looks like that ora_hash is ...->evaopn2()->evahash()->...

现在让我们用谷歌搜索 evahash.我们非常幸运,因为官方网站上有一个头文件 https://oss.oracle.com/projects/ocfs-tools/src/branches/new-dir-format/libocfs/Linux/inc/ocfshash.h 与evahash 的链接.

Now let's google for evahash. We got extremely lucky because there is a header file on official site https://oss.oracle.com/projects/ocfs-tools/src/branches/new-dir-format/libocfs/Linux/inc/ocfshash.h with link to evahash.

最后是带有实际 C 代码的页面 http://burtleburtle.net/bob/hash/evahash.html

And finally there is page with actual C code http://burtleburtle.net/bob/hash/evahash.html

到目前为止一切顺利,我们记得如果我们将其构建到库(Windows 上的 DLL)中,我们可以在 Oracle 中使用外部 C 函数.

So far so good, we remember that we can use external C function in Oracle if we build it into library (DLL on Windows).

例如在我的 Win x64 上,如果我将函数签名更改为

For example on my Win x64 if I change function signature to

extern "C" ub4 hash( ub1 *k, ub4 length, ub4 initval)

它可以从 Oracle 成功执行.但是,如您所见,签名与 Oracle 中的 ora_hash 略有不同.该函数接受值、长度和初始值(可能是种子),而 Oracle 中的签名是 ora_hash(expr, max_bucket, seed_value).

it can be successfully executed from Oracle. But, as you see, signature a bit differs from ora_hash in Oracle. This function accepts value, its length and initval (may be seed) while signature in Oracle is ora_hash(expr, max_bucket, seed_value).

让我们尝试测试甲骨文

SQL> select ora_hash(utl_raw.cast_to_raw('0'), power(2, 32) - 1, 0) oh1,
  2         ora_hash('0', power(2, 32) - 1, 0) oh2,
  3         ora_hash(0, power(2, 32) - 1, 0) oh3,
  4         ora_hash(chr(0), power(2, 32) - 1, 0) oh4
  5    from dual;

       OH1        OH2        OH3        OH4
---------- ---------- ---------- ----------
3517341953 3517341953 1475158189 4056412421

C

int main()
{
    ub1 ta[] = {0};
    ub1* t = ta;
    cout << hash(t, 1, 0) << endl;
    ub1 ta0[] = {'0'};
    ub1* t0 = ta0;
    cout << hash(t0, 1, 0) << endl;
    return 0;
}

1843378377
4052366646

没有一个数字匹配.那么问题出在哪里呢?ora_hash 接受几乎任何类型的参数(例如 select ora_hash(sys.odcinumberlist(1,2,3)) from dual),而 C 函数接受作为字节数组的值.这意味着在函数调用之前会发生一些转换.因此,在使用提到的 C 哈希函数之前,您必须弄清楚实际值在传递给它之前是如何转换的.

None of the numbers matches. So what is the problem? ora_hash accepts parameters of almost any type (for example select ora_hash(sys.odcinumberlist(1,2,3)) from dual) while C function accepts value as array of bytes. This means that some conversion happens before function call. Thus before using mentioned C hash function you have to figure out how actual value is transformed before passing to it.

您可以使用 IDA PRO + 十六进制射线对 Oracle 二进制文件进行逆向工程,但这可能需要数天时间.更不用说平台特定的细节了.

You can proceed with reverse engineering of Oracle binaries using IDA PRO + hex rays but that may take days. Not to mention platform specific details.

所以如果你想模仿ora_hash,最简单的选择就是安装Oracle express edition并用它来调用ora_hash.

So if you want to imitate ora_hash, the easiest option would be to install Oracle express edition and use it to call ora_hash.

我希望那很有趣.祝你好运.

I hope that was interesting. Good luck.

更新

ora_hash 和 dbms_utility.get_hash_value 可以相互映射(参见 https://jonathanlewis.wordpress.com/2009/11/21/ora_hash-function/)

ora_hash and dbms_utility.get_hash_value can be mapped to each other (see https://jonathanlewis.wordpress.com/2009/11/21/ora_hash-function/)

SQL> select dbms_utility.get_hash_value('0', 0 + 1, 1e6 + 1) ha1,
  2         ora_hash('0', 1e6, 0) + 1 ha2
  3    from dual;

       HA1        HA2
---------- ----------
    338437     338437

如果我们解开 dbms_utility 的包体,我们将看到以下声明

If we unwrap package body of dbms_utility we will see following declaration

  function get_hash_value(name varchar2, base number, hash_size number)
    return number is
  begin
    return(icd_hash(name, base, hash_size));
  end;

  function icd_hash(name      varchar2,
                    base      binary_integer,
                    hash_size binary_integer) return binary_integer;
  pragma interface(c, icd_hash);

让我们谷歌搜索 icd_hash,我们可以发现它映射到 _psdhsh (https://yurichev.com/blog/50/).现在是时候反汇编 oracle.exe 并从中提取 _psdhsh 的代码了.也许明年我会花一些时间在这上面.

Let's google for icd_hash and we can find that it's mapped to _psdhsh (https://yurichev.com/blog/50/). Now it's time to disassemble oracle.exe and extract code for _psdhsh from it. Maybe I'll spend some time on this next year.

相关文章