Why does crypt/blowfish generate the same hash with two different salts?
This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of metadata.
When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., the salt is truncated), and when using strings shorter than 21 characters the salt is automatically padded (with '$' characters, apparently); this is fairly straightforward. However, given a 20-character salt and a 21-character salt that are identical except for the final character of the 21-character salt, both hashed strings will be identical. With a 22-character salt that is identical to the 21-character salt except for its final character, the hash is different again.
Example in code:
$foo = 'bar';
$salt_xx = '$2a$07$';
$salt_19 = $salt_xx . 'b1b2ee48991281a439d';
$salt_20 = $salt_19 . 'a';
$salt_21 = $salt_20 . '2';
$salt_22 = $salt_21 . 'b';
var_dump(
    crypt($foo, $salt_19),
    crypt($foo, $salt_20),
    crypt($foo, $salt_21),
    crypt($foo, $salt_22)
);
Will produce:
string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"
Why is this?
Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is 'da$.', while in salt_21, offset (28, 4) is 'da2.'; however, it is important to note that the generated string includes the hash, the salt, and the instructions for generating the salt (i.e. '$2a$07$'); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged: UxGYN739wLkV5PGoR1XA4EvNVPjwylG.
Thus, this is in fact not a difference in the hash produced, but a difference in the salt stored alongside the hash, which is precisely the problem at hand: two salts are generating the same hash.
Remember: the output will be in the following format:
"$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
//                            ^ Hash Starts Here, offset 28,32
where ## is the base-2 logarithm of the number of iterations the algorithm runs for.
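To make that layout concrete, here is a minimal sketch that cuts two of the output strings shown earlier into their parts. It assumes the usual 60-character bcrypt layout (a 7-character '$2a$##$' prefix, then 22 salt characters, then 31 hash characters); the helper name split_bcrypt is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: split a 60-character crypt()/blowfish string
// into prefix, salt and hash, assuming the 7 + 22 + 31 layout.
function split_bcrypt($str)
{
    return array(
        'prefix' => substr($str, 0, 7),   // e.g. "$2a$07$"
        'salt'   => substr($str, 7, 22),  // 22 salt characters
        'hash'   => substr($str, 29),     // remaining 31 characters
    );
}

// Outputs for $salt_20 and $salt_21, copied from the question.
$out_20 = split_bcrypt('$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG');
$out_21 = split_bcrypt('$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG');

var_dump($out_20['salt'] === $out_21['salt']); // bool(false): the salts differ
var_dump($out_20['hash'] === $out_21['hash']); // bool(true): the hashes match
```

Comparing the pieces separately rather than the whole string is what shows that only the salt portion changed.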
In the comments, it was requested that I post some additional info, as the user could not reproduce my output. Execution of the following code:
var_dump(
    PHP_VERSION,
    PHP_OS,
    CRYPT_SALT_LENGTH,
    CRYPT_STD_DES,
    CRYPT_EXT_DES,
    CRYPT_MD5,
    CRYPT_BLOWFISH
);
Produces the following output:
string(5) "5.3.0"
string(5) "WINNT"
int(60)
int(1)
int(1)
int(1)
int(1)
Hope this helps.
Recommended answer
After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather a base64-encoded string, such that 22 bytes of salt data actually represent a 16-byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses this alphabet:
./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$
The 65th character, '$', is the padding character.
Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handles any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.
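The theory above can be modelled with a small decoder. This is only a sketch of the behaviour described, not PHP's actual crypt() source: the function name decode_salt and the constant SALT_ALPHABET are made up here, and the '$' padding character is left out of the model. It reads 6 bits per character from the alphabet quoted above, emits a byte whenever 8 bits are available, and drops whatever is left at the end.

```php
<?php
// Alphabet quoted in the answer, without the '$' padding character.
const SALT_ALPHABET = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical model: decode a salt into raw bytes, discarding any
// trailing bits that do not make up a full byte. Returns false if the
// salt contains a character outside the alphabet.
function decode_salt($salt)
{
    $bits  = 0;  // bit accumulator
    $nbits = 0;  // number of valid bits currently in the accumulator
    $bytes = '';
    for ($i = 0, $n = strlen($salt); $i < $n; ++$i) {
        $value = strpos(SALT_ALPHABET, $salt[$i]);
        if ($value === false) {
            return false; // character is not in the alphabet
        }
        $bits = ($bits << 6) | $value;
        $nbits += 6;
        if ($nbits >= 8) {
            $nbits -= 8;
            $bytes .= chr(($bits >> $nbits) & 0xFF);
            $bits &= (1 << $nbits) - 1; // keep only the leftover bits
        }
    }
    return $bytes; // leftover bits (fewer than 8) are dropped
}

$a = decode_salt('b1b2ee48991281a439da');   // 20 characters
$b = decode_salt('b1b2ee48991281a439da2');  // 21 characters
$c = decode_salt('b1b2ee48991281a439da2b'); // 22 characters

var_dump(strlen($a), strlen($b), strlen($c)); // int(15) int(15) int(16)
var_dump($a === $b);                          // bool(true)
var_dump(decode_salt('a b'));                 // bool(false): space is not in the alphabet
```

Under this model the 20- and 21-character salts from the question decode to the same 15 bytes, while the 22-character salt yields a 16th byte.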
Take an imaginary salt '1234'. This is perfectly base64-consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5-character salt whose Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.
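The arithmetic behind this can be tabulated directly. The sketch below only counts bits (6 per character, 8 per byte) and does not call crypt(); the function name whole_bytes is an illustrative choice.

```php
<?php
// Whole bytes carried by $len characters of 6 bits each.
function whole_bytes($len)
{
    return (int) floor($len * 6 / 8);
}

for ($len = 0; $len <= 22; ++$len) {
    // Flag lengths whose last character adds no new whole byte.
    $discarded = $len > 0 && whole_bytes($len) === whole_bytes($len - 1);
    printf(
        "%02d characters -> %2d bytes%s\n",
        $len,
        whole_bytes($len),
        $discarded ? ' (last character discarded)' : ''
    );
}
```

The flagged lengths are 1, 5, 9, 13, 17 and 21, that is, exactly those where Len Mod 4 is 1.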
Thus, for any two salts A and B, where
Len A Mod 4 == 0
&& Len B Mod 4 == 1 // these two lines mean the same thing
&& Len B = Len A + 1 // but are semantically important separately
&& A == substr B, 0, Len A
the actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I'm including some example PHP code that can be used to show this. The salt constantly rotates in a semi-non-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-epoch time.
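// Take 22 hex characters from a whirlpool hash of the current time as the salt.
$salt = substr(hash('whirlpool', microtime()), rand(0, 105), 22);
$seed = time(); // data to be hashed
// Hash the same data with every prefix of the salt, from 0 to 22 characters.
for ($i = 0, $j = strlen($salt); $i <= $j; ++$i) {
    printf('%02d = %s%s%c',
        $i,
        crypt($seed, '$2a$07$' . substr($salt, 0, $i)),
        // Mark lengths where Len Mod 4 is 0 or 1.
        $i % 4 == 0 || $i % 4 == 1 ? ' <-' : '',
        0x0A
    );
}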
And this produces output similar to the following:
00 = $2a$07$$$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
01 = $2a$07$e$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
02 = $2a$07$e8$$$$$$$$$$$$$$$$$$$.WEimjvvOvQ.lGh/V6HFkts7Rq5rpXZG
03 = $2a$07$e89$$$$$$$$$$$$$$$$$$.Ww5p352lsfQCWarRIWWGGbKa074K4/.
04 = $2a$07$e895$$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
05 = $2a$07$e8955$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
06 = $2a$07$e8955b$$$$$$$$$$$$$$$.2UumGVfyc4SgAZBs5P6IKlUYma7sxqa
07 = $2a$07$e8955be$$$$$$$$$$$$$$.gb6deOAckxHP/WIZOGPZ6/P3oUSQkPm
08 = $2a$07$e8955be6$$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
09 = $2a$07$e8955be61$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
10 = $2a$07$e8955be616$$$$$$$$$$$.hWHhdkS9Z3m7/PMKn1Ko7Qf2S7H4ttK
11 = $2a$07$e8955be6162$$$$$$$$$$.meHPOa25CYG2G8JrbC8dPQuWf9yw0Iy
12 = $2a$07$e8955be61624$$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
13 = $2a$07$e8955be616246$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
14 = $2a$07$e8955be6162468$$$$$$$.OTzcPMwrtXxx6YHKtaX0mypWvqJK5Ye
15 = $2a$07$e8955be6162468d$$$$$$.pDcOFp68WnHqU8tZJxuf2V0nqUqwc0W
16 = $2a$07$e8955be6162468de$$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
17 = $2a$07$e8955be6162468deb$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
18 = $2a$07$e8955be6162468deb0$$$.aNZIHogUlCn8H7W3naR50pzEsQgnakq
19 = $2a$07$e8955be6162468deb0d$$.ytfAwRL.czZr/K3hGPmbgJlheoZUyL2
20 = $2a$07$e8955be6162468deb0da$.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
21 = $2a$07$e8955be6162468deb0da3.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
22 = $2a$07$e8955be6162468deb0da3ucYVpET7X/5YddEeJxVqqUIxs3COrdym
The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.