What character substitutions should be performed to make base 64 encoding URL-safe?

2022-01-21 00:00:00 encoding perl url base64 php

Looking into URL-safe base 64 encoding, I've found it to be very non-standard. Despite the copious number of built-in functions in PHP, there isn't one for URL-safe base 64 encoding. On the manual page for base64_encode(), most of the comments suggest wrapping that function with strtr():

function base64_url_encode($input)
{
     return strtr(base64_encode($input), '+/=', '-_,');
}

The only Perl module I could find in this area is MIME::Base64::URLSafe (source), which performs the following replacement internally:

sub encode ($) {
    my $data = encode_base64($_[0], '');
    $data =~ tr|+/=|-_|d;
    return $data;
}

Unlike the PHP function above, this Perl version drops the '=' (equals) character entirely, rather than replacing it with ',' (comma) as the PHP version does. The equals sign is a padding character, so the Perl module restores it as needed on decode, but this difference makes the two implementations incompatible.
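
To make the incompatibility concrete, here is a minimal Python sketch that mimics the two substitutions. The names php_style and perl_style are made up for illustration and are not part of either library; the sample bytes were chosen so that '+', '/' and padding all appear in the standard encoding.

```python
import base64

def php_style(data: bytes) -> str:
    # Mimics strtr(base64_encode($input), '+/=', '-_,')
    table = str.maketrans("+/=", "-_,")
    return base64.b64encode(data).decode("ascii").translate(table)

def perl_style(data: bytes) -> str:
    # Mimics tr|+/=|-_|d : '+' -> '-', '/' -> '_', and '=' is deleted
    table = str.maketrans("+/", "-_", "=")
    return base64.b64encode(data).decode("ascii").translate(table)

sample = b"\xfb\xff\xfe\x01"      # standard base 64: "+//+AQ=="
print(php_style(sample))          # -__-AQ,,
print(perl_style(sample))         # -__-AQ
```

The same input yields two different strings, so a token produced by one side will not round-trip through the other without some normalization.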

Finally, the Python function urlsafe_b64encode(s) keeps the '=' padding, which prompted someone to post this function to remove it; it shows up prominently in Google results for 'python base64 url safe':

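from base64 import urlsafe_b64encode, urlsafe_b64decode

def uri_b64encode(s):
    # urlsafe_b64encode returns bytes on Python 3, so strip a bytes pad
    return urlsafe_b64encode(s).rstrip(b'=')

def uri_b64decode(s):
    # s is bytes; restore the 0 to 2 '=' characters dropped on encode
    return urlsafe_b64decode(s + b'=' * (-len(s) % 4))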

The desire here is to have a string that can be included in a URL without further encoding, hence the ditching or translation of the characters '+', '/', and '='. Since there isn't a defined standard, what is the right way?

Recommended answer

There does appear to be a standard: RFC 3548, Section 4, "Base 64 Encoding with URL and Filename Safe Alphabet" (RFC 3548 has since been obsoleted by RFC 4648, where the same alphabet is described in Section 5):

"This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in table 2."

+ and / should be replaced by - (minus) and _ (understrike) respectively. Any incompatible libraries should be wrapped so they conform to RFC 3548.
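
As a rough sketch of what such a wrapper could look like in Python, the helper below normalizes the variants from the question (comma padding, no padding, or the standard alphabet) into the padded form with '-' and '_'. The names to_rfc3548 and decode_any are hypothetical, and the sketch assumes the input is otherwise a valid token.

```python
import base64

def to_rfc3548(token: str) -> str:
    """Normalize a token to the padded URL-safe form (hypothetical helper)."""
    # Treat the ',' used by the PHP snippet as padding, then drop all padding.
    body = token.replace(",", "=").rstrip("=")
    # Map any standard-alphabet characters to their URL-safe counterparts.
    body = body.translate(str.maketrans("+/", "-_"))
    # Re-add '=' so the length is a multiple of four again.
    return body + "=" * (-len(body) % 4)

def decode_any(token: str) -> bytes:
    # urlsafe_b64decode expects the '-' / '_' alphabet with '=' padding.
    return base64.urlsafe_b64decode(to_rfc3548(token))

for variant in ("-__-AQ,,", "-__-AQ", "+//+AQ=="):
    print(to_rfc3548(variant), decode_any(variant))
```

All three variants normalize to the same string, so the decoding side only has to deal with one format.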

Note that this requires that you URL encode the (pad) = characters, but I prefer that over URL encoding the + and / characters from the standard base64 alphabet.
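
To illustrate that trade-off, here is a small Python sketch using the standard library's urllib.parse.quote. The function names encode_for_query and decode_from_query are made up for this example.

```python
import base64
from urllib.parse import quote, unquote

def encode_for_query(data: bytes) -> str:
    token = base64.urlsafe_b64encode(data).decode("ascii")
    # quote() never escapes '-' or '_', so only the '=' padding changes.
    return quote(token, safe="")

def decode_from_query(value: str) -> bytes:
    # Undo the percent-encoding, then decode the URL-safe alphabet.
    return base64.urlsafe_b64decode(unquote(value))

sample = b"\xfb\xff\xfe\x01"
print(encode_for_query(sample))               # -__-AQ%3D%3D
print(quote("+//+AQ==", safe=""))             # %2B%2F%2F%2BAQ%3D%3D
```

With the URL-safe alphabet only the trailing padding is percent-encoded, whereas the standard alphabet can need escaping anywhere in the string.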
