Python: Ignore 'Incorrect padding' error when base64 decoding

2022-01-21 00:00:00 python base64

Problem description

I have some base64-encoded data that I want to convert back to binary even if it contains a padding error. If I use

base64.decodestring(b64_string)

it raises an 'Incorrect padding' error. Is there another way?
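
For readers who want to reproduce the error, here is a minimal sketch. It assumes Python 3 and uses base64.b64decode, since base64.decodestring was removed in Python 3.9. The sample input b"hello" and the helper try_decode are made up for illustration and are not part of the original question.

```python
import base64
import binascii

# Encode a sample value, then strip the trailing '=' to simulate
# base64 data that has lost its padding.
encoded = base64.b64encode(b"hello")
stripped = encoded.rstrip(b"=")


def try_decode(data):
    """Return the decoded bytes, or a short error string on failure."""
    try:
        return base64.b64decode(data)
    except binascii.Error as exc:
        return "error: %s" % exc


result_ok = try_decode(encoded)    # decodes normally
result_bad = try_decode(stripped)  # fails because the padding is missing

print(result_ok)
print(result_bad)
```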

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit-and-miss, so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data


Solution

As other responses have said, there are various ways in which base64 data can be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64-encoded data) is "lossless":

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.
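
To make the quoted claim concrete, here is a small sketch that derives the number of '=' characters from the count of base64 digits alone. The helper name padding_needed and the sample inputs are hypothetical, chosen only for illustration.

```python
import base64


def padding_needed(num_digits):
    """Number of '=' characters that bring num_digits up to a multiple of 4."""
    return -num_digits % 4


# Strip the padding from a few encoded values, then restore it using
# nothing but the number of remaining digits.
for raw in (b"a", b"ab", b"abc", b"abcd"):
    padded = base64.b64encode(raw)
    stripped = padded.rstrip(b"=")
    restored = stripped + b"=" * padding_needed(len(stripped))
    assert restored == padded
    assert base64.b64decode(restored) == raw
```

Note that a digit count that is one more than a multiple of 4 cannot come from valid base64 output, so adding padding does not rescue such input.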

So if this is really the only thing "wrong" with your base64 data, the padding can simply be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

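def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :param altchars: The two bytes used for values 62 and 63
    :returns: The decoded byte string.

    """
    # Normalize: drop everything outside the alphabet, including
    # whitespace and any existing '=' padding.
    data = re.sub(rb'[^a-zA-Z0-9%s]+' % re.escape(altchars), b'', data)
    # A valid base64 string has a length that is a multiple of 4.
    missing_padding = len(data) % 4
    if missing_padding:
        data += b'=' * (4 - missing_padding)
    return base64.b64decode(data, altchars)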

Tests for this function: weasyprint/tests/test_css.py#L68
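
As a quick illustration of how the function behaves, the sketch below calls it on unpadded input, on line-wrapped input, and on URL-safe input. A condensed copy of decode_base64 is included only so the snippet runs on its own. The sample inputs are made up and are not taken from the WeasyPrint tests.

```python
import base64
import re


def decode_base64(data, altchars=b"+/"):
    """Condensed copy of the answer's function, padding being optional."""
    data = re.sub(rb"[^a-zA-Z0-9%s]+" % re.escape(altchars), b"", data)
    missing_padding = len(data) % 4
    if missing_padding:
        data += b"=" * (4 - missing_padding)
    return base64.b64decode(data, altchars)


# Standard alphabet with the trailing '=' removed.
standard = decode_base64(b"aGVsbG8")

# Line-wrapped input: newlines are dropped by the normalization step.
wrapped = decode_base64(b"aGVs\nbG8=\n")

# URL-safe alphabet ('-' and '_' instead of '+' and '/'), unpadded.
token = base64.urlsafe_b64encode(b"\xfb\xff").rstrip(b"=")
urlsafe = decode_base64(token, altchars=b"-_")

print(standard, wrapped, urlsafe)
```

Because the normalization step silently discards every byte outside the alphabet, this approach suits data whose only defect is missing padding. It will not flag input that is corrupted in other ways.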
