编码/转义 JSON 控制字符

2022-01-15 00:00:00 json mariadb php mysql

我正在使用 MariaDB 的 COLUMN_JSON() 功能.正如 this bug 所示,该函数可以正确转义双引号,但不能转义其他字符被编码/转义.

I'm using MariaDB's COLUMN_JSON() function. As this bug illustrates, the function properly escapes double quotes, but not other characters that should be encoded/escaped.

这是一个愚蠢的示例查询,用于演示如何创建 JSON 列.

Here's a silly example query to demonstrate how the JSON column is created.

SELECT CONCAT('[', GROUP_CONCAT(COLUMN_JSON(COLUMN_CREATE(
        'name', `name`,
        'value', `value`
    )) SEPARATOR ','), ']') AS `json`
FROM `settings`

如果 namevalue 包含无效的 JSON 字符,json_decode 将失败.

If the name or value contain invalid JSON characters, json_decode will fail.

我编写了一个 PHP 函数来转义/编码来自查询的值,但似乎应该有更好的方法.

I've written a PHP function to escape/encode the value that comes from the query, but it seems like there should be a better way.

/**
 * Makes sure the JSON values built by COLUMN_JSON() in MariaDB are safe for json_decode()
 * Assumes that double quotes are already escaped
 *
 * @param string $mysql_json
 * @return string
 */
public static function jsonEscape($mysql_json)
{
    $rtn = '';
    for ($i = 0; $i < strlen($mysql_json); ++$i) {
        $char = $mysql_json[$i];
        if (($char === '\') && ($mysql_json[$i + 1] !== '"')) {
            // escape a backslash, but leave escaped double quotes intact
            $rtn .= '\\';
        } elseif (($ord = ord($char)) && ($ord < 32)) {
            // hex encode control characters (below ASCII 32)
            $rtn .= '\u' . str_pad(dechex($ord), 4, '0', STR_PAD_LEFT);
        } else {
            $rtn .= $char;
        }
    }
    return $rtn;
}

像这样逐个字符地检查字符串效果不佳.也许有一个字符串替换或正则表达式会更高效?

Examine the string character-by-character like this doesn't perform well. Perhaps there's a string replacement or regular expression that would be more performant?

推荐答案

根据 Halcyon 的评论,我切换了str_replace() 解决方案,它的表现要好得多!trim(json_encode(13), '"')'\u' . str_pad(dechex(13), 4, '0', STR_PAD_LEFT) 稍微好一点,但它使意图更加清晰.

Based on a comment from Halcyon, I switched to a str_replace() solution, and it performs much better! The performance difference between trim(json_encode(13), '"') and '\u' . str_pad(dechex(13), 4, '0', STR_PAD_LEFT) is just barely better, but it makes the intent more clear.

private static $json_replace_search;
private static $json_replace_replace;

/**
 * Makes sure the JSON values built by GROUP_CONCAT() and COLUMN_JSON() in MariaDB are safe for json_decode()
 * Assumes that double quotes are already escaped
 *
 * @param string $mysql_json
 * @return string
 */
public static function jsonEscape($mysql_json)
{
    if (is_null(self::$json_replace_search)) {
        // initialize
        self::$json_replace_search = [];
        self::$json_replace_replace = [];
        // set up all of the control characters (below ASCII 32)
        for ($i = 0; $i < 32; ++$i) {
            self::$json_replace_search[$i] = chr($i);
            self::$json_replace_replace[$i] = trim(json_encode(self::$json_replace_search[$i]), '"');
        }
    }
    // replace them
    return str_replace(self::$json_replace_search, self::$json_replace_replace, $mysql_json);
}

/**
 *
 * @param string $mysql_json
 * @return mixed
 */
public static function jsonDecode($mysql_json)
{
    return json_decode(self::jsonEscape($mysql_json));
}

相关文章