Java URL encoding of query string parameters

2022-01-30 00:00:00 http urlencode encoding url java

Say I have a URL

http://example.com/query?q=

and I have a query entered by the user such as:

random word £500 bank $

I want the result to be a properly encoded URL:

http://example.com/query?q=random%20word%20%A3500%20bank%20%24

What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects, but none of them come out quite right.

Recommended answer

URLEncoder is the way to go. Just keep in mind to encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the query string parameter separator character & or the parameter name-value separator character =.

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
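
With more than one parameter, the same rule applies: encode each name and each value separately, then join them with the literal = and & characters. The following is a minimal sketch of that idea, assuming Java 10 or newer; the class name QueryStringBuilder, the method buildQuery and the second parameter lang are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the JDK.
class QueryStringBuilder {

    // Encodes every name and value on its own; the separators stay unencoded.
    static String buildQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the parameters.
        Map<String, String> params = new LinkedHashMap<>();
        // \u00A3 is the pound sign, escaped so the source file encoding does not matter.
        params.put("q", "random word \u00A3500 bank $");
        params.put("lang", "en&fr"); // an & inside a value must be encoded

        System.out.println("https://example.com/query?" + buildQuery(params));
        // prints https://example.com/query?q=random+word+%C2%A3500+bank+%24&lang=en%26fr
    }
}
```

The & inside the value comes out as %26, while the & that separates the two parameters is left alone, which is exactly why the URL as a whole must not be passed through the encoder.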

If you're not yet on Java 10 or newer, pass StandardCharsets.UTF_8.toString() as the charset argument instead; if you're not yet on Java 7 or newer, pass "UTF-8".
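
On those older versions the charset is passed as a String, and that overload declares the checked UnsupportedEncodingException. A small wrapper keeps call sites tidy. This is only a sketch; the names LegacyUrlEncoding and encodeUtf8 are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for code bases that cannot use encode(String, Charset).
class LegacyUrlEncoding {

    static String encodeUtf8(String value) {
        try {
            // The String overload exists on all Java versions.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8,
            // so this branch should never be reached in practice.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("https://example.com?q=" + encodeUtf8("bank $"));
        // prints https://example.com?q=bank+%24
    }
}
```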

Note that URLEncoder represents spaces in query parameters as +, not %20, which is legitimately valid. %20 is usually used to represent spaces in the URI itself (the part before the URI-query string separator character ?), not in the query string (the part after ?).
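
If a receiving system nevertheless insists on %20, one common workaround is to replace the + signs after encoding. That is safe because a literal + in the input has already been encoded as %2B at that point. A minimal sketch, assuming Java 10 or newer; SpaceEncodingDemo and encodeWithPercent20 are hypothetical names.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SpaceEncodingDemo {

    // After encoding, any remaining '+' can only stand for a space.
    static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String input = "a b+c";

        String form = URLEncoder.encode(input, StandardCharsets.UTF_8);
        System.out.println(form);                       // prints a+b%2Bc
        System.out.println(encodeWithPercent20(input)); // prints a%20b%2Bc

        // URLDecoder turns '+' back into a space and %2B back into '+'.
        System.out.println(URLDecoder.decode(form, StandardCharsets.UTF_8)); // prints a b+c
    }
}
```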

Also note that there are three encode() methods. Besides the one taking a Charset as second argument, there is one without any charset argument and one taking a String as second argument, which throws a checked exception (UnsupportedEncodingException). The one without a charset argument is deprecated. Never use it, and always specify the charset. The javadoc even explicitly recommends using the UTF-8 encoding, as mandated by RFC 3986 and W3C:

All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
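
To make the byte-to-%xy step concrete, the sketch below converts a string to bytes by hand and compares the result with URLEncoder under two charsets. It also suggests why the question's expected URL contains %A3: that is the single ISO-8859-1 byte for £, whereas UTF-8 needs two bytes. The class and method names (PercentBytesDemo, percentBytes) are made up for illustration, and Java 10 or newer is assumed.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PercentBytesDemo {

    // Writes every byte of the encoded string as %XY, including safe characters,
    // so it only matches URLEncoder for characters that URLEncoder treats as unsafe.
    static String percentBytes(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign, escaped to avoid source encoding issues

        System.out.println(percentBytes(pound, StandardCharsets.UTF_8));          // prints %C2%A3
        System.out.println(URLEncoder.encode(pound, StandardCharsets.UTF_8));     // prints %C2%A3
        System.out.println(URLEncoder.encode(pound, StandardCharsets.ISO_8859_1)); // prints %A3
    }
}
```

Which form a server accepts depends on the charset it decodes with, so the two sides have to agree; UTF-8 is the choice the javadoc recommends.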

See also:

  • What every web developer must know about URL encoding
