Java String split removed empty values

2022-01-30 00:00:00 string split java

I am trying to split a value using a separator, but I am getting surprising results:

String data = "5|6|7||8|9||";
String[] split = data.split("\\|");
System.out.println(split.length);

I expected to get 8 values, [5,6,7,EMPTY,8,9,EMPTY,EMPTY], but I am getting only 6.

Any idea why this happens and how to fix it? Wherever an EMPTY value occurs, it should be in the array.

Recommended answer

By default, split(delimiter) removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, for example:

String[] split = data.split("\\|", -1);
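
As a quick check, here is a minimal sketch that runs both calls on the data from the question. The class name SplitTrailingDemo and the two helper methods are made up for illustration; only String.split and Arrays.toString come from the JDK.

```java
import java.util.Arrays;

class SplitTrailingDemo {
    // Hypothetical helper: one-argument split, which drops trailing empty strings.
    static String[] splitDefault(String data) {
        return data.split("\\|");
    }

    // Hypothetical helper: negative limit, which keeps trailing empty strings.
    static String[] splitKeepAll(String data) {
        return data.split("\\|", -1);
    }

    public static void main(String[] args) {
        String data = "5|6|7||8|9||";
        System.out.println(Arrays.toString(splitDefault(data))); // [5, 6, 7, , 8, 9]
        System.out.println(Arrays.toString(splitKeepAll(data))); // [5, 6, 7, , 8, 9, , ]
    }
}
```

Note that the empty string between 7 and 8 survives in both cases; only the trailing ones differ.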

A few more details:
split(regex) internally returns the result of split(regex, 0), and the documentation of that method says (emphasis mine):

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
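
The three cases described in the documentation can be tried side by side. This is only an illustrative sketch: the class SplitLimitDemo and the wrapper splitOnPipe are hypothetical names, and the input is the string from the question.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical wrapper so the cases differ only in the limit argument.
    static String[] splitOnPipe(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "5|6|7||8|9||";

        // limit > 0: the pattern is applied at most limit - 1 times,
        // and the last entry holds the rest of the input unsplit.
        System.out.println(Arrays.toString(splitOnPipe(data, 3))); // [5, 6, 7||8|9||]

        // limit == 0: split as often as possible, discard trailing empty strings.
        System.out.println(splitOnPipe(data, 0).length); // 6

        // limit < 0: split as often as possible, keep everything.
        System.out.println(splitOnPipe(data, -1).length); // 8
    }
}
```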

Exception:

It is worth mentioning that removing trailing empty strings makes sense only if those empty strings were created by the split mechanism. So for "".split(anything), since "" cannot be split any further, the result is the array [""].
This happens because no split took place here, so "", despite being empty and trailing, represents the original string, not an empty string created by the splitting process.
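
To see the difference between an empty original string and empty strings produced by splitting, the sketch below counts the parts for a few inputs. The class SplitEmptyInputDemo and the method countParts are hypothetical names used only for this illustration.

```java
class SplitEmptyInputDemo {
    // Hypothetical helper: number of parts returned by the one-argument split.
    static int countParts(String input) {
        return input.split("\\|").length;
    }

    public static void main(String[] args) {
        // No delimiter found, so the original (empty) string is returned as the only element.
        System.out.println(countParts(""));  // 1

        // One delimiter creates two empty strings; both are trailing and get removed.
        System.out.println(countParts("|")); // 0

        // With a negative limit the two empty strings created by the split are kept.
        System.out.println("|".split("\\|", -1).length); // 2
    }
}
```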
