Using streams to manipulate a String
Let's say that I want to remove all the non-letters from my String.
String s = "abc-de3-2fg";
I can use an IntStream in order to do that:
s.chars().filter(ch -> Character.isLetter(ch)). // But then what?
What can I do in order to convert this stream back to a String instance?
On a different note, why can't I treat a String as a stream of objects of type Character?
String s = "abc-de3-2fg";
// Yields a Stream of char[], therefore doesn't compile
Stream<Character> stream = Stream.of(s.toCharArray());
// Yields a stream with one member - s, which is a String object. Doesn't compile
Stream<Character> stream = Stream.of(s);
According to the javadoc, Stream's creation signature is as follows:
Stream.of(T... values)
The only (lousy) way that I could think of is:
String s = "abc-de3-2fg";
Stream<Character> stream = Stream.of(s.charAt(0), s.charAt(1), s.charAt(2), ...)
And of course, this isn't good enough... What am I missing?
Recommended answer
Here's an answer to the second part of the question. If you have an IntStream resulting from calling string.chars(), you can get a Stream<Character> by casting to char and then boxing the result by calling mapToObj. For example, here's how to turn a String into a Set<Character>:
Set<Character> set = string.chars()
.mapToObj(ch -> (char)ch)
.collect(Collectors.toSet());
Note that casting to char is essential for the boxed result to be Character instead of Integer.
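To make the effect of the cast concrete, here is a minimal sketch that boxes the same IntStream both ways. The class name BoxingDemo and its two helper methods are made up for illustration and are not part of the original answer.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BoxingDemo {
    // With the cast, the lambda returns char, which boxes to Character.
    static List<Character> asCharacters(String s) {
        return s.chars()
                .mapToObj(ch -> (char) ch)
                .collect(Collectors.toList());
    }

    // Without the cast, the lambda returns int, which boxes to Integer.
    static List<Integer> asIntegers(String s) {
        return s.chars()
                .mapToObj(ch -> ch)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(asCharacters("abc")); // [a, b, c]
        System.out.println(asIntegers("abc"));   // [97, 98, 99]
    }
}
```

The only difference between the two methods is the (char) cast, yet the element type of the resulting stream changes from Integer to Character.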
Now the big problem with dealing with char or Character data is that supplementary characters are represented as surrogate pairs of char values, so any algorithm that deals with individual char values will probably fail when presented with supplementary characters.
(It may seem like supplementary characters are an obscure Unicode feature that we don't need to worry about, but most emoji are supplementary characters.)
Consider this example:
string.chars()
.filter(Character::isAlphabetic)
...
This will fail if presented with a string that contains the code point U+1D400 (Mathematical Bold Capital A). That code point is represented as a surrogate pair in the string, and neither value of a surrogate pair is an alphabetic character. To get the correct result, you'd need to do this instead:
string.codePoints()
.filter(Character::isAlphabetic)
...
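The difference between the two pipelines can be checked with a small runnable sketch. The class name CodePointDemo and the counting helpers are hypothetical, chosen only to make the comparison easy to observe.

```java
public class CodePointDemo {
    // Counts alphabetic values when the string is streamed as UTF-16 char values.
    static long alphabeticViaChars(String s) {
        return s.chars().filter(Character::isAlphabetic).count();
    }

    // Counts alphabetic values when the string is streamed as Unicode code points.
    static long alphabeticViaCodePoints(String s) {
        return s.codePoints().filter(Character::isAlphabetic).count();
    }

    public static void main(String[] args) {
        // "x" followed by U+1D400, which occupies two char values (a surrogate pair).
        String s = "x" + new String(Character.toChars(0x1D400));
        System.out.println(alphabeticViaChars(s));      // 1
        System.out.println(alphabeticViaCodePoints(s)); // 2
    }
}
```

With chars(), the two surrogate halves are tested separately and both are rejected, so only "x" is counted. With codePoints(), the pair is seen as the single code point U+1D400 and passes the filter.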
I recommend always using codePoints().
Now, given an IntStream of code points, how can one reassemble it into a String? Sleiman Jneidi's answer is a reasonable one (+1), using the three-arg collect() method of IntStream.
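For readers who don't have that answer at hand, a three-arg collect() over code points might look roughly like the sketch below. The class and method names (CollectDemo, lettersOnly) are assumptions for illustration, and the filter is the letter check from the question.

```java
public class CollectDemo {
    // Keeps only letters and reassembles the surviving code points into a String.
    static String lettersOnly(String s) {
        return s.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,              // supplier: a fresh accumulator
                         StringBuilder::appendCodePoint,  // accumulator: add one code point
                         StringBuilder::append)           // combiner: merge partial results
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("abc-de3-2fg")); // abcdefg
    }
}
```

The third argument is only used when the stream runs in parallel, but IntStream.collect() requires it regardless.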
Here's an alternative:
StringBuilder sb = ... ;
string.codePoints()
.filter(...)
.forEachOrdered(sb::appendCodePoint);
return sb.toString();
This might be a bit more flexible, in cases where you already have a StringBuilder that you're using to accumulate string data. You don't have to create a new StringBuilder each time, nor do you have to convert it to a String afterwards.
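As a sketch of that reuse scenario, the helper below appends the letters of several inputs to one caller-supplied StringBuilder. The names AppendDemo and appendLetters are hypothetical, and the letter filter is borrowed from the question rather than from the snippet above.

```java
public class AppendDemo {
    // Appends the letters of s, code point by code point, to an existing builder.
    static void appendLetters(StringBuilder sb, String s) {
        s.codePoints()
         .filter(Character::isLetter)
         .forEachOrdered(sb::appendCodePoint);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("x:");
        appendLetters(sb, "abc-de3-2fg");
        appendLetters(sb, "4h");
        System.out.println(sb); // x:abcdefgh
    }
}
```

forEachOrdered() is used rather than forEach() so that the code points are appended in their original order even if the stream were made parallel.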