Accumulate a Java Stream and only then process it

2022-01-22 00:00:00 reduce java-8 java java-stream collectors

I have a document that looks like this:

data.txt

100, "some text"
101, "more text"
102, "even more text"

I processed it using a regex and returned a new, processed document, as follows:

    Stream<String> lines = Files.lines(Paths.get("data.txt"));
    // One to three digits, a comma, then the rest of the line
    Pattern regex = Pattern.compile("(\\d{1,3}),(.*)");
    List<MyClass> result = lines.map(regex::matcher)
        .filter(Matcher::find)
        // MyClass(int id, String text)
        .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
        .collect(Collectors.toList());

This returns a list of processed MyClass objects. It can run in parallel, and everything works.

The problem is that I now have this:

data2.txt

101, "some text
the text continues in the next line
and maybe in the next"
102, "for a random
number of lines"
103, "until the new pattern of new id comma appears"

So I somehow need to join the lines being read from the stream until a new match appears. (Something like a buffer?)

I tried to collect the strings and then collect MyClass objects, but with no success, because I cannot actually split streams.

Reduce comes to mind for concatenating lines, but it would only concatenate them; I cannot reduce and also generate a new stream of lines.

Any ideas on how to solve this with Java 8 streams?

Recommended answer

This is a job for java.util.Scanner. With Java 9 or later, you can write:

    List<MyClass> result;
    try (Scanner s = new Scanner(Paths.get("data.txt"))) {
        // id, comma, optional whitespace, then a quoted text that may span lines
        result = s.findAll("(\\d{1,3}),\\s*\"([^\"]*)\"")
            // MyClass(int id, String text)
            .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
            .collect(Collectors.toList());
    }
    result.forEach(System.out::println);
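The reason this pattern copes with records that continue over several lines is that `[^"]*` is a negated character class, so it also matches line breaks and keeps consuming until the closing quote. Here is a minimal sketch of that behaviour, using plain Pattern and Matcher on an in-memory string so that it runs on any Java version. The class name RecordPatternDemo, the method extract and the sample input are assumptions made for illustration, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordPatternDemo {
    // Same pattern as in the answer: id, comma, optional whitespace, quoted text.
    static final Pattern RECORD = Pattern.compile("(\\d{1,3}),\\s*\"([^\"]*)\"");

    // Hypothetical helper: returns one "id|text" string per record found.
    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // The first record spans two lines; the second fits on one line.
        String input = "101, \"some text\nthe text continues\"\n102, \"one line\"";
        extract(input).forEach(System.out::println);
    }
}
```

Note that the captured text keeps its embedded line breaks; whether to replace them with spaces afterwards depends on what MyClass is supposed to hold.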

Since the Stream-producing findAll does not exist under Java 8, a helper method is needed there:

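    import java.util.Scanner;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.regex.MatchResult;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    // Returns a sequential stream of the successive matches found by the Scanner.
    private static Stream<MatchResult> matches(Scanner s, String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(
                    1000, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    // Horizon 0 means the search is not bounded; null means no further match.
                    if (s.findWithinHorizon(compiled, 0) == null) return false;
                    action.accept(s.match());
                    return true;
                }
            }, false);
    }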

Replacing findAll with this helper method, we get:

    List<MyClass> result;
    try (Scanner s = new Scanner(Paths.get("data.txt"))) {
        result = matches(s, "(\\d{1,3}),\\s*\"([^\"]*)\"")
            // MyClass(int id, String text)
            .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
            .collect(Collectors.toList());
    }
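To try the Java 8 variant without a data file, the pieces can be put together roughly as follows. This is only a sketch under stated assumptions: the question does not show MyClass, so the nested MyClass here is a hypothetical stand-in with the described (int id, String text) constructor, and the class name ScannerMatchesDemo, the parse method and the in-memory sample input are likewise invented for illustration.

```java
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ScannerMatchesDemo {
    // Hypothetical stand-in for the question's MyClass(int id, String text).
    static final class MyClass {
        final int id;
        final String text;

        MyClass(int id, String text) {
            this.id = id;
            this.text = text;
        }

        @Override
        public String toString() {
            return id + ": " + text;
        }
    }

    // Helper from the answer: a sequential stream over the Scanner's matches.
    static Stream<MatchResult> matches(Scanner s, String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(
                    1000, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    // Stop the stream once the Scanner finds no further match.
                    if (s.findWithinHorizon(compiled, 0) == null) return false;
                    action.accept(s.match());
                    return true;
                }
            }, false);
    }

    // Parses all records from an in-memory string instead of a file.
    static List<MyClass> parse(String input) {
        try (Scanner s = new Scanner(input)) {
            return matches(s, "(\\d{1,3}),\\s*\"([^\"]*)\"")
                .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2)))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String input = "101, \"some text\nthe text continues\"\n"
                     + "102, \"for a random\nnumber of lines\"\n"
                     + "103, \"until the new id appears\"";
        parse(input).forEach(System.out::println);
    }
}
```

The stream is consumed inside the try block, before the Scanner is closed; because the helper pulls matches lazily, collecting after the Scanner has been closed would fail.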
