Collecting into a HashSet / Java 8 / Regex Pattern / Stream API

2022-01-22 00:00:00 regex collections java-8 java java-stream

I recently switched my project from JDK 7 to JDK 8, and I am now rewriting some code snippets to use the new features that came with Java 8.

final Matcher mtr = Pattern.compile(regex).matcher(input);

HashSet<String> set = new HashSet<String>() {{
    while (mtr.find()) add(mtr.group().toLowerCase());
}};

How can I write this code using the Stream API?

Recommended answer

A Matcher-based spliterator implementation can be quite simple if you reuse the JDK-provided Spliterators.AbstractSpliterator:

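import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;

public class MatcherSpliterator extends AbstractSpliterator<String[]>
{
  private final Matcher m;

  public MatcherSpliterator(Matcher m) {
    // The number of matches is unknown up front, so report Long.MAX_VALUE as the size estimate.
    super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
    this.m = m;
  }

  @Override public boolean tryAdvance(Consumer<? super String[]> action) {
    if (!m.find()) return false;
    // Index 0 holds the full match; indices 1..groupCount() hold the capturing groups.
    final String[] groups = new String[m.groupCount()+1];
    for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
    action.accept(groups);
    return true;
  }
}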

Note that the spliterator provides all matcher groups, not just the full match. Also note that this spliterator can be used with parallel streams, because AbstractSpliterator supplies a default trySplit implementation that batches elements into arrays.

Typically you will use a convenience stream factory:

public static Stream<String[]> matcherStream(Matcher m) {
  return StreamSupport.stream(new MatcherSpliterator(m), false);
}

This gives you a powerful basis for concisely writing all kinds of complex regex-oriented logic, for example:

// Requires: import static java.util.stream.Collectors.joining;
private static final Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");
public static void main(String[] args) {
  final String emails = "kid@gmail.com, stray@yahoo.com, miks@tijuana.com";
  System.out.println("User has e-mail accounts on these domains: " +
      matcherStream(emailRegex.matcher(emails))
      .map(gs->gs[2])
      .collect(joining(", ")));
}

which prints

User has e-mail accounts on these domains: gmail.com, yahoo.com, tijuana.com

For completeness, your code would be rewritten as

// Requires: import static java.util.stream.Collectors.toSet;
Set<String> set = matcherStream(mtr).map(gs->gs[0].toLowerCase()).collect(toSet());
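To see the pieces working together, here is a minimal self-contained sketch that combines the MatcherSpliterator and matcherStream shown above into one compilable file. The class name MatcherStreamDemo and the helper lowerCaseMatches are hypothetical names introduced only for this illustration. Since Collectors.toSet() makes no guarantee about the concrete Set type, the sketch uses Collectors.toCollection(HashSet::new) on the assumption that a HashSet is specifically wanted, as in the question.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical wrapper class, used only to make the example compile on its own.
final class MatcherStreamDemo {

  // Same spliterator as in the answer: emits one String[] of groups per match.
  static final class MatcherSpliterator extends AbstractSpliterator<String[]> {
    private final Matcher m;

    MatcherSpliterator(Matcher m) {
      super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
      this.m = m;
    }

    @Override public boolean tryAdvance(Consumer<? super String[]> action) {
      if (!m.find()) return false;
      String[] groups = new String[m.groupCount() + 1];
      for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
      action.accept(groups);
      return true;
    }
  }

  // Convenience factory from the answer: a sequential stream over the matches.
  static Stream<String[]> matcherStream(Matcher m) {
    return StreamSupport.stream(new MatcherSpliterator(m), false);
  }

  // Hypothetical helper: lower-cased full matches, collected into a HashSet.
  static Set<String> lowerCaseMatches(String regex, String input) {
    return matcherStream(Pattern.compile(regex).matcher(input))
        .map(gs -> gs[0].toLowerCase())                  // gs[0] is the full match
        .collect(Collectors.toCollection(HashSet::new)); // explicit HashSet
  }

  public static void main(String[] args) {
    // "Foo" and "FOO" collapse into a single lower-cased entry.
    System.out.println(lowerCaseMatches("[A-Za-z]+", "Foo bar FOO baz"));
  }
}
```

Because a HashSet has no defined iteration order, the order in which main prints the elements may vary. If the concrete Set type does not matter, toSet() as in the one-liner above is enough.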
