Extract duplicate objects from a List in Java 8

2022-01-10 00:00:00 list duplicates java-8 java java-stream

This code removes duplicates from the original list, but I want to extract the duplicates from the original list instead of removing them (the package name is just part of another project):

Given:

A Person POJO:

package at.mavila.learn.kafka.kafkaexercises;
import org.apache.commons.lang3.builder.ToStringBuilder;
public class Person {
    private final Long id;
    private final String firstName;
    private final String secondName;
    private Person(final Builder builder) {
        this.id = builder.id;
        this.firstName = builder.firstName;
        this.secondName = builder.secondName;
    }
    public Long getId() {
        return id;
    }
    public String getFirstName() {
        return firstName;
    }
    public String getSecondName() {
        return secondName;
    }
    public static class Builder {
        private Long id;
        private String firstName;
        private String secondName;
        public Builder id(final Long builder) {
            this.id = builder;
            return this;
        }
        public Builder firstName(final String first) {
            this.firstName = first;
            return this;
        }
        public Builder secondName(final String second) {
            this.secondName = second;
            return this;
        }
        public Person build() {
            return new Person(this);
        }
    }
    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", id)
                .append("firstName", firstName)
                .append("secondName", secondName)
                .toString();
    }
}

The duplicate-extraction code.

Notice that here we filter by id and first name to retrieve a new list. I saw this code somewhere else; it is not mine:

package at.mavila.learn.kafka.kafkaexercises;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import static java.util.Objects.isNull;
public final class DuplicatePersonFilter {
    private DuplicatePersonFilter() {
        //No instances of this class
    }
    public static List<Person> getDuplicates(final List<Person> personList) {
        return personList
                .stream()
                .filter(duplicateByKey(Person::getId))
                .filter(duplicateByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }
    private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        Map<Object,Boolean> seen = new ConcurrentHashMap<>();
        return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
    }
}

The test code: if you run this test case, you will get [alex, lolita, elpidio, romualdo].

Instead, I would expect to get [romualdo, otroRomualdo] as the extracted duplicates, given the id and firstName:

package at.mavila.learn.kafka.kafkaexercises;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import static org.junit.Assert.*;
public class DuplicatePersonFilterTest {
    private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);
    @Test
    public void testList(){
        Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
        Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
        Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
        Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
        Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();
        List<Person> personList = new ArrayList<>();
        personList.add(alex);
        personList.add(lolita);
        personList.add(elpidio);
        personList.add(romualdo);
        personList.add(otroRomualdo);
        final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
        LOGGER.info("Duplicates: {}",duplicates);
    }
}
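The [alex, lolita, elpidio, romualdo] result is what the borrowed predicate is built to do: putIfAbsent returns null only the first time a key is seen, so duplicateByKey keeps first occurrences, which makes it a distinct-by-key filter. The sketch below reproduces this with a hypothetical, simplified Person (plain constructor, no builder or commons-lang) and adds an inverted predicate, laterOccurrenceByKey (also a hypothetical name), that keeps only the second and later occurrences. Even inverted, a single stateful filter cannot return the first romualdo, because it cannot know yet that the element will be repeated, which is why a grouping pass is needed to return both.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class DistinctByKeyDemo {

    // Hypothetical, simplified stand-in for the question's Person (no builder).
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }

        @Override
        public String toString() { return firstName + "/" + secondName; }
    }

    // Same logic as the question's duplicateByKey: putIfAbsent returns null only
    // the FIRST time a key is seen, so this predicate keeps first occurrences.
    static <T> Predicate<T> firstOccurrenceByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Inverted predicate: Set.add returns false for a key already present, so this
    // keeps only the SECOND and later occurrences (stateful: sequential streams only).
    static <T> Predicate<T> laterOccurrenceByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = new HashSet<>();
        return t -> !seen.add(keyExtractor.apply(t));
    }

    // The question's pipeline: distinct by id, then distinct by firstName.
    static List<Person> distinctByIdThenFirstName(List<Person> people) {
        return people.stream()
                .filter(firstOccurrenceByKey(Person::getId))
                .filter(firstOccurrenceByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }

    // Later occurrences of the combined (id, firstName) key.
    static List<Person> laterOccurrences(List<Person> people) {
        return people.stream()
                .filter(laterOccurrenceByKey(p -> Arrays.asList(p.getId(), p.getFirstName())))
                .collect(Collectors.toList());
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
    }

    public static void main(String[] args) {
        List<Person> people = sample();
        System.out.println(distinctByIdThenFirstName(people)); // [alex/salgado, lolita/llanero, elpidio/ramirez, romualdo/gomez]
        System.out.println(laterOccurrences(people));          // [romualdo/perez]
    }
}
```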

At work I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but that approach created a list, filtered it, and then passed the filter again to a newly created list. The code looks bloated (and is probably inefficient).
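The question does not show that Comparator/TreeMap code, so the following is only a guess at a single-pass version of it: Collectors.groupingBy has an overload that accepts a map factory, so a TreeMap built with a Comparator on id and then firstName can do the grouping directly, and every group with more than one member holds duplicates. The simplified Person and the names TreeMapDuplicates and extractDuplicates are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TreeMapDuplicates {

    // Hypothetical, simplified Person; it needs no equals/hashCode because
    // the TreeMap decides key equality with the Comparator below.
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }

        @Override
        public String toString() { return firstName + "/" + secondName; }
    }

    // Two persons land in the same group when the comparator returns 0,
    // i.e. when both id and firstName match (null fields are not handled).
    static final Comparator<Person> BY_ID_THEN_FIRST_NAME =
            Comparator.comparing(Person::getId).thenComparing(Person::getFirstName);

    // One pass: group into a comparator-keyed TreeMap, then keep groups of size > 1.
    static List<Person> extractDuplicates(List<Person> people) {
        TreeMap<Person, List<Person>> groups = people.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        () -> new TreeMap<Person, List<Person>>(BY_ID_THEN_FIRST_NAME),
                        Collectors.toList()));
        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
    }

    public static void main(String[] args) {
        System.out.println(extractDuplicates(sample())); // [romualdo/gomez, romualdo/perez]
    }
}
```

Each group list preserves encounter order, so the original instances come back in the order they appeared in the input.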

Does anyone have a better idea of how to extract duplicates rather than remove them?

Thanks in advance.

Update

Thanks everyone for the answers.

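To get the duplicates with the same grouping approach, using uniqueAttributes as the key (despite its name, removeDuplicates returns every person whose key occurs more than once):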

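// Imports this fragment needs (besides Person):
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
import static java.util.stream.Collectors.groupingBy;
// Methods inside DuplicatePersonFilter:
public static List<Person> removeDuplicates(List<Person> personList) {
    // Keep only groups with more than one member, then flatten them back into one list.
    return getDuplicatesMap(personList).values().stream()
            .filter(duplicates -> duplicates.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
}
// Groups persons by their combined id + firstName key.
private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(DuplicatePersonFilter::uniqueAttributes));
}
private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    // The separator keeps e.g. (1, "1x") and (11, "x") from producing the same key "11x".
    return person.getId() + "|" + person.getFirstName();
}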
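A self-contained variant of the same grouping idea, sketched under the assumption that the duplicate key is id plus firstName: a List built with Arrays.asList compares element by element, so it needs neither string concatenation nor equals/hashCode on Person, and a LinkedHashMap factory keeps the groups in first-appearance order. GroupingDuplicates, idAndFirstName, extractDuplicates and the simplified Person are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupingDuplicates {

    // Hypothetical, simplified Person without equals/hashCode.
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }

        @Override
        public String toString() { return firstName + "/" + secondName; }
    }

    // Composite key: List.equals/hashCode compare element by element,
    // so (1, "1x") and (11, "x") stay distinct without any separator.
    static List<Object> idAndFirstName(Person person) {
        return Arrays.asList(person.getId(), person.getFirstName());
    }

    // LinkedHashMap keeps groups in first-appearance order; groups of size > 1 are the duplicates.
    static List<Person> extractDuplicates(List<Person> people) {
        Map<List<Object>, List<Person>> groups = people.stream()
                .collect(Collectors.groupingBy(
                        GroupingDuplicates::idAndFirstName,
                        LinkedHashMap::new,
                        Collectors.toList()));
        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
    }

    public static void main(String[] args) {
        System.out.println(extractDuplicates(sample())); // [romualdo/gomez, romualdo/perez]
        System.out.println(extractDuplicates(Arrays.asList(
                new Person(1L, "1x", "a"), new Person(11L, "x", "b")))); // []
    }
}
```

The second call in main shows why a structured key helps: with plain concatenation of id and firstName, those two persons would both produce the key "11x" and be reported as duplicates.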

Update 2

The answer provided by @brett-ryan is also correct:

// Needs static imports of Collections.nCopies and Collectors.toList, and equals/hashCode
// on Person (without them every person is its own group and the result is empty).
public static List<Person> extractDuplicatesWithIdentityCountingV2(final List<Person> personList) {
    List<Person> duplicates = personList.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            .filter(n -> n.getValue() > 1)
            .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
            .collect(toList());
    return duplicates;
}

Edit

The code above can be found at:

https://gitlab.com/totopoloco/marco_utilities/-/tree/master/duplicates_exercises

See:

Usage: https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/test/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilterTest.java

Implementation: https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/main/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilter.java

Recommended answer

If you can implement equals and hashCode on Person, you can then use groupingBy with a counting() downstream collector to get the distinct elements that have been duplicated.

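// Static imports: Function.identity, Collectors.groupingBy, Collectors.counting, Collectors.toList
List<Person> duplicates = personList.stream()
        .collect(groupingBy(identity(), counting()))
        .entrySet().stream()
        .filter(n -> n.getValue() > 1)
        .map(n -> n.getKey())
        .collect(toList());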
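A minimal sketch of that precondition, assuming duplicates mean the same id and firstName as in the question: the hypothetical Person below overrides equals and hashCode on those two fields only, and with that in place the counting pipeline returns one representative per duplicated group. Without such overrides, Object.equals compares references, so each element forms its own group with a count of 1 and the result is empty. CountingDuplicates and distinctDuplicates are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CountingDuplicates {

    // Hypothetical Person whose equality is id + firstName (secondName is ignored),
    // matching the duplicate criterion used in the question.
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Person)) {
                return false;
            }
            Person other = (Person) o;
            return Objects.equals(id, other.id) && Objects.equals(firstName, other.firstName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, firstName);
        }

        @Override
        public String toString() { return firstName + "/" + secondName; }
    }

    // Count equal elements, keep those seen more than once, return one key per group.
    static List<Person> distinctDuplicates(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
    }

    public static void main(String[] args) {
        System.out.println(distinctDuplicates(sample())); // [romualdo/gomez]
    }
}
```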

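If you would like to keep every repeated element rather than one per group, you can expand each entry back out using Collections.nCopies; this also keeps repeated elements next to each other in the result. Note that nCopies repeats the map key, so the copies are all the same instance (the first one encountered) rather than the distinct objects that compared equal.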

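// Static imports: Collections.nCopies, Function.identity, Collectors.groupingBy, Collectors.counting, Collectors.toList
List<Person> duplicates = personList.stream()
        .collect(groupingBy(identity(), counting()))
        .entrySet().stream()
        .filter(n -> n.getValue() > 1)
        .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
        .collect(toList());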
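To see what the nCopies expansion actually contains, this sketch (same hypothetical id + firstName equals/hashCode) puts it next to a toList() downstream collector. nCopies repeats the single key instance, whereas collecting each group keeps the real objects, which here yields romualdo and otroRomualdo as the question expected. ExpandedDuplicates, withNCopies and withInstances are hypothetical names, and the LinkedHashMap factory only makes group order follow first appearance.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ExpandedDuplicates {

    // Hypothetical Person with equals/hashCode on id + firstName only.
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Person)) {
                return false;
            }
            Person other = (Person) o;
            return Objects.equals(id, other.id) && Objects.equals(firstName, other.firstName);
        }

        @Override
        public int hashCode() { return Objects.hash(id, firstName); }

        @Override
        public String toString() { return firstName + "/" + secondName; }
    }

    // The answer's expansion: nCopies repeats the map key, i.e. ONE instance per group.
    static List<Person> withNCopies(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .flatMap(e -> Collections.nCopies(e.getValue().intValue(), e.getKey()).stream())
                .collect(Collectors.toList());
    }

    // Alternative: collect the actual instances of each group instead of counting them.
    static List<Person> withInstances(List<Person> people) {
        Map<Person, List<Person>> groups = people.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.toList()));
        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    static List<Person> sample() {
        return Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
    }

    public static void main(String[] args) {
        System.out.println(withNCopies(sample()));   // [romualdo/gomez, romualdo/gomez]
        System.out.println(withInstances(sample())); // [romualdo/gomez, romualdo/perez]
    }
}
```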
