ElasticSearch: return only documents with distinct values

2022-01-13 00:00:00 elasticsearch nosql aggregate java spring-data-elasticsearch

Let's say I have this data:

{ "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }, { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }, { "name" : "GEORGE", "favorite_cars" : [ "honda","Hyundae" ] }

When I search this data for people whose favorite car is toyota, it returns:

{ "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }, { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }

The result is two records with the name ABC. How do I select only distinct documents? The result I want is just this:

{ "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }

This is my query:

{ "fuzzy_like_this_field" : { "favorite_cars" : { "like_text" : "toyota", "max_query_terms" : 12 } } }

I am using ElasticSearch 1.0.0 with the Java API client.

Recommended answer

You can eliminate duplicates using aggregations. With a terms aggregation, the results are grouped by one field, e.g. name. Each group (bucket) also carries a count of the occurrences of that value, and the buckets are sorted by this count in descending order.

{ "query": { "fuzzy_like_this_field": { "favorite_cars": { "like_text": "toyota", "max_query_terms": 12 } } }, "aggs": { "grouped_by_name": { "terms": { "field": "name", "size": 0 } } } }

In addition to the hits, the result also contains the buckets, with the unique value in key and the count in doc_count. The key is shown lowercased (abc rather than ABC), which is what happens when name is an analyzed string field:

{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 0.19178301, "hits" : [ { "_index" : "pru", "_type" : "pru", "_id" : "vGkoVV5cR8SN3lvbWzLaFQ", "_score" : 0.19178301, "_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]} }, { "_index" : "pru", "_type" : "pru", "_id" : "IdEbAcI6TM6oCVxCI_3fug", "_score" : 0.19178301, "_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]} } ] }, "aggregations" : { "grouped_by_name" : { "buckets" : [ { "key" : "abc", "doc_count" : 2 } ] } } }
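To make the shape of that response concrete, here is a minimal sketch of pulling the distinct values out of the buckets. It is written in Python purely for illustration and works on a trimmed copy of the response above instead of calling a cluster. The helper name `distinct_values` is hypothetical and not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the response above: only the fields this sketch reads.
response_text = """
{
  "hits": {"total": 2},
  "aggregations": {
    "grouped_by_name": {
      "buckets": [{"key": "abc", "doc_count": 2}]
    }
  }
}
"""


def distinct_values(response, agg_name):
    """Return (key, doc_count) pairs from a terms aggregation in a search response.

    Hypothetical helper: assumes the response is already parsed into a dict.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


response = json.loads(response_text)
names = distinct_values(response, "grouped_by_name")
print(names)  # [('abc', 2)]
```

Note that the hits array still holds both ABC documents. The aggregation reports the distinct values alongside the hits and does not filter them.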

Note that using aggregations can be costly because of duplicate elimination and result sorting.
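If the result set is small and you need one full document per name, not just the distinct keys, one possible alternative is to drop duplicate hits on the client. This is a different technique from the aggregation in the answer, sketched here in Python under the assumption that the hits are already parsed into dicts and that the field compared holds a single scalar value. The helper name `dedupe_hits` is hypothetical.

```python
def dedupe_hits(hits, field):
    """Keep the first hit for each distinct value of _source[field], preserving order.

    Hypothetical helper: assumes _source[field] is a hashable scalar such as a string.
    """
    seen = set()
    unique = []
    for hit in hits:
        value = hit["_source"].get(field)
        if value not in seen:
            seen.add(value)
            unique.append(hit)
    return unique


# The two hits from the response above, reduced to the fields used here.
hits = [
    {"_id": "vGkoVV5cR8SN3lvbWzLaFQ",
     "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
    {"_id": "IdEbAcI6TM6oCVxCI_3fug",
     "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
]

unique_hits = dedupe_hits(hits, "name")
print([hit["_source"] for hit in unique_hits])
# [{'name': 'ABC', 'favorite_cars': ['ferrari', 'toyota']}]
```

This only deduplicates the hits that were actually returned, so it does not give a correct distinct set across pages of a large result.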
