Solr facet on multiple words with comma-separated values

2021-12-30 00:00:00 solr mysql dataimporthandler

I'm pulling data into Solr from MySQL. One of the fields is generated using a group_concat function, which produces a comma-separated field listing all the bands for an event. At the time I believed this was the best way to store multiple bands for one event. However, I'm finding that I cannot facet on this field across all events.

I've set the bands field to type string and multiValued to true.

<field name="bands" type="string" indexed="true" stored="true" multiValued="true"/>

As expected, each comma-separated string is faceted as one long value.

"Pearl Jam,Alice,Screaming Trees,Everclear",1, "Primus,Gaga,Bacon Bits",1, "Roosters,Wings,Drumsticks,Tail Feathers",1,

The biggest problem with this approach is that when the field type is string, the field does not appear to be searchable. It seems I need a duplicate field of type text_general for searching and a separate one for faceting. Is that right?

Is there a way to declare a delimiter for the band field to facet this properly, or is my approach wrong?

Recommended answer

Tokenizing your field will not solve your facet problem. You will be able to search by a single band name and get results, but the facets will be even worse, because each token becomes its own facet value. The basic rule is not to apply any tokenization or text analysis to a field used for faceting.
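The two-field arrangement raised in the question could look roughly like the sketch below. The `bands` field is the one from the question; `bands_text` is a hypothetical name chosen here for illustration, and `text_general` is assumed to be defined in your schema as it is in the stock Solr examples.

```xml
<!-- Untokenized field: use this one for faceting -->
<field name="bands" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Hypothetical analyzed copy: use this one for full-text search -->
<field name="bands_text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy every value of bands into the analyzed field at index time -->
<copyField source="bands" dest="bands_text"/>
```

With this layout you would facet on `bands` and query against `bands_text`.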

Using a multiValued field is the right idea, but you are actually putting a single value containing the whole list of bands into it. Your query returns that list as a single column, which is mapped to a single value of the corresponding field in Solr.

You can keep the group_concat output and solve your problem with a simple change to your data-config.xml that tells Solr to split those band names on a separator. Have a look at the RegexTransformer and its splitBy parameter:

splitBy : Used to split a String to obtain multiple values, returns a list of values

If you configure splitBy with the same separator you're using for group_concat, the trick is done: the field will hold multiple values, one per band, and your facets will look right.
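A minimal sketch of such a data-config.xml follows. The table names (`events`, `bands`), the column names, and the connection details are assumptions made up for illustration; only the `transformer="RegexTransformer"` attribute and the `splitBy` attribute on the field are the parts this answer relies on.

```xml
<dataConfig>
  <!-- Connection details are placeholders; substitute your own -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr"
              password="secret"/>
  <document>
    <!-- RegexTransformer must be enabled on the entity for splitBy to take effect -->
    <entity name="event"
            transformer="RegexTransformer"
            query="SELECT e.id, GROUP_CONCAT(b.name) AS bands
                   FROM events e JOIN bands b ON b.event_id = e.id
                   GROUP BY e.id">
      <field column="id" name="id"/>
      <!-- Split the concatenated column on the same separator GROUP_CONCAT uses (comma by default) -->
      <field column="bands" name="bands" splitBy=","/>
    </entity>
  </document>
</dataConfig>
```

Note that splitBy is treated as a regular expression. If band names can themselves contain commas, one option is to pick another separator in the SQL (for example `GROUP_CONCAT(b.name SEPARATOR '|')`) and escape it in the config (`splitBy="\|"`).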
