Lucene 搜索结果按自定义订单列表排序(每个用户唯一)

2022-01-15 00:00:00 algorithm lucene solr sphinx java

我的应用程序中有经过身份验证的用户可以访问包含多达 500,000 个项目的共享数据库.每个用户都有自己的面向公众的网站,并且需要能够优先考虑在他们自己的网站上展示的项目(想想赞成).

I have authenticated users in my application who have access to a shared database of up to 500,000 items. Each of the users has their own public facing web site and needs the ability to prioritize the items on display (think upvote) on their own site.

在 500,000 个项目中,他们最多可能只有 200 个优先项目,其余项目的顺序不太重要.

out of the 500,000 items they may only have up to 200 prioritized items, the order of the rest of the items is of less importance.

每个用户对项目的优先级不同.

Each of the users will prioritize the items differently.

我最初在这里问了一个类似的 mysql 问题 Mysql 结果按每个用户唯一的列表排序 并得到了很好的答案,但我相信更好的选择可能是选择非 sql 索引解决方案.

I initially asked a similar mysql question here Mysql results sorted by list which is unique for each user and got a good answer but i believe a better option may be to opt for a non sql indexed solution.

这可以在 Lucene 中完成吗?是否有另一种搜索技术会更好.

Can this be done in Lucene?, is there another search technology which would be better for this.

ps.Google 对其搜索结果实施了类似的类型设置,如果您已登录,您可以在其中优先考虑和排除您自己的搜索结果.

ps. Google implements a similar type setup with their search results where you can prioritize and exclude your own search results if you are logged in.

更新:在我阅读文档时重新标记为 sphinx,我相信它可能能够通过存储在内存中的每个文档属性值"来完成我正在寻找的事情 - 有兴趣听到对此的任何反馈来自斯芬克斯大师

Update: re-tagged with sphinx as i have been reading the documentation and i believe it may be able to do what i am looking for with "per-document attribute values" stored in memory - interested to hear any feedback on this from sphinx gurus

推荐答案

在构建索引时,您肯定希望将 item 的 id 存储在每个文档对象中.有几种方法可以进行下一步,但一种简单的方法是获取优先项并将它们添加到您的搜索查询中,对于每个特殊项如下所示:

You'll definitely want to store the id of item in each document object when building your index. There's a few ways to do the next step, but an easy one would be take the prioritized items and add them to your search query, something like this for each special item:

"OR item_id=%d+X"

其中 X 是您想要使用的提升量.您可能需要根据经验调整此数字,以确保仅被点赞"不会将其置于搜索完全不相关内容的列表的顶部.

where X is the amount of boost you'd like to use. You'll probably need to empirically tweak this number to make sure that just being "upvoted" doesn't put it to the top of a list searching for something totally unrelated.

这样做至少可以避免很多烦人的后处理步骤,这些步骤需要您遍历整个结果集——希望从查询索引开始就可以进行正确的排序.

Doing it this way will at least prevent you from a lot of annoying postprocessing steps that would require you to iterate over the whole result set -- hopefully the proper sorting will be there right from querying the index.

相关文章