Why does Lucene not support any type of update to an existing document?
My use case involves indexing a Lucene document and then, on multiple later occasions, adding terms that point to this existing document, without deleting and re-adding the entire document for each new term (both for performance reasons and because I do not keep the original terms).
I do know that a document cannot be truly updated. My question is why?
Or, more precisely, why is no form of update (terms, stored fields) supported?
Why is it not possible to add another term that points to an existing document? Technically, isn't all that's needed to place the existing doc ID in the term's posting list? Why is that hard? Are there some immutable statistics that get in the way?
Are there any workarounds that support my use case of adding a term (an indexed field) to an existing doc?
Recommended answer
I do know that a document cannot be truly updated. My question is why?
Gili, editing a document changes the postings of the related terms, and this is problematic because of the structure of a term's posting list. The posting list is sorted by doc ID and stored sequentially in memory. Thus, to add a document to a term's posting list, you have to give it a higher doc ID than every entry already in the list;
this is done by deleting and re-indexing the entire document.
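To make the constraint above concrete, here is a minimal sketch of a single term's posting list, assuming the doc IDs are kept in ascending order and stored as gaps between consecutive IDs. The class ToyPostingList and its methods are hypothetical names made up for illustration; this is a toy model, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one term's posting list (hypothetical, not Lucene code). */
final class ToyPostingList {
    // Each entry is the gap to the previous doc ID; the first entry is the doc ID itself.
    private final List<Integer> gaps = new ArrayList<>();
    private int lastDocId = 0;

    /** Appending is cheap, but only for a doc ID above every ID already stored. */
    void append(int docId) {
        if (docId < 0 || (!gaps.isEmpty() && docId <= lastDocId)) {
            throw new IllegalArgumentException(
                "doc " + docId + " is not greater than last doc " + lastDocId
                + "; the encoded list would have to be rewritten");
        }
        gaps.add(docId - lastDocId);
        lastDocId = docId;
    }

    /** Returns a copy of the stored (gap-encoded) form. */
    List<Integer> gaps() {
        return new ArrayList<>(gaps);
    }

    /** Decodes the gaps back into absolute doc IDs with a running sum. */
    List<Integer> docIds() {
        List<Integer> ids = new ArrayList<>();
        int current = 0;
        for (int gap : gaps) {
            current += gap;
            ids.add(current);
        }
        return ids;
    }

    public static void main(String[] args) {
        ToyPostingList postings = new ToyPostingList();
        postings.append(3);
        postings.append(7);
        postings.append(12);
        System.out.println(postings.gaps());   // stored form: [3, 4, 5]
        System.out.println(postings.docIds()); // decoded form: [3, 7, 12]
        try {
            postings.append(5); // an already indexed document with a lower ID
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, pointing a term at an already indexed document (doc 5 here) would mean inserting into the middle of the encoded sequence and re-encoding everything after it, whereas a re-added document receives a fresh, higher ID and can simply be appended. In Lucene itself, the delete-then-re-add step is what IndexWriter.updateDocument performs.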