Cannot delete documents with Lucene IndexWriter.deleteDocuments(term)
I have been struggling with this for two days now; I just can't delete the document with indexWriter.deleteDocuments(term).
Below is the test code. Hopefully someone can point out what I have done wrong. Things I have already tried:
- Updating the Lucene version from 2.x to 5.x
- Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
- Trying the indexOption configured as NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
Here is the code:
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.io.IOException;
import java.nio.file.Paths;
public class TestSearch {

    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();
    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for (ScoreDoc scdoc : scoreDoc) {
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
        ireader.close();
    }

    public static void delete(String id) {
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
First, generateIndex() generates an index in /tmp/test/lucene, and query() shows that the id is queried successfully. delete() is then supposed to delete that document, but calling query() again proves that the delete failed.
Here are the pom dependencies, in case someone needs them to test:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.5.4</version>
    <type>jar</type>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>5.5.4</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>5.5.4</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>5.5.4</version>
</dependency>
Desperate for an answer.
Recommended Answer
Your problem is in the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, are more typical choices), so the value you are indexing gets split into the tokens "b", "a", "b", "d", "f". The delete method you've defined doesn't pass through the analyzer, though, but rather just creates a raw term. You can see this in action if you try replacing your main with this:
generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");
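To see why the raw term never matches, here is a plain-Java sketch (no Lucene on the classpath, class and method names are illustrative) of roughly what SimpleAnalyzer's letter-based tokenization does: it keeps only maximal runs of letters, lowercased, and discards everything else.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenizerSketch {

    // Approximates SimpleAnalyzer: emit maximal runs of letters, lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // non-letter ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The digits are dropped, so only the letter runs survive.
        System.out.println(tokenize("5836962b0293a47b09d345f1"));
        // prints [b, a, b, d, f]
    }
}
```

So the index contains the terms b, a, d, f for the "_id" field, while your delete looks for the single term 5836962b0293a47b09d345f1, which was never indexed.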
As a general rule, queries, terms, and the like are not analyzed; QueryParser is what analyzes.
For (what looks like) an identifier field, you probably don't want to analyze this field at all. In that case, add this to the FieldType:
fieldType.setTokenized(false);
You will then have to change your query (again, QueryParser analyzes) and use a TermQuery instead.
Query query = new TermQuery(new Term("_id", id));
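Putting both fixes together, here is a minimal end-to-end sketch against the Lucene 5.5 API (the class name, runDemo helper, and the in-memory RAMDirectory are illustrative, not from the question): index the id untokenized, query and delete it with the same exact Term.

```java
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByIdDemo {

    // Indexes one document with an untokenized "_id", then deletes it by term.
    // Returns {hits before delete, hits after delete}.
    static int[] runDemo(String id) throws Exception {
        Directory dir = new RAMDirectory(); // in-memory index, demo only

        FieldType ft = new FieldType();
        ft.setStored(true);
        ft.setTokenized(false); // index the id as one exact term
        ft.setIndexOptions(IndexOptions.DOCS);
        ft.freeze();

        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new SimpleAnalyzer()));
        Document doc = new Document();
        doc.add(new Field("_id", id, ft));
        writer.addDocument(doc);
        writer.commit();

        TermQuery query = new TermQuery(new Term("_id", id));
        int before = countHits(dir, query);

        writer.deleteDocuments(new Term("_id", id)); // now matches the indexed term exactly
        writer.commit();
        writer.close();

        int after = countHits(dir, query);
        return new int[] { before, after };
    }

    static int countHits(Directory dir, TermQuery query) throws Exception {
        try (IndexReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).search(query, 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = runDemo("5836962b0293a47b09d345f1");
        System.out.println("before=" + counts[0] + " after=" + counts[1]);
    }
}
```

Note that StringField (stored or unstored) is a ready-made field type that is already indexed but not tokenized, so `new StringField("_id", id, Field.Store.YES)` would achieve the same thing without building a custom FieldType.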