Lucene: getting the query terms that matched a document
What is the best way to find out which terms in a query matched a given document returned as a hit in Lucene?
I have tried a rather hacky method using the hit-highlighting package in Lucene contrib, and also a method that searches for every word in the query against the top-most document (docId:xy AND description:each_word_in_query).
Neither gives satisfactory results. Hit highlighting does not report some of the words that matched for documents other than the first one, and I'm not sure whether the second approach is the best alternative.
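For reference, the second approach from the question amounts to generating one probe query per query word and running each against the index. The sketch below only builds those query strings, under stated assumptions: splitting on whitespace stands in for real analysis, docId is the asker's own stored field (not Lucene's internal document number), and escape is a hypothetical hand-written version of the backslash escaping that Lucene's QueryParser.escape performs. Each string would still have to be parsed and searched, so this costs one search per query word per document.

```java
import java.util.ArrayList;
import java.util.List;

public class ProbeQueries {
    // Characters with special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: backslash-escapes special characters so the
    // word is parsed as literal text (real code would use QueryParser.escape).
    static String escape(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds one "docId:<id> AND description:<word>" probe per query word.
    static List<String> probes(String docId, String queryText) {
        List<String> out = new ArrayList<>();
        for (String word : queryText.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input splits into a single empty string
            }
            out.add("docId:" + escape(docId) + " AND description:" + escape(word));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String q : probes("xy", "lucene hit highlighting")) {
            System.out.println(q);
        }
    }
}
```

A probe that returns the document means that word matched it; the answer below avoids these extra searches by asking Lucene to explain the original query's score instead.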
Recommended answer
The explain method in Searcher is a good way to see which parts of a query matched a document and how each part contributes to the overall score.
Example taken from the book Lucene in Action, 2nd Edition:
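import java.io.File;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Uses the Lucene 3.x API, as in Lucene in Action, 2nd Edition.
public class Explainer {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Explainer <index dir> <query>");
            System.exit(1);
        }
        String indexDir = args[0];
        String queryExpression = args[1];

        Directory directory = FSDirectory.open(new File(indexDir));
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
                "contents", new SimpleAnalyzer());
        Query query = parser.parse(queryExpression);
        System.out.println("Query: " + queryExpression);

        IndexSearcher searcher = new IndexSearcher(directory);
        TopDocs topDocs = searcher.search(query, 10);
        // Iterate over the returned hits only: totalHits counts every match,
        // but scoreDocs holds at most 10 entries here.
        for (ScoreDoc match : topDocs.scoreDocs) {
            // explain() breaks this document's score down per query clause.
            Explanation explanation = searcher.explain(query, match.doc);
            System.out.println("----------");
            Document doc = searcher.doc(match.doc);
            System.out.println(doc.get("title"));
            System.out.println(explanation.toString());
        }
        searcher.close();
        directory.close();
    }
}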
For each of the top hits, this prints the document's title followed by an explanation of its score, including which query clauses matched it.
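explain() returns a tree of Explanation objects, and toString() prints that tree as indented text. To get the matched terms programmatically instead of reading the text, one option is to walk the tree. This is a minimal, self-contained sketch under two assumptions: Expl is a hypothetical stand-in for Lucene's Explanation so the example runs without Lucene on the classpath, and per-term score nodes have descriptions beginning with weight(field:text in doc) or fieldWeight(field:text in doc), as Lucene 3.x prints them. That description format is an implementation detail, not a stable API, so treat the parsing as a heuristic. The tree in main is hand-built with made-up values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedTerms {
    // Hypothetical stand-in for org.apache.lucene.search.Explanation,
    // keeping only the score value, description text and child details.
    static final class Expl {
        final float value;
        final String description;
        final List<Expl> details;

        Expl(float value, String description, Expl... details) {
            this.value = value;
            this.description = description;
            this.details = Arrays.asList(details);
        }
    }

    // Heuristic: per-term nodes are assumed to start with
    // "weight(field:text in doc)" or "fieldWeight(field:text in doc)".
    private static final Pattern TERM_NODE =
            Pattern.compile("^(?:weight|fieldWeight)\\(([^:\\s]+):(.+?) in \\d+\\)");

    // Returns each distinct "field:text" whose term node has a positive score.
    static List<String> matchedTerms(Expl root) {
        Set<String> found = new LinkedHashSet<>();
        collect(root, found);
        return new ArrayList<>(found);
    }

    private static void collect(Expl e, Set<String> found) {
        Matcher m = TERM_NODE.matcher(e.description);
        if (m.find() && e.value > 0) {
            found.add(m.group(1) + ":" + m.group(2));
            return; // children describe the same term, so stop descending
        }
        for (Expl child : e.details) {
            collect(child, found);
        }
    }

    public static void main(String[] args) {
        Expl root = new Expl(0.5f, "sum of:",
                new Expl(0.3f, "weight(description:lucene in 3), product of:",
                        new Expl(0.6f, "queryWeight(description:lucene), product of:"),
                        new Expl(0.5f, "fieldWeight(description:lucene in 3), product of:")),
                new Expl(0.2f, "fieldWeight(description:highlight in 3), product of:"));
        System.out.println(matchedTerms(root)); // [description:lucene, description:highlight]
    }
}
```

With the real class, read getValue(), getDescription() and getDetails() instead of the fields; getDetails() can return null in older releases, so guard against that before iterating.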