Keywords (OR, AND) search in Lucene
I am using Lucene in my portal (J2EE based) for indexing and search services.
The problem is with Lucene's keywords. When one of them appears in a search query where it cannot act as an operator, you get an error.
For example:
searchTerms = "ik OR jij"
This works fine, because it searches for "ik" or "jij".
searchTerms = "ik AND jij"
This works fine; it searches for "ik" and "jij".
But when you search:
searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"
and so on, it fails with an error:
Component Name: STSE_RESULTS
Class: org.apache.lucene.queryParser.ParseException
Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0. Was expecting one of: ...
That makes sense: these words are keywords in Lucene, so they are probably reserved and are always treated as operators.
In Dutch, the word "OR" is important because it stands for "Ondernemings Raad". It is used in many texts, and it needs to be findable. Searching for lowercase "or", for example, does work, but it does not return texts matching the term "OR". How can I make it searchable?
How can I escape the keyword "or"? Or how can I tell Lucene to treat "or" as a search term, not as a keyword?
Solution
I suppose you have tried putting the "OR" into double quotes?
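To make the double-quote idea concrete, here is a minimal sketch of how the portal could pre-process the search string before handing it to Lucene. It is plain Java and does not call Lucene at all. The class and method names (OperatorQuoter, quoteStrayOperators) are made up for illustration, and the rule it applies is my assumption: an "OR" or "AND" at the start or end of the query, or directly after another operator, cannot be an operator and is wrapped in quotes. Whether the quoted word then actually matches documents depends on the Lucene version and on the analyzer, which may lower-case the term or drop "or" as a stop word.

```java
// Hypothetical helper, not part of Lucene: quotes OR/AND where they cannot be operators.
final class OperatorQuoter {

    // True for the upper-case words from the question that the query syntax reads as operators.
    private static boolean isOperatorWord(String token) {
        return token.equals("OR") || token.equals("AND");
    }

    // Splits on whitespace only; quoted phrases and parentheses are not handled in this sketch.
    static String quoteStrayOperators(String query) {
        String[] tokens = query.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        boolean previousWasOperator = false;
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            boolean atEdge = (i == 0) || (i == tokens.length - 1);
            // An operator word with no operand on one side is assumed to be a search term.
            boolean stray = isOperatorWord(token) && (atEdge || previousWasOperator);
            if (i > 0) {
                out.append(' ');
            }
            out.append(stray ? "\"" + token + "\"" : token);
            previousWasOperator = isOperatorWord(token) && !stray;
        }
        return out.toString();
    }
}
```

With this, OperatorQuoter.quoteStrayOperators("OR ik") gives "OR" ik (the quotes are part of the result), while "ik OR jij" comes back unchanged and still works as a boolean query.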
If that doesn't work I think you might have to go so far as to change the Lucene source and then recompile the whole thing, as the operator "OR" is buried deep inside the code. Actually, compiling probably isn't even enough: you'll have to change the file QueryParser.jj in the source package that serves as input for JavaCC, then run JavaCC, then recompile the whole thing.
The good news, however, is that there's only one line to change:
| <OR: ("OR" | "||") >
becomes
| <OR: ("||") >
That way, you'll have only "||" as the logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.
This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)