MySQL中布尔全文搜索中的`~`(代字号)运算符的行为与MySQL开发人员网站中的声明不符
我已经创建了下表fruits
-
CREATE TABLE `fruits` (
`id` tinyint unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `ft_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
然后我在表fruits
-
SELECT * FROM fruits;
+----+---------------+
| id | name |
+----+---------------+
| 1 | apple, orange |
| 2 | apple, mango |
| 3 | mango, kiwi |
| 4 | mango, guava |
| 5 | apple, banana |
+----+---------------+
现在我运行以下三个SQL查询-
查询1:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple' IN BOOLEAN MODE);
+----+---------------+
| id | name |
+----+---------------+
| 1 | apple, orange |
| 2 | apple, mango |
| 5 | apple, banana |
+----+---------------+
查询2:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple -orange' IN BOOLEAN MODE);
+----+---------------+
| id | name |
+----+---------------+
| 2 | apple, mango |
| 5 | apple, banana |
+----+---------------+
查询3:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple ~orange' IN BOOLEAN MODE);
+----+---------------+
| id | name |
+----+---------------+
| 1 | apple, orange |
| 2 | apple, mango |
| 5 | apple, banana |
+----+---------------+
根据MySQL开发人员网站,以下是‘布尔全文搜索’中~
(波浪号)运算符的功能
https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
查找包含单词"Apple"的行,但如果该行还包含单词"Macintosh",则其评级低于不包含单词"Macintosh"的行。这比搜索‘+APPLE-Macintosh’要"温和",搜索"Macintosh"会导致该行根本不返回。
- ‘+Apple~Macintosh’
我尝试了‘Query 3’中的~
(代字号)运算符,但输出肯定不是预期的结果。在这里,预期行为是最后出现id = 1
的行。
附注: 我在x86_64((Ubuntu))上使用MySQL版本-8.0.26-0ubuntu0.20.04.2 for Linux((Ubuntu))
解决方案
虽然任何地方都没有关于我的答案的文档,但经过彻底的试验,我得出了这个最符合逻辑的结论-
存在‘+’运算符将使‘~’运算符的任何效果无效
我已使用下列值更新了我的表fruits
-
SELECT * FROM fruits;
+----+-------------------------+
| id | name |
+----+-------------------------+
| 1 | apple orange watermelon |
| 2 | apple mango pomegranate |
| 3 | apple mango banana |
| 4 | mango kiwi pomegranate |
| 5 | mango guava watermelon |
| 6 | apple banana kiwi |
+----+-------------------------+
查询1:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('apple mango ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 3 | apple mango banana |
| 1 | apple orange watermelon |
| 5 | mango guava watermelon |
| 6 | apple banana kiwi |
| 2 | apple mango pomegranate |
| 4 | mango kiwi pomegranate |
+----+-------------------------+
查询2:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('apple ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 1 | apple orange watermelon |
| 3 | apple mango banana |
| 6 | apple banana kiwi |
| 2 | apple mango pomegranate |
+----+-------------------------+
查询3:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('mango ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 3 | apple mango banana |
| 5 | mango guava watermelon |
| 2 | apple mango pomegranate |
| 4 | mango kiwi pomegranate |
+----+-------------------------+
此处,在查询1、2&;3中,值apple
之前没有运算符,mango
和~
运算符位于值pomegranate
之前。这可确保具有单词pomegranate
的行的排名低于其他行。
查询4:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple +mango ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 2 | apple mango pomegranate |
| 3 | apple mango banana |
+----+-------------------------+
查询5:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 1 | apple orange watermelon |
| 2 | apple mango pomegranate |
| 3 | apple mango banana |
| 6 | apple banana kiwi |
+----+-------------------------+
查询6:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+mango ~pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 2 | apple mango pomegranate |
| 3 | apple mango banana |
| 4 | mango kiwi pomegranate |
| 5 | mango guava watermelon |
+----+-------------------------+
这里,在查询4、5&;6+
中,运算符位于值apple
之前,mango
和~
运算符位于值pomegranate
之前。显然,+
运算符的存在会使~
运算符的任何效果无效。
查询7:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple +mango <pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 3 | apple mango banana |
| 2 | apple mango pomegranate |
+----+-------------------------+
查询8:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+apple <pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 1 | apple orange watermelon |
| 3 | apple mango banana |
| 6 | apple banana kiwi |
| 2 | apple mango pomegranate |
+----+-------------------------+
查询9:
SELECT id, name FROM fruits
-> WHERE MATCH(name) AGAINST
-> ('+mango <pomegranate'
-> IN BOOLEAN MODE);
+----+-------------------------+
| id | name |
+----+-------------------------+
| 3 | apple mango banana |
| 5 | mango guava watermelon |
| 2 | apple mango pomegranate |
| 4 | mango kiwi pomegranate |
+----+-------------------------+
在这里,在查询7,8&;9+
中,运算符位于值apple
之前,mango
和<
运算符位于值pomegranate
之前。这可确保具有单词pomegranate
的行的排名低于其他行。
因此,从这里可以推断的是-
如果存在+
运算符,请使用<
运算符,而不是~
运算符
更新
经过大量计算,我创建了fruits_score_count
表,它显示了执行布尔全文搜索时每个fruit
的score
。
SELECT * FROM fruits_score_count;
+----+-------------+---------------------+----------------------+
| id | fruit_name | row_numbers_matched | score |
+----+-------------+---------------------+----------------------+
| 1 | apple | 4 | 0.031008131802082062 |
| 2 | banana | 2 | 0.22764469683170319 |
| 3 | guava | 1 | 0.6055193543434143 |
| 4 | kiwi | 2 | 0.22764469683170319 |
| 5 | mango | 4 | 0.031008131802082062 |
| 6 | orange | 1 | 0.6055193543434143 |
| 7 | pomegranate | 2 | 0.22764469683170319 |
| 8 | watermelon | 2 | 0.22764469683170319 |
+----+-------------+---------------------+----------------------+
查询1:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('apple mango ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.062016263604164124 |
| 1 | apple orange watermelon | 0.031008131802082062 |
| 5 | mango guava watermelon | 0.031008131802082062 |
| 6 | apple banana kiwi | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7103390693664551 |
| 4 | mango kiwi pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
查询2:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('apple ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 1 | apple orange watermelon | 0.031008131802082062 |
| 3 | apple mango banana | 0.031008131802082062 |
| 6 | apple banana kiwi | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
查询3:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('mango ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.031008131802082062 |
| 5 | mango guava watermelon | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7413471937179565 |
| 4 | mango kiwi pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
查询4:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+apple +mango ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 2 | apple mango pomegranate | 0.062016263604164124 |
| 3 | apple mango banana | 0.062016263604164124 |
+----+-------------------------+----------------------+
查询5:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+apple ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 1 | apple orange watermelon | 0.031008131802082062 |
| 2 | apple mango pomegranate | 0.031008131802082062 |
| 3 | apple mango banana | 0.031008131802082062 |
| 6 | apple banana kiwi | 0.031008131802082062 |
+----+-------------------------+----------------------+
查询6:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+mango ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 2 | apple mango pomegranate | 0.031008131802082062 |
| 3 | apple mango banana | 0.031008131802082062 |
| 4 | mango kiwi pomegranate | 0.031008131802082062 |
| 5 | mango guava watermelon | 0.031008131802082062 |
+----+-------------------------+----------------------+
查询7:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+apple +mango <pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.062016263604164124 |
| 2 | apple mango pomegranate | -0.7103390693664551 |
+----+-------------------------+----------------------+
查询8:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+apple <pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 1 | apple orange watermelon | 0.031008131802082062 |
| 3 | apple mango banana | 0.031008131802082062 |
| 6 | apple banana kiwi | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
查询9:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+mango <pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.031008131802082062 |
| 5 | mango guava watermelon | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7413471937179565 |
| 4 | mango kiwi pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
此处,查询%1、查询%2、查询%3、查询%7、查询%8、查询%9的行为符合预期。
但从查询4、查询5、查询6可以明显看出-
在值前面有+
运算符的情况下,~
运算符基本上使该值不可见。
仔细观察还发现-
x ~y
和+x <y
等效
进一步试验
查询1:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+mango apple ~pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.062016263604164124 |
| 4 | mango kiwi pomegranate | 0.031008131802082062 |
| 5 | mango guava watermelon | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7103390693664551 |
+----+-------------------------+----------------------+
id = 3
的第1行获得最高分数,它是mango
和apple
的分数之和。- 具有
id = 4
的第2行获得第二高分,这是mango
的得分。+
运算符出现在mango
前面使搜索短语的~pomegranate
无关紧要。 - 具有
id = 5
的第3行与第2行得分相同,但低于第2行,因为得分相同时,行按primary key
升序排列,这里id
是primary key
。 - 第4行
id = 2
得分最低,因此排在最后。这里,由于存在单词apple
,且搜索短语中apple
之前没有+
运算符,因此考虑了搜索短语中的~pomegranate
,大大降低了得分。
查询2:
SELECT id, name, score FROM
-> (SELECT id, name, MATCH(name) AGAINST
-> ('+mango apple <pomegranate' IN BOOLEAN MODE)
-> AS score FROM fruits ORDER BY score DESC)
-> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name | score |
+----+-------------------------+----------------------+
| 3 | apple mango banana | 0.062016263604164124 |
| 5 | mango guava watermelon | 0.031008131802082062 |
| 2 | apple mango pomegranate | -0.7103390693664551 |
| 4 | mango kiwi pomegranate | -0.7413471937179565 |
+----+-------------------------+----------------------+
这再次说明<
运算符即使在+
运算符存在的情况下也有效。
这进一步加强了我早先的观察,即-
如果存在+
运算符,请使用<
运算符而不是~
运算符
相关文章