MySQL中布尔全文搜索中的`~`(代字号)运算符的行为与MySQL开发人员网站中的声明不符

我已经创建了下表fruits-

CREATE TABLE `fruits` (
  `id` tinyint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

然后我在表fruits-

中输入了下列值
SELECT * FROM fruits;
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  3 | mango, kiwi   |
|  4 | mango, guava  |
|  5 | apple, banana |
+----+---------------+

现在我运行以下三个SQL查询-

查询1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

查询2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple -orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

查询3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

根据MySQL开发人员网站,以下是‘布尔全文搜索’中~(波浪号)运算符的功能

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

  • ‘+Apple~Macintosh’
查找包含单词"Apple"的行,但如果该行还包含单词"Macintosh",则其评级低于不包含单词"Macintosh"的行。这比搜索‘+APPLE-Macintosh’要"温和",搜索"Macintosh"会导致该行根本不返回。

我尝试了‘Query 3’中的~(代字号)运算符,但输出肯定不是预期的结果。在这里,预期行为是最后出现id = 1的行。

附注: 我在x86_64((Ubuntu))上使用MySQL版本-8.0.26-0ubuntu0.20.04.2 for Linux((Ubuntu))


解决方案

虽然任何地方都没有关于我的答案的文档,但经过彻底的试验,我得出了这个最符合逻辑的结论-

存在‘+’运算符将使‘~’运算符的任何效果无效

我已使用下列值更新了我的表fruits-

SELECT * FROM fruits;
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
+----+-------------------------+

查询1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  1 | apple orange watermelon |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

查询2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+
此处,在查询1、2&;3中,值apple之前没有运算符,mango~运算符位于值pomegranate之前。这可确保具有单词pomegranate的行的排名低于其他行。

查询4:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
+----+-------------------------+

查询5:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
+----+-------------------------+

查询6:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
+----+-------------------------+
这里,在查询4、5&;6+中,运算符位于值apple之前,mango~运算符位于值pomegranate之前。显然,+运算符的存在会使~运算符的任何效果无效。

查询7:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询8:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询9:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+
在这里,在查询7,8&;9+中,运算符位于值apple之前,mango<运算符位于值pomegranate之前。这可确保具有单词pomegranate的行的排名低于其他行。

因此,从这里可以推断的是- 如果存在+运算符,请使用<运算符,而不是~运算符


更新

经过大量计算,我创建了fruits_score_count表,它显示了执行布尔全文搜索时每个fruitscore

SELECT * FROM fruits_score_count;
+----+-------------+---------------------+----------------------+
| id | fruit_name  | row_numbers_matched | score                |
+----+-------------+---------------------+----------------------+
|  1 | apple       |                   4 | 0.031008131802082062 |
|  2 | banana      |                   2 |  0.22764469683170319 |
|  3 | guava       |                   1 |   0.6055193543434143 |
|  4 | kiwi        |                   2 |  0.22764469683170319 |
|  5 | mango       |                   4 | 0.031008131802082062 |
|  6 | orange      |                   1 |   0.6055193543434143 |
|  7 | pomegranate |                   2 |  0.22764469683170319 |
|  8 | watermelon  |                   2 |  0.22764469683170319 |
+----+-------------+---------------------+----------------------+

查询1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  1 | apple orange watermelon | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询3:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询4:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.062016263604164124 |
|  3 | apple mango banana      | 0.062016263604164124 |
+----+-------------------------+----------------------+

查询5:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
+----+-------------------------+----------------------+

查询6:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
+----+-------------------------+----------------------+

查询7:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+

查询8:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询9:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

此处,查询%1、查询%2、查询%3、查询%7、查询%8、查询%9的行为符合预期。

但从查询4、查询5、查询6可以明显看出-

在值前面有+运算符的情况下,~运算符基本上使该值不可见。

仔细观察还发现-

x ~y+x <y等效


进一步试验

查询1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+
  • id = 3的第1行获得最高分数,它是mangoapple的分数之和。
  • 具有id = 4的第2行获得第二高分,这是mango的得分。+运算符出现在mango前面使搜索短语的~pomegranate无关紧要。
  • 具有id = 5的第3行与第2行得分相同,但低于第2行,因为得分相同时,行按primary key升序排列,这里idprimary key
  • 第4行id = 2得分最低,因此排在最后。这里,由于存在单词apple,且搜索短语中apple之前没有+运算符,因此考虑了搜索短语中的~pomegranate,大大降低了得分。

查询2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

这再次说明<运算符即使在+运算符存在的情况下也有效。

这进一步加强了我早先的观察,即-

如果存在+运算符,请使用<运算符而不是~运算符

相关文章