计算表列中的单词出现次数

2021-12-30 00:00:00 word count mysql

我有一个带有 varchar(255) 字段的表.我想(通过查询、函数或 SP)获取该表中一组行中每个单词的出现次数.

如果有 2 行包含这些字段:

<前>我喜欢吃香蕉"我不喜欢像猴子一样吃东西"

我想得到

<前>词 |数数()---------------喜欢 3吃2到 2我 21

有什么想法吗?我使用的是 MySQL 5.2.

解决方案

@Elad Meidar，我喜欢你的问题，我找到了解决方案:

SELECT SUM(total_count) as total, value从 (SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') 作为值从 (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) 值FROM table_name t CROSS JOIN(选择 a.N + b.N * 10 + 1 n从(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9),(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b按 n 排序) nWHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))按值排序) 作为 x按 x.value 分组) 作为按值分组

这是完整的工作小提琴:http://sqlfiddle.com/#!2/17481a/1

首先，我们进行查询以提取所有单词，如@peterm 所述here(按照他的说明(如果您想自定义处理的总字数).然后我们将其转换为子查询，然后我们 COUNT 和 GROUP BY 每个单词的值，然后在其上对 GROUP 进行另一个查询BY 未将可能存在伴随标志的单词分组.即:你好=你好！带有 REPLACE

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.

解决方案

@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

相关文章