MySQL - 行到列

2021-11-20 00:00:00 pivot-table sql etl mysql crosstab

我尝试搜索帖子,但我只找到了 SQL Server/Access 的解决方案.我需要 MySQL (5.X) 中的解决方案.

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).

我有一个包含 3 列的表(称为历史记录):hostid、itemname、itemvalue.
如果我做了一个选择(select * from history),它会返回

I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return

   +--------+----------+-----------+
   | hostid | itemname | itemvalue |
   +--------+----------+-----------+
   |   1    |    A     |    10     |
   +--------+----------+-----------+
   |   1    |    B     |     3     |
   +--------+----------+-----------+
   |   2    |    A     |     9     |
   +--------+----------+-----------+
   |   2    |    c     |    40     |
   +--------+----------+-----------+

如何查询数据库以返回类似的内容

How do I query the database to return something like

   +--------+------+-----+-----+
   | hostid |   A  |  B  |  C  |
   +--------+------+-----+-----+
   |   1    |  10  |  3  |  0  |
   +--------+------+-----+-----+
   |   2    |   9  |  0  |  40 |
   +--------+------+-----+-----+

推荐答案

我将添加更长更详细的说明,说明解决此问题的步骤.如果太长,我深表歉意.

I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.

我将从您提供的基础开始,并使用它来定义我将在本文的其余部分使用的几个术语.这将是基表:

I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:

select * from history;

+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
|      1 | A        |        10 |
|      1 | B        |         3 |
|      2 | A        |         9 |
|      2 | C        |        40 |
+--------+----------+-----------+

这将是我们的目标,漂亮的数据透视表:

select * from history_itemvalue_pivot;

+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
|      1 |   10 |    3 |    0 |
|      2 |    9 |    0 |   40 |
+--------+------+------+------+

history.hostid 列中的值将成为数据透视表中的 y 值.history.itemname 列中的值将变为 x 值(原因很明显).

Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).

当我必须解决创建数据透视表的问题时,我会使用三步流程(可选的第四步)来解决它:

When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):

  1. 选择感兴趣的列,即y值和x值
  2. 使用额外的列扩展基表 - 每个 x 值
  3. 对扩展表进行分组和聚合 -- 每个 y 值
  4. (可选)美化聚合表

让我们将这些步骤应用于您的问题,看看我们得到了什么:

Let's apply these steps to your problem and see what we get:

第 1 步:选择感兴趣的列.在所需的结果中,hostid 提供 y 值,itemname 提供 x 值.

Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.

第 2 步:使用额外的列扩展基表.我们通常每个 x 值需要一列.回想一下我们的 x 值列是 itemname:

Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:

create view history_extended as (
  select
    history.*,
    case when itemname = "A" then itemvalue end as A,
    case when itemname = "B" then itemvalue end as B,
    case when itemname = "C" then itemvalue end as C
  from history
);

select * from history_extended;

+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A    | B    | C    |
+--------+----------+-----------+------+------+------+
|      1 | A        |        10 |   10 | NULL | NULL |
|      1 | B        |         3 | NULL |    3 | NULL |
|      2 | A        |         9 |    9 | NULL | NULL |
|      2 | C        |        40 | NULL | NULL |   40 |
+--------+----------+-----------+------+------+------+

请注意,我们没有更改行数——我们只是添加了额外的列.还要注意 NULLs 的模式——itemname = "A" 的行对于新列 A 具有非空值,并且其他新列的空值.

Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.

步骤 3:对扩展表进行分组和聚合.我们需要group by hostid,因为它提供了y值:

Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:

create view history_itemvalue_pivot as (
  select
    hostid,
    sum(A) as A,
    sum(B) as B,
    sum(C) as C
  from history_extended
  group by hostid
);

select * from history_itemvalue_pivot;

+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
|      1 |   10 |    3 | NULL |
|      2 |    9 | NULL |   40 |
+--------+------+------+------+

(请注意,我们现在每个 y 值有一行.) 好的,我们快到了!我们只需要摆脱那些丑陋的NULLs.

(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.

第 4 步:美化.我们将用零替换任何空值,以便结果集更好看:

Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:

create view history_itemvalue_pivot_pretty as (
  select 
    hostid, 
    coalesce(A, 0) as A, 
    coalesce(B, 0) as B, 
    coalesce(C, 0) as C 
  from history_itemvalue_pivot 
);

select * from history_itemvalue_pivot_pretty;

+--------+------+------+------+
| hostid | A    | B    | C    |
+--------+------+------+------+
|      1 |   10 |    3 |    0 |
|      2 |    9 |    0 |   40 |
+--------+------+------+------+

我们已经完成了——我们已经使用 MySQL 构建了一个漂亮、漂亮的数据透视表.

And we're done -- we've built a nice, pretty pivot table using MySQL.

应用此程序时的注意事项:

Considerations when applying this procedure:

  • 在额外的列中使用什么值.我在这个例子中使用了 itemvalue
  • 在额外的列中使用什么中性"值.我使用了 NULL,但也可以是 0"",具体取决于您的具体情况
  • 分组时使用什么聚合函数.我用的是sum,但是countmax 也经常用到(max 常用于构建单行分散在多行中的对象")
  • 对 y 值使用多列.此解决方案不限于对 y 值使用单列——只需将额外的列插入 group by 子句(不要忘记 select他们)
  • what value to use in the extra columns. I used itemvalue in this example
  • what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
  • what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
  • using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)

已知限制:

  • 该解决方案不允许数据透视表中有 n 列——在扩展基表时需要手动添加每个数据透视列.所以对于 5 或​​ 10 个 x 值,这个解决方案很好.100,不太好.有一些存储过程生成查询的解决方案,但它们很丑陋且难以正确处理.当数据透视表需要有很多列时,我目前不知道有什么好的方法可以解决这个问题.

相关文章