Getting data for a histogram
Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:
select total, count(total) from faults GROUP BY total;
The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?
Example:
+-------+--------------+
| total | count(total) |
+-------+--------------+
|    30 |            1 |
|    31 |            2 |
|    33 |            1 |
|    34 |            3 |
|    35 |            2 |
|    36 |            6 |
|    37 |            3 |
|    38 |            2 |
|    41 |            1 |
|    42 |            5 |
|    43 |            1 |
|    44 |            7 |
|    45 |            4 |
|    46 |            3 |
|    47 |            2 |
|    49 |            3 |
|    50 |            2 |
|    51 |            3 |
|    52 |            4 |
|    53 |            2 |
|    54 |            1 |
|    55 |            3 |
|    56 |            4 |
|    57 |            4 |
|    58 |            2 |
|    59 |            2 |
|    60 |            4 |
|    61 |            1 |
|    63 |            2 |
|    64 |            5 |
|    65 |            2 |
|    66 |            3 |
|    67 |            5 |
|    68 |            5 |
+-------+--------------+
What I am looking for:
+------------+---------------+
| total      | count(total)  |
+------------+---------------+
| 30 - 40    |            23 |
| 40 - 50    |            15 |
| 50 - 60    |            51 |
| 60 - 70    |            45 |
+------------+---------------+
I guess this cannot be achieved in a straightforward manner, but a reference to any related stored procedure would be fine as well.
Recommended answer
This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values.
There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again since it's just so easy to modify for each use case, and so short and concise. This is how you do it:
SELECT ROUND(numeric_value, -2) AS bucket,
COUNT(*) AS COUNT,
RPAD('', LN(COUNT(*)), '*') AS bar
FROM my_table
GROUP BY bucket;
Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I've made the bars to be in logarithmic scale, so that they don't grow too much when you have large values.
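As a rough sketch, this is how the same query might look for the faults table from the question, assuming a bin width of 10. The alias faults_count and the ORDER BY are my additions, not part of the original post.

```sql
-- Sketch only: the query above adapted to the question's faults table.
-- ROUND(total, -1) rounds to the nearest multiple of 10, so a bucket
-- labelled 40 covers totals from 35 to 44, not 40 to 49.
SELECT ROUND(total, -1)              AS bucket,
       COUNT(*)                      AS faults_count,  -- hypothetical alias
       RPAD('', LN(COUNT(*)), '*')   AS bar            -- log-scale bar
FROM faults
GROUP BY bucket
ORDER BY bucket;
```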
numeric_value should be offset in the ROUNDing operation, based on the rounding increment, in order to ensure the first bucket contains as many elements as the following buckets.
For example, with ROUND(numeric_value, -1), values in the range [0,4] (5 elements) fall into the first bucket, while [5,14] (10 elements) fall into the second and [15,24] into the third, unless numeric_value is offset appropriately via ROUND(numeric_value - 5, -1).
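If you would rather have bins that start on a multiple of the bin width, like the "30 - 40" ranges in the question, one alternative to offsetting ROUND is FLOOR, which also sidesteps the question of how ties on a bucket boundary get rounded. This is a sketch of that idea rather than something from the linked post: the width of 10 is assumed, and bin_start, total_range, faults_count, and binned are hypothetical names.

```sql
-- Sketch only: fixed-width bins with range labels, assuming a width of 10.
-- FLOOR(total / 10) * 10 maps 30..39 to 30, 40..49 to 40, and so on,
-- so each bin is half-open: [bin_start, bin_start + 10).
SELECT CONCAT(bin_start, ' - ', bin_start + 10) AS total_range,
       COUNT(*)                                 AS faults_count
FROM (
    -- Compute the bin in a derived table so the outer query groups on a
    -- plain column, which keeps it valid under ONLY_FULL_GROUP_BY.
    SELECT FLOOR(total / 10) * 10 AS bin_start
    FROM faults
) AS binned
GROUP BY bin_start
ORDER BY bin_start;
```

To change the bin width, replace every 10 with the new width.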
This is an example of such a query on some random data that looks pretty sweet. Good enough for a quick evaluation of the data.
+--------+----------+-----------------+
| bucket | count    | bar             |
+--------+----------+-----------------+
|   -500 |        1 |                 |
|   -400 |        2 | *               |
|   -300 |        2 | *               |
|   -200 |        9 | **              |
|   -100 |       52 | ****            |
|      0 |  5310766 | *************** |
|    100 |    20779 | **********      |
|    200 |     1865 | ********        |
|    300 |      527 | ******          |
|    400 |      170 | *****           |
|    500 |       79 | ****            |
|    600 |       63 | ****            |
|    700 |       35 | ****            |
|    800 |       14 | ***             |
|    900 |       15 | ***             |
|   1000 |        6 | **              |
|   1100 |        7 | **              |
|   1200 |        8 | **              |
|   1300 |        5 | **              |
|   1400 |        2 | *               |
|   1500 |        4 | *               |
+--------+----------+-----------------+
Some notes: Ranges that have no match will not appear in the count - you will not have a zero in the count column. Also, I'm using the ROUND function here. You can just as easily replace it with TRUNCATE if you feel it makes more sense to you.
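If you do need the empty ranges to show up with a zero, one possible approach is to list the predefined bins explicitly and LEFT JOIN the data onto them. The sketch below assumes the four bins from the question and a width of 10; the inline bins table and the names bin_start and faults_count are hypothetical.

```sql
-- Sketch only: predefined bins, with empty bins reported as 0.
SELECT bins.bin_start,
       COUNT(f.total) AS faults_count   -- counts only matched rows, so 0 for empty bins
FROM (
    -- Hypothetical list of bin lower bounds; extend as needed.
    SELECT 30 AS bin_start
    UNION ALL SELECT 40
    UNION ALL SELECT 50
    UNION ALL SELECT 60
) AS bins
LEFT JOIN faults AS f
       ON f.total >= bins.bin_start
      AND f.total <  bins.bin_start + 10
GROUP BY bins.bin_start
ORDER BY bins.bin_start;
```

Totals that fall outside every listed bin are simply left out of the result.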
I found it here: http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html