Why are NULL values mapped to 0 in fact tables?
Why are NULL values in the measure fields of fact tables (in dimensionally modeled data warehouses) usually mapped to 0?
Recommended answer
Although you've already accepted another answer, I would say that using NULL is actually the better choice, for a couple of reasons.
The first reason is that aggregates return the 'correct' answer (i.e. the one that users tend to expect) when NULL is present, but give the 'wrong' answer when you use zero. Aggregate functions such as SUM() and AVG() skip NULLs, whereas a zero is counted as a real row. Consider the results from AVG() in these two queries:
-- with zero: the zero counts as a fourth row, so AVG() gives 1.5
select SUM(measure), AVG(measure)
from
(
    select 1.0 as measure
    union all
    select 2.0
    union all
    select 3.0
    union all
    select 0
) dt;

-- with null: the NULL row is skipped by the aggregates, so AVG() gives 2
select SUM(measure), AVG(measure)
from
(
    select 1.0 as measure
    union all
    select 2.0
    union all
    select 3.0
    union all
    select null
) dt;
If we assume that the measure here is "number of days to manufacture item" and NULL represents an item that is still being produced, then zero gives the wrong answer: it treats an unfinished item as one that took no time at all. The same reasoning applies to MIN() and MAX() too.
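As a rough sketch of the MIN() and MAX() point, the snippet below runs the same four values through SQLite using Python's standard sqlite3 module. The table name fact_item, the column days_to_manufacture and the helper summarize are made up for illustration, and SQLite is only a stand-in engine; other databases should behave the same way here, but that is an assumption worth checking on your own platform.

```python
import sqlite3


def summarize(values):
    """Return (MIN, MAX, AVG) of the measure for the given values.

    fact_item / days_to_manufacture are hypothetical names for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_item (days_to_manufacture REAL)")
    conn.executemany(
        "INSERT INTO fact_item VALUES (?)",
        [(v,) for v in values],  # Python None is stored as SQL NULL
    )
    row = conn.execute(
        "SELECT MIN(days_to_manufacture), "
        "       MAX(days_to_manufacture), "
        "       AVG(days_to_manufacture) "
        "FROM fact_item"
    ).fetchone()
    conn.close()
    return row


# The unfinished item is stored as zero: MIN() and AVG() are dragged down.
with_zero = summarize([1.0, 2.0, 3.0, 0.0])
# The unfinished item is stored as NULL: the aggregates skip it.
with_null = summarize([1.0, 2.0, 3.0, None])

print(with_zero)  # (0.0, 3.0, 1.5)
print(with_null)  # (1.0, 3.0, 2.0)
```

With zero, the report would claim the fastest item took 0 days to manufacture; with NULL, the minimum is the fastest item that was actually finished.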
The second issue is that if zero is the default value, how do you distinguish between zero as a default and zero as a real value? For example, consider a measure of "shipping charges in EUR" where NULL means that the customer picked up the order themselves, so no shipping charge applies, and zero means the order was shipped to the customer for free. You can't replace NULL with zero without completely changing the meaning of the data. You can obviously argue that the distinction should be clear from other dimensions (e.g. shipping method), but that makes reports more complex and the data harder to understand.
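To make the shipping example concrete, here is a small sketch, again using Python's sqlite3 module with SQLite as a stand-in engine. The table fact_order, its columns and the sample rows are invented for illustration. It runs the same query twice: once against the raw measure, and once with COALESCE(shipping_eur, 0) to simulate a load process that maps NULL to zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: NULL = customer pickup, 0 = shipped for free.
conn.execute("CREATE TABLE fact_order (order_id INTEGER, shipping_eur REAL)")
conn.executemany(
    "INSERT INTO fact_order VALUES (?, ?)",
    [(1, 3.0), (2, 0.0), (3, None), (4, 6.0)],
)

# {col} is filled in with either the raw column or its zero-mapped version.
QUERY = """
SELECT
    SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) AS picked_up,
    SUM(CASE WHEN {col} = 0 THEN 1 ELSE 0 END)     AS shipped_free,
    AVG({col})                                     AS avg_charge
FROM fact_order
"""

# NULL preserved: pickups and free shipments stay distinguishable.
with_null = conn.execute(QUERY.format(col="shipping_eur")).fetchone()
# NULL mapped to zero: the pickup is now counted as a free shipment.
with_zero = conn.execute(
    QUERY.format(col="COALESCE(shipping_eur, 0)")
).fetchone()
conn.close()

print(with_null)  # (1, 1, 3.0)
print(with_zero)  # (0, 2, 2.25)
```

Once the NULL is replaced, the pickup order is indistinguishable from a free shipment, and the average charge per shipped order drops from 3.0 to 2.25 because the pickup is now included in the denominator.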