Does using decimal data types (MySQL/Postgres) affect performance?

I understand how integer and floating point data types are stored, and I am guessing that the variable length of decimal data types means they are stored more like a string.

Does that imply a performance overhead when using a decimal data type and searching against them?
Accepted answer
Pavel has it quite right, I'd just like to explain a little.
Presuming that you mean a performance impact as compared to floating point, or fixed-point-offset integer (i.e. storing thousandths of a cent as an integer): yes, there is very much a performance impact. PostgreSQL, and by the sounds of things MySQL, store DECIMAL/NUMERIC in binary-coded decimal. This format is more compact than storing the digits as text, but it's still not very efficient to work with.
If you're not doing many calculations in the database, the impact is limited to the greater storage space required for BCD as compared to integer or floating point, and thus wider rows, slower scans, bigger indexes, etc. Comparison operations in b-tree index searches are also slower, but not enough to matter unless you're already CPU-bound for some other reason.
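The storage-versus-exactness trade-off is easy to see with Python's `decimal.Decimal`, which behaves much like a database NUMERIC: it keeps the exact decimal digits in a variable-length form, while a `float` is a fixed-size binary approximation (a sketch of the general behaviour, not of either database's implementation):

```python
from decimal import Decimal

# A float is a fixed-size IEEE 754 binary64 value; it cannot represent
# 0.1 exactly, so even a single addition shows rounding error.
print(0.1 + 0.2 == 0.3)                                    # False

# A Decimal keeps the exact decimal digits, at the cost of a larger,
# variable-length representation and slower software arithmetic.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```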
If you're doing lots of calculations with the DECIMAL/NUMERIC values in the database, then performance can really suffer. This is particularly noticeable, at least in PostgreSQL, because Pg can't use more than one CPU for any given query. If you're doing a huge bunch of division & multiplication, more complex maths, aggregation, etc. on numerics, you can start to find yourself CPU-bound in situations where you never would be when using a float or integer data type. This is particularly noticeable in OLAP-like (analytics) workloads, and in reporting or data transformation during loading or extraction (ETL).
Despite the fact that there is a performance impact (which varies from negligible to quite big depending on the workload), you should generally use numeric/decimal when it is the most appropriate type for your task - i.e. when very-high-range values must be stored and/or rounding error isn't acceptable.
Occasionally it's worth the hassle of using a bigint and fixed-point offset, but that is clumsy and inflexible. Using floating point instead is very rarely the right answer due to all the challenges of working reliably with floating point values for things like currency.
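That fixed-point-offset approach can be sketched as follows: amounts are stored as plain integers at an implied scale (thousandths of a cent here, an assumption for illustration), so arithmetic is fast, exact integer math, but every piece of code has to remember the scale and handle rounding on division itself, which is where the clumsiness comes in:

```python
# A sketch of fixed-point-offset storage: amounts held as integer
# thousandths of a cent, as you might store in a BIGINT column.
# The scale and helper names are illustrative assumptions.
SCALE = 100_000  # units per dollar: 100 cents * 1000 thousandths

def to_fixed(dollars_str: str) -> int:
    dollars, _, frac = dollars_str.partition(".")
    frac = (frac + "00000")[:5]                 # pad/truncate to 5 places
    sign = -1 if dollars.startswith("-") else 1
    return int(dollars) * SCALE + sign * int(frac)

def to_dollars(fixed: int) -> str:
    sign = "-" if fixed < 0 else ""
    fixed = abs(fixed)
    return f"{sign}{fixed // SCALE}.{fixed % SCALE:05d}"

price = to_fixed("19.99")    # 1999000 internal units
total = price * 3            # a plain integer multiply: fast and exact
print(to_dollars(total))     # 59.97000
```

Multiplication by an integer is trivial, but dividing two such values, or multiplying two prices together, requires rescaling and an explicit rounding policy - exactly the inflexibility the answer mentions.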
(BTW, I'm quite excited that some new Intel CPUs, and IBM's Power 7 range of CPUs, include hardware support for IEEE 754 decimal floating point. If this ever becomes available in lower end CPUs it'll be a huge win for databases.)