Best Practices for SQL varchar Column Length

2021-11-20 · postgresql · sql · mysql · sql-server

Every time I set up a new SQL table or add a new varchar column to an existing table, I wonder about one thing: what is the best value for the length?

So, let's say you have a column called name of type varchar, and you have to choose its length. I cannot think of a name longer than 20 characters, but you never know. Instead of using 20, though, I always round up to the next 2^n number; in this case, I would choose 32 as the length. I do that because, from a computer scientist's point of view, a 2^n number looks more "even" to me than other numbers, and I am simply assuming that the architecture underneath can handle those numbers slightly better than others.
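
To make that concrete, this is a sketch of the kind of column definition I mean (the table and column names are made up for illustration):

```sql
-- Hypothetical table: 20 characters would probably suffice for "name",
-- but the length is rounded up to the next power of two out of habit.
CREATE TABLE person (
    id   integer PRIMARY KEY,
    name varchar(32)
);
```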

On the other hand, MSSQL Server, for example, sets the default length to 50 when you choose to create a varchar column. That got me thinking about it. Why 50? Is it just a random number, is it based on an average column length, or what?

It could also be - or probably is - that different SQL server implementations (like MySQL, MSSQL, Postgres, ...) have different optimal column length values.

Accepted Answer

No DBMS I know of has any "optimization" that makes a VARCHAR with a 2^n length perform better than one whose maximum length is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently from one with a higher maximum length. I don't know whether this is still the case.

For almost all DBMS, the actual storage required is determined only by the number of characters you put into the column, not by the maximum length you define. So from a storage point of view (and most probably a performance one as well), it makes no difference whether you declare a column as VARCHAR(100) or VARCHAR(500).
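
In PostgreSQL, for instance, you can verify this with pg_column_size, which reports the number of bytes a value occupies (a quick sketch; the string literal is arbitrary):

```sql
-- The same string costs the same number of bytes regardless of the
-- declared maximum length.
SELECT pg_column_size('John Doe'::varchar(100)) AS in_varchar_100,
       pg_column_size('John Doe'::varchar(500)) AS in_varchar_500;
-- Both expressions return the identical byte count.
```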

You should see the maximum length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction, plus a CHECK constraint that limits the number of characters to whatever your business requires.
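
A minimal sketch of that setup, assuming a hypothetical person table and a 100-character business rule:

```sql
CREATE TABLE person (
    id   integer PRIMARY KEY,
    -- unrestricted text column; the business rule lives in the constraint
    name text,
    CONSTRAINT name_length_check CHECK (char_length(name) <= 100)
);
```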

If that requirement changes, altering the check constraint is much faster than altering the table, because the table does not need to be rewritten.
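
Continuing the sketch above, raising the limit only touches the constraint, not the stored rows:

```sql
-- Swap the constraint instead of altering the column type.
-- Existing rows are validated against the new rule, but the table
-- itself is not rewritten.
ALTER TABLE person DROP CONSTRAINT name_length_check;
ALTER TABLE person ADD CONSTRAINT name_length_check
    CHECK (char_length(name) <= 200);
```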

The same approach can be applied to Oracle and others - in Oracle it would be VARCHAR(4000) instead of text, though.
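
An Oracle version of the same idea might look like this (hypothetical names; note that in Oracle DDL the type is usually spelled VARCHAR2, and LENGTH plays the role of char_length):

```sql
CREATE TABLE person (
    id   NUMBER PRIMARY KEY,
    name VARCHAR2(4000),  -- widest practical declaration
    CONSTRAINT name_length_check CHECK (LENGTH(name) <= 100)
);
```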

I don't know whether there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment).

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer), a change to the column definition did rewrite the whole table; see e.g. here. Since 9.2 this is no longer the case, and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed took only 0.5 seconds.
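
The test boils down to timing a statement like the following (hypothetical names):

```sql
-- Since PostgreSQL 9.2, widening a varchar column is a catalog-only
-- change: no table rewrite, so it returns almost immediately even
-- for millions of rows.
ALTER TABLE person ALTER COLUMN name TYPE varchar(500);
```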

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column, but I could not find any reference for that.

For MySQL, the manual says "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL, however, you cannot use the "workaround" of a check constraint to limit the number of characters in a column.
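
The equivalent MySQL statement (again with hypothetical names) is what took 1.5 minutes in that test:

```sql
-- MySQL rebuilds the table as a temporary copy for this ALTER in most
-- cases, which is why it is slow on large tables.
ALTER TABLE person MODIFY name VARCHAR(500);
```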

For SQL Server I could not find a clear statement on this, but the execution time of increasing the size of a varchar column (again on the 1.2 million row table from above) indicates that no rewrite takes place.
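
The SQL Server statement that was timed would look roughly like this (hypothetical names):

```sql
-- The fast execution time observed above suggests that widening a
-- varchar column here does not rewrite the table.
ALTER TABLE person ALTER COLUMN name VARCHAR(500);
```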

Edit 2017-01-24

It seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand, which shows that the declared length of an nvarchar or varchar column makes a huge difference for performance.
