Are there disadvantages to using a generic varchar(255) for all text-based fields?
I have a contacts table which contains fields such as postcode, first name, last name, town, country, phone number, etc., all of which are defined as VARCHAR(255) even though none of these fields will ever come close to having 255 characters. (If you're wondering, it's this way because Ruby on Rails migrations map String fields to VARCHAR(255) by default and I never bothered to override it.)
Since VARCHAR will only store the number of actual characters of the field (along with the field length), is there any distinct advantage (performance or otherwise) to using, say, VARCHAR(16) over VARCHAR(255)?
Additionally, most of these fields have indexes on them. Does a larger VARCHAR size on the field affect the size or performance of the index at all?
FYI I'm using MySQL 5.
Recommended answer
In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255), which would always store 255 characters.
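To make that storage difference concrete, here is a rough Python model of on-disk bytes. It assumes a single-byte character set such as latin1 and a one-byte length prefix for VARCHAR(n) with n up to 255; the function names and the sample postcode are illustrative only, and real row formats add per-row overhead that this sketch ignores.

```python
# Rough on-disk model for a single-byte character set (e.g. latin1).
# Assumption: VARCHAR(n) with n <= 255 uses a one-byte length prefix.

def varchar_disk_bytes(value: str, declared_len: int) -> int:
    """VARCHAR: one length byte plus only the bytes actually stored."""
    if declared_len > 255:
        raise ValueError("this sketch only models VARCHAR(n) with n <= 255")
    if len(value) > declared_len:
        raise ValueError("value is longer than the declared length")
    return 1 + len(value.encode("latin-1"))

def char_disk_bytes(value: str, declared_len: int) -> int:
    """CHAR: always the full declared width (padded with spaces)."""
    if len(value) > declared_len:
        raise ValueError("value is longer than the declared length")
    return len(value.ljust(declared_len).encode("latin-1"))

postcode = "SW1A 1AA"  # illustrative 8-character value
print(varchar_disk_bytes(postcode, 16))   # 9
print(varchar_disk_bytes(postcode, 255))  # 9
print(char_disk_bytes(postcode, 255))     # 255
```

Under this model both VARCHAR declarations cost the same on disk for the same value, which is why the question's premise holds for storage; the differences come from how the rows are handled in memory.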
But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.
When your query implicitly generates a temporary table, for instance while sorting or doing a GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.
You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character, even for strings you store with single-byte content (e.g. ascii or latin1 characters). Likewise, the utf8mb4 character set causes the string to pad out to four bytes per character in memory.
So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten single-byte characters, plus one byte for length), but it takes 765 bytes (255 characters × 3 bytes) in memory, and thus in temp tables or sorted results.
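The arithmetic above can be reproduced with a small sketch. It assumes the maximum character widths described in this answer (utf8 = 3 bytes, utf8mb4 = 4 bytes) and the one-byte length prefix used in the on-disk figure. Python's UTF-8 encoder stands in for MySQL's encoding here, which only holds for ASCII text like "No opinion" (one byte per character either way); the helper names are illustrative.

```python
# Maximum bytes per character, as described in the answer above.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def on_disk_bytes(value: str, length_bytes: int = 1) -> int:
    """Length prefix plus the bytes actually used by the value."""
    return length_bytes + len(value.encode("utf-8"))

def in_memory_bytes(declared_len: int, charset: str) -> int:
    """Width after padding to the declared length at the maximum char width."""
    return declared_len * MAX_BYTES_PER_CHAR[charset]

value = "No opinion"
print(on_disk_bytes(value))             # 11
print(in_memory_bytes(255, "utf8"))     # 765
print(in_memory_bytes(255, "utf8mb4"))  # 1020
```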
I have helped MySQL users who unknowingly created 1.5GB temp tables frequently and filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.
It's best to define the column based on the type of data that you intend to store. As other folks have mentioned, this has the benefit of enforcing application-related constraints. But it also has the physical benefit of avoiding the memory waste I described above.
It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL versions older than 5.0.
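The one-byte limit can be written as a simple rule: one byte holds values from 0 to 255, so a column whose maximum length exceeds 255 bytes needs a two-byte length prefix. The sketch below assumes a single-byte character set, so the declared length equals the maximum byte length; it is a simplified model, and individual storage engines' row formats may encode lengths differently. The function name is hypothetical.

```python
def length_prefix_bytes(declared_len: int) -> int:
    """Length-prefix size for a VARCHAR in a single-byte character set.

    One byte can hold values 0..255, so VARCHAR(255) is the longest
    column whose length always fits in a single prefix byte.
    """
    max_bytes = declared_len  # single-byte charset: 1 byte per character
    return 1 if max_bytes <= 255 else 2

for n in (16, 255, 256):
    print(f"VARCHAR({n}): {length_prefix_bytes(n)} length byte(s)")
```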