了解 MyISAM 记录结构

2021-11-17 00:00:00 database mysql architecture myisam

我试图了解 MyISAM 如何物理存储其记录以及在记录插入和记录删除后如何维护其结构.我已阅读以下链接:

第 10 章:存储引擎"第196页第7段说
<块引用>
对于可变长度的记录，格式更复杂.第一个字节包含描述记录子类型的特殊代码.后续字节的含义因每个子类型而异，但共同的主题是有一个字节序列，其中包含记录的长度、块中未使用的字节数、NULL 值指示符标志以及可能的指向如果记录不适合先前创建的空间并且必须拆分，则记录的延续.当一条记录被删除，并且要插入其位置的新记录超过原始记录的大小时，就会发生这种情况.可以通过研究storage/myisam/mi_dynrec.c中的switch语句in_mi_get_block_info()得到不同代码含义的详细信息.
基于该段落，只有当要插入的新数据无法放入先前分配的块时，旧记录才会被链接数据覆盖.这可能会导致许多臃肿的行.
附加问题
<块引用>
如果表被多次删除和插入，是否会非常低效，因为记录结构可能充满溢出指针和未使用的空间?
根据我之前的回答，会有很多块
- 空间块
- 记录长度
- 块中未使用的字节数
- NULL 值指示符标志
- 如果记录不适合先前创建的空间并且必须拆分，则可能是指向记录延续的指针
此类记录链接将从插入过大数据的每一行的前面开始.这会很快使 MyISAM 表 .MYD 文件膨胀.
建议
MyISAM 的默认行格式是动态的.当一个表是动态的并且经历了很多 INSERT、UPDATE 和 DELETE 时，这样的表需要使用
进行优化
```
优化表mytable;
```
还有一种选择:将表格的行格式切换为固定.这样，所有行的大小都相同.这是使行格式固定的方式:
```
ALTER TABLE mytable ROW_FORMAT=Fixed;
```
即使使用固定行格式，也必须花费时间来定位可用记录，但时间将是 O(1) 搜索时间(通俗地说，无论定位可用记录，都需要相同的时间表有多少行或有多少已删除的行).您可以通过启用来绕过该步骤concurrent_insert 如下:
将此添加到 my.cnf
```
[mysqld]并发插入 = 2
```
不需要重启 MySQL.就跑
```
mysql>SET GLOBAL concurrent_insert = 2;
```
这将导致所有 INSERT 都转到表的后面而不寻找可用空间.
固定行表的优势
- INSERT、UPDATE 和 DELETE 会更快一些
- SELECT 速度提高了 20-25%
这是我关于 SELECT 因行格式固定而更快的一些帖子
- 2012 年 5 月 3 日:InnoDB 和 MyISAM 哪个更快?
- 2011 年 9 月 20 日:最佳MyISAM 和 InnoDB
- 2011 年 5 月 10 日:在固定设备上使用 CHAR 与 VARCHAR 对性能有何影响?-size 字段?
Fixed Row 表的缺点
在大多数情况下，当您运行 ALTER TABLE mytable ROW_FORMAT=Fixed; 时，表可能会增长 80-100%..MYI 文件(MyISAM 表的索引页)也将以相同的速度增长.
结语
如果您想要 MyISAM 表的速度并且可以使用更大的表，则需要我的替代建议.如果要为每个 MyISAM 表节省空间，请保留行格式(动态).您将不得不使用 OPTIMIZE TABLE mytable; 更频繁地使用动态表来压缩表.
I am trying to understand how MyISAM physically store its records and how it maintains its structure after record insertion and record deletion. I have read the following link:
- MyISAM Dynamic Data File Layout
- MyISAM Record Structure
I want to make sure if I understand it correctly, please correct me if it is not right.

Fixed-sized record
- Delete marker determines whether record is deleted or not deleted.
- Record header holds which column of a row contains NULL value
- The length of data is fixed.
Variable-sized record
- Delete marker is replaced with BLOCK_DELETED block type
- Record header holds length of data and length of unused data
- A single record can be seperated into multiple block connected by overflow pointer.
Deletion
- For variable-sized record, change block type to BLOCK_DELETED
- Maintain double linked-list of all deleted record by having the previous pointer of the newly deleted record points to last deleted record. Then, the last deleted record's next pointer points to the newly deleted record.
- For fixed-sized record, simply change delete marker as deleted. (unsure if they use double linked-list to connect all the deleted record with fixed-sized record)
Insertion
- If there is no unused space (deleted records), append the data at the end of the file
- If there is unused space that fits the newly inserted record, write the new record there.
- If there is unused space that is far bigger than newly inserted record, split into two records: the new record and the deleted record.
- If there is unused space that is smaller than newly inserted record, write data there, have overflow pointer to points to the unfitted data at other block.
Updating
- What if users update existed data with longer data? Will MyISAM marked the record as deleted and find place that fits the new data or simply use overflow pointer to point to unfitted data?
Recap the question again

I want to make sure if I understand it correctly, please correct me if it is not right.

Additional questions
- Would it be very inefficient if the table has been deleted and inserted for many times since the record structure could potentially full of overflow pointers and unused space?
解决方案
The information you have in the question concerning MyISAM is right on target. However, I would like to address your two additional questions:

LATEST QUESTION

What if users update existed data with longer data? Will MyISAM marked the record as deleted and find place that fits the new data or simply use overflow pointer to point to unfitted data?

According to the Book

Chapter 10 : "Storage Engines" Page 196 Paragraph 7 says

For records with variable length, the format is more complicated. The first byte contains a special code describing the subtype of the record. The meaning of the subsequent bytes varies with each subtype, but the common theme is that there is a sequence of bytes that contains the length of the record, the number of unused bytes in the block, NULL value indicator flags, and possibly a pointer to the continuation of the record if the record did not fit into the previously created space and had to be split up. This can happen when one record gets deleted, and a new one to be inserted into its place exceeds the original one is size. You can get the details of the meanings of different codes by studying the switch statement in_mi_get_block_info() in storage/myisam/mi_dynrec.c.

Based on that paragraph, the old record gets overwritten with linkage data only if the new data to insert cannot fit in the previously allocated block. This can result in many bloated rows.

ADDITIONAL QUESTION

Would it be very inefficient if the table has been deleted and inserted for many times since the record structure could potentially full of overflow pointers and unused space?

From my previous answer, there would be lots of blocks that have
- block of space
- the length of the record
- the number of unused bytes in the block
- NULL value indicator flags
- possibly a pointer to the continuation of the record if the record did not fit into the previously created space and had to be split up
Such record links would start in the front of every row that have oversized data being inserted. This can bloat a MyISAM tables .MYD file very quickly.

SUGGESTIONS

The default row format of a MyISAM is Dynamic. When a table is Dynamic and experiences lots of INSERTs, UPDATEs, and DELETEs, such a table would need to optimized with
```
OPTIMIZE TABLE mytable;
```
There is an alternative: switch the table's row format to Fixed. That way, all rows are the same size. This is how you make the row format Fixed:
```
ALTER TABLE mytable ROW_FORMAT=Fixed;
```
Even with a Fixed Row Format, time must be taken to locate an available record but the time would be O(1) search time (In layman's terms, it would take the same amount of time to locate an available record no matter how many rows the table has or how many deleted rows there are). You could bypass that step by enabling concurrent_insert as follows:

Add this to my.cnf
```
[mysqld]
concurrent_insert = 2
```
MySQL restart not required. Just run
```
mysql> SET GLOBAL concurrent_insert = 2;
```
This would cause all INSERTs to go to the back of the table without looking for free space.

Advantage of Fixed Row tables
- INSERTs, UPDATEs, and DELETEs would be somewhat faster
- SELECT are 20-25% faster
Here are some of my posts on SELECT being faster for Row Formats being Fixed
- May 03, 2012 : Which is faster, InnoDB or MyISAM?
- Sep 20, 2011 : Best of MyISAM and InnoDB
- May 10, 2011 : What is the performance impact of using CHAR vs VARCHAR on a fixed-size field?
Disadvantage of Fixed Row tables

In most cases, when you run ALTER TABLE mytable ROW_FORMAT=Fixed;, the table may grow 80-100%. The .MYI file (index pages for the MyISAM table) would also grow at the same rate.

EPILOGUE

If you want speed for MyISAM tables and can live with bigger tables, my alternate suggestions would be needed. If you want to conserve space for each MyISAM table, leave the row format as is (Dynamic). You will have to compress the table with OPTIMIZE TABLE mytable; more frequent with Dynamic tables.

相关文章

了解 MyISAM 记录结构

附加问题

建议

固定行表的优势

Fixed Row 表的缺点

结语

Fixed-sized record

Variable-sized record

Deletion

Insertion

Updating

LATEST QUESTION

ADDITIONAL QUESTION

SUGGESTIONS

Advantage of Fixed Row tables

Disadvantage of Fixed Row tables

EPILOGUE