元组在 CPython 中是如何实现的?

2022-01-19 00:00:00 python data-structures tuples python-internals cpython

问题描述

我一直在尝试了解 CPython 是如何在幕后实现的.Python 是高级别的很好，但我不喜欢把它当作一个黑盒子.

I've been trying to learn how CPython is implemented under the scenes. It's great that Python is high level, but I don't like treating it like a black box.

考虑到这一点，如何实现元组?我看过源代码 (tupleobject.c)，但它超出了我的头脑.

With that in mind, how are tuples implemented? I've had a look at the source (tupleobject.c), but it's going over my head.

我看到 PyTuple_MAXSAVESIZE = 20 和 PyTuple_MAXFREELIST = 2000，什么是保存和空闲列表"?(长度为 20/21 或 2000/2001 的元组之间会有性能差异吗?是什么强制执行最大元组长度?)

I see that PyTuple_MAXSAVESIZE = 20 and PyTuple_MAXFREELIST = 2000, what is saving and the "free list"? (Will there be a performance difference between tuples of length 20/21 or 2000/2001? What enforces the maximum tuple length?)

解决方案

请注意，此答案中的所有内容均基于我从查看您链接的实现中收集到的内容.

As a caveat, everything in this answer is based on what I've gleaned from looking over the implementation you linked.

元组的标准实现似乎只是一个数组.但是，有很多优化可以加快速度.

It seems that the standard implementation of a tuple is simply as an array. However, there are a bunch of optimizations in place to speed things up.

首先，如果您尝试创建一个空元组，CPython 将返回一个代表空元组的规范对象.因此，它可以节省大量仅分配单个对象的分配.

First, if you try to make an empty tuple, CPython instead will hand back a canonical object representing the empty tuple. As a result, it can save on a bunch of allocations that are just allocating a single object.

接下来，为了避免分配一堆小对象，CPython 为许多小列表回收内存.有一个固定常量 (PyTuple_MAXSAVESIZE)，这样所有小于此长度的元组都有资格回收它们的空间.每当一个长度小于该常数的对象被释放时，与它相关的内存就有可能不会被释放，而是被存储在一个空闲列表"中.(更多关于下一段的内容)基于它的大小.这样，如果您需要分配一个大小为 n 的元组，而其中一个已被分配且不再使用，CPython 可以回收旧数组.

Next, to avoid allocating a bunch of small objects, CPython recycles memory for many small lists. There is a fixed constant (PyTuple_MAXSAVESIZE) such that all tuples less than this length are eligible to have their space reclaimed. Whenever an object of length less than this constant is deallocated, there is a chance that the memory associated with it will not be freed and instead will be stored in a "free list" (more on that in the next paragraph) based on its size. That way, if you ever need to allocate a tuple of size n and one has previously been allocated and is no longer in use, CPython can just recycle the old array.

空闲列表本身被实现为一个大小为 PyTuple_MAXSAVESIZE 的数组，存储指向未使用元组的指针，其中数组的第 n 个元素指向 NULL(如果没有大小为 n 的额外元组可用)或者到一个大小为 n 的回收元组.如果有多个不同的大小为 n 的元组可以重用，则它们通过将每个元组的第零个入口点指向下一个可以重用的元组，以一种链表的形式链接在一起.(由于只分配了一个长度为零的元组，因此永远不会有读取不存在的第零元素的风险).通过这种方式，分配器可以存储一定数量的每个大小的元组以供重用.为了确保这不会使用太多内存，还有第二个常量 PyTuple_MAXFREELIST 用于控制任何存储桶中任何这些链表的最大长度.然后有一个长度为 PyTuple_MAXSAVESIZE 的辅助数组，它存储每个给定长度的元组的链表长度，这样就不会超过这个上限.

The free list itself is implemented as an array of size PyTuple_MAXSAVESIZE storing pointers to unused tuples, where the nth element of the array points either to NULL (if no extra tuples of size n are available) or to a reclaimed tuple of size n. If there are multiple different tuples of size n that could be reused, they are chained together in a sort of linked list by having each tuple's zeroth entry point to the next tuple that can be reused. (Since there is only one tuple of length zero ever allocated, there is never a risk of reading a nonexistent zeroth element). In this way, the allocator can store some number of tuples of each size for reuse. To ensure that this doesn't use too much memory, there is a second constant PyTuple_MAXFREELIST that controls the maximum length of any of these linked lists within any bucket. There is then a secondary array of length PyTuple_MAXSAVESIZE that stores the length of the linked lists for tuples of each given length so that this upper limit isn't exceeded.

总而言之，这是一个非常聪明的实现！

All in all, it's a very clever implementation!

相关文章