为什么元组在内存中占用的空间比列表少?

问题描述

tuple 在 Python 中占用更少的内存空间:

A tuple takes less memory space in Python:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

lists 占用更多内存空间:

whereas lists takes more memory space:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

Python 内存管理内部发生了什么?

What happens internally on the Python memory management?


解决方案

我假设您使用的是 64 位的 CPython(我在 CPython 2.7 64 位上得到了相同的结果).其他 Python 实现或您使用 32 位 Python 时可能存在差异.

I assume you're using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.

不管实现如何,lists 是可变大小的,而 tuples 是固定大小的.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

所以 tuples 可以将元素直接存储在结构中,另一方面,列表需要一个间接层(它存储指向元素的指针).这一间接层是一个指针,在 64 位系统上是 64 位,因此是 8 字节.

So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes.

但是 list 还有另一件事:它们过度分配.否则 list.append 将是一个 O(n) 操作 always - 使其摊销 O(1)(快得多!!!)它过度分配.但是现在它必须跟踪 allocated 大小和 filled 大小(tuple 只需要存储一个大小,因为分配和填充大小始终相同).这意味着每个列表必须存储另一个大小",在 64 位系统上是一个 64 位整数,也是 8 个字节.

But there's another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always - to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes.

所以 lists 需要比 tuples 至少多 16 个字节的内存.为什么我说至少"?因为过度分配.过度分配意味着它分配了比需要更多的空间.但是,过度分配的数量取决于您如何"创建列表以及追加/删除历史记录:

So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

图片

我决定创建一些图像来配合上面的解释.也许这些有用

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

在您的示例中,这就是它(示意性地)存储在内存中的方式.我用红色(徒手)循环强调了不同之处:

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:

这实际上只是一个近似值,因为 int 对象也是 Python 对象,而 CPython 甚至重用小整数,因此内存中对象的更准确表示(尽管不那么可读)可能是:

That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

有用的链接:

  • tuple 结构体适用于 Python 2.7 的 CPython 存储库
  • list 结构体在 CPython 存储库中对于 Python 2.7
  • int 结构体适用于 Python 2.7 的 CPython 存储库
  • tuple struct in CPython repository for Python 2.7
  • list struct in CPython repository for Python 2.7
  • int struct in CPython repository for Python 2.7

请注意,__sizeof__ 并没有真正返回正确"的大小!它只返回存储值的大小.但是,当您使用 sys.getsizeof 结果不一样:

Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

有 24 个额外"字节.这些是真实的,即 __sizeof__ 方法中未考虑的垃圾收集器开销.这是因为您通常不应该直接使用魔术方法 - 使用知道如何处理它们的函数,在这种情况下:sys.getsizeof (实际上 将 GC 开销添加到 __sizeof__ 的返回值).

There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).

相关文章