为什么元组在内存中占用的空间比列表少?
问题描述
tuple
在 Python 中占用更少的内存空间:
A tuple
takes less memory space in Python:
>>> a = (1,2,3)
>>> a.__sizeof__()
48
而 list
s 占用更多内存空间:
whereas list
s takes more memory space:
>>> b = [1,2,3]
>>> b.__sizeof__()
64
Python 内存管理内部发生了什么?
What happens internally on the Python memory management?
解决方案
我假设您使用的是 64 位的 CPython(我在 CPython 2.7 64 位上得到了相同的结果).其他 Python 实现或您使用 32 位 Python 时可能存在差异.
I assume you're using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.
不管实现如何,list
s 是可变大小的,而 tuple
s 是固定大小的.
Regardless of the implementation, list
s are variable-sized while tuple
s are fixed-size.
所以 tuple
s 可以将元素直接存储在结构中,另一方面,列表需要一个间接层(它存储指向元素的指针).这一间接层是一个指针,在 64 位系统上是 64 位,因此是 8 字节.
So tuple
s can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes.
但是 list
还有另一件事:它们过度分配.否则 list.append
将是一个 O(n)
操作 always - 使其摊销 O(1)
(快得多!!!)它过度分配.但是现在它必须跟踪 allocated 大小和 filled 大小(tuple
只需要存储一个大小,因为分配和填充大小始终相同).这意味着每个列表必须存储另一个大小",在 64 位系统上是一个 64 位整数,也是 8 个字节.
But there's another thing that list
s do: They over-allocate. Otherwise list.append
would be an O(n)
operation always - to make it amortized O(1)
(much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuple
s only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes.
所以 list
s 需要比 tuple
s 至少多 16 个字节的内存.为什么我说至少"?因为过度分配.过度分配意味着它分配了比需要更多的空间.但是,过度分配的数量取决于您如何"创建列表以及追加/删除历史记录:
So list
s need at least 16 bytes more memory than tuple
s. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:
>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4) # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96
>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1) # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2) # no re-alloc
>>> l.append(3) # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4) # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72
图片
我决定创建一些图像来配合上面的解释.也许这些有用
Images
I decided to create some images to accompany the explanation above. Maybe these are helpful
在您的示例中,这就是它(示意性地)存储在内存中的方式.我用红色(徒手)循环强调了不同之处:
This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:
这实际上只是一个近似值,因为 int
对象也是 Python 对象,而 CPython 甚至重用小整数,因此内存中对象的更准确表示(尽管不那么可读)可能是:
That's actually just an approximation because int
objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:
有用的链接:
tuple
结构体适用于 Python 2.7 的 CPython 存储库list
结构体在 CPython 存储库中对于 Python 2.7int
结构体适用于 Python 2.7 的 CPython 存储库
tuple
struct in CPython repository for Python 2.7list
struct in CPython repository for Python 2.7int
struct in CPython repository for Python 2.7
请注意,__sizeof__
并没有真正返回正确"的大小!它只返回存储值的大小.但是,当您使用 sys.getsizeof
结果不一样:
Note that __sizeof__
doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof
the result is different:
>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72
有 24 个额外"字节.这些是真实的,即 __sizeof__
方法中未考虑的垃圾收集器开销.这是因为您通常不应该直接使用魔术方法 - 使用知道如何处理它们的函数,在这种情况下:sys.getsizeof
(实际上 将 GC 开销添加到 __sizeof__
的返回值).
There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__
method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof
(which actually adds the GC overhead to the value returned from __sizeof__
).
相关文章