Does assignment with advanced indexing copy array data?

2022-01-20 00:00:00 python numpy copy

Question

I am slowly trying to understand the difference between views and copies in numpy, as well as mutable vs. immutable types.

If I access part of an array with 'advanced indexing', it is supposed to return a copy. This seems to be true:

In [1]: import numpy as np
In [2]: a = np.zeros((3,3))
In [3]: b = np.array(np.identity(3), dtype=bool)

In [4]: c = a[b]

In [5]: c[:] = 9

In [6]: a
Out[6]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Since c is just a copy, it does not share data, and changing it does not mutate a. However, this is what confuses me:

In [7]: a[b] = 1

In [8]: a
Out[8]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

So it seems that even if I use advanced indexing, assignment still treats the thing on the left as a view. Clearly the a in line 2 is the same object/data as the a in line 6, since mutating c has no effect on it.

So my question: is the a in line 8 the same object/data as before (not counting the diagonal, of course), or is it a copy? In other words, was a's data copied to a new a, or was its data mutated in place?

For example, is it like this:

x = [1,2,3]
x += [4]

or like this:

y = (1,2,3)
y += (4,)

I don't know how to check for this because in either case, a.flags.owndata is True. Please feel free to elaborate or answer a different question if I'm thinking about this in a confusing way.


Answer

When you do c = a[b], a.__getitem__ is called with b as its only argument, and whatever gets returned is assigned to c.

When you do a[b] = c, a.__setitem__ is called with b and c as arguments, and whatever gets returned is silently discarded.
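The two calls described above can be written out explicitly. This is only a minimal sketch reusing the a and b from the question; the names unchanged and result are introduced here for illustration.

```python
import numpy as np

a = np.zeros((3, 3))
b = np.identity(3, dtype=bool)

# c = a[b] is the same as calling __getitem__ directly.
c = a.__getitem__(b)
c[:] = 9                   # writes into c's own buffer only
unchanged = not a.any()    # a still holds only zeros

# a[b] = 1 is the same as calling __setitem__; its return value (None) is dropped.
result = a.__setitem__(b, 1)
```

After this runs, a has ones on its diagonal while c still holds the nines written into the copy.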

So despite sharing the same a[b] syntax, the two expressions do different things. You could subclass ndarray, override these two methods, and have them behave differently. By default in numpy, the former returns a copy (if b is an array), while the latter modifies a in place, so no new array is created.
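One way to see both points at once is a small subclass that records which method each form of a[b] triggers, combined with a check of the buffer address before and after the assignment. This is a sketch under assumptions: LoggingArray, calls, and recorded are hypothetical names, and reading the address through __array_interface__ is just one possible way to confirm that the data was not reallocated.

```python
import numpy as np

class LoggingArray(np.ndarray):
    """Hypothetical ndarray subclass that records which method a[b] triggers."""
    calls = []

    def __getitem__(self, key):
        LoggingArray.calls.append("getitem")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        LoggingArray.calls.append("setitem")
        super().__setitem__(key, value)

a = np.zeros((3, 3)).view(LoggingArray)
b = np.identity(3, dtype=bool)

# Address of a's data buffer before any indexing.
address_before = a.__array_interface__["data"][0]

c = a[b]     # read: goes through __getitem__ and returns a copy
a[b] = 1     # write: goes through __setitem__ and mutates a's buffer

address_after = a.__array_interface__["data"][0]
recorded = list(LoggingArray.calls)     # snapshot of the recorded calls
same_buffer = address_before == address_after
shares = np.shares_memory(a, c)
```

Here recorded should contain one "getitem" followed by one "setitem", same_buffer should be True, and shares should be False, which matches the list-like x += [4] case from the question rather than the tuple case.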
