Avoiding the default start argument behavior of Python's sum()
Question
I am working with a Python object that implements __add__ but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) leads to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
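The failure described above can be reproduced with a minimal sketch. Loc here is a hypothetical stand-in for the asker's class (its name and its items attribute are assumptions made for illustration); it defines __add__ only, so 0 + Loc has nothing to fall back on.

```python
class Loc:
    """Hypothetical stand-in for the asker's class: defines __add__ only."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented
        return Loc(self.items + other.items)


a, b = Loc([1]), Loc([2])
combined = a + b          # works: Loc.__add__ handles Loc + Loc

try:
    sum([a, b])           # sum() starts from 0, so it evaluates 0 + a first
except TypeError as exc:
    error = exc           # int cannot add a Loc, and Loc has no __radd__
else:
    error = None
```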
Before anyone asks, the object is not list-like or string-like, so using join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLocs yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly around 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
- python's sum() and non-integer values
- why there's a start argument in python's built-in sum function
- TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
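A loop of the kind described, starting from the first item rather than from 0, might look like the sketch below. This is not the linked comment's exact code, and sum_without_start is a hypothetical name. Writing it out also shows one thing the start value buys: sum([]) can return 0, whereas a loop with no start value has nothing to return for an empty iterable and has to raise.

```python
def sum_without_start(iterable):
    """Add items pairwise, starting from the first item instead of 0.

    Hypothetical helper, not part of the standard library.
    """
    iterator = iter(iterable)
    try:
        total = next(iterator)
    except StopIteration:
        # With no start value there is no sensible result for empty input.
        raise ValueError("cannot sum an empty iterable without a start value") from None
    for item in iterator:
        total = total + item
    return total


# Works for anything that supports +, including types that reject 0 + x.
joined = sum_without_start(["a", "b", "c"])
```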
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
    # ...
    def __radd__(self, other):
        # This allows sum() to work (the default start value is zero)
        if other == 0:
            return self
        return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Answer
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
In Python 2, reduce was a built-in, so this looks like:
import operator
reduce(operator.add, seq)
reduce is generally more flexible than sum: you can provide any binary function, not only add, and you can optionally provide an initial element, while sum always uses one.
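The difference in how reduce treats the initial element can be seen in a short sketch. The list of strings is only illustrative data; strings are used because, like the asker's objects, they cannot be added to 0.

```python
import operator
from functools import reduce

words = ["ab", "cd", "ef"]

# No initial element: reduce starts from the first item.
joined = reduce(operator.add, words)

# With an initial element, which is also the result for an empty sequence.
joined_from_prefix = reduce(operator.add, words, ">")
empty_result = reduce(operator.add, [], "")

# Without an initial element, an empty sequence raises TypeError.
try:
    reduce(operator.add, [])
except TypeError:
    empty_fails = True
else:
    empty_fails = False
```

So dropping the initial element trades the 0 + MyObj problem for an error on empty input, which may or may not be acceptable depending on where seq comes from.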
Also note: (warning: maths rant ahead)
Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
- naturals
- reals
- complex numbers
- N-d vectors
- NxM matrices
- strings
together with addition form a Monoid, i.e. they are associative and have some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such a case, you might be better off using a function or a method instead of an operator. This may be less confusing, since users of your class, seeing that it supports +, are likely to expect it to behave in a monoidal way (as addition normally does).
Thanks for expanding, I'll refer to your particular module now:
There are two concepts here:
- simple locations,
- compound locations.
It indeed makes sense that simple locations can be added, but they don't form a monoid, because their addition doesn't satisfy the basic property of closure: the sum of two SimpleLocs isn't a SimpleLoc. It is, generally, a CompoundLoc.
OTOH, CompoundLocs with addition look like a monoid to me (a commutative monoid, while we're at it): a sum of CompoundLocs is a CompoundLoc too, their addition is associative and commutative, and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as follows:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
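To make the monoid argument concrete, here is a minimal sketch of the two classes, assuming a CompoundLoc can be constructed empty and stores plain (start, end) tuples. The constructors and the intervals attribute are assumptions for illustration, not the asker's actual module, and the sketch does not merge overlapping intervals.

```python
class CompoundLoc:
    """A list of right-open (start, end) intervals; empty by default."""

    def __init__(self, intervals=()):
        self.intervals = list(intervals)

    def __add__(self, other):
        if isinstance(other, SimpleLoc):
            other = CompoundLoc([(other.start, other.end)])
        if not isinstance(other, CompoundLoc):
            return NotImplemented
        return CompoundLoc(self.intervals + other.intervals)


class SimpleLoc:
    """A single right-open interval [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __add__(self, other):
        # Promote to a CompoundLoc so the result is always a CompoundLoc.
        return CompoundLoc([(self.start, self.end)]) + other


locs = [SimpleLoc(3, 6), SimpleLoc(10, 13)]

# The empty CompoundLoc is the neutral element, so it is a valid start value.
total = sum(locs, CompoundLoc())
```

With the empty CompoundLoc as the start value, sum never evaluates 0 + SimpleLoc, so neither __radd__ nor a zero check is needed.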
"I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits."
Well, locations are sets of numbers, so it makes sense to put a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
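A set-like interface over stored endpoints could be sketched as follows. IntervalSet is a hypothetical name, and the sketch assumes the intervals are disjoint; it covers only the three end uses named in the question (iteration, length, membership) and stores nothing but the endpoints.

```python
from itertools import chain


class IntervalSet:
    """Hypothetical set-like view over disjoint right-open (start, end) pairs."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)

    def __contains__(self, value):
        # Membership is a range check per interval; no elements are stored.
        return any(start <= value < end for start, end in self.intervals)

    def __iter__(self):
        # Yield the members of each interval in ascending order.
        return chain.from_iterable(range(s, e) for s, e in self.intervals)

    def __len__(self):
        # Correct only if the intervals do not overlap (assumed here).
        return sum(end - start for start, end in self.intervals)


loc = IntervalSet([(10, 13), (3, 6)])
members = list(loc)
```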
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
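Such a utility might be sketched like this. to_intervals is a hypothetical name, and it returns plain (start, end) tuples rather than SimpleLoc or CompoundLoc instances, since those constructors are not shown in the question; one pair would map to a SimpleLoc and several pairs to a CompoundLoc.

```python
def to_intervals(numbers):
    """Collapse integers into sorted, merged right-open (start, end) pairs."""
    intervals = []
    for n in sorted(set(numbers)):
        if intervals and intervals[-1][1] == n:
            # n is adjacent to the last interval, so extend it by one.
            intervals[-1] = (intervals[-1][0], n + 1)
        else:
            # n starts a new interval [n, n + 1).
            intervals.append((n, n + 1))
    return intervals


pairs = to_intervals([10, 3, 4, 5, 11, 12, 4])
```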