Custom dictionary that allows deletion during iteration

2022-01-10 00:00:00 python python-3.x dictionary iterator

Question

Updated based on Lennart Regebro's answer.

Suppose you iterate through a dictionary and sometimes need to delete an element. The following is very efficient:

remove = []
for k, v in dict_.items():
  if condition(k, v):
    remove.append(k)
    continue
  # do other things you need to do in this loop
for k in remove:
  del dict_[k]

The only overhead here is building the list of keys to remove; unless that list grows large relative to the dictionary, it's not an issue. However, this approach requires some extra code, so it's not very popular.

The popular dict comprehension approach:

dict_ = {k: v for k, v in dict_.items() if not condition(k, v)}
for k, v in dict_.items():
  pass  # do other things you need to do in this loop

results in a full dictionary copy, so it risks a needless performance hit if the dictionary grows large or the containing function is called often.

A much better approach is to copy only the keys rather than the whole dictionary:

for k in list(dict_.keys()):
  if condition(k, dict_[k]):
    del dict_[k]
    continue
  # do other things you need to do in this loop       

(Note that all code examples are in Python 3, so keys() and items() return views, not copies.)
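To make the view-versus-copy distinction concrete, here is a minimal sketch. The helper names (delete_in_place, delete_with_snapshot, is_even) are made up for illustration and are not part of the original question. It shows that deleting while iterating the live view fails in CPython, while iterating a list() snapshot of the keys works.

```python
def is_even(k, v):
    # Hypothetical condition: delete entries whose value is even.
    return v % 2 == 0


def delete_in_place(d, condition):
    """Delete while iterating the live keys() view; return the error text, if any."""
    try:
        for k in d.keys():
            if condition(k, d[k]):
                del d[k]
    except RuntimeError as exc:
        # CPython notices the size change on the next iteration step.
        return str(exc)
    return None


def delete_with_snapshot(d, condition):
    """Iterate over a list() copy of the keys, so deleting from d is safe."""
    for k in list(d.keys()):
        if condition(k, d[k]):
            del d[k]
    return d


# keys() is a live view: it reflects later changes to the dictionary.
d = {"a": 1}
view = d.keys()
d["z"] = 0
view_sees_new_key = "z" in view

error_text = delete_in_place({"a": 1, "b": 2, "c": 3}, is_even)
survivors = delete_with_snapshot({"a": 1, "b": 2, "c": 3}, is_even)
```

The snapshot costs one list of key references per loop, which is the overhead the question is trying to weigh.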

In most cases, copying the keys won't hurt performance much, since the time to check even the simplest condition (not to mention other work you do in the loop) is usually greater than the time to add one key to a list.

Still, I am wondering if it's possible to avoid even that with a custom dictionary that allows deletions while iterating:

for k, v in dict_.items():
  if condition(k, v):
    del dict_[k]
    continue
  # do other things you need to do in this loop

Perhaps an iterator could always look ahead, so that when __next__ is called, it already knows where to go without looking at the current element (it would only need to look at an element when it first reaches it). If there is no next element, the iterator could just set a flag that causes StopIteration to be raised the next time __next__ is called.

If the element the iterator tries to advance to turns out to have been deleted, it's fine to raise an exception; there is no need to support deletions while multiple iterations are going on simultaneously.
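The look-ahead idea could be sketched roughly as follows. This is an illustrative toy, not a drop-in dict replacement: LinkedDict is a hypothetical class that keeps its entries in a doubly linked list (an assumption needed because the built-in dict iterator raises as soon as the size changes), and it implements only the handful of methods needed to show the technique. A generator stands in for the "flag" described above: it simply finishes when there is no next entry.

```python
class LinkedDict:
    """Hypothetical insertion-ordered mapping whose items() looks one entry ahead."""

    def __init__(self, *args, **kwargs):
        # Sentinel node; each node is [key, value, prev, next] in a ring.
        self._root = root = [None, None, None, None]
        root[2] = root[3] = root
        self._nodes = {}  # key -> node
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __setitem__(self, key, value):
        node = self._nodes.get(key)
        if node is not None:
            node[1] = value  # existing key: update in place
            return
        root = self._root
        last = root[2]
        node = [key, value, last, root]
        last[3] = root[2] = node  # append at the end of the ring
        self._nodes[key] = node

    def __getitem__(self, key):
        return self._nodes[key][1]

    def __delitem__(self, key):
        node = self._nodes.pop(key)  # raises KeyError if the key is missing
        node[2][3] = node[3]
        node[3][2] = node[2]
        node[2] = node[3] = None  # mark the node as unlinked

    def __len__(self):
        return len(self._nodes)

    def items(self):
        root = self._root
        node = root[3]
        while node is not root:
            if node[3] is None:
                # The entry we looked ahead to was deleted in the meantime.
                raise RuntimeError("next entry was deleted during iteration")
            upcoming = node[3]  # look ahead before handing out the current entry
            yield node[0], node[1]
            node = upcoming  # safe even if the current entry was just deleted


d = LinkedDict(a=1, b=2, c=3, d=4)
kept = []
for k, v in d.items():
    if v % 2 == 0:
        del d[k]  # deleting the current entry is allowed
        continue
    kept.append(k)
```

Even in this toy form, every entry carries an extra list and two links, which illustrates the overhead concern: the bookkeeping is paid on every insert and delete, not just while iterating.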

Are there any problems with this approach?

One problem is that I'm not sure it can be done without material overhead compared to the existing dict; if it can't, it would be faster to use the list(dict_) approach!

Update:

I tried all the versions. I don't report the timings, since they clearly depend heavily on the exact situation. But it seems safe to say that in many cases the fastest approach is likely to be list(dict_). After all, if you think about it, copying is about the cheapest operation that grows linearly with the size of the list; almost any other overhead that is also proportional to the list size is likely to be bigger.
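For anyone who wants to repeat the comparison on their own data, a minimal timing harness might look like the sketch below. The function names, the condition, the dictionary size and the repeat count are all assumptions chosen for illustration; no timings are quoted here because, as noted, they depend on the situation.

```python
import timeit


def condition(k, v):
    # Hypothetical predicate: remove entries with even values.
    return v % 2 == 0


def deferred_list(d):
    # Collect keys first, delete after the loop.
    remove = [k for k, v in d.items() if condition(k, v)]
    for k in remove:
        del d[k]
    return d


def comprehension(d):
    # Build a filtered copy of the whole dictionary.
    return {k: v for k, v in d.items() if not condition(k, v)}


def key_snapshot(d):
    # Iterate over a copy of the keys and delete in place.
    for k in list(d):
        if condition(k, d[k]):
            del d[k]
    return d


def time_all(size=10_000, repeat=20):
    base = {i: i for i in range(size)}
    results = {}
    for fn in (deferred_list, comprehension, key_snapshot):
        # dict(base) is rebuilt inside the timed call so every run starts from
        # the same data; that copy cost is identical for all three variants.
        results[fn.__name__] = timeit.timeit(lambda: fn(dict(base)), number=repeat)
    return results
```

Calling time_all() returns a dict of total seconds per variant; varying size and the fraction of deleted keys is what makes the results diverge.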

I really like all the ideas, but since I can select only one, I'm accepting the context manager solution, since it allows the dictionary to be used as either normal or "enhanced" with very small code changes.


Answer

As you note, you can store the items to delete somewhere and defer their deletion until later. The problem then becomes when to purge them and how to make sure that the purge method eventually gets called. The answer to this is a context manager which is also a subclass of dict.

class dd_dict(dict):    # the dd is for "deferred delete"
    _deletes = None     # None means deletes happen immediately

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(str(key))
        if self._deletes is None:
            dict.__delitem__(self, key)
        else:
            self._deletes.add(key)  # deferred until __exit__

    def __enter__(self):
        self._deletes = set()
        return self

    def __exit__(self, type, value, tb):
        for key in self._deletes:
            try:
                dict.__delitem__(self, key)
            except KeyError:
                pass
        self._deletes = None

Usage:

# make the dict and do whatever to it
ddd = dd_dict(a=1, b=2, c=3)

# now iterate over it, deferring deletes
with ddd:
    for k, v in ddd.items():
        if k == "a":
            del ddd[k]
            print(ddd)    # shows that "a" is still there

print(ddd)                # shows that "a" has been deleted

If you're not in a with block, of course, deletes are immediate; as this is a dict subclass, it works just like a regular dict outside of a context manager.

You could also implement this as a wrapper class for a dictionary:

class deferring_delete(object):
    def __init__(self, d):
        self._dict = d
    def __enter__(self):
        self._deletes = set()
        return self
    def __exit__(self, type, value, tb):
        for key in self._deletes:
            try:
                del self._dict[key]
            except KeyError:
                pass
        del self._deletes
    def __delitem__(self, key):
        if key not in self._dict:
            raise KeyError(str(key))
        self._deletes.add(key)

d = dict(a=1, b=2, c=3)

with deferring_delete(d) as dd:
    for k, v in d.items():
        if k == "a":
            del dd[k]    # delete through wrapper

print(d)

It's even possible to make the wrapper class fully functional as a dictionary, if you want, though that's a fair bit more code.
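One way to get most of the dictionary interface without writing every method by hand is to build the wrapper on collections.abc.MutableMapping, which derives items(), keys(), get(), pop(), update() and so on from five core methods. The sketch below is one possible take, not the answer's original code; the class name DeferringDeleteDict is made up for illustration, and it borrows the "None means immediate" flag from the dict subclass above.

```python
from collections.abc import MutableMapping


class DeferringDeleteDict(MutableMapping):
    """Hypothetical wrapper: a full mapping whose deletes are deferred inside `with`."""

    def __init__(self, d):
        self._dict = d
        self._deletes = None  # None means deletes happen immediately

    def __enter__(self):
        self._deletes = set()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        for key in self._deletes:
            self._dict.pop(key, None)  # ignore keys that are already gone
        self._deletes = None
        return False  # do not swallow exceptions raised inside the block

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        # Note: setting a key that is pending deletion does not cancel the delete.
        self._dict[key] = value

    def __delitem__(self, key):
        if key not in self._dict:
            raise KeyError(key)
        if self._deletes is None:
            del self._dict[key]
        else:
            self._deletes.add(key)

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


d = dict(a=1, b=2, c=3)
with DeferringDeleteDict(d) as dd:
    for k, v in dd.items():  # items() comes from MutableMapping
        if k == "a":
            del dd[k]        # deferred; "a" is still visible here
still_there_inside = None    # placeholder so the name exists for readers skimming
```

After the with block, d no longer contains "a". Because this version tests the flag on every delete, it gives up the small speed advantage of the simpler wrapper.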

Performance-wise, this is admittedly not much of a win, but I like it from a programmer-friendliness standpoint. The second method should be very slightly faster, since it doesn't test a flag on each delete.
