Python "set" with duplicate/repeated elements

2022-01-17 00:00:00 python dictionary set collections

Problem description

Is there a standard way to represent a "set" that can contain duplicate elements?

As I understand it, a set contains either zero or one of any given element. I want to be able to hold any number of each.

I am currently using a dictionary with elements as keys and quantities as values, but this seems wrong for many reasons.

Motivation: I believe there are many applications for such a collection. For example, a survey of favourite colours could be represented by: survey = ['blue', 'red', 'blue', 'green']

Here, I do not care about the order, but I do care about quantities. I want to do things like:

survey.add('blue')
# would give survey == ['blue', 'red', 'blue', 'green', 'blue']

...and maybe even

survey.remove('blue')
# would give survey == ['blue', 'red', 'green']

Notes: Yes, set is not the correct term for this kind of collection. Is there a more correct one?

A list of course would work, but the collection required is unordered. Not to mention that the method naming for sets seems to me to be more appropriate.

Solution

You are looking for a multiset.

Python's closest datatype is collections.Counter:

"A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages."
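As a rough illustration of how Counter can stand in for the survey collection from the question, here is a minimal sketch. The variable names are only illustrative, and "adding" or "removing" an element is expressed here as incrementing a count or subtracting another Counter, since Counter has no add or remove methods of its own.

```python
from collections import Counter

# Build the multiset from the survey answers; order is not preserved as meaning.
survey = Counter(['blue', 'red', 'blue', 'green'])

# "Add" one more occurrence of 'blue' by incrementing its count.
survey['blue'] += 1

# "Remove" one occurrence by subtracting another Counter.
# In-place subtraction discards any counts that fall to zero or below.
survey -= Counter(['blue'])

blue_count = survey['blue']              # count for a present element
missing_count = survey['purple']         # missing elements count as 0, no KeyError
all_elements = sorted(survey.elements()) # expand back into individual elements

# Two Counters compare equal regardless of the order the elements arrived in.
same_multiset = Counter(['red', 'blue']) == Counter(['blue', 'red'])
```

Note that assigning `survey['blue'] -= 1` directly would also work, but it can leave zero or negative counts in the dictionary, whereas the subtraction shown above drops them.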

For an actual implementation of a multiset, use the bag class from the data-structures package on PyPI. Note that this is for Python 3 only. If you need Python 2, there is a recipe for a bag written for Python 2.4.
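If installing a package is not an option, the set-style method names asked for in the question can be approximated with a thin wrapper around Counter. The sketch below is only an assumption about what such a class might look like: the name Multiset and its methods are hypothetical, and it is not the API of the bag class mentioned above.

```python
from collections import Counter


class Multiset:
    """Hypothetical minimal multiset built on Counter (illustrative only)."""

    def __init__(self, iterable=()):
        self._counts = Counter(iterable)

    def add(self, item):
        # Record one more occurrence of item.
        self._counts[item] += 1

    def remove(self, item):
        # Remove one occurrence; raise KeyError if item is absent, like set.remove.
        if self._counts[item] <= 0:
            raise KeyError(item)
        self._counts[item] -= 1
        if self._counts[item] == 0:
            del self._counts[item]  # keep only elements that are actually present

    def count(self, item):
        # Number of occurrences of item (0 if absent).
        return self._counts[item]

    def __len__(self):
        # Total number of elements, counting duplicates.
        return sum(self._counts.values())

    def __eq__(self, other):
        # Equality ignores order and compares counts only.
        if not isinstance(other, Multiset):
            return NotImplemented
        return self._counts == other._counts


survey = Multiset(['blue', 'red', 'blue', 'green'])
survey.add('blue')      # now three occurrences of 'blue'
survey.remove('blue')   # back to two occurrences
```

Keeping the counts in a private Counter, rather than subclassing it, is a design choice in this sketch so that the dictionary-style interface does not leak through the set-style one.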
