Dict/Set Parsing Order Consistency

2022-01-17 00:00:00 python dictionary set python-internals

Problem description

Containers such as dict and set take hashable objects (as dict keys or set items). As such, a dictionary can only have one key with the value 1, 1.0 or True etc. (Note: this is somewhat simplified. Hash collisions are permitted, but these values are considered equal.)
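As a quick check of the premise above, the following sketch confirms that the four values used in this question compare equal and share a single hash. The variable names are only illustrative.

```python
# Four objects of different types that Python treats as the same dict key / set item.
candidates = [True, 1, 1.0, 1 + 0j]

# Every pair compares equal ...
all_equal = all(a == b for a in candidates for b in candidates)

# ... and they all share one hash value, so a lookup for one finds the others.
distinct_hashes = {hash(c) for c in candidates}

print(all_equal)             # True
print(len(distinct_hashes))  # 1
```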

My question is: is the parsing order well-defined, and is the resulting object predictable across implementations? For example, Python 2.7.11 and 3.5.1 on OS X interpret a dict display like so:

>>> { True: 'a', 1: 'b', 1.0: 'c', (1+0j): 'd' }
{True: 'd'}

In this case, it appears that the first key and the last value are preserved.
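One way to read this result, assuming the display behaves like a series of item assignments in source order, is the sketch below. Assigning to a key that is already present updates the value but leaves the stored key object alone, which would explain "first key, last value". The names `pairs`, `only_key` and `only_value` are illustrative.

```python
pairs = [(True, 'a'), (1, 'b'), (1.0, 'c'), (1 + 0j, 'd')]

d = {}
for key, value in pairs:
    # An equal key already in the dict is kept; only the value is overwritten.
    d[key] = value

only_key, only_value = next(iter(d.items()))
print(len(d))                               # 1
print(type(only_key).__name__, only_value)  # bool d
```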

Similarly, in the case of set:

>>> { True, 1, 1.0, (1+0j) }
set([(1+0j)])

Here it appears that the last item is preserved.

But (as mentioned in the comments):

>>> set([True, 1, 1.0])
set([True])

Now the first item in the iterable is preserved.
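The constructor result matches what repeated `set.add` calls produce: adding an element equal to one already present leaves the existing element in place. A minimal sketch (variable names are illustrative):

```python
s = set()
for item in [True, 1, 1.0]:
    s.add(item)  # adding an equal element does not replace the stored one

survivor = next(iter(s))
constructor_survivor = next(iter(set([True, 1, 1.0])))

print(type(survivor).__name__)              # bool
print(type(constructor_survivor).__name__)  # bool
```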

The documentation notes that the order of items (for example in dict.items) is undefined; however, my question refers to the result of constructing dict or set objects.


Solution

  • The bug is now fixed in recent versions of Python, as explained in @jsf's answer.

      Dictionary displays

      If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.

      A dict comprehension, in contrast to list and set comprehensions, needs two expressions separated with a colon followed by the usual "for" and "if" clauses. When the comprehension is run, the resulting key and value elements are inserted in the new dictionary in the order they are produced.
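The left-to-right evaluation described in the quoted passages can be observed by wrapping each key and value in a tracing helper. This is a sketch for current CPython releases; `trace` and `calls` are hypothetical names introduced only for the illustration.

```python
calls = []

def trace(label, value):
    """Record the order in which sub-expressions are evaluated."""
    calls.append(label)
    return value

# Two pairs whose keys compare equal (1 == 1.0).
d = {trace("k1", 1): trace("v1", "a"), trace("k2", 1.0): trace("v2", "b")}

print(calls)                        # ['k1', 'v1', 'k2', 'v2']
print(d)                            # {1: 'b'}
print(type(next(iter(d))).__name__) # int
```

The first key object (`1`, an int) stays in the dictionary, while the value comes from the last pair.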

      Set displays

      A set display yields a new mutable set object, the contents being specified by either a sequence of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object. When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension.

      There is a difference between calling the set constructor or using a comprehension, and using the plain literal.

      import dis

      def f1():
          return {x for x in [True, 1]}

      def f2():
          return set([True, 1])

      def f3():
          return {True, 1}

      print(f1())
      print(f2())
      print(f3())

      print("f1")
      dis.dis(f1)

      print("f2")
      dis.dis(f2)

      print("f3")
      dis.dis(f3)
      

      Output:

      {True}
      {True}
      {1}
      

      How they are created influences the outcome:

      f1
      605           0 LOAD_CONST               1 (<code object <setcomp> at 0x7fd17dc9a270, file "/home/padraic/Dropbox/python/test.py", line 605>)
                    3 LOAD_CONST               2 ('f1.<locals>.<setcomp>')
                    6 MAKE_FUNCTION            0
                    9 LOAD_CONST               3 (True)
                   12 LOAD_CONST               4 (1)
                   15 BUILD_LIST               2
                   18 GET_ITER
                   19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                   22 RETURN_VALUE
      f2
      608           0 LOAD_GLOBAL              0 (set)
                    3 LOAD_CONST               1 (True)
                    6 LOAD_CONST               2 (1)
                    9 BUILD_LIST               2
                   12 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                   15 RETURN_VALUE
      f3
      611           0 LOAD_CONST               1 (True)
                    3 LOAD_CONST               2 (1)
                    6 BUILD_SET                2
                    9 RETURN_VALUE
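The opcode difference visible in the listings above can also be checked programmatically with `dis.get_instructions`. This is a sketch assuming CPython; the helper `opnames` and the two functions are hypothetical names, and arguments are used instead of constants so that constant folding does not interfere.

```python
import dis

def from_display(a, b):
    return {a, b}          # set display

def from_constructor(a, b):
    return set([a, b])     # set() called on a list

def opnames(func):
    """Return the opcode names of a function's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]

display_uses_build_set = "BUILD_SET" in opnames(from_display)
constructor_uses_build_set = "BUILD_SET" in opnames(from_constructor)

# Expected on CPython: True False
print(display_uses_build_set, constructor_uses_build_set)
```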
      

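      Python only runs the BUILD_SET bytecode when you pass a plain literal separated by commas, as per:

      When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object.

      The line for the comprehension:

      When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension.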

      So, thanks to Hamish filing a bug report, it does indeed come down to the BUILD_SET opcode. As per Raymond Hettinger's comment on that report, the culprit is the BUILD_SET opcode in Python/ceval.c, which unnecessarily loops backwards. Its implementation is below:

      TARGET(BUILD_SET) {
          PyObject *set = PySet_New(NULL);
          int err = 0;
          if (set == NULL)
              goto error;
          while (--oparg >= 0) {
              PyObject *item = POP();
              if (err == 0)
                  err = PySet_Add(set, item);
              Py_DECREF(item);
          }
          if (err != 0) {
              Py_DECREF(set);
              goto error;
          }
          PUSH(set);
          DISPATCH();
      }
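The effect of that backwards loop can be imitated in pure Python. The sketch below is only a model of the C code, not the interpreter itself: `build_set_backwards` pops items off a list used as a stack, as the `POP()` loop does, while `build_set_forwards` adds them in source order. Both function names are hypothetical.

```python
def build_set_backwards(items):
    """Model of the loop above: pop from the top of the stack, so the last item is added first."""
    stack = list(items)
    result = set()
    while stack:
        result.add(stack.pop())
    return result

def build_set_forwards(items):
    """Add items in source order, as the language reference describes."""
    result = set()
    for item in items:
        result.add(item)
    return result

backwards = build_set_backwards([True, 1])
forwards = build_set_forwards([True, 1])

print(type(next(iter(backwards))).__name__)  # int
print(type(next(iter(forwards))).__name__)   # bool
```

Because an equal element is never replaced once it is in the set, whichever item is added first survives, which matches the `{1}` versus `{True}` outputs shown earlier.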
      
