Creating a namedtuple object using only a subset of the arguments passed

2022-01-21 00:00:00 python namedtuple decorator arguments

Question

I am pulling rows from a MySQL database as dictionaries (using SSDictCursor) and doing some processing, using the following approach:

from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Foo, self).__init__(self, *args)

    # ...some class methods below here

class Bar(namedtuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Bar, self).__init__(self, *args)

    # some class methods here...

# more classes for distinct processing tasks...

To use namedtuple, I have to know exactly the fields I want beforehand, which is fine. However, I would like to allow the user to feed a simple SELECT * statement into my program, which will then iterate through the rows of the result set, performing multiple tasks using these different classes. In order to make this work, my classes have to somehow examine the N fields coming in from the cursor and take only the particular subset M < N corresponding to the names expected by the namedtuple definition.

My first thought was to try writing a single decorator that I could apply to each of my classes, which would examine the class to see what fields it was expecting, and pass only the appropriate arguments to the new object. But I've just started reading about decorators in the past few days, and I'm not that confident yet with them.

So my question is in two parts:

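  1. Is this possible with a single decorator that figures out which fields are needed by the specific class being decorated?
  2. Is there an alternative with the same functionality that would be easier to use, modify, and understand?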

I have too many potential permutations of tables and fields, with millions of rows in each result set, to just write one all-purpose namedtuple subclass to deal with each different task. Query time and available memory have proven to be limiting factors.

In case it's needed:

>>> sys.version
'2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)]'


Answer

First, you have to override __new__ in order to customize namedtuple creation, because a namedtuple's __new__ method checks its arguments before you even get to __init__.

Second, if your goal is to accept and filter keyword arguments, you need to take **kwargs and filter and pass that through, not just *args.

So, putting it together:

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)
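
As a rough illustration of how this behaves with a dictionary row of the kind a dict cursor yields, here is a minimal sketch. The `row` dict and its values are made up for the example, and no database is involved. It also shows that the filter only drops extra keys; it does not supply fields that are missing.

```python
from collections import namedtuple


class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        # Keep only the keyword arguments that name one of Foo's fields.
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)


# Hypothetical row, standing in for one dict from a SELECT * result set.
row = {'id': 1, 'name': 'Ann', 'age': 30, 'city': 'Springfield', 'state': 'IL'}
foo = Foo(**row)  # 'city' and 'state' are silently dropped

# Filtering only removes extras; a missing field is still a TypeError.
try:
    Foo(id=2, name='Bob')
    missing_ok = True
except TypeError:
    missing_ok = False
```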

You could replace that dict comprehension with itemgetter, but every time I use itemgetter with multiple keys, nobody understands what it means, so I've reluctantly stopped using it.
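
For readers who have not seen `itemgetter` with several keys, a sketch of what that variant might look like follows. It assumes every field is passed as a keyword argument (a missing one raises `KeyError`) and that the class has at least two fields, since `itemgetter` with a single key returns a bare value rather than a tuple.

```python
from collections import namedtuple
from operator import itemgetter


class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, **kwargs):
        # itemgetter('id', 'name', 'age')(kwargs) returns a tuple of the
        # three values, in field order; any other keys are ignored.
        values = itemgetter(*cls._fields)(kwargs)
        return super(Foo, cls).__new__(cls, *values)


foo = Foo(id=1, name='Ann', age=30, spam=4)
```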

You can also override __init__ if you have a reason to do so, because it will be called as soon as __new__ returns a Foo instance.

But you don't need to just for this, because namedtuple's __init__ doesn't take any arguments or do anything; the values have already been set in __new__ (just as with tuple and other immutable types). It looks like with CPython 2.7, you actually can call super(Foo, self).__init__(*args, **kwargs) and the arguments will just be ignored, but with PyPy 1.9 and CPython 3.3, you get a TypeError. At any rate, there's no reason to pass them, and nothing says it should work, so don't do it even in CPython 2.7.

Note that your __init__ will get the unfiltered kwargs. If you want to change that, you could mutate kwargs in-place in __new__, instead of making a new dictionary. But I believe that still isn't guaranteed to do anything; it just makes it implementation-defined whether you get the filtered args or unfiltered, instead of guaranteeing the unfiltered.
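
A small sketch of that behavior, under the assumption that you do want an __init__: it receives the caller's original keyword arguments and forwards nothing to the parent. The `seen_by_init` list is a hypothetical recorder that exists only so the example can show what __init__ was given.

```python
from collections import namedtuple

seen_by_init = []  # hypothetical log, only for this demonstration


class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # __init__ is called with the caller's original, unfiltered arguments.
        seen_by_init.append(sorted(kwargs))
        # Forward nothing: the field values were already set in __new__.
        super(Foo, self).__init__()


foo = Foo(id=1, name=2, age=3, spam=4)
```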

So, can you wrap this up? Sure!

def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper

Note that this has the advantage of not having to use the quasi-private/semi-documented _fields class attribute, because we already have fields as a parameter.

Also, while we're at it, I added a line to toss away any excess positional arguments, as suggested in a comment.
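
One caveat, offered as an observation rather than as part of the code above: namedtuple also accepts its field names as a single string separated by whitespace and/or commas. With such a string, `k in fields` becomes a substring test and `len(fields)` counts characters. A possible variant that normalizes the field names first is sketched below; the name `lenient_namedtuple` is hypothetical, and on Python 2 you would probably test against `basestring` rather than `str`.

```python
from collections import namedtuple


def lenient_namedtuple(name, fields):
    # Accept 'id name age' or 'id, name, age' as namedtuple itself does.
    if isinstance(fields, str):
        fields = fields.replace(',', ' ').split()
    fields = tuple(fields)

    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            # Drop surplus positional arguments and unknown keywords.
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper


Foo = lenient_namedtuple('Foo', 'id, name age')
# 'a' is a substring of 'id, name age' but not a field, so it is dropped.
foo = Foo(id=1, name=2, age=3, a=9)
```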

Now you just use it as you'd use namedtuple, and it automatically ignores any excess arguments:

class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    pass

print(Foo(id=1, name=2, age=3, spam=4))
print(Foo(1, 2, 3, 4, 5))
print(Foo(1, age=3, name=2, eggs=4))
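
To connect this back to the question's setup, here is a sketch in which the same dictionary rows feed two different classes, each picking out only its own fields. The `rows` list is invented for the example and stands in for whatever a dict cursor would yield from a `SELECT *`.

```python
from collections import namedtuple


def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper


class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()


class Bar(LenientNamedTuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()


# Made-up rows; every row carries more columns than either class wants.
rows = [
    {'id': 1, 'name': 'Ann', 'age': 30,
     'address': '1 Main St', 'city': 'Springfield', 'state': 'IL'},
    {'id': 2, 'name': 'Bob', 'age': 41,
     'address': '2 Oak Ave', 'city': 'Portland', 'state': 'OR'},
]

foos = [Foo(**row) for row in rows]  # each keeps id, name, age
bars = [Bar(**row) for row in rows]  # each keeps id, address, city, state
```

Each object holds only its own fields, so the per-row memory cost depends on the class rather than on how many columns the query returned.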

I've uploaded a test, replacing the dict comprehension with dict() on a genexpr for 2.6 compatibility (2.6 is the earliest version with namedtuple), but without the args truncating. It works with positional, keyword, and mixed args, including out-of-order keywords, in CPython 2.6.7, 2.7.2, 2.7.5, 3.2.3, 3.3.0, and 3.3.1, PyPy 1.9.0 and 2.0b1, and Jython 2.7b.
