How does str.startswith really work?

2022-01-19 00:00:00 python python-3.x list string tuples

Question

I've been playing around with startswith() and discovered something interesting:

>>> tup = ('1', '2', '3')
>>> lis = ['1', '2', '3', '4']
>>> '1'.startswith(tup)
True
>>> '1'.startswith(lis)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list

Now, the error is obvious, and converting the list to a tuple works just fine, as it did in the first place:

>>> '1'.startswith(tuple(lis))
True

Now, my question is: why must the first argument be a str or a tuple of str prefixes, but not a list of str prefixes?

AFAIK, the Python code for startswith() might look like this:

def startswith(src, prefix):
    return src[:len(prefix)] == prefix

But that just confuses me more, because even with that in mind, it still shouldn't make any difference whether it is a list or a tuple. What am I missing?


Answer

No, there is no technical reason why other sequence types couldn't be accepted. The source code roughly does this:

if isinstance(prefix, tuple):
    for substring in prefix:
        if not isinstance(substring, str):
            raise TypeError(...)
        # stop at the first prefix that matches
        if tailmatch(...):
            return True
    # no prefix in the tuple matched
    return False
elif not isinstance(prefix, str):
    raise TypeError(...)
return tailmatch(...)

(where tailmatch(...) does the actual matching work).

So yes, any iterable would do for that for loop. But all the other string test APIs that take multiple values (as well as isinstance() and issubclass()) also accept only tuples, and this tells you, as a user of the API, that it is safe to assume the value won't be mutated. You can't mutate a tuple, but the method could in theory mutate a list.
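
A quick way to see this consistency is to try a tuple and a list against a few of these APIs. The helper raises_type_error and the sample value are made up for this illustration; the built-in calls are the real ones.

```python
def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


value = "report.csv"  # arbitrary sample string

# Tuples are accepted wherever these APIs take several candidates.
print(value.endswith((".csv", ".tsv")))  # True
print(isinstance(value, (int, str)))     # True
print(issubclass(bool, (int, str)))      # True

# Lists are rejected by the same APIs.
print(raises_type_error(value.endswith, [".csv", ".tsv"]))  # True
print(raises_type_error(isinstance, value, [int, str]))     # True
print(raises_type_error(issubclass, bool, [int, str]))      # True
```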

Also note that you usually test against a fixed number of prefixes, suffixes, or classes (in the case of isinstance() and issubclass()); the implementation is not suited to a large number of elements. A tuple implies that you have a limited number of elements, while a list can be arbitrarily large.

Next, if any iterable or sequence type were acceptable, that would include strings; a single string is also a sequence. Should a single string argument then be treated as separate characters, or as a single prefix?
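
To make the ambiguity concrete, here is a sketch of a variant that accepts any iterable of prefixes. starts_with_any is a hypothetical function written for this illustration; it is not part of Python.

```python
def starts_with_any(src, prefixes):
    """Hypothetical variant: treat prefixes as any iterable of str."""
    return any(src.startswith(p) for p in prefixes)


text = "banana"

# Real behaviour: "ab" is one prefix, and "banana" does not start with it.
as_single_prefix = text.startswith("ab")

# Any-iterable behaviour: "ab" is iterated as "a" and "b",
# and "banana" does start with "b".
as_iterable_of_chars = starts_with_any(text, "ab")

print(as_single_prefix)      # False
print(as_iterable_of_chars)  # True
```

The same argument gives two different answers depending on the interpretation, which is exactly the case a tuple-only rule avoids.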

So in other words, the limitation self-documents that the sequence won't be mutated, is consistent with other APIs, implies a limited number of items to test against, and removes the ambiguity about how a single string argument should be treated.

Note that this has been brought up before on the Python Ideas list; see this thread. Guido van Rossum's main argument there is that you either have to special-case single strings or accept only a tuple. He picked the latter and doesn't see a need to change this.
