Why does Python 'for word in words:' iterate over individual characters instead of words?


When I run the following code on a string words:

def word_feats(words):
    return dict([(word, True) for word in words])
print(word_feats("I love this sandwich."))


The resulting dict is keyed by individual letters instead of words:

{'a': True, ' ': True, 'c': True, 'e': True, 'd': True, 'I': True, 'h': True, 'l': True, 'o': True, 'n': True, 'i': True, 's': True, 't': True, 'w': True, 'v': True, '.': True}




You need to explicitly split the string on whitespace:

def word_feats(words):
    return dict([(word, True) for word in words.split()])

This uses str.split() without arguments, which splits on runs of whitespace of any width (including tabs and newlines). Without the split, a string is just a sequence of individual characters, and iterating over it directly loops over each character.
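To make the difference concrete, here is a small sketch; the sample string `text` is made up for illustration and is not from the question. It contrasts iterating a string directly, calling split() with no arguments, and calling split(' ') with an explicit single space:

```python
# Hypothetical sample with a double space, a tab, and a newline.
text = "I  love\tthis\nsandwich."

# Iterating a str yields one-character strings.
chars = list("I love")
print(chars)  # ['I', ' ', 'l', 'o', 'v', 'e']

# split() with no arguments treats any run of whitespace as one separator.
print(text.split())  # ['I', 'love', 'this', 'sandwich.']

# split(' ') splits on every single space only, so the double space
# produces an empty string and the tab and newline are left in place.
print(text.split(' '))  # ['I', '', 'love\tthis\nsandwich.']
```

The no-argument form is usually what you want for rough word splitting, since it never produces empty strings.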


Splitting into words, however, has to be an explicit operation that you perform yourself, because different use cases have different needs for how a string should be split into parts. Does punctuation count, for example? What about parentheses or quotes: should words grouped by those stay together? And so on.
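As one illustration of such a choice, the sketch below strips leading and trailing punctuation from each token. The function name `word_feats_no_punct` is hypothetical, and this is only one possible policy (it keeps inner punctuation such as the apostrophe in "don't"), not the one right way to tokenize:

```python
import string

def word_feats_no_punct(text):
    """Hypothetical variant: ignore punctuation at the edges of each token."""
    # string.punctuation holds the ASCII punctuation characters.
    words = (token.strip(string.punctuation) for token in text.split())
    # Tokens made only of punctuation strip down to '' and are dropped.
    return {word: True for word in words if word}

print(word_feats_no_punct("I love this sandwich."))
```

With this version the key is 'sandwich' rather than 'sandwich.'. Non-ASCII punctuation would need extra handling.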

If all you are doing is setting all values to True, it'll be much more efficient to use dict.fromkeys() instead:

def word_feats(words):
    return dict.fromkeys(words.split(), True)


>>> def word_feats(words):
...     return dict.fromkeys(words.split(), True)
...
>>> print(word_feats("I love this sandwich."))
{'I': True, 'this': True, 'love': True, 'sandwich.': True}
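As a quick sanity check, the following sketch compares the two approaches side by side. The names `word_feats_pairs` and `word_feats_fromkeys` are made up here to keep both versions in one script:

```python
def word_feats_pairs(words):
    # Approach from the question, with split() added:
    # build a list of (key, value) tuples, then convert it to a dict.
    return dict([(word, True) for word in words.split()])

def word_feats_fromkeys(words):
    # dict.fromkeys() builds the dict straight from the keys,
    # without the intermediate list of tuples.
    return dict.fromkeys(words.split(), True)

sentence = "I love this sandwich."
# Both versions produce the same mapping.
assert word_feats_pairs(sentence) == word_feats_fromkeys(sentence)

# Repeated words collapse into a single key either way.
repeated = word_feats_fromkeys("to be or not to be")
print(sorted(repeated))  # ['be', 'not', 'or', 'to']
```

Note that the period is still part of the 'sandwich.' key in both versions, since split() only separates on whitespace.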
