Why does Python 'for word in words:' iterate over individual characters instead of words?

2022-01-24 00:00:00 python string iteration for-loop string-iteration

Question

When I run the following code on a string words:

def word_feats(words):
    return dict([(word, True) for word in words])

print(word_feats("I love this sandwich."))

The resulting dictionary is keyed by individual letters instead of words:

{'a': True, ' ': True, 'c': True, 'e': True, 'd': True, 'I': True, 'h': True, 'l': True, 'o': True, 'n': True, 'i': True, 's': True, 't': True, 'w': True, 'v': True, '.': True}

What am I doing wrong?

Answer

You need to explicitly split the string on whitespace:

def word_feats(words):
    return dict([(word, True) for word in words.split()])

This uses str.split() without arguments, which splits on runs of whitespace of any width (including tabs and line separators). Without the split, a string is just a sequence of individual characters, so iterating over it directly loops over each character.
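To make the difference concrete, here is a small sketch comparing direct iteration with str.split(). The variable names and sample strings are made up for illustration and are not from the question.

```python
sentence = "I love\tthis \n sandwich."

# Iterating over a str directly yields one-character strings.
chars = [ch for ch in "I love"]
print(chars)   # ['I', ' ', 'l', 'o', 'v', 'e']

# split() with no arguments splits on runs of any whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
words = sentence.split()
print(words)   # ['I', 'love', 'this', 'sandwich.']

# split(" ") is different: it splits on every single space,
# so consecutive spaces produce empty strings.
spaced = "a  b".split(" ")
print(spaced)  # ['a', '', 'b']
```

Note that the trailing period stays attached to 'sandwich.', because split() only looks at whitespace.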

Splitting into words, however, has to be an explicit operation that you perform yourself, because different use cases have different needs for how a string should be broken into parts. Does punctuation count, for example? What about parentheses or quoting: should words grouped by those stay together? And so on.
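As an illustration of how the "right" split depends on the use case, the sketch below applies three standard-library approaches to the same string. The sample text is invented, and which result counts as correct is an assumption that depends on your application.

```python
import re
import shlex

text = 'I love "club sandwiches" a lot.'

# Whitespace split: punctuation and quotes stay attached to the words.
by_whitespace = text.split()
print(by_whitespace)
# ['I', 'love', '"club', 'sandwiches"', 'a', 'lot.']

# Regex on word characters: punctuation and quotes are dropped.
by_regex = re.findall(r"\w+", text)
print(by_regex)
# ['I', 'love', 'club', 'sandwiches', 'a', 'lot']

# Shell-style split: a quoted phrase is kept together as one item.
by_shlex = shlex.split(text)
print(by_shlex)
# ['I', 'love', 'club sandwiches', 'a', 'lot.']
```

shlex.split() follows shell quoting rules, so it raises ValueError on an unbalanced quote or a stray apostrophe; it suits command-like input more than free-form prose.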

If all you are doing is setting every value to True, it is simpler and more efficient to use dict.fromkeys() instead:

def word_feats(words):
    return dict.fromkeys(words.split(), True)

Demo:

>>> def word_feats(words):
...     return dict.fromkeys(words.split(), True)
...
>>> print(word_feats("I love this sandwich."))
{'I': True, 'this': True, 'love': True, 'sandwich.': True}
