Joining the elements of tuples in a list in Python
Problem description
I have a list of tuples that contain strings. For instance:
[('this', 'is', 'a', 'foo', 'bar', 'sentences'),
 ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
 ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
 ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
 ('bar', 'sentences', 'and', 'i', 'want', 'to'),
 ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
 ('and', 'i', 'want', 'to', 'ngramize', 'it')]
Now I want to join the strings in each tuple to create a list of space-separated strings. I used the following method:
NewData = []
for grams in sixgrams:
    # Append a space to each word, concatenate, then strip the trailing space.
    NewData.append(''.join([w + ' ' for w in grams]).strip())
It works just fine.
However, the list I have contains over a million tuples. So my question is: is this method efficient enough, or is there a better way to do it? Thanks.
Solution
For a lot of data, you should consider whether you need to keep it all in a list. If you process the items one at a time, you can create a generator that yields each joined string without keeping them all in memory:
new_data = (' '.join(w) for w in sixgrams)
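As a minimal sketch of the difference, the snippet below joins a small sample both eagerly and lazily. The two-tuple sample and the names joined_list and joined_lazy are illustrative assumptions, not part of the original question. Note that ' '.join(grams) already puts a single space between words, so the per-word concatenation and the strip() call are not needed.

```python
# Small sample standing in for the million-tuple list (illustrative data).
sixgrams = [
    ('this', 'is', 'a', 'foo', 'bar', 'sentences'),
    ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
]

# Eager version: builds the whole list of joined strings in memory.
joined_list = [' '.join(grams) for grams in sixgrams]

# Lazy version: produces one joined string at a time, on demand.
joined_lazy = (' '.join(grams) for grams in sixgrams)

# Pull the first joined string out of the generator.
first = next(joined_lazy)
```

A generator can only be iterated once, so use the list version if you need to index into the results or loop over them more than once.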
If you can also get the original tuples from a generator, then you can avoid holding the sixgrams list in memory as well.
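One way this could look, assuming the six-grams come from a sliding window over a stream of tokens (the question does not say how they were built). The helper iter_ngrams and the sample sentence are hypothetical names introduced only for this sketch.

```python
from collections import deque


def iter_ngrams(tokens, n=6):
    """Yield successive n-token tuples from any iterable of tokens.

    Hypothetical helper: only the current window of n tokens is held in memory.
    """
    window = deque(maxlen=n)  # the oldest token drops out automatically
    for token in tokens:
        window.append(token)
        if len(window) == n:
            yield tuple(window)


# Illustrative input; in practice the tokens could be read lazily from a file.
text = 'this is a foo bar sentences and i want to ngramize it'

# Chain the two generators: neither the tuples nor the strings are stored as a list.
joined = (' '.join(grams) for grams in iter_ngrams(text.split()))

# Materialized here only to inspect the sketch's output.
result = list(joined)
```

In real use you would iterate over joined directly instead of calling list() on it, since building the list brings back the memory cost the generators were meant to avoid.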