How to append items from different columns to a list in Pandas
Problem description
I have a dataframe that looks like this:
import pandas as pd

dic = {'A': ['PINCO', 'PALLO', 'CAPPO', 'ALLOP'],
       'B': ['KILO', 'KULO', 'FIGA', 'GAGO'],
       'C': [['CAL', 'GOL', 'TOA', 'PIA', 'STO'],
             ['LOL', 'DAL', 'ERS', 'BUS', 'TIS'],
             ['PIS', 'IPS', 'ZSP', 'YAS', 'TUS'],
             []]}
df1 = pd.DataFrame(dic)
My goal is to insert, for each row, the element of A as the first item of the list contained in column C. At the same time, I want to append the element of B as the last item of the list contained in C.
I was able to achieve my goal by using the following lines of code:
for index, row in df1.iterrows():
    try:
        row['C'].insert(0, row['A'])
        row['C'].append(row['B'])
    except:
        pass
Is there a more elegant and efficient way to achieve my goal, maybe using some Pandas function? I would like to avoid for loops if possible.
Solution
A good general rule is to avoid using apply with axis=1 if at all possible, as iterating over the rows is expensive.
You can convert each element in columns A and B to a list with map and then sum across the rows, which concatenates the lists in each row.
df1['A'] = df1.A.map(lambda x: [x])
df1['B'] = df1.B.map(lambda x: [x])
# Select the columns in A, C, B order so that A ends up first and B last.
df1[['A', 'C', 'B']].sum(axis=1)
CPU times: user 3.07 s, sys: 207 ms, total: 3.27 s
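As a small, self-contained illustration of the wrap-then-concatenate idea, the sketch below runs on the sample frame from the question. It is a variation rather than the exact code above: instead of a row-wise sum it adds the three columns with `+`, which makes the A, C, B order explicit. The names `a_lists`, `b_lists` and `combined` are hypothetical and chosen only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['PINCO', 'PALLO', 'CAPPO', 'ALLOP'],
    'B': ['KILO', 'KULO', 'FIGA', 'GAGO'],
    'C': [['CAL', 'GOL', 'TOA', 'PIA', 'STO'],
          ['LOL', 'DAL', 'ERS', 'BUS', 'TIS'],
          ['PIS', 'IPS', 'ZSP', 'YAS', 'TUS'],
          []],
})

# Wrap each scalar in a one-element list so it can be concatenated with C.
a_lists = df1['A'].map(lambda x: [x])
b_lists = df1['B'].map(lambda x: [x])

# `+` on object columns is applied element by element, so each row's
# lists are concatenated in the order written here: A, then C, then B.
combined = a_lists + df1['C'] + b_lists

print(combined.tolist())
```

Because list concatenation builds new lists, the original column C is left unchanged; assign `combined` back to `df1['C']` if the frame should be updated.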
The alternative is to use apply with axis=1, which ran about 15 times slower on my computer on 1 million rows.
df1.apply(lambda x: [x['A']] + x['C'] + [x['B']], 1)
CPU times: user 48.5 s, sys: 119 ms, total: 48.6 s
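Another option, not part of the answer above, is a plain list comprehension over the zipped columns. It still loops in Python, but it avoids building a Series object for every row the way apply with axis=1 does. This is only a sketch: `wrap_with_zip` is a hypothetical helper name, and no timing is claimed for it here.

```python
import pandas as pd

def wrap_with_zip(df):
    """Return one new list per row: [A] + C + [B] (hypothetical helper)."""
    return [[a] + c + [b] for a, c, b in zip(df['A'], df['C'], df['B'])]

df1 = pd.DataFrame({
    'A': ['PINCO', 'PALLO', 'CAPPO', 'ALLOP'],
    'B': ['KILO', 'KULO', 'FIGA', 'GAGO'],
    'C': [['CAL', 'GOL', 'TOA', 'PIA', 'STO'],
          ['LOL', 'DAL', 'ERS', 'BUS', 'TIS'],
          ['PIS', 'IPS', 'ZSP', 'YAS', 'TUS'],
          []],
})

result = wrap_with_zip(df1)

# Store the new lists back in column C.
df1['C'] = result
print(df1['C'].tolist())
```

Unlike the loop in the question, this builds new lists instead of mutating the existing ones in place, so any other reference to the original lists is unaffected.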