Convert a dataframe to a dictionary of lists of tuples

2022-01-20 00:00:00 python pandas dataframe dictionary tuples

Problem description

I have a dataframe that looks like the following:

                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       1
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       1
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       1

and I would like to obtain the following structure:

dict -> list of tuples
user -> (item, rating)
b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack Johnson, 1), ... , )

I was able to do:

item_set = dict((user, set(items)) for user, items in data.groupby('user')['item'])

But that only gets me halfway. How do I get the corresponding "rating" value from the groupby?

Solution

Set user as the index, convert each row to a tuple using df.apply, group by the index using df.groupby(level=0), collect each group into a list using dfGroupBy.agg, and convert the result to a dictionary using to_dict:

In [1417]: df
Out[1417]:
                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       2
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       2
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       2

In [1418]: df.set_index('user').apply(tuple, 1).groupby(level=0).agg(lambda x: list(x.values)).to_dict()
Out[1418]:
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1),
  ('Entre Dos Aguas - Paco De Lucia', 2),
  ('Stronger - Kanye West', 2),
  ('Constellations - Jack Johnson', 2),
  ('Learn To Fly - Foo Fighters', 2)]}
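For comparison, here is a minimal self-contained sketch of a different route to the same dictionary. It is not the method from the answer above: it stays closer to the question's own attempt by iterating over the groupby and zipping the item and rating columns of each group. The helper name to_tuple_lists is hypothetical, and the sample frame is rebuilt by hand from the rows shown in the answer.

```python
import pandas as pd

USER_ID = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"

# Sample data rebuilt by hand from the rows shown in the answer.
df = pd.DataFrame({
    "user": [USER_ID] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 2, 2, 2],
})


def to_tuple_lists(frame):
    """Map each user to a list of (item, rating) tuples.

    Hypothetical helper: iterates over the groups and pairs the two
    columns with zip, instead of chaining apply/agg/to_dict.
    """
    return {
        name: list(zip(group["item"], group["rating"]))
        for name, group in frame.groupby("user")
    }


result = to_tuple_lists(df)
print(result[USER_ID])
```

Within each group the rows keep their original order, so the tuples come out in the same order as in the dataframe. If a set of tuples per user is preferred, as in the question's item_set, replacing list(...) with set(...) in the comprehension should be enough.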
