Pivoting a pandas DataFrame into the right shape: `DataError: No numeric types to aggregate`

2022-01-22 00:00:00 python pandas dataframe pivot

Question

Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2", ...],
        "labels": ["A", "B", "C", "A", "B", "C", "D", ...],
        "count": [5, 1, 8, 3, 731, 189, 9, ...]}
df = pd.DataFrame(data)
print(df)

>>>
  grouping labels  count
0    item1      A      5
1    item1      B      1
2    item1      C      8
3    item2      A      3
4    item2      B    731
5    item2      C    189
6    item2      D      9
7      ...    ...   ....

I would like to "unfold" this dataframe into the following format:

grouping    A    B    C    D
item1       5    1    8    3
item2       3  731  189    9
....     ........

How would one do this? I would think that this would work:

pd.pivot_table(df, index=["grouping", "labels"])

But I get the following error:

DataError: No numeric types to aggregate
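The question does not show the column dtypes, so this is only a sketch under an assumption: one common cause of this error is a value column stored as strings (object dtype), which leaves nothing numeric to aggregate. Separately, the call above puts `labels` into the row index instead of the columns. The sample below assumes `count` arrived as strings, converts it with `pd.to_numeric`, and passes `labels` as `columns`:

```python
import pandas as pd

# Assumption: 'count' was loaded as strings (e.g. from a text file).
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": ["5", "1", "8", "3", "731", "189", "9"],
})

# Give the value column a numeric dtype before aggregating.
df["count"] = pd.to_numeric(df["count"])

# 'labels' goes into the columns, not into a second index level.
wide = df.pivot_table(values="count", index="grouping", columns="labels")
print(wide)
```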

Answer

There are four idiomatic pandas ways to do this.

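No duplicates among the grouping columns, so no aggregation is needed:
- pivot
- set_index

Duplicates possible among the grouping columns, so aggregation is required:
- pivot_table
- groupby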
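The choice between the two groups comes down to whether any (grouping, labels) pair repeats. A minimal sketch of that decision, using a hypothetical helper named `widen` (not part of pandas) and picking `sum` as the aggregator purely for illustration:

```python
import pandas as pd


def widen(df: pd.DataFrame, index: str, columns: str, values: str) -> pd.DataFrame:
    """Hypothetical helper: reshape long to wide, aggregating only if needed."""
    if df.duplicated([index, columns]).any():
        # Repeated (index, columns) pairs: some aggregation is unavoidable.
        return df.pivot_table(values=values, index=index, columns=columns, aggfunc="sum")
    # Every pair is unique: a pure reshape, nothing is aggregated.
    return df.pivot(index=index, columns=columns, values=values)


unique = pd.DataFrame({"grouping": ["item1", "item1", "item2"],
                       "labels": ["A", "B", "A"],
                       "count": [5, 1, 3]})
dupes = pd.DataFrame({"grouping": ["item1", "item1", "item2"],
                      "labels": ["A", "A", "A"],
                      "count": [5, 1, 3]})

print(widen(unique, "grouping", "labels", "count"))
print(widen(dupes, "grouping", "labels", "count"))
```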

pivot

df.pivot(index='grouping', columns='labels', values='count')
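A runnable sketch of `pivot` on the sample rows from the question (the `...` rows left out). Since pandas 2.0, `DataFrame.pivot` accepts only keyword arguments. The second half adds one hypothetical duplicate row to show why `pivot` belongs to the "no duplicates" group: it raises `ValueError` instead of aggregating.

```python
import pandas as pd

df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# Pure reshape: each (grouping, labels) pair becomes one cell.
wide = df.pivot(index="grouping", columns="labels", values="count")
print(wide)

# Hypothetical duplicate: repeat the first row so (item1, A) appears twice.
dupes = pd.concat([df, df.head(1)], ignore_index=True)
try:
    dupes.pivot(index="grouping", columns="labels", values="count")
    pivot_failed = False
except ValueError as err:
    # pivot has no way to combine two values for the same cell.
    pivot_failed = True
    print("ValueError:", err)
```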

set_index

df.set_index(['grouping', 'labels'])['count'].unstack()
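The `set_index` route is the same reshape done in two visible steps: build a Series with a two-level index, then move the inner level into the columns. A sketch on the question's sample rows, also showing `unstack(fill_value=0)` for the missing (item1, D) cell:

```python
import pandas as pd

df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# Step 1: both key columns become a two-level row index.
s = df.set_index(["grouping", "labels"])["count"]

# Step 2: the innermost level ('labels') is rotated into the columns.
wide = s.unstack()

# Same reshape, but missing combinations are filled with 0 instead of NaN.
wide_filled = s.unstack(fill_value=0)
print(wide_filled)
```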

pivot_table

df.pivot_table('count', 'grouping', 'labels')
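`pivot_table` aggregates whenever a cell receives more than one value, and its default `aggfunc` is the mean. The sketch below uses hypothetical data with a second (item1, A) row to make that visible, then requests `sum` explicitly:

```python
import pandas as pd

# Hypothetical data: (item1, A) appears twice, with counts 5 and 7.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item1", "item2"],
    "labels": ["A", "A", "B", "C", "A"],
    "count": [5, 7, 1, 8, 3],
})

# Default aggregation: the two (item1, A) values are averaged.
mean_table = df.pivot_table(values="count", index="grouping", columns="labels")

# Any other aggregation can be requested explicitly.
sum_table = df.pivot_table(values="count", index="grouping", columns="labels",
                           aggfunc="sum")
print(mean_table)
print(sum_table)
```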

groupby

df.groupby(['grouping', 'labels'])['count'].sum().unstack()
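`groupby(...).sum().unstack()` spells out the same aggregate-then-reshape that `pivot_table(..., aggfunc="sum")` performs. A sketch on the same hypothetical duplicated data as above, computing both so their cells can be compared:

```python
import pandas as pd

# Hypothetical data: (item1, A) appears twice, with counts 5 and 7.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item1", "item2"],
    "labels": ["A", "A", "B", "C", "A"],
    "count": [5, 7, 1, 8, 3],
})

# Aggregate each (grouping, labels) group, then rotate 'labels' into columns.
via_groupby = df.groupby(["grouping", "labels"])["count"].sum().unstack()

# The equivalent pivot_table call.
via_pivot_table = df.pivot_table(values="count", index="grouping",
                                 columns="labels", aggfunc="sum")
print(via_groupby)
print(via_pivot_table)
```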

All four yield:

labels      A      B      C    D
grouping
item1     5.0    1.0    8.0  NaN
item2     3.0  731.0  189.0  9.0
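A quick way to check that the four approaches agree on the question's sample rows is to compare each result to one reference frame. `check_dtype=False` is passed as a precaution, since the methods differ in how they handle int versus float along the way:

```python
import pandas as pd

df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

results = {
    "pivot": df.pivot(index="grouping", columns="labels", values="count"),
    "set_index": df.set_index(["grouping", "labels"])["count"].unstack(),
    "pivot_table": df.pivot_table(values="count", index="grouping", columns="labels"),
    "groupby": df.groupby(["grouping", "labels"])["count"].sum().unstack(),
}

reference = results["pivot"]
for name, frame in results.items():
    # Raises AssertionError if any method produced a different table.
    pd.testing.assert_frame_equal(frame, reference, check_dtype=False)
print(reference)
```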


With the groupby, set_index, or pivot_table approach, you can easily fill in missing values with fill_value=0:

df.pivot_table('count', 'grouping', 'labels', fill_value=0)
df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
df.set_index(['grouping', 'labels'])['count'].unstack(fill_value=0)

All yield:

labels    A    B    C  D
grouping
item1     5    1    8  0
item2     3  731  189  9

Additional thoughts on groupby

Because we don't require any aggregation, if we want to use groupby we can minimize the impact of the implicit aggregation by using a less impactful aggregator:

df.groupby(['grouping', 'labels'])['count'].max().unstack()

or

df.groupby(['grouping', 'labels'])['count'].first().unstack()
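When every (grouping, labels) pair is unique, `sum`, `max`, and `first` all return the single value in each group. They only diverge if a duplicate sneaks in, which is where the choice of aggregator matters. A sketch with a hypothetical duplicated (item1, A) pair, combined with `unstack(fill_value=0)`:

```python
import pandas as pd

# Hypothetical data: (item1, A) appears twice, with counts 5 and then 7.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item1", "item2"],
    "labels": ["A", "A", "B", "C", "D"],
    "count": [5, 7, 1, 8, 9],
})

keys = ["grouping", "labels"]

# sum merges duplicates, so the cell no longer matches any single row.
by_sum = df.groupby(keys)["count"].sum().unstack(fill_value=0)

# max and first each keep one of the original values.
by_max = df.groupby(keys)["count"].max().unstack(fill_value=0)
by_first = df.groupby(keys)["count"].first().unstack(fill_value=0)

print(by_sum, by_max, by_first, sep="\n\n")
```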
