尝试在 DataFrame.pivot 中使用多索引时出现 ValueError
问题描述
我已阅读 pandas: how to run a pivot with a multi-index? 但它无法解决我的问题.
I have read pandas: how to run a pivot with a multi-index? but it could not solve my problem.
给定下面的数据框:
import pandas as pd
df = pd.DataFrame({
"date": ["20180920"] * 6,
"id": ["A123456789"] * 6,
"test": ["a", "b", "c", "d", "e", "f"],
"result": [70, 90, 110, "(-)", "(+)", 0.3],
"ref": ["< 90", "70 - 100", "100 - 120", "(-)", "(-)", "< 1"]
})
我想扩展 test
列,使用 result
中的值,并忽略 ref
.换句话说,期望的输出是这样的:
I'd like to spread the test
column, use the values in result
, and ignore ref
. In other words, the desired output is like:
date id a b c d e f
0 20180920 A123456789 70 90 110 (-) (+) 0.3
所以我尝试了 df.pivot(index=["date", "id"], columns="test", values="result")
,但它失败了 ValueError: 传递值的长度是 6,索引意味着 2.我认为它与如果传递一个数组,它必须与数据的长度相同"有关.在 pivot_table
文档中,但我就是不明白这是什么意思.有人可以详细说明一下吗?
So I tried df.pivot(index=["date", "id"], columns="test", values="result")
, but it failed with ValueError: Length of passed values is 6, index implies 2. I think it is related to "If an array is passed, it must be the same length as the data." in pivot_table
documentation, but I just don't understand what it means. Can someone elaborate that please?
顺便说一句,我终于通过 df.drop(columns="ref").set_index(["date", "id", "test"]).unstack(level=2)代码>.这是唯一正确的方法吗?
BTW, I finally get my desired output by df.drop(columns="ref").set_index(["date", "id", "test"]).unstack(level=2)
. Is it the only correct way?
解决方案
pivot
可以用,但是代码有点疯狂:
pivot
is possible use, but code is a bit crazy:
df = (df.set_index(["date", "id"])
.pivot(columns="test")['result']
.reset_index()
.rename_axis(None, axis=1)
)
print (df)
date id a b c d e f
0 20180920 A123456789 70 90 110 (-) (+) 0.3
关于文档,您可以查看 issue 16578 和 pandas0.24.0 应该是改进的文档或者也许是对使用 MultiIndex
的新支持?issue 8160 也有点不清楚.
About docs you can check issue 16578 and in pandas 0.24.0 should be improved docs or maybe new support for working with MultiIndex
? A bit unclear also from issue 8160.
在我看来,您的最后一个代码应该只改进一点(与@Vaishali 相同的解决方案) - 通过在 set_index
和 之后选择来创建
移除 Series with MultiIndex
unstacklevel
,因为默认情况下是 unstacked 最后一层 MultiIndex - Series.unstack
:
In my opinion your last code should be only a bit improved (same solution like @Vaishali) - create Series with MultiIndex
by selecting after set_index
and for unstack
remove level
, because by default is unstacked last level of MultiIndex - Series.unstack
:
level : int, string, 或者这些的列表,默认最后一级
level : int, string, or list of these, default last level
要取消堆叠的关卡,可以传递关卡名称
Level(s) to unstack, can pass level name
#all 3 return same output
df.set_index(["date", "id", "test"])['result'].unstack()
df.set_index(["date", "id", "test"])['result'].unstack(level=2)
df.set_index(["date", "id", "test"])['result'].unstack(level=-1)
相关文章