How to create a pivot table with totals (margins) in Pandas?
Problem description
For example, I have a very simple data frame:
import pandas as pd

values = pd.Series(range(5))
rows = pd.Series(['a', 'b', 'a', 'a', 'b'])
columns = pd.date_range('20130101', periods=5)
df = pd.DataFrame({'values': values, 'rows': rows, 'columns': columns})
Here is how it looks:
columns rows values
0 2013-01-01 00:00:00 a 0
1 2013-01-02 00:00:00 b 1
2 2013-01-03 00:00:00 a 2
3 2013-01-04 00:00:00 a 3
4 2013-01-05 00:00:00 b 4
Creating a pivot without margins (totals) works:
pivot = pd.pivot_table(
    data=df,
    rows='rows',
    cols='columns',
    values='values',
    margins=False
)
Here is how it looks:
columns 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05
rows
a 0 NaN 2 3 NaN
b NaN 1 NaN NaN 4
But if I want to create a pivot with margins:
pivot = pd.pivot_table(
    data=df,
    rows='rows',
    cols='columns',
    values='values',
    margins=True
)
I get an error:
Traceback (most recent call last):
File "./test.py", line 17, in <module>
margins=True
File "/usr/local/lib/python2.6/dist-packages/pandas/tools/pivot.py", line 135, in pivot_table
cols=cols, aggfunc=aggfunc)
File "/usr/local/lib/python2.6/dist-packages/pandas/tools/pivot.py", line 174, in _add_margins
piece[all_key] = margin[key]
File "/usr/local/lib/python2.6/dist-packages/pandas/core/frame.py", line 2119, in __setitem__
self._set_item(key, value)
File "/usr/local/lib/python2.6/dist-packages/pandas/core/frame.py", line 2166, in _set_item
NDFrame._set_item(self, key, value)
File "/usr/local/lib/python2.6/dist-packages/pandas/core/generic.py", line 679, in _set_item
self._data.set(key, value)
File "/usr/local/lib/python2.6/dist-packages/pandas/core/internals.py", line 1781, in set
self.insert(len(self.items), item, value)
File "/usr/local/lib/python2.6/dist-packages/pandas/core/internals.py", line 1801, in insert
new_items = self.items.delete(loc)
File "/usr/local/lib/python2.6/dist-packages/pandas/core/index.py", line 2610, in delete
new_labels = [np.delete(lab, loc) for lab in self.labels]
File "/usr/lib/pymodules/python2.6/numpy/lib/function_base.py", line 3339, in delete
"invalid entry")
ValueError: invalid entry
- Python version: 2.6.8
- Pandas version: 0.12.0
- System: Debian Linux, 3.2.0 kernel, 64-bit.
Thanks.
Solution
I can reproduce your issue. It looks like a bug. As a workaround, I found that reassigning the column names avoids the error. Note that this swaps the labels of the first two columns, so the dates end up on the rows and the letters on the columns:
df.columns = ['rows', 'columns', 'values']
pd.pivot_table(
    data=df,
    rows='rows',
    cols='columns',
    values='values',
    margins=True)
Out[34]:
columns a b All
rows
2013-01-01 00:00:00 0.000000 NaN 0
2013-01-02 00:00:00 NaN 1.0 1
2013-01-03 00:00:00 2.000000 NaN 2
2013-01-04 00:00:00 3.000000 NaN 3
2013-01-05 00:00:00 NaN 4.0 4
All 1.666667 2.5 2
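For readers on a current pandas release, the same pivot can be sketched as follows. This assumes a recent pandas version, where `pivot_table` takes `index` and `columns` in place of the `rows` and `cols` keywords used above. The data frame is the one from the question. Which release first stopped raising the original error is not established here.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'values': pd.Series(range(5)),
    'rows': pd.Series(['a', 'b', 'a', 'a', 'b']),
    'columns': pd.date_range('20130101', periods=5),
})

# Assumption: a recent pandas release, where the keywords are
# `index` / `columns` rather than the older `rows` / `cols`.
pivot = pd.pivot_table(
    data=df,
    index='rows',
    columns='columns',
    values='values',
    margins=True,  # adds an 'All' row and an 'All' column
)

print(pivot)
```

With the default `aggfunc` (the mean), the `All` row and column hold averages rather than sums, which matches the output above. Passing `aggfunc='sum'` should give actual totals.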