TypeError: Cannot convert bool to numpy.ndarray when grouping by multiple columns

2022-02-26 00:00:00 python pandas dataframe pandas-groupby

Problem description

I want to group a DataFrame by two columns to summarize the average monthly sales of each store.

Data (the pandas DataFrame fact):

store_id    sku_id  date    quantity    city    city    category    month
0   354 31253   2017-08-08  1   Paris   Paris   Shirt   8
1   354 31253   2017-08-19  1   Paris   Paris   Shirt   8
2   354 31258   2017-07-30  1   Paris   Paris   Shirt   7
3   354 277171  2017-09-28  1   Paris   Paris   Shirt   9
4   174 295953  2017-08-16  1   London  London  Shirt   8

Grouping by store_id alone or by month alone works fine, but when I try to group by both store_id and month at the same time, I get:

groupby_month = fact['quantity'].groupby(fact['store_id', 'month'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-169-a8cffb72ab7c> in <module>
----> 1 groupby_month = fact['quantity'].groupby(fact['store_id', 'month'])
      2 
      3 

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

D:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2655                                  'backfill or nearest lookups')
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:
   2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine._maybe_get_bool_indexer()

TypeError: Cannot convert bool to numpy.ndarray

Solution

First, check the index labels and the columns:

fact.index 
fact.columns
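
The traceback passes through `_get_loc_duplicates`, which suggests that the column labels are not unique (the sample data shows `city` twice). The following is a minimal sketch of that check, assuming a small stand-in frame named `demo`; it is hypothetical and not the original data.

```python
import pandas as pd

# Hypothetical stand-in for `fact`, with a duplicated "city" label.
demo = pd.DataFrame(
    [[354, "Paris", "Paris", 8], [174, "London", "London", 8]],
    columns=["store_id", "city", "city", "month"],
)

# False means at least one column label is repeated.
columns_unique = demo.columns.is_unique

# Labels that occur more than once.
duplicated_labels = demo.columns[demo.columns.duplicated()].tolist()

# Keep only the first occurrence of each label.
deduped = demo.loc[:, ~demo.columns.duplicated()]

print(columns_unique, duplicated_labels, list(deduped.columns))
```

Whether dropping the repeated column is appropriate depends on the data; here both `city` columns hold the same values, so keeping the first one loses nothing.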

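If the grouping keys are in the index rather than in the columns, convert the index to columns first (reset_index returns a new DataFrame, so assign the result):

fact = fact.reset_index()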

Then you can use:

fact.groupby(['store_id', 'month'])['quantity'].mean()

Output:

store_id  month
174       8        1
354       7        1
          8        1
          9        1
Name: quantity, dtype: int64
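
The original call fails because `fact['store_id', 'month']` passes the tuple `('store_id', 'month')` as a single column label; selecting or grouping by several columns needs a list. The following is a minimal sketch on a rebuilt copy of the sample rows. The frame `sample` is an assumption made for illustration and contains only the columns needed here; the exact exception type can vary with the pandas version and with whether the column labels are unique.

```python
import pandas as pd

# Rebuilt from the sample rows in the question (assumed, reduced to 3 columns).
sample = pd.DataFrame({
    "store_id": [354, 354, 354, 354, 174],
    "month": [8, 8, 7, 9, 8],
    "quantity": [1, 1, 1, 1, 1],
})

# A tuple key is looked up as ONE column label, so this lookup raises.
try:
    sample["store_id", "month"]
    tuple_lookup_failed = False
except (KeyError, TypeError):
    tuple_lookup_failed = True

# Group the frame by a list of column names.
by_names = sample.groupby(["store_id", "month"])["quantity"].mean()

# Equivalent form closer to the original call: a list of Series as keys.
by_series = sample["quantity"].groupby(
    [sample["store_id"], sample["month"]]
).mean()

print(by_names)
```

Both forms produce one value per (store_id, month) pair; grouping the frame by column names is usually the easier one to read.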

Or, better, attach the group mean to every row as a new column:

fact['mean'] = fact.groupby(['store_id', 'month'])['quantity'].transform('mean')
print(fact)
   store_id  sku_id        date  quantity    city  city.1 category  month  
0       354   31253  2017-08-08         1   Paris   Paris    Shirt      8   
1       354   31253  2017-08-19         1   Paris   Paris    Shirt      8   
2       354   31258  2017-07-30         1   Paris   Paris    Shirt      7   
3       354  277171  2017-09-28         1   Paris   Paris    Shirt      9   
4       174  295953  2017-08-16         1  London  London    Shirt      8   

   mean  
0     1  
1     1  
2     1  
3     1  
4     1  
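
The difference between the two forms is the shape of the result: aggregating with `.mean()` returns one value per (store_id, month) group, while `transform('mean')` returns one value per original row, which is why it can be assigned back as a column. The following is a small sketch using quantities chosen for illustration (not the original data) so that the group means differ.

```python
import pandas as pd

# Illustrative values only; not taken from the question's data.
toy = pd.DataFrame({
    "store_id": [354, 354, 354, 174],
    "month": [8, 8, 7, 8],
    "quantity": [1, 3, 5, 2],
})

grouped = toy.groupby(["store_id", "month"])["quantity"]

per_group = grouped.mean()           # one value per group
per_row = grouped.transform("mean")  # one value per original row

# per_row shares toy's index, so it can be assigned directly.
toy["mean"] = per_row
print(toy)
```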
