For loop that adds to and subtracts from pandas columns
Problem description
So I have this df:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP
SUP1 P1 STR1 50 5 18
SUP1 P1 STR2 6 7 18
SUP1 P1 STR3 74 4 18
SUP2 P4 STR1 35 3 500
SUP2 P4 STR2 5 4 500
SUP2 P4 STR3 54 7 500
It's always grouped by supplier and product ID. The TO_SHIP value is the same for every row in a group. So, for example, I have 18 units of P1 from SUP1 to send. Then I add new columns:
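- Calculate Wk_bal = (BALANCE / AVG_SALES)
- Rank Wk_bal within each supplier/product group
- For the lowest Wk_bal in the group: SEND_PKGS += 1
- Then calculate Wk_bal again, this time including the packages sent = ((BALANCE + SEND_PKGS) / AVG_SALES)
- Repeat until all of TO_SHIP has been distributed to the stores that need it most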
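The steps above can be sketched for a single supplier/product group in plain Python, before bringing pandas into it. This is only an illustration: the `distribute` helper and its argument names are hypothetical, and the sample numbers are the SUP1/P1 values used in the answer's DataFrame further down.

```python
def distribute(balances, avg_sales, to_ship):
    """Greedy allocation for ONE supplier/product group (hypothetical helper)."""
    sent = [0] * len(balances)
    for _ in range(to_ship):
        # Weeks of stock per store, counting packages already assigned.
        wk_bal = [(b + s) / a for b, s, a in zip(balances, sent, avg_sales)]
        # The store with the lowest weeks of stock gets the next package.
        # On a tie, list.index picks the first store.
        lowest = wk_bal.index(min(wk_bal))
        sent[lowest] += 1
    return sent


print(distribute([50, 6, 74], [5, 4, 4], 18))  # [0, 18, 0]
```

Every package goes to the second store here because its weeks of stock stays below the others even after receiving all 18.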
A visualized run:
First output (calculate Wk_bal, then send 1 pkg to the lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 18 10 0
SUP1 P1 STR2 6 4 18 1.5 1
SUP1 P1 STR3 8 4 18 2 0
SUP2 P4 STR1 35 3 500 11.67 0
SUP2 P4 STR2 5 4 500 1.25 1
SUP2 P4 STR3 54 7 500 7.71 0
Second output (calculate the updated Wk_bal, send one pkg to the lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 17 10 0
SUP1 P1 STR2 8 4 17 1.75 2
SUP1 P1 STR3 8 4 17 2 0
SUP2 P4 STR1 35 3 499 11.67 0
SUP2 P4 STR2 7 4 499 1.5 2
SUP2 P4 STR3 54 7 499 7.71 0
And so on: as long as there is any TO_SHIP left, calculate, rank, and give out one pkg. The reason for this process is that I want to make sure the store with the lowest Wk_bal gets a package first. (There are a lot of other reasons as well.)
I initially built this in SQL, but because of the complexity I moved to Python. Unfortunately my Python isn't very good when it comes to writing loops with several conditions, especially on a pandas df. So far I've tried (and failed):
df['Wk_Bal'] = 0
df['TO_SHIP'] = 0
for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
    if i > 0:
        df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
        df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
        df['SEND_PKGS'] = + 1
        df['BALANCE'] = + 1
    else:
        df['TO_SHIP'] = 0
How can I do this better?
Solution
Hopefully I've understood all of your requirements. Here is your original data:
import pandas as pd

df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
                   'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
                   'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
                   'BALANCE': [50, 6, 74, 35, 5, 54],
                   'AVG_SALES': [5, 4, 4, 3, 4, 7],
                   'TO_SHIP': [18, 18, 18, 500, 500, 500]})
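Here is my approach:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    # Row label of the lowest Wk_bal in each group that still has packages to ship
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
    # Give that row one package, then refresh Wk_bal for every row
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    # One package has left each active group
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1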
The loop keeps updating df until the TO_SHIP column is all zero. On each pass it increments SEND_PKGS for the row with the lowest Wk_bal in each group, then recomputes Wk_bal and decrements every non-zero TO_SHIP value.
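The loop described above regroups the whole frame once per package, which may become slow when TO_SHIP is large. One possible alternative, not part of the original answer, is to process each group separately with a min-heap from the standard library's `heapq`. The sketch below assumes the index labels are unique and that TO_SHIP is constant within a group; `allocate_group` is a hypothetical helper name. It follows the same greedy rule, with ties going to the smaller index label.

```python
import heapq

import pandas as pd

df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
                   'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
                   'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
                   'BALANCE': [50, 6, 74, 35, 5, 54],
                   'AVG_SALES': [5, 4, 4, 3, 4, 7],
                   'TO_SHIP': [18, 18, 18, 500, 500, 500]})


def allocate_group(group):
    """Return packages per row of one group, in row order (hypothetical helper)."""
    to_ship = int(group['TO_SHIP'].iloc[0])  # assumed constant within the group
    sent = dict.fromkeys(group.index, 0)
    # Heap entries are (Wk_bal, index label); the smallest Wk_bal is popped first.
    heap = [(b / a, i) for i, b, a in
            zip(group.index, group['BALANCE'], group['AVG_SALES'])]
    heapq.heapify(heap)
    for _ in range(to_ship):
        _, i = heapq.heappop(heap)
        sent[i] += 1
        # Re-insert the store with its updated weeks of stock.
        wk_bal = (group.at[i, 'BALANCE'] + sent[i]) / group.at[i, 'AVG_SALES']
        heapq.heappush(heap, (wk_bal, i))
    return [sent[i] for i in group.index]


df['SEND_PKGS'] = 0
for _, group in df.groupby(['SUPPLIER', 'PRODUCTID']):
    df.loc[group.index, 'SEND_PKGS'] = allocate_group(group)

print(df[['SUPPLIER', 'STOREID', 'SEND_PKGS']])
```

Each package then costs one pop and one push on a heap the size of the group, rather than a groupby over the full frame.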
I end up with:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP SEND_PKGS Wk_bal
0 SUP1 P1 STR1 50 5 0 0 10.000000
1 SUP1 P1 STR2 6 4 0 18 6.000000
2 SUP1 P1 STR3 74 4 0 0 18.500000
3 SUP2 P4 STR1 35 3 0 92 42.333333
4 SUP2 P4 STR2 5 4 0 165 42.500000
5 SUP2 P4 STR3 54 7 0 243 42.428571
When several rows tie for the minimum Wk_bal, we can pick among them the one with the lowest AVG_SALES:
def find_min(x):
    num_mins = x["Wk_bal"].loc[x["Wk_bal"] == x["Wk_bal"].min()].shape[0]
    if num_mins == 1:
        return(x["Wk_bal"].idxmin())
    else:
        min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
        return(min_df["AVG_SALES"].idxmin())
Then, more or less as before:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
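The same tie-break can probably also be expressed without a custom function: sort by Wk_bal and then AVG_SALES, and take the first row of each group. The sketch below is a possible variant rather than the answer's method. `lowest_rows` is a hypothetical helper name, and the small frame is made-up data chosen so that two stores tie on Wk_bal.

```python
import pandas as pd

# Illustrative data only: rows 0 and 1 tie on Wk_bal, row 1 has the lower AVG_SALES.
demo = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1'],
                     'PRODUCTID': ['P1', 'P1', 'P1'],
                     'STOREID': ['STR1', 'STR2', 'STR3'],
                     'Wk_bal': [2.0, 2.0, 5.0],
                     'AVG_SALES': [6, 3, 4],
                     'TO_SHIP': [4, 4, 4]})


def lowest_rows(frame):
    """Index labels of the row to serve next in each active group (hypothetical helper)."""
    active = frame[frame['TO_SHIP'] > 0]
    # Lowest Wk_bal first; among equal Wk_bal, lowest AVG_SALES first.
    ordered = active.sort_values(['Wk_bal', 'AVG_SALES'])
    # head(1) keeps the first row of every group in that sorted order.
    return ordered.groupby(['SUPPLIER', 'PRODUCTID']).head(1).index


print(list(lowest_rows(demo)))  # [1]
```

The returned index could be passed to `df.loc[..., 'SEND_PKGS'] += 1` inside the loop in place of the `apply(find_min)` result.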