For loop that adds to and subtracts from pandas columns

2022-01-24 00:00:00 python pandas loops iteration if-statement

Problem description

So I have this df:

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP
SUP1       P1          STR1      50        5           18
SUP1       P1          STR2      6         7           18
SUP1       P1          STR3      74        4           18
SUP2       P4          STR1      35        3           500
SUP2       P4          STR2      5         4           500
SUP2       P4          STR3      54        7           500

The data is always grouped by supplier and product ID, and the TO_SHIP value is the same for every row in a group. For example, I have 18 units of P1 from SUP1 to send out. I then add new columns:

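Calculate Wk_bal = (BALANCE/AVG_SALES)
Rank Wk_bal within each supplier-product group
For the lowest Wk_bal in the group: SEND_PKGS = +1
Then calculate Wk_bal again, this time including the packages sent = ((BALANCE+SEND_PKGS)/AVG_SALES)
Repeat until all of TO_SHIP has been distributed to the stores that need it most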

To visualise a run:

First output (calculate Wk_bal, then send 1 pkg to the lowest):

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP   Wk_Bal   SEND_PKGS
SUP1       P1          STR1      50        5           18        10       0
SUP1       P1          STR2      6         4           18        1.5      1
SUP1       P1          STR3      8         4           18        2        0
SUP2       P4          STR1      35        3           500       11.67    0
SUP2       P4          STR2      5         4           500       1.25     1
SUP2       P4          STR3      54        7           500       7.71     0

Second output (calculate the updated Wk_bal, send one pkg to the lowest):

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP   Wk_Bal   SEND_PKGS
SUP1       P1          STR1      50        5           17        10       0
SUP1       P1          STR2      8         4           17        1.75     2
SUP1       P1          STR3      8         4           17        2        0
SUP2       P4          STR1      35        3           499       11.67    0
SUP2       P4          STR2      7         4           499       1.5      2
SUP2       P4          STR3      54        7           499       7.71     0

And so on: as long as there is TO_SHIP left, calculate, rank, give one pkg. The reason for this process is that I want to make sure the store with the lowest Wk_bal gets a package first (and there are a lot of other reasons as well).

I initially built this in SQL, but because of the complexity I moved to Python. Unfortunately my Python isn't very good when it comes to writing loops with several conditions, especially on a pandas df. So far I've tried (and failed):

df['Wk_Bal'] = 0
df['TO_SHIP'] = 0
for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
    if i > 0:
        df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
        df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
        df['SEND_PKGS'] = + 1
        df['BALANCE'] = + 1
    else:
        df['TO_SHIP'] = 0

How can I do this better?

Solution

Hopefully I've understood all of your requirements. Here is your original data:

import pandas as pd

df = pd.DataFrame({
    'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
    'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
    'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
    'BALANCE': [50, 6, 74, 35, 5, 54],
    'AVG_SALES': [5, 4, 4, 3, 4, 7],
    'TO_SHIP': [18, 18, 18, 500, 500, 500],
})

Here is my approach:

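df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']

while (df['TO_SHIP'] != 0).any():
    # Row label of the lowest Wk_bal in each group that still has stock to ship
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
    # Send one package to each of those rows
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    # Recompute Wk_bal with the packages sent so far
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    # One package less to ship for every group that is not finished
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1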

The loop keeps updating df until the TO_SHIP column is all zeros. On each pass it increments SEND_PKGS for the row with the lowest Wk_bal in each group, then recomputes Wk_bal and decrements every non-zero TO_SHIP value.

I end up with:

  SUPPLIER PRODUCTID STOREID  BALANCE  AVG_SALES  TO_SHIP  SEND_PKGS     Wk_bal
0     SUP1        P1    STR1       50          5        0          0  10.000000
1     SUP1        P1    STR2        6          4        0         18   6.000000
2     SUP1        P1    STR3       74          4        0          0  18.500000
3     SUP2        P4    STR1       35          3        0         92  42.333333
4     SUP2        P4    STR2        5          4        0        165  42.500000
5     SUP2        P4    STR3       54          7        0        243  42.428571
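Note that the loop above overwrites TO_SHIP, so the original quantities are gone once it finishes. If they are still needed, one option is to run the same loop on a copy and then check that the packages sent in each group add up to what was planned. This is only a sketch: the names `result`, `keys`, `sent_per_group` and `planned_per_group` are made up for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
    'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
    'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
    'BALANCE': [50, 6, 74, 35, 5, 54],
    'AVG_SALES': [5, 4, 4, 3, 4, 7],
    'TO_SHIP': [18, 18, 18, 500, 500, 500],
})

keys = ['SUPPLIER', 'PRODUCTID']

# Work on a copy so that df keeps its original TO_SHIP values
result = df.copy()
result['SEND_PKGS'] = 0
result['Wk_bal'] = result['BALANCE'] / result['AVG_SALES']

while (result['TO_SHIP'] != 0).any():
    lowest_idx = result[result['TO_SHIP'] > 0].groupby(keys)['Wk_bal'].idxmin()
    result.loc[lowest_idx, 'SEND_PKGS'] += 1
    result['Wk_bal'] = (result['BALANCE'] + result['SEND_PKGS']) / result['AVG_SALES']
    result.loc[result['TO_SHIP'] > 0, 'TO_SHIP'] -= 1

# Sanity check: packages sent per group should equal the planned quantity
sent_per_group = result.groupby(keys)['SEND_PKGS'].sum()
planned_per_group = df.groupby(keys)['TO_SHIP'].first()
assert (sent_per_group == planned_per_group).all()
```

The check relies on TO_SHIP holding a single value per group, as described in the question.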

If several rows share the minimum Wk_bal, we can break the tie by picking the one with the lowest AVG_SALES:

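def find_min(x):
    # Number of rows in the group that share the minimum Wk_bal
    num_mins = x["Wk_bal"].loc[x["Wk_bal"] == x["Wk_bal"].min()].shape[0]
    if num_mins == 1:
        return x["Wk_bal"].idxmin()
    else:
        # Tie: among the tied rows, take the one with the lowest AVG_SALES
        min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
        return min_df["AVG_SALES"].idxmin()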

Then, more or less as before:

df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']

while (df['TO_SHIP'] != 0).any():
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
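The loop above runs one groupby per package, which may get slow when TO_SHIP is large. A possible alternative, which is not what the answer does, is to handle each group separately with a min-heap from Python's `heapq` module, keyed on (Wk_bal, AVG_SALES) so that ties are broken by the lower AVG_SALES. The sketch below assumes TO_SHIP holds one value per group and AVG_SALES is never zero; `allocate_group` and `KEYS` are hypothetical names used only for illustration.

```python
import heapq

import pandas as pd

df = pd.DataFrame({
    'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
    'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
    'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
    'BALANCE': [50, 6, 74, 35, 5, 54],
    'AVG_SALES': [5, 4, 4, 3, 4, 7],
    'TO_SHIP': [18, 18, 18, 500, 500, 500],
})

KEYS = ['SUPPLIER', 'PRODUCTID']


def allocate_group(group):
    """Return {row label: packages sent} for one supplier/product group.

    Hypothetical helper, not part of the original answer.
    """
    to_ship = int(group['TO_SHIP'].iloc[0])  # assumes one TO_SHIP value per group
    balance = group['BALANCE'].to_dict()
    sent = {idx: 0 for idx in group.index}
    # Heap entries are (Wk_bal, AVG_SALES, row label): the lowest Wk_bal is
    # popped first, ties go to the lower AVG_SALES, then the lower row label.
    heap = [(row.BALANCE / row.AVG_SALES, row.AVG_SALES, row.Index)
            for row in group.itertuples()]
    heapq.heapify(heap)
    for _ in range(to_ship):
        _, avg_sales, idx = heapq.heappop(heap)
        sent[idx] += 1
        # Push the store back with its Wk_bal recomputed after this package
        heapq.heappush(heap, ((balance[idx] + sent[idx]) / avg_sales, avg_sales, idx))
    return sent


df['SEND_PKGS'] = 0
for _, group in df.groupby(KEYS):
    sent = allocate_group(group)
    df.loc[list(sent.keys()), 'SEND_PKGS'] = list(sent.values())

df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
```

On the sample data this gives the same SEND_PKGS as the result shown earlier, and it leaves TO_SHIP unchanged. Each package costs one heap pop and one push instead of a full groupby over the frame.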
