在对 pandas 的两列进行分组后应用合并
问题描述
请帮帮我。 我有一个类似
的数据框| | ID | Result | measurement_1 | measurement_2 | measurement_3 | measurement_4 | measurement_5 | start_time | end-time |
|----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------|
| 0 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-20 21:24:03.390000 | 2020-10-20 23:46:36.990000 |
| 1 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-21 04:36:03.390000 | 2020-10-21 06:58:36.990000 |
| 2 | 12345 | nan | 49584 | 2827 | nan | nan | nan | 2020-10-21 09:24:03.390000 | 2020-10-21 11:46:36.990000 |
| 3 | 12345 | nan | nan | nan | 3940 | nan | nan | 2020-10-21 14:12:03.390000 | 2020-10-21 16:34:36.990000 |
| 4 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-21 21:24:03.390000 | 2020-10-21 23:46:36.990000 |
| 5 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-22 02:40:51.390000 | 2020-10-22 05:03:24.990000 |
| 6 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-22 08:26:27.390000 | 2020-10-22 10:49:00.990000 |
| 7 | 12345 | Pass | nan | nan | nan | 392 | 304 | 2020-10-22 14:12:03.390000 | 2020-10-22 16:34:36.990000 |
| 8 | 12346 | nan | nan | nan | nan | nan | nan | 2020-10-22 19:57:39.390000 | 2020-10-22 22:20:12.990000 |
| 9 | 12346 | nan | 22839 | 4059 | nan | nan | nan | 2020-10-23 01:43:15.390000 | 2020-10-23 04:05:48.990000 |
| 10 | 12346 | nan | nan | nan | 4059 | nan | nan | 2020-10-23 07:28:51.390000 | 2020-10-23 09:51:24.990000 |
| 11 | 12346 | nan | nan | nan | nan | nan | nan | 2020-10-23 13:14:27.390000 | 2020-10-23 15:37:00.990000 |
| 12 | 12346 | nan | nan | nan | nan | nan | nan | 2020-10-23 19:00:03.390000 | 2020-10-23 21:22:36.990000 |
| 13 | 12346 | nan | nan | nan | nan | nan | nan | 2020-10-24 00:45:39.390000 | 2020-10-24 03:08:12.990000 |
| 14 | 12346 | Fail | nan | nan | nan | 2938 | 495 | 2020-10-24 06:31:15.390000 | 2020-10-24 08:53:48.990000 |
| 15 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-24 12:16:51.390000 | 2020-10-24 14:39:24.990000 |
| 16 | 12345 | nan | 62839 | 1827 | nan | nan | nan | 2020-10-24 18:02:27.390000 | 2020-10-24 20:25:00.990000 |
| 17 | 12345 | nan | nan | nan | 2726 | nan | nan | 2020-10-24 23:48:03.390000 | 2020-10-25 02:10:36.990000 |
| 18 | 12345 | nan | nan | nan | nan | nan | nan | 2020-10-25 05:33:39.390000 | 2020-10-25 07:56:12.990000 |
| 19 | 12345 | Fail | nan | nan | nan | nan | 1827 | 2020-10-25 11:19:15.390000 | 2020-10-25 13:41:48.990000 |
+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
and want my output to look like
+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
| | ID | Result | measurement_1 | measurement_2 | measurement_3 | measurement_4 | measurement_5 | start_time | end-time |
|----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------|
| 0 | 12345 | Pass | 49584 | 2827 | 3940 | 392 | 304 | 2020-10-20 21:24:03.390000 | 2020-10-22 16:34:36.990000 |
| 1 | 12346 | Fail | 22839 | 4059 | 4059 | 2938 | 495 | 2020-10-22 19:57:39.390000 | 2020-10-24 08:53:48.990000 |
| 2 | 12345 | Fail | 62839 | 1827 | 2726 | nan | 1827 | 2020-10-24 12:16:51.390000 | 2020-10-23 13:41:48.990000 |
+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
so far I am able to group the cols on `ID` and `Result`. Now want to apply the Coalesce to it (newDf)
df = pd.read_excel("Test_Coalesce.xlsx")
newDf = df.groupby(['ID','Result'])
newDf.all().reset_index()
解决方案
看起来您希望按ID
的连续块分组。如果是:
blocks = df['ID'].ne(df['ID'].shift()).cumsum()
agg_dict = {k:'first' if k != 'end-time' else 'last'
for k in df.columns}
df.groupby(blocks).agg(agg_dict)
相关文章