Inserting rows into a pandas DataFrame while maintaining column data types

2022-01-22 00:00:00 python pandas dataframe append

Question

What's the best way to insert new rows into an existing pandas DataFrame while maintaining column data types and, at the same time, supplying user-defined fill values for columns that aren't specified? Here's an example:

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Bob', 'Sue', 'Tom'],
        'age': [45, 40, 10],
        'weight': [143.2, 130.2, 34.9],
        'has_children': [True, True, False]
    })

Assume that I want to add a new record passing just name and age. To maintain data types, I can copy rows from df, modify the values, and then append df to the copy, e.g.

    # Note: DataFrame.append was removed in pandas 2.0; this runs on earlier versions.
    columns = ['name', 'age']
    copy_df = df.loc[0:0, columns].copy()
    copy_df.loc[0, columns] = 'Cindy', 42
    new_df = copy_df.append(df, sort=False).reset_index(drop=True)

But that converts the bool column to object dtype.

Here's a really hacky solution that doesn't feel like the "right way" to do this:

    columns = ['name', 'age']
    copy_df = df.loc[0:0].copy()
    missing_remap = {
        'int64': 0,
        'float64': 0.0,
        'bool': False,
        'object': ''
    }
    # Overwrite every column that is not being supplied with a default for its dtype
    for c in set(copy_df.columns).difference(columns):
        copy_df.loc[:, c] = missing_remap[str(copy_df[c].dtype)]
    new_df = copy_df.append(df, sort=False).reset_index(drop=True)
    new_df.loc[0, columns] = 'Cindy', 42

I know I must be missing something.

Answer

As you found, since NaN is a float, adding NaN to a series may cause it to be either upcast to float or converted to object. You are right that this is not a desirable outcome.
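To see the upcasting in isolation, here is a minimal sketch that is not part of the original answer. It uses reindex to introduce a label with no value, which pandas fills with NaN; the variable names are illustrative only.

```python
import pandas as pd

ints = pd.Series([45, 40, 10])
bools = pd.Series([True, True, False])

# Label 3 does not exist in either series, so reindex fills it with NaN.
ints_with_gap = ints.reindex([0, 1, 2, 3])
bools_with_gap = bools.reindex([0, 1, 2, 3])

print(ints.dtype, '->', ints_with_gap.dtype)    # int64 -> float64
print(bools.dtype, '->', bools_with_gap.dtype)  # bool -> object
```

The same thing happens to the age and has_children columns when a row is added without values for them.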

There is no straightforward approach. My suggestion is to store your input row data in a dictionary and combine it with a dictionary of defaults before appending. Note that this works because pd.DataFrame.append accepts a dict argument (in pandas versions before 2.0, where DataFrame.append still exists).

In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries, with values from the second taking precedence.
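A quick check of how the merge behaves, using the same field names as the DataFrame in the question. This is only an illustration of the dictionary syntax; the variable name merged is made up for the example.

```python
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}

# Keys present in both dictionaries take their value from the second one (row);
# keys found only in default keep their default value.
merged = {**default, **row}

print(merged)
# {'name': 'Cindy', 'age': 42, 'weight': 0.0, 'has_children': False}
```

Neither input dictionary is modified, so the same default dictionary can be reused for every new row.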

    default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
    row = {'name': 'Cindy', 'age': 42}

    df = df.append({**default, **row}, ignore_index=True)

    print(df)

       age  has_children   name  weight
    0   45          True    Bob   143.2
    1   40          True    Sue   130.2
    2   10         False    Tom    34.9
    3   42         False  Cindy     0.0

    print(df.dtypes)

    age               int64
    has_children       bool
    name             object
    weight          float64
    dtype: object
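DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same idea has to go through pd.concat. The following is a minimal sketch under that assumption, not part of the original answer. The helper name append_row_with_defaults is hypothetical, and the astype call is a defensive extra that casts the one-row frame to the existing column dtypes before concatenating.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}


def append_row_with_defaults(frame, new_row, defaults):
    """Return a new DataFrame with new_row added, filling gaps from defaults.

    Hypothetical helper for illustration; it does not modify frame in place.
    """
    merged = {**defaults, **new_row}
    # Build a one-row frame with the same column order as the target frame.
    one_row = pd.DataFrame([merged], columns=frame.columns)
    # Cast to the existing dtypes so concat has nothing to upcast.
    one_row = one_row.astype(frame.dtypes.to_dict())
    return pd.concat([frame, one_row], ignore_index=True)


result = append_row_with_defaults(df, row, default)
print(result)
print(result.dtypes)
```

Because every column receives a real value of the right type, no NaN is introduced and the dtypes of result should match those of df.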
