Inserting rows into a pandas DataFrame while maintaining column data types
Question
What's the best way to insert new rows into an existing pandas DataFrame while maintaining column data types and, at the same time, giving user-defined fill values for columns that aren't specified? Here's an example:
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})
Assume that I want to add a new record passing just name and age. To maintain data types, I can copy rows from df, modify values and then append df to the copy, e.g.
columns = ['name', 'age']
copy_df = df.loc[0:0, columns].copy()
copy_df.loc[0, columns] = 'Cindy', 42
# DataFrame.append was removed in pandas 2.0; this line needs an older version
new_df = copy_df.append(df, sort=False).reset_index(drop=True)
But that converts the bool column to object.
Here's a really hacky solution that doesn't feel like the "right way" to do this:
columns = ['name', 'age']
copy_df = df.loc[0:0].copy()
# Fill value to use for each dtype when a column is not specified
missing_remap = {
    'int64': 0,
    'float64': 0.0,
    'bool': False,
    'object': ''
}
for c in set(copy_df.columns).difference(columns):
    copy_df.loc[:, c] = missing_remap[str(copy_df[c].dtype)]
# DataFrame.append was removed in pandas 2.0; this line needs an older version
new_df = copy_df.append(df, sort=False).reset_index(drop=True)
new_df.loc[0, columns] = 'Cindy', 42
I know I must be missing something.
Answer
As you found, since NaN is a float, adding NaN to a series may cause it to be either upcast to float or converted to object. You are right that this is not a desirable outcome.
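To see the upcasting in isolation, here is a small sketch. It uses reindex to introduce a missing value into an integer series and a boolean series; the variable names are only illustrative and are not part of the original question.

```python
import pandas as pd

ints = pd.Series([45, 40, 10])
bools = pd.Series([True, True, False])

# Reindexing to a label that does not exist (3) fills that slot with NaN.
ints_with_nan = ints.reindex(range(4))
bools_with_nan = bools.reindex(range(4))

print(ints.dtype, '->', ints_with_nan.dtype)    # int64 -> float64
print(bools.dtype, '->', bools_with_nan.dtype)  # bool -> object
```

The same thing happens column by column when a partial row is appended to a DataFrame, which is why supplying a value for every column avoids the problem.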
There is no straightforward approach. My suggestion is to store your input row data in a dictionary and combine it with a dictionary of defaults before appending. Note that this works because pd.DataFrame.append accepts a dict argument. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same idea has to go through pd.concat.)
In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries, with the second taking precedence on duplicate keys.
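As a quick illustration of the precedence rule, the sketch below merges a dictionary of defaults with a partial row. The names defaults, row and merged are chosen for this example only.

```python
defaults = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}

# Keys present in both dictionaries take their value from the second one;
# keys only present in the first keep their default.
merged = {**defaults, **row}
print(merged)
```

Because the defaults come first, the merged dictionary always has a value for every column, and each default has the Python type that matches its column.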
# One fill value per column, each matching the column's dtype
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}
# DataFrame.append was removed in pandas 2.0; this line needs an older version
df = df.append({**default, **row}, ignore_index=True)
print(df)
   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   42         False  Cindy     0.0
print(df.dtypes)
age               int64
has_children       bool
name             object
weight          float64
dtype: object
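On pandas 2.0 and later, where DataFrame.append no longer exists, the same defaults-plus-row idea can be sketched with pd.concat. The helper name insert_row below is hypothetical, not a pandas function, and the explicit astype call is a precaution added for this example rather than something the answer above prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

defaults = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}


def insert_row(frame, row, defaults):
    """Hypothetical helper: append one row, filling unspecified columns from defaults."""
    full_row = {**defaults, **row}
    # Build a one-row frame with the same column order as the target.
    # Keys in full_row that are not columns of frame are silently dropped.
    new_row = pd.DataFrame([full_row], columns=frame.columns)
    # Cast to the target dtypes; this raises if a column has no default
    # and therefore holds NaN that cannot be cast (e.g. to int64).
    new_row = new_row.astype(frame.dtypes.to_dict())
    return pd.concat([frame, new_row], ignore_index=True)


new_df = insert_row(df, {'name': 'Cindy', 'age': 42}, defaults)
print(new_df)
print(new_df.dtypes)
```

Since every column of the one-row frame already has the same dtype as the original, pd.concat has no reason to upcast, and the result keeps the original column dtypes.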