Python Pandas: writing NaN values to SQL

2021-11-20 00:00:00 python pandas sql mysql

I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. It seems easy to do with pandas, but I hit an error that doesn't make sense to me:

I have a DataFrame with 8 columns. Here is the column index:

metricDF.columns

Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)

I then use to_sql to append the data to MySQL:

metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')

I get a strange error about a column named "nan":

OperationalError: (1054, "Unknown column 'nan' in 'field list'")

As you can see, all my columns have names. I realize that MySQL support for writing with to_sql still seems to be in development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.

Recommended answer

Update: starting with pandas 0.15, to_sql supports writing NaN values (they are written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208).
Pandas 0.15 is scheduled for release in October, and the feature is already merged into the development version.
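On pandas 0.15 or later, the DataFrame should be writable as-is, with NaN arriving as NULL. The sketch below assumes a current pandas and uses an in-memory SQLite database from Python's standard library in place of the MySQL connection; the table name metrics and the two-row frame are made up for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical example data: the second row has a missing VALUE_010.
df = pd.DataFrame({"FID": [1, 2], "VALUE_010": [0.5, np.nan]})

# In-memory SQLite stands in for the MySQL connection (assumption).
con = sqlite3.connect(":memory:")

# No NaN -> None conversion: pandas >= 0.15 writes NaN as NULL itself.
df.to_sql("metrics", con, index=False)

rows = con.execute("SELECT FID, VALUE_010 FROM metrics ORDER BY rowid").fetchall()
print(rows)  # [(1, 0.5), (2, None)]
```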

This is probably caused by NaN values in your table. It is a known shortcoming that the pandas SQL functions currently do not handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).
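Why a NaN value produces an error about a *column* named nan: a likely explanation is that a driver which interpolates parameters into the SQL text on the client side (older MySQLdb versions are assumed to behave this way) renders a float NaN as the bare, unquoted word nan, and MySQL then parses that word as a column name. The sketch below uses no real driver; render_literal and render_insert are hypothetical helpers that only imitate this kind of formatting, and the %.15g float format is an assumption about such drivers.

```python
def render_literal(value):
    """Hypothetical client-side formatter imitating an interpolating driver."""
    if value is None:
        return "NULL"  # None is rendered as SQL NULL
    if isinstance(value, float):
        # Assumed float format: float("nan") becomes the bare word nan
        return "%.15g" % value
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # quoted string literal
    return str(value)


def render_insert(table, columns, row):
    """Build the INSERT statement text the way an interpolating driver might."""
    cols = ", ".join(columns)
    vals = ", ".join(render_literal(v) for v in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals})"


# NaN turns into an identifier-like token, hence "Unknown column 'nan'".
print(render_insert("metrics", ["FID", "VALUE_010"], [1, float("nan")]))
# INSERT INTO metrics (FID, VALUE_010) VALUES (1, nan)

# None turns into NULL, which is what the workaround relies on.
print(render_insert("metrics", ["FID", "VALUE_010"], [1, None]))
# INSERT INTO metrics (FID, VALUE_010) VALUES (1, NULL)
```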

As a workaround for now (for pandas 0.14.1 and earlier), you can manually convert the NaN values to None:

import pandas as pd
df2 = df.astype(object).where(pd.notnull(df), None)

and then write the DataFrame to SQL. However, this converts all columns to object dtype, so column types derived from df2 may no longer match the original ones. Because of this, you should create the database table from the original DataFrame. For example, if your first row does not contain NaNs:

df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')
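The whole workaround, end to end, might look like the sketch below. It assumes a current pandas and again uses an in-memory SQLite database from the standard library instead of MySQL; the example frame is made up, and index=False is an illustrative choice (the answer's code keeps the default, which also writes the index as a column).

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical data standing in for metricDF; the first row has no NaNs.
df = pd.DataFrame({"FID": [1, 2, 3], "VALUE_010": [0.5, np.nan, 2.0]})

# The pre-0.15 workaround: object dtype, with NaN replaced by None.
df2 = df.astype(object).where(pd.notnull(df), None)

# In-memory SQLite stands in for the MySQL connection (assumption).
con = sqlite3.connect(":memory:")

# 1) Create the table from the original frame, so the column types come from
#    the numeric dtypes (int64 -> INTEGER, float64 -> REAL in SQLite).
df[:1].to_sql("table_name", con, index=False)

# 2) Append the remaining rows from the converted frame; None becomes NULL.
df2[1:].to_sql("table_name", con, if_exists="append", index=False)

schema = {row[1]: row[2] for row in con.execute("PRAGMA table_info(table_name)")}
values = [row[0] for row in con.execute("SELECT VALUE_010 FROM table_name ORDER BY rowid")]
print(schema)  # {'FID': 'INTEGER', 'VALUE_010': 'REAL'}
print(values)  # [0.5, None, 2.0]
```

Slicing off the first row only works if that row is NaN-free; any row without missing values serves the same purpose of fixing the column types before the object-dtype rows are appended.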
