Python Pandas - Using to_sql to write large data frames in chunks
I'm using Pandas' to_sql function to write to MySQL, which is timing out due to the large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = min(chunk_size, len(frame))
    # Replace NaN with None so that missing values are written as NULL.
    frame = frame.where(pd.notnull(frame), None)
    # The first chunk replaces any existing table; later chunks append to it.
    if_exists_param = 'replace'
    while start_index != end_index:
        print("Writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
Recommended answer
Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify the chunksize argument and, for example, simply do:
df.to_sql('table', engine, chunksize=20000)
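For anyone who wants to try the chunksize argument without a MySQL server, here is a minimal sketch. It assumes an in-memory SQLite database in place of MySQL, and the frame, the table name demo_table and the column names are made up for illustration. The chunk size of 4 is chosen only so that the 10 rows are split into several batches.

```python
import sqlite3

import pandas as pd

# Hypothetical small frame standing in for the 1M-row frame from the question.
df = pd.DataFrame({
    "customer_id": range(10),
    "amount": [i * 1.5 for i in range(10)],
})

# Assumption: an in-memory SQLite database, so the sketch runs without MySQL.
conn = sqlite3.connect(":memory:")

# chunksize=4 asks pandas to write the rows in batches of at most 4 rows.
df.to_sql("demo_table", conn, if_exists="replace", index=False, chunksize=4)

# Check that every row arrived, regardless of how the rows were batched.
row_count = conn.execute("SELECT COUNT(*) FROM demo_table").fetchone()[0]
print(row_count)
```

With a real MySQL database, the SQLAlchemy engine from the question would be passed instead of the SQLite connection. A suitable chunk size depends on the server's timeout and packet limits, so 20000 is a starting point rather than a recommendation.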