Downloading large files over FTP with Python

2022-01-09 00:00:00 python ftp file-transfer

Problem description

I'm trying to download a backup file every day from my server to my local storage server, but I've run into some problems.

I wrote this code (with the irrelevant parts, such as the email function, removed):

    import os
    from time import strftime
    from ftplib import FTP
    import smtplib
    from email.MIMEMultipart import MIMEMultipart
    from email.MIMEBase import MIMEBase
    from email.MIMEText import MIMEText
    from email import Encoders

    day = strftime("%d")
    today = strftime("%d-%m-%Y")

    link = FTP(ftphost)
    link.login(passwd = ftp_pass, user = ftp_user)
    link.cwd(file_path)
    link.retrbinary('RETR ' + file_name, open('/var/backups/backup-%s.tgz' % today, 'wb').write)
    link.delete(file_name) #delete the file from online server
    link.close()
    mail(user_mail, "Download database %s" % today, "Database sucessfully downloaded: %s" % file_name)
    exit()

I run it from crontab like this:

40 23 * * * python /usr/bin/backup-transfer.py >> /var/log/backup-transfer.log 2>&1

It works with small files, but with the backup files (about 1.7 GB) it freezes: the downloaded file reaches about 1.2 GB and then stops growing (I waited about a day), and the log file is empty.

Any ideas?

P.S. I'm using Python 2.6.5.

Solution

Sorry for answering my own question, but I found the solution.

I tried ftputil with no success, so I tried many other approaches, and finally this works:

    from ftplib import FTP
    from time import strftime

    def ftp_connect(path):
        link = FTP(host = 'example.com', timeout = 5) #Keep low timeout
        link.login(passwd = 'ftppass', user = 'ftpuser')
        debug("%s - Connected to FTP" % strftime("%d-%m-%Y %H.%M"))
        link.cwd(path)
        return link

    downloaded = open('/local/path/to/file.tgz', 'wb')

    def debug(txt):
        print txt

    link = ftp_connect(path)
    file_size = link.size(filename)

    max_attempts = 5 #I dont want death loops.

    while file_size != downloaded.tell():
        try:
            debug("%s while > try, run retrbinary " % strftime("%d-%m-%Y %H.%M"))
            if downloaded.tell() != 0:
                # Resume from the current offset. rest must be passed by keyword:
                # the third positional argument of retrbinary is the block size.
                link.retrbinary('RETR ' + filename, downloaded.write, rest = downloaded.tell())
            else:
                link.retrbinary('RETR ' + filename, downloaded.write)
        except Exception as myerror:
            if max_attempts != 0:
                debug("%s while > except, something going wrong: %s file lenght is: %i > %i " %
                    (strftime("%d-%m-%Y %H.%M"), myerror, file_size, downloaded.tell())
                )
                link = ftp_connect(path)
                max_attempts -= 1
            else:
                break

    debug("Done with file, attempt to download m5dsum")
    [...]

In my log file I found:

    01-12-2011 23.30 - Connected to FTP
    01-12-2011 23.30 while > try, run retrbinary
    02-12-2011 00.31 while > except, something going wrong: timed out file lenght is: 1754695793 > 1754695793
    02-12-2011 00.31 - Connected to FTP
    Done with file, attempt to download m5dsum

Sadly, I have to reconnect to the FTP server even if the file has been fully downloaded. In my case that is not a problem, because I have to download the md5sum too.
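Since the md5sum is fetched anyway, it can be used to confirm that a download assembled over several connections is intact. The following is a minimal Python 3 sketch, not part of the original script. It assumes the checksum file follows the usual `md5sum` layout (hex digest first, then the file name), and the function names `md5_of_file` and `matches_md5sum` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte backup never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5sum(path, md5sum_text):
    """Compare a local file with the first field of an md5sum-style line.

    Assumption: md5sum_text looks like '<hex digest>  <file name>'.
    """
    fields = md5sum_text.split()
    return bool(fields) and md5_of_file(path) == fields[0].lower()
```

A caller could read the downloaded checksum file as text and pass its contents to `matches_md5sum` before deleting the backup from the remote server.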

As you can see, I wasn't able to detect the timeout and retry on the same connection; when I get a timeout, I simply reconnect. If someone knows how to reconnect without creating a new ftplib.FTP instance, let me know ;)
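The reconnect-and-resume idea can be sketched in Python 3 roughly as follows. This is a hedged illustration rather than the code that produced the log above. `download_with_resume` and its `connect` parameter are hypothetical names, and `connect` is assumed to be a zero-argument callable that returns a logged-in `ftplib.FTP` object already in the right directory. In `ftplib`, `retrbinary` takes the resume offset as `rest`, after `blocksize`, so the sketch passes it by keyword. It also skips the reconnect when a timeout arrives after the last byte has already been written.

```python
import ftplib


def download_with_resume(connect, remote_name, local_path, max_attempts=5):
    """Download remote_name to local_path, resuming after dropped connections.

    Returns True when the local file reaches the size reported by the server.
    """
    ftp = connect()
    ftp.voidcmd('TYPE I')  # many servers only answer SIZE in binary mode
    expected = ftp.size(remote_name)
    if expected is None:
        raise ValueError('server did not report a size for %r' % remote_name)

    with open(local_path, 'wb') as out:
        attempts = 0
        while out.tell() < expected and attempts < max_attempts:
            attempts += 1
            try:
                # Ask the server to restart at the current local offset;
                # None means a plain download from the beginning.
                ftp.retrbinary('RETR ' + remote_name, out.write,
                               rest=out.tell() or None)
            except ftplib.all_errors:
                # all_errors covers FTP errors, OSError (including socket
                # timeouts) and EOFError. Reconnect only if data is missing.
                if out.tell() < expected:
                    ftp = connect()
        return out.tell() == expected
```

Passing a factory instead of a live connection keeps the retry loop independent of host names and credentials, so a function like `ftp_connect` above could be wrapped in a `lambda` and handed in as `connect`.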
