Traversing an FTP listing

2022-01-09 00:00:00 python ftp traversal

Problem description

I am trying to get the names of all directories on an FTP server and store them in hierarchical order in a multidimensional list or dict.

For example, a server that contains the following structure:

/www/
    mysite.com
        images
            png
            jpg

would, at the end of the script, give me a list such as:

['/www/',
  ['mysite.com',
    ['images',
      ['png'],
      ['jpg']
    ]
  ]
]

I have tried using a recursive function like so: def traverse(dir): FTP.dir(dir, traverse)

FTP.dir returns lines in this format:

drwxr-xr-x    5 leavesc1 leavesc1     4096 Nov 29 20:52 mysite.com

So taking line[56:] gives me just the directory name (mysite.com). I use this in the recursive function.
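A fixed offset such as line[56:] only works while every column has exactly the width shown above. As a hedged alternative, the sketch below splits the line on whitespace instead. It assumes a Unix-style `ls -l` listing with nine fields, which FTP servers are not required to produce, and the helper name `parse_list_line` is hypothetical. Symbolic links (`name -> target`) are not handled.

```python
def parse_list_line(line):
    """Return (name, is_dir) for one Unix-style LIST line, or None if it does not parse."""
    # maxsplit=8 keeps any spaces inside the file name in the last field.
    fields = line.split(None, 8)
    if len(fields) < 9:
        return None
    name = fields[8]
    # Assumption: the first permission character is 'd' for directories.
    return name, line.startswith("d")


line = "drwxr-xr-x    5 leavesc1 leavesc1     4096 Nov 29 20:52 mysite.com"
print(parse_list_line(line))  # ('mysite.com', True)
```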

But I cannot get it to work. I have tried many different approaches without success. I also get lots of FTP errors: either the directory cannot be found, which is a logic issue on my side, or the server returns unexpected errors that leave no log, so I cannot debug them.

Bottom-line question: how do I get a hierarchical directory listing from an FTP server?


Solution

Here is a naive and slow implementation. It is slow because it tries to CWD into each directory entry to determine whether it is a directory or a file, but it works. One could optimize it by parsing the LIST command output, but that depends heavily on the server implementation.

import ftplib

def traverse(ftp, depth=0):
    """
    Return a recursive listing of an FTP server's contents, starting
    from the current directory.

    The listing is a nested dictionary: each key is an entry name, and
    its value is the listing of that subdirectory, or None if the entry
    is a file.

    @param ftp: ftplib.FTP object
    """
    if depth > 10:
        return ['depth > 10']
    level = {}
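    for entry in (path for path in ftp.nlst() if path not in ('.', '..')):
        try:
            # CWD succeeds only for directories, so use it as the test.
            ftp.cwd(entry)
            level[entry] = traverse(ftp, depth+1)
            ftp.cwd('..')
        except ftplib.error_perm:
            # The server refused CWD, so treat the entry as a file.
            level[entry] = None
    return level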

def main():
    # The constructor already connects when given a host,
    # so a separate connect() call is not needed.
    ftp = ftplib.FTP("localhost")
    ftp.login()
    ftp.set_pasv(True)

    print(traverse(ftp))

if __name__ == '__main__':
    main()
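
The answer above mentions that parsing LIST output depends on the server. A different technique, not part of the original answer, is the machine-readable MLSD listing, which `ftplib.FTP.mlsd()` exposes in Python 3.3 and later. The sketch below assumes the server supports MLSD and reports the `type` fact, which many older servers do not. The function name `traverse_mlsd` and the `max_depth` parameter are hypothetical. It issues one request per directory instead of one CWD per entry.

```python
import posixpath


def traverse_mlsd(ftp, path="", depth=0, max_depth=10):
    """Return {name: subtree or None} using one MLSD request per directory."""
    if depth > max_depth:
        return {}
    level = {}
    # list() consumes the whole listing before we recurse into subdirectories.
    for name, facts in list(ftp.mlsd(path, facts=["type"])):
        entry_type = facts.get("type", "").lower()
        if entry_type in ("cdir", "pdir"):
            # Skip the entries for the current and parent directory.
            continue
        if entry_type == "dir":
            child = posixpath.join(path, name)
            level[name] = traverse_mlsd(ftp, child, depth + 1, max_depth)
        else:
            # Anything that is not a directory is recorded as a file.
            level[name] = None
    return level
```

With a logged-in `ftplib.FTP` object, `traverse_mlsd(ftp)` would be called in place of `traverse(ftp)`. If the server rejects MLSD, falling back to the CWD-based version is the safer choice.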
