Python Google Drive API - listing the entire Drive file tree

2022-01-10 00:00:00 python google-api google-drive-api

Question

I'm building a Python application that uses the Google Drive APIs. So far development is going well, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

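  1. Check whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
  2. Build a visual file explorer. I know Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.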

For now I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the content of a single folder, but this is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.

Here is the function that gets the root:

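# imports needed by the functions below; driveHelper is the asker's own module
from googleapiclient import errors
from googleapiclient.discovery import build
import driveHelper

def drive_get_root():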
    """Retrieve a root list of File resources.
       Returns:
         List of dictionaries.
    """
    
    #build the service, the driveHelper module will take care of authentication and credential storage
    drive_service = build('drive', 'v2', driveHelper.buildHttp())
    # the result will be a list
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            #add the files in the list
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

And here is the one that gets the files in a folder:

def drive_files_in_folder(folder_id):
    """Print files belonging to a folder.
       Args:
         folder_id: ID of the folder to get files from.
    """
    #build the service, the driveHelper module will take care of authentication and credential storage
    drive_service = build('drive', 'v2', driveHelper.buildHttp())
    # the result will be a list
    result = []
    #code from google, is working so I didn't touch it
    page_token = None
    while True:
        try:
            param = {}

            if page_token:
                param['pageToken'] = page_token

            children = drive_service.children().list(folderId=folder_id, **param).execute()

            for child in children.get('items', []):
                result.append(drive_get_file(child['id']))

            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

And, for example, to check whether a file exists I'm currently using this:

def drive_path_exist(file_path, items=None):
    """
    Recursive function that checks whether the given path exists.
    Returns the file's metadata dictionary if found, otherwise False.
    """

    #if no list is given, start from the root of Gdrive
    if items is None:
        items = drive_get_root()

    #split the path to get the first component and look for it in the current list
    file_path = file_path.split("/")

    #if there is only one element in the path we are at the actual filename
    #so if it is in this folder we can return it
    if len(file_path) == 1:
        for elem in items:
            if elem["title"] == file_path[0]:
                #return the elem because it is a dictionary with all the file info
                return elem
        return False

    #if we are not at the last element we have to keep searching
    for elem in items:
        #check if the current item is the next folder in the path
        if elem["title"] == file_path[0]:
            #recursive call: rejoin the rest of the path as a string and pass
            #the folder content from drive_files_in_folder as the list
            return drive_path_exist("/".join(file_path[1:]), drive_files_in_folder(elem["id"]))
    return False

Any idea how to solve my problem? I saw a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course they didn't say how!

Thanks


Solution

To build a representation of the tree in your app, you need to do this:

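  1. Run a Drive List query to retrieve all folders
  2. Iterate over the result array and inspect the parents property to build an in-memory hierarchy
  3. Run a second Drive List query to get all non-folders (i.e. files)
  4. For each file returned, place it in your in-memory tree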
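
The four steps above can be sketched roughly as follows. This is a minimal sketch rather than the answerer's own code: it assumes the Drive v2 field names used in the question (`items`, `title`, `parents`), and the helper names `list_all`, `build_tree` and `resolve_path` are hypothetical. It issues one paginated list query for folders and one for files, instead of one request per folder.

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'

def list_all(drive_service, query):
    """Collect every result of a Drive v2 files.list query, following pages."""
    result = []
    page_token = None
    while True:
        param = {'q': query}
        if page_token:
            param['pageToken'] = page_token
        page = drive_service.files().list(**param).execute()
        result.extend(page.get('items', []))
        page_token = page.get('nextPageToken')
        if not page_token:
            return result

def build_tree(folders, files):
    """Return {parent_id: [child metadata, ...]} for folders and files.

    Items that report no parent are grouped under the key None.
    """
    children = {}
    for item in list(folders) + list(files):
        parents = item.get('parents') or [{'id': None}]
        for parent in parents:
            children.setdefault(parent['id'], []).append(item)
    return children

def resolve_path(children, root_id, path):
    """Walk 'folder1/folder2/test.txt' down from root_id; return metadata or None."""
    current_id, found = root_id, None
    for name in path.split('/'):
        found = next((c for c in children.get(current_id, [])
                      if c['title'] == name), None)
        if found is None:
            return None
        current_id = found['id']
    return found

# Usage (assumes an authenticated drive_service built as in the question):
# folders = list_all(drive_service, "mimeType = '%s' and trashed = false" % FOLDER_MIME)
# files = list_all(drive_service, "mimeType != '%s' and trashed = false" % FOLDER_MIME)
# tree = build_tree(folders, files)
```

With the tree in memory, a path check such as `resolve_path(tree, root_id, 'folder1/folder2/test.txt')` needs no further requests. In v2 the root folder's ID should be available as `rootFolderId` from `about().get()`.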

If you simply want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it's unique, just do a Files List query for title='file-A', then do a Files Get for each of its parents and see whether any of them is called 'folder-B'.
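
A rough sketch of that check, assuming the Drive v2 client used in the question. The function names `escape_q` and `file_in_folder` are hypothetical, and the escaping follows the documented rule that single quotes inside a query value are backslash-escaped.

```python
def escape_q(value):
    """Escape backslashes and single quotes for use inside a Drive query value."""
    return value.replace('\\', '\\\\').replace("'", "\\'")

def file_in_folder(drive_service, file_title, folder_title):
    """Return metadata of a file named file_title that has a parent named
    folder_title, or None if there is no such file."""
    query = "title = '%s' and trashed = false" % escape_q(file_title)
    page_token = None
    while True:
        param = {'q': query}
        if page_token:
            param['pageToken'] = page_token
        page = drive_service.files().list(**param).execute()
        for item in page.get('items', []):
            for parent in item.get('parents', []):
                # one Files Get per parent to learn the parent's title
                folder = drive_service.files().get(fileId=parent['id']).execute()
                if folder.get('title') == folder_title:
                    return item
        page_token = page.get('nextPageToken')
        if not page_token:
            return None
```

This costs one list request plus one get request per candidate parent, which stays small as long as few files share the same title.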

You don't say whether these files and folders are created by your app or by the user with the Google Drive web app. If your app is the creator of these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have

MyDrive/app_root/folder-C/folder-B/file-A

You can make folder-C, folder-B and file-A all children of app_root.

That way you can constrain all of your queries to include

and 'app_root_id' in parents
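
As an illustration, a small helper could append that constraint to any query string. The name `in_app_root` is hypothetical and `app_root_id` stands for the real ID of the app_root folder; the existing query is wrapped in parentheses so that any `or` clauses in it keep their meaning.

```python
def in_app_root(query, app_root_id):
    """Append the app_root constraint to an existing Drive query string."""
    constraint = "'%s' in parents" % app_root_id
    if not query:
        return constraint
    return "(%s) and %s" % (query, constraint)

# e.g. in_app_root("title = 'file-A'", app_root_id) can be passed as q= to files().list()
```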

NB. A previous version of this answer highlighted that Drive folders were not constrained to an inverted tree hierarchy, because a single folder could have multiple parents. As of 2021 this is no longer true: a Drive file (including folders, which are simply special files) can only be created with a single parent.
