Google Cloud Storage + Python: is there a way to list the objects in a particular folder in GCS?

Problem description

I'm writing a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get a list of the names of all objects in the folder, then check whether the file abc.txt is in that list.

The problem is that Google seems to provide only one way to get the object list, uri.get_bucket(). See the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

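import boto

# DOGS_BUCKET and GOOGLE_STORAGE are constants defined earlier in the linked tutorial.
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
# Iterating over the bucket walks every object it contains.
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print('  "%s"' % obj.get_contents_as_string())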

The drawback of uri.get_bucket() is that it appears to fetch all of the objects in the bucket first, which I don't want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.

Could someone help? I appreciate every answer!


Solution

Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python (https://googleapis.dev/python/storage/latest/index.html). For the newer library, the equivalent to the below code is:

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
  print(str(blob))
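To answer the original question (is abc.txt in the folder?) with the newer library, one possible approach is to pass the full object name as the prefix and compare names exactly. The sketch below is only an illustration: the helper name object_exists is made up here, and its client argument is assumed to behave like storage.Client, i.e. to offer list_blobs(bucket_name, prefix=...) yielding items with a .name attribute.

```python
def object_exists(client, bucket_name, object_name):
    """Return True if an object named exactly `object_name` is in the bucket.

    `client` is assumed to behave like google.cloud.storage.Client: it must
    offer list_blobs(bucket_name, prefix=...) yielding items with a .name.
    """
    # Using the full object name as the prefix keeps the listing small:
    # only names that start with it are returned.
    for blob in client.list_blobs(bucket_name, prefix=object_name):
        if blob.name == object_name:
            return True
    return False


# Hypothetical usage against a real bucket (needs credentials):
#   from google.cloud import storage
#   print(object_exists(storage.Client(), 'bucketname', 'abc/myfolder/abc.txt'))
```

The exact comparison matters because a prefix listing for abc/myfolder/abc.txt would also return a name such as abc/myfolder/abc.txt.bak.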

The answer for the older client follows.

You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

import json

from googleapiclient import discovery  # older releases exposed this as "apiclient"

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
# Page through the results; list_next() returns None after the last page.
while request is not None:
  response = request.execute()
  print(json.dumps(response, indent=2))
  request = client.objects().list_next(request, response)
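To get from those responses to the file name list the question asks for, the object names can be pulled out of each page. The following is a minimal sketch, assuming each response is a dict whose optional "items" list holds object resources with a "name" field. The helper object_names and the sample pages are invented for illustration and are not real API output.

```python
def object_names(pages):
    """Collect object names from an iterable of objects.list response dicts."""
    names = []
    for page in pages:
        # "items" may be absent from a page when nothing matched the prefix.
        for item in page.get("items", []):
            names.append(item["name"])
    return names


# Hand-written sample pages for illustration only.
sample_pages = [
    {"kind": "storage#objects",
     "nextPageToken": "hypothetical-token",
     "items": [{"name": "abc/myfolder/abc.txt"},
               {"name": "abc/myfolder/notes.txt"}]},
    {"kind": "storage#objects",
     "items": [{"name": "abc/myfolder/sub/data.csv"}]},
]

found_names = object_names(sample_pages)
print("abc/myfolder/abc.txt" in found_names)  # prints True
```

In the paging loop, each response could be appended to a list and passed to such a helper once the loop finishes.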

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
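One detail of the prefix parameter worth keeping in mind: it is matched as a plain string against the start of each object name, so "abc/myfolder" also matches objects under a sibling such as "abc/myfolder2/". The sketch below only simulates that matching locally with str.startswith on made-up names, and the helper matches_prefix is hypothetical. It is meant to show why a trailing slash may be wanted, not to reproduce the service itself.

```python
def matches_prefix(names, prefix):
    """Mimic prefix filtering: keep the names that start with `prefix`."""
    return [name for name in names if name.startswith(prefix)]


# Made-up object names for illustration.
all_names = [
    "abc/myfolder/abc.txt",
    "abc/myfolder/sub/deep.txt",
    "abc/myfolder2/abc.txt",
]

# Without a trailing slash the sibling "folder" myfolder2 is included too.
loose = matches_prefix(all_names, "abc/myfolder")
# With the trailing slash only myfolder and its children remain.
strict = matches_prefix(all_names, "abc/myfolder/")
```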

And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
