Google Cloud Storage + Python: Is there a way to list the objects in a specific folder in GCS?
Question
I'm writing a Python program to check whether a file exists in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in the folder (a list of file names) and then check whether the file abc.txt is in that list.
The problem is that Google appears to provide only one way to get the object list, uri.get_bucket(). See the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
# Iterating over the bucket walks every object it contains.
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print(' "%s"' % obj.get_contents_as_string())
The drawback of uri.get_bucket() is that it appears to fetch all of the objects in the bucket first, which I don't want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
Could someone help? I appreciate every answer!
Solution
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python (https://googleapis.dev/python/storage/latest/index.html). For the newer library, the equivalent of the code below is:
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(str(blob))
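To connect this back to the original goal of checking for abc.txt, here is a minimal sketch built on the same list_blobs call. The helper names list_folder_names and file_in_folder are hypothetical, and the bucket and folder names are only the examples from the question. It assumes that passing delimiter="/" is acceptable, which limits the listing to objects directly inside the folder rather than in its subfolders.

```python
def list_folder_names(client, bucket_name, folder):
    """Return the names of objects directly inside `folder` (hypothetical helper)."""
    # GCS has no real folders; a "folder" is just a shared name prefix.
    stripped = folder.strip("/")
    prefix = stripped + "/" if stripped else ""
    # delimiter="/" keeps the listing to direct children of the prefix.
    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
    return [blob.name for blob in blobs]


def file_in_folder(client, bucket_name, folder, filename):
    """Return True if `filename` sits directly inside `folder` (hypothetical helper)."""
    stripped = folder.strip("/")
    target = (stripped + "/" if stripped else "") + filename
    return target in list_folder_names(client, bucket_name, folder)
```

With a real client this might be called as file_in_folder(storage.Client(), 'mybucket', 'abc/myfolder', 'abc.txt'). Only objects whose names start with the prefix are requested, so the whole bucket is not walked.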
The answer for the older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
import json

from googleapiclient import discovery  # older releases exposed this package as "apiclient"

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    # list_next lives on the objects() collection and returns None after the last page
    request = client.objects().list_next(request, response)
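If only the object names are needed rather than the full JSON, the same pagination loop can collect them into a list. This is a rough sketch, not part of the original answer: list_object_names is a hypothetical helper, and service stands for the object returned by discovery.build('storage', 'v1'). It relies on each page of the response carrying its objects under the "items" key, which is absent when a page has no results.

```python
def list_object_names(service, bucket, prefix):
    """Collect the names of all objects under `prefix`, following pagination."""
    names = []
    objects = service.objects()
    request = objects.list(bucket=bucket, prefix=prefix)
    while request is not None:
        response = request.execute()
        # "items" is missing from the response when nothing matches.
        names.extend(item["name"] for item in response.get("items", []))
        # list_next returns None once there is no further page.
        request = objects.list_next(request, response)
    return names
```

The check from the question then becomes a membership test, for example "abc/myfolder/abc.txt" in list_object_names(client, "mybucket", "abc/myfolder/").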
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
The Google Python API client is documented here: https://code.google.com/p/google-api-python-client/