How do I save a Google Sheets file as CSV from Python 3 (or 2)?
Question
I am looking for a simple way to save a csv file originating from a published Google Sheets document. Since it's published, it's accessible through a direct link (modified on purpose in the example below).
All my browsers prompt me to save the csv file as soon as I open the link.
Neither:
import urllib.request

DOC_URL = 'https://docs.google.com/spreadsheet/ccc?key=0AoOWveO-dNo5dFNrWThhYmdYW9UT1lQQkE&output=csv'

f = urllib.request.urlopen(DOC_URL)
cont = f.read()  # read the whole response body as bytes
f.close()
cont = str(cont, 'utf-8')
print(cont)
nor:
import urllib.request

req = urllib.request.Request(DOC_URL)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))
prints anything but HTML content.
(I tried the second version after reading this other post: Download google docs public spreadsheet to csv with python.)
Any idea what I am doing wrong? I am logged out of my Google account, if that's worth anything, but this works from any browser I tried. As far as I understand, the Google Docs API has not yet been ported to Python 3, and given the "toy" magnitude of my little project for personal use, it would not even make much sense to use it from the get-go if I can circumvent it.
In the second attempt, I added the 'User-Agent' header, thinking that requests identified as coming from scripts (because no identification info is present) might be ignored, but it didn't make a difference.
Solution
Google responds to the initial request with a series of cookie-setting 302 redirects. If you don't store and resubmit the cookies between requests, it redirects you to the login page.
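You can observe this for yourself; a minimal sketch, assuming the requests library is available and DOC_URL is the link from the question:

>>> import requests
>>> resp = requests.get(DOC_URL, allow_redirects=False)  # stop at the first hop
>>> resp.status_code                 # expect 302, per the explanation above
>>> 'Set-Cookie' in resp.headers     # the cookies are set by the redirect itself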
So the problem is not with the User-Agent header; it's the fact that, by default, urllib.request.urlopen doesn't store cookies, but it will follow the HTTP 302 redirects.
The following code works just fine on a public spreadsheet available at the location specified by DOC_URL:
>>> from http.cookiejar import CookieJar
>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open(DOC_URL)
>>> # should really parse resp.getheader('content-type') for encoding.
>>> csv_content = resp.read().decode('utf-8')
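As an aside on that comment: with urllib, resp.headers behaves like an email.message.Message, so a more careful version could derive the encoding from the Content-Type header instead of hard-coding it. A minimal sketch, falling back to UTF-8 when no charset is advertised (it would replace the decode line above, since the body can only be read once):

>>> charset = resp.headers.get_content_charset() or 'utf-8'  # charset from Content-Type, if any
>>> csv_content = resp.read().decode(charset)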
Having shown you how to do it in vanilla Python, I'll now say that the Right Way™ to go about this is to use the most excellent requests library. It is extremely well documented and makes these sorts of tasks incredibly pleasant to complete.
For instance, getting the same csv_content as above using the requests library is as simple as:
>>> import requests
>>> csv_content = requests.get(DOC_URL).text  # redirects and cookies are handled automatically
That single line expresses your intent more clearly. It's easier to write and easier to read. Do yourself - and anyone else who shares your codebase - a favor and just use requests.
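And if the goal is literally a csv file on disk, as the question title asks, writing the downloaded text out is the only step left. A minimal sketch; 'sheet.csv' is just a placeholder filename:

>>> with open('sheet.csv', 'w', encoding='utf-8') as f:  # persist the downloaded text
...     f.write(csv_content)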