Unable to download video captions using YouTube API v3 in Python
Problem description
I am trying to download closed captions for this public YouTube video (just for testing): https://www.youtube.com/watch?v=Txvud7wPbv4
I am using the code sample (captions.py) below, which I got from this link: https://developers.google.com/youtube/v3/docs/captions/download
I have already stored client-secrets.json (OAuth2 authentication) and youtube-v3-api-captions.json in the same directory, as the sample code asks.
I ran this command in cmd: python captions.py --videoid='Txvud7wPbv4' --action='download'
I get an error, and I don't know why it doesn't recognise the video ID of this public video.
Has anyone had a similar issue?
Thank you all in advance.
Code sample:
# Usage example:
# python captions.py --videoid='<video_id>' --name='<name>' --file='<file>' --language='<language>' --action='action'
import httplib2
import os
import sys
from apiclient.discovery import build_from_document
from apiclient.errors import HttpError
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow
# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the {{ Google Cloud Console }} at
# {{ https://cloud.google.com/console }}.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"
# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
YOUTUBE_READ_WRITE_SSL_SCOPE = "https://www.googleapis.com/auth/youtube.force-ssl"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0
To make this sample run you will need to populate the client_secrets.json file
found at:
%s
with information from the APIs Console
https://console.developers.google.com
For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))
# Authorize the request and store authorization credentials.
def get_authenticated_service(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_READ_WRITE_SSL_SCOPE,
        message=MISSING_CLIENT_SECRETS_MESSAGE)
    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)
    # Trusted testers can download this discovery document from the developers page
    # and it should be in the same directory with the code.
    with open("youtube-v3-api-captions.json", "r") as f:
        doc = f.read()
        return build_from_document(doc, http=credentials.authorize(httplib2.Http()))
# Call the API's captions.list method to list the existing caption tracks.
def list_captions(youtube, video_id):
    results = youtube.captions().list(
        part="snippet",
        videoId=video_id
    ).execute()
    for item in results["items"]:
        id = item["id"]
        name = item["snippet"]["name"]
        language = item["snippet"]["language"]
        print "Caption track '%s(%s)' in '%s' language." % (name, id, language)
    return results["items"]
# Call the API's captions.insert method to upload a caption track in draft status.
def upload_caption(youtube, video_id, language, name, file):
    insert_result = youtube.captions().insert(
        part="snippet",
        body=dict(
            snippet=dict(
                videoId=video_id,
                language=language,
                name=name,
                isDraft=True
            )
        ),
        media_body=file
    ).execute()
    id = insert_result["id"]
    name = insert_result["snippet"]["name"]
    language = insert_result["snippet"]["language"]
    status = insert_result["snippet"]["status"]
    print "Uploaded caption track '%s(%s) in '%s' language, '%s' status." % (name,
        id, language, status)
# Call the API's captions.update method to update an existing caption track's draft status
# and publish it. If a new binary file is present, update the track with the file as well.
def update_caption(youtube, caption_id, file):
    update_result = youtube.captions().update(
        part="snippet",
        body=dict(
            id=caption_id,
            snippet=dict(
                isDraft=False
            )
        ),
        media_body=file
    ).execute()
    name = update_result["snippet"]["name"]
    isDraft = update_result["snippet"]["isDraft"]
    print "Updated caption track '%s' draft status to be: '%s'" % (name, isDraft)
    if file:
        print "and updated the track with the new uploaded file."
# Call the API's captions.download method to download an existing caption track.
def download_caption(youtube, caption_id, tfmt):
    subtitle = youtube.captions().download(
        id=caption_id,
        tfmt=tfmt
    ).execute()
    print "First line of caption track: %s" % (subtitle)
# Call the API's captions.delete method to delete an existing caption track.
def delete_caption(youtube, caption_id):
    youtube.captions().delete(
        id=caption_id
    ).execute()
    print "caption track '%s' deleted successfully" % (caption_id)
if __name__ == "__main__":
    # The "videoid" option specifies the YouTube video ID that uniquely
    # identifies the video for which the caption track will be uploaded.
    argparser.add_argument("--videoid",
        help="Required; ID for video for which the caption track will be uploaded.")
    # The "name" option specifies the name of the caption track to be used.
    argparser.add_argument("--name", help="Caption track name", default="YouTube for Developers")
    # The "file" option specifies the binary file to be uploaded as a caption track.
    argparser.add_argument("--file", help="Captions track file to upload")
    # The "language" option specifies the language of the caption track to be uploaded.
    argparser.add_argument("--language", help="Caption track language", default="en")
    # The "captionid" option specifies the ID of the caption track to be processed.
    argparser.add_argument("--captionid", help="Required; ID of the caption track to be processed")
    # The "action" option specifies the action to be processed.
    argparser.add_argument("--action", help="Action", default="all")
    args = argparser.parse_args()

    if args.action in ('upload', 'list', 'all'):
        if not args.videoid:
            exit("Please specify videoid using the --videoid= parameter.")
    if args.action in ('update', 'download', 'delete'):
        if not args.captionid:
            exit("Please specify captionid using the --captionid= parameter.")
    if args.action in ('upload', 'all'):
        if not args.file:
            exit("Please specify a caption track file using the --file= parameter.")
        if not os.path.exists(args.file):
            exit("Please specify a valid file using the --file= parameter.")

    youtube = get_authenticated_service(args)
    try:
        if args.action == 'upload':
            upload_caption(youtube, args.videoid, args.language, args.name, args.file)
        elif args.action == 'list':
            list_captions(youtube, args.videoid)
        elif args.action == 'update':
            update_caption(youtube, args.captionid, args.file)
        elif args.action == 'download':
            download_caption(youtube, args.captionid, 'srt')
        elif args.action == 'delete':
            delete_caption(youtube, args.captionid)
        else:
            # All the available methods are used in sequence just for the sake of an example.
            upload_caption(youtube, args.videoid, args.language, args.name, args.file)
            captions = list_captions(youtube, args.videoid)
            if captions:
                first_caption_id = captions[0]['id']
                update_caption(youtube, first_caption_id, None)
                download_caption(youtube, first_caption_id, 'srt')
                delete_caption(youtube, first_caption_id)
    except HttpError as e:
        print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
    else:
        print "Created and managed caption tracks."
Solution
Your app seems overly-complex... it's structured to be able to do everything that can be done w/captions, not just download. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads captions:
# Usage example: $ python captions-download.py Txvud7wPbv4
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

def process(vid):
    # Look up the caption tracks for the video, then download the first one as SRT.
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    # Individual captions are separated by a blank line (double NEWLINE).
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            # Split only on the first two NEWLINEs: index, time range, caption text.
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        VID = sys.argv[1]
        process(VID)
The way it works is this:
- You pass in the video ID (VID) as the only argument (sys.argv[1])
- It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
- Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
- Then it calls YOUTUBE.captions().download() with that caption ID requesting the srt track format
- All individual captions are delimited by double NEWLINEs, so split on 'em
- Loop through each caption; there's data if there are at least 2 NEWLINEs in the line, so only split() on the 1st pair
- Display the caption#, timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
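The splitting steps can be tried without touching the API at all. Below is a minimal sketch that runs the same logic on a hard-coded SRT string; the sample text, the parse_srt name, and the UTF-8 decoding of a bytes payload are all assumptions made for illustration, not part of the script above.

```python
# Sample SRT text standing in for what captions().download() returns.
# The cue text is made up for illustration.
SAMPLE_SRT = (
    "1\n00:00:06,390 --> 00:00:09,280\nfirst caption\nwrapped line\n\n"
    "2\n00:00:09,280 --> 00:00:12,280\nsecond caption\n"
)


def parse_srt(caption_str):
    """Return a list of (index, time_range, text) tuples from SRT text."""
    if isinstance(caption_str, bytes):
        # Assumption: the payload is UTF-8 when it arrives as bytes.
        caption_str = caption_str.decode("utf-8")
    cues = []
    # Normalise Windows line endings, then split on blank lines.
    for block in caption_str.replace("\r\n", "\n").split("\n\n"):
        # A usable cue has an index line, a time line, and some text.
        if block.count("\n") > 1:
            i, cap_time, caption = block.split("\n", 2)
            # Collapse remaining NEWLINEs (and runs of spaces) to single spaces.
            cues.append((int(i), cap_time, " ".join(caption.split())))
    return cues


for i, cap_time, text in parse_srt(SAMPLE_SRT):
    print("%02d) [%s] %s" % (i, cap_time, text))
```

Keeping the parsing in its own function means it can be checked separately from the OAuth and download steps, which is where most of the failures tend to happen.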
When I run it, I get the expected result... here on a video I own:
$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
:
Couple of things...
- I think you need to be the owner of the video you're trying to download the captions for.
  - I tried my script on your video, and I get a 403 HTTP Forbidden error
  - Here are other errors you may get from the API
- In your case, it looks like something is messing up the video ID you're passing in.
  - It thinks you're giving it <code> and </code> (notice the hex 0x3c & 0x3e values)... rich text?
  - Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment.
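One way to check whether the video ID arrives intact is to dump its characters as hex before handing it to the API. The sketch below is only an illustration: show_hex and clean_video_id are hypothetical helper names, and the 11-character pattern reflects what video IDs look like in practice rather than a documented guarantee. Quotes are stripped as well, since some shells (cmd.exe among them) pass single quotes through literally.

```python
import re

# Assumption: video IDs are 11 characters drawn from letters, digits, "-" and "_".
# This matches IDs seen in practice but is not a documented guarantee.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def show_hex(value):
    """Return each character with its hex code so stray ones stand out."""
    return " ".join("%s=0x%02x" % (ch, ord(ch)) for ch in value)


def clean_video_id(raw):
    """Strip markup tags, quotes, and whitespace; return the ID or None."""
    stripped = re.sub(r"<[^>]*>", "", raw)    # drop things like <code>...</code>
    stripped = stripped.strip(" \t\r\n'\"")    # drop quotes the shell left in place
    return stripped if VIDEO_ID_RE.match(stripped) else None


raw = "<code>Txvud7wPbv4</code>"
print(show_hex(raw[:6]))      # the 0x3c and 0x3e values mark the angle brackets
print(clean_video_id(raw))
```

If clean_video_id returns None, the argument is probably not a plain video ID and is worth inspecting with show_hex before any request is made.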
FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest, so focus on videos 3 and 4 in that playlist to help get you acclimated.
I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs), although I do have one Google Apps Script example (video 22 in the playlist); if you're new to Apps Script, review your JavaScript and check out video 5 first. Hope this helps!