Unable to download video captions with the YouTube API v3 in Python

Problem description

I am trying to download closed captions for this public YouTube video (just for testing): https://www.youtube.com/watch?v=Txvud7wPbv4

I am using the code sample (captions.py) below, which I got from this link: https://developers.google.com/youtube/v3/docs/captions/download

I have already stored client-secrets.json (OAuth 2.0 authentication) and youtube-v3-api-captions.json in the same directory, as the sample code requires.

I ran this command in cmd: python captions.py --videoid='Txvud7wPbv4' --action='download'

I get this error:

I don't know why it doesn't recognise the video ID of this public video.

Has anyone had a similar issue?

Thank you all in advance.

Code sample:

# Usage example:
# python captions.py --videoid='<video_id>' --name='<name>' --file='<file>' --language='<language>' --action='action'

import httplib2
import os
import sys

from apiclient.discovery import build_from_document
from apiclient.errors import HttpError
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow


# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Cloud Console at
# https://cloud.google.com/console.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
#   https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
#   https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
YOUTUBE_READ_WRITE_SSL_SCOPE = "https://www.googleapis.com/auth/youtube.force-ssl"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:
   %s
with information from the APIs Console
https://console.developers.google.com

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

# Authorize the request and store authorization credentials.
def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_READ_WRITE_SSL_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)

  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)

  # Trusted testers can download this discovery document from the developers page
  # and it should be in the same directory with the code.
  with open("youtube-v3-api-captions.json", "r") as f:
    doc = f.read()
    return build_from_document(doc, http=credentials.authorize(httplib2.Http()))


# Call the API's captions.list method to list the existing caption tracks.
def list_captions(youtube, video_id):
  results = youtube.captions().list(
    part="snippet",
    videoId=video_id
  ).execute()

  for item in results["items"]:
    id = item["id"]
    name = item["snippet"]["name"]
    language = item["snippet"]["language"]
    print "Caption track '%s(%s)' in '%s' language." % (name, id, language)

  return results["items"]


# Call the API's captions.insert method to upload a caption track in draft status.
def upload_caption(youtube, video_id, language, name, file):
  insert_result = youtube.captions().insert(
    part="snippet",
    body=dict(
      snippet=dict(
        videoId=video_id,
        language=language,
        name=name,
        isDraft=True
      )
    ),
    media_body=file
  ).execute()

  id = insert_result["id"]
  name = insert_result["snippet"]["name"]
  language = insert_result["snippet"]["language"]
  status = insert_result["snippet"]["status"]
  print "Uploaded caption track '%s(%s)' in '%s' language, '%s' status." % (name,
      id, language, status)


# Call the API's captions.update method to update an existing caption track's draft status
# and publish it. If a new binary file is present, update the track with the file as well.
def update_caption(youtube, caption_id, file):
  update_result = youtube.captions().update(
    part="snippet",
    body=dict(
      id=caption_id,
      snippet=dict(
        isDraft=False
      )
    ),
    media_body=file
  ).execute()

  name = update_result["snippet"]["name"]
  isDraft = update_result["snippet"]["isDraft"]
  print "Updated caption track '%s' draft status to be: '%s'" % (name, isDraft)
  if file:
    print "and updated the track with the new uploaded file."


# Call the API's captions.download method to download an existing caption track.
def download_caption(youtube, caption_id, tfmt):
  subtitle = youtube.captions().download(
    id=caption_id,
    tfmt=tfmt
  ).execute()

  print "First line of caption track: %s" % (subtitle)

# Call the API's captions.delete method to delete an existing caption track.
def delete_caption(youtube, caption_id):
  youtube.captions().delete(
    id=caption_id
  ).execute()

  print "caption track '%s' deleted successfully" % (caption_id)


if __name__ == "__main__":
  # The "videoid" option specifies the YouTube video ID that uniquely
  # identifies the video for which the caption track will be uploaded.
  argparser.add_argument("--videoid",
    help="Required; ID for video for which the caption track will be uploaded.")
  # The "name" option specifies the name of the caption track to be used.
  argparser.add_argument("--name", help="Caption track name", default="YouTube for Developers")
  # The "file" option specifies the binary file to be uploaded as a caption track.
  argparser.add_argument("--file", help="Captions track file to upload")
  # The "language" option specifies the language of the caption track to be uploaded.
  argparser.add_argument("--language", help="Caption track language", default="en")
  # The "captionid" option specifies the ID of the caption track to be processed.
  argparser.add_argument("--captionid", help="Required; ID of the caption track to be processed")
  # The "action" option specifies the action to be processed.
  argparser.add_argument("--action", help="Action", default="all")


  args = argparser.parse_args()

  if (args.action in ('upload', 'list', 'all')):
    if not args.videoid:
      exit("Please specify videoid using the --videoid= parameter.")

  if (args.action in ('update', 'download', 'delete')):
    if not args.captionid:
      exit("Please specify captionid using the --captionid= parameter.")

  if (args.action in ('upload', 'all')):
    if not args.file:
      exit("Please specify a caption track file using the --file= parameter.")
    if not os.path.exists(args.file):
      exit("Please specify a valid file using the --file= parameter.")

  youtube = get_authenticated_service(args)
  try:
    if args.action == 'upload':
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
    elif args.action == 'list':
      list_captions(youtube, args.videoid)
    elif args.action == 'update':
      update_caption(youtube, args.captionid, args.file)
    elif args.action == 'download':
      download_caption(youtube, args.captionid, 'srt')
    elif args.action == 'delete':
      delete_caption(youtube, args.captionid)
    else:
      # All the available methods are used in sequence just for the sake of an example.
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
      captions = list_captions(youtube, args.videoid)

      if captions:
        first_caption_id = captions[0]['id']
        update_caption(youtube, first_caption_id, None)
        download_caption(youtube, first_caption_id, 'srt')
        delete_caption(youtube, first_caption_id)
  except HttpError as e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
  else:
    print "Created and managed caption tracks."

Solution

Your app seems overly complex... it's structured to do everything that can be done with captions, not just download them. That makes it harder to debug, so I wrote an abridged version (Python 2 or 3) that only downloads captions:

# Usage example: $ python captions-download.py Txvud7wPbv4

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

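def process(vid):
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    if not caption_info:
        print('No caption tracks found for video %r' % vid)
        return
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    # The download can come back as raw bytes under Python 3; decode first.
    if isinstance(caption_str, bytes):
        caption_str = caption_str.decode('utf-8')
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        VID = sys.argv[1]
        process(VID)
    else:
        print('Usage: python captions-download.py VIDEO_ID')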

The way it works is this:

  1. You pass in the video ID (VID) as the only argument (sys.argv[1])
  2. It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
  3. Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
  4. Then it calls YOUTUBE.captions().download() with that caption ID requesting the srt track format
  5. All individual captions are delimited by double NEWLINEs, so split on 'em
  6. Loop through each caption; there's data if there are at least 2 NEWLINEs in the line, so only split() on the 1st pair
  7. Display the caption#, timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
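Steps 5 through 7 can be tried offline, with no API calls, as a small pure function. This is a sketch assuming the downloaded track is plain SRT text in which blocks are separated by blank lines, as in the sample output below; `parse_srt` and `format_captions` are hypothetical helper names, not part of any Google library, and the sample text is illustrative.

```python
def parse_srt(srt_text):
    """Split SRT text into (index, timing, text) tuples.

    Assumes blocks are separated by blank lines (double newlines) and that
    each block is: an index line, a timing line, then one or more text lines.
    """
    entries = []
    # Normalize Windows line endings (an assumption; harmless if absent).
    srt_text = srt_text.replace('\r\n', '\n')
    for block in srt_text.strip().split('\n\n'):
        # A complete block has more than one newline: index, timing, text.
        if block.count('\n') > 1:
            index, timing, text = block.split('\n', 2)
            # Collapse multi-line caption text into a single line.
            entries.append((int(index), timing, ' '.join(text.split())))
    return entries


def format_captions(entries):
    """Render entries in the answer's output style: '01) [timing] text'."""
    return ['%02d) [%s] %s' % entry for entry in entries]


if __name__ == '__main__':
    sample = (
        "1\n00:00:06,390 --> 00:00:09,280\niterator cool but that's cool\n\n"
        "2\n00:00:09,280 --> 00:00:12,280\nyour the\nmoment\n"
    )
    for line in format_captions(parse_srt(sample)):
        print(line)
    # 01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
    # 02) [00:00:09,280 --> 00:00:12,280] your the moment
```

Keeping the parsing separate from the API calls means the SRT handling can be checked on any saved `.srt` file before worrying about authentication.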

When I run it, I get the expected result... here on a video I own:

$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
    :

Couple of things...

  1. I think you need to be the owner of the video you're trying to download the captions for.
    • I tried my script on your video, and I get a 403 HTTP Forbidden error
    • Here are other errors you may get from the API
  2. In your case, it looks like something is messing up the video ID you're passing in.
    • It thinks you're giving it <code> and </code> (notice the hex 0x3c and 0x3e values, which are < and >)... rich text?
    • Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment.
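To rule out a mangled video ID locally before any API call, a small guard can strip stray markup and quotes from the argument and reject anything that still doesn't look like a video ID. This is a sketch that assumes video IDs follow the commonly observed 11-character pattern of letters, digits, `-` and `_` (YouTube does not formally document that format), and `clean_video_id` is a hypothetical helper, not part of the API client. It also strips surrounding quotes because Windows `cmd` does not treat single quotes as quoting, so `--videoid='Txvud7wPbv4'` may arrive with the quotes included.

```python
import re

# Commonly observed YouTube video ID shape: 11 chars of [A-Za-z0-9_-].
# This is an assumption, not a documented guarantee.
VIDEO_ID_RE = re.compile(r'^[A-Za-z0-9_-]{11}$')

# Matches HTML-like tags such as <code> and </code>.
TAG_RE = re.compile(r'</?[A-Za-z][^>]*>')


def clean_video_id(raw):
    """Strip HTML tags, whitespace and surrounding quotes from raw.

    Returns the cleaned ID, or raises ValueError if the result still
    doesn't look like a video ID.
    """
    cleaned = TAG_RE.sub('', raw).strip()
    # Windows cmd passes single quotes through literally, e.g. 'Txvud7wPbv4'.
    cleaned = cleaned.strip('\'"')
    if not VIDEO_ID_RE.match(cleaned):
        raise ValueError('Not a plausible video ID: %r (cleaned from %r)'
                         % (cleaned, raw))
    return cleaned


if __name__ == '__main__':
    print(clean_video_id('<code>Txvud7wPbv4</code>'))  # Txvud7wPbv4
    print(clean_video_id("'Txvud7wPbv4'"))             # Txvud7wPbv4
```

Calling this on `sys.argv[1]` (or `args.videoid`) before `captions().list()` turns a confusing API error into a clear local one.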

FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with Google APIs in this playlist. The auth code is the toughest, so focus on videos 3 and 4 in that playlist to help you get acclimated.

I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs), although I do have one Google Apps Script example (video 22 in the playlist); if you're new to Apps Script, review your JavaScript first, then check out video 5. Hope this helps!
