打通谷歌Cloud Storage至bigquery的数据流
背景
由于谷歌的bigquery不支持直接上传大于1G的数据,因此需要通过先上传到GCS(google cloud storage),再从GCS将数据load进bigquery。
年初谷歌的验证规则作了更新,基于googleAuthR的gar_auth()函数已经失效,再也不能通过点一下进行验证了,会报如下错误:
但是hadley大神开发的bigrquery包,却能够通过验证,
可能tidyverse API用的是Rstudio的官方邮箱,官方认证,所以能够用来验证。
但是,用bigrquery包有两个局限性:
- 不支持直接上传大数据集;
- 该验证只能使用bigquery服务。
下面介绍下GCS、bigquery通用的验证方法,
依赖
- bigquery、GCS服务已开通;
- BigQuery API、Google Cloud Storage JSON API已启用:
- 服务账号已创建,且为owner权限:
失败的尝试
- 使用gar_set_client()无法通过验证;
- 创建客户端id,并使用该凭据,无法通过验证。
成功的尝试
- 点击服务账号;
- 创建新密钥;
- 创建json私钥;
创建完后会自动下载新密钥的json文件。
- 使用私钥json文件完成GCS、bigquery验证。
- 官方例子测试*:
可自行测试文件 -> GCS -> bigquery的数据流,
## Not run: library(googleCloudStorageR) library(bigQueryR) gcs_global_bucket("your-project") ## custom upload function to ignore quotes and column headers f <- function(input, output) { write.table(input, sep = ",", col.names = FALSE, row.names = FALSE, quote = FALSE, file = output, qmethod = "double")} ## upload files to Google Cloud Storage gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f) gcs_upload(mtcars, name = "mtcars_test2.csv", object_function = f) ## create the schema of the files you just uploaded user_schema <- schema_fields(mtcars) ## load files from Google Cloud Storage into BigQuery bqr_upload_data(projectId = "your-project", datasetId = "test", tableId = "from_gcs_mtcars", upload_data = c("gs://your-project/mtcars_test1.csv", "gs://your-project/mtcars_test2.csv"), schema = user_schema) ## for big files, its helpful to create your schema on a small sample ## a quick way to do this on the command line is: # "head bigfile.csv > head_bigfile.csv" ## upload nested lists as JSON the_list <- list(list(col1 = "yes", col2 = "no", col3 = list(nest1 = 1, nest2 = 3), col4 = "oh"), list(col1 = "yes2", col2 = "n2o", col3 = list(nest1 = 5, nest2 = 7), col4 = "oh2"), list(col1 = "yes3", col2 = "no3", col3 = list(nest1 = 7, nest2 = 55), col4 = "oh3")) bqr_upload_data(datasetId = "test", tableId = "nested_list_json", upload_data = the_list, autodetect = TRUE) ## End(Not run)
来源 https://zhuanlan.zhihu.com/p/157385761
相关文章