How do I write a CSV back to Azure Blob Storage using Databricks?

2022-04-11 00:00:00 pandas databricks azure-databricks scala

Problem description

I am struggling to write back to an Azure Blob Storage container. I can read from the container using the following:

storage_account_name = "expstorage"
storage_account_key = "1VP89J..."
container = "source"

spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)

dbutils.fs.ls("dbfs:/mnt/azurestorage")

I have tried several ways of writing back to my container that I found by searching, but I can't find a definitive approach.

Here is a link to an alternative that uses a SAS key, but I don't want to mix and match key types.

Write dataframe to blob using azure databricks

Solution

To write to Blob storage, you only need to specify a path that starts with dbfs:/mnt/azurestorage:

(df.write
   .mode("overwrite")
   .option("header", "true")
   .csv("dbfs:/mnt/azurestorage/filename.csv"))

This creates a folder containing the distributed data (one or more part files). If you are looking for a single CSV file, try the following:

df.toPandas().to_csv("dbfs:/mnt/azurestorage/filename.csv")
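If the data is too large to collect onto the driver with toPandas(), one possible alternative is to have Spark write a single part file and then copy it to the final name. The following is only a minimal sketch under assumptions: the helper name promote_single_part_file and the tmp_out folder are hypothetical, and it assumes the mount is reachable through the local /dbfs/ path and that Spark named its output part-*.csv.

```python
import glob
import os
import shutil


def promote_single_part_file(spark_output_dir: str, target_file: str) -> str:
    """Copy the only part-*.csv file in a Spark output folder to target_file.

    Hypothetical helper: both arguments are local file paths
    (on Databricks, paths that start with /dbfs/).
    """
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, "part-*.csv")))
    if len(part_files) != 1:
        # More than one file means the DataFrame was not reduced to one partition.
        raise ValueError(
            f"expected exactly one part file, found {len(part_files)}"
        )
    shutil.copyfile(part_files[0], target_file)
    return target_file


# Assumed usage on a Databricks cluster (not executed here):
# (df.coalesce(1).write
#    .mode("overwrite")
#    .option("header", "true")
#    .csv("dbfs:/mnt/azurestorage/tmp_out"))
# promote_single_part_file("/dbfs/mnt/azurestorage/tmp_out",
#                          "/dbfs/mnt/azurestorage/filename.csv")
```

The helper raises an error rather than guessing when it finds zero or several part files, so a missing coalesce(1) is noticed early.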

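If you are using only pandas, you won't have access to the DBFS API, so you need to use the local file API. That means your path must start with /dbfs/ rather than dbfs:/, as follows: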

df.to_csv(r'/dbfs/mnt/azurestorage/filename.csv', index = False)
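To make the difference between the two path styles concrete, here is a small sketch of a converter from the dbfs:/ URI form to the /dbfs/ local form. The function name to_local_dbfs_path is hypothetical and not part of any Databricks or pandas API; it only rewrites the string.

```python
def to_local_dbfs_path(path: str) -> str:
    """Rewrite 'dbfs:/mnt/...' as '/dbfs/mnt/...'; leave other paths unchanged.

    Hypothetical helper for illustration only.
    """
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Drop the URI scheme and re-root the rest under the /dbfs/ mount point.
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


local_path = to_local_dbfs_path("dbfs:/mnt/azurestorage/filename.csv")
print(local_path)  # /dbfs/mnt/azurestorage/filename.csv

# Assumed usage with a pandas DataFrame named df (not executed here):
# df.to_csv(local_path, index=False)
```

Keeping a single dbfs:/ path in the notebook and converting it only where pandas needs it avoids maintaining two spellings of the same location.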
