Loading a document as a raw string in YAML with PyYAML

2022-01-14 00:00:00 python pyyaml yaml

Problem description

I want to parse yaml documents like the following

meta-info-1: val1
meta-info-2: val2

---

Plain text/markdown content!
jhaha

If I load_all this with PyYAML, I get the following

>>> list(yaml.safe_load_all(open('index.yml')))
[{'meta-info-1': 'val1', 'meta-info-2': 'val2'}, 'Plain text/markdown content! jhaha']

What I am trying to achieve here is that the yaml file should contain two documents, and the second one is supposed to be interpreted as a single string document, more specifically any large body of text with markdown formatting. I don't want it to be parsed as YAML syntax.

In the above example, PyYAML returns the second document as a single string. But if the second document has a : character in place of the ! for instance, I get a syntax error. This is because PyYAML is parsing the stuff in that document.
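To make the failure concrete, here is a minimal sketch that feeds both variants of the second document to PyYAML. It assumes a PyYAML version that provides `yaml.safe_load`; the sample strings are adapted from the example above, and the variable names are only for illustration.

```python
import yaml

plain = "Plain text/markdown content!\njhaha\n"
with_colon = "Plain text/markdown content:\njhaha\n"

# The first body happens to be a valid plain scalar, so it loads as one
# string (the line break is folded into a space).
print(repr(yaml.safe_load(plain)))

# The second body starts like a mapping ("key:") and then breaks the
# mapping syntax on the next line, so PyYAML raises an error.
try:
    yaml.safe_load(with_colon)
    failed = False
except yaml.YAMLError as exc:
    failed = True
    print(type(exc).__name__)
```

Note that the error is a subclass of `yaml.YAMLError`, not Python's built-in `SyntaxError`.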

Is there a way I can tell PyYAML that the second document is just a raw string and that it should not be parsed?

Edit: A few excellent answers there. While using quotes or the literal syntax solves the problem above, I'd like users to be able to write the plain text without any extra cruft: just the three -'s (or .'s), followed by a large body of plain text, which might itself include quotes. So I'd like to know if I can tell PyYAML to parse only the first document and give me the second one raw.

Edit 2: So, adapting agf's idea, but without the try/except, since the second document could happen to be valid yaml syntax,

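import yaml

# Split on the first document separator only; the body may itself contain '---'.
with open(filename) as f:
    config_content, body_content = f.read().split('\n---', 1)
config = yaml.safe_load(config_content)  # parse only the metadata as YAML
body = body_content                      # keep the second document as a raw string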

Thanks, agf.


Solution

You can do

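import yaml

with open(filename) as f:
    raw = f.read()
docs = []
for raw_doc in raw.split('\n---'):
    try:
        docs.append(yaml.safe_load(raw_doc))
    except yaml.YAMLError:
        # PyYAML reports parse failures as yaml.YAMLError, not SyntaxError;
        # keep a document that is not valid YAML as a raw string
        docs.append(raw_doc)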

if you don't have control over the format of the original document.
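If the second document should never be parsed, even when it happens to be valid YAML, one option is to split on the first separator line and parse only the header. The sketch below assumes the metadata comes first with no leading separator; `split_front_matter` and `SEPARATOR` are hypothetical names chosen for illustration, not part of PyYAML.

```python
import re
import yaml

# A separator is a line consisting only of '---' or '...'.
SEPARATOR = re.compile(r"^(?:---|\.\.\.)[ \t]*$", re.MULTILINE)

def split_front_matter(text):
    """Return (metadata, body): metadata parsed as YAML, body left unparsed."""
    parts = SEPARATOR.split(text, maxsplit=1)
    if len(parts) == 1:
        # No separator found: treat the whole input as raw body.
        return None, text
    header, body = parts
    # Only the header goes through PyYAML; the body just loses the
    # newlines that surrounded the separator.
    return yaml.safe_load(header), body.strip("\n")

sample = (
    "meta-info-1: val1\n"
    "meta-info-2: val2\n"
    "\n"
    "---\n"
    "\n"
    'Title: with "quotes" and: colons\n'
    "jhaha\n"
)
meta, body = split_front_matter(sample)
```

Because the body never reaches the YAML parser, colons, quotes, and further `---` lines inside it are harmless.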

From the PyYAML docs,

Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u****, you may express any ASCII or Unicode character.

So it sounds like there is no way to represent an arbitrary scalar in the parsing if it's not double quoted.
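As a small illustration of the double-quoted style, the sketch below lets PyYAML do the quoting and escaping itself and then loads the result back. The sample text is made up; `default_style='"'` is the dumper option that requests double-quoted scalars.

```python
import yaml

text = 'Title: "quoted" text\n- not a list: just markdown\n'

# default_style='"' asks the dumper to emit scalars in double-quoted style,
# escaping quotes and newlines as needed.
encoded = yaml.safe_dump(text, default_style='"')
print(encoded)

# Loading the encoded form gives back the original string unchanged.
decoded = yaml.safe_load(encoded)
```

This only helps when the file is generated by a program; it does not remove the need for quoting when users write the body by hand.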
