Python json parser allowing duplicate keys
Problem description
I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for one of them, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test": {
    "entry": {
        "Type": "Something"
    },
    "entry": {
        "Type": "Something_Else"
    }
}, ...
The default json parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys in the several dictionaries in the same order they appear in the file; that's why I am using an OrderedDict. It works fine, so if there is any way to extend this to handle the duplicate entries, I'd be grateful.
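To make the problem concrete, here is a minimal sketch using the same `"Test"`/`"entry"` shape as above (the data string is a made-up example). With default decoding, the standard library builds a plain dict for each JSON object, so the second `"entry"` silently replaces the first.

```python
import json

# Same shape as the question's data: two "entry" keys inside one object.
data = '{"Test": {"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}}'

obj = json.loads(data)

# Default decoding assigns each pair into a dict, so the later "entry" wins.
print(obj)  # {'Test': {'entry': {'Type': 'Something_Else'}}}
```

No error is raised; the first `"entry"` is simply gone.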
My second issue is that this very same JSON file contains entries like this:
"Test": {
    {
        "Type": "Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only way I worked around this was to manually remove the inner braces myself.
Thanks in advance.
Solution
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
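from json import JSONDecoder

def parse_object_pairs(pairs):
    # Return the raw list of (key, value) pairs instead of building a dict,
    # so duplicate keys survive decoding.
    return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)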
Output (Python 2; under Python 3 the u prefixes are absent):
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
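As an illustration of the first option, a minimal sketch of custom lookup logic over the nested pair list. `find_all` is a hypothetical helper, not part of the json module; it returns every value stored under a key, in document order, instead of picking just one.

```python
from json import JSONDecoder

def parse_object_pairs(pairs):
    # Keep the raw (key, value) pairs, duplicates included.
    return pairs

def find_all(pairs, key):
    # Hypothetical helper: collect every value stored under `key`,
    # in the order the pairs appeared in the document.
    return [value for k, value in pairs if k == key]

data = '{"foo": {"baz": 42}, "foo": 7}'
obj = JSONDecoder(object_pairs_hook=parse_object_pairs).decode(data)

print(find_all(obj, "foo"))  # [[('baz', 42)], 7]
```

Returning a list sidesteps the ambiguity of `dct[key]`: the caller decides which of the duplicate values it wants.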
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
    # Append _1, _2, ... to the key until it no longer collides.
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    # OrderedDict keeps the pairs in document order.
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (Python 2):
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n, where n is an incremental counter; just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, so I included that as well.
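Applied to the data from the question, a sketch might look like this. It assumes Python 3.7 or later, where the built-in dict is guaranteed to preserve insertion order, so a plain dict is swapped in for the OrderedDict; it also passes the hook through json.loads rather than constructing a JSONDecoder directly.

```python
import json

def make_unique(key, dct):
    # Append _1, _2, ... to the key until it no longer collides.
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    # Plain dict: insertion order is guaranteed from Python 3.7 on.
    dct = {}
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = '''
{"Test": {
    "entry": {"Type": "Something"},
    "entry": {"Type": "Something_Else"}
}}
'''

obj = json.loads(data, object_pairs_hook=parse_object_pairs)
print(obj)
# {'Test': {'entry': {'Type': 'Something'}, 'entry_1': {'Type': 'Something_Else'}}}
```

Both entries survive, in file order, with the second one renamed to `entry_1`.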