Reading JSON file with C++ and BOOST

2021-12-24 00:00:00 directory json tree c++ boost

An HTTP server sends me a JSON response (a string) like this:

{
    "folders" :
    [{
            "id" : 109,
            "parent_id" : 110,
            "path" : "/1/105/110/"
        },
        {
            "id" : 110,
            "parent_id" : 105,
            "path" : "/1/105/"
        }
    ],

    "files" :
    [{
            "id" : 26,
            "parent_id" : 105,
            "name" : "picture.png",
            "hash" : "md5_hash",
            "path" : "/1/105/"
        },
        {
            "id" : 25,
            "parent_id" : 110,
            "name" : "another_picture.jpg",
            "hash" : "md5_hash",
            "path" : "/1/105/110/"
        }
    ]
}

I want to compare this "tree of a remote folder" with a local folder tree (for example a string vector containing the locations of my local files), so I thought of converting this JSON into a map of (string, vector ( map(string, string) ) ). I don't know if this is possible.

I'm developing a tool to synchronize files between a local and a remote folder, so I'm using boost to list a local folder, and I want to compare the local listing with the remote listing (the JSON response) to generate actions (downloading files that don't exist in the local folder, uploading files that don't exist in the remote folder).
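The comparison step described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: both listings have already been reduced to comparable path strings, and the names `SyncPlan` and `plan_sync` are hypothetical, not part of the original post or of Boost.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type: which paths to fetch and which to send.
struct SyncPlan {
    std::vector<std::string> to_download; // present remotely, missing locally
    std::vector<std::string> to_upload;   // present locally, missing remotely
};

// Hypothetical helper: takes both listings by value so they can be sorted,
// because std::set_difference requires sorted input ranges.
inline SyncPlan plan_sync(std::vector<std::string> local,
                          std::vector<std::string> remote) {
    std::sort(local.begin(), local.end());
    std::sort(remote.begin(), remote.end());

    SyncPlan plan;
    // remote minus local: files to download
    std::set_difference(remote.begin(), remote.end(),
                        local.begin(), local.end(),
                        std::back_inserter(plan.to_download));
    // local minus remote: files to upload
    std::set_difference(local.begin(), local.end(),
                        remote.begin(), remote.end(),
                        std::back_inserter(plan.to_upload));
    return plan;
}
```

Calling `plan_sync(local_paths, remote_paths)` then yields the two action lists. A real tool would probably also compare the hashes of files present on both sides, which this sketch leaves out.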

With this function, which I found in another question:

void print(boost::property_tree::ptree const& pt)
{
    using boost::property_tree::ptree;
    ptree::const_iterator end = pt.end();
    for (ptree::const_iterator it = pt.begin(); it != end; ++it)
    {
        std::cout << it->first << ": " << it->second.get_value<std::string>() << std::endl;
        print(it->second);
    }
}

I succeeded in printing something like this:

folders:
:
id: 109
parent_id: 110
name: 2011_pictures
:
id: 110
parent_id: 105
name: Aminos
files:
id: 26
parent_id: 105
name: logo.png
:
id: 5
parent_id: 109
name: me.jpg

I want to know whether it is possible to generate, from this result, a map<string, vector <map<string,string> > > with two keys, "folders" and "files". Through those two keys we could access a vector of maps containing the information for each object (file or folder). If this is feasible, it would reduce the complexity of the task (comparing two folder listings).

Example: T["folders"][0]["id"] would return "109"; T["files"][0]["name"] would return "logo.png".

UPDATE: this question is quite old, but I would like to offer a suggestion: whenever you want to handle JSON in C++, use RAPIDJSON.

Recommended answer

Because the data structure in the other answer was deemed "very complex" and the target data structure was suggested to be:

struct Data {
    struct Folder { int id, parent_id; std::string path; };
    struct File   { int id, parent_id; std::string path, name, md5_hash; };

    using Folders = std::vector<Folder>;
    using Files   = std::vector<File>;

    Folders folders;
    Files   files;
};

I ended up writing a transformation from generic "JSON" to that data structure (see the other answer: Reading JSON file with C++ and BOOST).

However, perhaps the OP will be more pleased if we "skip the middle man" and parse the JSON specifically into the shown Data structure. This "simplifies" the grammar making it specific for this type of document only:

start    = '{' >> 
           (folders_ >> commasep) ^
           (files_ >> commasep)
         >> '}';

folders_ = prop_key(+"folders") >> '[' >> -(folder_ % ',') >> ']';
files_   = prop_key(+"files")   >> '[' >> -(file_   % ',') >> ']';

folder_  = '{' >> (
                (prop_key(+"id")        >> int_  >> commasep) ^
                (prop_key(+"parent_id") >> int_  >> commasep) ^
                (prop_key(+"path")      >> text_ >> commasep)
            ) >> '}';
file_    = '{' >> (
                (prop_key(+"id")        >> int_  >> commasep) ^
                (prop_key(+"parent_id") >> int_  >> commasep) ^
                (prop_key(+"path")      >> text_ >> commasep) ^
                (prop_key(+"name")      >> text_ >> commasep) ^
                (prop_key(+"hash")      >> text_ >> commasep)
            ) >> '}';

prop_key = lexeme ['"' >> lazy(_r1) >> '"'] >> ':';
commasep = &char_('}') | ',';

This grammar allows

  • insignificant whitespace,
  • re-ordering of properties within objects
  • and omitted object properties

Benefits:

  • early checking of property value types
  • lower compile times
  • less code indeed: 37 fewer LoC (about 22%, not counting the sample JSON lines)

That last benefit has a flip side: if ever you want to read slightly different JSON, now you need to muck with the grammar instead of just writing a different extraction/transform. At 37 lines of code, my preference is with the other answer but I'll leave it to you to decide.

Here's the same demo program using this grammar directly:

Live On Coliru

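//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/regex/pending/unicode_iterator.hpp> // boost::utf8_output_iterator
#include <iostream>
#include <string>
#include <vector>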

namespace qi = boost::spirit::qi;

static std::string const sample = R"(
    {
        "folders" :
        [{
                "id" : 109,
                "parent_id" : 110,
                "path" : "/1/105/110/"
            },
            {
                "id" : 110,
                "parent_id" : 105,
                "path" : "/1/105/"
            }
        ],

        "files" :
        [{
                "id" : 26,
                "parent_id" : 105,
                "name" : "picture.png",
                "hash" : "md5_hash",
                "path" : "/1/105/"
            },
            {
                "id" : 25,
                "parent_id" : 110,
                "name" : "another_picture.jpg",
                "hash" : "md5_hash",
                "path" : "/1/105/110/"
            }
        ]
    })";

struct Data {
    struct Folder { int id, parent_id; std::string path; };
    struct File   { int id, parent_id; std::string path, name, md5_hash; };

    using Folders = std::vector<Folder>;
    using Files   = std::vector<File>;

    Folders folders;
    Files   files;
};

BOOST_FUSION_ADAPT_STRUCT(Data::Folder, (int,id)(int,parent_id)(std::string,path))
BOOST_FUSION_ADAPT_STRUCT(Data::File,   (int,id)(int,parent_id)(std::string,path)(std::string,name)(std::string,md5_hash))
BOOST_FUSION_ADAPT_STRUCT(Data,         (Data::Folders,folders)(Data::Files,files))

namespace folder_info { // adhoc JSON parser

    template <typename It, typename Skipper = qi::space_type>
    struct grammar : qi::grammar<It, Data(), Skipper>
    {
        grammar() : grammar::base_type(start) {
            using namespace qi;

            start    = '{' >> 
                       (folders_ >> commasep) ^
                       (files_ >> commasep)
                     >> '}';

            folders_ = prop_key(+"folders") >> '[' >> -(folder_ % ',') >> ']';
            files_   = prop_key(+"files")   >> '[' >> -(file_   % ',') >> ']';

            folder_  = '{' >> (
                            (prop_key(+"id")        >> int_  >> commasep) ^
                            (prop_key(+"parent_id") >> int_  >> commasep) ^
                            (prop_key(+"path")      >> text_ >> commasep)
                        ) >> '}';
            file_    = '{' >> (
                            (prop_key(+"id")        >> int_  >> commasep) ^
                            (prop_key(+"parent_id") >> int_  >> commasep) ^
                            (prop_key(+"path")      >> text_ >> commasep) ^
                            (prop_key(+"name")      >> text_ >> commasep) ^
                            (prop_key(+"hash")      >> text_ >> commasep)
                        ) >> '}';

            prop_key = lexeme ['"' >> lazy(_r1) >> '"'] >> ':';
            commasep = &char_('}') | ',';

            ////////////////////////////////////////
            // Bonus: properly decoding the string:
            text_   = '"' >> *ch_ >> '"';

            ch_ = +(
                    ~char_("\"\\")) [ _val += _1 ] |
                       qi::lit("\x5C") >> (               // \ (reverse solidus)
                       qi::lit("\x22") [ _val += '"'  ] | // "    quotation mark  U+0022
                       qi::lit("\x5C") [ _val += '\\' ] | // \    reverse solidus U+005C
                       qi::lit("\x2F") [ _val += '/'  ] | // /    solidus         U+002F
                       qi::lit("\x62") [ _val += '\b' ] | // b    backspace       U+0008
                       qi::lit("\x66") [ _val += '\f' ] | // f    form feed       U+000C
                       qi::lit("\x6E") [ _val += '\n' ] | // n    line feed       U+000A
                       qi::lit("\x72") [ _val += '\r' ] | // r    carriage return U+000D
                       qi::lit("\x74") [ _val += '\t' ] | // t    tab             U+0009
                       qi::lit("\x75")                    // uXXXX                U+XXXX
                            >> _4HEXDIG [ append_utf8(qi::_val, qi::_1) ]
                    );

            BOOST_SPIRIT_DEBUG_NODES((files_)(folders_)(file_)(folder_)(start)(text_))
        }
    private:
        qi::rule<It, Data(),            Skipper> start;
        qi::rule<It, Data::Files(),     Skipper> files_;
        qi::rule<It, Data::Folders(),   Skipper> folders_;
        qi::rule<It, Data::File(),      Skipper> file_;
        qi::rule<It, Data::Folder(),    Skipper> folder_;
        qi::rule<It, void(const char*), Skipper> prop_key;

        qi::rule<It, std::string()> text_, ch_;
        qi::rule<It> commasep;

        struct append_utf8_f {
            template <typename...> struct result { typedef void type; };
            template <typename String, typename Codepoint>
            void operator()(String& to, Codepoint codepoint) const {
                auto out = std::back_inserter(to);
                boost::utf8_output_iterator<decltype(out)> convert(out);
                *convert++ = codepoint;
            }
        };
        boost::phoenix::function<append_utf8_f> append_utf8;
        qi::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;
    };

    template <typename Range, typename It = typename boost::range_iterator<Range const>::type>
    Data parse(Range const& input) {
        grammar<It> g;

        It first(boost::begin(input)), last(boost::end(input));
        Data parsed;
        bool ok = qi::phrase_parse(first, last, g, qi::space, parsed);

        if (ok && (first == last))
            return parsed;

        throw std::runtime_error("Remaining unparsed: '" + std::string(first, last) + "'");
    }
}

int main()
{
    auto parsed = folder_info::parse(sample);

    for (auto& e : parsed.folders) 
        std::cout << "folder:\t" << e.id << "\t" << e.path << "\n";
    for (auto& e : parsed.files) 
        std::cout << "file:\t"   << e.id << "\t" << e.path << "\t" << e.name << "\n";
}

Output:

folder: 109 /1/105/110/
folder: 110 /1/105/
file:   26  /1/105/ picture.png
file:   25  /1/105/110/ another_picture.jpg
