Boost Spirit revert parsing

2021-12-24 00:00:00 c++ boost boost-spirit boost-spirit-qi boost-phoenix

I want to parse a file containing the following structure:

some garbage
*&%
section1 {
    section_content
}
section2 {
    section_content
}

The rule parsing section_name1 { ... } section_name2 { ... } is already defined:

section_name_rule = lexeme[+char_("A-Za-z0-9_")];
section = section_name_rule > lit("{") > /*some complicated things*/... > lit("}");
sections %= +section;

So I need to skip any garbage until the sections rule is met. Is there any way to accomplish this? I have tried seek[sections], but it seems not to work.

EDIT: I tracked down why seek is not working: if I use the sequence operator (>>), it works. If I use the expectation operator (>), it throws an exception. Here is some sample code:

#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
using boost::phoenix::push_back;

struct section_t
{
    std::string name, contents;

    friend std::ostream& operator<<(std::ostream& os, section_t const& s) {
        return os << "section_t[" << s.name << "] {" << s.contents << "}";
    }
};

BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))

typedef std::vector<section_t> sections_t;

template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;
        using boost::spirit::repository::qi::seek;

        section_name_rule = lexeme[+char_("A-Za-z0-9_")];

        // Replacing '>>'s with '>'s throws an exception, while this works as expected!!
        section = section_name_rule >> lit("{") >> lexeme[*~char_('}')] >> lit("}");

        start = seek [ hold[section[push_back(qi::_val, qi::_1)]] ]
            >> *(section[push_back(qi::_val, qi::_1)]);
    }

  private:
    qi::rule<It, sections_t(), Skipper> start;
    qi::rule<It, section_t(), Skipper> section;
    qi::rule<It, std::string(), Skipper> section_name_rule;
};

int main() {
    typedef std::string::const_iterator iter;
    std::string storage("sdfsdf sd:fgdfg section1 {dummy } section2 {dummy } section3 {dummy }");
    iter f(storage.begin()), l(storage.end());

    sections_t sections;
    if (qi::phrase_parse(f, l, grammar<iter>(), qi::space, sections)) {
        for (auto& s : sections)
            std::cout << "Parsed: " << s << "\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}

So in the real example my entire grammar is built with expectation operators. Do I have to change everything to make seek work, or is there another way (say, seek a simple "{" and then back up by one section_name_rule)?

Recommended answer

Here's a demonstration, using Hamlet for inspiration (originally run live on Coliru):

start = *seek [ no_skip[eol] >> hold [section] ];

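Notes:

Dropped the expectation points: section now uses >> instead of >, so a mismatch simply fails and seek can advance to the next position, whereas a failed expectation throws qi::expectation_failure straight through seek.
Optimized by requiring a line break (eol) immediately before the section name.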

Sample input:

some garbage
*&%
section1 {
    Claudius: ...But now, my cousin Hamlet, and my son ―
    Hamlet: A little more than kin, and less than kind.
}
WE CAN DO MOAR GARBAGE
section2 {
    Claudius: How is it that the clouds still hang on you?
    Hamlet: Not so my lord; I am too much i' the sun
}

Output:

Parsed: section_t[section1] {Claudius: ...But now, my cousin Hamlet, and my son ―
    Hamlet: A little more than kin, and less than kind.
}
Parsed: section_t[section2] {Claudius: How is it that the clouds still hang on you?
    Hamlet: Not so my lord; I am too much i' the sun
}

Full listing

// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

struct section_t
{
    std::string name, contents;

    friend std::ostream& operator<<(std::ostream& os, section_t const& s) {
        return os << "section_t[" << s.name << "] {" << s.contents << "}";
    }
};

BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))

typedef std::vector<section_t> sections_t;

template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;
        using boost::spirit::repository::qi::seek;

        section_name_rule = lexeme[+char_("A-Za-z0-9_")];

        // Sequence operators only: a mismatch fails quietly so seek can retry
        section = section_name_rule >> '{' >> lexeme[*~char_('}')] >> '}';

        // Each section must directly follow a line break; no_skip keeps the
        // skipper from consuming that line break before eol sees it
        start = *seek [ no_skip[eol] >> hold [section] ];

        BOOST_SPIRIT_DEBUG_NODES((start)(section)(section_name_rule))
    }

  private:
    qi::rule<It, sections_t(), Skipper> start;
    qi::rule<It, section_t(), Skipper> section;
    qi::rule<It, std::string(), Skipper> section_name_rule;
};

int main() {
    using It = boost::spirit::istream_iterator;
    It f(std::cin >> std::noskipws), l;

    sections_t sections;
    if (qi::phrase_parse(f, l, grammar<It>(), qi::space, sections)) {
        for (auto& s : sections)
            std::cout << "Parsed: " << s << "\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
