Spirit-Qi:如何编写非终结符解析器?

2021-12-12 00:00:00 parsing c++ boost boost-spirit boost-spirit-qi

我想写一个可以使用的解析器(作为qi扩展)通过 my_parser(p1, p2, ...) 其中 p1, p2, ... 是 qi 解析器表达式.

I want to write a parser (as a qi extension) which can be used via my_parser(p1, p2, ...) where p1, p2, ... are qi parser expressions.

实际上，我想实现一个 best_match 解析器，它的工作方式类似于 qi 替代方案，但不选择第一个匹配规则，而是选择解释"大部分输入的规则.

Actually, I want to implement a best_matchparser which works like qi alternative, but selects not the first matching rule but the rule which 'explains' most of the input.

给定两个规则 simple_id = +(qi::alpha) 和 complex_id = simple_id >>*(qi::string("::") > simple_id) 它将在输入 willy::anton 上选择 complex_id.并且这样做时不会产生中间属性.这些好处是通过运行时支付的，因为需要先行解析.

Given two rules simple_id = +(qi::alpha) and complex_id = simple_id >> *(qi::string("::") > simple_id) it would select complex_id on the input willy::anton. And it would not produce intermediate attributes while doing so. These benefits get payed for with runtime because lookahead parsing is required.

在我看来，这种解析器结构有几个用例.benchmark(p1, ...) 作为优化的支持解析器，仅举一个例子.一旦我知道怎么做，我就会提供.

In my view there are several use cases for such a parser construct. benchmark(p1, ...) as a supporting parser for optimization, just one example. I will provide that, once I know how to that.

这个解析器将是一个非终结符.我尝试了(努力)，但我无法在 qi 中找到我的问题的答案.我的猜测是，C++ 机制与 c qi 如此紧密地集成在一起，以至于没有可理解的切入点，至少对我来说是这样.

This parser would be an non-terminal. I tried (hard), but I can not find answers to my problem within qi. My guess is, that C++ mechanisms are so tightly integrated withc qi, that there is no understandable entry point, at least for me.

实现我想要引入的操作符非常容易.我在下面附加了当前的解决方案.它按预期编译但不完整.

It is quite easy to implement what I want introducing an operator. I append the current solution below. It compiles as expected but is incomplete.

qi 算子得到一个已经准备好的融合::向量/序列(不管)并对其采取行动.似乎没有图书馆产品可以解决我的问题.甚至 make_nary_composite 已经期望将 args 编译为 Elements.

A qi operator gets a fusion::vector/sequence (whatever) ready made and acts on it. There seem to be no library offerings to solve my problem. Even make_nary_composite already expects the args being compiled to Elements.

我尝试了很多，但都失败了，所以我不想让你厌烦.

I tried a lot, everything failed, so I don't want to bore you with that.

我可以想象的一种解决方法是提供一个有状态的运算符 ,，这将使 p1, ... 成为合法的 qi 表达式.然后实现一个 unary_parser best_match 或指令来处理该表达式., 将获得两种模式:获取当前(成功)解析器消耗的输入长度，并实际解析第 1 阶段中选定的输入.包装器首先在模式 1 中运行该逗号解析器，然后在模式 2.它可能很难看，但可以工作.

A workaround which I can imagine could be to provide a stateful operator ,, which would make p1, ... to a legal qi expression. Then implement a unary_parser best_match or directive to process that expression. The , would get two modes: getting the length of input the current (successful) parser consumes and actually parsing the selected one from phase 1. The wrapper what run that comma parser first in mode 1 and then in mode 2. It might be ugly, but could work.

操作符实现 p1 |= p2 |= ... 对于 n > 2 的运行时间来说是最昂贵的.我很乐意解决这个问题.

The operator implementation p1 |= p2 |= ... is the most expensive regarding runtime for n > 2. I would be happy to get around that.

最小(但仍然合理)的开销将具有 best_match(p1, ...).这可能吗?

Minimum (but nevertheless reasonable) overhead would have best_match(p1, ...). Is that possible at all?

因为我对 boost::fusion 不太熟悉，所以我也在代码中添加了一些问题(关于融合的方法).

For I am not too experienced with boost::fusion I added some questions to the code as well (how tos regarding fusion).

感觉不对，将实际上是一元非终端解析器的东西压入一元解析器方案中.但由于我的理解很差，这似乎是完成它的唯一方法.如果您能帮助我摆脱这种困境，我将不胜感激.

It feels wrong, to press something which actually is a nary non-terminal parser into the unary parser scheme. But with my poor understanding it seems to be the only way to get it done. I'd highly appreciate any help to get me out of this.

我的环境:Boost 1.61，MSVC 2015 Upd 2，目标 win32console.

My environment: Boost 1.61, MSVC 2015 Upd 2, target win32console.

进展:

我想，我正在缓慢但肯定地得到它.我很高兴看到 error_invalid_expression.由于以下 proto 表达式，编译失败:

I think, I'm getting it slowly but surely. I was very happy to see an error_invalid_expression. Compile failed because of the following proto expression:

boost::proto::exprns_::expr< boost::proto::tagns_::tag::function,boost::proto::argsns_::list5< const boost::proto::exprns_::expr< boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term< mxc::qitoo::tag::best_match > ,0> & ,const boost::spirit::qi::rule< iterator_type,std::string (void), boost::spirit::standard::space_type, boost::spirit::unused_type,boost::spirit::unused_type> & ,const boost::spirit::qi::rule< iterator_type,std::string (void), boost::spirit::standard::space_type, boost::spirit::unused_type, boost::spirit::unused_type> & , const boost::spirit::qi::rule< iterator_type,std::string (void), boost::spirit::standard::space_type, boost::spirit::unused_type, boost::spirit::unused_type> & ,const boost::spirit::qi::rule< iterator_type,std::string (void), boost::spirit::standard::space_type, boost::spirit::unused_type, boost::spirit::unused_type> & > ,5>

这实际上是描述我的 best_match 解析器的函数样式用法的表达式.我的测试规则看起来像

This actually is the expression which describes my function-style usage of my best_match parser. My test rule looks like

qitoo::best_match(id, qualified_id, id, qualified_id)

其中 id 和 qualified_id 是上面提到的规则.我想要的一切.发生错误是因为此表达式没有成员 parse.好的.深入挖掘后，我发现 qi 的元编译器确实支持一元、二元和一元(即可变参数)解析器.但它不支持函数风格的语法.Qi 似乎仅将一元、二元和一元用于运算符.尽管支持 nary 类型，但它或多或少已经过时了，因为 qi 运算符是二进制最大值.

where id and qualified_id are the rules as mentioned above. All that I want. The error occurs because this expression does not have a member parse. Okay. Digging deeper I found qi's meta-compiler does support unary, binary and nary (what means variadic) parsers. But it does not support function style syntax. Qi seems to utilize unary, binary and nary for operators only. And allthough the nary type is supported, it is more or less obsolete, because qi operators are binary max.

结束进度编辑

#include <boost/spirit/home/qi.hpp> namespace boost { namespace spirit { /////////////////////////////////////////////////////////////////////////// // Enablers /////////////////////////////////////////////////////////////////////////// template <> struct use_operator<qi::domain, proto::tag::bitwise_or_assign> // enables |= : mpl::true_ {}; template <> struct flatten_tree<qi::domain, proto::tag::bitwise_or_assign> // flattens |= : mpl::true_ {}; } } namespace mxc { namespace qitoo { namespace spirit = boost::spirit; namespace qi = spirit::qi; namespace fusion = boost::fusion; template <typename Elements> struct best_match_parser : qi::nary_parser<mxc::qitoo::best_match_parser<Elements>> { // This one does a lookahead parse of each qi expression to find out which rule matches // most of the input template <typename Iterator, typename Context, typename Skipper> struct best_function { best_function(Iterator& first, Iterator const& last, Context& context, Skipper const& skipper, int& best, int& idx, int& size) : first(first), last(last), context(context), skipper(skipper), best(best), idx(idx), size(size) {}; template <typename Component> void operator()(Component& component) const { Iterator f = first; if (component.parse(f, last, context, skipper, qi::unused)) { // please have a look on how I transport best, idx and size // into the parser context // // I guess, this is a nasty hack and could be done better // Any advice would be highliy appreciated int l = f - first; if (l > size) { size = l; best = idx++; } idx++; } } private: int& best; int& idx; int& size; Iterator& first; Iterator const& last; Context& context; Skipper const& skipper; }; template <typename Context, typename Iterator> struct attribute { // Put all the element attributes in a tuple typedef typename spirit::traits::build_attribute_sequence < Elements, Context, spirit::traits::alternative_attribute_transform , Iterator, qi::domain >::type all_attributes; // Ok, now make a variant over the attribute sequence. Note that // build_variant makes sure that 1) all attributes in the variant // are unique 2) puts the unused attribute, if there is any, to // the front and 3) collapses single element variants, variant<T> // to T. typedef typename spirit::traits::build_variant<all_attributes>::type type; }; best_match_parser(Elements const& elements_) : elements(elements_), size(0), idx(0), best(-1) {} template <typename Iterator, typename Context, typename Skipper, typename Attribute> bool parse(Iterator& first, Iterator const& last , Context& context, Skipper const& skipper , Attribute& attr_) const { best_function<Iterator, Context, Skipper> f(first, last, context, skipper, best, idx, size); // find out which parser matches most of the input fusion::for_each(elements, f); // best >= 0 if at least one parser was successful if (best >= 0) { // now that I have the best parser identified, how do I access it? // I understand that the next line can not work, but I'm looking for something like that // --> auto v = fusion::at<boost::mpl::int_<best>>(elements); }; return false; } template <typename Context> qi::info what(Context& context) const { qi::info result("best_match"); fusion::for_each(elements, spirit::detail::what_function<Context>(result, context)); return result; } Elements elements; mutable int best; mutable int idx; mutable int size; }; } } namespace boost { namespace spirit { namespace qi { /////////////////////////////////////////////////////////////////////////// // Parser generators: make_xxx function (objects) /////////////////////////////////////////////////////////////////////////// template <typename Elements, typename Modifiers> struct make_composite<proto::tag::bitwise_or_assign, Elements, Modifiers> : make_nary_composite < Elements, mxc::qitoo::best_match_parser > {}; } namespace traits { /////////////////////////////////////////////////////////////////////////// template <typename Elements> struct has_semantic_action<mxc::qitoo::best_match_parser<Elements> > : nary_has_semantic_action<Elements> {}; /////////////////////////////////////////////////////////////////////////// template <typename Elements, typename Attribute, typename Context , typename Iterator> struct handles_container<mxc::qitoo::best_match_parser<Elements>, Attribute, Context , Iterator> : nary_handles_container<Elements, Attribute, Context, Iterator> {}; } } } namespace qi = boost::spirit::qi; namespace qitoo = mxc::qitoo; using iterator_type = std::string::const_iterator; using result_type = std::string; template<typename Parser> void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser) { iterator_type iter = input.begin(), end = input.end(); std::vector<result_type> parsed_result; std::cout << "------------------------- "; std::cout << message << " "; std::cout << "Rule: " << rule << std::endl; std::cout << "Parsing: "" << input << "" "; bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result); if (result) { std::cout << "Parser succeeded. "; std::cout << "Parsed " << parsed_result.size() << " elements:"; for (const auto& str : parsed_result) std::cout << "[" << str << "]"; std::cout << std::endl; } else { std::cout << "Parser failed" << std::endl; } if (iter == end) { std::cout << "EOI reached." << std::endl; } else { std::cout << "EOI not reached. Unparsed: "" << std::string(iter, end) << """ << std::endl; } std::cout << "------------------------- "; } int main() { namespace qi = boost::spirit::qi; qi::rule < iterator_type, std::string(), qi::space_type> id = (qi::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_')); qi::rule < iterator_type, std::string(), qi::space_type> qualified_id = id >> *(qi::string("::") > id); namespace qitoo = mxc::qitoo; namespace qi = boost::spirit::qi; parse("best match operator, select second rule" , "willy::anton::lutz" , "id |= qualified_id" , id |= qualified_id); }

推荐答案

你的例子
我看不出你的样本需要这些.只需重新排序您的分支，然后意识到短分支只是 n=1 的限定情况的特例:Live On Coliru1(或使用 X3 版本，如果您愿意).

Your example

I don't see how your sample requires any of this. Just reorder your branches, then realize that the short branch is just a special case of the qualified case with n=1: Live On Coliru1 (or using X3 version if you prefer).

现在，提到 x3，它有能力让您的生活更轻松！

Now, having mentioned x3, it has the capacity to make your live much easier!

以下是我认为您想要的，在一般情况下:

Here's what I think you wanted, in the general case:

namespace parser { template <typename... Parsers> struct longest_parser : x3::parser_base { longest_parser(Parsers... sub) : _alternatives {sub...} { } template <typename It, typename Ctx, typename Other, typename Attr> bool parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr) const { auto const saved = f; //// To exclude pre-skip from length comparisons, do pre-skip here: // x3::skip_over(f, l, ctx); auto seq = std::make_index_sequence<sizeof...(Parsers)>(); auto best = select_best(f, l, ctx, seq); //std::cout << "Longest match at index #" << best << " "; bool ok = dispatch(f, l, ctx, other, attr, best, seq); if (!ok) f = saved; return ok; } private: template <typename It, typename Ctx, typename P> size_t length_of(It f, It l, Ctx ctx, P const& p) const { boost::iterator_range<It> matched; return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0; } template <typename It, typename Ctx, size_t... I> size_t select_best(It f, It l, Ctx& ctx, std::index_sequence<I...>) const { std::array<size_t, sizeof...(I)> lengths { length_of(f, l, ctx, std::get<I>(_alternatives))... }; return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end())); } template <typename It, typename Ctx, typename Other, typename Attr, size_t... I> bool dispatch(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx, std::index_sequence<I...>) const { //return (real_parse<I>(f, l, ctx, other, attr, targetIdx) || ...); std::array<bool, sizeof...(I)> b = { real_parse<I>(f, l, ctx, other, attr, targetIdx)... }; return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>()); } template <size_t Idx, typename It, typename Ctx, typename Other, typename Attr> bool real_parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx) const { if (targetIdx != Idx) return false; return std::get<Idx>(_alternatives).parse(f, l, ctx, other, attr); } std::tuple<Parsers...> _alternatives; }; template <typename... Ps> longest_parser<Ps...> longest(Ps... p) { return {x3::as_parser(p)...}; } }

如果您的编译器支持，请注意您可以在 dispatch 中使用的折叠表达式(Coliru 支持，编辑它以查看！).

Note the fold expression you could use in dispatch if your compiler supports it (Coliru does, edit it to see!).

还要注意关于可跳过(可能是空格)的微妙选择；如果它对长度比较不重要，请取消对 pre-skip 的注释.

现场演示
生活在 Coliru

#include <boost/spirit/home/x3.hpp> #include <type_traits> #include <iostream> #include <numeric> namespace x3 = boost::spirit::x3; namespace std { template <typename T> // just for easy debug printing; hack static std::ostream& operator<<(std::ostream& os, std::vector<T> const& v) { for (auto& el : v) std::cout << '[' << el << ']'; return os; } } using string_vec = std::vector<std::string>; using result_type = boost::variant<std::string, double, string_vec>; template <typename Parser> void parse(const std::string message, const std::string &input, const std::string &rule, const Parser &parser) { auto iter = input.begin(), end = input.end(); std::cout << "------------------------- "; std::cout << message << " "; std::cout << "Rule: " << rule << " "; std::cout << "Parsing: '" << input << "' "; result_type parsed_result; bool result = phrase_parse(iter, end, parser, x3::space, parsed_result); if (result) { std::cout << "Parsed " << parsed_result << " "; } else { std::cout << "Parser failed "; } if (iter != end) std::cout << "EOI not reached. Unparsed: '" << std::string(iter, end) << "' "; } namespace parser { template <typename... Parsers> struct longest_parser : x3::parser_base { longest_parser(Parsers... sub) : _alternatives {sub...} { } template <typename It, typename Ctx, typename Other, typename Attr> bool parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr) const { auto const saved = f; //// To exclude pre-skip from length comparisons, do pre-skip here: // x3::skip_over(f, l, ctx); auto seq = std::make_index_sequence<sizeof...(Parsers)>(); auto best = select_best(f, l, ctx, seq); //std::cout << "Longest match at index #" << best << " "; bool ok = dispatch(f, l, ctx, other, attr, best, seq); if (!ok) f = saved; return ok; } private: template <typename It, typename Ctx, typename P> size_t length_of(It f, It l, Ctx ctx, P const& p) const { boost::iterator_range<It> matched; return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0; } template <typename It, typename Ctx, size_t... I> size_t select_best(It f, It l, Ctx& ctx, std::index_sequence<I...>) const { std::array<size_t, sizeof...(I)> lengths { length_of(f, l, ctx, std::get<I>(_alternatives))... }; return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end())); } template <typename It, typename Ctx, typename Other, typename Attr, size_t... I> bool dispatch(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx, std::index_sequence<I...>) const { //return (real_parse<I>(f, l, ctx, other, attr, targetIdx) || ...); std::array<bool, sizeof...(I)> b = { real_parse<I>(f, l, ctx, other, attr, targetIdx)... }; return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>()); } template <size_t Idx, typename It, typename Ctx, typename Other, typename Attr> bool real_parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx) const { if (targetIdx != Idx) return false; return std::get<Idx>(_alternatives).parse(f, l, ctx, other, attr); } std::tuple<Parsers...> _alternatives; }; template <typename... Ps> longest_parser<Ps...> longest(Ps... p) { return {x3::as_parser(p)...}; } } int main() { auto id = x3::rule<void, std::string> {} = x3::lexeme [ x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z0-9_") ]; auto qualified = x3::rule<void, string_vec> {} = id % "::"; #define TEST_CASE(label, input, rule) parse(label, input, #rule, rule) TEST_CASE("unqualified" , "willy" , parser::longest(id, x3::int_, x3::double_)); TEST_CASE("unqualified with whitespace", " willy " , parser::longest(id, x3::int_, x3::double_)); TEST_CASE("integral or number" , "123.78::anton::lutz" , parser::longest(id, x3::int_, x3::double_)); TEST_CASE("qualified" , "willy anton::lutz" , parser::longest(id, x3::int_, x3::double_)); TEST_CASE("qualified with whitespace" , "willy anton::lutz" , parser::longest(id, x3::int_, x3::double_)); TEST_CASE("unqualified" , "willy" , parser::longest(id, x3::int_, x3::double_, qualified)); TEST_CASE("unqualified with whitespace", " willy " , parser::longest(id, x3::int_, x3::double_, qualified)); TEST_CASE("integral or number" , "123.78::anton::lutz" , parser::longest(id, x3::int_, x3::double_, qualified)); TEST_CASE("qualified" , "willy::anton::lutz" , parser::longest(id, x3::int_, x3::double_, qualified)); TEST_CASE("qualified with whitespace" , "willy :: anton::lutz", parser::longest(id, x3::int_, x3::double_, qualified)); TEST_CASE("unqualified" , "willy" , parser::longest(x3::int_, x3::double_, qualified)); TEST_CASE("unqualified with whitespace", " willy " , parser::longest(x3::int_, x3::double_, qualified)); TEST_CASE("integral or number" , "123.78::anton::lutz" , parser::longest(x3::int_, x3::double_, qualified)); TEST_CASE("qualified" , "willy::anton::lutz" , parser::longest(x3::int_, x3::double_, qualified)); TEST_CASE("qualified with whitespace" , "willy :: anton::lutz", parser::longest(x3::int_, x3::double_, qualified)); }

印刷品

------------------------- unqualified Rule: parser::longest(id, x3::int_, x3::double_) Parsing: 'willy' Parsed willy ------------------------- unqualified with whitespace Rule: parser::longest(id, x3::int_, x3::double_) Parsing: ' willy ' Parsed willy ------------------------- integral or number Rule: parser::longest(id, x3::int_, x3::double_) Parsing: '123.78::anton::lutz' Parsed 123.78 EOI not reached. Unparsed: '::anton::lutz' ------------------------- qualified Rule: parser::longest(id, x3::int_, x3::double_) Parsing: 'willy anton::lutz' Parsed willy EOI not reached. Unparsed: 'anton::lutz' ------------------------- qualified with whitespace Rule: parser::longest(id, x3::int_, x3::double_) Parsing: 'willy anton::lutz' Parsed willy EOI not reached. Unparsed: 'anton::lutz' ------------------------- unqualified Rule: parser::longest(id, x3::int_, x3::double_, qualified) Parsing: 'willy' Parsed willy ------------------------- unqualified with whitespace Rule: parser::longest(id, x3::int_, x3::double_, qualified) Parsing: ' willy ' Parsed willy ------------------------- integral or number Rule: parser::longest(id, x3::int_, x3::double_, qualified) Parsing: '123.78::anton::lutz' Parsed 123.78 EOI not reached. Unparsed: '::anton::lutz' ------------------------- qualified Rule: parser::longest(id, x3::int_, x3::double_, qualified) Parsing: 'willy::anton::lutz' Parsed [willy][anton][lutz] ------------------------- qualified with whitespace Rule: parser::longest(id, x3::int_, x3::double_, qualified) Parsing: 'willy :: anton::lutz' Parsed [willy][anton][lutz] ------------------------- unqualified Rule: parser::longest(x3::int_, x3::double_, qualified) Parsing: 'willy' Parsed [willy] ------------------------- unqualified with whitespace Rule: parser::longest(x3::int_, x3::double_, qualified) Parsing: ' willy ' Parsed [willy] ------------------------- integral or number Rule: parser::longest(x3::int_, x3::double_, qualified) Parsing: '123.78::anton::lutz' Parsed 123.78 EOI not reached. Unparsed: '::anton::lutz' ------------------------- qualified Rule: parser::longest(x3::int_, x3::double_, qualified) Parsing: 'willy::anton::lutz' Parsed [willy][anton][lutz] ------------------------- qualified with whitespace Rule: parser::longest(x3::int_, x3::double_, qualified) Parsing: 'willy :: anton::lutz' Parsed [willy][anton][lutz]

注意不同的结果取决于备选方案中的解析器表达式.

Note the different results depending on the parser expressions in the alternatives.

相关文章