How to parse mustache with Boost.Xpressive correctly?

2021-12-24 00:00:00 parsing c++ boost mustache boost-xpressive

I have tried to write a mustache parser with the excellent Boost.XPressive from the brilliant Eric Niebler. But since this is my first parser I am not familiar with the "normal" approach and lingo of compiler writers and feel a bit lost after a few days of trial&error. So I come here and hope someone can tell me the foolishness of my n00bish ways ;)

This is the HTML code with the mustache templates that I want to extract (http://mustache.github.io/): Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}

  • The parser I wrote doesn't print out anything but also doesn't issue a warning at compile-time. I managed before to have it print out parts of the mustache code but never all of it correctly.
  • I don't know how I can loop through all the code to find all occurrences but then also access them like with the smatch what; variable. The doc only shows how to find the first occurrence with "what" or how to output all the occurrences with the "iterator".
    • Actually I need a combination of both. Because once something is found I need to question the tag's name and the content between the tags (which "what" would offer but the "iterator" won't allow) - and act accordingly. I guess I could use "actions" but how?
    • I think that it should be possible to do the tag finding and "content between tags" in one swoop, right? Or do I need to parse twice for that, and if so, how?
      #include <algorithm>
      #include <iostream>
      #include <string>
      #include <boost/xpressive/xpressive_static.hpp>
      #include <boost/xpressive/match_results.hpp>
      typedef std::string::const_iterator It;
      using namespace boost::xpressive;
      
      std::string str = "Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}";
      // Parser setup --------------------------------------------------------
      mark_tag mtag (1), cond_mtag (2), user_str (3);
      sregex brackets = "{{"
                        >> keep ( mtag = repeat<1, 20> (_w) )
                        >> "}}"
                        ;
      
      sregex cond_brackets = "{{#"
                         >> keep (cond_mtag = repeat<1, 20> (_w) )
                         >> "}}"
                         >> * (
                             keep (user_str = + (*_s >> +alnum >> *_s) ) |
                             by_ref (brackets) |
                             by_ref (cond_brackets)
                         )
                         >> "{{/"
                         >> cond_mtag
                         >> "}}"
                         ;
      sregex mexpression = *( by_ref (cond_brackets) | by_ref (brackets) );
      
      // Looping + catching the results --------------------------------------
      smatch what2;
      std::cout << "\nregex_search:\n" << str << '\n';
      It strBegin = str.begin(), strEnd = str.end();
      int ic = 0;
      
      do
      {
          if ( !regex_search ( strBegin, strEnd, what2, mexpression ) )
          {
              std::cout << "\t>> Breakout of this life...! Exit after " << ic << " loop(s)." << std::endl;
              break;
          }
          else
          {
              std::cout << "**Loop Nr: " << ic << '\n';
              std::cout << "\twhat2[0] "         << what2[0]         << '\n'; // whole match
              std::cout << "\twhat2[mtag] "      << what2[mtag]      << '\n';
              std::cout << "\twhat2[cond_mtag] " << what2[cond_mtag] << '\n';
              std::cout << "\twhat2[user_str] "  << what2[user_str]  << '\n';
              // display the nested results
              std::for_each (
                  what2.nested_results().begin(),
                  what2.nested_results().end(),
                  output_nested_results() // <--identical function from E.Nieblers documentation
              );
      
              strBegin = what2[0].second;
          }
          ++ic;
      }
      while (ic < 6 || strBegin != str.end() );
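
On the looping part of the question (visit every occurrence and still inspect its sub-matches): a regex iterator already hands back a full match object on each dereference, so the two approaches from the documentation can be combined. Below is a minimal sketch of that idea. It uses std::regex from the standard library as a stand-in rather than Xpressive itself; Boost.Xpressive's sregex_iterator follows the same pattern, with each dereference yielding an smatch. The names TagHit and find_tags are made up for illustration, and the pattern only recognises flat tags, so it does not pair a section's opening and closing tags.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One mustache tag found in the input (hypothetical helper type):
// sigil is '#', '^', '/', '>' or '\0' for a plain variable.
struct TagHit {
    char        sigil;
    std::string name;
    std::size_t offset;  // position of the opening "{{" in the source text
};

// Collect every {{...}} tag. Each dereferenced iterator is a complete
// std::smatch, so the sub-matches are available for every hit.
std::vector<TagHit> find_tags(const std::string& text) {
    static const std::regex tag(R"(\{\{([#^/>]?)\s*(\w+)\s*\}\})");
    std::vector<TagHit> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), tag), end; it != end; ++it) {
        const std::smatch& what = *it;            // same interface as a single search result
        const std::string sigil = what[1].str();  // may be empty for plain variables
        hits.push_back({sigil.empty() ? '\0' : sigil[0],
                        what[2].str(),
                        static_cast<std::size_t>(what.position(0))});
    }
    return hits;
}
```

Calling find_tags(str) on the input above should return one entry per tag, in source order, and the caller can branch on sigil and name. Matching an opening section tag to its closing tag, including nesting, needs more than this flat scan.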
      

      Recommended answer

      Here is the correct full code from @sehe, which now works with GCC > 4.8 and Clang on both Linux and Windows. Again, many thanks mate for this awesome help, even though it means that I can bury XPressive :D (the solution is built on Boost.Spirit Qi instead).

      The following lines have changed or been added:

      // --
      #define BOOST_RESULT_OF_USE_DECLTYPE
      // --
      struct to_string_f {
          template <typename T>
          std::string operator()(T const& v) const { return v.to_string(); }
      };
      // --
      section     %= "{{" >> sense >> reference [ section_id = to_string(_1) ] >> "}}"
                      >> sequence // contents
                      > ("{{" >> ('/' >> lexeme [ lit(section_id) ]) >> "}}");
      // --
      phx::function<to_string_f> to_string;
      

      //#define BOOST_SPIRIT_DEBUG
      #define BOOST_RESULT_OF_USE_DECLTYPE
      #define BOOST_SPIRIT_USE_PHOENIX_V3
      #include <boost/fusion/adapted/struct.hpp>
      #include <boost/spirit/include/qi.hpp>
      #include <boost/spirit/include/phoenix.hpp>
      #include <boost/utility/string_ref.hpp>
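      #include <boost/variant.hpp>
      #include <functional>
      #include <iostream>
      #include <map>
      #include <string>
      #include <typeinfo>
      #include <vector>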
      
      namespace mustache {
      
          // any atom refers directly to source iterators for efficiency
          using boost::string_ref;
          template <typename Kind> struct atom {
              string_ref value;
      
              atom() { }
              atom(string_ref const& value) : value(value) { }
      
              friend std::ostream& operator<<(std::ostream& os, atom const& v) { return os << typeid(v).name() << "[" << v.value << "]"; }
          };
      
          // the atoms
          using verbatim = atom<struct verbatim_tag>;
          using variable = atom<struct variable_tag>;
          using partial  = atom<struct partial_tag>;
      
          // the template elements (any atom or a section)
          struct section;
      
          using melement = boost::variant<
                  verbatim,
                  variable,
                  partial, // TODO comments and set-separators
                  boost::recursive_wrapper<section>
              >;
      
          // the template: sequences of elements
          using sequence = std::vector<melement>;
      
          // section: recursively define to contain a template sequence
          struct section {
              bool       sense; // positive or negative
              string_ref control;
              sequence   content;
          };
      }
      
      BOOST_FUSION_ADAPT_STRUCT(mustache::section, (bool, sense)(boost::string_ref, control)(mustache::sequence, content))
      
      namespace qi = boost::spirit::qi;
      namespace phx= boost::phoenix;
      
      struct to_string_f {
          template <typename T>
          std::string operator()(T const& v) const { return v.to_string(); }
      };
      
      template <typename Iterator>
          struct mustache_grammar : qi::grammar<Iterator, mustache::sequence()>
      {
          mustache_grammar() : mustache_grammar::base_type(sequence)
          {
              using namespace qi;
              static const _a_type section_id = {}; // local
              using boost::phoenix::construct;
              using boost::phoenix::begin;
              using boost::phoenix::size;
      
              sequence     = *element;
              element      = 
                          !(lit("{{") >> '/') >> // section-end ends the current sequence
                          (partial | section | variable | verbatim);
      
              reference    = raw [ lexeme [ +(graph - "}}") ] ]
                              [ _val = construct<boost::string_ref>(&*begin(_1), size(_1)) ];
      
              partial      = qi::lit("{{") >> "> " >> reference >> "}}";
      
              sense        = ('#' > attr(true))
                           | ('^' > attr(false));
      
              section     %= "{{" >> sense >> reference [ section_id = to_string(_1) ] >> "}}"
                          >> sequence // contents
                          > ("{{" >> ('/' >> lexeme [ lit(section_id) ]) >> "}}");
      
              variable     = "{{" >> reference >> "}}";
      
              verbatim     = raw [ lexeme [ +(char_ - "{{") ] ]
                              [ _val = construct<boost::string_ref>(&*begin(_1), size(_1)) ];
      
              BOOST_SPIRIT_DEBUG_NODES(
                      (sequence)(element)(partial)(variable)(section)(verbatim)
                      (reference)(sense)
                  )
          }
        private:
          phx::function<to_string_f> to_string;
          qi::rule<Iterator, mustache::sequence()> sequence;
          qi::rule<Iterator, mustache::melement()> element;
          qi::rule<Iterator, mustache::partial()>  partial;
          qi::rule<Iterator, mustache::section(), qi::locals<std::string> >  section;
          qi::rule<Iterator, bool()>                sense;                  // positive or negative
          qi::rule<Iterator, mustache::variable()> variable;
          qi::rule<Iterator, mustache::verbatim()> verbatim;
          qi::rule<Iterator, boost::string_ref()>   reference;
      };
      
      namespace Dumping {
          struct dumper : boost::static_visitor<std::ostream&>
          {
              std::ostream& operator()(std::ostream& os, mustache::sequence const& v) const {
                  for(auto& element : v)
                      boost::apply_visitor(std::bind(dumper(), std::ref(os), std::placeholders::_1), element);
                  return os;
              }
              std::ostream& operator()(std::ostream& os, mustache::verbatim const& v) const {
                  return os << v.value;
              }
              std::ostream& operator()(std::ostream& os, mustache::variable const& v) const {
                  return os << "{{" << v.value << "}}";
              }
              std::ostream& operator()(std::ostream& os, mustache::partial const& v) const {
                  return os << "{{> " << v.value << "}}";
              }
              std::ostream& operator()(std::ostream& os, mustache::section const& v) const {
                  os << "{{" << (v.sense?'#':'^') << v.control << "}}";
                  (*this)(os, v.content);
                  return os << "{{/" << v.control << "}}";
              }
          };
      }
      
      namespace ContextExpander {
      
          struct Nil { };
      
          using Value = boost::make_recursive_variant<
              Nil,
              double,
              std::string,
              std::map<std::string, boost::recursive_variant_>,
              std::vector<boost::recursive_variant_>
          >::type;
      
          using Dict  = std::map<std::string, Value>;
          using Array = std::vector<Value>;
      
          static inline std::ostream& operator<<(std::ostream& os, Nil   const&)   { return os << "#NIL#"; }
          static inline std::ostream& operator<<(std::ostream& os, Dict  const& v) { return os << "#DICT("  << v.size() << ")#"; }
          static inline std::ostream& operator<<(std::ostream& os, Array const& v) { return os << "#ARRAY(" << v.size() << ")#"; }
      
          struct expander : boost::static_visitor<std::ostream&>
          {
              std::ostream& operator()(std::ostream& os, Value const& ctx, mustache::sequence const& v) const {
                  for(auto& element : v)
                      boost::apply_visitor(std::bind(expander(), std::ref(os), std::placeholders::_1, std::placeholders::_2), ctx, element);
                  return os;
              }
      
              template <typename Ctx>
              std::ostream& operator()(std::ostream& os, Ctx const&/*ignored*/, mustache::verbatim const& v) const {
                  return os << v.value;
              }
      
              std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::variable const& v) const {
                  auto it = ctx.find(v.value.to_string());
                  if (it != ctx.end())
                      os << it->second;
                  return os;
              }
      
              template <typename Ctx>
              std::ostream& operator()(std::ostream& os, Ctx const&, mustache::variable const&) const {
                  return os;
              }
      
              std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::partial const& v) const {
                  auto it = ctx.find(v.value.to_string());
                  if (it != ctx.end())
                  {
                      static const mustache_grammar<std::string::const_iterator> p;
      
                      auto const& subtemplate = boost::get<std::string>(it->second);
                      std::string::const_iterator first = subtemplate.begin(), last = subtemplate.end();
      
                      mustache::sequence dynamic_template;
                      if (qi::parse(first, last, p, dynamic_template))
                          return (*this)(os, Value{ctx}, dynamic_template);
                  }
                  return os << "#ERROR#";
              }
      
              std::ostream& operator()(std::ostream& os, Dict const& ctx, mustache::section const& v) const {
                  auto it = ctx.find(v.control.to_string());
                  if (it != ctx.end())
                      boost::apply_visitor(std::bind(do_section(), std::ref(os), std::placeholders::_1, std::cref(v)), it->second);
                  else if (!v.sense)
                      (*this)(os, Value{/*Nil*/}, v.content);
      
                  return os;
              }
      
              template <typename Ctx, typename T>
              std::ostream& operator()(std::ostream& os, Ctx const&/* ctx*/, T const&/* element*/) const {
                  return os << "[TBI:" << __PRETTY_FUNCTION__ << "]";
              }
      
            private:
              struct do_section : boost::static_visitor<> {
                  void operator()(std::ostream& os, Array const& ctx, mustache::section const& v) const {
                      for(auto& item : ctx)
                          expander()(os, item, v.content);
                  }
                  template <typename Ctx>
                  void operator()(std::ostream& os, Ctx const& ctx, mustache::section const& v) const {
                      if (v.sense == truthiness(ctx))
                          expander()(os, Value(ctx), v.content);
                  }
                private:
                  static bool truthiness(Nil)                              { return false; }
                  static bool truthiness(double d)                         { return 0. != d; } // nonzero numbers are truthy
                  template <typename T> static bool truthiness(T const& v) { return !v.empty(); }
              };
          };
      
      }
      
      int myMain()
      {
          std::cout << std::unitbuf;
          std::string input = "<ul>{{#time}}\n\t<li>{{> partial}}</li>{{/time}}</ul>\n "
              "<i>for all good men</i> to come to the {007} aid of "
              "their</bold> {{country}}. Result: {{^Res2}}(absent){{/Res2}}{{#Res2}}{{Res2}}{{/Res2}}"
              ;
          // Parser setup --------------------------------------------------------
          typedef std::string::const_iterator It;
          static const mustache_grammar<It> p;
      
          It first = input.begin(), last = input.end();
      
          try {
              mustache::sequence parsed_template;
              if (qi::parse(first, last, p, parsed_template))
              {
                  std::cout << "Parse success\n";
              } else
              {
                  std::cout << "Parse failed\n";
              }
      
              if (first != last)
              {
                  std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";
              }
      
              std::cout << "Input:      " << input << "\n";
              std::cout << "Dump:       ";
              Dumping::dumper()(std::cout, parsed_template) << "\n";
      
              std::cout << "Evaluation: ";
      
              {
                  using namespace ContextExpander;
                  expander engine;
      
                  Value const ctx = Dict { 
                      { "time", Array {
                          Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"},             { "title", "noon" },    { "zeit", "12:00" } },
                          Dict { { "partial", "gugus {{zeit}} (a.k.a. <u>{{title}}</u>)"},             { "title", "evening" }, { "zeit", "19:30" } },
                          Dict { { "partial", "gugus <u>{{title}}</u> (expected at around {{zeit}})"}, { "title", "dawn" },    { "zeit", "06:00" } },
                      } },
                      { "country", "ESP" },
                      { "Res3", "unused" }
                  };
      
                  engine(std::cout, ctx, parsed_template);
              }
          } catch(qi::expectation_failure<It> const& e)
          {
              std::cout << "Unexpected: '" << std::string(e.first, e.last) << "'\n";
          }
          return 0;
      }
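
For readers who want to see the shape of the grammar without the Spirit machinery, here is a rough hand-written recursive-descent sketch of the same rules (sequence, element, section, partial, variable, verbatim). It is not the code from the answer and is only an illustration: Node, Parser and parse_mustache are hypothetical names, comments and set-delimiter tags are left out just as in the TODO above, and unlike the Spirit rules it trims spaces around tag names as a simplification. The key point it mirrors is that a section remembers its opening name and insists that the closing tag repeats it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical AST node, loosely mirroring mustache::melement / mustache::section.
struct Node {
    enum Kind { Verbatim, Variable, Partial, Section };
    Kind kind = Verbatim;
    std::string text;            // literal text, or the tag name
    bool sense = true;           // sections only: true for '#', false for '^'
    std::vector<Node> children;  // sections only
};

class Parser {
public:
    explicit Parser(const std::string& src) : s_(src) {}

    // sequence = *element; stops at end of input or at a "{{/" closing tag
    std::vector<Node> sequence() {
        std::vector<Node> out;
        while (pos_ < s_.size() && s_.compare(pos_, 3, "{{/") != 0)
            out.push_back(element());
        return out;
    }
    bool at_end() const { return pos_ == s_.size(); }

private:
    Node element() {
        Node n;
        if (s_.compare(pos_, 2, "{{") != 0) {  // verbatim: everything up to the next "{{"
            std::size_t next = s_.find("{{", pos_);
            if (next == std::string::npos) next = s_.size();
            n.text = s_.substr(pos_, next - pos_);
            pos_ = next;
            return n;
        }
        pos_ += 2;  // consume "{{"
        const char c = pos_ < s_.size() ? s_[pos_] : '\0';
        if (c == '#' || c == '^') {  // section: name, nested sequence, matching close tag
            ++pos_;
            n.kind = Node::Section;
            n.sense = (c == '#');
            n.text = name();
            n.children = sequence();
            expect("{{/" + n.text + "}}");
        } else if (c == '>') {
            ++pos_;
            n.kind = Node::Partial;
            n.text = name();
        } else {
            n.kind = Node::Variable;
            n.text = name();
        }
        return n;
    }
    // Reads the tag name up to "}}", consumes the "}}" and trims spaces.
    std::string name() {
        const std::size_t close = s_.find("}}", pos_);
        if (close == std::string::npos) throw std::runtime_error("unterminated tag");
        const std::string id = s_.substr(pos_, close - pos_);
        pos_ = close + 2;
        const std::size_t b = id.find_first_not_of(' ');
        const std::size_t e = id.find_last_not_of(' ');
        return b == std::string::npos ? std::string() : id.substr(b, e - b + 1);
    }
    void expect(const std::string& lit) {
        if (s_.compare(pos_, lit.size(), lit) != 0)
            throw std::runtime_error("expected " + lit);
        pos_ += lit.size();
    }
    const std::string& s_;
    std::size_t pos_ = 0;
};

// Parses a whole template; a stray closing tag at top level is reported as an error.
std::vector<Node> parse_mustache(const std::string& src) {
    Parser p(src);
    std::vector<Node> ast = p.sequence();
    if (!p.at_end()) throw std::runtime_error("unexpected closing tag");
    return ast;
}
```

A caller would run parse_mustache on the template text and then walk the returned nodes, much like the dumper and expander visitors do for the Spirit-generated AST. The Spirit version keeps string_ref views into the source instead of copying strings, which this sketch does not attempt.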
      
