Boost.Spirit: Lex + Qi error reporting

2021-12-26 00:00:00 error-handling c++ boost boost-spirit lex

I am writing a parser for fairly complicated config files that use indentation, among other things. I decided to use Lex to break the input into tokens, as it seems to make life easier. The problem is that I cannot find any examples of using the Qi error reporting tools (on_error) with parsers that operate on a stream of tokens instead of characters.

The error handler used with on_error takes iterators so that it can indicate exactly where the error is in the input stream. All the examples just construct a std::string from the pair of iterators and print it. But if Lex is used, those iterators point into the sequence of tokens, not characters. In my program this led to a hang in the std::string constructor before I noticed the invalid iterator type.
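To make the mismatch concrete, here is a minimal sketch using only the standard library. It is not Spirit code: PosToken and char_position are hypothetical names, and the sketch assumes each token remembers where it was matched. Under that assumption, a handler that receives a token iterator has to step through the token to reach a character position before it can build any std::string.

```cpp
#include <string>
#include <vector>

// Hypothetical token that remembers its matched range in the input.
// This is an illustration only, not a Spirit.Lex type.
struct PosToken {
    std::string::const_iterator first;  // start of the matched text
    std::string::const_iterator last;   // one past the matched text
};

using TokenIter = std::vector<PosToken>::const_iterator;

// Map a token iterator (what a token-based error handler receives)
// to a character iterator into the original input.
std::string::const_iterator char_position(const std::string& src,
                                          TokenIter tokens_end,
                                          TokenIter err_pos) {
    // An error position past the last token means "end of input".
    return err_pos == tokens_end ? src.end() : err_pos->first;
}
```

Only the character iterator returned here is safe to combine with other iterators into the same input when building a std::string.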

As I understand it, a token can hold a pair of iterators into the input stream as its value. This is the default attribute type (if the type is like lex::lexertl::token<>). But if I want my token to contain something more useful for parsing (int, std::string, etc.), those iterators are lost.
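The trade-off described above goes away if the token keeps both pieces of information. The following is a rough standard-library sketch of that idea, not the Spirit.Lex token class: Token and tokenize are hypothetical, the "lexer" just splits on whitespace, and digit-only words are converted to int (std::stoi will throw on values that do not fit).

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token (not Spirit.Lex): keeps the converted value
// AND the matched range, so the position survives the conversion.
struct Token {
    std::string::const_iterator first;  // start of the matched text
    std::string::const_iterator last;   // one past the matched text
    bool is_int;                        // true if the word was all digits
    int int_value;                      // meaningful only when is_int
};

// Split on whitespace; the input must outlive the returned tokens
// because they hold iterators into it.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> out;
    auto it = input.begin();
    const auto end = input.end();
    while (it != end) {
        if (std::isspace(static_cast<unsigned char>(*it))) { ++it; continue; }
        const auto start = it;
        bool digits = true;
        while (it != end && !std::isspace(static_cast<unsigned char>(*it))) {
            digits = digits && std::isdigit(static_cast<unsigned char>(*it)) != 0;
            ++it;
        }
        const int value = digits ? std::stoi(std::string(start, it)) : 0;
        out.push_back({start, it, digits, value});
    }
    return out;
}
```

For the input "width 42", the second token carries the int 42 and still points at offset 6 of the original string.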

How can I produce human-friendly error messages that indicate the position in the input stream while using Lex with Qi? Are there any examples of such usage?

Thanks.

Recommended answer

Sorry for the late reply, but it took me some time to prepare a decent example of what you're trying to achieve. I have now added a new lexer example to Spirit: conjure_lexer. It is a modified version of the conjure (Qi) example, which implements a small programming language. The main difference is that it uses a lexer instead of a pure Qi grammar.

The new conjure_lexer example demonstrates several things: a) it uses a new position_token class, which extends the existing token type and always stores the pair of iterators pointing to the corresponding matched input sequence (in addition to the usual information such as token id, token value, etc.); b) it uses this positional information for error reporting; and c) along the way, it demonstrates how using a lexer can simplify the grammar.
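Once a token exposes the character iterator where it was matched, turning that into a readable message is ordinary string work. The sketch below is standard-library C++ only and does not reproduce the handler shipped with conjure_lexer; SourcePos, line_begin, locate and format_error are hypothetical names, and the message wording is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

using CharIter = std::string::const_iterator;

struct SourcePos {
    std::size_t line;    // 1-based
    std::size_t column;  // 1-based, counted in chars
};

// Start of the line that contains pos.
CharIter line_begin(const std::string& src, CharIter pos) {
    while (pos != src.begin() && *(pos - 1) != '\n') --pos;
    return pos;
}

// Convert a character iterator into a line/column pair.
SourcePos locate(const std::string& src, CharIter pos) {
    SourcePos p;
    p.line = 1 + static_cast<std::size_t>(std::count(src.begin(), pos, '\n'));
    p.column = 1 + static_cast<std::size_t>(pos - line_begin(src, pos));
    return p;
}

// Build a message with the offending line and a caret under the position.
std::string format_error(const std::string& src, CharIter pos,
                         const std::string& expected) {
    const SourcePos p = locate(src, pos);
    const CharIter first = line_begin(src, pos);
    const CharIter last = std::find(pos, src.end(), '\n');
    std::string msg = "Error: expecting " + expected + " at line " +
                      std::to_string(p.line) + ", column " +
                      std::to_string(p.column) + ":\n";
    msg.append(first, last);        // the source line itself
    msg += '\n';
    msg.append(p.column - 1, ' ');  // pad up to the error column
    msg += "^\n";
    return msg;
}
```

The caret lines up only when the source line contains no tabs or multi-byte characters before the error position; a real handler would need to account for those.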

The new example is in SVN (trunk) and will be available in Boost V1.47 (to be released soon). It is in this directory: $BOOST_ROOT/libs/spirit/example/qi/compiler-tutorial/conjure_lexer.
