Good tools for creating a C/C++ parser/analyzer
What are some good tools for getting a quick start on parsing and analyzing C/C++ code?
In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.
Here's what I've found so far, though I haven't looked at any of them in detail (thoughts?):
- CScope - Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
- GCC - Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but it hasn't been updated since GCC 4.1 (2006).
- PUMA - The PUre MAnipulator. (from the page: "The intention of this project is to provide a library of classes for the analysis and manipulation of C/C++ sources. For this purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++ sources."). This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
- Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
- Any others?
I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.
Thanks! -Matt
(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42"; it should remain attached to the name "foo". Unfortunately, this excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree.
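To make that requirement concrete, here is a minimal sketch of recording macro definitions from the raw source before any preprocessing happens, so the name "foo" stays linked to its replacement text "42". The function name collect_defines is hypothetical, and the sketch assumes simple object-like macros only: no line continuations, no "# define" with a space after the hash, no function-like macros, and no conditional compilation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: collect object-like macros ("#define NAME body")
// from raw source text, keeping each name attached to its replacement text.
// Assumption: one directive per line, written exactly as "#define".
std::map<std::string, std::string> collect_defines(const std::string& source) {
    std::map<std::string, std::string> macros;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string directive, name;
        if (!(words >> directive >> name) || directive != "#define") continue;
        if (name.find('(') != std::string::npos) continue;  // skip function-like macros
        std::string body;
        std::getline(words, body);  // rest of the line is the replacement text
        std::size_t start = body.find_first_not_of(" \t");
        macros[name] = (start == std::string::npos) ? "" : body.substr(start);
    }
    return macros;
}

void demo_defines() {
    std::map<std::string, std::string> m =
        collect_defines("#define foo 42\nint x = foo;\n");
    assert(m.at("foo") == "42");  // the name survives alongside its value
}
```

A real tool would need to hook into the preprocessor itself rather than scan lines, but the resulting table is the kind of information that a "preprocess first, parse second" pipeline throws away.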
Recommended answer
Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:
Outstandingly complicated grammar
"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC);
an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
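The ambiguity in the quote can be reproduced in a few lines. The sketch below uses illustrative definitions of AA and CC (they are not from the quoted text) and puts the same token sequence, AA BB(CC);, in two namespaces. The only difference is whether CC was previously declared as a type or as a variable.

```cpp
#include <cassert>

// All names are illustrative; AA, BB and CC mirror the quoted example.
struct AA {
    int value;
    explicit AA(int v = 0) : value(v) {}
};

namespace cc_is_a_type {
    typedef int CC;                     // CC names a type
    AA BB(CC);                          // => declares a function BB(CC) returning AA
    AA BB(CC x) { return AA(x + 1); }   // definition so the function can be called
}

namespace cc_is_a_value {
    int CC = 41;                        // CC names a variable
    AA BB(CC);                          // => defines an object BB, initialized from CC
}

int from_function() { return cc_is_a_type::BB(41).value; }
int from_object()   { return cc_is_a_value::BB.value; }

void demo_ambiguity() {
    assert(from_function() == 42);  // BB was callable: it is a function
    assert(from_object() == 41);    // BB has a member: it is an object
}
```

A parser therefore cannot classify that statement from the grammar alone. It has to track which names are types at that point, which is one reason a plain lex/yacc grammar tends to struggle with C++.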