What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?
I am setting out on a side project whose goal is to translate code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be the easier direction to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:
This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.
I am also looking at IoC and dependency injection, as they might make the translation process easier and less error-prone.
I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.
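A minimal sketch of that first step. It assumes the standard `ast` module rather than `parser` (a substitution: `parser` was removed from recent Python releases), and the sample statement is invented for illustration.

```python
import ast

# Hypothetical input; any valid Python source would do.
source = "total = price * quantity + tax"

# Parse the source text into an abstract syntax tree.
tree = ast.parse(source)

# Collect the node type names in the order ast.walk visits them.
node_types = [type(node).__name__ for node in ast.walk(tree)]

# Print a textual dump of the tree; the exact layout varies by Python version.
print(ast.dump(tree))
print(node_types)
```

The dump shows the tree shape only. Names, types and scopes still have to be worked out in later passes.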
From then on I can build the AST, symbol tables and control flow.
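As a rough illustration of what a symbol table pass might look like, the sketch below walks a Python AST and records where names are defined and used. It is deliberately naive: `SymbolCollector`, `build_symbol_table` and the sample function are hypothetical, and it uses one flat scope with no handling of builtins, imports or nested scopes.

```python
import ast


class SymbolCollector(ast.NodeVisitor):
    """Record the line numbers where each name is defined and where it is used."""

    def __init__(self):
        self.defined = {}  # name -> list of line numbers
        self.used = {}     # name -> list of line numbers

    def visit_Name(self, node):
        # Store context means assignment; anything else is counted as a use.
        table = self.defined if isinstance(node.ctx, ast.Store) else self.used
        table.setdefault(node.id, []).append(node.lineno)
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # The function name and its positional parameters count as definitions.
        self.defined.setdefault(node.name, []).append(node.lineno)
        for arg in node.args.args:
            self.defined.setdefault(arg.arg, []).append(arg.lineno)
        self.generic_visit(node)


def build_symbol_table(source):
    collector = SymbolCollector()
    collector.visit(ast.parse(source))
    return collector


SAMPLE = """\
def area(width, height):
    result = width * height
    return result + margin
"""

table = build_symbol_table(SAMPLE)
# Names that are read but never defined in this (flat) scope.
undefined = sorted(set(table.used) - set(table.defined))
print(undefined)
```

Even this toy version surfaces a name (`margin`) that the translator would need to resolve before it could emit correct target code.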
Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.
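One way the flagging idea could work is sketched below: a toy Python-to-PHP emitter that handles a tiny subset and records everything else instead of guessing. The names `to_php` and `translate`, and the subset itself, are hypothetical. The sketch also assumes all operands are numbers, an assumption a real translator would need type information to check.

```python
import ast

# Operators this sketch knows how to emit; operands are assumed to be numbers.
BINARY_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def is_number(node):
    """True for int/float literals (bool is excluded on purpose)."""
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool))


def to_php(node, problems):
    """Translate a small subset of Python to PHP text, flagging the rest."""
    if isinstance(node, ast.Assign) and len(node.targets) == 1:
        target = to_php(node.targets[0], problems)
        value = to_php(node.value, problems)
        return f"{target} = {value};"
    if isinstance(node, ast.Name):
        return "$" + node.id
    if is_number(node):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = to_php(node.left, problems)
        right = to_php(node.right, problems)
        return f"({left} {BINARY_OPS[type(node.op)]} {right})"
    # Outside the supported subset: record it and leave a visible marker.
    kind = type(node).__name__
    problems.append((node.lineno, kind))
    return f"/* TODO line {node.lineno}: untranslated {kind} */"


def translate(source):
    problems = []
    lines = [to_php(stmt, problems) for stmt in ast.parse(source).body]
    return "\n".join(lines), problems


php, problems = translate(
    "total = price * 2 + tax\n"
    "squares = [n * n for n in items]\n"
)
print(php)
print(problems)
```

The list comprehension is not translated. It ends up in `problems` with its line number, and the generated text carries a marker where manual work is needed.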
Before you ask "What the hell is the point of this?", the answer is: it'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.
I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (i.e. IoC, SOA?) than in how to do the translation.
Recommended answer
I've been building tools (the DMS Software Reengineering Toolkit) to do general-purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.
The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).
There's nothing wrong with thinking about building a language-to-language translator, or with attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in DMS alone, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), with much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section on the toolkit's website interesting as a way to jump-start that learning.)
People often attempt to build some kind of generalized machinery by starting with a piece of technology they are familiar with that does part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally did, and will really, really resist your attempts to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)
The reason I originally started to build DMS was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years has been trying to prevent such assumptions from creeping in.)
Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho & Ullman's compiler book doesn't stop at chapter 2. (The OP has this right, in that he is planning to build additional machinery beyond the AST.) For more on this topic, see Life After Parsing.
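A small example of why the AST alone is not enough: Python's `a + b` parses to the same tree whether the operands are numbers or strings, but PHP uses `+` for arithmetic and `.` for string concatenation. The sketch below picks the PHP operator from a hand-written symbol table. `translate_add` and the `symbol_types` mapping are hypothetical, and a real translator would have to infer those types through analysis.

```python
import ast


def translate_add(expr_source, symbol_types):
    """Translate a Python `name + name` expression to PHP text.

    symbol_types is a hypothetical symbol table mapping each variable
    name to "str" or "int". Returns None when the types are unknown or
    mixed, so the caller can flag the expression for manual review.
    """
    node = ast.parse(expr_source, mode="eval").body
    simple = (isinstance(node, ast.BinOp)
              and isinstance(node.op, ast.Add)
              and isinstance(node.left, ast.Name)
              and isinstance(node.right, ast.Name))
    if not simple:
        raise ValueError("this sketch only handles `name + name`")

    left_type = symbol_types.get(node.left.id)
    right_type = symbol_types.get(node.right.id)
    if left_type == right_type == "str":
        op = "."   # PHP string concatenation
    elif left_type == right_type == "int":
        op = "+"   # PHP arithmetic addition
    else:
        return None  # not enough information to choose safely
    return f"${node.left.id} {op} ${node.right.id}"


print(translate_add("a + b", {"a": "str", "b": "str"}))
print(translate_add("a + b", {"a": "int", "b": "int"}))
print(translate_add("a + b", {}))
```

The three calls parse to identical trees. Only the symbol table separates the two correct outputs from the case that has to be flagged.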
The remark "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to be done by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million-line level, this is simply impossible in practice. (Amazingly, there are people who distrust automated tools and insist on translating million-line systems by hand; that's even harder, and they normally find out painfully, with long delays, high costs and often outright failure.)
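The arithmetic behind those figures is easy to reproduce. In the sketch below, `manual_lines` is a hypothetical helper, and the 98% rate is an illustrative value, not a measured one.

```python
def manual_lines(total_sloc, automated_percent):
    """Lines left for manual translation at a given automated conversion rate."""
    return total_sloc * (100 - automated_percent) // 100


# An 80% translator on 100K SLOC leaves 20,000 lines to convert by hand.
print(manual_lines(100_000, 80))

# The same rate on a million-line system leaves 200,000 lines.
print(manual_lines(1_000_000, 80))

# An illustrative 98% rate on 100K SLOC leaves 2,000 lines.
print(manual_lines(100_000, 98))
```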
To translate large-scale systems, you have to shoot for conversion rates in the high nineties (percent); otherwise you are unlikely to be able to complete the manual part of the translation.
Another key consideration is the size of the code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.
I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years, and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.