How exactly does Java compilation happen?

2022-01-16 00:00:00 jvm compiler-construction java

Confused by the Java compilation process

OK, I know this: we write Java source code, the compiler, which is platform independent, translates it into bytecode, and then the JVM, which is platform dependent, translates it into machine code.

So from the start: we write Java source code. The compiler, javac.exe, is a .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? Then how come there is a .exe file that executes it? If the compiler is written in Java, how can the compiler code be executed at the compilation stage, since executing Java code is the JVM's job? How can a language compile its own code? It all seems like a chicken and egg problem to me.

Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, or what is it?

Can anybody give me a clear and detailed explanation of how my Java source code gets converted into machine code?

Recommended answer

OK, I know this: we write Java source code, the compiler, which is platform independent, translates it into bytecode,

Actually, the compiler itself is launched as a native executable (hence javac.exe). And yes, it transforms source files into bytecode. The bytecode is platform independent because it targets the Java Virtual Machine rather than any particular CPU.

then the JVM, which is platform dependent, translates it into machine code.

Not always. Sun's JVM, for example, comes in two variants: client and server. Both can compile bytecode to native code, but neither is required to.

So from the start: we write Java source code. The compiler, javac.exe, is a .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? Then how come there is a .exe file that executes it?

This .exe file is a thin native wrapper around Java bytecode. It exists for convenience, so that you don't need complicated batch scripts. It starts a JVM and runs the compiler inside it.

If the compiler is written in Java, how can the compiler code be executed at the compilation stage, since executing Java code is the JVM's job?

That's exactly what the wrapper does.

How can a language compile its own code? It all seems like a chicken and egg problem to me.

True, it is confusing at first glance. It isn't unique to Java, though: Ada's compiler is also written in Ada. It may look like a "chicken and egg problem", but in truth it's only a bootstrapping problem: the first version of a compiler is built with some other, already existing tool, and every later version can be compiled by the previous one.

Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, or what is it?

It's not an abstract syntax tree. An AST only exists in memory at compile time, where the parser builds it and the later compiler phases work on it. A .class file is more like an assembled object file, but for the JVM. The JVM, in turn, is an abstract machine that runs its own specialized machine language, one targeted only at the virtual machine. In its simplest form, a .class file has a structure very similar to an ordinary object file: first comes a constant pool (a table of constants and symbolic references to classes, fields and methods, including external ones), then the field declarations, and lastly the methods with their code.
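To make the "binary, not text" point concrete, here is a minimal sketch that reads the first few fields of a class file: the magic number, the format version, and the size of the constant pool. The class name ClassFileHeader and the helper readHeader are made up for this illustration, and the sketch assumes it is run from compiled class files on Java 9 or later (for readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileHeader {

    /** Returns { magic, minorVersion, majorVersion, constantPoolCount } from raw class-file bytes. */
    static int[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);   // big-endian by default, like the class-file format
        int magic = buf.getInt();                       // 0xCAFEBABE in every valid class file
        int minor = buf.getShort() & 0xFFFF;            // unsigned 16-bit minor version
        int major = buf.getShort() & 0xFFFF;            // unsigned 16-bit major version (e.g. 52 for Java 8)
        int constantPoolCount = buf.getShort() & 0xFFFF;
        return new int[] { magic, minor, major, constantPoolCount };
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own compiled form, loaded from the classpath.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            if (in == null) {
                System.out.println("class file not found on the classpath");
                return;
            }
            int[] h = readHeader(in.readAllBytes());
            System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                    h[0], h[2], h[1], h[3]);
        }
    }
}
```

Everything after these fields (the constant pool entries, fields, methods and their bytecode) follows the same binary layout, which is what javap decodes for you.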

If you are really curious, you can dig into a class file with the javap utility. Here is sample (obfuscated) output of javap -c Main:

0:   new #2; //class SomeObject
3:   dup
4:   invokespecial   #3; //Method SomeObject."<init>":()V
7:   astore_1
8:   aload_1
9:   invokevirtual   #4; //Method SomeObject.doSomething:()V
12:  return

So by now you should have an idea of what it really is.
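The source behind that listing isn't shown, so the following is only a guess at what it might have looked like. The body of doSomething and the done field are invented for illustration; the comments map each statement to the instructions in the javap output.

```java
class SomeObject {
    boolean done;                 // hypothetical field, only here so the method has an effect

    void doSomething() {
        done = true;
    }
}

public class Main {
    public static void main(String[] args) {
        // new #2; dup; invokespecial SomeObject."<init>"; astore_1
        // (slot 0 holds args, so the new reference is stored in local slot 1)
        SomeObject obj = new SomeObject();

        // aload_1; invokevirtual SomeObject.doSomething
        obj.doSomething();

        // return
    }
}
```

Compiling this and running javap -c Main should give a similar instruction sequence, although the constant pool indices (#2, #3, #4) can differ between compiler versions.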

Can anybody give me a clear and detailed explanation of how my Java source code gets converted into machine code?

I think it should be clearer by now, but here's a short summary:

  • You invoke javac, pointing it at your source file. javac's internal reader (the tokenizer and parser) reads the file and builds an actual AST out of it. All syntax errors come from this stage.

  • javac hasn't finished its job yet. Once it has the AST, the true compilation can begin. It uses the visitor pattern to traverse the AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class file containing bytecode.

  • Now it's time to run the thing. You invoke java with the name of the class (without the .class extension). The JVM starts again, this time to interpret your code. The JVM may or may not compile your abstract bytecode into native code. Sun's HotSpot JVM does so with just-in-time compilation when needed: the JVM constantly profiles the running code and recompiles it to native code once certain thresholds are met. Typically the hot (most frequently executed) code is the first to be compiled natively.
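The last step can be observed from inside a running program. The sketch below, with made-up names HotLoop and sumOfSquares, calls a small method often enough that a JIT-enabled JVM is likely to treat it as hot, then asks the standard CompilationMXBean how much time the JIT compiler has spent so far. Whether and when the method actually gets compiled depends on the JVM and its settings, so take the numbers as indicative only.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class HotLoop {

    /** A small, frequently called method: a typical candidate for JIT compilation. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The bean is null when the JVM has no JIT compiler at all.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();

        long total = 0;
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);   // repeated calls make this method "hot"
        }
        System.out.println("checksum: " + total);

        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent "
                    + jit.getTotalCompilationTime() + " ms compiling");
        } else {
            System.out.println("no JIT compilation time available on this JVM");
        }
    }
}
```

On HotSpot, running it with -XX:+PrintCompilation prints each method as it is compiled, and -Xint forces pure interpretation for comparison.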

Without the javac wrapper, one would have to invoke the compiler with something like this:

%JDK_HOME%/bin/java.exe -cp myclasspath com.sun.tools.javac.Main fileToCompile

As you can see, it calls Sun's private API, so it is bound to Sun's JDK implementation, and build systems would become dependent on it. If you switched to any other JDK (the wiki lists five besides Sun's), the command above would have to be updated, since the compiler is unlikely to reside in the com.sun.tools.javac package. Other compilers could even be written in native code.

So the standard way is to ship a javac wrapper with the JDK.
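That the compiler is just Java code running inside a JVM can also be seen from the standard javax.tools API (available since Java 6), which exposes the JDK's compiler without naming an implementation-specific class. The sketch below is illustrative only: the class name CompileInProcess, the helper compileHello and the generated Hello.java are invented for the example, and it assumes a full JDK on Java 11 or later (for Files.writeString).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileInProcess {

    /** Writes a tiny source file to a temp directory, compiles it in this JVM, and reports success. */
    static boolean compileHello() {
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path source = dir.resolve("Hello.java");
            Files.writeString(source,
                    "public class Hello { public static void main(String[] a) { System.out.println(\"hi\"); } }");

            // Returns null when only a runtime without the compiler module is installed.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system Java compiler available; run this on a JDK");
            }

            // Same arguments the command-line tool takes; nulls mean System.in/out/err.
            int exitCode = compiler.run(null, null, null, source.toString());

            // Without a -d option the class file is written next to the source file.
            return exitCode == 0 && Files.exists(dir.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled: " + compileHello());
    }
}
```

No second process and no .exe is involved here: the compiler runs as ordinary classes in the JVM that is already executing the program.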
