How does C++ linking work in practice?

2022-01-11 00:00:00 linker c++

How does C++ linking work in practice? What I am looking for is a detailed explanation about how the linking happens, and not what commands do the linking.


This answer focuses on address relocation, which is one of the crucial functions of linking.


A minimal example will be used to clarify the concept.


Summary: relocation edits the .text section of object files to translate:

  • from object file addresses
  • into the final addresses of the executable


This must be done by the linker because the compiler only sees one input file at a time, but we must know about all object files at once to decide how to:

  • resolve undefined symbols, such as functions that are declared but not defined in the current file
  • lay out the .text and .data sections of multiple object files so that they do not clash
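The second point can be illustrated with a toy model. The sketch below is not how ld is implemented; the object file names, section sizes and base addresses are all made up for illustration, and alignment is ignored. It only shows that placing sections without overlap requires seeing every object file at once.

```python
# Hypothetical object files: name -> section sizes in bytes (made-up numbers).
objects = {
    "a.o": {".text": 0x27, ".data": 0x0D},
    "b.o": {".text": 0x10, ".data": 0x04},
}


def layout(objects, bases):
    """Give every input section its own non-overlapping output address."""
    cursor = dict(bases)  # next free address for each output section
    placed = {}
    for obj, sections in objects.items():
        for sec, size in sections.items():
            placed[(obj, sec)] = cursor[sec]
            cursor[sec] += size  # the next object's section goes right after
    return placed


# Assumed base addresses, chosen only for the example.
placed = layout(objects, {".text": 0x400000, ".data": 0x600000})
for (obj, sec), addr in placed.items():
    print(f"{obj:4} {sec:6} -> {addr:#x}")
```

The address of b.o's .data depends on the size of a.o's .data, which a compiler working on b.o alone cannot know.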


Prerequisites: a minimal understanding of:

  • x86-64 or IA-32 assembly
  • the global structure of an ELF file. I have made a tutorial for that.

Linking has nothing to do with C or C++ specifically: compilers just generate the object files. The linker then takes them as input without ever knowing what language compiled them. It might as well be Fortran.

So to reduce the crust, let's study a NASM x86-64 ELF Linux hello world:

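section .data
    hello_world db "Hello world!", 10   ; 12 characters plus a newline = 13 bytes
section .text
    global _start
    _start:

        ; sys_write(fd = 1 (stdout), buf = hello_world, count = 13)
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, 13
        syscall

        ; sys_exit(status = 0)
        mov rax, 60
        mov rdi, 0
        syscall

assembled and linked with: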

nasm -felf64 hello_world.asm            # creates hello_world.o
ld -o hello_world.out hello_world.o     # static ELF executable with no libraries

with NASM 2.10.09.


First we disassemble the .text section of the object file:

objdump -d hello_world.o

which gives:


0000000000000000 <_start>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00
  14:   ba 0d 00 00 00          mov    $0xd,%edx
  19:   0f 05                   syscall
  1b:   b8 3c 00 00 00          mov    $0x3c,%eax
  20:   bf 00 00 00 00          mov    $0x0,%edi
  25:   0f 05                   syscall


The crucial lines are:

   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00

which should move the address of the hello world string into the rsi register, which is passed to the write system call.

But wait! How can the compiler possibly know where "Hello world!" will end up in memory when the program is loaded?

Well, it can't, especially after we link a bunch of .o files together with multiple .data sections.


Only the linker can do that, since only it has all those object files.


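So the compiler just:

  • puts a placeholder value 0x0 in the compiled output
  • gives the linker some extra information on how to modify the compiled code with the correct addresses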

This "extra information" is contained in the .rela.text section of the object file.

.rela.text stands for "relocation of the .text section".


The word relocation is used because the linker will have to relocate the address from the object into the executable.

We can dump the .rela.text section with:

readelf -r hello_world.o

which contains:


Relocation section '.rela.text' at offset 0x340 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0


The format of this section is fixed and documented at: http://www.sco.com/developers/gabi/2003-12-17/ch4.reloc.html


Each entry tells the linker about one address that needs to be relocated. Here we have only one, for the string.
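As a side note on the Info column of that entry: in 64-bit ELF files the generic ABI packs two numbers into it, the symbol table index in the upper 32 bits and the relocation type in the lower 32 bits (the ELF64_R_SYM and ELF64_R_TYPE macros). A minimal sketch of the unpacking, with variable names chosen just for this example:

```python
info = 0x000200000001  # Info column of the entry printed by readelf -r

sym_index = info >> 32        # ELF64_R_SYM: index into the symbol table
rel_type = info & 0xFFFFFFFF  # ELF64_R_TYPE: relocation type number

print(sym_index, rel_type)
```

Relocation type 1 on x86-64 is R_X86_64_64, and readelf has resolved the symbol index to the name .data, which matches the Type and Sym. Name columns.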


Simplifying a bit, for this particular line we have the following information:

  • Offset = C: the first byte of .text that this entry changes.

If we look back at the disassembled text, it is exactly inside the critical movabs $0x0,%rsi, and those who know x86-64 instruction encoding will notice that this is where the 64-bit address part of the instruction is encoded.

  • Name = .data: the address points into the .data section.

  • Type = R_X86_64_64, which specifies exactly what calculation has to be done to translate the address.

This field is actually processor dependent, and is thus documented in the AMD64 System V ABI extension, section 4.4 "Relocation".

That document says that R_X86_64_64 does:

  • Field = word64: 8 bytes, thus the 00 00 00 00 00 00 00 00 at address 0xC

  • Calculation = S + A

  • S is the value of the symbol that the entry refers to, here the .data section: 0 in the object file (the Sym. Value column), and the final address of .data once the linker has placed it
  • A is the addend, which is 0 here. This is a field of the relocation entry.

So S + A points at offset 0 of .data, and the field gets relocated to the very first address of the .data section.
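The whole relocation can be replayed by hand. The sketch below is only an illustration, not how ld is written: it copies the .text bytes from the objdump output of the object file, treats S as the final address of .data (0x6000d8, taken as an assumption from the executable that ld produces for this program), and writes S + A as a little-endian 64-bit word at the entry's offset. The function and constant names are made up for the example.

```python
import struct

# .text bytes of hello_world.o, copied from the objdump -d output.
obj_text = bytes.fromhex(
    "b801000000"            # 0x00: mov    $0x1,%eax
    "bf01000000"            # 0x05: mov    $0x1,%edi
    "48be0000000000000000"  # 0x0a: movabs $0x0,%rsi (placeholder)
    "ba0d000000"            # 0x14: mov    $0xd,%edx
    "0f05"                  # 0x19: syscall
    "b83c000000"            # 0x1b: mov    $0x3c,%eax
    "bf00000000"            # 0x20: mov    $0x0,%edi
    "0f05"                  # 0x25: syscall
)


def apply_r_x86_64_64(text, offset, s, a):
    """Store S + A as a little-endian 64-bit word at `offset` in `text`."""
    patched = bytearray(text)
    struct.pack_into("<Q", patched, offset, s + a)
    return bytes(patched)


OFFSET = 0x0C         # Offset column of the relocation entry
ADDEND = 0            # the "+ 0" in the Sym. Name + Addend column
DATA_ADDR = 0x6000D8  # assumption: final address the linker gave to .data

# The movabs starts at 0x0a with the two bytes 48 be, so its 8-byte
# immediate starts at 0x0c and is still the all-zero placeholder.
assert obj_text[0x0A:0x0C] == bytes.fromhex("48be")
assert obj_text[OFFSET:OFFSET + 8] == bytes(8)

exe_text = apply_r_x86_64_64(obj_text, OFFSET, DATA_ADDR, ADDEND)
print(exe_text[0x0A:0x14].hex())  # bytes of the patched movabs
```

Only the eight bytes at offsets 0x0c to 0x13 change; every other byte of .text is copied through untouched.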


Now let's look at the text area of the executable that ld generated for us:

objdump -d hello_world.out

gives:


00000000004000b0 <_start>:
  4000b0:   b8 01 00 00 00          mov    $0x1,%eax
  4000b5:   bf 01 00 00 00          mov    $0x1,%edi
  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00
  4000c4:   ba 0d 00 00 00          mov    $0xd,%edx
  4000c9:   0f 05                   syscall
  4000cb:   b8 3c 00 00 00          mov    $0x3c,%eax
  4000d0:   bf 00 00 00 00          mov    $0x0,%edi
  4000d5:   0f 05                   syscall


So the only thing that changed from the object file are the critical lines:

  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00

which now points to the address 0x6000d8 (d8 00 60 00 00 00 00 00 in little-endian) instead of 0x0.

Is this the right location for the hello_world string?

To decide we have to check the program headers, which tell Linux where to load each section.


readelf -l hello_world.out

gives:


Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000d7 0x00000000000000d7  R E    200000
  LOAD           0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
                 0x000000000000000d 0x000000000000000d  RW     200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .data

This tells us that the .data section, which is the second one, starts at VirtAddr = 0x06000d8.

And the only thing in the data section is our hello world string.
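The same check can be written out as arithmetic. This is a small sketch using only the numbers printed by readelf -l; the dictionary layout and variable names are made up for the example.

```python
# Second LOAD segment, copied from the readelf -l output.
data_segment = {"offset": 0xD8, "vaddr": 0x6000D8, "filesz": 0x0D}

hello = b"Hello world!\n"  # the bytes defined in section .data
target = 0x6000D8          # immediate of the relocated movabs

# The target address must fall inside the loaded segment...
start = data_segment["vaddr"]
inside = start <= target < start + data_segment["filesz"]

# ...and it maps back to this offset in the executable file.
file_offset = data_segment["offset"] + (target - start)

print(inside, hex(file_offset), len(hello) == data_segment["filesz"])
```

The segment is exactly 13 bytes long, the length of the string, and the relocated address is its first byte.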
