How do objects work in x86 at the assembly level?

2021-12-11 00:00:00 object assembly c++ x86 vtable

I'm trying to understand how objects work at the assembly level. How exactly are objects stored in memory, and how do member-functions access them?

(editor's note: the original version was way too broad, and had some confusion over how assembly and structs work in the first place.)

Recommended answer

Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).

A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, addresses of its elements increase in the order in which the members were defined (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition because in C++ struct means class (with public: as the default instead of private:).

Think of a struct or class as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes, and it doesn't take any special instructions to store a double from a floating point register and reload it into an integer register, or to do an unaligned load and get the last 3 bytes of one int and the first byte of the next. A struct is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.
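To make the "bytes are just bytes" point concrete, here is a minimal sketch that reads the bit pattern of a double as an integer. The helper name bits_of is made up for this illustration, and the code assumes double is a 64-bit IEEE 754 type, as it is on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the original answer): return the raw bit
// pattern of a double. std::memcpy is the portable way to reinterpret bytes
// in C++; the type system is bypassed, the bytes themselves do not change.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);  // copy the 8 bytes of d into u
    return u;
}
```

With optimization enabled, compilers usually turn the memcpy into a plain register move or a store and reload, so there is no actual function call in the generated asm.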

These blocks of bytes can have static (global or static), dynamic (malloc or new), or automatic storage (local variable: temporary on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct.)

A struct or class is the same as any other object. In C and C++ terminology, even an int is an object (http://en.cppreference.com/w/c/language/object), i.e. a contiguous block of bytes that you can memcpy around (except for non-POD types in C++).

The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment, even if you do something like struct { char a; int b; };. For example, the x86-64 System V ABI (used on Linux and other non-Windows systems) specifies that int is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down the details that the C and C++ standards leave implementation-defined, so that all compilers for that ABI can make code that can call each other's functions.

Note that you can use offsetof(struct_name, member) to find out about struct layout (in C11 and C++11). See also alignof in C++11, or _Alignof in C11.
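As a small sketch of how to inspect layout, the following applies offsetof, sizeof and alignof to the struct { char a; int b; } example. The names padded and padding_before_b are made up for this illustration, and the numbers in the comments assume the x86-64 System V ABI (4-byte int with 4-byte alignment).

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Illustrative type: the char/int pair discussed in the text.
struct padded {
    char a;  // offset 0
    int  b;  // offset 4 on x86-64 System V: 3 bytes of padding precede it
};

// Hypothetical helper: number of padding bytes between a and b.
std::size_t padding_before_b() {
    return offsetof(padded, b) - sizeof(char);
}

// The whole struct inherits the alignment of its most-aligned member.
static_assert(alignof(padded) == alignof(int), "struct alignment follows int");
```

On that ABI, sizeof(padded) is 8 even though the members only need 5 bytes.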

It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (e.g. if you have some char members, put them in groups of at least 4, rather than alternating with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64 or 32-bit on common platforms.)
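The effect of member order can be checked directly with sizeof. This is only a sketch: the struct names are invented for the example, and the sizes in the comments assume 4-byte alignment for std::int32_t, as on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Narrow and wide members alternate: each char is followed by 3 padding bytes.
struct alternating {
    char         c1;  // offset 0
    std::int32_t i1;  // offset 4
    char         c2;  // offset 8
    std::int32_t i2;  // offset 12
};                    // typically 16 bytes

// Same members sorted from large to small: the chars share one 4-byte slot.
struct grouped {
    std::int32_t i1;  // offset 0
    std::int32_t i2;  // offset 4
    char         c1;  // offset 8
    char         c2;  // offset 9, then 2 bytes of tail padding
};                    // typically 12 bytes

// Bytes actually occupied by members in either layout.
constexpr std::size_t payload_bytes = 2 * sizeof(std::int32_t) + 2 * sizeof(char);
```

Both layouts carry the same 10 bytes of data; only the padding differs.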

More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.

class foo {
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  int inc_b(void);
};

int foo::inc_b(void) { return m_b++; }

compiles to (using http://gcc.godbolt.org/):

foo::inc_b():                  # args: this in RDI
    mov eax, DWORD PTR [rdi+4]      # eax = this->m_b
    lea edx, [rax+1]                # edx = eax+1
    mov DWORD PTR [rdi+4], edx      # this->m_b = edx
    ret

As you can see, the this pointer is passed as an implicit first argument (in rdi, in the SysV AMD64 ABI). m_b is stored at 4 bytes from the start of the struct/class. Note the clever use of lea to implement the post-increment operator, leaving the old value in eax.
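One way to picture the implicit this argument is to write the member function as a free function that takes the object pointer explicitly. The names foo_data and foo_inc_b below are hypothetical and exist only for this sketch; on mainstream ABIs such a function typically compiles to the same instructions as foo::inc_b() above.

```cpp
#include <cassert>

// Plain-struct stand-in for class foo (same two int members, same layout).
struct foo_data {
    int m_a;  // offset 0
    int m_b;  // offset 4
};

// Hypothetical free-function equivalent of foo::inc_b():
// the object pointer that C++ passes implicitly is an ordinary first argument.
int foo_inc_b(foo_data *self) {
    return self->m_b++;  // post-increment: return the old value
}
```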

No code for inc_a is emitted, since it's defined inside the class declaration. It's treated the same as an inline non-member function. If it was really big and the compiler decided not to inline it, it could emit a stand-alone version of it.

Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).

class foo {
  public:
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  void inc_b(void);
  virtual void inc_v(void);
};

void foo::inc_b(void) { m_b++; }

class bar: public foo {
 public:
  virtual void inc_v(void);  // overrides foo::inc_v even for users that access it through a pointer to class foo
};

void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }

compiles to

  ; This time I made the functions return void, so the asm is simpler
  ; The in-memory layout of the class is now:
  ;   vtable ptr (8B)
  ;   m_a (4B)
  ;   m_b (4B)
foo::inc_v():
    add DWORD PTR [rdi+12], 1   # this_2(D)->m_b,
    ret
bar::inc_v():
    add DWORD PTR [rdi+8], 1    # this_2(D)->D.2657.m_a,
    ret

    # if you uncheck the hide-directives box, you'll see
    .globl  foo::inc_b()
    .set    foo::inc_b(),foo::inc_v()
    # since inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.

    # you can also see the directives that define the data that goes in the vtables

Fun fact: add m32, imm8 is faster than inc m32 on most Intel CPUs (micro-fusion of the load+ALU uops); it's one of the rare cases where the old Pentium 4 advice to avoid inc still applies. gcc always avoids inc, though, even when it would save code size with no downsides. See the Stack Overflow question "INC instruction vs ADD 1: Does it matter?".

void caller(foo *p){
    p->inc_v();
}

    mov     rax, QWORD PTR [rdi]      # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
    jmp     [QWORD PTR [rax]]         # *_3

(This is an optimized tailcall: jmp replacing call/ret).

The mov loads the vtable address from the object into a register. The jmp is a memory-indirect jump, i.e. loading a new RIP value from memory. The jump-target address is vtable[0], i.e. the first function pointer in the vtable. If there was another virtual function, the mov wouldn't change but the jmp would use jmp [rax + 8].
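The two loads described above can be mimicked by hand with a table of function pointers. This is a simplified sketch, not what the compiler literally generates: the names obj, method, foo_vtable and bar_vtable are invented here, and real vtables on common ABIs also hold extra entries (such as type information) that are omitted.

```cpp
#include <cassert>

struct obj;                      // forward declaration for the slot type
using method = void (*)(obj *);  // each vtable slot: function taking "this"

// Hand-rolled object: the vtable pointer comes first, then the data members.
struct obj {
    const method *vptr;  // points at the table for the object's actual type
    int m_a;
    int m_b;
};

// Stand-ins for foo::inc_v and bar::inc_v.
void foo_inc_v(obj *self) { self->m_b++; }
void bar_inc_v(obj *self) { self->m_a++; }

// One table per type; slot 0 is inc_v.
const method foo_vtable[] = { foo_inc_v };
const method bar_vtable[] = { bar_inc_v };

// Equivalent of caller(foo *p): load vptr, load slot 0, jump through it.
void caller(obj *p) {
    p->vptr[0](p);
}
```

A second virtual function would occupy slot 1, which is the [rax + 8] case mentioned above when pointers are 8 bytes.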

The order of entries in the vtable presumably matches the order of declaration in the class, so reordering the class declaration in one translation unit would result in virtual functions going to the wrong target. Just like reordering the data members would change the class's ABI.

If the compiler had more information, it could devirtualize the call. e.g. if it could prove that the foo * was always pointing to a bar object, it could inline bar::inc_v().

GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from bar, so it's a good bet that bar* is pointing to a bar object, rather than some derived class.

void caller_bar(bar *p){
    p->inc_v();
}

# gcc5.5 -O3
caller_bar(bar*):
    mov     rax, QWORD PTR [rdi]      # load vtable pointer
    mov     rax, QWORD PTR [rax]      # load target function address
    cmp     rax, OFFSET FLAT:bar::inc_v()  # check it
    jne     .L6       #,
    add     DWORD PTR [rdi+8], 1      # inlined version of bar::inc_v()
    ret
.L6:
    jmp     rax               # otherwise tailcall the derived class's function

Remember, a foo * can actually point to a derived bar object, but a bar * is not allowed to point to a pure foo object.

It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function pointer and fall back to the indirect call (jmp tailcall in this case) if it was wrong. Compiler heuristics decide when to attempt it.

Notice that it's checking the actual function pointer, rather than comparing the vtable pointer. It can still use the inlined bar::inc_v() as long as the derived type didn't override that virtual function. Overriding other virtual functions wouldn't affect this one, but would require a different vtable.

Allowing extension without recompilation is handy for libraries, but also means looser coupling between parts of a big program (i.e. you don't have to include all the headers in every file).

But this imposes some efficiency costs for some uses: C++ virtual dispatch only works through pointers to objects, so you can't have a polymorphic array without hacks or expensive indirection through an array of pointers. That indirection defeats a lot of hardware and software optimizations; see the Stack Overflow question "Fastest implementation of simple, virtual, observer-sort of, pattern in c++?".

If you want some kind of polymorphism / dispatch but only for a closed set of types (i.e. all known at compile time), you can do it manually with a union + enum + switch, or with std::variant<D1,D2> to make a union and std::visit to dispatch, or in various other ways. See also the Stack Overflow questions "Contiguous storage of polymorphic types" and "Fastest implementation of simple, virtual, observer-sort of, pattern in c++?".
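A minimal sketch of the std::variant approach, assuming C++17. D1 and D2 are placeholder types matching the names in the text; the alias poly and the functions inc_v and inc_all are invented for this example.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Two unrelated placeholder types: no base class, no vtable pointer.
struct D1 { int m_a = 0; int m_b = 0; };
struct D2 { int m_a = 0; int m_b = 0; };

// Overloads play the role of the virtual function.
inline void inc_v(D1 &d) { d.m_b++; }
inline void inc_v(D2 &d) { d.m_a++; }

// Closed set of alternatives, stored by value.
using poly = std::variant<D1, D2>;

// The elements live contiguously in the vector; std::visit picks the
// overload based on the tag stored inside each variant.
void inc_all(std::vector<poly> &objs) {
    for (auto &o : objs)
        std::visit([](auto &d) { inc_v(d); }, o);
}
```

The trade-off is that every element is as large as the biggest alternative plus a tag, and adding a new type means recompiling every piece of code that visits the variant.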

Using a struct doesn't force the compiler to actually put stuff in memory, any more than a small array or a pointer to a local variable does. For example, an inline function that returns a struct by value can still fully optimize.

The as-if rule applies: even if a struct logically has some memory storage, the compiler can make asm that keeps all the needed members in registers (and do transformations that mean that values in registers don't correspond to any value of a variable or temporary in the C++ abstract machine "running" the source code).

struct pair {
  int m_a;
  int m_b;
};

pair addsub(int a, int b) {
  return {a+b, a-b};
}

int foo(int a, int b) {
  pair ab = addsub(a,b);
  return ab.m_a * ab.m_b;
}

That compiles to (with g++ 5.4):

# The non-inline definition which actually returns a struct
addsub(int, int):
    lea     edx, [rdi+rsi]  # add result
    mov     eax, edi
    sub     eax, esi        # sub result
                            # then pack both struct members into a 64-bit register, as required by the x86-64 SysV ABI
    sal     rax, 32
    or      rax, rdx
    ret

# But when inlining, it optimizes away
foo(int, int):
    lea     eax, [rdi+rsi]    # a+b
    sub     edi, esi          # a-b
    imul    eax, edi          # (a+b) * (a-b)
    ret

Notice how even returning a struct by value doesn't necessarily put it in memory. The x86-64 SysV ABI passes and returns small structs packed together into registers. Different ABIs make different choices for this.
