What happens when a computer program runs?
I know the general theory, but I can't fit the details together.
I know that a program resides in the secondary memory of a computer. Once the program begins execution, it is copied entirely into RAM. The processor then retrieves a few instructions at a time (how many depends on the size of the bus), puts them in registers, and executes them.
I also know that a computer program uses two kinds of memory: the stack and the heap, which are also part of the primary memory of the computer. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++).
What I can't understand is how those two things connect. At what point is the stack used for the execution of the instructions? Do instructions go from RAM, to the stack, to the registers?
Recommended answer
It really depends on the system, but modern OSes with virtual memory tend to load their process images and allocate memory something like this:
+---------+
| stack | function-local variables, return addresses, return values, etc.
| | often grows downward, commonly accessed via "push" and "pop" (but can be
| | accessed randomly, as well; disassemble a program to see)
+---------+
| shared | mapped shared libraries (C libraries, math libs, etc.)
| libs |
+---------+
| hole | unused memory allocated between the heap and stack "chunks", spans the
| | difference between your max and min memory, minus the other totals
+---------+
| heap | dynamic, random-access storage, allocated with 'malloc' and the like.
+---------+
| bss | Uninitialized global variables; must be in read-write memory area
+---------+
| data | data segment, for globals and static variables that are initialized
| | (can further be split up into read-only and read-write areas, with
| | read-only areas being stored elsewhere in ROM on some systems)
+---------+
| text | program code, this is the actual executable code that is running.
+---------+
This is the general process address space on many common virtual-memory systems. The "hole" is the size of your total memory, minus the space taken up by all the other areas; this gives the heap a large amount of space to grow into. The address space is also "virtual", meaning it maps to your actual memory through a translation table and may be stored at any location in actual memory. It is done this way to prevent one process from accessing another process's memory, and to make each process think it's running on a complete system.
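As a rough illustration of the translation-table idea, here is a minimal sketch in C. The page size, the table contents, and the names page_table and translate are all assumptions made up for this example; a real MMU uses multi-level tables and permission bits.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u        /* assumed page size for this sketch */
#define NUM_PAGES 8u           /* toy address space: 8 virtual pages */
#define NOT_MAPPED UINT32_MAX  /* marks a page with no frame behind it (the "hole") */

/* Hypothetical per-process table: index = virtual page, value = physical frame. */
static const uint32_t page_table[NUM_PAGES] = {
    5, 9, NOT_MAPPED, NOT_MAPPED, NOT_MAPPED, NOT_MAPPED, 2, 7
};

/* Translates a virtual address. Returns 0 and fills *physical on success,
   or -1 if the page is unmapped (a real OS would raise a page fault here). */
int translate(uint32_t virtual_addr, uint32_t *physical) {
    uint32_t page = virtual_addr / PAGE_SIZE;    /* which virtual page */
    uint32_t offset = virtual_addr % PAGE_SIZE;  /* position inside that page */
    if (page >= NUM_PAGES || page_table[page] == NOT_MAPPED) {
        return -1;
    }
    *physical = page_table[page] * PAGE_SIZE + offset;
    return 0;
}
```

With this made-up table, virtual address 0x1010 (page 1, offset 0x10) lands in frame 9, giving physical address 0x9010, while 0x2000 falls in the unmapped hole and is rejected.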
Note that the positions of, e.g., the stack and heap may be in a different order on some systems (see Billy O'Neal's answer below for more details on Win32).
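One way to see where your own system puts these regions is to print a sample address from each. The sketch below is only illustrative: the names are invented for this example, the region each object lands in is typical rather than guaranteed, and the printed values change from run to run on systems with address randomization.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* initialized: typically placed in the data segment */
int uninitialized_global;     /* zero-initialized: typically placed in bss */

/* Prints one sample address from each region.
   Returns 0 on success, -1 if the heap allocation failed. */
int print_region_addresses(void) {
    int local = 0;                           /* automatic storage: the stack */
    int *dynamic = malloc(sizeof *dynamic);  /* dynamic storage: the heap */
    if (dynamic == NULL) {
        return -1;
    }
    /* Converting a function pointer to an integer is implementation-defined,
       but it works on mainstream platforms. */
    printf("text  0x%" PRIxPTR "\n", (uintptr_t)&print_region_addresses);
    printf("data  %p\n", (void *)&initialized_global);
    printf("bss   %p\n", (void *)&uninitialized_global);
    printf("heap  %p\n", (void *)dynamic);
    printf("stack %p\n", (void *)&local);
    free(dynamic);
    return 0;
}
```

Calling print_region_addresses() from main and comparing the numbers shows which regions sit high or low in the address space on your platform.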
Other systems can be very different. DOS, for instance, ran in real mode, and its memory layout when running programs looked quite different:
+-----------+ top of memory
| extended | above the high memory area, and up to your total memory; needed drivers to
| | be able to access it.
+-----------+ 0x110000
| high | just over 1MB->1MB+64KB, used by 286s and above.
+-----------+ 0x100000
| upper | upper memory area, from 640kb->1MB, had mapped memory for video devices, the
| | DOS "transient" area, etc. some was often free, and could be used for drivers
+-----------+ 0xA0000
| USER PROC | user process address space, from the end of DOS up to 640KB
+-----------+
|command.com| DOS command interpreter
+-----------+
| DOS | DOS permanent area, kept as small as possible, provided routines for display,
| kernel | *basic* hardware access, etc.
+-----------+ 0x600
| BIOS data | BIOS data area, contained simple hardware descriptions, etc.
+-----------+ 0x400
| interrupt | the interrupt vector table, starting from 0 and going to 1k, contained
| vector | the addresses of routines called when interrupts occurred. e.g.
| table | interrupt 0x21 checked the address at 0x21*4 and far-jumped to that
| | location to service the interrupt.
+-----------+ 0x0
You can see that DOS allowed direct access to the operating system memory, with no protection, which meant that user-space programs could generally directly access or overwrite anything they liked.
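The address arithmetic behind that map can be sketched in a few lines of C. The function names are invented for this example; the formulas are the standard real-mode ones (segment times 16 plus offset, and four bytes per interrupt vector).

```c
#include <stdint.h>

/* Real-mode linear address: the segment shifted left by 4 bits, plus the offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* Each interrupt vector is a 4-byte far pointer (2-byte offset, then 2-byte
   segment), so the entry for vector n starts at linear address n * 4. */
uint32_t vector_table_entry(uint8_t interrupt_number) {
    return (uint32_t)interrupt_number * 4u;
}
```

For instance, vector_table_entry(0x21) gives 0x84, linear_address(0xA000, 0) gives 0xA0000 (the start of the upper memory area), and linear_address(0xFFFF, 0xFFFF) gives 0x10FFEF, just under 1MB+64KB, which is why the high memory area ends where it does.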
Within the process address space, however, programs tended to look similar, only the regions were described as the code segment, data segment, heap, stack segment, etc., and they were mapped a little differently. But most of the general areas were still there.
Upon loading the program and the necessary shared libraries into memory, and distributing the parts of the program into the right areas, the OS begins executing your process wherever its main function is located, and your program takes over from there, making system calls as it needs them.
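To make "making system calls" concrete, here is a small sketch that asks the OS to write to standard output. It assumes a POSIX system (write and STDOUT_FILENO come from unistd.h), and say_hello is a name invented for this example.

```c
#include <string.h>
#include <unistd.h>  /* POSIX only: declares write() and STDOUT_FILENO */

/* Hands a message to the kernel through the write() system call wrapper.
   Returns the number of bytes written, or -1 on error. */
long say_hello(void) {
    const char *message = "hello from user space\n";
    return (long)write(STDOUT_FILENO, message, strlen(message));
}
```

Higher-level functions such as printf eventually end up in a call like this one; the program itself never touches the display hardware.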
Different systems (embedded, whatever) may have very different architectures, such as stackless systems, Harvard architecture systems (with code and data being kept in separate physical memory), systems which actually keep the BSS in read-only memory (initially set by the programmer), etc. But this is the general gist.
You said:
I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer.
"Stack" and "heap" are just abstract concepts, rather than (necessarily) physically distinct "kinds" of memory.
A stack is merely a last-in, first-out data structure. On the x86 architecture, it can also be addressed randomly by using an offset from the top (the stack pointer), but the most common operations are PUSH and POP, which add and remove items, respectively. It is commonly used for function-local variables (so-called "automatic storage"), function arguments, return addresses, etc. (more below).
A "heap" is just a nickname for a chunk of memory that can be allocated on demand, and is addressed randomly (meaning you can access any location in it directly). It is commonly used for data structures that you allocate at runtime (in C++, using new and delete; in C, using malloc and friends; etc.).
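The last-in, first-out behaviour and the offset-based access described for the stack can be modelled with a small array-backed structure. This is only a sketch of the abstract data structure, with names (IntStack, stack_push, stack_pop, stack_peek) invented for the example; the hardware stack is managed by the CPU's stack pointer, not by code like this.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 16

typedef struct {
    int items[STACK_CAPACITY];
    size_t top;  /* number of items currently stored */
} IntStack;

/* PUSH: add an item on top. Returns false if the stack is full. */
bool stack_push(IntStack *s, int value) {
    if (s->top == STACK_CAPACITY) return false;
    s->items[s->top++] = value;
    return true;
}

/* POP: remove the most recently pushed item. Returns false if empty. */
bool stack_pop(IntStack *s, int *out) {
    if (s->top == 0) return false;
    *out = s->items[--s->top];
    return true;
}

/* Random access relative to the top without popping, similar in spirit to
   reading memory at an offset from the stack pointer. depth 0 is the top. */
bool stack_peek(const IntStack *s, size_t depth, int *out) {
    if (depth >= s->top) return false;
    *out = s->items[s->top - 1 - depth];
    return true;
}
```

Pushing 1, 2, 3 and then popping yields 3, 2, 1, while stack_peek with depth 2 reads the 1 at the bottom without disturbing anything above it.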
The stack and heap, on the x86 architecture, both physically reside in your system memory (RAM), and are mapped through virtual memory allocation into the process address space as described above.
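For comparison with automatic (stack) storage, here is a minimal sketch of heap use in C: the size is only known at run time, the memory outlives the function that allocated it, and the caller must release it. The helper name make_sequence is invented for this example.

```c
#include <stdlib.h>

/* Allocates an array of `count` ints on the heap and fills it with 0, 1, 2, ...
   Returns NULL if the allocation fails. The caller must free() the result. */
int *make_sequence(size_t count) {
    int *values = malloc(count * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = (int)i;  /* any element can be accessed directly by index */
    }
    return values;
}
```

A caller would write something like int *seq = make_sequence(5); use seq[0] through seq[4], and then call free(seq) when done.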
The registers (still on x86) physically reside inside the processor (as opposed to RAM). The processor loads them with values from the TEXT area (for example, constants embedded in instructions), from elsewhere in memory, or from other places, depending on the CPU instructions that are actually executed. They are essentially just very small, very fast on-chip storage locations that are used for a number of different purposes.
Register layout is highly dependent on the architecture (in fact, registers, the instruction set, and memory layout/design are exactly what is meant by "architecture"), so I won't expand upon it, but I recommend you take an assembly language course to understand them better.
Your question:
At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?
The stack (in systems/languages that have and use one) is most often used like this:
int mul( int x, int y ) {
    return x * y;  // this MULtiplies the two arguments (read from the stack
                   // under the classic 32-bit x86 C convention), leaves the
                   // result where the caller expects it (the EAX register in
                   // that convention), then issues a RET, which pops the
                   // return address and jumps back to the CALLer.
}

int main() {
    int x = 2, y = 3;  // these variables are stored on the stack
    mul( x, y );       // this pushes y onto the stack, then x, then issues an
                       // assembly CALL instruction, which pushes the return
                       // address and jumps to mul.
}
Write a simple program like this, compile it to assembly without optimization (gcc -S foo.c, if you have access to GCC), and take a look. The assembly is pretty easy to follow. You can see that the stack is used for function-local variables and for calling functions: storing their arguments, return addresses, and (for some types and calling conventions) return values. This is also why when you do something like:
f( g( h( i ) ) );
All of these get called in turn. It's literally building up a stack of function calls and their arguments, executing them, and then popping them off as it winds back down (or up ;). However, as mentioned above, the stack (on x86) actually resides in your process memory space (in virtual memory), and so it can be manipulated directly; it's not a separate step during execution (or at least it is orthogonal to the process).
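To watch the order in which such calls run, a small trace can help. This is a sketch with made-up function bodies and a made-up call_trace buffer; in C an argument must be fully evaluated before the call that uses it, so h runs first, then g, then f.

```c
#include <stddef.h>

#define TRACE_CAPACITY 8
static char call_trace[TRACE_CAPACITY + 1];  /* order in which functions started */
static size_t trace_length = 0;

/* Appends one letter to the trace, keeping it NUL-terminated. */
static void record(char name) {
    if (trace_length < TRACE_CAPACITY) {
        call_trace[trace_length++] = name;
        call_trace[trace_length] = '\0';
    }
}

int h(int i) { record('h'); return i + 1; }
int g(int i) { record('g'); return i * 2; }
int f(int i) { record('f'); return i - 3; }

/* Evaluates f(g(h(i))) and returns the result; call_trace then holds "hgf". */
int nested_calls(int i) {
    trace_length = 0;
    call_trace[0] = '\0';
    return f(g(h(i)));
}
```

nested_calls(4) returns 7 (4 + 1 = 5, 5 * 2 = 10, 10 - 3 = 7), and the trace reads "hgf": each call gets its own frame for its argument and return address, and that frame is released when the call returns.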
FYI, the above is the C calling convention, also used by C++. Other languages/systems may push arguments onto the stack in a different order, and some languages/platforms don't even use stacks, and go about it in different ways.
Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there. [This was incorrect. See Ben Voigt's correction below.]