What does "dereferencing" a pointer mean?

2022-01-30 00:00:00 pointers c c++ dereference

Please include an example with the explanation.

Recommended answer

Reviewing the basic terminology

It's usually good enough - unless you're programming assembly - to envisage a pointer containing a numeric memory address, with 1 referring to the second byte in the process's memory, 2 the third, 3 the fourth and so on....

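  • What happened to 0 and the first byte? Well, we'll get to that later - see null pointers below.
  • For a more accurate definition of what pointers store, and how memory and addresses relate, see "More about memory addresses, and why you probably don't need to know" at the end of this answer.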

When you want to access the data/value in the memory that the pointer points to - the contents of the address with that numerical index - then you dereference the pointer.

Different computer languages have different notations to tell the compiler or interpreter that you're now interested in the pointed-to object's (current) value - I focus below on C and C++.

Consider in C, given a pointer such as p below...

const char* p = "abc";

...four bytes holding the numerical values used to encode the letters 'a', 'b', 'c', plus a 0 byte to denote the end of the textual data, are stored somewhere in memory, and the numerical address of that data is stored in p. This way of encoding text in memory is known as ASCIIZ.

For example, if the string literal happened to be at address 0x1000 and p a 32-bit pointer at 0x2000, the memory content would be:

Memory Address (hex)    Variable name    Contents
1000                                     'a' == 97 (ASCII)
1001                                     'b' == 98
1002                                     'c' == 99
1003                                     0
...
2000-2003               p                1000 hex
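
As a rough sketch of the table above, the helpers below (hypothetical names, not part of the original answer) read individual bytes through a pointer and show that the pointer variable occupies its own storage, separate from the text it points to. Real addresses will differ from the illustrative 0x1000 and 0x2000, and 'a' == 97 only holds for ASCII-compatible character sets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of the byte stored `offset` bytes
   after the address held in `base`. */
int byte_value_at(const char* base, size_t offset)
{
    assert(base != NULL);
    return (unsigned char)base[offset];
}

/* The pointer variable is itself an object in memory with its own address
   (0x2000 in the table), distinct from the address it stores (0x1000). */
int pointer_has_own_storage(void)
{
    const char* p = "abc";
    const void* where_p_lives = &p;    /* address of the pointer variable */
    const void* where_text_lives = p;  /* address stored in the pointer   */
    return where_p_lives != where_text_lives;
}
```

Calling byte_value_at("abc", 3) reads the terminating 0 byte, which corresponds to the row for address 1003 in the table.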

Note that there is no variable name/identifier for address 0x1000, but we can indirectly refer to the string literal using a pointer storing its address: p.

To refer to the characters p points to, we dereference p using one of these notations (again, for C):

assert(*p == 'a');  // The first character at address p will be 'a'
assert(p[1] == 'b'); // p[1] actually dereferences a pointer created by adding
                     // p and 1 times the size of the things to which p points:
                     // In this case they're char which are 1 byte in C...
assert(*(p + 1) == 'b');  // Another notation for p[1]

You can also move pointers through the pointed-to data, dereferencing them as you go:

++p;  // Increment p so it's now 0x1001
assert(*p == 'b');  // p == 0x1001 which is where the 'b' is...
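
Moving a pointer through data and dereferencing it at each step is how most C string loops work. Here is a minimal sketch, where count_char is a hypothetical helper introduced only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many times `wanted` occurs in the text. */
size_t count_char(const char* text, char wanted)
{
    assert(text != NULL);
    size_t count = 0;
    for (const char* p = text; *p != '\0'; ++p)  /* stop at the 0 terminator   */
        if (*p == wanted)                        /* dereference the current p  */
            ++count;
    return count;
}
```

The loop never modifies text itself; it advances a copy, so the caller's pointer still refers to the start of the string afterwards.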

If you have some data that can be written to, then you can do things like this:

int x = 2;
int* p_x = &x;  // Put the address of the x variable into the pointer p_x
*p_x = 4;       // Change the memory at the address in p_x to be 4
assert(x == 4); // Check x is now 4

Above, you must have known at compile time that you would need a variable called x, and the code asks the compiler to arrange where it should be stored, ensuring the address will be available via &x.
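
Writing through a dereferenced pointer is also what lets a function change its caller's variables. A small sketch, with swap_ints as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchanges the two ints whose addresses are passed in. */
void swap_ints(int* a, int* b)
{
    assert(a != NULL && b != NULL);
    int old_a = *a;  /* read the value a points to        */
    *a = *b;         /* overwrite it with the value at b  */
    *b = old_a;      /* store the saved value at b        */
}
```

Given int x = 2, y = 4; the call swap_ints(&x, &y); leaves x holding 4 and y holding 2.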

In C, if you have a variable that is a pointer to a structure with data members, you can access those members using the -> dereferencing operator:

typedef struct X { int i_; double d_; } X;
X x;
X* p = &x;
p->d_ = 3.14159;  // Dereference and access data member x.d_
(*p).d_ *= -1;    // Another equivalent notation for accessing x.d_
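
The same two notations work when the structure is reached through a function parameter. In this sketch, init_x is a hypothetical helper; the struct is the one from the snippet above:

```c
#include <assert.h>
#include <stddef.h>

typedef struct X { int i_; double d_; } X;

/* Hypothetical helper: fills in both members of the structure p points to. */
void init_x(X* p, int i, double d)
{
    assert(p != NULL);
    p->i_ = i;      /* same as (*p).i_ = i */
    (*p).d_ = d;    /* same as p->d_ = d   */
}
```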

Multi-byte data types

To use a pointer, a computer program also needs some insight into the type of data that is being pointed at - if that data type needs more than one byte to represent, then the pointer normally points to the lowest-numbered byte in the data.

So, looking at a slightly more complex example:

double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
double* p = sizes;
assert(p[0] == 10.3);  // Knows to look at all the bytes in the first double value
assert(p[1] == 13.4);  // Actually looks at bytes from address p + 1 * sizeof(double)
                       // (sizeof(double) is almost always eight bytes)
++p;                   // Advance p by sizeof(double)
assert(*p == 13.4);    // The double at memory beginning at address p has value 13.4
*(p + 2) = 29.8;       // Change sizes[3] from 19.4 to 29.8
                       // Note earlier ++p and + 2 here => sizes[3]
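
To see the scaling by sizeof(double) directly, you can view the same addresses as char pointers and subtract them. This is a sketch only; double_stride_in_bytes is a hypothetical helper and assumes first points into an array of doubles:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes between element 0 and element 1
   of an array of doubles, measured by viewing both addresses as char*. */
size_t double_stride_in_bytes(const double* first)
{
    assert(first != NULL);
    const char* byte0 = (const char*)first;        /* lowest byte of first[0] */
    const char* byte1 = (const char*)(first + 1);  /* lowest byte of first[1] */
    return (size_t)(byte1 - byte0);
}
```

The result equals sizeof(double) on any conforming implementation, whatever that size happens to be.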

Pointers to dynamically allocated memory

Sometimes you don't know how much memory you'll need until your program is running and sees what data is thrown at it... then you can dynamically allocate memory using malloc. It is common practice to store the address in a pointer...

int* p = (int*)malloc(sizeof(int)); // Get some memory somewhere...
*p = 10;            // Dereference the pointer to the memory, then write a value in
fn(*p);             // Call a function, passing it the value at address p
(*p) += 3;          // Change the value, adding 3 to it
free(p);            // Release the memory back to the heap allocation library
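
When the amount of memory is only known at run time, the count is usually passed to malloc as part of the size. A minimal sketch, assuming count is small enough that the size multiplication cannot overflow; make_squares and squares_demo are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` ints and fills them with squares.
   Returns NULL if count is 0 or allocation fails; the caller must free(). */
int* make_squares(size_t count)
{
    if (count == 0)
        return NULL;
    int* squares = malloc(count * sizeof *squares);
    if (squares == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i)
        squares[i] = (int)(i * i);  /* write through the pointer by index */
    return squares;
}

/* Usage sketch: allocate, read through the pointer, then release. */
void squares_demo(void)
{
    int* squares = make_squares(4);
    if (squares != NULL) {
        assert(squares[3] == 9);
        free(squares);
    }
}
```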

In C++, memory allocation is normally done with the new operator, and deallocation with delete:

int* p = new int(10); // Memory for one int with initial value 10
delete p;

p = new int[10];      // Memory for ten ints with unspecified initial value
delete[] p;

p = new int[10]();    // Memory for ten ints that are value initialised (to 0)
delete[] p;

See also C++ smart pointers below.

Often a pointer may be the only indication of where some data or buffer exists in memory. If ongoing use of that data/buffer is needed, or the ability to call free() or delete to avoid leaking the memory, then the programmer must operate on a copy of the pointer...

char* p;
if (asprintf(&p, "name: %s", name) == -1)  // Common but non-Standard printf-on-heap
    exit(EXIT_FAILURE);                     // Allocation failed, p is unusable

// Replace non-printable characters with underscores....
for (char* q = p; *q; ++q)
    if (!isprint((unsigned char)*q))
        *q = '_';

printf("%s\n", p); // Only q was moved, p still points at the start
free(p);

...or carefully orchestrate reversal of any changes...

const size_t n = ...;
p += n;
...
p -= n;  // Restore earlier value...
free(p);

C++ smart pointers

In C++, it's best practice to use smart pointer objects to store and manage the pointers, automatically deallocating them when the smart pointers' destructors run. Since C++11 the Standard Library provides two, unique_ptr for when there's a single owner for an allocated object...

{
    std::unique_ptr<T> p{new T(42, "meaning")};
    call_a_function(p);
    // The function above might throw, so delete here is unreliable, but...
} // p's destructor's guaranteed to run "here", calling delete

...and shared_ptr for shared ownership (using reference counting)...

{
    auto p = std::make_shared<T>(3.14, "pi");
    number_storage1.may_add(p); // Might copy p into its container
    number_storage2.may_add(p); // Might copy p into its container
} // p's destructor will only delete the T if neither may_add copied it

Null pointers

In C, NULL and 0 - and additionally in C++ nullptr - can be used to indicate that a pointer doesn't currently hold the memory address of a variable, and shouldn't be dereferenced or used in pointer arithmetic. For example:

const char* p_filename = NULL; // Or "= 0", or "= nullptr" in C++
int c;
while ((c = getopt(argc, argv, "f:")) != -1)
    switch (c) {
      case 'f': p_filename = optarg; break;
    }
if (p_filename)  // Only NULL converts to false
    ...   // Only get here if -f flag specified
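
The same test is worth making inside any function that might be handed a null pointer. A small sketch, with length_or_zero and length_or_zero_demo as hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text, or 0 when no text was supplied. */
size_t length_or_zero(const char* text)
{
    if (text == NULL)     /* never dereference a null pointer */
        return 0;
    return strlen(text);  /* safe: text holds a real address  */
}

/* Usage sketch covering both the null and the non-null case. */
void length_or_zero_demo(void)
{
    const char* p_filename = NULL;
    assert(length_or_zero(p_filename) == 0);
    p_filename = "data.txt";
    assert(length_or_zero(p_filename) == 8);
}
```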

In C and C++, just as inbuilt numeric types don't necessarily default to 0, nor bools to false, pointers are not always set to NULL. All these are set to 0/false/NULL when they're static variables or (C++ only) direct or indirect member variables of static objects or their bases, or undergo zero initialisation (e.g. new T(); and new T(x, y, z); perform zero-initialisation on T's members including pointers, whereas new T; does not).

Further, when you assign 0, NULL or nullptr to a pointer, the bits in the pointer are not necessarily all reset: the pointer may not contain "0" at the hardware level, or refer to address 0 in your virtual address space. The compiler is allowed to store something else there if it has reason to, but whatever it does - if you come along and compare the pointer to 0, NULL, nullptr or another pointer that was assigned any of those, the comparison must work as expected. So, below the source code at the compiler level, "NULL" is potentially a bit "magical" in the C and C++ languages...

More about memory addresses, and why you probably don't need to know

More strictly, initialised pointers store a bit-pattern identifying either NULL or a (often virtual) memory address.

The simple case is where this is a numeric offset into the process's entire virtual address space; in more complex cases the pointer may be relative to some specific memory area, which the CPU may select based on CPU "segment" registers or some manner of segment id encoded in the bit-pattern, and/or looking in different places depending on the machine code instructions using the address.

For example, an int* properly initialised to point to an int variable might - after casting to a float* - access "GPU" memory quite distinct from the memory where the int variable is, then once cast to and used as a function pointer it might point into further distinct memory holding machine opcodes for the program (with the numeric value of the int* effectively a random, invalid pointer within these other memory regions).

3GL programming languages like C and C++ tend to hide this complexity, such that:

  • If the compiler gives you a pointer to a variable or function, you can dereference it freely (as long as the variable's not destructed/deallocated meanwhile) and it's the compiler's problem whether e.g. a particular CPU segment register needs to be restored beforehand, or a distinct machine code instruction used

  • If you get a pointer to an element in an array, you can use pointer arithmetic to move anywhere else in the array, or even to form an address one-past-the-end of the array that's legal to compare with other pointers to elements in the array (or that have similarly been moved by pointer arithmetic to the same one-past-the-end value); again in C and C++, it's up to the compiler to ensure this "just works"

  • Specific OS functions, e.g. shared memory mapping, may give you pointers, and they'll "just work" within the range of addresses that makes sense for them
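
The one-past-the-end rule is what makes the usual "begin/end" loop legal. A minimal sketch, where sum_doubles is a hypothetical helper and first is assumed to point at an array with at least count elements:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up `count` doubles starting at `first`. */
double sum_doubles(const double* first, size_t count)
{
    assert(first != NULL);
    const double* end = first + count;  /* one past the end: legal to form and compare */
    double total = 0.0;
    for (const double* p = first; p != end; ++p)
        total += *p;                    /* only pointers before end are dereferenced */
    return total;
}
```

Note that end is only ever compared, never dereferenced; reading *end would be outside the array and have undefined behaviour.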

Attempts to move legal pointers beyond these boundaries, or to cast arbitrary numbers to pointers, or use pointers cast to unrelated types, typically have undefined behaviour, so should be avoided in higher level libraries and applications, but code for OSes, device drivers, etc. may need to rely on behaviour left undefined by the C or C++ Standard, that is nevertheless well defined by their specific implementation or hardware.
