Why is char[][] = {{...}, {...}} not possible if explicitly given a multidimensional array?

I went through this article. I understand the rules explained but I am wondering what exactly blocks the compiler from accepting the following syntax when defining a constant multi-dimensional array and directly initializing it with known values of given type:

const int multi_arr1[][] = {{1,2,3}, {1,2,3}}; // why not?
const int multi_arr2[][3] = {{1,2,3}, {1,2,3}}; // OK

error: declaration of 'multi_arr1' as multidimensional array must have bounds
       for all dimensions except the first

What prevents the compiler from looking to the right and realizing that we are dealing with 3 elements for each "subarray" or possibly returning an error only for cases when the programmer passes e.g. a different number of elements for each subarray like {1,2,3}, {1,2,3,4}?

For example when dealing with a 1D char array the compiler can look at the string on the right hand side of = and this is valid:

const char str[] = "Str";

I would like to understand what's happening so that the compiler is not able to deduce the array dimensions and calculate the size for allocation since now it seems to me like the compiler has all the information needed to do so. What am I missing here?

Recommended answer

Requiring the compiler to infer inner dimensions from the initializers would require the compiler to work retroactively in a way the standard avoids.

The standard allows objects being initialized to refer to themselves. For example:

struct foo { struct foo *next; int value; } head = { &head, 0 };

This defines a node of a linked list that initially points to itself. (Presumably, more nodes would be inserted later.) This is valid because C 2011 [N1570] 6.2.1 7 says the identifier head "has scope that begins just after the completion of its declarator." A declarator is the part of the grammar of a declaration that includes the identifier name along with the array, function, and/or pointer parts of the declaration (for example, f(int, float) and *a[3] are declarators in declarations such as float f(int, float) or int *a[3]).

Because of 6.2.1 7, a programmer could write this definition:

void *p[][1] = { { p[1] }, { p[0] } };

Consider the initializer p[1]. This is an array, so it is automatically converted to a pointer to its first element, p[1][0]. The compiler knows that address because it knows p[i] is an array of 1 void * (for any value of i). If the compiler did not know how big p[i] was, it could not calculate this address. So, if the C standard allowed us to write:

void *p[][] = { { p[1] }, { p[0] } };

then the compiler would have to continue scanning past p[1] so it can count the number of initializers given for the second dimension (just one in this case, but we have to scan at least to the } to see that, and it could be many more), then go back and calculate the value of p[1].

The standard avoids forcing compilers to do this sort of multiple-pass work. Requiring compilers to infer the inner dimensions would violate this goal, so the standard does not do it.

(In fact, I think the standard might not require the compiler to do any more than a finite amount of look-ahead, possibly just a few characters during tokenization and a single token while parsing the grammar, but I am not sure. Some things have values not known until link time, such as void (*p)(void) = &SomeFunction;, but those are filled in by the linker.)

Additionally, consider a definition such as:

char x[][] =
    {
        {  0,  1 },
        { 10, 11 },
        { 20, 21, 22 }
    };

As the compiler reads the first two lines of initial values, it may want to prepare a copy of the array in memory. So, when it reads the first line, it will store two values. Then it sees the line end, so it can assume for the moment the inner dimension is 2, forming char x[][2]. When it sees the second line, it allocates more memory (as with realloc) and continues, storing the next two values, 10 and 11, in their appropriate places.

When it reads the third line and sees 22, it realizes the inner dimension is at least three. Now the compiler cannot simply allocate more memory. It has to rearrange where 10 and 11 are in memory relative to 0 and 1, because there is a new element between them; x[0][2] now exists and has a value of 0 (so far). So requiring the compiler to infer the inner dimensions while also allowing different numbers of initializers in each subarray (and inferring the inner dimension based on the maximum number of initializers seen throughout the entire list) can burden the compiler with a lot of memory motion.
