Operator Precedence vs Order of Evaluation
The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:
1. Between evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
2. Between the evaluation of the first operand of the ternary "question-mark" operator and the second or third operand. For example, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
3. At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
4. Before a function is entered in a function call. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), f is called with a parameter of the original value of i, but i is incremented before entering the body of f. Similarly, j and k are updated before entering g and h respectively. However, it is not specified in which order f(), g(), h() are executed, nor in which order i, j, k are incremented. The values of j and k in the body of f are therefore undefined. Note that a function call f(a,b,c) is not a use of the comma operator and the order of evaluation for a, b, and c is unspecified.
5. At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard; it is present only implicitly in C.)
6. At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Line 2 modifies a more than once before the sequence point at the end of the full expression, so it clearly leads to Undefined Behavior. This shows how Undefined Behavior is tightly coupled with Sequence Points.
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now it's evident that Line 5 will make the variable result store 1.
Now the expression x<y<z in Line 5 can be evaluated as either x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know that when the operator precedence is equal, associativity comes into play; hence, the expression is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article:
It mentions "Expressions with higher-precedence operators are evaluated first."
It may sound incorrect. But I think the article is not saying something wrong if we consider that () is also an operator, so x<y<z is the same as (x<y)<z. My reasoning is that if associativity did not come into play, then the evaluation of the complete expression would become ambiguous, since < is not a Sequence Point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So, taking the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since the simplest form of an expression consists of a single literal constant or object. Hence the results of the expressions x, y, z would be their rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.
Now, doesn't Associativity come into play, since now we have 2 expressions to be evaluated, either 10<1 or 1<2, and since the precedence of the operators is the same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );
Now in the above example, since the comma operator has the same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In ISO/IEC 9899:201x under J.1 Unspecified behavior it mentions:
The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question. The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Solution
Yes, the MSDN article is in error, at least with respect to standard C and C++ [1].
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly; there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:
push(z); // Evaluates its argument and pushes value on stack
push(y);
push(x);
test_less(); // compares TOS to TOS(1), pushes result on stack
test_less();
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:
push(z);
push(y);
push(x);
mul();
add();
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:
push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just as if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies.
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:
load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
    load variable[i]
    add_with_carry 0
    store variable[i]
}
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
[1] I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false; they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.