What's better in a foreach loop: using the & symbol or reassigning based on key?

2021-12-26 00:00:00 arrays reference foreach php assign

Consider the following PHP Code:

//Method 1
$array = array(1,2,3,4,5);
foreach($array as $i=>$number){
  $number++;
  $array[$i] = $number;
}
print_r($array);


//Method 2
$array = array(1,2,3,4,5);
foreach($array as &$number){
  $number++;
}
print_r($array);

Both methods accomplish the same task, one by assigning a reference and the other by reassigning based on key. I want to use good programming techniques in my work, and I wonder which method is the better programming practice. Or is this one of those "it doesn't really matter" things?

Recommended answer

Since the highest scoring answer states that the second method is better in every way, I feel compelled to post an answer here. True, looping by reference is more performant, but it isn't without risks/pitfalls.
Bottom line, as always: to "Which is better, X or Y?", the only real answers you can get are:

  • It depends on what you're after/what you're doing
  • Oh, both are OK, if you know what you're doing
  • X is good for Such, Y is better for So
  • Don't forget about Z, and even then ...("which is better X, Y or Z" is the same question, so the same answers apply: it depends, both are ok if...)

Be that as it may, as Orangepill showed, the reference approach offers better performance. In this case, the tradeoff is one of performance vs. code that is less error-prone and easier to read/maintain. In general, it's considered better to go for safer, more reliable, and more maintainable code:

'Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.' — Brian Kernighan

I guess that means the first method has to be considered best practice. But that doesn't mean the second approach should be avoided at all times, so what follows here are the downsides, pitfalls and quirks that you'll have to take into account when using a reference in a foreach loop:

Scope:
For a start, PHP isn't truly block-scoped like C(++), C#, Java, Perl or (with a bit of luck) ECMAScript6... That means that the $value variable will not be unset once the loop has finished. When looping by reference, this means a reference to the last value of whatever object/array you were iterating is floating around. The phrase "an accident waiting to happen" should spring to mind.
Consider what happens to $value, and subsequently $array, in the following code:

$array = range(1,10);
foreach($array as &$value)
{
    $value++;
}
echo json_encode($array);
$value++;
echo json_encode($array);
$value = 'Some random value';
echo json_encode($array);

The output of this snippet will be:

[2,3,4,5,6,7,8,9,10,11]
[2,3,4,5,6,7,8,9,10,12]
[2,3,4,5,6,7,8,9,10,"Some random value"]

In other words, by reusing the $value variable (which references the last element in the array), you're actually manipulating the array itself. This makes for error-prone code, and difficult debugging. As opposed to:

$array = range(1,10);
$array[] = 'foobar';
foreach($array as $k => $v)
{
    $array[$k]++;//increments foobar, to foobas!
    if ($array[$k] === ($v +1))//$v + 1 yields 1 if $v === 'foobar' (PHP 7, with a warning; PHP 8 throws a TypeError instead)
    {//so 'foobas' === 1 => false
        $array[$k] = $v;//restore initial value: foobar
    }
}
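The dangling reference also explains a well-known gotcha: reusing the same variable name in a later by-value foreach quietly overwrites the last element of the array. A minimal sketch, with made-up sample data:

```php
<?php
// Sketch of the dangling-reference gotcha; the data is illustrative only.
$array = [1, 2, 3];

foreach ($array as &$value) {
    $value *= 10;           // $array is now [10, 20, 30]
}
// No unset($value) here: $value still references $array[2].

foreach ($array as $value) {
    // Each by-value iteration assigns to $value, which writes into $array[2].
}

echo json_encode($array), PHP_EOL; // [10,20,20]
```

The second loop never mentions `$array[2]`, yet it changes it on every iteration, and by the time it reaches the last element that element already holds 20.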

Maintainability/idiot-proofness:
Of course, you might say that the dangling reference is an easy fix, and you'd be right:

foreach($array as &$value)
{
    $value++;
}
unset($value);
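A common worry is that `unset($value)` might delete the last array element. It doesn't: it only removes the variable (and with it, the alias). A small sketch reusing the data from the Scope example, to show that a later assignment no longer leaks into the array:

```php
<?php
$array = range(1, 10);
foreach ($array as &$value) {
    $value++;
}
unset($value); // removes the variable $value only; $array keeps all ten elements

$value = 'Some random value'; // a brand-new variable, unrelated to $array
echo json_encode($array), PHP_EOL; // [2,3,4,5,6,7,8,9,10,11]
```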

But after you've written your first 100 loops with references, do you honestly believe you won't have forgotten to unset a single reference? Of course not! It's so uncommon to unset variables that have been used in a loop (we assume the GC will take care of it for us), so most of the time, you don't bother. When references are involved, this is a source of frustration, mysterious bug-reports, or traveling values, where you're using complex nested loops, possibly with multiple references... The horror, the horror.
Besides, as time passes, who's to say that the next person working on your code won't forget about unset? Who knows, they might not even know about references, or might see your numerous unset calls, deem them redundant (a sign of your being paranoid), and delete them altogether. Comments alone won't help you: they need to be read, and everyone working with your code should be thoroughly briefed, perhaps by having them read a full article on the subject. The examples listed in the linked article are bad, but I've seen worse, still:

foreach($nestedArr as &$array)
{
    if (count($array)%2 === 0)
    {
        foreach($array as &$value)
        {//pointless, but you get the idea...
            $value = array($value, 'Part of even-length array');
        }
        //$value now references the last index of $array
    }
    else
    {
        $value = array_pop($array);//assigns new value to var that might be a reference!
        $value = is_numeric($value) ? $value/2 : null;
        array_push($array, $value);//congrats, X-references ==> traveling value!
    }
}

This is a simple example of a traveling value problem. I did not make this up, BTW, I've come across code that boils down to this... honestly. Quite apart from spotting the bug, and understanding the code (which has been made more difficult by the references), it's still quite obvious in this example, mainly because it's a mere 15 lines long, even using the spacious Allman coding style... Now imagine this basic construct being used in code that actually does something even slightly more complex, and meaningful. Good luck debugging that.
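For comparison, here is a sketch of the same logic written with keys (the first method) and no references at all. The sample `$nestedArr` data is made up for illustration. Every write goes through an explicit `$nestedArr[$i]` index, so there is no alias left behind for a value to travel through:

```php
<?php
// Key-based rewrite of the nested-loop example above; sample data is hypothetical.
$nestedArr = [[1, 2], [3, 4, 5], ['a', 'b', 'c']];

foreach ($nestedArr as $i => $array) {
    if (count($array) % 2 === 0) {
        foreach ($array as $j => $value) {
            // write back by key instead of through a reference
            $nestedArr[$i][$j] = [$value, 'Part of even-length array'];
        }
    } else {
        $value = array_pop($array);   // $array and $value are plain copies here
        $value = is_numeric($value) ? $value / 2 : null;
        array_push($array, $value);
        $nestedArr[$i] = $array;      // the only place the outer array changes
    }
}

echo json_encode($nestedArr), PHP_EOL;
```

It is a few lines longer, but you can tell which element changes by reading each assignment.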

Side-effects:
It's often said that functions shouldn't have side-effects, because side-effects are (rightfully) considered to be code-smell. Though foreach is a language construct, and not a function, in your example, the same mindset should apply. When using too many references, you're being too clever for your own good, and might find yourself having to step through a loop, just to know what is being referenced by what variable, and when.
The first method hasn't got this problem: you have the key, so you know where you are in the array. What's more, with the first method, you can perform any number of operations on the value, without changing the original value in the array (no side-effects):

function recursiveFunc($n, $max = 10)
{
    if (--$max)
    {
        return $n === 1 ? 10-$max : recursiveFunc($n%2 ? ($n*3)+1 : $n/2, $max);
    }
    return null;
}
$array = range(10,20);
foreach($array as $k => $v)
{
    $v = recursiveFunc($v);//reassigning $v here
    if ($v !== null)
    {
        $array[$k] = $v;//only now, will the actual array change
    }
}
echo json_encode($array);

这会生成输出:

[7,11,12,13,14,15,5,17,18,19,8]

As you can see, the first, seventh and eleventh (last) elements have been altered; the others haven't. If we were to rewrite this code using a loop by reference, the loop looks a lot smaller, but the output will be different (we have a side-effect):

$array = range(10,20);
foreach($array as &$v)
{
    $v = recursiveFunc($v);//Changes the original array...
    //granted, if your version permits it (PHP 5.3+), you'd probably write this instead:
    //$v = recursiveFunc($v) ?: $v;
}
echo json_encode($array);
//[7,null,null,null,null,null,5,null,null,null,8]

To counter this, we'll either have to create a temporary variable, or call the function twice, or add a key and recalculate the initial value of $v, but that's just plain stupid (that's adding complexity to fix what shouldn't be broken):

foreach($array as &$v)
{
    $temp = recursiveFunc($v);//creating copy here, anyway
    $v = $temp ? $temp : $v;//assignment doesn't require the lookup, though
}
//or:
foreach($array as &$v)
{
    $v = recursiveFunc($v) ? recursiveFunc($v) : $v;//2 calls === twice the overhead!
}
//or
$base = reset($array);//get the base value
foreach($array as $k => &$v)
{//silly: combine both methods to fix what needn't be a problem to begin with
    $v = recursiveFunc($v);
    if ($v === null)//recursiveFunc returns null on failure, never 0
    {
        $v = $base + $k;
    }
}

Anyway, adding branches, temp variables and what have you, rather defeats the point. For one, it introduces extra overhead which will eat away at the performance benefits references gave you in the first place.
If you have to add logic to a loop, to fix something that shouldn't need fixing, you should step back, and think about what tools you're using. 9/10 times, you chose the wrong tool for the job.

The last thing that, to me at least, is a compelling argument for the first method is simple: readability. The reference operator (&) is easily overlooked if you're doing some quick fixes, or trying to add functionality. You could be creating bugs in code that was working just fine. What's more: because it was working fine, you might not test the existing functionality as thoroughly, because there were no known issues.
Discovering a bug that went into production, because of your overlooking an operator might sound silly, but you wouldn't be the first to have encountered this.
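To make the "easily overlooked" point concrete, here is a hypothetical before/after of a refactor that lost the `&`. Nothing fails loudly; the second loop simply stops doing anything:

```php
<?php
// Hypothetical refactor: the only difference between the two loops is one character.
$withReference = [1, 2, 3];
foreach ($withReference as &$number) {
    $number++;
}
unset($number);

$withoutReference = [1, 2, 3];
foreach ($withoutReference as $number) {
    $number++; // increments a copy: no error, no warning, the array is unchanged
}

echo json_encode($withReference), PHP_EOL;    // [2,3,4]
echo json_encode($withoutReference), PHP_EOL; // [1,2,3]
```

With the key-based form, the write `$array[$i] = ...` is spelled out on its own line, so deleting it by accident is much harder to miss.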

Note:
Call-time pass-by-reference was removed in PHP 5.4. Be wary of features/functionality that are subject to change. A standard iteration of an array hasn't changed in years. I guess it's what you could call "proven technology". It does what it says on the tin, and is the safer way of doing things. So what if it's slower? If speed is an issue, you can optimize your code, and introduce references to your loops then.
When writing new code, go for the easy-to-read, most failsafe option. Optimization can (and indeed should) wait until everything's tried and tested.
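If speed does become an issue, measure before switching to references. Below is a rough timing sketch in plain PHP; `byKey()` and `byReference()` are made-up helper names, and which loop wins, and by how much, depends on your PHP version, array size and hardware, so no numbers are claimed here:

```php
<?php
// Rough micro-benchmark sketch; helper names are hypothetical, timings will vary.
function byKey(array $array)
{
    foreach ($array as $i => $number) {
        $array[$i] = $number + 1;   // method 1: reassign by key
    }
    return $array;
}

function byReference(array $array)
{
    foreach ($array as &$number) {
        $number++;                  // method 2: modify through a reference
    }
    unset($number);                 // break the reference before returning
    return $array;
}

$input = range(1, 100000);

$start = microtime(true);
$viaKey = byKey($input);
$keyTime = microtime(true) - $start;

$start = microtime(true);
$viaReference = byReference($input);
$referenceTime = microtime(true) - $start;

printf("by key: %.5f s, by reference: %.5f s\n", $keyTime, $referenceTime);
```

A single run like this is noisy; repeat it a few times on realistic data before deciding the difference matters for your code.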

As always: premature optimization is the root of all evil. And choose the right tool for the job, not the one that's new and shiny.

相关文章