About closure, LexicalEnvironment and GC

2022-01-16 00:00:00 garbage-collection closures javascript

As of ECMAScript v5, each time control enters executable code, the engine creates a LexicalEnvironment (LE) and a VariableEnvironment (VE). For function code, these two components are exactly the same reference, which is the result of calling NewDeclarativeEnvironment (ECMAScript v5 10.4.3), and all variables declared in function code are stored in the environment record component of the VariableEnvironment (ECMAScript v5 10.5). This is the basic concept behind closures.
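To make the chain of environments concrete, here is a minimal toy model. It is only a sketch: script code cannot observe real environment records, and the helper names (newDeclarativeEnvironment, resolve, f1Env, f2CallEnv) are hypothetical, chosen to mirror the specification's wording rather than any engine's internals.

```javascript
// Toy model only: script code cannot see real environment records.
// newDeclarativeEnvironment, resolve and the *Env names are hypothetical.
function newDeclarativeEnvironment(outer) {
    return { record: Object.create(null), outer: outer };
}

// Walk outward through the chain until a binding for `name` is found.
function resolve(env, name) {
    for (var e = env; e !== null; e = e.outer) {
        if (name in e.record) return e.record[name];
    }
    throw new ReferenceError(name + " is not defined");
}

var globalEnv = newDeclarativeEnvironment(null);

// Entering f1: one new environment serves as both its LE and its VE.
var f1Env = newDeclarativeEnvironment(globalEnv);
f1Env.record.o = "large object";

// Creating the inner function stores the current environment as its scope.
var f2 = { scope: f1Env };

// Calling f2: a fresh environment whose outer link is the stored scope.
var f2CallEnv = newDeclarativeEnvironment(f2.scope);

console.log(resolve(f2CallEnv, "o")); // "large object"
```

In this model the function object keeps f1Env alive through its scope link, which is the edge the question is worried about.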

What confuses me is how garbage collection works with this closure approach. Suppose I have code like:

function f1() {
    var o = LargeObject.fromSize('10MB');
    return function() {
        // here never uses o
        return 'Hello world';
    }
}
var f2 = f1();

After the line var f2 = f1(), our object graph would be:

global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o

So, from my limited knowledge, if the JavaScript engine uses a reference-counting method for garbage collection, the object o has at least one reference and would never be GCed. Apparently this would result in a waste of memory, since o would never be used but is always kept in memory.

Someone may say the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire VariableEnvironment of f1 would be GCed. So here is another code snippet which may lead to a more complex situation:

function f1() {
    var o1 = LargeObject.fromSize('10MB');
    var o2 = LargeObject.fromSize('10MB');
    return function() {
        alert(o1);
    }
}
var f2 = f1();

In this case, f2 uses the o1 object, which is stored in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment. As a result, o2 cannot be GCed either, which wastes even more memory.
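The reason o1 and o2 are tied together in the specification's model is that every closure created during one call captures the same environment, not a private copy of individual variables. A small sketch of that sharing (makeCounter, increment and read are hypothetical names, not from the snippets above):

```javascript
function makeCounter() {
    var count = 0; // one binding, stored in this call's environment
    return {
        increment: function () { count += 1; },
        read: function () { return count; }
    };
}

var c = makeCounter();
c.increment();
c.increment();
console.log(c.read()); // 2, both closures see the same `count`
```

This only shows the language-level semantics. It says nothing about which bindings an engine actually keeps in memory.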

So I would ask: how do modern JavaScript engines (JScript.dll / V8 / SpiderMonkey ...) handle such a situation? Is there a rule specified by the standard, or is it implementation-dependent? And what exact steps does a JavaScript engine take with such an object graph when executing garbage collection?

Thanks.

Recommended answer

tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced." In your second example, o2 can be allocated on the stack and is thrown away after f1 exits.
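The rule in that quote can be sketched as a decision function. This is a deliberately crude illustration, assuming a plain text scan instead of a real parser. The function name contextAllocated and its inputs are hypothetical, and it is not how V8 is implemented.

```javascript
// Crude sketch of the quoted rule. A real engine works on a parsed
// scope tree; this toy version just scans the inner functions' source.
// It assumes simple identifier names (letters and digits only).
function contextAllocated(declaredNames, innerSources) {
    var usesEval = innerSources.some(function (src) {
        return /\beval\s*\(/.test(src);
    });
    return declaredNames.filter(function (name) {
        if (usesEval) return true; // eval: assume everything is referenced
        var pattern = new RegExp("\\b" + name + "\\b");
        return innerSources.some(function (src) { return pattern.test(src); });
    });
}

// Second example from the question: only o1 is referenced.
console.log(contextAllocated(["o1", "o2"], ["function() { alert(o1); }"]));
// [ 'o1' ]

// An inner function that calls eval pins every variable.
console.log(contextAllocated(["x", "y"], ["function(n) { return eval(n); }"]));
// [ 'x', 'y' ]
```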

I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, for example:

function outer(node) {
    node.onclick = function inner() { 
        // some code not referencing "node"
    };
}

where inner closes over node, forming a circular reference inner -> outer's VariableContext -> node -> inner, which will never be freed in, for instance, IE6, even if the DOM node is removed from the document. Some browsers handle this just fine, though: circular references themselves are not a problem; it's the GC implementation in IE6 that is the problem. But now I digress from the subject.
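Why a cycle defeats plain reference counting but not a tracing collector can be shown with a toy heap. This is a sketch under simplifying assumptions: the heap is a hand-written table, refCounts and reachable are hypothetical helpers, and it does not model how IE6 actually managed DOM and script objects.

```javascript
// Toy heap: each object lists the names of the objects it references.
var heap = {
    inner:    { refs: ["outerEnv"] },
    outerEnv: { refs: ["node"] },
    node:     { refs: ["inner"] }
};

// Reference counting: count incoming edges from roots and from the heap.
function refCounts(heap, roots) {
    var counts = {};
    Object.keys(heap).forEach(function (k) { counts[k] = 0; });
    roots.forEach(function (r) { counts[r] += 1; });
    Object.keys(heap).forEach(function (k) {
        heap[k].refs.forEach(function (target) { counts[target] += 1; });
    });
    return counts;
}

// Tracing: mark everything reachable from the roots.
function reachable(heap, roots) {
    var marked = {};
    var stack = roots.slice();
    while (stack.length > 0) {
        var name = stack.pop();
        if (!marked[name]) {
            marked[name] = true;
            stack = stack.concat(heap[name].refs);
        }
    }
    return Object.keys(marked).sort();
}

// The node has been removed from the document, so there are no roots.
console.log(refCounts(heap, [])); // { inner: 1, outerEnv: 1, node: 1 }
console.log(reachable(heap, [])); // []
```

Every count stays at 1 even though nothing outside the cycle points in, while the reachability pass finds nothing to keep.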

A common way to break the circular reference is to null out all unnecessary variables at the end of outer, i.e., set node = null. The question is then whether modern JavaScript engines can do this for you: can they somehow infer that a variable is not used within inner?
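A minimal sketch of that workaround follows. Assumptions: fakeNode is a plain object standing in for a DOM element so the snippet runs outside a browser, and inner reads node only so the effect on the captured binding can be observed.

```javascript
// Stand-in for a DOM element; a real page would pass an actual node.
var fakeNode = { onclick: null };

function outer(node) {
    node.onclick = function inner() {
        // Reads `node` only so this sketch can observe the binding.
        return node;
    };
    // Clear the binding before returning: the closure's environment
    // no longer points at the node, so the cycle is broken.
    node = null;
}

outer(fakeNode);
console.log(fakeNode.onclick()); // null
```

The handler is still attached to the node, but the edge from the closure's environment back to the node is gone.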

I think the answer is no, but I can be proven wrong. The reason is that the following code executes just fine:

function get_inner_function() {
    var x = "very big object";
    var y = "another big object";
    return function inner(varName) {
        alert(eval(varName));
    };
}

var func = get_inner_function();

func("x");
func("y");

See for yourself using this jsfiddle example. There are no references to either x or y inside inner, but they are still accessible using eval. (Amazingly, if you alias eval to something else, say myeval, and call myeval, the code does NOT run in the caller's scope: it is evaluated as global code, so x and y are out of reach. This is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)
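The difference between calling eval directly and calling it through an alias can be checked with a short sketch. The names getInner, secretLocal, direct and indirect are hypothetical, and the snippet assumes no global variable called secretLocal exists.

```javascript
function getInner() {
    var secretLocal = "very big object";
    return {
        // Direct call: evaluated in this function's scope chain.
        direct: function (name) { return eval(name); },
        // Call through an alias: evaluated as global code instead.
        indirect: function (name) {
            var myeval = eval;
            return myeval(name);
        }
    };
}

var fns = getInner();
console.log(fns.direct("secretLocal")); // "very big object"

var indirectResult;
try {
    indirectResult = fns.indirect("secretLocal");
} catch (e) {
    indirectResult = e.name;
}
console.log(indirectResult); // "ReferenceError"
```

Because only a direct call can reach local variables, an engine needs to be conservative only for functions that contain a literal call to eval.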

As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue and, in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval problem. It seems that the engine has to parse the code in all inner functions and see which variables are referenced, or whether eval is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.

This has the implication that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:

function init(node) {

    var someLargeVariable = "...";

    function drawSomeWidget(x, y) {
        library.draw(x, y, someLargeVariable);
    }

    drawSomeWidget(1, 1);
    drawSomeWidget(101, 1);

    return function () {
        alert("hi!");
    };
}

Now, as init has finished its call, someLargeVariable is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimic classes with private / public methods.

Answer to Raynos' comment below. I tried the above scenario (slightly modified) in the debugger, and the results are as I predicted, at least in Chrome:

When the inner function is being executed, someLargeVariable is still in scope.

If I comment out the reference to someLargeVariable in the inner drawSomeWidget method, then you get a different result:

Now someLargeVariable is not in scope, because it could be allocated on the stack.
