Explaining how JIT reordering works
I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
    private volatile Helper helper = null; // 1
    public Helper getHelper() { // 2
        if (helper == null) { // 3
            synchronized(this) { // 4
                if (helper == null) // 5
                    helper = new Helper(); // 6
            }
        }
        return helper;
    }
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, it is entirely possible for the JIT to reorder our code. For example:
Thread 1 is at line 6 and memory has been allocated for helper; however, the constructor has not yet run because the JIT could reorder our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates a MyObject and puts it into a HashMap<String, MyObject> (I know that a HashMap is not thread safe and should not be used in a multi-threaded environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
    private Double value = null;
    public MyObject(Double value) {
        this.value = value;
    }
}
Map<String, MyObject> map = new HashMap<String, MyObject>();
public void createNewObject(String key, Double val) {
    map.put(key, new MyObject( val ));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key) {
    return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
- Thread 1: Allocate memory for new MyObject( val )
- Thread 1: Place the object in the map
- Thread 2: Call getObject(String key)
- Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell, can it only reorder when creating a new Object and assigning it to a reference variable, like the double-checked lock? A complete rundown on the JIT might be too much for an SO answer, but what I'm really curious about is how it can reorder a write (like line 6 of the double-checked lock) and what stops it from putting an object that is not fully constructed into a Map.
Recommended answer
WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. The second portion of my answer explains the fundamental problem in more depth (it is not related to the JIT, so you can stop at the line if you are only interested in the JIT). The answer to the second part of your question is at the bottom because it relies on what I describe along the way.
TL;DR: The JIT will do whatever it wants and the JMM will allow whatever it wants, and both remain valid as long as you let them by writing thread-unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we forbid it by way of the JMM (in a multithreaded environment).
The details of what the JIT can or will do are nondeterministic. Looking at millions of sample runs will not produce a meaningful pattern, because reorderings are situational: they depend on very specific details such as CPU architecture, timings, heuristics, graph size, JVM vendor, bytecode size, etc. We only know that the JIT will assume that the code runs in a single-threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather of the memory model (JMM). In summary, the JMM is a specification that lays out guarantees about what can and cannot result from a certain set of actions, where actions are operations that involve shared state. The JMM is more easily understood through higher-level concepts such as atomicity, memory visibility, and ordering, which are the three components of a thread-safe program.
To demonstrate this, it is highly unlikely that the JIT would modify your first code sample (the DCL pattern) in a way that produces "an object that is not fully created yet." In fact, I believe it is not possible, because the result would not follow the order of execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5) then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field, threads might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publishing before if you have ever worked even the tiniest bit with multithreading.
You may ask: if the JIT cannot modify the single-threaded execution, why can the multithreaded version be affected?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
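As a rough sketch of the quoted claim (not the original poster's code), the classes below pair a hypothetical immutable helper, whose fields are all final, with double-checked locking on a non-volatile field. The names ImmutableHelper and FinalFieldFoo and the constructor arguments are made up for illustration. The field is copied into a local once so that the method never returns the result of a second racy read.

```java
// Hypothetical immutable helper: every field is final and `this` does not escape the constructor.
final class ImmutableHelper {
    private final int id;
    private final String name;

    ImmutableHelper(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int id() { return id; }
    String name() { return name; }
}

class FinalFieldFoo {
    // Deliberately NOT volatile: the final-field rules (JLS 17.5) cover the helper's state.
    private ImmutableHelper helper;

    ImmutableHelper getHelper() {
        ImmutableHelper h = helper;          // read the racy field exactly once
        if (h == null) {
            synchronized (this) {
                h = helper;                  // re-check under the lock
                if (h == null) {
                    h = new ImmutableHelper(42, "demo");
                    helper = h;
                }
            }
        }
        return h;                            // return the local, never re-read the field
    }
}
```

This only holds while every field of the helper stays final; adding one mutable field brings back the need for volatile.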
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
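- Exchange the reference through a properly locked field (JLS 17.4.5)
- Use a static initializer to do the initializing stores (JLS 12.4)
- Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
- Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)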
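The four idioms in the quoted list could look roughly like this in code. It is a minimal sketch, not taken from the cited article: Settings and SafePublicationIdioms are hypothetical names, and each numbered comment maps to one bullet.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical payload with a plain mutable field, so it is NOT safe to publish through a data race.
class Settings {
    int limit;
    Settings(int limit) { this.limit = limit; }
}

class SafePublicationIdioms {
    // 1. Properly locked field: writer and reader synchronize on the same monitor.
    private Settings locked;
    synchronized void publishLocked(Settings s) { locked = s; }
    synchronized Settings readLocked() { return locked; }

    // 2. Static initializer: class initialization (JLS 12.4) publishes the instance.
    private static class Holder {
        static final Settings INSTANCE = new Settings(2);
    }
    static Settings readStatic() { return Holder.INSTANCE; }

    // 3. volatile field, or an AtomicReference that offers the same guarantee.
    private volatile Settings viaVolatile;
    private final AtomicReference<Settings> viaAtomic = new AtomicReference<>();
    void publishVolatile(Settings s) { viaVolatile = s; }
    Settings readVolatile() { return viaVolatile; }
    void publishAtomic(Settings s) { viaAtomic.set(s); }
    Settings readAtomic() { return viaAtomic.get(); }

    // 4. Final field: assigned while the enclosing object is being constructed.
    private final Settings viaFinal = new Settings(4);
    Settings readFinal() { return viaFinal; }
}
```

In every case the reader has to go through the same field, lock, or class that the writer used; reading the reference through some other unsynchronized path gives up the guarantee.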
Safe publication ensures that other threads will see the fully initialized object once publication completes.
Revisiting our idea that threads are only guaranteed to see side effects if they are ordered, the reason that you need volatile is so that your write to helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization as happening after the read, because the initialization occurs before the write to helper. The read piggybacks on the volatile write: the initialization must come first, AND THEN the write to the volatile field, and only then the read (transitive property).
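The piggybacking described above can be illustrated with a plain field that is written before a volatile flag. This is a minimal sketch under assumed names (VolatilePiggyback, payload, ready), not code from the question: once the reader sees the flag set, the happens-before chain guarantees it also sees the earlier plain write.

```java
class VolatilePiggyback {
    private int payload;               // plain field with no synchronization of its own
    private volatile boolean ready;    // the volatile write/read pair supplies the ordering

    void write() {
        payload = 42;                  // (1) ordinary write
        ready = true;                  // (2) volatile write; (1) precedes it in program order
    }

    int read() {
        while (!ready) {               // (3) volatile read, loops until it observes (2)
            Thread.yield();
        }
        return payload;                // (4) ordered after (1) by transitivity
    }

    // Runs one reader thread against a write from the calling thread.
    static int demo() {
        VolatilePiggyback shared = new VolatilePiggyback();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = shared.read());
        reader.start();
        shared.write();
        try {
            reader.join();             // join makes seen[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Dropping volatile from ready removes the ordering: the loop may then never terminate, or it may return a stale payload.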
To conclude, initialization appears to occur after the object becomes visible only because another thread THINKS that is the order. Initialization will never actually occur after construction because of a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no ordering between assigning value and adding the object to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap, and getObject() will be completely thread safe: the initialization occurs before the put, and both must occur before the get as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it will become a management nightmare because you need to ensure that updating the state is visible and atomic. What if a thread retrieves an object and another thread updates the object before the first thread could finish modifying it and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively, you could also make MyObject immutable, but you still need to make the map a ConcurrentHashMap in order for other threads to see the put: thread caching behavior might cache an old copy, never flush it, and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread safety. Recalling our 3 prerequisites for thread safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
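Putting both suggestions together, a sketch might look like the following. ImmutableScore is a hypothetical immutable stand-in for MyObject and AtomicUpdateDemo is an invented wrapper. The get / +1 / put sequence from the T1/T2 timeline is replaced by a single computeIfPresent call, which ConcurrentHashMap performs atomically per key, so no increment is lost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable stand-in for MyObject: state never changes after construction.
final class ImmutableScore {
    final double value;
    ImmutableScore(double value) { this.value = value; }
}

class AtomicUpdateDemo {
    private final ConcurrentMap<String, ImmutableScore> map = new ConcurrentHashMap<>();

    void create(String key, double val) {
        map.put(key, new ImmutableScore(val));
    }

    // Read-modify-write as ONE atomic step instead of get(), +1, put().
    void increment(String key) {
        map.computeIfPresent(key, (k, old) -> new ImmutableScore(old.value + 1));
    }

    double get(String key) {
        return map.get(key).value;
    }

    // Two threads each add 1000 to an entry that starts at 30.
    static double run() {
        AtomicUpdateDemo demo = new AtomicUpdateDemo();
        demo.create("k", 30);
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                demo.increment("k");
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.get("k");
    }
}
```

With a separate get followed by put, the same run could end anywhere below 2030, because two threads may both read the same old value before either writes back.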
To wrap up this entire answer, I will say that multithreading is a very difficult profession to master, one that I myself most definitely have not. By understanding concepts of what makes a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code occur often as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned something a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).