String literals, interning and reflection

2022-01-16 00:00:00 string jvm reflection java

I'm trying to find a third solution to this question.

I can't understand why this doesn't print false.

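import java.lang.reflect.Field;

public class MyClass {

    public MyClass() {
        try {
            // Replace the backing array of the "true" literal
            // with the backing array of the "false" literal.
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
            // Ignored.
        }
    }

    public static void main(String[] args) {
        MyClass m = new MyClass();
        System.out.println(m.equals(m));
    }
}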

Surely, because of string interning, the "true" instance being modified is exactly the same one used in the print method of PrintStream?

public void print(boolean b) {
    write(b ? "true" : "false");
}

What am I missing?

Edit

An interesting point by @yshavit is that if you add the line

System.out.println(true);

before the try, the output is

true
false

Recommended answer

This is arguably a HotSpot JVM bug.

The problem lies in the string literal interning mechanism.

  • java.lang.String instances for string literals are created lazily, during constant pool resolution.
  • Initially, a string literal is represented in the constant pool by a CONSTANT_String_info structure that points to a CONSTANT_Utf8_info.
  • Each class has its own constant pool. That is, MyClass and PrintStream each have their own pair of CONSTANT_String_info / CONSTANT_Utf8_info cpool entries for the literal 'true'.
  • When a CONSTANT_String_info is accessed for the first time, the JVM initiates the process of resolution. String interning is part of this process.
  • To find a match for the literal being interned, the JVM compares the contents of the CONSTANT_Utf8_info with the contents of the string instances in the StringTable.
  • ^^^ And here is the problem. Raw UTF data from the cpool is compared to the contents of Java char[] arrays, which a user can spoof via Reflection.
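When no reflection is involved, the resolution described above makes equal literals from different classes end up as one instance. The following is only a minimal sketch of that normal behavior; the class names LiteralIdentityDemo, First and Second and the helper sameInstance are made up for illustration.

```java
public class LiteralIdentityDemo {

    // Each nested class is compiled to its own class file with its own
    // constant pool, so each carries a separate entry for the literal "true".
    static class First {
        static String literal() { return "true"; }
    }

    static class Second {
        static String literal() { return "true"; }
    }

    // Returns true when both constant pool entries resolved to one String instance.
    static boolean sameInstance() {
        return First.literal() == Second.literal();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());                     // prints true

        // A String built at run time is a distinct object until it is interned.
        String built = new String(new char[] {'t', 'r', 'u', 'e'});
        System.out.println(built == First.literal());           // prints false
        System.out.println(built.intern() == First.literal());  // prints true
    }
}
```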

So, what happens in your test?

  1. f.set("true", f.get("false")) initiates the resolution of the literal 'true' in MyClass.
  2. The JVM finds no instance in the StringTable matching the sequence 'true', so it creates a new java.lang.String and stores it in the StringTable.
  3. The value of that String from the StringTable is replaced via Reflection.
  4. System.out.println(true) initiates the resolution of the literal 'true' in the PrintStream class.
  5. The JVM compares the UTF sequence 'true' with the Strings in the StringTable, but finds no match, since that String now has the value 'false'. Another String for 'true' is created and placed in the StringTable.
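The steps above can be mimicked with a toy model in plain Java. This is not HotSpot's code: ToyStringTable, Entry and resolve are hypothetical names, and a mutable char[] field stands in for the String contents that reflection makes writable. The sketch only shows why a lookup by current contents misses an entry whose contents were changed after insertion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToyStringTable {

    // Stand-in for java.lang.String: the contents live in a mutable array.
    static final class Entry {
        char[] value;
        Entry(char[] value) { this.value = value; }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Toy literal resolution: return the entry whose *current* contents equal
    // the raw constant pool data, or create and store a new entry.
    Entry resolve(String rawUtf) {
        char[] wanted = rawUtf.toCharArray();
        for (Entry e : entries) {
            if (Arrays.equals(e.value, wanted)) {
                return e;
            }
        }
        Entry created = new Entry(wanted);
        entries.add(created);
        return created;
    }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable();
        Entry first = table.resolve("true");      // steps 1-2: no match, entry created
        first.value = "false".toCharArray();      // step 3: contents replaced
        Entry second = table.resolve("true");     // steps 4-5: no match, second entry created
        System.out.println(first == second);      // prints false
        System.out.println(table.size());         // prints 2
    }
}
```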

Why do I consider this a bug?

JLS §3.10.5 and JVMS §5.1 require that string literals containing the same sequence of characters must point to the same instance of java.lang.String.

However, in the following code the resolution of two string literals with the same sequence of characters results in different instances.

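import java.lang.reflect.Field;

public class Test {

    static class Inner {
        static String trueLiteral = "true";
    }

    public static void main(String[] args) throws Exception {
        // Resolve "true" in Test first, then overwrite its contents.
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        f.set("true", f.get("false"));

        // Inner is initialized here, so its "true" literal is resolved after the change.
        if ("true" == Inner.trueLiteral) {
            System.out.println("OK");
        } else {
            System.out.println("BUG!");
        }
    }
}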

A possible fix for the JVM is to store a pointer to the original UTF sequence in the StringTable along with the java.lang.String object, so that the interning process does not compare cpool data (inaccessible to the user) with value arrays (accessible via Reflection).
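The idea can be illustrated with a toy model, again not HotSpot's implementation. KeyedStringTable, Entry and resolve are hypothetical names, and an immutable Java String key stands in for the pointer to the original UTF sequence. Because the lookup uses the data the entry was created from, tampering with the entry's contents no longer affects the match.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStringTable {

    // Stand-in for java.lang.String: the contents live in a mutable array.
    static final class Entry {
        char[] value;
        Entry(char[] value) { this.value = value; }
    }

    // Entries are keyed by the original literal data, which never changes
    // after insertion, instead of by the entry's current contents.
    private final Map<String, Entry> byOriginalData = new HashMap<>();

    Entry resolve(String rawUtf) {
        return byOriginalData.computeIfAbsent(rawUtf, k -> new Entry(k.toCharArray()));
    }

    public static void main(String[] args) {
        KeyedStringTable table = new KeyedStringTable();
        Entry first = table.resolve("true");
        first.value = "false".toCharArray();    // tamper with the contents
        Entry second = table.resolve("true");   // still found through the original key
        System.out.println(first == second);    // prints true
    }
}
```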
