Java does not see a space in a string

2022-01-12 00:00:00 string split char java


I'm trying to parse a text file that contains multiple lines of text. My job is to go through all the words and write them out to a file.


I read all the lines, loop through them, and split each line on whitespace (see the code further down).


Now, the problem is that in some cases Java does not see the space between two words...

I also tried looping over the characters of a string that contains a space Java doesn't see, and Character.isSpaceChar(char) returned true for it...
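One way to pin down what the invisible character is would be to list every code point that Character.isSpaceChar accepts but the default \s class rejects. The sketch below is only illustrative: the class name SpaceProbe, the method suspiciousSpaces, and the sample string are made up for the example, and U+00A0 (no-break space) is an assumed culprit, not something confirmed from the input file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class SpaceProbe {

    // Default "\s" in Java: space, tab, newline, vertical tab, form feed, carriage return.
    private static final Pattern DEFAULT_WS = Pattern.compile("\\s");

    // Returns "U+XXXX" for every code point that Unicode classifies as a space
    // (Character.isSpaceChar) but that the default "\s" class does not match.
    static List<String> suspiciousSpaces(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String single = new String(Character.toChars(cp));
            if (Character.isSpaceChar(cp) && !DEFAULT_WS.matcher(single).matches()) {
                found.add(String.format("U+%04X", cp));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample: two words joined by a no-break space (U+00A0).
        String sample = "hello\u00A0world";
        System.out.println(suspiciousSpaces(sample));      // prints [U+00A0]
        System.out.println(sample.split("\\s+").length);   // prints 1: no split happened
    }
}
```

If the list comes back empty for a line that still refuses to split, the problem is probably not a space separator at all, and dumping every code point of the line would be the next thing to try.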



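public void createMap(String inputPath, String outputPath)
        throws IOException {
    File f = new File(inputPath);
    // The charset argument was cut off in the source; this overload reads UTF-8.
    List<String> lines = Files.readAllLines(f.toPath());
    // try-with-resources closes the writer even if an exception is thrown
    try (FileWriter fw = new FileWriter(outputPath)) {
        for (String l : lines) {
            // By default "\\s" matches only ASCII whitespace
            for (String w : l.split("\\s+")) {
                if (isNotRubbish(w.trim())) {
                    // The separator was cut off in the source; a newline is assumed.
                    fw.write(w.trim() + "\n");
                }
            }
        }
    }
}

private boolean isNotRubbish(String w) {
    // Optional '@' followed by one or more Unicode letters.
    // A second (flags) argument was cut off in the source and is omitted here.
    Pattern p = Pattern.compile("@?\\p{L}+");
    Matcher m = p.matcher(w);
    return m.matches();
}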


I suspect your text contains a character similar to a non-breaking space (U+00A0), which Java's \s does not count as whitespace, so \s cannot match it.

In that case, try using \p{Zs} instead of \s.

\p{Zs} will match any kind of space character, that is, any character in the Unicode "space separator" category.
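As a rough illustration of the difference, the sketch below splits the same line with \s+ and with \p{Zs}+. The class name ZsSplitDemo, the method splitOnSpaceSeparators, and the sample line with a no-break space are assumptions made for the example. Note that inside a Java string literal each backslash has to be doubled.

```java
// Hypothetical demo class, not part of the original code.
final class ZsSplitDemo {

    // Splits on runs of Unicode space separators (general category Zs), which
    // covers the ordinary space U+0020 as well as the no-break space U+00A0.
    static String[] splitOnSpaceSeparators(String line) {
        return line.split("\\p{Zs}+");
    }

    public static void main(String[] args) {
        // Assumed sample: a normal space, then a no-break space.
        String line = "one two\u00A0three";

        // Default \s only sees the ordinary space.
        System.out.println(line.split("\\s+").length);            // prints 2

        // \p{Zs} sees both separators.
        System.out.println(splitOnSpaceSeparators(line).length);  // prints 3
    }
}
```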

BTW, if you also want to split on separators other than spaces, such as tabs or line breaks, you can combine \p{Zs} with \s, as in [\p{Zs}\s].
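A minimal sketch of the combined class applied to a whole line might look like this. The class name WordSplitter, the method words, and the sample input are hypothetical, and dropping empty tokens is a choice made for the example, not something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class WordSplitter {

    // Unicode space separators (Zs) plus the default \s set (tab, newline, ...).
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Zs}\\s]+");

    // Splits a line into words, skipping the empty token that split()
    // produces when the line starts with a separator.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : SEPARATORS.split(line)) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample: leading space, no-break space, tab, double space.
        String line = " alpha\u00A0beta\tgamma  delta";
        System.out.println(words(line)); // prints [alpha, beta, gamma, delta]
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every line, which String.split would otherwise do for a pattern like this.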
