How to remove punctuation from input text in Java?

2022-01-12. Tags: string, regex, formatting, java

I am trying to read a sentence from user input in Java, and I need to make it lowercase and remove all punctuation. Here is my code:

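    import java.util.Arrays;

    public class SplitWords {
        // Splits the input on whitespace and lowercases each word.
        // Punctuation is still left in place at this point.
        static String[] toWords(String instring) {
            String[] words = instring.split("\\s+");
            for (int i = 0; i < words.length; i++) {
                words[i] = words[i].toLowerCase();
            }
            // Size the output from the input rather than a fixed 50 slots
            String[] wordsout = new String[words.length];
            int e = 0;
            for (int i = 0; i < words.length; i++) {
                // isEmpty() compares contents; != "" only compares references
                if (!words[i].isEmpty()) {
                    wordsout[e] = words[i]; // index with i, not e
                    e++;
                }
            }
            return Arrays.copyOf(wordsout, e);
        }
    }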

I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

Recommended answer

This first removes every character that is not an ASCII letter or a space, then folds to lowercase, then splits the input, doing all the work in a single line:

    String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.
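To see why the space has to stay in the character class, here is a small side-by-side comparison. The class name `KeepSpacesDemo`, its two helper methods and the sample sentence are made up for illustration; the first helper is just the one-liner above.

```java
import java.util.Arrays;

public class KeepSpacesDemo {
    // The one-liner from the answer: the space stays in the character class
    static String[] keepSpaces(String instring) {
        return instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
    }

    // Same pipeline with the space dropped from the class, for comparison
    static String[] dropSpaces(String instring) {
        return instring.replaceAll("[^a-zA-Z]", "").toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        String instring = "Hello, World! How are you?";
        System.out.println(Arrays.toString(keepSpaces(instring))); // [hello, world, how, are, you]
        System.out.println(Arrays.toString(dropSpaces(instring))); // [helloworldhowareyou]
    }
}
```

Without the space, there is no whitespace left for `split("\\s+")` to cut on, so the whole sentence comes back as a single word.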

By removing the rubbish characters before splitting, you avoid having to loop through the elements.
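One caveat with `[^a-zA-Z ]`: it keeps only ASCII letters and the plain space, so accented letters such as `é` are deleted, and words separated only by a tab or newline get glued together. Also, leading whitespace makes `split` return an empty first element. If those cases matter, a variant along the same lines could keep any Unicode letter (`\p{L}`) and any whitespace (`\s`), then trim before splitting. This is a sketch under those assumptions, not part of the original answer; `UnicodeWords` and `toWords` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;

public class UnicodeWords {
    // Hypothetical helper: keep letters from any script and any whitespace,
    // lowercase, then split on runs of whitespace.
    static String[] toWords(String instring) {
        String cleaned = instring
                .replaceAll("[^\\p{L}\\s]", "") // drop anything that is not a letter or whitespace
                .toLowerCase(Locale.ROOT)       // locale-independent lowercasing
                .trim();                        // avoid an empty first element from leading whitespace
        if (cleaned.isEmpty()) {
            return new String[0];               // "".split(...) would return [""]
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only
        String[] words = toWords("  H\u00e9llo,\tW\u00f6rld! 42 times...");
        System.out.println(Arrays.toString(words)); // → [héllo, wörld, times]
    }
}
```

Digits count as non-letters here, so `42` disappears; if numbers should survive, adding `\\p{N}` to the character class would keep them.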
