Issue with the snippet below using the boundary matcher regex (\b)

2022-01-17 00:00:00 regex set java


 1. end 
 2. end of the day or end of the week 
 3. endline
 4. something 
 5. "something" end


Based on the above discussions, if I try to replace a single string using this snippet, it successfully removes the matching words from each line:

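import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class DeleteTest {

    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            // try-with-resources closes both streams, which also flushes the writer
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
                 PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)))) {
                for (String line; (line = reader.readLine()) != null;) {
                    // "\\b" is the regex word boundary; a bare "\b" would be a backspace character
                    line = line.replaceAll("\\b" + delete + "\\b", "");
                    writer.println(line);
                }
            }
        } catch (Exception e) {
            System.out.println("Something went Wrong");
        }
    }
}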


My output when I use the above snippet (this is also my expected output):

 2. of the day or of the week
 3. endline
 4. something
 5. "something"

But when I want to delete more words and use a Set for that purpose, I use the code snippet below:

public static void main(String[] args) {
    try {
        File file = new File("C:/Java samples/myfile.txt");
        File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());

        Set<String> toDelete = new HashSet<>();
        // the words to delete are added to toDelete (omitted here)

        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
             PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)))) {
            for (String line; (line = reader.readLine()) != null;) {
                line = line.replaceAll("\\b" + toDelete + "\\b", "");
                writer.println(line);
            }
        }
    } catch (Exception e) {
        System.out.println("Something went Wrong");
    }
}


I get the following output (it just removes the spaces):

 1. end
 2. endofthedayorendoftheweek
 3. endline
 4. something
 5. "something" end 


Can you help me with this?



You need to create an alternation group out of the set with

String.join("|", toDelete)


line = line.replaceAll("\\b(?:" + String.join("|", toDelete) + ")\\b", "");



Here, (?:...) is a non-capturing group that groups several alternatives without creating a memory buffer for the capture (you do not need one, since you only remove the matches).
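To make the difference concrete, here is a minimal sketch. The class name AlternationDemo, the helper removeWords, and the two sample words in the set are assumptions for illustration. It prints the pattern produced by concatenating the set directly, which the regex engine reads as a character class (hence only single characters such as spaces get removed), and then applies the alternation-based pattern.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class AlternationDemo {

    // Builds \b(?:w1|w2|...)\b from the set and removes every whole-word match.
    static String removeWords(String line, Set<String> toDelete) {
        String regex = "\\b(?:" + String.join("|", toDelete) + ")\\b";
        return line.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Sample words, assumed for illustration.
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        // String concatenation calls toDelete.toString(), so this prints
        // \b[end, something]\b, i.e. a character class, not a list of words.
        System.out.println("\\b" + toDelete + "\\b");

        // "end" is removed as a whole word; the surrounding spaces stay.
        System.out.println(removeWords("end of the day or end of the week", toDelete));

        // "endline" is left alone: there is no word boundary after "end".
        System.out.println(removeWords("endline", toDelete));
    }
}
```

Note that the spaces around a removed word are kept, so the first result still contains a double space where the second "end" used to be.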


Or, better, compile the regex before entering the loop:

Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
// ... then, inside the loop:
line = pat.matcher(line).replaceAll("");
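As a rough sketch of the precompiled variant, the example below compiles the alternation once and reuses the same Pattern for every line. The class PrecompiledDemo, the method removeWords, and the in-memory list of lines (standing in for the file reader) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

class PrecompiledDemo {

    // Compiles the alternation once, then applies it to each line in turn.
    static List<String> removeWords(List<String> lines, Set<String> toDelete) {
        Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(pat.matcher(line).replaceAll(""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data, assumed for illustration; the real code reads lines from a file.
        Set<String> toDelete = new TreeSet<>(Arrays.asList("end", "something"));
        List<String> lines = Arrays.asList("end of the day", "endline", "\"something\" end");
        for (String cleaned : removeWords(lines, toDelete)) {
            System.out.println(cleaned);
        }
    }
}
```

String.replaceAll compiles its regex argument on every call, so hoisting the compilation out of the loop avoids repeating that work for each line.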


To allow matching whole "words" that may contain special characters, you need to apply Pattern.quote to those words to escape the special characters, and then use unambiguous word boundaries: the negative lookbehind (?<!\w) instead of the initial \b, to make sure there is no word character before the match, and the negative lookahead (?!\w) instead of the final \b, to make sure there is no word character after it.

In Java 8, you may use this code:

Set<String> nToDel = toDelete.stream()
        .map(Pattern::quote)               // wrap each word in \Q...\E
        .collect(Collectors.toSet());
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";

The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
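The following sketch puts the quoting and the lookarounds together. The class QuotedWordsDemo, the helper buildPattern, and the sample input strings are assumptions for illustration; the two words "+end" and "something-" are taken from the regex shown above. A plain \b would not work before "+end", because \b needs a word character on one side of the position and "+" is not one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWordsDemo {

    // Quotes each word, joins the results with "|" and wraps the alternation
    // in lookarounds that forbid a word character on either side of the match.
    static Pattern buildPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Pattern pat = buildPattern(Arrays.asList("+end", "something-"));

        // Prints (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w)
        System.out.println(pat.pattern());

        // Both words stand alone here, so both are removed.
        System.out.println(pat.matcher("a +end b something- c").replaceAll(""));

        // Here each word touches a word character ("x" before, "y" after),
        // so the lookarounds reject both and the text is unchanged.
        System.out.println(pat.matcher("x+end something-y").replaceAll(""));
    }
}
```

A List is used here instead of a Set only to keep the order of the alternatives predictable in the printed pattern; the matching behaviour is the same either way.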
