Compare two text files to find differences and output them to a new text file

2022-01-25 00:00:00 python io compare diff

Problem description

I am working on a simple tool for comparing data in text documents. The goal is for the user to select a file, search it for certain parameters, and print those parameters into a new text document. The parameters in that new document are then compared with a text document holding the default parameters, and the differences are printed into another new text document.

I've created a simple flowchart to summarize this (the image is not included here).

This is my current code. I am using the difflib library to compare the two files.

    import difflib
    import re
    from Tkinter import *
    import tkSimpleDialog
    import tkMessageBox
    from tkFileDialog import askopenfilename

    root = Tk()
    w = Label(root, text ="Configuration Inspector")
    w.pack()
    tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector")
    filename = askopenfilename() # Logs File
    filename2 = askopenfilename() # Default Configuration
    compareFile = askopenfilename() # Comparison File
    outputfilename = askopenfilename() # Out Serial Number Configuration from Logs

    with open(filename, "rb") as f_input:
        start_token = tkSimpleDialog.askstring("Serial Number", "What is the serial number?")
        end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")
        reText = re.search("%s(.*?)%s" % (re.escape(start_token + ",SHOWALL"), re.escape(end_token)), f_input.read(), re.S)
        if reText:
            output = reText.group(1)
            fo = open(outputfilename, "wb")
            fo.write(output)
            fo.close()
            diff = difflib.ndiff(outputfilename, compareFile)
            print ' '.join(list(diff))
        else:
            tkMessageBox.showinfo("Output", "Sorry that input was not found in the file")
            print "not found"

So far, the program correctly searches through the file you select and prints the parameters it finds into a new output text file.

The issue arises when trying to compare the two files, the default data and the output file.

The program does output differences when comparing. However, since the default data file has a different number of lines than the output file, it only prints the lines that do not match up, rather than the parameters that do not match. In other words, let's say I have these two files:

Default data text file:

    Data1 = 1
    Data2 = 2
    Data3 = 3
    Data4 = 4
    Data5 = 5
    Data6 = 6

Output data text file:

    Data1 = 1
    Data2 = 2
    Data3 = 8
    Data4 = 7

Since Data3 and Data4 do not match, the difference.txt file (the comparison output) should show that. For example:

    Data3 = 8
    Data4 = 7
    Data5 = 5
    Data6 = 6

However, it does not match or compare the lines; it just checks whether there is a line in that position. So currently my comparison output looks like this:

    Data5 = 5
    Data6 = 6

Any ideas on how I can make the comparison show everything that differs between the files' parameters?

If you need any more details, please let me know in the comments and I will edit the original post to add them.

Solution

I don't know what you're trying to do with difflib.ndiff(). That function takes two lists of strings, but you are passing it filenames.
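To illustrate the point about argument types, here is a small sketch in Python 3 syntax. The sample lines come from the question, while the variable names and the two filenames are made up for this example. Given two lists of lines, difflib.ndiff() produces a line-by-line delta. Given two plain strings, such as filenames, it compares them one character at a time and never opens any file.

```python
import difflib

default_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 3"]
output_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 8"]

# Lists of lines: each delta entry describes one whole line,
# prefixed with "  " (common), "- " (only in first) or "+ " (only in second).
line_delta = list(difflib.ndiff(default_lines, output_lines))
for entry in line_delta:
    print(entry.rstrip("\n"))

# Two plain strings (hypothetical filenames): ndiff treats each string as a
# sequence of characters, so the delta is about the names, not the contents.
name_delta = list(difflib.ndiff("output.txt", "compare.txt"))
print(name_delta[:5])
```

This is why the original call produced output unrelated to the parameters: the files have to be read into lists of lines before they are handed to ndiff().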

Anyway, here's a short demo that performs the comparison you want. It uses a dict to speed up the comparison process. Obviously, I don't have your data files, so this program creates lists of strings using the string .splitlines() method.

It goes through the default data list line by line.
If the key on that line is not present in the output dict, the default line is printed.
If the key is present in the output dict with the same value, the line is skipped.
If the key is found but its value in the output dict differs from the default value, a line with the key and the output value is printed.

    #Build default data list
    defdata = '''
    Data1 = 1
    Data2 = 2
    Data3 = 3
    Data4 = 4
    Data5 = 5
    Data6 = 6
    '''.splitlines()[1:]

    #Build output data list
    outdata = '''
    Data1 = 1
    Data2 = 2
    Data3 = 8
    Data4 = 7
    '''.splitlines()[1:]

    #Map each key to its value for fast lookup
    outdict = dict(line.split(' = ') for line in outdata)

    for line in defdata:
        key, val = line.split(' = ')
        if key in outdict:
            outval = outdict[key]
            if outval != val:
                print('%s = %s' % (key, outval))
        else:
            print(line)

Output

    Data3 = 8
    Data4 = 7
    Data5 = 5
    Data6 = 6

Here's how to read a text file into a list of lines.

    with open(filename) as f:
        data = f.read().splitlines()

There's also a .readlines() method, but it's not so useful here because it preserves the newline character at the end of each line, and we don't want that.
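The difference between the two methods can be seen in a short sketch (Python 3 syntax). It uses io.StringIO as a stand-in for a real file so that it runs on its own; the sample text is an assumption made for this example.

```python
import io

text = "Data1 = 1\nData2 = 2\n"

# readlines() keeps the trailing newline on every line.
with_newlines = io.StringIO(text).readlines()

# read().splitlines() drops the line endings.
without_newlines = io.StringIO(text).read().splitlines()

print(with_newlines)
print(without_newlines)
```

With the newlines left in place, the last line of a file that lacks a final newline would compare unequal to an otherwise identical line elsewhere, which is one reason to strip them before comparing.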

Note that if there are any blank lines in the text file, the resulting list will have an empty string '' in that position. Also, that code won't remove any leading or trailing blanks or other whitespace on each line. If you need to do that, there are thousands of examples here on Stack Overflow that show you how.
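As one possible way to handle both caveats, the sketch below (Python 3 syntax) strips whitespace and skips blank lines while building a dict of parameters. The helper name parse_params and the rule of ignoring lines without an '=' are assumptions made for this example, not part of the original answer.

```python
def parse_params(lines):
    """Turn 'key = value' lines into a dict, ignoring blank lines."""
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank or whitespace-only lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without '=' are not parameters
        params[key.strip()] = value.strip()
    return params


sample = ["  Data1 = 1  ", "", "Data2=2", "\tData3 =  8"]
params = parse_params(sample)
print(params)
```

Splitting on '=' and stripping both sides, rather than splitting on the exact string ' = ', makes the parse tolerant of uneven spacing around the equals sign.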

This new version uses a slightly different approach. It loops over a sorted list of all the keys found in either the default list or the output list.
If a key is found in only one of the lists, the corresponding line is added to the diff list.
If a key is found in both lists but the output line differs from the default line, the corresponding line from the output list is added to the diff list. If both lines are identical, nothing is added to the diff list.

    #Build default data list
    defdata = '''
    Data1 = 1
    Data2 = 2
    Data3 = 3
    Data4 = 4
    Data5 = 5
    Data6 = 6
    '''.splitlines()[1:]

    #Build output data list
    outdata = '''
    Data1 = 1
    Data2 = 2
    Data3 = 8
    Data4 = 7
    Data8 = 8
    '''.splitlines()[1:]

    #Map the first word of each line (the key) to the whole line
    def make_dict(data):
        return dict((line.split(None, 1)[0], line) for line in data)

    defdict = make_dict(defdata)
    outdict = make_dict(outdata)

    #Create a sorted list containing all the keys
    allkeys = sorted(set(defdict) | set(outdict))
    #print(allkeys)

    difflines = []
    for key in allkeys:
        indef = key in defdict
        inout = key in outdict
        if indef and not inout:
            difflines.append(defdict[key])
        elif inout and not indef:
            difflines.append(outdict[key])
        else:
            #key must be in both dicts
            defval = defdict[key]
            outval = outdict[key]
            if outval != defval:
                difflines.append(outval)

    for line in difflines:
        print(line)

Output

    Data3 = 8
    Data4 = 7
    Data5 = 5
    Data6 = 6
    Data8 = 8
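The question also asks for the differences to end up in a new text file rather than on the console. A minimal sketch of that last step, in Python 3 syntax, might look like the following. The function names load_params and write_differences and the file names are hypothetical, and temporary files stand in for the real default and output files so the example runs on its own.

```python
import os
import tempfile


def load_params(path):
    """Map each key (the text before '=') to its stripped line."""
    params = {}
    with open(path) as f:
        for line in f.read().splitlines():
            line = line.strip()
            if line:  # ignore blank lines
                params[line.split("=", 1)[0].strip()] = line
    return params


def write_differences(default_path, output_path, diff_path):
    """Write lines that are missing from, or differ in, the output file."""
    defdict = load_params(default_path)
    outdict = load_params(output_path)
    difflines = []
    for key in sorted(set(defdict) | set(outdict)):
        if key not in outdict:
            difflines.append(defdict[key])      # only in the defaults
        elif outdict[key] != defdict.get(key):
            difflines.append(outdict[key])      # new or changed in the output
    with open(diff_path, "w") as f:
        f.writelines(line + "\n" for line in difflines)
    return difflines


# Demo: temporary files stand in for the real default and output files.
with tempfile.TemporaryDirectory() as tmp:
    default_path = os.path.join(tmp, "default.txt")
    output_path = os.path.join(tmp, "output.txt")
    diff_path = os.path.join(tmp, "difference.txt")
    with open(default_path, "w") as f:
        f.write("Data1 = 1\nData2 = 2\nData3 = 3\nData4 = 4\nData5 = 5\nData6 = 6\n")
    with open(output_path, "w") as f:
        f.write("Data1 = 1\nData2 = 2\nData3 = 8\nData4 = 7\n")
    write_differences(default_path, output_path, diff_path)
    with open(diff_path) as f:
        result = f.read()

print(result)
```

Note that sorted() orders the keys as strings, so a key such as Data10 would come before Data2. If the original file order matters, iterating over the default file's lines first and then over any extra output keys would be one alternative.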
