Compare 2 separate CSV files and write the difference to a new CSV file - Python 2.7

2022-01-25 00:00:00 python python-2.7 csv compare

Problem description

I am trying to compare two CSV files in Python 2.7 and save the difference to a third CSV file.

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

set1 = tuple(oldList1)
set2 = tuple(oldList2)

print oldList2.difference(oldList1)

I get this error message:

Traceback (most recent call last):
  File "compare.py", line 21, in <module>
    print oldList2.difference(oldList1)
AttributeError: 'list' object has no attribute 'difference'

I am new to Python, and to coding in general, and this code is not finished yet: I still need to store the differences in a variable and write them to a new CSV file. I have been trying to solve this all day without success. Your help would be greatly appreciated.


Solution

What do you mean by difference? There are two distinct possibilities, depending on the answer.

If two rows count as the same only when all of their columns match, you can get your answer with the following code:

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

print [row for row in oldList1 if row not in oldList2]
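The question also asks for the difference to be written to a third CSV file, which the snippet above only prints. A minimal sketch of that step follows, assuming the same full-row comparison. The helper names `open_csv`, `read_rows` and `write_row_difference` are made up for this illustration and are not part of the `csv` module; the version check is there only so the sketch runs on both Python 2.7 and Python 3.

```python
import csv
import sys


def open_csv(path, mode):
    # The csv module expects binary mode on Python 2 and
    # text mode with newline="" on Python 3.
    if sys.version_info[0] < 3:
        return open(path, mode + "b")
    return open(path, mode, newline="")


def read_rows(path):
    # Read every row of a CSV file into a list of lists of strings.
    with open_csv(path, "r") as f:
        return list(csv.reader(f))


def write_row_difference(old_path, new_path, out_path):
    # Keep the rows of old_path that do not appear anywhere in new_path.
    old_rows = read_rows(old_path)
    new_rows = read_rows(new_path)
    difference = [row for row in old_rows if row not in new_rows]
    # Write the surviving rows to the third file and also return them.
    with open_csv(out_path, "w") as f:
        csv.writer(f).writerows(difference)
    return difference
```

Calling `write_row_difference("olddata/file1.csv", "newdata/file2.csv", "difference.csv")` would then store the rows that are in the first file but not in the second; the output name `difference.csv` is only a placeholder. Swap the first two arguments if you want the rows that are new in the second file instead, which is what `oldList2.difference(oldList1)` in the question was aiming for.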

However, if two rows count as the same whenever a certain key field (i.e. column) matches, the following code will give you your answer:

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

keyfield = 0 # Change this for choosing the column number

oldList2keys = [row[keyfield] for row in oldList2]
print [row for row in oldList1 if row[keyfield] not in oldList2keys]

Note: The code above may run slowly for very large files, because each `not in` test scans the whole list. If you want to speed it up through hashing, convert each row of the oldLists to a tuple (lists are not hashable) and build sets from them:

set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)

After this, you can use set1.difference(set2), which returns the set of row tuples that are in set1 but not in set2.
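One thing to keep in mind is that a set has no order and drops duplicate rows, so the result of `difference` will not follow the order of the original file. The sketch below is one way to keep the hashed lookup while preserving row order, for both the full-row and the key-field comparison. The function names `rows_not_in` and `rows_with_new_keys` and the sample rows are invented for this example.

```python
def rows_not_in(rows, other_rows):
    # Lists are not hashable, so each row is turned into a tuple first.
    other = set(tuple(row) for row in other_rows)
    # Walk the original list so the file's row order is preserved.
    return [row for row in rows if tuple(row) not in other]


def rows_with_new_keys(rows, other_rows, keyfield=0):
    # Only the key column is hashed; the other columns are ignored.
    other_keys = set(row[keyfield] for row in other_rows)
    return [row for row in rows if row[keyfield] not in other_keys]


# Sample data standing in for the rows read from the two CSV files.
oldList1 = [["1", "alice"], ["2", "bob"], ["3", "carol"]]
oldList2 = [["1", "alice"], ["3", "caroline"]]

print(rows_not_in(oldList1, oldList2))         # [['2', 'bob'], ['3', 'carol']]
print(rows_with_new_keys(oldList1, oldList2))  # [['2', 'bob']]
```

The two calls give different answers for the row with key `3`: its second column changed, so it differs as a full row, but its key still exists in the other list.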
