Reading back tuples from a csv file with pandas

2022-01-19 00:00:00 python pandas csv tuples

Problem description

Using pandas, I have exported to a csv file a dataframe whose cells contain tuples of strings. The resulting file has the following structure:

index,colA
1,"('a','b')"
2,"('c','d')"

Now I want to read it back using read_csv. However whatever I try, pandas interprets the values as strings rather than tuples. For instance:

In []: import pandas as pd
       df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
       df.loc[1,'colA']
Out[]: "('a','b')"

Is there a way of telling pandas to do the right thing? Preferably without heavy post-processing of the dataframe: the actual table has 5000 rows and 2500 columns.

Solution

Storing tuples in a column isn't usually a good idea; a lot of the advantages of using Series and DataFrames are lost. That said, you could use converters to post-process the string:

>>> import ast
>>> import pandas as pd
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)

[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<type 'tuple'>
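
Since the real table has about 2500 columns, listing each one in converters by hand is impractical. A minimal sketch of one way around that, assuming every non-index column holds tuple literals: read only the header first, then build the converters dict from the column names. The helper name read_tuple_csv, the extra column colB, and the in-memory sample data are made up for illustration.

```python
import ast
import io

import pandas as pd

# Small in-memory stand-in for the exported file (hypothetical data).
CSV_TEXT = '''index,colA,colB
1,"('a','b')","('e','f')"
2,"('c','d')","('g','h')"
'''


def read_tuple_csv(source, index_col="index"):
    """Read a CSV whose non-index cells are all tuple literals (assumption)."""
    # Read only the header row to discover the column names.
    header = pd.read_csv(source, nrows=0).columns
    if hasattr(source, "seek"):
        source.seek(0)  # rewind file-like objects before the real read
    # One converter per data column; the index column is left to pandas.
    converters = {name: ast.literal_eval for name in header if name != index_col}
    return pd.read_csv(source, index_col=index_col, converters=converters)


df = read_tuple_csv(io.StringIO(CSV_TEXT))
print(df.loc[1, "colA"])
print(type(df.loc[2, "colB"]))
```

The converters still run once per cell in Python, so on a 5000 x 2500 table this is likely to be noticeably slower than a plain read_csv.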

But I'd probably change things at source to avoid storing tuples in the first place.
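
One possible reading of "change things at source", sketched under the assumption that every tuple in a column has the same length: split the tuple column into plain string columns before writing, so the file round-trips without any converter. The column names colA_0 and colA_1 are invented for this example.

```python
import io

import pandas as pd

# Original frame with a tuple-valued column, as in the question.
df = pd.DataFrame(
    {"colA": [("a", "b"), ("c", "d")]},
    index=pd.Index([1, 2], name="index"),
)

# Expand each tuple into its own columns (hypothetical names colA_0, colA_1).
flat = pd.DataFrame(
    df["colA"].tolist(),
    index=df.index,
    columns=["colA_0", "colA_1"],
)

# Round-trip through CSV using an in-memory buffer instead of a real file.
buf = io.StringIO()
flat.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col="index")

# A row can still be turned back into a tuple when one is needed.
print(tuple(restored.loc[1]))
```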
