Pandas reading a csv file with float values results in weird rounding and decimal digits

2022-01-09 00:00:00 python pandas csv floating-point rounding

Problem description

I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.

When I import the csv file (which also has other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that a value is shown as 2470.6911370000003 when it should be 2470.691137, or the value 2484.30691 is shown as 2484.3069100000002.

This seems to be a datatype issue in some way. I tried to provide the data type explicitly when importing via read_csv by passing the dtype argument as {'columnname': np.float64}. The issue still did not go away.

How can I get the values imported and shown exactly as they are in the source csv file?


Solution

Pandas uses its own dedicated decimal-to-binary converter, which trades some accuracy for speed.

Passing float_precision='round_trip' to read_csv fixes this.
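
A minimal sketch of the effect, assuming a hypothetical in-memory csv with a column named value (neither appears in the question). It only shows that the round_trip parser agrees with Python's own float() parsing of the same text; what the default parser prints may differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real csv file.
csv_text = "value\n2470.691137\n2484.30691\n"

df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# With round_trip, each parsed value matches Python's float() of the same text,
# so printing it shows the digits as they were written in the file.
parsed = df["value"].tolist()
print(parsed)
```

The column is still stored as float64, so the values remain binary approximations; round_trip only guarantees that they are the closest doubles to the text in the file.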

Check out this page for more detail on this.

After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" to to_csv, where n is the number of decimal places you want.

A complete example:

import pandas as pd

# source_file and target_file are paths to your own csv files
df_in = pd.read_csv(source_file, float_precision='round_trip')  # exact float parsing
df_out = ...  # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f")  # write floats with 3 decimal places
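
The writing step can also be tried on its own. This is a small sketch, assuming a hypothetical DataFrame with a column named value and an in-memory buffer instead of a real target file; it uses "%.6f" to match the six decimal places mentioned in the question.

```python
import io

import pandas as pd

# Hypothetical data; the column name "value" is an assumption for illustration.
df = pd.DataFrame({"value": [2470.691137, 2484.30691]})

buffer = io.StringIO()
# "%.6f" writes every float with exactly six decimal places.
df.to_csv(buffer, index=False, float_format="%.6f")
csv_out = buffer.getvalue()
print(csv_out)
```

Note that float_format pads as well as rounds, so 2484.30691 is written as 2484.306910.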
