Specifying the correct dtypes for pandas.read_csv to get datetimes and booleans

2022-01-13 00:00:00 python pandas csv types type-conversion

Question

I am loading a csv file into a Pandas DataFrame. For each column, how do I specify what type of data it contains using the dtype argument?

I can do this for numeric data (code at bottom), but how do I specify time data, and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.

Code:

import pandas as pd
import numpy as np

df = pd.read_csv(<file-name>, dtype={'A': np.int64, 'B': np.float64})

Answer

There are a lot of options for read_csv which will handle all the cases you mentioned. You might be tempted to try dtype={'A': datetime.datetime}, but dtype is not the mechanism read_csv uses to produce datetime columns. Often you won't need dtype at all, as pandas can infer many types on its own.
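To illustrate the point about inference, here is a minimal sketch that reads a small in-memory CSV without any dtype argument. The sample data and the column names A to D are made up for illustration; io.StringIO simply stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real csv file.
csv_text = (
    "A,B,C,D\n"
    "1,1.5,True,2022-01-13\n"
    "2,2.5,False,2022-01-14\n"
)

# No dtype argument: pandas inspects the values and picks a type per column.
df_inferred = pd.read_csv(io.StringIO(csv_text))

# Map each column name to the name of its inferred dtype.
inferred = df_inferred.dtypes.astype(str).to_dict()
print(inferred)
```

In this sketch the integer, float, and True/False columns are inferred without help, while the date column D is left as text, which is why dates need separate handling.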

For dates, you need to specify the parse_dates option:

parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
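As a rough sketch of the simplest form of parse_dates, the example below passes a list of column names. The column name "when" and the sample rows are assumptions made up for this illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; "when" is an illustrative column name.
csv_text = (
    "A,when\n"
    "1,2022-01-13 00:00:00\n"
    "2,2022-01-14 12:30:00\n"
)

# parse_dates takes a list of column names (or positions) to parse as dates.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(df_dates.dtypes)
```

The "when" column should come back as a datetime column rather than text, so the usual .dt accessors are available on it.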

In general, to convert boolean values you need to specify:

true_values : list. Values to consider as True
false_values : list. Values to consider as False
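A small sketch of these two options, assuming a made-up column "active" that stores yes/no strings:

```python
import io

import pandas as pd

# Hypothetical sample data with yes/no flags in the "active" column.
csv_text = (
    "name,active\n"
    "alice,yes\n"
    "bob,no\n"
    "carol,yes\n"
)

# Tell the parser which strings count as True and which as False.
df_bool = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],
    false_values=["no"],
)

print(df_bool.dtypes)
```

Note that a column is only converted when all of its values appear in one of the two lists; the "name" column here is untouched because its values match neither list.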

Any value listed in true_values or false_values is converted to the boolean True or False. For more general conversions you will most likely need:

converters : dict, optional. Dict of functions for converting values in certain columns. Keys can either be integers or column labels
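The converters option could be used along these lines. The helper parse_flag, the column name "flag", and the set of accepted spellings are all hypothetical choices for this sketch, not part of pandas.

```python
import io

import pandas as pd


def parse_flag(text):
    """Hypothetical helper: map loosely formatted flag strings to booleans."""
    return text.strip().lower() in {"y", "yes", "1", "true"}


# Hypothetical sample data with inconsistent spellings and stray whitespace.
csv_text = (
    "id,flag\n"
    "1, Y\n"
    "2,no\n"
    "3,TRUE\n"
)

# Each raw cell of the "flag" column is passed to parse_flag as a string.
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"flag": parse_flag})

print(df_conv)
```

Because a converter is an ordinary Python function called once per cell, it is flexible but can be slower than the built-in options on large files.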

Though dense, check here for the full list: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
