Psycopg2 does not auto-generate id when using copy_from from a csv file to a Postgres db

2022-01-20 00:00:00 python copy postgresql csv psycopg2

Problem description

I have a csv file that has several columns:

upc date quantity customer

In my physical table, I have an auto-generated id column for each row:

id upc date quantity customer

When I run my Python script to copy into the db, it seems as though the db is interpreting the upc as the actual id. I'm getting this error message:

Error: value "1111111" is out of range for type integer
CONTEXT:  COPY physical, line 1, column id: "1111111"

I've never attempted this before, but I believe this is correct:

def insert_csv(f, table):
    connection = get_postgres_connection()
    cursor = connection.cursor()
    try:
        cursor.copy_from(f, table, sep=',')
        connection.commit()
        return True
    except (psycopg2.Error) as e:
        print(e)
        return False
    finally:
        cursor.close()
        connection.close()

Am I doing something wrong here, or do I have to create another script to get the last id from the table?

Updated working code:

def insert_csv(f, table, columns):
    connection = get_postgres_connection()
    cursor = connection.cursor()
    try:
        column_names = ','.join(columns)
        query = f'''
            COPY {table}({column_names})
            FROM STDIN (FORMAT CSV)
        '''
        cursor.copy_expert(query, f)
        connection.commit()
        return True
    except (psycopg2.Error) as e:
        print(e)
        return False
    finally:
        cursor.close()
        connection.close()

columns = (
    "upc",
    "date_thru",
    "transaction_type",
    "transaction_type_subtype",
    "country_code",
    "customer",
    "quantity",
    "income_gross",
    "fm_serial",
    "date_usage",
)

with open(dump_file, 'r', newline='', encoding="ISO-8859-1") as f:
    inserted = insert_csv(f, 'physical', columns)


Solution

You need to specify the columns to import. From the documentation:

columns – iterable with name of the columns to import. The length and types should match the content of the file to read. If not specified, it is assumed that the entire table matches the file structure.
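
To illustrate what that last sentence implies for the table in the question, here is a minimal sketch in plain Python. It is not psycopg2 or PostgreSQL code: `assign_columns` is a hypothetical helper that only mimics the positional pairing of file fields with columns, and the sample row is made up apart from the value in the error message.

```python
# Hypothetical illustration only: mimics how file fields are paired with columns.
TABLE_COLUMNS = ["id", "upc", "date", "quantity", "customer"]


def assign_columns(fields, columns=None):
    """Pair the fields of one csv row with column names, by position."""
    # Without an explicit list, the whole table is assumed, in table order.
    target = list(columns) if columns else TABLE_COLUMNS
    # zip() stops at the shorter input; the real COPY would instead
    # report an error when a row has too few or too many fields.
    return dict(zip(target, fields))


row = ["1111111", "2022-01-20", "3", "acme"]

# No column list: the first field (the upc) is paired with id.
print(assign_columns(row))

# Explicit column list: id is not a target, so the database can generate it.
print(assign_columns(row, ["upc", "date", "quantity", "customer"]))
```

In the first call the upc value ends up under id, which matches the "column id" context in the error message.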

Your code may look like this:

def insert_csv(f, table, columns):
    connection = connect()
    cursor = connection.cursor()
    try:
        cursor.copy_from(f, table, sep=',', columns=columns)
        connection.commit()
        return True
    except (psycopg2.Error) as e:
        print(e)
        return False
    finally:
        cursor.close()
        connection.close()
        
with open("path_to_my_csv") as file:
    insert_csv(file, "my_table", ("upc", "date", "quantity", "customer"))
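
One caveat, as far as I know: copy_from() sends the data in PostgreSQL's text format with the given separator, not in CSV format, so quotes around a field are not interpreted. The sketch below uses only the standard library to show the difference on a made-up line; the sample data is an assumption and is not taken from the question's file.

```python
import csv
import io

# Made-up sample line: the customer name contains a comma and is quoted.
line = '1111111,2022-01-20,3,"Smith, John"\n'

# Splitting on the separator alone ignores the quotes.
naive_fields = line.rstrip("\n").split(",")

# A CSV parser treats the quoted part as a single field.
csv_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(csv_fields), csv_fields)
```

If the file can contain quoted fields like this, a COPY statement with the csv format is likely the safer choice.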

If you have to use copy_expert(), modify your function as follows:

def insert_csv(f, table, columns):
    connection = connect()
    cursor = connection.cursor()
    try:
        column_names = ','.join(columns)
        copy_cmd = f"copy {table}({column_names}) from stdin (format csv)"
        cursor.copy_expert(copy_cmd, f)
        connection.commit()
        return True
    except (psycopg2.Error) as e:
        print(e)
        return False
    finally:
        cursor.close()
        connection.close()
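
Because the table and column names are interpolated into the statement as plain strings, they should come from trusted code rather than from user input. A minimal sketch of a stricter statement builder follows. `quote_ident` and `build_copy_statement` are hypothetical helper names, not part of psycopg2; the quoting rule used (wrap in double quotes, double any embedded double quote) is standard SQL identifier quoting. psycopg2 also ships a `psycopg2.sql` module with an `Identifier` class for the same purpose.

```python
def quote_ident(name):
    """Quote a SQL identifier: wrap in double quotes, double embedded quotes."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def build_copy_statement(table, columns):
    """Build a COPY ... FROM STDIN statement with an explicit column list."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(quote_ident(c) for c in columns)
    return f"COPY {quote_ident(table)} ({column_list}) FROM STDIN (FORMAT CSV)"


statement = build_copy_statement(
    "physical", ("upc", "date_thru", "customer", "quantity")
)
print(statement)
```

The resulting string can be passed to cursor.copy_expert(statement, f). Note that quoted identifiers are case-sensitive, so the names must match the table definition exactly.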
