Unable to insert data into a Snowflake database table using the pandas to_sql() method

Problem description

My Snowflake instance has a database named SFOPT_TEST. The database has two schemas, AUDITS and PARAMS.

The AUDITS schema has a table created with SQLAlchemy's declarative_base():

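import os

from sqlalchemy import TIMESTAMP, Column, Date, Integer, Sequence, String
from sqlalchemy.ext.declarative import declarative_base

# `constants` is the project's own module and is not shown here.
Base = declarative_base()


class AccountUsageLoginHistory(Base):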
    
    '''
    This model will store the account parameters of the customers instances.
    '''

    __tablename__ = constants.TABLE_ACCOUNT_USAGE_LOGIN_HISTORY
    __table_args__ = {
        'schema' : os.environ.get('SCHEMA_NAME_AUDITS')
    }

    id = Column(Integer, Sequence('id_login_history'), primary_key=True, autoincrement=True)
    event_id = Column(Integer, nullable=True)
    event_timestamp = Column(TIMESTAMP, nullable=True)
    event_type = Column(String(100), nullable=True)
    user_name = Column(String(100), nullable=True)
    client_ip = Column(String(100), nullable=True)
    reported_client_type = Column(String(100), nullable=True)
    reported_client_version = Column(String(100), nullable=True)
    first_authentication_factor = Column(String(100), nullable=True)
    second_authentication_factor = Column(String(100), nullable=True)
    is_success = Column(String(100), nullable=True)
    error_code = Column(String(200), nullable=True) 
    error_message = Column(String(100), nullable=True)
    related_event_id = Column(Integer, nullable=True)
    event = Column(String(200), nullable=True)
    instance_id = Column(Integer, nullable=True)
    company_id = Column(Integer, nullable=True)
    user_id = Column(Integer, nullable=True)
    date_run = Column(Date, nullable=True)

    def __repr__(self):
        # Return a short string representation of the instance.
        return "<LoginHistory({})>".format(self.id)

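This is how the table is created on the instance.

I have a dataframe with the columns listed below, which needs to be inserted into the table created above: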

Index(['event_id', 'event_timestamp', 'event_type', 'user_name', 'client_ip',
       'reported_client_type', 'reported_client_version',
       'first_authentication_factor', 'second_authentication_factor',
       'is_success', 'error_code', 'error_message', 'related_event_id',
       'instance_id', 'user_id', 'event', 'company_id', 'date_run'],
      dtype='object')

To insert it, I use the to_sql() method as follows:

dataframe.to_sql(table_name, self.engine, index=False, method=pd_writer, if_exists="append")

This returns the following error:

Traceback (most recent call last):
  File "metadata_collection.py", line 59, in <module>
    y = x.collect_process_dump(sql='SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY;', table_name='account_usage_login_history')
  File "metadata_collection.py", line 55, in collect_process_dump
    load_data = self.load_data.dump_data(table_name=table_name, dataframe=associate_df)
  File "/snowflake-backend/snowflake/collect_metadata/load_data.py", line 16, in dump_data
    dataframe.to_sql(table_name, self.engine, index=False, method=pd_writer, if_exists="append")
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2663, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 521, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1317, in to_sql
    table.insert(chunksize, method=method)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 755, in insert
    exec_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/pandas_tools.py", line 168, in pd_writer
    schema=table.schema)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/pandas_tools.py", line 135, in write_pandas
    copy_results = cursor.execute(copy_into_sql, _is_internal=True).fetchall()
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 597, in execute
    errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 124, in errorhandler_wrapper
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 89, in default_errorhandler
    done_format_msg=error_value.get('done_format_msg'))
snowflake.connector.errors.ProgrammingError: 100072 (22000): 0198d465-0b4e-b74d-0000-d5e5000b524a: NULL result in a non-nullable column

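This error occurs because my Snowflake table has an id field as the primary key, which cannot be NULL. To auto-increment it, I created a sequence, as shown in class AccountUsageLoginHistory above. In addition, the table definition on the instance shows the default value of id as IDENTITY START 1 INCREMENT 1. All other columns are nullable=True, so the problem is only with id.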
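The claim that id is the only column the dataframe does not supply can be checked by comparing the two column lists. This is a minimal sketch that copies the column names from the model and from the dataframe index shown earlier; the variable names are illustrative and not part of the original code.

```python
# Columns declared on the AccountUsageLoginHistory model.
model_columns = [
    "id", "event_id", "event_timestamp", "event_type", "user_name",
    "client_ip", "reported_client_type", "reported_client_version",
    "first_authentication_factor", "second_authentication_factor",
    "is_success", "error_code", "error_message", "related_event_id",
    "event", "instance_id", "company_id", "user_id", "date_run",
]

# Columns present in the dataframe (taken from its Index output).
dataframe_columns = [
    "event_id", "event_timestamp", "event_type", "user_name", "client_ip",
    "reported_client_type", "reported_client_version",
    "first_authentication_factor", "second_authentication_factor",
    "is_success", "error_code", "error_message", "related_event_id",
    "instance_id", "user_id", "event", "company_id", "date_run",
]

# Columns the table defines but the dataframe does not provide.
missing_from_dataframe = [c for c in model_columns if c not in dataframe_columns]

# Columns the dataframe provides but the table does not define.
unexpected_in_dataframe = [c for c in dataframe_columns if c not in model_columns]

print(missing_from_dataframe)   # ['id']
print(unexpected_in_dataframe)  # []
```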

I am still unable to insert data into my table.

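Recommended answer

If you are used to MSSQL or Oracle, this may be confusing: when a column has a NOT NULL constraint (the only constraint Snowflake enforces), Snowflake does not let you omit that column on INSERT. However, since you are using a sequence to supply a default value, you can make the column nullable. The insert will then succeed and, as you would expect, the ID column will be populated with the default value.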

The only caveat is what happens if a user inserts into the table like this:

INSERT INTO TABLE_ACCOUNT_USAGE_LOGIN_HISTORY(ID, EVENT_ID) 
VALUES(NULL, 2);

The query will succeed and add a new row whose ID value is NULL.
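The difference between omitting a nullable column that has a default and explicitly passing NULL for it can be reproduced locally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, with a constant default in place of a sequence and a simplified, hypothetical table named login_history. It illustrates the general SQL behavior described above and is not a test against Snowflake itself.

```python
import sqlite3

# SQLite is only a local stand-in; it is not Snowflake.
conn = sqlite3.connect(":memory:")

# A nullable column with a default. The constant 1 stands in for a
# sequence-generated value. PRIMARY KEY is deliberately left off so that
# the column can hold NULL.
conn.execute(
    """
    CREATE TABLE login_history (
        id INTEGER DEFAULT 1,
        event_id INTEGER
    )
    """
)

# Omitting the column: the default is applied.
conn.execute("INSERT INTO login_history (event_id) VALUES (1)")

# Passing NULL explicitly: the default is bypassed and NULL is stored.
conn.execute("INSERT INTO login_history (id, event_id) VALUES (NULL, 2)")

rows = conn.execute(
    "SELECT id, event_id FROM login_history ORDER BY event_id"
).fetchall()
print(rows)  # [(1, 1), (None, 2)]

conn.close()
```

With the column nullable, nothing stops a caller from storing a NULL ID, so inserts should either omit the column or supply a real value.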
