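Unable to insert data into a Snowflake database table using the pandas to_sql() method
Problem description
I have a database, SFOPT_TEST, on my Snowflake instance. The database has two schemas, AUDITS and PARAMS.
The AUDITS schema has a table created with SQLAlchemy's declarative_base():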
class AccountUsageLoginHistory(Base):
    '''
    This model will store the account parameters of the customers instances.
    '''
    __tablename__ = constants.TABLE_ACCOUNT_USAGE_LOGIN_HISTORY
    __table_args__ = {
        'schema': os.environ.get('SCHEMA_NAME_AUDITS')
    }

    id = Column(Integer, Sequence('id_login_history'), primary_key=True, autoincrement=True)
    event_id = Column(Integer, nullable=True)
    event_timestamp = Column(TIMESTAMP, nullable=True)
    event_type = Column(String(100), nullable=True)
    user_name = Column(String(100), nullable=True)
    client_ip = Column(String(100), nullable=True)
    reported_client_type = Column(String(100), nullable=True)
    reported_client_version = Column(String(100), nullable=True)
    first_authentication_factor = Column(String(100), nullable=True)
    second_authentication_factor = Column(String(100), nullable=True)
    is_success = Column(String(100), nullable=True)
    error_code = Column(String(200), nullable=True)
    error_message = Column(String(100), nullable=True)
    related_event_id = Column(Integer, nullable=True)
    event = Column(String(200), nullable=True)
    instance_id = Column(Integer, nullable=True)
    company_id = Column(Integer, nullable=True)
    user_id = Column(Integer, nullable=True)
    date_run = Column(Date, nullable=True)

    def __repr__(self):
        # return the class object.
        return "<LoginHistory({})>".format(self.id)
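This is how the table is created on the instance.
I have a DataFrame with the columns listed below, which needs to be inserted into the table created above: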
Index(['event_id', 'event_timestamp', 'event_type', 'user_name', 'client_ip',
'reported_client_type', 'reported_client_version',
'first_authentication_factor', 'second_authentication_factor',
'is_success', 'error_code', 'error_message', 'related_event_id',
'instance_id', 'user_id', 'event', 'company_id', 'date_run'],
dtype='object')
To insert the data I use the to_sql() method, as follows:
dataframe.to_sql(table_name, self.engine, index=False, method=pd_writer, if_exists="append")
This returns the error:
Traceback (most recent call last):
  File "metadata_collection.py", line 59, in <module>
    y = x.collect_process_dump(sql='SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY;', table_name='account_usage_login_history')
  File "metadata_collection.py", line 55, in collect_process_dump
    load_data = self.load_data.dump_data(table_name=table_name, dataframe=associate_df)
  File "/snowflake-backend/snowflake/collect_metadata/load_data.py", line 16, in dump_data
    dataframe.to_sql(table_name, self.engine, index=False, method=pd_writer, if_exists="append")
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2663, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 521, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1317, in to_sql
    table.insert(chunksize, method=method)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 755, in insert
    exec_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/pandas_tools.py", line 168, in pd_writer
    schema=table.schema)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/pandas_tools.py", line 135, in write_pandas
    copy_results = cursor.execute(copy_into_sql, _is_internal=True).fetchall()
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 597, in execute
    errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 124, in errorhandler_wrapper
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 89, in default_errorhandler
    done_format_msg=error_value.get('done_format_msg'))
snowflake.connector.errors.ProgrammingError: 100072 (22000): 0198d465-0b4e-b74d-0000-d5e5000b524a: NULL result in a non-nullable column
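This error occurs because my Snowflake table has a field id as the primary key, which cannot be null. To auto-increment it, I created a sequence, as shown in class AccountUsageLoginHistory above. In addition, in the screenshot attached above, the default value of id is IDENTITY START 1 INCREMENT 1. All other columns are nullable=True, so the problem is only with id.
I am still unable to insert the data into my table.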
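One quick way to check the claim that id is the only table column the DataFrame does not supply is to compare the two column lists. The following is a minimal sketch in plain Python; the two lists are copied from the model and the Index output above, and the variable names are illustrative only.

```python
# Columns declared on the AccountUsageLoginHistory model (copied from the class above).
table_columns = [
    "id", "event_id", "event_timestamp", "event_type", "user_name", "client_ip",
    "reported_client_type", "reported_client_version",
    "first_authentication_factor", "second_authentication_factor",
    "is_success", "error_code", "error_message", "related_event_id",
    "event", "instance_id", "company_id", "user_id", "date_run",
]

# Columns present in the DataFrame (copied from the Index output above).
dataframe_columns = [
    "event_id", "event_timestamp", "event_type", "user_name", "client_ip",
    "reported_client_type", "reported_client_version",
    "first_authentication_factor", "second_authentication_factor",
    "is_success", "error_code", "error_message", "related_event_id",
    "instance_id", "user_id", "event", "company_id", "date_run",
]

# Table columns that the DataFrame never provides a value for.
missing = [col for col in table_columns if col not in dataframe_columns]

# DataFrame columns that have no counterpart in the table.
unexpected = [col for col in dataframe_columns if col not in table_columns]

print("missing from DataFrame:", missing)
print("not in table:", unexpected)
```

With a live DataFrame, list(dataframe.columns) could stand in for the hard-coded dataframe_columns list.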
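Recommended answer
If you are used to MSSQL or ORACLE, this may be confusing, but when a column has a NOT NULL constraint (which is the only constraint Snowflake enforces), Snowflake does not let you leave that column out of an INSERT. However, since you are adding a default value through a sequence, you can make the column nullable. The insert will then succeed and, as you would expect, the ID column will be populated with the default value.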
The only caveat is that if a user inserts into the table like this:
INSERT INTO TABLE_ACCOUNT_USAGE_LOGIN_HISTORY(ID, EVENT_ID)
VALUES(NULL, 2);
the query will succeed and add a new row whose ID value is NULL.
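One possible way to guard against that caveat in application code is to leave the auto-populated column out of the statement whenever no value was supplied, so the default applies instead of an explicit NULL. The sketch below is only an illustration: build_insert is a hypothetical helper, not part of pandas or the Snowflake connector, and the %s placeholder style and the trusted table and column names are assumptions.

```python
def build_insert(table, row, auto_columns=("id",)):
    """Return (sql, params) for an INSERT that omits auto-populated columns set to None.

    Hypothetical helper: `table` and the keys of `row` are assumed to be trusted
    identifiers, and `%s` is assumed to be the driver's placeholder style.
    """
    auto = {name.lower() for name in auto_columns}

    # Keep every column except auto-populated ones that carry no value,
    # so the database default fills them in rather than an explicit NULL.
    kept = {
        col: val
        for col, val in row.items()
        if not (col.lower() in auto and val is None)
    }
    if not kept:
        raise ValueError("row has no columns left to insert")

    column_list = ", ".join(kept)
    placeholders = ", ".join(["%s"] * len(kept))
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    return sql, list(kept.values())


# The problematic statement from the answer, expressed as a row dictionary.
sql, params = build_insert(
    "TABLE_ACCOUNT_USAGE_LOGIN_HISTORY",
    {"ID": None, "EVENT_ID": 2},
)
print(sql)
print(params)
```

The returned pair is meant to be passed to a cursor's execute(sql, params) call. A row that does carry an explicit ID value is left untouched.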