Impala的use_local_tz_for_unix_timestamp_conversions
使用过Impala的同学都知道,impala默认对于timestamp都是当成UTC来处理的,并不会做任何的时区转换。这也就是说,当你写入一个timestamp的数据时,impala就会把它当成是UTC的时间存起来,而不是本地时间。但是Impala同时又提供了use_local_tz_for_unix_timestamp_conversions和convert_legacy_hive_parquet_utc_timestamps这两个参数来处理timestamp的时区问题。convert_legacy_hive_parquet_utc_timestamps这个参数主要是用来处理hive写parquet文件,impala读取的问题,本文暂不展开,这里主要介绍下use_local_tz_for_unix_timestamp_conversions这个参数的作用。首先,我们来看下官方的解释: The --use_local_tz_for_unix_timestamp_conversions setting affects conversions from TIMESTAMP to BIGINT, or from BIGINT to TIMESTAMP. By default, Impala treats all TIMESTAMP values as UTC, to simplify analysis of time-series data from different geographic regions. When you enable the --use_local_tz_for_unix_timestamp_conversions setting, these operations treat the input values as if they are in the local time zone of the host doing the processing. See Impala Date and Time Functions for the list of functions affected by the --use_local_tz_for_unix_timestamp_conversions setting. 简单来说,就是开启了这个参数之后(默认false,表示关闭),当SQL里面涉及到了timestamp->bigint/bigint->timestamp的转换操作时,impala会把timestamp当成是本地的时间来处理,而不是UTC时间。这个地方听起来似乎很简单,但是实际理解起来的时候非常容易出错,这里笔者将结合自己的实际测试结果来看一下use_local_tz_for_unix_timestamp_conversions这个参数究竟是如何起作用的。
测试数据准备
首先,我们将测试集群的impala的use_local_tz_for_unix_timestamp_conversions和convert_legacy_hive_parquet_utc_timestamps参数都配置为false,然后重启集群。 接着,我们使用如下SQL来创建测试表,然后插入数据:
create table timestamp_test_parquet(id int,ts timestamp,sec bigint) stored as parquet;
insert into timestamp_test_parquet values(1001,UTC_TIMESTAMP(),cast(UTC_TIMESTAMP() as bigint));
相关文章