TIMESTAMP data type
In Impala, the TIMESTAMP data type holds a value of date and time. It
    can be decomposed into year, month, day, hour, minute and seconds fields, but with no time zone
    information available, it does not correspond to any specific point in time.
Internally, the resolution of the time portion of a TIMESTAMP value is in
        nanoseconds. 
Syntax:
      In the column definition of a CREATE TABLE statement:
    
column_name TIMESTAMPtimestamp [+ | -] INTERVAL interval
DATE_ADD (timestamp, INTERVAL interval time_unit)Range: 1400-01-01 to 9999-12-31
 Out of range TIMESTAMP values are converted to NULL. 
 The range of Impala TIMESTAMP is different from the Hive
          TIMESTAMP type.
INTERVAL expressions:
 You can perform date arithmetic by adding or subtracting a specified
      number of time units, using the INTERVAL keyword and the
        + operator, the - operator,
        date_add() or date_sub(). 
time_unit
      in the INTERVAL clause: - 
          YEAR[S]
- 
          MONTH[S]
- 
          WEEK[S]
- 
          DAY[S]
- 
          HOUR[S]
- 
          MINUTE[S]
- 
          SECOND[S]
- 
          MILLISECOND[S]
- 
          MICROSECOND[S]
- 
          NANOSECOND[S]
 You can only specify one time unit in each interval expression, for
      example INTERVAL 3 DAYS or INTERVAL 25
        HOURS, but you can produce any granularity by adding together
      successive INTERVAL values, such as
          timestamp_value + INTERVAL 3 WEEKS -
        INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS. 
Internal details: Represented in memory as a 16-byte value.
Time zones:
 By default, Impala stores and interprets TIMESTAMP
      values in UTC time zone when writing to data files, reading from data
      files, or converting to and from system time values through functions. 
--use_local_tz_for_unix_timestamp_conversions startup
      flag to TRUE, Impala treats the
        TIMESTAMP values specified in the local time zone. The
      local time zone is determined in the following order with the
        TIMESTAMP query option takes the highest precedence: -  The TIMESTAMPquery option
- 
          $TZenvironment variable
- System time zone where the impalad coordinator runs
 The --use_local_tz_for_unix_timestamp_conversions
      setting can be used to fix discrepancy in INTERVAL
      operations. For example, a TIMESTAMP + INTERVAL
          n-hours can be affected by Daylight Saving
      Time, which Impala does not consider by default as these operations are
      applied as if the timestamp was in UTC. You can use the
        --use_local_tz_for_unix_timestamp_conversions setting
      to fix the issue. 
See Customizing time zones for configuring to use custom time zone database and aliases.
 See Impala date and time
          functions for the list of functions affected by the
          --use_local_tz_for_unix_timestamp_conversions setting. 
Time zone handling between Impala and Hive:
Interoperability between Hive and Impala is different depending on the file format.
- TextFor text tables, TIMESTAMPvalues can be written and read interchangeably by Impala and Hive as Hive reads and writesTIMESTAMPvalues without converting with respect to time zones.
- ParquetWhen Hive writes to Parquet data files, the TIMESTAMPvalues are normalized to UTC from the local time zone of the host where the data was written. On the other hand, Impala does not make any time zone adjustment when it writes or readsINT96 TIMESTAMPvalues to Parquet files. This difference in time zone handling can cause potentially inconsistent results when Impala processesTIMESTAMPvalues in the Parquet files written by Hive.To avoid incompatibility problems or having to code workarounds, you can specify one or both of these impalad startup flags:- 
                --use_local_tz_for_unix_timestamp_conversions=true
- 
                --convert_legacy_hive_parquet_utc_timestamps=true
 When the ‑‑convert_legacy_hive_parquet_utc_timestampssetting is enabled, Impala recognizes the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during the query as Hive does.In Impala 3.0 and lower, the ‑‑convert_legacy_hive_parquet_utc_timestampssetting had a severe impact on multi-threaded performance. The new time zone implementation in Impala 3.1 eliminated most of the performance overhead and made Impala scale well to multiple threads. The‑‑convert_legacy_hive_parquet_utc_timestampssetting is turned off by default for a performance reason. To avoid unexpected incompatibility problems, you should turn on the option when processingTIMESTAMPcolumns in Parquet files written by Hive.Hive currently cannot write INT64TIMESTAMPvalues.In Impala 3.2 and higher, INT64 TIMESTAMPvalues annotated with theTIMESTAMP_MILLISorTIMESTAMP_MICROSOriginalTypeare assumed to be always UTC normalized, so the UTC to local conversion will be always done.INT64 TIMESTAMPannotated with theTIMESTAMPLogicalTypespecifies whether UTC to local conversion is necessary depending on the Parquet metadata.
- 
                
Conversions:
 Impala automatically converts
          STRING literals of the correct format into
          TIMESTAMP values. Timestamp values are accepted in
        the format 'yyyy‑MM‑dd HH:mm:ss.SSSSSS', and can
        consist of just the date, or just the time, with or without the
        fractional second portion. For example, you can specify
          TIMESTAMP values such as
          '1966‑07‑30', '08:30:00', or
          '1985‑09‑25 17:45:30.005'. 
Leading zeroes are not required in the numbers representing the date
        component, such as month and date, or the time component, such as hour,
        minute, and second. For example, Impala accepts both
          '2018‑1‑1 01:02:03' and
          '2018-01-01 1:2:3' as valid.
 In STRING to TIMESTAMP conversions,
        leading and trailing white spaces, such as a space, a tab, a newline, or
        a carriage return, are ignored. For example, Impala treats the following
        as equivalent: '1999‑12‑01 01:02:03 ', ' 1999‑12‑01 01:02:03',
        '1999‑12‑01 01:02:03\r\n\t'. 
STRING literal to TIMESTAMP, you can
        use the following separators between the date part and the time part: - 
            One or more space characters Example: CAST('2001-01-09 01:05:01' AS TIMESTAMP)
- 
            The character “T” Example: CAST('2001-01-09T01:05:01' AS TIMESTAMP)
      Casting an integer or floating-point
          value N to TIMESTAMP produces a
          value that is N seconds past the start of the epoch
          date (January 1, 1970). By default, the result value represents a date
          and time in the UTC time zone. If the setting
            ‑‑use_local_tz_for_unix_timestamp_conversions=true
          is in effect, the resulting TIMESTAMP represents a
          date and time in the local time zone.
    
 In Impala 1.3 and higher, the FROM_UNIXTIME() and
          UNIX_TIMESTAMP() functions allow a wider range of format strings, with
        more flexibility in element order, repetition of letter placeholders, and separator
        characters. In Impala 2.3 and higher, the UNIX_TIMESTAMP() function also
        allows a numeric timezone offset to be specified as part of the input string. See Impala date and time functions for details. 
 In Impala 2.2.0 and higher, built-in functions
        that accept or return integers representing TIMESTAMP
        values use the BIGINT type for parameters and return
        values, rather than INT. This change lets the date and
        time functions avoid an overflow error that would otherwise occur on
        January 19th, 2038 (known as the Year 2038 problem
 or Y2K38
        problem
). This change affects the
          FROM_UNIXTIME() and UNIX_TIMESTAMP()
        functions. You might need to change application code that interacts with
        these functions, change the types of columns that store the return
        values, or add CAST() calls to SQL statements that call
        these functions. 
Partitioning:
 Although you cannot use a TIMESTAMP column as a partition key, you can extract
        the individual years, months, days, hours, and so on and partition based on those columns.
        Because the partition key column values are represented in HDFS directory names, rather than
        as fields in the data files themselves, you can also keep the original
          TIMESTAMP values if desired, without duplicating data or wasting storage
        space. See Partition Key Columns for more details on partitioning with date and
        time values.
[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp);
ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
        NULL considerations: Casting any unrecognized STRING value to this type produces a
        NULL value.
      
HBase considerations: This data type is fully compatible with HBase tables.
        Parquet consideration:
        INT96 and INT64 encoded Parquet timestamps are supported
        in Impala.
Parquet considerations: This type is fully compatible with Parquet tables.
Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.
        Column statistics considerations: Because this type has a fixed size, the maximum and average size
        fields are always filled in for column statistics, even before you run the COMPUTE STATS
        statement.
      
Kudu considerations:
TIMESTAMP columns in Kudu tables, instead of representing the date
        and time as a BIGINT value. The behavior of TIMESTAMP for
        Kudu tables has some special considerations: - 
            Any nanoseconds in the original 96-bit value produced by Impala are not stored, because Kudu represents date/time columns using 64-bit values. The nanosecond portion of the value is rounded, not truncated. Therefore, a TIMESTAMPvalue that you store in a Kudu table might not be bit-for-bit identical to the value returned by a query.
- 
            The conversion between the Impala 96-bit representation and the Kudu 64-bit representation introduces some performance overhead when reading or writing TIMESTAMPcolumns. You can minimize the overhead during writes by performing inserts through the Kudu API. Because the overhead during reads applies to each query, you might continue to use aBIGINTcolumn to represent date/time values in performance-critical applications.
- 
            The Impala TIMESTAMPtype has a narrower range for years than the underlying Kudu data type. Impala can represent years 1400-9999. If year values outside this range are written to a Kudu table by a non-Impala client, Impala returnsNULLby default when reading thoseTIMESTAMPvalues during a query. Or, if theABORT_ON_ERRORquery option is enabled, the query fails when it encounters a value with an out-of-range year.
Restrictions:
      If you cast a STRING with an unrecognized format to a
      TIMESTAMP, the result is NULL rather than an error. Make
      sure to test your data pipeline to be sure any textual date and time values are in a
      format that Impala TIMESTAMP can recognize.
    
        Currently, Avro tables cannot contain TIMESTAMP columns. If you need to store date and
        time values in Avro tables, as a workaround you can use a STRING representation of the
        values, convert the values to BIGINT with the UNIX_TIMESTAMP() function,
        or create separate numeric columns for individual date and time fields using the EXTRACT()
        function.
      
Examples:
      The following examples demonstrate using TIMESTAMP values with built-in
      functions:
    
select cast('1966-07-30' as timestamp);
select cast('1985-09-25 17:45:30.005' as timestamp);
select cast('08:30:00' as timestamp);
select hour('1970-01-01 15:30:00');         -- Succeeds, returns 15.
select hour('1970-01-01 15:30');            -- Returns NULL because seconds field required.
select hour('1970-01-01 27:30:00');         -- Returns NULL because hour value out of range.
select dayofweek('2004-06-13');             -- Returns 1, representing Sunday.
select dayname('2004-06-13');               -- Returns 'Sunday'.
select date_add('2004-06-13', 365);         -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
select day('2004-06-13');                   -- Returns 13.
select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
select now();                               -- Returns current date and time in local timezone.
      The following examples demonstrate using TIMESTAMP values with
      HDFS-backed tables:
    
create table dates_and_times (t timestamp);
insert into dates_and_times values
  ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
      The following examples demonstrate using TIMESTAMP values with Kudu
      tables:
    
create table timestamp_t (x int primary key, s string, t timestamp, b bigint)
  partition by hash (x) partitions 16
  stored as kudu;
-- The default value of now() has microsecond precision, so the final 3 digits
-- representing nanoseconds are all zero.
insert into timestamp_t values (1, cast(now() as string), now(), unix_timestamp(now()));
-- Values with 1-499 nanoseconds are rounded down in the Kudu TIMESTAMP column.
insert into timestamp_t values (2, cast(now() + interval 100 nanoseconds as string), now() + interval 100 nanoseconds, unix_timestamp(now() + interval 100 nanoseconds));
insert into timestamp_t values (3, cast(now() + interval 499 nanoseconds as string), now() + interval 499 nanoseconds, unix_timestamp(now() + interval 499 nanoseconds));
-- Values with 500-999 nanoseconds are rounded up in the Kudu TIMESTAMP column.
insert into timestamp_t values (4, cast(now() + interval 500 nanoseconds as string), now() + interval 500 nanoseconds, unix_timestamp(now() + interval 500 nanoseconds));
insert into timestamp_t values (5, cast(now() + interval 501 nanoseconds as string), now() + interval 501 nanoseconds, unix_timestamp(now() + interval 501 nanoseconds));
-- The string representation shows how underlying Impala TIMESTAMP can have nanosecond precision.
-- The TIMESTAMP column shows how timestamps in a Kudu table are rounded to microsecond precision.
-- The BIGINT column represents seconds past the epoch and so if not affected much by nanoseconds.
select s, t, b from timestamp_t order by t;
+-------------------------------+-------------------------------+------------+
| s                             | t                             | b          |
+-------------------------------+-------------------------------+------------+
| 2017-05-31 15:30:05.107157000 | 2017-05-31 15:30:05.107157000 | 1496244605 |
| 2017-05-31 15:30:28.868151100 | 2017-05-31 15:30:28.868151000 | 1496244628 |
| 2017-05-31 15:34:33.674692499 | 2017-05-31 15:34:33.674692000 | 1496244873 |
| 2017-05-31 15:35:04.769166500 | 2017-05-31 15:35:04.769167000 | 1496244904 |
| 2017-05-31 15:35:33.033082501 | 2017-05-31 15:35:33.033083000 | 1496244933 |
+-------------------------------+-------------------------------+------------+
Added in: Available in all versions of Impala.
Related information:
- 
        To convert to or from different date formats, or perform date arithmetic, use the date
        and time functions described in
        Impala date and time functions. In
        particular, the from_unixtime()function requires a case-sensitive format string such as"yyyy-MM-dd HH:mm:ss.SSSS", matching one of the allowed variations of aTIMESTAMPvalue (date plus time, only date, only time, optional fractional seconds).
- 
        See SQL differences between Impala and Hive for
        details about differences in TIMESTAMPhandling between Impala and Hive.
