TIMESTAMP Data Type
A data type used in CREATE TABLE and ALTER TABLE statements, representing a point in time.
Syntax:
In the column definition of a CREATE TABLE statement:
column_name TIMESTAMP
Range: Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive TIMESTAMP type. Internally, the resolution of the time portion of a TIMESTAMP value is in nanoseconds.
INTERVAL expressions:
You can perform date arithmetic by adding or subtracting a specified number of time units, using the INTERVAL keyword and the + and - operators or date_add() and date_sub() functions. You can specify units as YEAR[S], MONTH[S], WEEK[S], DAY[S], HOUR[S], MINUTE[S], SECOND[S], MILLISECOND[S], MICROSECOND[S], and NANOSECOND[S]. You can only specify one time unit in each interval expression, for example INTERVAL 3 DAYS or INTERVAL 25 HOURS, but you can produce any granularity by adding together successive INTERVAL values, such as timestamp_value + INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS.
For example:
select now() + interval 1 day; select date_sub(now(), interval 5 minutes); insert into auction_details select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours from new_auctions;
Time zones:
By default, Impala does not store timestamps using the local timezone, to avoid undesired results from unexpected time zone issues. Timestamps are stored and interpreted relative to UTC, both when written to or read from data files, or when converted to or from Unix time values through functions such as from_unixtime() or unix_timestamp().
Because Impala does not assume that TIMESTAMP values are in any particular time zone, you must be conscious of the time zone aspects of data that you query, insert, or convert.
For consistency with Unix system calls, the TIMESTAMP returned by the now() function represents the local time in the system time zone, rather than in UTC. To store values relative to the current time in a portable way, convert any now() return values using the to_utc_timestamp() function first. For example, the following example shows that the current time in California (where this Impala cluster is located) is shortly after 2 PM. If that value was written to a data file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the dates and times would not match up precisely because of time zone differences. Therefore, the to_utc_timestamp() function converts it using a common reference point, the UTC time zone (descended from the old Greenwich Mean Time standard). The 'PDT' argument indicates that the original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all geographic locations run the same transformation on any local date and time values (with the appropriate time zone argument), the stored data uses a consistent representation. Impala queries can use functions such as EXTRACT(), MIN(), AVG(), and so on to do time-series analysis on those timestamps.
[localhost:21000] > select now(); +-------------------------------+ | now() | +-------------------------------+ | 2015-04-09 14:07:46.580465000 | +-------------------------------+ [localhost:21000] > select to_utc_timestamp(now(), 'PDT'); +--------------------------------+ | to_utc_timestamp(now(), 'pdt') | +--------------------------------+ | 2015-04-09 21:08:07.664547000 | +--------------------------------+
The converse function, from_utc_timestamp(), lets you take stored TIMESTAMP data or calculated results and convert back to local date and time for processing on the application side. The following example shows how you might represent some future date (such as the ending date and time of an auction) in UTC, and then convert back to local time when convenient for reporting or other processing. The final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it back to the local time zone and comparing it against the current date and time.
[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT'); +---------------------------------------------------+ | to_utc_timestamp(now() + interval 2 weeks, 'pdt') | +---------------------------------------------------+ | 2015-04-23 21:08:34.152923000 | +---------------------------------------------------+ [localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT'); +------------------------------------------------------------+ | from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') | +------------------------------------------------------------+ | 2015-04-23 14:08:34.152923000 | +------------------------------------------------------------+ [localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') < now(); +--------------------------------------------------------------------+ | from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') < now() | +--------------------------------------------------------------------+ | false | +--------------------------------------------------------------------+
Conversions:
Impala automatically converts STRING literals of the correct format into TIMESTAMP values. Timestamp values are accepted in the format "yyyy-MM-dd HH:mm:ss.SSSSSS", and can consist of just the date, or just the time, with or without the fractional second portion. For example, you can specify TIMESTAMP values such as '1966-07-30', '08:30:00', or '1985-09-25 17:45:30.005'. Casting an integer or floating-point value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970).
In Impala 1.3 and higher, the FROM_UNIXTIME() and UNIX_TIMESTAMP() functions allow a wider range of format strings, with more flexibility in element order, repetition of letter placeholders, and separator characters. See Impala Date and Time Functions for details.
Partitioning:
Although you cannot use a TIMESTAMP column as a partition key, you can extract the individual years, months, days, hours, and so on and partition based on those columns. Because the partition key column values are represented in HDFS directory names, rather than as fields in the data files themselves, you can also keep the original TIMESTAMP values if desired, without duplicating data or wasting storage space. See Partition Key Columns for more details on partitioning with date and time values.
[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp); ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
Examples:
select cast('1966-07-30' as timestamp); select cast('1985-09-25 17:45:30.005' as timestamp); select cast('08:30:00' as timestamp); select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15. select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required. select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range. select dayofweek('2004-06-13'); -- Returns 1, representing Sunday. select dayname('2004-06-13'); -- Returns 'Sunday'. select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields. select day('2004-06-13'); -- Returns 13. select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates? select now(); -- Returns current date and time in local timezone. create table dates_and_times (t timestamp); insert into dates_and_times values ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
NULL considerations: Casting any unrecognized STRING value to this type produces a NULL value.
Partitioning: Because this type potentially has so many distinct values, it is often not a sensible choice for a partition key column. For example, events 1 millisecond apart would be stored in different partitions. Consider using the TRUNC() function to condense the number of distinct values, and partition on a new column with the truncated values.
HBase considerations: This data type is fully compatible with HBase tables.
Parquet considerations: This type is fully compatible with Parquet tables.
Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.
Internal details: Represented in memory as a 16-byte value.
Added in: Available in all versions of Impala.
Column statistics considerations: Because this type has a fixed size, the maximum and average size fields are always filled in for column statistics, even before you run the COMPUTE STATS statement.
Sqoop considerations:
If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns. The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table. The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds. Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type.
Restrictions:
If you cast a STRING with an unrecognized format to a TIMESTAMP, the result is NULL rather than an error. Make sure to test your data pipeline to be sure any textual date and time values are in a format that Impala TIMESTAMP can recognize.
Currently, Avro tables cannot contain TIMESTAMP columns. If you need to store date and time values in Avro tables, as a workaround you can use a STRING representation of the values, convert the values to BIGINT with the UNIX_TIMESTAMP() function, or create separate numeric columns for individual date and time fields using the EXTRACT() function.
Related information:
- Timestamp Literals.
- To convert to or from different date formats, or perform date arithmetic, use the date and time functions described in Impala Date and Time Functions. In particular, the from_unixtime() function requires a case-sensitive format string such as "yyyy-MM-dd HH:mm:ss.SSSS", matching one of the allowed variations of a TIMESTAMP value (date plus time, only date, only time, optional fractional seconds).
- See SQL Differences Between Impala and Hive for details about differences in TIMESTAMP handling between Impala and Hive.