Impala type conversion functions
Conversion functions are usually used in combination with other functions, to
explicitly pass the expected data types. Impala has strict rules regarding data types for
function parameters. For example, Impala does not automatically convert a
DOUBLE
value to FLOAT
, a BIGINT
value to
INT
, or other conversion where precision could be lost or overflow could
occur. Also, for reporting or dealing with loosely defined schemas in big data contexts, you
might frequently need to convert values to or from the STRING
type.
Function reference:
Impala supports the following type conversion functions:
- CAST(expression AS type)
-
Purpose: Returns expression converted to the
type data type. If the expression value is of a type that cannot be converted to the target type:
- Of
DECIMAL
,DATE
, andBOOLEAN
, the function returns an error. - Of all other types, the function returns
NULL
.
Usage notes:
Use
CAST
when passing a column value or literal to a function that expects a parameter with a different type. Frequently used in SQL operations such asCREATE TABLE AS SELECT
andINSERT ... VALUES
to ensure that values from various sources are of the appropriate type for the destination columns.Where practical, do a one-time
CAST()
operation during the ingestion process to make each column into the appropriate type, rather than using manyCAST()
operations in each query; doing type conversions for each row during each query can be expensive for tables with millions or billions of rows.The way this function deals with time zones when converting to or from
TIMESTAMP
values is affected by the‑‑use_local_tz_for_unix_timestamp_conversions
startup flag for the impalad daemon. See TIMESTAMP data type for details about how Impala handles time zone considerations for theTIMESTAMP
data type. - Of
- CAST(expression AS type FORMAT pattern)
-
Purpose: Returns expression converted to
the type data type based on the
pattern format string. This signature of
CAST()
with theFORMAT
clause is supported only for casts betweenSTRING
/CHAR
/VARCHAR
types andTIMESTAMP
/DATE
types.The following rules apply to pattern. Any exceptions to these rules are noted in the Details column of the table below.-
pattern is a case-insensitive
STRING
unless noted otherwise in the table below. - If pattern is
NULL
, an empty string, or a number, an error returns. - A fewer digits in expression than
specified in the pattern is accepted if a
separator is correctly specified in the
pattern. For example,
CAST('5-01-2017' AS DATE FORMAT 'MM-dd-yyyy')
returnsDATE'2017-05-01'
. - If fewer number of digits are in
expression than specified in the
pattern, the current date is used to
complete the year pattern. For example,
CAST('19/05' AS DATE FORMAT 'YYYY/MM')
will returnDATE'2019-05-01'
when executed on August 8, 2019.
The following format patterns are supported in theFORMAT
clause.Pattern Description Details YYYY
4-digit year. YYY
Last 3 digits of a year. YY
Last 2 digits of a year. Y
Last digit of a year RRRR
4-digit round year If 1, 3 or 4-digit year values are provided in expression, treated as
YYYY
.If 2-digit years are provided in expression, treated as
RR
.For date/time to string conversions, treated as
YYYY
.If
YYYY
,YYY
,YY
,Y
, orRR
is given in the same pattern for a string to date/time conversion, an error returns.RR
2-digit round year. For date/time to string conversion, same as
YY
.For string to date/time conversions, the first 2 digits of the year in the return value depends on the specified two-digit year and the last two digits of the current year as follows:-
If the specified 2-digit year is 00 to 49:
-
If the last 2 digits of the current year are 00 to 49, the returned year has the same first 2 digits as the current year.
-
If the last 2 digits of the current year are 50 to 99, the first 2 digits of the returned year are 1 greater than the first 2 digits of the current year.
-
-
If the specified 2-digit year is 50 to 99:
-
If the last 2 digits of the current year are 00 to 49, the first 2 digits of the returned year are 1 less than the first 2 digits of the current year.
-
If the last 2 digits of the current year are 50 to 99, the returned year has the same first 2 digits as the current year.
-
If
YYYY
,YYY
,YY
,Y
, orRR
is given in the same pattern for a string to date/time conversion, an error returns.If 1-digit year values are provided in expression, it is treated as
YYYY
.Q
Quarter of year (1-4) Not supported in string to date/time conversions.
MM
Month In date/time to string conversions, 1-digit months are prefixed with a zero.
MONTH
Month
month
Full month name In string to date/time conversions: - Converts a textual month name to a 2-digit month number.
- The input strings are expected without trailing
spaces, e.g. "
June
", "August
". - If the
FX
modifier is active for the token, a 9-character-long full month name with trailing spaces is expected for the input string.
In date/time to string conversions:- Converts to a 9-character-long full month name with trailing spaces.
- If the
FM
modifier is active for this token, converts to a month name without trailing spaces.
This pattern token is case sensitive.
MONTH
,Month
,month
would require different casing of the input in string to date/time conversions and would produce cased outputs in date/time to string conversions, For example,MONTH
,Month
,month
would convert respectivelyJULY
,July
, andjuly
.MON
Mon
mon
Abbreviated 3-character month name This pattern token is case sensitive.
MON
,Mon
,mon
would require different casing of the input in string to date/time conversions and would produce cased outputs in date/time to string conversions, For example,MON
,Mon
,mon
would convert respectivelyJUL
,Jul
, andjul
.WW
Week of year (1-53) Not supported in string to date/time conversions.
The 1st week begins on January 1st and ends on January 7th.
W
Week of month (1-5) Week of month (1-5) D
Day of week (1-7) Day of week (1-7) DD
Day of month (1-31) In date/time to string conversions, a 1-digit day is prefixed with a zero.
DDD
Day of year (1-366) In string to date/time conversions, providing
MM
andDD
along withDDD
results an error, e.g.YYYY-MM-DDD
.DAY
Day
day
Day name Not supported in string to date/time conversions.
In date/time to string conversions:- Converts to a 9-character-long full day name with trailing spaces.
- If the
FM
modifier is active for this token, converts to a day name without trailing spaces.
This pattern token is case sensitive.
DAY
,Day
,day
would produce cased outputs in date/time to string conversions, For example,DAY
,Day
,day
would convert respectivelyMONDAY
,Monday
, andmonday
.DY
Dy
dy
Not supported in string to date/time conversions.
This pattern token is case sensitive.
DY
,Dy
,dy
would produce cased outputs in date/time to string conversions, For example,DY
,Dy
,dy
would convert respectivelyMON
,Mon
, andmon
.HH
HH12
Hour of day (1-12) In date/time to string conversions, 1-digit hours are prefixed with a zero.
If provided hour in expression is not between 1 and 12, returns an error.
If no AM/PM is provided in expression, the default is
AM
.HH
HH12
Hour of day (1-12) In date/time to string conversions, 1-digit hours are prefixed with a zero.
If provided hour in expression is not between 1 and 12, returns an error.
If no AM/PM is provided in expression, the default is
AM
.HH24
Hour of day (0-23) In string to date/time conversions, if
HH12
,AM
,PM
are given in the same pattern, an error returns.MI
Minute of hour (0-59) In date/time to string conversions, 1-digit minutes are prefixed with a zero.
SS
Second of minute (0-59) In date/time to string conversions, 1-digit seconds are prefixed with a zero.
SSSSS
Second of Day (0-86399) In string to timestamp conversions, if SS
,HH
,HH12
,HH24
,MI
,AM
/PM
are given in the same pattern, an error returns.FF
FF1
, ...,FF9
Fractional second A number, 1 to 9, can be used to indicate the number of digits.
FF
specifies a 9 digits fractional second.AM
PM
A.M.
P.M.
Meridiem indicator For date/time to string conversions,
AM
andPM
are treated as synonyms. For example, casting'2019-01-01 11:00 am'
toTIMESTAMP
with the'YYYY-MM-DD HH12:MI PM'
pattern returns01-JAN-19 11.00.00.000000 AM
.For string to date/time conversion,
HH24
in the same pattern returns an error.TZH
Timezone offset hour An optional sign, + or -, and 2 digits for the value of signed numbers are allowed for the source expression, e.g.
+10
,-05
,04
.The allowed values are from
-15
to+5
.TZM
Timezone offset minute Unsigned numbers are allowed for the source expression.
The allowed values are from
0
to59
.-
.
/
,
'
;
:
<space>
Separator For string to date/time conversions, any separator character in the pattern string would match any separator character in the input expression.
For example,
CAST(“20191010” AS DATE FORMAT “YYYY-MM-DD”)
returns an error, butCAST("2019-.;10 10" AS DATE FORMAT "YYYY-MM-DD")
succeeds.T
Separates the date from the time. This pattern is used for accepting ISO 8601 date/time formats. Example:
YYYY-MM-DDTHH24:MI:SS.FF9Z
Z
Indicates the zero hour offset from UTC. This pattern is used for accepting ISO 8601 date/time formats. FX
Forces an exact match between the format pattern, pattern, and the input argument, expression. Must be specified at the beginning of the format pattern and is valid for the whole pattern.
In string to date/time conversions:
- Forces strict separator matching, including the separator character.
- Expects all the tokens to have the maximum possible length.
In date/time to string conversions, the outputs are padded as follows:
- Numeric outputs are left padded with zeros.
- Text outputs are right padded with spaces.
FM
Overrides FX
.In a date/time to string conversions,
FM
suppresses blank padding for the element immediately following theFM
in the pattern string, e.g. "2010-1-9
".In string to date/time conversions,FM
is used to override the effect ofFX
for certain tokens as follows:-
The length of the token in the input argument can be shorter than the max length of that token type if followed by a separator, e.g. 1-digit month, less than 4-digit year.
-
FM
modifier skips the separators and affects the next non-separator token. For example,CAST('1999-10-2' AS TIMESTAMP FORMAT 'FXYYYY-MMFM-DD')
returns1999-10-02 00:00:00
asFM
is applied toDD
and not to the separator(-
).
Free text Nested string You can specify a free text with surrounding double quotes (") in pattern where the same text is expected in the input argument without surrounding double quotes.
In date/time to string conversions, the string is included in the output with the characters' case preserved.
In string to date/time conversions, the nested string is not included in the resulting date/time object. However, the nested string has to match the related part of the input string, including the case.
The following rules apply:-
If the pattern is surrounded by double quotes, double quotes surrounding the free text token must be escaped with a single backslash (
\"
).For example:
CAST('"Date:"2019-11-10' AS DATE FORMAT "\"Date:\"YYYY-MM-DD")
-
If the pattern is surrounded by double quotes, and there is an escaped double quotes inside a text token that is itself surrounded by escaped double quotes, escape the double quotes in the free text token with a triple backslash: (
\\\"
)For example:
CAST("1985 some \"text 11-28" AS DATE''' FORMAT "YYYY\" some \\\"text \"MM-DD")
-
If the pattern is surrounded by single quotes, the free text token can be surrounded by (non-escaped) double quotes. To include double quotes within the free text token, those double quotes must be escaped by a single backslash.
For example:
CAST('"Date:"2019-11-10' AS DATE FORMAT '"\"Date:\""YYYY-MM-DD')
-
Literal single quotes/apostrophes (
'
) in the nested string must be escaped with a single backslash if the whole pattern string is delimited by single quotes. -
If a free text token contains any separator characters in the beginning, and the text token is right after a separator token sequence, an error returns as it's not trivial to find where the separator sequence ends and where the free text starts. In this case, use
FX
for strict matching.For example:
CAST("1985-11- ' 10" AS DATE FORMAT "YYYY-MM-\" ' \"DD")
returns an error, but removing the dash before the text token makes this succeed.
Examples:
Input Output CAST("02-11-2014" AS DATE FORMAT 'dd-mm-yyyy')
2014-11-02 CAST("365 2014" AS DATE FORMAT 'DDD-YYYY')
2014-12-31 CAST("5-01-26" AS DATE FORMAT 'YY-MM-DD')
Executed at 2019-01-01 11:11:11: 2015-01-26
CAST('2018-11-10T15:11:04Z' AS TIMESTAMP FORMAT 'YYYY-MM-DDTHH24:MI:SSZ')
2018-11-10 15:11:04 CAST("95-01-28" AS DATE FORMAT 'YYY-MM-DD')
Executed at 2019-01-01 11:11:11: 2095-01-28
CAST("49-01-15" AS DATE FORMAT 'RR-MM-DD')
Round year when last 2 digits of current year is greater than 49. Executed at 2099-01-01 11:11:11:
2149-01-15
CAST('2019.10.10 13:30:40.123456 +01:30' AS TIMESTAMP FORMAT 'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM')
2019-10-10 13:30:40.123456000 CAST('2001-03-01 03:10:15.123456 -01:30' AS TIMESTAMP FORMAT 'FXYYYY-MM-DD HH12:MI:SS.FF6 TZH:TZM')
2001-03-01 03:10:15.123456000 CAST('2001-03-02 03:10:15' AS TIMESTAMP FORMAT 'FXYYYY MM-DD HH12:MI:SS')
NULL because the separator between the year and month do not match. CAST('2001-3-05' AS TIMESTAMP FORMAT 'FXYYYY-MM-DD')
NULL because the length of the month token does not match the pattern. CAST('2001-3-11 3:15:00.12345' AS TIMESTAMP FORMAT 'FXYYYY-FMMM-DD FMHH12:MI:SS.FMFF')
2001-03-11 03:15:00.123450000 The multiple
FM
modifiers for the month and hour override theFX
and suppress padding.CAST('2019-01-01 01:01:01' AS TIMESTAMP FORMAT 'FMHH12:MI:FMSS')
1:01:1 CAST('Date: 2019-11-10' AS DATE FORMAT '"Date: "YYYY-MM-DD')
2019-11-10 -
pattern is a case-insensitive
- TYPEOF(expression)
-
Purpose: Returns the name of the data type corresponding to
expression. For types with extra attributes,
such as length for
CHAR
andVARCHAR
, or precision and scale forDECIMAL
, includes the full specification of the type.Return type:
STRING
Usage notes: Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as
CREATE TABLE
statements, for example, to get the type of an expression such ascol1 / col2
orCONCAT(col1, col2, col3)
. This function is especially useful for arithmetic expressions involvingDECIMAL
types because the precision and scale of the result is can be different than that of the operands.Examples:
TYPEOF(2)
returnsTINYINT
.TYPEOF(NOW())
returnsTIMESTAMP
.