Complex data types
Cloudera Data Visualization understands native complex data type configurations, and automatically translates complex structures into intuitive drag and drop dataset field configurations.
Native support for complex datatypes allows us to bypass ETL workflows and avoid unnecessary flattening at query time. It also leverages the powerful native SQL already optimized for complex types.
Overview of complex data types
- STRUCT
A group of named fields, where each field can have a different data type, both primitive and complex.
See STRUCT data type.
- ARRAY
A list of values that share the same data type.
See ARRAY data type.
- MAP
A set of (key,value) pairs.
See MAP data type.
The elements of an ARRAY or a
MAP, or the fields of a
STRUCT may be primitive, or they may be other
complex types. You can construct elaborate data structures with many levels, such as
a ARRAY
with STRUCT
elements.
Appearance of complex data types
Cloudera Data Visualization presents complex types very simply. Users can manipulate the individual component fields of complex types in the same manner as the primitive data type fields: place them on the shelves of the visual, as a component of an expressions, as a filter, and so on. Each level of a complex data type may be expanded to show component details, or collapsed.
Cloudera Data Visualization processes complex types as Dimensions, grouping individual elements under the complex type. The components of a complex type cannot be re-assigned to the Measures group even if they logically represent a measurement; however, you can easily use them as measurements on visual shelves, in filters, and so on.
Because native SQL processing is supported, you do not have to worry about the complexities of query generation to access these elements.
Restrictions
Complex data types have the following implementation restrictions:
MAP
,ARRAY
, andSTRUCT
fields require base tables or partitions that use the Parquet file format.- You cannot use the fields that contain complex data types as partition keys in a partitioned table.
- The Compute Statistics operation does not work for complex data types.
- The column definition for any complex type, including nested types, can have a maximum of 4000 characters.
- The Import into Dataset feature does not support files that contain complex datatypes.
- Analytical Views do not support complex type queries,
because complex types require
JOIN
operations. As a result, visuals that use complex data types do not have valid recommendations. However, if you associate a logical view with the dataset, then the recommendation engine proceeds to generate suggestions, and presents the relevant dataset columns as complex data types.