Using Apache HivePDF version

Using constraints

You can use SQL constraints to enforce data integrity and improve performance. Using constraints, the optimizer can simplify queries. Constraints can make data predictable and easy to locate. Using constraints and supported modifiers, you can follow examples to constrain queries to unique or not null values, for example.

You can use the constraints listed below in your queries. Hive enforces DEFAULT, NOT NULL and CHECK only, not PRIMARY KEY, FOREIGN KEY, and UNIQUE. DEFAULT even if enforced, does not support complex types (array,map,struct). Constraint enforcement is limited to the metadata level. This limitation aids integration with third party tools and optimization of constraints declarations, such as materialized view rewriting.

CHECK
Limits the range of values you can place in a column.
DEFAULT
Ensures a value exists, which is useful in offloading data from a data warehouse.
PRIMARY KEY
Identifies each row in a table using a unique identifier.
FOREIGN KEY
Identifies a row in another table using a unique identifier.
UNIQUE KEY
Checks that values stored in a column are different.
NOT NULL
Ensures that a column cannot be set to NULL.

You can use the following optional modifiers:

ENABLE
Ensures that all incoming data conforms to the constraint.
DISABLE
Does not ensure that all incoming data conforms to the constraint.
VALIDATE
Checks that all existing data in the table conforms to the constraint.
NOVALIDATE
Does not check that all existing data in the table conforms to the constraint.
ENFORCED
Maps to ENABLE NOVALIDATE.
NOT ENFORCED
Maps to DISABLE NOVALIDATE.
RELY
Specifies abiding by a constraint; used by the optimizer to apply further optimizations.
NORELY
Specifies not abiding by a constraint.

You use modifiers as shown in the following syntax:

( ( ( (ENABLE | DISABLE) (VALIDATE | NOVALIDATE) ) | ( ENFORCED | NOT ENFORCED ) ) (RELY | NORELY) )

Defaults modfiers

The following default modifiers are in place:

  • The default modifier for ENABLE is NOVALIDATE RELY.
  • The default modifier for DISABLE is NOVALIDATE NORELY.
  • If you do not specify a modifier when you declare a constraint, the default is ENABLE NOVALIDATE RELY. The following constraints do not support ENABLE:
    • PRIMARY KEY
    • FOREIGN KEY
    • UNIQUE KEY
    To prevent an error, specify a modifier when using these constraints to override the default.

The optimizer uses the constraint information to make smart decisions. The following examples show the use of constraints.

The following example shows how to create a table that declares the NOT NULL in-line constraint to constrain a column.

CREATE TABLE t(a TINYINT, b SMALLINT NOT NULL ENABLE, c INT);

The constrained column b accepts a SMALLINT value as shown in the first INSERT statement.

INSERT INTO t values(2,45,5667);
...
----------------------------------------------------------
1 row affected ...

The constrained column b will not accept a NULL value.

INSERT INTO t values(2,NULL,5667);
Error: Error running query: org.apache.hadoop.hive.ql.exec.errors.DataConstraintViolationError: /
  Either CHECK or NOT NULL constraint violated! (state=,code=0)
The following examples shows how to declare the FOREIGN KEY constraint out of line. You can specify a constraint name, in this case fk, in an out-of-line constraint
CREATE TABLE Persons (   
   ID INT NOT NULL,   
   Name STRING NOT NULL,   
   Age INT,
   Creator STRING DEFAULT CURRENT_USER(),    
   CreateDate DATE DEFAULT CURRENT_DATE(),
   PRIMARY KEY (ID) DISABLE NOVALIDATE);
   
CREATE TABLE BusinessUnit (
   ID INT NOT NULL,    
   Head INT NOT NULL,
   Creator STRING DEFAULT CURRENT_USER(),    
   CreateDate DATE DEFAULT CURRENT_DATE(),
   PRIMARY KEY (ID) DISABLE NOVALIDATE,
   CONSTRAINT fk FOREIGN KEY (Head) REFERENCES Persons(ID) DISABLE NOVALIDATE
   );