The Data Studio

Hive: No integrity features

Hive does not support any integrity features.

It does not have primary key constraints, foreign key constraints, "not null" constraints or data-type integrity.

This means that we either have to implement integrity constraints ourselves, which adds development, testing and maintenance work, or accept uncontrolled and unmeasured data quality issues.

The lack of primary and foreign key constraints means that we ourselves have to make sure that every row has a unique primary key and that all relationships in our data are correct and complete, with every foreign key value pointing at a row that actually exists.
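As a rough illustration of what "doing it ourselves" involves, here is a minimal sketch of the three key checks: duplicate keys, missing keys and orphaned foreign keys. The tables `customers` and `orders` and their columns are hypothetical, and SQLite (from the Python standard library) stands in for Hive so that the example runs anywhere. The queries use only plain SQL (GROUP BY ... HAVING and LEFT JOIN), which we would expect to carry over to HiveQL with little change, but that is an assumption to verify against your own Hive version.

```python
import sqlite3

# Hypothetical tables for illustration only. SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Ann'), (2, 'Bob'), (2, 'Bob again'), (NULL, 'Nobody');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")


def duplicate_keys(conn):
    """Key values that occur more than once, with their row counts."""
    return conn.execute(
        "SELECT customer_id, COUNT(*) FROM customers "
        "WHERE customer_id IS NOT NULL "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    ).fetchall()


def missing_keys(conn):
    """Number of rows that have no key value at all."""
    return conn.execute(
        "SELECT COUNT(*) FROM customers WHERE customer_id IS NULL"
    ).fetchone()[0]


def orphan_orders(conn):
    """Orders whose customer_id matches no row in customers."""
    return conn.execute(
        "SELECT o.order_id, o.customer_id FROM orders o "
        "LEFT JOIN customers c ON o.customer_id = c.customer_id "
        "WHERE c.customer_id IS NULL "
        "ORDER BY o.order_id"
    ).fetchall()


print(duplicate_keys(conn))
print(missing_keys(conn))
print(orphan_orders(conn))
```

Each of these queries has to be written, scheduled, tested and maintained for every table and every relationship, which is exactly the work that a database with enforced constraints does for us.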

The lack of "not null" constraints means that we have to check for missing data explicitly.
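An explicit check for missing data might look like the sketch below. The `customers` table and its columns are hypothetical, and SQLite again stands in for Hive. The function counts NULLs in each column that we consider mandatory; deciding which columns are mandatory is itself something we now have to record and maintain outside the database.

```python
import sqlite3

# Hypothetical table for illustration only. SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT, email TEXT);
    INSERT INTO customers VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, NULL, 'bob@example.com'),
        (3, 'Cy', NULL),
        (4, NULL, NULL);
""")


def null_counts(conn, table, columns):
    """Return {column: number of NULL values} for the given columns."""
    # Table and column names are interpolated into the SQL text, so they
    # must come from trusted code, never from user input.
    select_list = ", ".join(
        f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)" for col in columns
    )
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    # SUM over an empty table yields NULL, so treat that as zero.
    return {col: (count or 0) for col, count in zip(columns, row)}


counts = null_counts(conn, "customers", ["customer_id", "name", "email"])
print(counts)
```

Note that this only catches true NULLs. If missing values can also arrive as empty strings or placeholder text, those need their own checks.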

The lack of data-type integrity means that Hive fails silently if a data value does not match its data-type definition. The value may then appear to be null, may be truncated, or may have been changed to a different value! See Hive: Silent failure on loading data. Because these failures are silent, we have to implement checks explicitly if we want to know about bad values in our data.
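One way to catch such values is to validate the raw text before it is loaded. The sketch below is only an illustration of the idea: the column names, the rules in `SCHEMA` and the sample rows are all hypothetical, and the rules (a 32-bit integer, a `YYYY-MM-DD` date, a two-character code) are simplified stand-ins for whatever the real table definition says.

```python
import re
from datetime import datetime

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed integer


def is_int(raw):
    """True if raw is a whole number that fits in a 32-bit signed integer."""
    if re.fullmatch(r"[+-]?[0-9]+", raw) is None:
        return False
    return INT_MIN <= int(raw) <= INT_MAX


def is_date(raw):
    """True if raw is a real calendar date written as YYYY-MM-DD."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", raw) is None:
        return False
    try:
        datetime.strptime(raw, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def fits(max_len):
    """Build a check that raw is no longer than max_len characters."""
    return lambda raw: len(raw) <= max_len


# Hypothetical schema: column name -> validation function.
SCHEMA = {"order_id": is_int, "order_date": is_date, "country": fits(2)}


def find_bad_values(rows, schema):
    """Return (row_number, column, raw_value) for every value that fails."""
    bad = []
    for row_number, row in enumerate(rows, start=1):
        for column, is_valid in schema.items():
            raw = row[column]
            if not is_valid(raw):
                bad.append((row_number, column, raw))
    return bad


rows = [
    {"order_id": "10", "order_date": "2021-02-28", "country": "GB"},
    {"order_id": "12.5", "order_date": "2021-02-30", "country": "GBR"},
    {"order_id": "3000000000", "order_date": "2021-03-01", "country": "FR"},
]
bad_values = find_bad_values(rows, SCHEMA)
for item in bad_values:
    print(item)
```

The second and third rows contain exactly the kinds of value that would otherwise slip through unnoticed: a non-integer, an impossible date, an over-long string and an integer that is out of range.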

All this is extra work, increasing our project costs and delaying delivery. No amount of hand-waving from the vendor will brush away these costs.