Column compression
Kudu allows per-column compression using the LZ4
, Snappy
, or zlib
compression codecs.
By
default, columns that are Bitshuffle-encoded are inherently compressed with the LZ4
compression. Otherwise, columns are stored
uncompressed. Consider using compression if reducing storage space is more important than raw
scan performance.
Every
data set will compress differently, but in general LZ4
is the most efficient codec, while zlib
will compress to the smallest data sizes. Bitshuffle-encoded columns are
automatically compressed using LZ4
, so it is
not recommended to apply additional compression on top of this encoding.