Multi-Value Compression (MVC), also called field compression, allows compression to be applied at a column level. MVC replaces values specified by the user with a compact bit pattern. When the data is accessed, the bit pattern is used to look up the original value in the list (or dictionary).
For the character data type, before Teradata 13.10, MVC was restricted to fixed-length character columns (CHAR data type) and columns up to 255 characters wide. Teradata 13.10 onwards, MVC can be used with variable-length character columns (VARCHAR data type) and columns up to 510 characters wide. It can support any numeric type, all character data, GRAPHIC, VARGRAPHIC, BYTE and VARBYTE.
Compression is effectively used to save space for repeating data.
For example, the NEW_EMP_NAME column, the value always ranges between 1 to 18. So, its better to compress the value and store it in the table-header.
Sample Script for compression:
CREATE SET TABLE EMPLOYEE.EMPDETAIL ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
NEW_EMP_MGR COMPRESS INTEGER NOT NULL DEFAULT 0 COMPRESS (101 ,22 ,304 ,498 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ),
PRIMARY INDEX ( EMP_ID )
PARTITION BY RANGE_N(HIRING_DT BETWEEN DATE '2006-01-01' AND DATE '2010-12-31' EACH INTERVAL '1' MONTH ,DATE '2011-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY , NO RANGE, UNKNOWN)
Real-life Scenario faced:
- MVC leading to long-running queries and using extra CPU cycles.
- Compressed columns are not used for Partition Primary Index
Explanation: If you have a compression on a secondary index, the data will be skewed – ( as data is distributed according to PI).
Now, there is a small overhead of fetching data based on MVC value. This will be magnified if the data is skewed. Hence, for long-running queries, the hot-amp may become very CPU-intensive.
Hence, we suggest to use MVC for cold-data. The columns in SCD that are descriptive and not used in joins.