Teradata Access Paths
Indexes are used to expediate row access in a table by not having to search the entire table for a row.The Primary Index is the mechanism for assigning a data row to a VPROC and a location on the VPROCs disk. When a table is created it must have a Primary Index specified, and the Primary Index cannot be changed after that without dropping and recreating the table. The Primary Index is not required to be the Primary Key.
The Secondary Index is an alternate path to the data. Secondary Indexes are used to improve performance by allowing the user to avoid scanning the entire table. The Secondary Index has no influence on the way the rows are distributed amoung the VPROCs. Secondary Indexes are optional and can be created and dropped dynamically. Secondary Indexes require separate subtables which require extra I/O to maintain the indexes.
Primary Key vs. Primary Index in Teradata
The Primary Key is a relational data modeling tool that defines the columns that uniquely define a row. The Primary Index is a physical implementation that defines the actual columns used to distribute and access the rows in a table. Below are a few comparisons and differences between the two in Teradata…
|Relational Data Model – Primary Key||Teradata Physical Implementation – Primary Index|
|Logical concept of the data model||Physical mechanism for storage|
|Teradata uses only for R.I.||Teradata must have exactly one|
|No limits on columns||16 column limit|
|Documented in the data model||Defined by SQL create table|
|Must be unique and unchanging||May be unique on non-unique and data can change|
|Identifies each row||Assigns each row to a specific VPROC and data block|
|Values cannot be changed||Values may be changed (Insert and Delete)|
|May not be NULL||May be NULL|
|Does not imply access path||Defines most common access path|
|Chosen for logical correctness||Chosen for physical performance|
Data Distribution Using the Primary Index Value
Hashing is a standard data processing technique tha takes in a data value and systematically mixes it up so that the incoming values are converted to a number in a specified range. Teradata's Hashing algorithm is different from other systems in that it does not require the algorithm to be customized to the data. It uses a hash VPROC that contains the buckets that points to a signal VPROC. The Teradata Hash Map contains 65,536 buckets (entries). With the destination VPROC, the full 32-bit has plus the table ID is used to assign the row to a cylinder and data block in the VPROC's disk storage. There can be 4,000,000,000 different row hash values.
The rows of all tables are distributed across the VPROCs according to their Primary Index value. The Primary Index value goes into the hashing algorithm and the output is a 32-bit Row Hash. The Destination Selection Word (DSW) is used to identify a hash VPROC entry. That entry identifies the VPROC that will be sent the message. The remaining 16 bits are not used by the YNET interface. The entire 32-bit Row Hash will be used by the selected VPROC to locate the row within the disk space.
Primary Index vs. Secondary Index
Teradata allows the use of Primary and Secondary Indexes, however, there are few differences between the two. In Teradata, Primary Indexes are required and Secondary Indexes are optional. Each table must have a Primary Index for distributing the rows among the VPROCs. Each table can have only one Primary Inde, but it can have up to 32 Secondary Indexes. Both Primary and Secondary Indexes may contain up to 16 columns, and they both may be unique or non-unique. Row distribution among VPROCs is only affected by the Primary Index not the Secondary Index. Secondary Indexes may be created and dropped at any time, but if the Primary Index must be changed or dropped, the table must be dropped and recreated. Again, both the Primary and Secondary Indexes will affect the table performance, so poorly chosen indexes can cause some VPROCs to work harder than others.
Full Table Scans
A Full Table Scan is another way to access data without using the Primary Index. In evaluating a SQL request, the Parser examines all possible access methods and chooses the one it believes to be the most efficient. The coding of the SQL request with the demographics of the table and the availability of indexes all play a role in the decision of the Parser. Some coding constructs always cause a Full Table Scan. In other cases, it might be choosen because it is the most efficient method. In general, if the number of physical reads exceeds the number of data blocks then a Full Table Scan may be faster.
In the Teradata Database System, Full Table Scans are always an all VPROC Operation, because every VPROC reads every row of the table. Because of the parallelism in the Teradata Database sytem, Full Table Scans are efficient; however, in other database systems without parallelism Full Table Scans can be an extremely time consuming event.
Some examples of when a Full Table Scan is performed:
- SQL statement does not contain the WHERE statement.
- The WHERE statement does not use the Primary or Secondary index.
- SQL statement uses a partial value (LIKE, …) in the WHERE statement. If a complete index is not given, a Full Table Scan will be performed.
- SQL statement uses inequality operators (<, >, …) in the WHERE statement.