As a major RDBMS, Teradata supports several popular join strategies (or join methods), and has its own unique ways to accomplish the joins.
The most common join strategies are Product Join, Merge Join, Exclusion Join, Hash Join and Nested Join.
Each join strategy has its own pros and cons, and which one is best depends on the circumstances. The optimizer chooses a join strategy based on data demographics, statistics and indexes, if any of them are available. Running EXPLAIN on a query shows which join strategies will be used.
Whatever the join strategy, it is always applied between two tables at a time, so the more tables a query joins, the more join steps it takes. Rows must be on the same AMP to be joined, so row redistribution or duplication is unavoidable for some join strategies.
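To illustrate why redistribution puts matching rows on the same AMP, here is a minimal Python sketch. It is not Teradata code: `NUM_AMPS`, `amp_for` and `redistribute` are hypothetical names, and Python's built-in `hash` merely stands in for the real row hash function.

```python
from collections import defaultdict

NUM_AMPS = 4  # hypothetical AMP count, chosen only for illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in for the row hash: map a join column value to an AMP number."""
    return hash(value) % num_amps


def redistribute(rows, key, num_amps=NUM_AMPS):
    """Group rows by the AMP that owns the hash of their join column."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

# Both tables are redistributed on the join column, so rows with equal
# cust_id values land on the same (simulated) AMP and can be joined locally.
orders_by_amp = redistribute(orders, "cust_id")
customers_by_amp = redistribute(customers, "cust_id")
```

Because both tables are hashed on the same column with the same function, each simulated AMP can join its own rows without looking at any other AMP.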
1. Product Join
This is the most basic and straightforward join strategy. It is used to find matches between two tables when the join condition is not based on equality (>, <, <>), or when join conditions are ORed together.
It is called a "Product" join because the number of comparisons required is the product of the row counts of the two tables. For example, if table t1 has 10 rows and table t2 has 25 rows, it takes 10×25=250 comparisons to find the matching rows.
When the WHERE clause is missing, the result is a special product join, called Cartesian Join or Cross Join, which returns every combination of rows from both tables. In the above example, 250 rows will be returned as the result.
Vendors such as IBM and Oracle refer to it as Nested-loops Join, which also makes sense when mapping it to the algorithm.
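The comparison count can be checked with a short Python sketch of the nested-loops algorithm. The function `product_join` and the integer "tables" are made up for illustration; this shows the idea, not how Teradata implements it.

```python
def product_join(left, right, predicate):
    """Compare every left row with every right row.

    Returns the matching pairs and the number of comparisons performed.
    """
    matches = []
    comparisons = 0
    for l in left:
        for r in right:
            comparisons += 1
            if predicate(l, r):
                matches.append((l, r))
    return matches, comparisons


t1 = list(range(10))  # 10 rows
t2 = list(range(25))  # 25 rows

# A non-equality condition (<) still needs every pair to be compared.
less_than, n = product_join(t1, t2, lambda a, b: a < b)

# No condition at all: a Cartesian (cross) join that keeps every pair.
cross, _ = product_join(t1, t2, lambda a, b: True)
```

Here `n` is 250 regardless of the predicate, and `cross` holds all 250 combinations.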
2. Merge Join
This is a much more efficient join strategy. It is adopted when the join conditions are based on equality (=). There is a prerequisite though: both tables must be sorted on the join column in advance (strictly speaking, in join column row hash sequence). That is why Oracle calls it Sort-Merge Join. The sorting brings a great advantage: each table only needs to be scanned once, in an interleaved manner.
Merge join is not always better than product join, because sorting is required. If both tables are huge, sorting can be very expensive.
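The interleaved scan can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `merge_join` is a hypothetical helper, rows are plain dicts, and the tables are sorted on the join value itself rather than on its row hash.

```python
def merge_join(left, right, key):
    """Equality join of two lists of dicts on `key` using a sort-merge scan."""
    # The sort is the up-front cost the text refers to.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left is behind, advance it
        elif lk > rk:
            j += 1          # right is behind, advance it
        else:
            # Pair the current left row with the run of equal keys on the right.
            j_start = j
            while j < len(right) and right[j][key] == lk:
                result.append((left[i], right[j]))
                j += 1
            i += 1
            # A duplicate key on the left needs the same right-hand run again.
            if i < len(left) and left[i][key] == lk:
                j = j_start
    return result


employees = [{"dept": 2, "name": "a"}, {"dept": 1, "name": "b"}, {"dept": 2, "name": "c"}]
departments = [{"dept": 2, "title": "x"}, {"dept": 3, "title": "y"}]
pairs = merge_join(employees, departments, "dept")
```

Apart from the short rewind for duplicate keys, each cursor only moves forward, which is what makes the scan a single pass over each sorted table.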
3. Exclusion Join
This join strategy is used to find non-matching rows. If the query contains "NOT IN" or "EXCEPT", exclusion join will be picked. As a matter of fact, this kind of join can be done as either Merge Join or Product Join.
One thing worth noting: exclusion merge join is based on the set subtraction operation, and three-valued logic (TRUE, FALSE, UNKNOWN) is used when comparisons are done on nullable columns (or temporary result sets).
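The effect of three-valued logic on "NOT IN" can be sketched in Python, with `None` playing the role of NULL and of UNKNOWN. The helpers `not_in` and `exclusion_join` are hypothetical and model standard SQL semantics, not Teradata internals.

```python
def not_in(value, candidates):
    """Three-valued NOT IN: returns True, False, or None (UNKNOWN)."""
    if value is None:
        # NULL NOT IN (empty set) is TRUE; against anything else it is UNKNOWN.
        return True if not candidates else None
    if any(c is not None and c == value for c in candidates):
        return False  # a definite match excludes the row
    if any(c is None for c in candidates):
        return None   # no match found, but a NULL might have matched
    return True


def exclusion_join(left, right):
    """Keep only the left values for which NOT IN evaluates to TRUE."""
    return [v for v in left if not_in(v, right) is True]


without_null = exclusion_join([1, 2, 3], [2])
with_null = exclusion_join([1, 2, 3], [2, None])
```

With no NULLs on the right side, `without_null` is `[1, 3]`. A single NULL on the right side turns every non-matching comparison into UNKNOWN, so `with_null` is empty.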
4. Hash Join
Hash Join gets its name from the fact that the smaller table is built into a "hash table", and potential matching rows from the second table are found by hashing against the smaller table.
Usually the optimizer first identifies the smaller table and sorts it by the join column row hash sequence. If the smaller table is small enough to fit in memory, performance is best. Otherwise, the sorted smaller table will be duplicated to all the AMPs. Then the larger table is processed one row at a time by doing a binary search of the smaller table for a match.
Hash Join is also based on equality condition (=).
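The build-and-probe steps described above can be sketched in Python. This is only an illustration: `row_hash` is a hypothetical stand-in (CRC32 here) for Teradata's row hash, and `hash_join` works on in-memory lists of dicts on a single node.

```python
import bisect
import zlib


def row_hash(value):
    """Hypothetical stand-in for the row hash function."""
    return zlib.crc32(repr(value).encode("utf-8"))


def hash_join(small, large, key):
    """Equality join: sort the small table by row hash, then probe it."""
    # Build phase: order the smaller table by the row hash of its join column.
    built = sorted(((row_hash(r[key]), r) for r in small), key=lambda p: p[0])
    hashes = [h for h, _ in built]

    result = []
    # Probe phase: one pass over the larger table, one binary search per row.
    for row in large:
        h = row_hash(row[key])
        pos = bisect.bisect_left(hashes, h)
        while pos < len(hashes) and hashes[pos] == h:
            candidate = built[pos][1]
            # Different values can share a hash, so recheck real equality.
            if candidate[key] == row[key]:
                result.append((candidate, row))
            pos += 1
    return result


small = [{"id": 1, "code": "a"}, {"id": 2, "code": "b"}]
large = [{"id": 2, "v": 10}, {"id": 3, "v": 20}, {"id": 2, "v": 30}, {"id": 1, "v": 40}]
joined = hash_join(small, large, "id")
```

Only the smaller table is sorted; the larger table is read once in whatever order it arrives.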
5. Nested Join
Do not confuse this with "Nested-loops Join", which is the term used by Oracle, IBM and Microsoft. In Teradata, Product Join is the counterpart of "Nested-loops Join" in other RDBMS.
However, Nested Join can be seen as an enhanced version of the common "Nested-loops Join", where Teradata takes advantage of its index structure. For Nested Join to be picked, the following conditions must be satisfied:
1) The join condition is based on equality;
2) The join column is a unique index on one table;
3) The join column is any index on the other table.
Given the conditions above, it is not hard to infer how Nested Join works. First, a single row is retrieved from one table with the help of the unique index; then, based on the row hash of that row, the other table is accessed through its index.
Nested Join is the most efficient join method in Teradata. It is also the only join method that doesn't always use all the AMPs.
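The two index lookups can be sketched in Python, with dictionaries standing in for the indexes. All table, column and function names below are invented for illustration, and the sketch ignores AMPs entirely.

```python
# Hypothetical table 1: unique index on emp_id (at most one row per value).
employee_by_id = {
    101: {"emp_id": 101, "dept_id": 7, "name": "Ann"},
    102: {"emp_id": 102, "dept_id": 9, "name": "Bob"},
}

# Hypothetical table 2: index on dept_id (may point to several rows).
department_by_dept_id = {
    7: [{"dept_id": 7, "dept_name": "Sales"}],
}


def nested_join(emp_id):
    """Fetch one row through the unique index, then probe the other index."""
    emp = employee_by_id.get(emp_id)  # step 1: single-row retrieval
    if emp is None:
        return []
    # Step 2: use the join value of that row to access the second table.
    return [(emp, dept) for dept in department_by_dept_id.get(emp["dept_id"], [])]


result = nested_join(101)
```

Neither table is scanned: the work is two index lookups, which fits the point above that not every AMP has to take part.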