Teradata: Left Outer Join Vs Right Outer Join
- Technically Left outer join & Right outer join can be used interchangely, provided tables position are swapped.
- Few people have created myth that Left Outer Joins perform better than Right Outer joins and reason for their myth may be because they might have mostly saw Left Outer Joins in codes.
- Left Outer join is frequently used instead of Right Outer join because just by looking at the code, one can know which is the driver table (FULL data comes from this table) in the given join. This is just a convention used by most of the developers.
➠
Remember: There is no difference between Left Outer Join and Right Outer Join in terms of performance.
Lets prove above statement with examples and for that we will take couple of big tables(product & product_item) for Right outer & Left outer join experiments.
In the experiments, we will going to see if there is output difference when Left Outer Join or Right Outer Join is used and how Teradata optimizer behaves on Right & Left outer joins. Below are the count in tables.
Tutorial_db.product (Count--> 1975164979)
Tutorial_db.product_item (Count--> 1891505240)
LOCK ROW FOR ACCESS SELECT COUNT(*)
FROM tutorial_db.product_item
INNER JOIN tutorial_db.product
ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Inner/Equi Join Output:
Count(*)
-----------
1811349898
➠
Experiment 1: Number of records returned by Left Outer & Right Outer Join for a query
Explanation: If you notice (check below), Teradata has return exactly same number of rows (1975164979) for both Left & Right Outer Join queries.
Also please note that the tables are swapped to use Left & Right outer join interchangely.
Left Outer Join: As shown in the diagram, Left outer join will return all the rows/data from Left table plus only matching data (dark gray area in diagram) based on join from right table.
LEFT OUTER JOIN
================
LOCK ROW FOR ACCESS SELECT COUNT(*)
FROM tutorial_db.product
LEFT OUTER JOIN tutorial_db.product_item
ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Output:
Count(*)
-----------
1975164979
Right Outer Join: As shown in the diagram, Right outer join will return all the rows/data from right table plus only matching data (dark gray area in diagram) based on join from left table.
RIGHT OUTER JOIN
================
LOCK ROW FOR ACCESS SELECT COUNT(*)
FROM tutorial_db.product_item
RIGHT OUTER JOIN tutorial_db.product
ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Output:
Count(*)
-----------
1975164979
➠
Experiment 2: Optimiser's explain plan for Left Outer Join & Right Outer Join queries
EXPLAIN SELECT * FROM tutorial_db.product
LEFT OUTER JOIN tutorial_db.product_item
ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Explanation(LEFT OUTER JOIN)
--------------------------------------------------------------------------------
1) First, we lock tutorial_db.product for read on a reserved
RowHash in all partitions to prevent global deadlock.
2) Next, we lock tutorial_db.product_item for read
on a reserved RowHash in all partitions to prevent global deadlock.
3) We lock tutorial_db.product for read, and we lock
tutorial_db.product_item for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
tutorial_db.product by way of an all-rows scan with
no residual conditions into Spool 2 (all_amps) (compressed
columns allowed), which is built locally on the AMPs. Then
we do a SORT to order Spool 2 by the hash code of (
tutorial_db.product.product_item_id). The size of
Spool 2 is estimated with high confidence to be 1,975,940,236
rows (3,056,779,545,092 bytes). The estimated time for this
step is 3 minutes and 1 second.
2) We do an all-AMPs RETRIEVE step from
tutorial_db.product_item by way of an
all-rows scan with no residual conditions into Spool 3
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 3 by
the hash code of (
tutorial_db.product_item.product_item_id).
The size of Spool 3 is estimated with high confidence to be
1,892,853,198 rows (2,731,387,164,714 bytes). The estimated
time for this step is 2 minutes and 41 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use) by way
of a RowHash match scan. Spool 2 and Spool 3 are left outer
joined using a merge join, with a join condition of (
"product_item_id = product_item_id"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with low confidence to be 2,539,776,456 rows (
16,820,939,468,088 bytes). The estimated time for this step is 2
minutes and 33 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 5 minutes and 33 seconds.
RIGHT OUTER JOIN: Explain Plan for Right Outer Join Query.
EXPLAIN SELECT * FROM tutorial_db.product_item
RIGHT OUTER JOIN tutorial_db.product
ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Explanation (RIGHT OUTER JOIN)
--------------------------------------------------------------------------------
1) First, we lock tutorial_db.product for read on a reserved
RowHash in all partitions to prevent global deadlock.
2) Next, we lock tutorial_db.product_item for read
on a reserved RowHash in all partitions to prevent global deadlock.
3) We lock tutorial_db.product for read, and we lock
tutorial_db.product_item for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
tutorial_db.product by way of an all-rows scan with
no residual conditions into Spool 2 (all_amps) (compressed
columns allowed), which is built locally on the AMPs. Then
we do a SORT to order Spool 2 by the hash code of (
tutorial_db.product.product_item_id). The size of
Spool 2 is estimated with high confidence to be 1,975,940,236
rows (3,056,779,545,092 bytes). The estimated time for this
step is 3 minutes and 1 second.
2) We do an all-AMPs RETRIEVE step from
tutorial_db.product_item by way of an
all-rows scan with no residual conditions into Spool 3
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. Then we do a SORT to order Spool 3 by
the hash code of (
tutorial_db.product_item.product_item_id).
The size of Spool 3 is estimated with high confidence to be
1,892,853,198 rows (2,731,387,164,714 bytes). The estimated
time for this step is 2 minutes and 41 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use) by way
of a RowHash match scan. Spool 2 and Spool 3 are left outer
joined using a merge join, with a join condition of (
"product_item_id = product_item_id"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with low confidence to be 2,539,776,456 rows (
16,820,939,468,088 bytes). The estimated time for this step is 2
minutes and 33 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 5 minutes and 33 seconds.
Conclusion: There is no difference between Left Outer Join and Right Outer Join in Teradata.