Teradata: Left Outer Join Vs Right Outer Join

Pradeep

Teradata: Left Outer Join Vs Right Outer Join

Technically Left outer join & Right outer join can be used interchangely, provided tables position are swapped.
Few people have created myth that Left Outer Joins perform better than Right Outer joins and reason for their myth may be because they might have mostly saw Left Outer Joins in codes.
Left Outer join is frequently used instead of Right Outer join because just by looking at the code, one can know which is the driver table (FULL data comes from this table) in the given join. This is just a convention used by most of the developers.

➠ Remember: There is no difference between Left Outer Join and Right Outer Join in terms of performance.


Tutorial_db.product (Count--> 1975164979)
Tutorial_db.product_item (Count--> 1891505240)

LOCK ROW FOR ACCESS SELECT COUNT(*) 
                    FROM tutorial_db.product_item 
                    INNER JOIN tutorial_db.product 
                    ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Inner/Equi Join Output:
   Count(*)
-----------
 1811349898

➠ Experiment 1: Number of records returned by Left Outer & Right Outer Join for a query

Explanation:

1975164979

Left Outer Join: As shown in the diagram, Left outer join will return all the rows/data from Left table plus only matching data (dark gray area in diagram) based on join from right table.


LEFT OUTER JOIN
================
LOCK ROW FOR ACCESS SELECT COUNT(*) 
                    FROM tutorial_db.product 
                    LEFT OUTER JOIN tutorial_db.product_item 
                    ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Output:
   Count(*)
-----------
 1975164979

Right Outer Join: As shown in the diagram, Right outer join will return all the rows/data from right table plus only matching data (dark gray area in diagram) based on join from left table.


RIGHT OUTER JOIN
================
LOCK ROW FOR ACCESS SELECT COUNT(*) 
                    FROM tutorial_db.product_item 
                    RIGHT OUTER JOIN tutorial_db.product 
                    ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);
Output:
   Count(*)
-----------
 1975164979

➠ Experiment 2: Optimiser's explain plan for Left Outer Join & Right Outer Join queries

Explanation:

LEFT OUTER JOIN: Explain Plan for Left Outer Join Query.


EXPLAIN SELECT * FROM tutorial_db.product
 LEFT OUTER JOIN tutorial_db.product_item 
 ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);

Explanation(LEFT OUTER JOIN)
--------------------------------------------------------------------------------
  1) First, we lock tutorial_db.product for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock tutorial_db.product_item for read
     on a reserved RowHash in all partitions to prevent global deadlock.
  3) We lock tutorial_db.product for read, and we lock
     tutorial_db.product_item for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          tutorial_db.product by way of an all-rows scan with
          no residual conditions into Spool 2 (all_amps) (compressed
          columns allowed), which is built locally on the AMPs.  Then
          we do a SORT to order Spool 2 by the hash code of (
          tutorial_db.product.product_item_id).  The size of
          Spool 2 is estimated with high confidence to be 1,975,940,236
          rows (3,056,779,545,092 bytes).  The estimated time for this
          step is 3 minutes and 1 second.
       2) We do an all-AMPs RETRIEVE step from
          tutorial_db.product_item by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps) (compressed columns allowed), which is built
          locally on the AMPs.  Then we do a SORT to order Spool 3 by
          the hash code of (
          tutorial_db.product_item.product_item_id).
          The size of Spool 3 is estimated with high confidence to be
          1,892,853,198 rows (2,731,387,164,714 bytes).  The estimated
          time for this step is 2 minutes and 41 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to Spool 3 (Last Use) by way
     of a RowHash match scan.  Spool 2 and Spool 3 are left outer
     joined using a merge join, with a join condition of (
     "product_item_id = product_item_id").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The result
     spool file will not be cached in memory.  The size of Spool 1 is
     estimated with low confidence to be 2,539,776,456 rows (
     16,820,939,468,088 bytes).  The estimated time for this step is 2
     minutes and 33 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 5 minutes and 33 seconds.

RIGHT OUTER JOIN: Explain Plan for Right Outer Join Query.


EXPLAIN SELECT * FROM tutorial_db.product_item 
  RIGHT OUTER JOIN tutorial_db.product
  ON (tutorial_db.product_item.product_item_id = tutorial_db.product.product_item_id);

Explanation (RIGHT OUTER JOIN)
--------------------------------------------------------------------------------
  1) First, we lock tutorial_db.product for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock tutorial_db.product_item for read
     on a reserved RowHash in all partitions to prevent global deadlock.
  3) We lock tutorial_db.product for read, and we lock
     tutorial_db.product_item for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          tutorial_db.product by way of an all-rows scan with
          no residual conditions into Spool 2 (all_amps) (compressed
          columns allowed), which is built locally on the AMPs.  Then
          we do a SORT to order Spool 2 by the hash code of (
          tutorial_db.product.product_item_id).  The size of
          Spool 2 is estimated with high confidence to be 1,975,940,236
          rows (3,056,779,545,092 bytes).  The estimated time for this
          step is 3 minutes and 1 second.
       2) We do an all-AMPs RETRIEVE step from
          tutorial_db.product_item by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps) (compressed columns allowed), which is built
          locally on the AMPs.  Then we do a SORT to order Spool 3 by
          the hash code of (
          tutorial_db.product_item.product_item_id).
          The size of Spool 3 is estimated with high confidence to be
          1,892,853,198 rows (2,731,387,164,714 bytes).  The estimated
          time for this step is 2 minutes and 41 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to Spool 3 (Last Use) by way
     of a RowHash match scan.  Spool 2 and Spool 3 are left outer
     joined using a merge join, with a join condition of (
     "product_item_id = product_item_id").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The result
     spool file will not be cached in memory.  The size of Spool 1 is
     estimated with low confidence to be 2,539,776,456 rows (
     16,820,939,468,088 bytes).  The estimated time for this step is 2
     minutes and 33 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 5 minutes and 33 seconds.

Conclusion: There is no difference between Left Outer Join and Right Outer Join in Teradata.

There is no difference between Left Outer Join and Right Outer Join in terms of performance.

dbmstutorials.com

Teradata: Left Outer Join Vs Right Outer Join