Teradata Join strategies

Pradeep

Teradata Join strategies

Teradata uses different strategies to perform join between two tables. Data distribution and columns selected for joins heavily influence the execution plan and the selected join strategy.

Broadly, there are four types of Join strategies:

Nested Join
Product Join
Merge Join
Hash Join

Nested Join: This join strategies utilises 'Unique Indexes' (be it unique primary index or unique secondary index) from one of the table participating in the join to retrieve single record. Join condition should also match some column from single retrieved row to primary /Secondary index from the second table. Single row then matches one or more rows on the other table .

Explain Plan : Explain plan of 2 table join showing 'Nested join' being utilized by optimizer.


EXPLAIN SELECT * FROM tutorial_db.employee INNER JOIN tutorial_db.department ON (tutorial_db.employee.dept_no=tutorial_db.department.dept_no)
 AND emp_name='Pradeep';

Explanation(Nested Join)
--------------------------------------------------------------------------------
 This request is eligible for incremental planning and execution (IPE)
 but does not meet cost thresholds. The following is the static plan
 for the request.
  1) First, we do a two-AMP JOIN step from tutorial_db.employee by
     way of unique index # 4 "tutorial_db.employee.emp_name =
     'Pradeep'", which is joined to all partitions of
     tutorial_db.department by way of the primary index
     "tutorial_db.department.dept_no =
     tutorial_db.employee.dept_no".  tutorial_db.employee and
     tutorial_db.department are joined using a nested join, with a
     join condition of ("tutorial_db.employee.dept_no =
     tutorial_db.department.dept_no").  The result goes into Spool
     1 (one-amp), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with low confidence to be 2 rows (510 bytes).
     The estimated time for this step is 0.00 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.00 seconds.

Explanation: In the above example, emp_name is the Unique secondary index which is enforcing 'Nested join'

Product Join: Product join compares every row from one table to every row in the second table assumping join condition as 1=1 and apply filter on the resultset if any condition is defined in 'WHERE'. Exampe: If 1st table has 20 rows and 2nd table has 10 records then product join will result in 200 (20X10) rows. In general, product joins are caused by mistakes

Common causes that can result in 'Product join'

Neither join condition nor proper 'Where' clause is specified in query.
Inequality condition is used as the join condition

There are 2 sub strategies for product join

Big table Small table strategy: This strategy comes into play when one table is very small. In this strategy, small table will be duplicated on all the AMPs into spool. Bigger table is then product joined to spool by scaning all-rows.

Explain Plan : Explain plan of 2 table join showing 'Product join' being utilized by optimizer using Big table Small table strategy.


EXPLAIN SELECT employee.*,department.* FROM  tutorial_db.employee, tutorial_db.department;
   
Explanation (Product Join(small table))
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.employee.
  2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.department.
  3) We lock tutorial_db.employee for read, and we lock tutorial_db.department for read.
  4) We do an all-AMPs RETRIEVE step from tutorial_db.employee by way of an
     all-rows scan with no residual conditions into Spool 2 (all_amps),
     which is duplicated on all AMPs.  The size of Spool 2 is estimated
     with low confidence to be 34,596 rows (1,833,588 bytes).  The
     estimated time for this step is 0.02 seconds.
  5) We do an all-AMPs JOIN step from tutorial_db.department by way of an
     all-rows scan with no residual conditions, which is joined to
     Spool 2 (Last Use) by way of an all-rows scan.  tutorial_db.department and
     Spool 2 are joined using a product join, with a join condition of
     ("(1=1)").  The result goes into Spool 1 (group_amps), which is
     built locally on the AMPs.  The size of Spool 1 is estimated with
     low confidence to be 34,596 rows (10,240,416 bytes).  The
     estimated time for this step is 0.01 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.03 seconds.

Explanation: In the above example, small table is duplicated over all the Amps.

General strategy: In this strategy, all rows from both tables are scanned and moved to separate spools. These spools will then be joined using a product join.

Explain Plan : Explain plan of 2 table join showing 'Product join' being utilized by optimizer using general strategy.


Explanation (Product Join (Big tables))
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.a.
  2) Next, we lock tutorial_db.a for read.
  3) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from tutorial_db.a by way of an
          all-rows scan with no residual conditions into Spool 2
          (all_amps) (compressed columns allowed), which is built
          locally on the AMPs.  The size of Spool 2 is estimated with
          high confidence to be 5,400,649 rows (23,357,806,925 bytes).
          The estimated time for this step is 8.71 seconds.
       2) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps) (compressed columns allowed), which is duplicated
          on all AMPs.  The result spool file will not be cached in
          memory.  The size of Spool 3 is estimated with high
          confidence to be 1,004,520,714 rows (4,344,552,088,050 bytes).
          The estimated time for this step is 32 minutes and 15 seconds.
  4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using a product
     join, with a join condition of ("(1=1)").  The result goes into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     result spool file will not be cached in memory.  The size of Spool
     1 is estimated with high confidence to be 29,167,009,621,201 rows
     (*** bytes).  The estimated time for this step is 19,592 hours and
     57 minutes.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 19,593 hours and 29
     minutes.

Explanation: In the above example, both tables are scanned and moved to separate spools.

Merge Join: This is the most common join strategy which comes into play for equi joins and exclusions as well.

There are 5 sub strategies for merge join

PI-PI Join
PI- No PI Join
No PI - No PI join
Exclusion Join
Big table Small table strategy

PI-PI Join: This strategy comes into play when join columns of both the table are primary index of corresponding tables. Rows will be directly joined (merge join) on local Amps without moving them to spool for joining.

Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using PI(Primary index) -PI(Primary index) Join.


EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.employee b ON (a.emp_no=b.emp_no);

Explanation (Merge Join (PI-PI Join))
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.b.
  2) Next, we lock tutorial_db.b for read.
  3) We do an all-AMPs JOIN step from tutorial_db.b by way of a RowHash
     match scan, which is joined to tutorial_db.a by way of a RowHash match
     scan.  tutorial_db.b and tutorial_db.a are joined using a merge join, with
     a join condition of ("tutorial_db.a.emp_no = tutorial_db.b.emp_no").  The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs.  The size of Spool 1 is estimated with low confidence to
     be 186 rows (64,542 bytes).  The estimated time for this step is
     0.01 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.01 seconds.

Explanation: In the above example,emp_no is the primary index and both tables are joined in the Amp local without moving the tables to spool.

PI-No PI Join: This strategy comes into play when join columns of one table is primary index and join column of second table is not primary index. Rows from second table will be hashed on the join columns and sorted by rowhash. Rowhash is then sent to spool on the corresponding Amps to join with the 1st table. 1st table is then merged joined to spool by scaning all-rows.

Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using PI(Primary index) - no PI(Primary index) Join.


EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.employee b on (a.emp_no=b.manager_id);
 
Explanation (Merge Join (PI- No PI Join))
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.b.
  2) Next, we lock tutorial_db.b for read.
  3) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
     all-rows scan with a condition of ("NOT (tutorial_db.b.manager_id IS
     NULL)") into Spool 2 (all_amps), which is redistributed by the
     hash code of (tutorial_db.b.manager_id) to all AMPs.  Then we do a
     SORT to order Spool 2 by row hash.  The size of Spool 2 is
     estimated with low confidence to be 186 rows (13,950 bytes).  The
     estimated time for this step is 0.00 seconds.
  4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to tutorial_db.a by way of a
     RowHash match scan.  Spool 2 and tutorial_db.a are joined using a
     merge join, with a join condition of ("tutorial_db.a.emp_no =
     manager_id").  The result goes into Spool 1 (group_amps), which is
     built locally on the AMPs.  The size of Spool 1 is estimated with
     index join confidence to be 186 rows (64,542 bytes).  The
     estimated time for this step is 0.01 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

Explanation: In the above example,emp_no is the primary index in 1 table and manager_id is not PI column. Rows from 'b' table will be hashed and redistributed in the corresponding Amps.

No PI-No PI join: This strategy comes into play when join columns of both tables are not primary index column. Rows from both tables are scanned and hashed on the join columns. Rowhash is then sorted and moved to separate spools for each table. These spools will be joined with each other using a merge join.

Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using no PI(Primary index) - noPI(Primary index) Join.


EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.dept_Pradeep b ON (a.emp_name =b.department_name);

Explanation (Merge Join (No PI- No PI Join))
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.a.
  2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.b.
  3) We lock tutorial_db.a for read, and we lock tutorial_db.b for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
          all-rows scan with a condition of ("NOT (tutorial_db.b.department_name IS
          NULL)") into Spool 2 (all_amps), which is redistributed by
          the hash code of (tutorial_db.b.department_name) to all AMPs.  Then we do a
          SORT to order Spool 2 by row hash.  The size of Spool 2 is
          estimated with low confidence to be 186 rows (13,764 bytes).
          The estimated time for this step is 0.00 seconds.
       2) We do an all-AMPs RETRIEVE step from tutorial_db.a by way of an
          all-rows scan with a condition of ("NOT (tutorial_db.a.emp_name
          IS NULL)") into Spool 3 (all_amps), which is redistributed by
          the hash code of (tutorial_db.a.emp_name) to all AMPs.  Then we
          do a SORT to order Spool 3 by row hash.  The size of Spool 3
          is estimated with high confidence to be 434 rows (32,550
          bytes).  The estimated time for this step is 0.00 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to Spool 3 (Last Use) by way
     of a RowHash match scan.  Spool 2 and Spool 3 are joined using a
     merge join, with a join condition of ("emp_name = department_name").  The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs.  The size of Spool 1 is estimated with no confidence to
     be 3,875 rows (1,375,625 bytes).  The estimated time for this step
     is 0.01 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

Explanation: In the above example,emp_name & department_name are not primary index of both the tables. Rows from both table will be hashed and redistributed to separate spools before joining in the final spool.

Exclusion Join: This strategy comes into play when one table is used to exclude data from other table. Exclusion join is applicable when 'NOT EXISTS' and 'NOT IN' are used.

Explain Plan : Explain plan of 2 table join showing 'exclusion Merge join' being utilized by optimizer using Exclusion Join.


EXPLAIN SELECT * FROM tutorial_db.employee a WHERE NOT EXISTS (SELECT 1 FROM tutorial_db.employee b WHERE a.emp_no=b.manager_id);

Explanation (Exclusion Merge Join)
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.b.
  2) Next, we lock tutorial_db.b for read.
  3) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
     all-rows scan with a condition of ("NOT (tutorial_db.b.manager_id IS
     NULL)") into Spool 2 (all_amps), which is redistributed by the
     hash code of (tutorial_db.b.manager_id) to all AMPs.  Then we do a
     SORT to order Spool 2 by row hash and the sort key in spool field1
     eliminating duplicate rows.  The size of Spool 2 is estimated with
     high confidence to be 7 rows (175 bytes).  The estimated time for
     this step is 0.00 seconds.
  4) We do an all-AMPs JOIN step from tutorial_db.a by way of an all-rows
     scan with no residual conditions, which is joined to Spool 2 (Last
     Use) by way of an all-rows scan.  tutorial_db.a and Spool 2 are joined
     using an exclusion merge join, with a join condition of ("(NOT
     (tutorial_db.a.emp_no IS NULL )) AND (tutorial_db.a.emp_no = manager_id)")
     where unknown comparison will be ignored.  The result goes into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with index join confidence to be 434
     rows (79,856 bytes).  The estimated time for this step is 0.01
     seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

Explanation: In the above example,manager_id of one table is used to exclude records from other table.

Big table Small table strategy: This strategy comes into play when one table is very small. In this strategy, small table will be duplicated on all the AMPs into spool. Bigger table is then merged joined to spool by scaning all-rows.

Hash Join: This join strategy is family of merge join but it perform much better than Merge Join & Product join. Hash join will be used when one table must be small enough to fit in the memory completely.

There are 2 sub strategies for hash join

Dynamic hash join: One table must be small enough to fit in the memory completely into a single partition and it should be very small compared to other table. Dynamic hash join provide ability to join the tables without moving large table to spool. This will only work when two tables are joined based on non‑primary index columns.

Explain Plan : Explain plan of 2 table join showing 'Hash join' being utilized by optimizer using Dynamic Hash Join.


Explanation (Hash Join (Single Partition)(Dynamic Hash join)
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.employee_small_data.
  2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.employee_large_data.
  3) We lock tutorial_db.employee_small_data for read, and we lock
     tutorial_db.employee_large_data for read.
  4) We do an all-AMPs RETRIEVE step from tutorial_db.employee_small_data by
     way of an all-rows scan with a condition of ("(NOT
     (tutorial_db.employee_small_data.start_gmt_ts IS NULL )) AND (NOT
     (tutorial_db.employee_small_data.agent_local_dt IS NULL ))") into Spool
     2 (all_amps) (compressed columns allowed), which is duplicated on
     all AMPs.  The size of Spool 2 is estimated with low confidence to
     be 34,596 rows (149,627,700 bytes).  The estimated time for this
     step is 0.08 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to tutorial_db.employee_large_data by way
     of an all-rows scan with a condition of ("NOT
     (tutorial_db.employee_large_data.agent_local_dt IS NULL)").  Spool 2 and
     tutorial_db.employee_large_data are joined using a dynamic hash join, with
     a join condition of ("(agent_local_dt =
     tutorial_db.employee_large_data.agent_local_dt) AND (start_gmt_ts =
     tutorial_db.employee_large_data.start_gmt_ts)").  The result goes into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with no confidence to be 432,251 rows
     (9,956,469,534 bytes).  The estimated time for this step is 6.19
     seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 6.27 seconds.

Explanation: In the above example,small table is completely fit in the memory resulting in hash join.

Classic / Partitioned hash join: When one table is too large to fit into a single partition memory available for hash processing, the system breaks the table into multiple smaller partition which are small enough to fit in the available memory. Hash join partitions are created on both the joining tables Hash join partitions are created by hashing both table rows on their join columns in such a way that rows from one table's hash join partition can only match with rows in the other table's hash join partition.
There can be maximum of 50 partition in which table can be divided.

Explain Plan : Explain plan of 2 table join showing 'Hash join' being utilized by optimizer using Classic/Partitioned Hash Join.


Explanation (Hash Join (Multiple Partition)(Classic Hash join)
--------------------------------------------------------------------------------
  1) First, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.employee_small_data.
  2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
     RowHash to prevent global deadlock for tutorial_db.employee_large_data.
  3) We lock tutorial_db.employee_small_data for read, and we lock
     tutorial_db.employee_large_data for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          tutorial_db.employee_small_data by way of an all-rows scan with a
          condition of ("(NOT (tutorial_db.employee_small_data.start_gmt_ts
          IS NULL )) AND (NOT
          (tutorial_db.employee_small_data.agent_local_dt IS NULL ))") into
          Spool 2 (all_amps) (compressed columns allowed) fanned out
          into 8 hash join partitions , which is duplicated on all AMPs.
          The size of Spool 2 is estimated with low confidence to be
          172,980 rows (748,138,500 bytes).  The estimated time for
          this step is 0.34 seconds.
       2) We do an all-AMPs RETRIEVE step from tutorial_db.employee_large_data
          by way of an all-rows scan with a condition of ("(NOT
          (tutorial_db.employee_large_data.start_gmt_ts IS NULL )) AND (NOT
          (tutorial_db.employee_large_data.agent_local_dt IS NULL ))") into
          Spool 3 (all_amps) (compressed columns allowed) fanned out
          into 8 hash join partitions, which is built locally on the
          AMPs.  The size of Spool 3 is estimated with high confidence
          to be 5,400,649 rows (23,357,806,925 bytes).  The estimated
          time for this step is 8.71 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using a hash join
     of 8 partitions, with a join condition of ("(agent_local_dt =
     agent_local_dt) AND (start_gmt_ts = start_gmt_ts)").  The result
     goes into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with no confidence to be
     2,161,255 rows (49,782,347,670 bytes).  The estimated time for
     this step is 7.79 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 16.50 seconds.

Explanation: In the above example,1 table is small and it is internally divided into 8 partitions.