Teradata Join strategies

Teradata Join strategies

Teradata uses different strategies to perform join between two tables. Data distribution and columns selected for joins heavily influence the execution plan and the selected join strategy.

Broadly, there are four types of Join strategies:
Nested Join: This join strategies utilises 'Unique Indexes' (be it unique primary index or unique secondary index) from one of the table participating in the join to retrieve single record. Join condition should also match some column from single retrieved row to primary /Secondary index from the second table. Single row then matches one or more rows on the other table .



Product Join: Product join compares every row from one table to every row in the second table assumping join condition as 1=1 and apply filter on the resultset if any condition is defined in 'WHERE'. Exampe: If 1st table has 20 rows and 2nd table has 10 records then product join will result in 200 (20X10) rows. In general, product joins are caused by mistakes



Merge Join: This is the most common join strategy which comes into play for equi joins and exclusions as well.

There are 5 sub strategies for merge join

  1. PI-PI Join
  2. PI- No PI Join
  3. No PI - No PI join
  4. Exclusion Join
  5. Big table Small table strategy
  1. PI-PI Join: This strategy comes into play when join columns of both the table are primary index of corresponding tables. Rows will be directly joined (merge join) on local Amps without moving them to spool for joining.

    Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using PI(Primary index) -PI(Primary index) Join.
    
    EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.employee b ON (a.emp_no=b.emp_no);
    
    Explanation (Merge Join (PI-PI Join))
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.b.
      2) Next, we lock tutorial_db.b for read.
      3) We do an all-AMPs JOIN step from tutorial_db.b by way of a RowHash
         match scan, which is joined to tutorial_db.a by way of a RowHash match
         scan.  tutorial_db.b and tutorial_db.a are joined using a merge join, with
         a join condition of ("tutorial_db.a.emp_no = tutorial_db.b.emp_no").  The
         result goes into Spool 1 (group_amps), which is built locally on
         the AMPs.  The size of Spool 1 is estimated with low confidence to
         be 186 rows (64,542 bytes).  The estimated time for this step is
         0.01 seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.01 seconds.
    
    Explanation: In the above example,emp_no is the primary index and both tables are joined in the Amp local without moving the tables to spool.

  2. PI-No PI Join: This strategy comes into play when join columns of one table is primary index and join column of second table is not primary index. Rows from second table will be hashed on the join columns and sorted by rowhash. Rowhash is then sent to spool on the corresponding Amps to join with the 1st table. 1st table is then merged joined to spool by scaning all-rows.

    Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using PI(Primary index) - no PI(Primary index) Join.
    
    EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.employee b on (a.emp_no=b.manager_id);
     
    Explanation (Merge Join (PI- No PI Join))
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.b.
      2) Next, we lock tutorial_db.b for read.
      3) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
         all-rows scan with a condition of ("NOT (tutorial_db.b.manager_id IS
         NULL)") into Spool 2 (all_amps), which is redistributed by the
         hash code of (tutorial_db.b.manager_id) to all AMPs.  Then we do a
         SORT to order Spool 2 by row hash.  The size of Spool 2 is
         estimated with low confidence to be 186 rows (13,950 bytes).  The
         estimated time for this step is 0.00 seconds.
      4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
         RowHash match scan, which is joined to tutorial_db.a by way of a
         RowHash match scan.  Spool 2 and tutorial_db.a are joined using a
         merge join, with a join condition of ("tutorial_db.a.emp_no =
         manager_id").  The result goes into Spool 1 (group_amps), which is
         built locally on the AMPs.  The size of Spool 1 is estimated with
         index join confidence to be 186 rows (64,542 bytes).  The
         estimated time for this step is 0.01 seconds.
      5) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.02 seconds.
    
    Explanation: In the above example,emp_no is the primary index in 1 table and manager_id is not PI column. Rows from 'b' table will be hashed and redistributed in the corresponding Amps.

  3. No PI-No PI join: This strategy comes into play when join columns of both tables are not primary index column. Rows from both tables are scanned and hashed on the join columns. Rowhash is then sorted and moved to separate spools for each table. These spools will be joined with each other using a merge join.

    Explain Plan : Explain plan of 2 table join showing 'Merge join' being utilized by optimizer using no PI(Primary index) - noPI(Primary index) Join.
    
    EXPLAIN SELECT * FROM tutorial_db.employee a INNER JOIN tutorial_db.dept_Pradeep b ON (a.emp_name =b.department_name);
    
    Explanation (Merge Join (No PI- No PI Join))
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.a.
      2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.b.
      3) We lock tutorial_db.a for read, and we lock tutorial_db.b for read.
      4) We execute the following steps in parallel.
           1) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
              all-rows scan with a condition of ("NOT (tutorial_db.b.department_name IS
              NULL)") into Spool 2 (all_amps), which is redistributed by
              the hash code of (tutorial_db.b.department_name) to all AMPs.  Then we do a
              SORT to order Spool 2 by row hash.  The size of Spool 2 is
              estimated with low confidence to be 186 rows (13,764 bytes).
              The estimated time for this step is 0.00 seconds.
           2) We do an all-AMPs RETRIEVE step from tutorial_db.a by way of an
              all-rows scan with a condition of ("NOT (tutorial_db.a.emp_name
              IS NULL)") into Spool 3 (all_amps), which is redistributed by
              the hash code of (tutorial_db.a.emp_name) to all AMPs.  Then we
              do a SORT to order Spool 3 by row hash.  The size of Spool 3
              is estimated with high confidence to be 434 rows (32,550
              bytes).  The estimated time for this step is 0.00 seconds.
      5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
         RowHash match scan, which is joined to Spool 3 (Last Use) by way
         of a RowHash match scan.  Spool 2 and Spool 3 are joined using a
         merge join, with a join condition of ("emp_name = department_name").  The
         result goes into Spool 1 (group_amps), which is built locally on
         the AMPs.  The size of Spool 1 is estimated with no confidence to
         be 3,875 rows (1,375,625 bytes).  The estimated time for this step
         is 0.01 seconds.
      6) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.02 seconds.
    
    Explanation: In the above example,emp_name & department_name are not primary index of both the tables. Rows from both table will be hashed and redistributed to separate spools before joining in the final spool.

  4. Exclusion Join: This strategy comes into play when one table is used to exclude data from other table. Exclusion join is applicable when 'NOT EXISTS' and 'NOT IN' are used.

    Explain Plan : Explain plan of 2 table join showing 'exclusion Merge join' being utilized by optimizer using Exclusion Join.
    
    EXPLAIN SELECT * FROM tutorial_db.employee a WHERE NOT EXISTS (SELECT 1 FROM tutorial_db.employee b WHERE a.emp_no=b.manager_id);
    
    Explanation (Exclusion Merge Join)
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.b.
      2) Next, we lock tutorial_db.b for read.
      3) We do an all-AMPs RETRIEVE step from tutorial_db.b by way of an
         all-rows scan with a condition of ("NOT (tutorial_db.b.manager_id IS
         NULL)") into Spool 2 (all_amps), which is redistributed by the
         hash code of (tutorial_db.b.manager_id) to all AMPs.  Then we do a
         SORT to order Spool 2 by row hash and the sort key in spool field1
         eliminating duplicate rows.  The size of Spool 2 is estimated with
         high confidence to be 7 rows (175 bytes).  The estimated time for
         this step is 0.00 seconds.
      4) We do an all-AMPs JOIN step from tutorial_db.a by way of an all-rows
         scan with no residual conditions, which is joined to Spool 2 (Last
         Use) by way of an all-rows scan.  tutorial_db.a and Spool 2 are joined
         using an exclusion merge join, with a join condition of ("(NOT
         (tutorial_db.a.emp_no IS NULL )) AND (tutorial_db.a.emp_no = manager_id)")
         where unknown comparison will be ignored.  The result goes into
         Spool 1 (group_amps), which is built locally on the AMPs.  The
         size of Spool 1 is estimated with index join confidence to be 434
         rows (79,856 bytes).  The estimated time for this step is 0.01
         seconds.
      5) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.02 seconds.
    
    Explanation: In the above example,manager_id of one table is used to exclude records from other table.

  5. Big table Small table strategy: This strategy comes into play when one table is very small. In this strategy, small table will be duplicated on all the AMPs into spool. Bigger table is then merged joined to spool by scaning all-rows.


Hash Join: This join strategy is family of merge join but it perform much better than Merge Join & Product join. Hash join will be used when one table must be small enough to fit in the memory completely.

There are 2 sub strategies for hash join
  1. Dynamic hash join: One table must be small enough to fit in the memory completely into a single partition and it should be very small compared to other table. Dynamic hash join provide ability to join the tables without moving large table to spool. This will only work when two tables are joined based on non‑primary index columns.

    Explain Plan : Explain plan of 2 table join showing 'Hash join' being utilized by optimizer using Dynamic Hash Join.
    
    Explanation (Hash Join (Single Partition)(Dynamic Hash join)
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.employee_small_data.
      2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.employee_large_data.
      3) We lock tutorial_db.employee_small_data for read, and we lock
         tutorial_db.employee_large_data for read.
      4) We do an all-AMPs RETRIEVE step from tutorial_db.employee_small_data by
         way of an all-rows scan with a condition of ("(NOT
         (tutorial_db.employee_small_data.start_gmt_ts IS NULL )) AND (NOT
         (tutorial_db.employee_small_data.agent_local_dt IS NULL ))") into Spool
         2 (all_amps) (compressed columns allowed), which is duplicated on
         all AMPs.  The size of Spool 2 is estimated with low confidence to
         be 34,596 rows (149,627,700 bytes).  The estimated time for this
         step is 0.08 seconds.
      5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
         all-rows scan, which is joined to tutorial_db.employee_large_data by way
         of an all-rows scan with a condition of ("NOT
         (tutorial_db.employee_large_data.agent_local_dt IS NULL)").  Spool 2 and
         tutorial_db.employee_large_data are joined using a dynamic hash join, with
         a join condition of ("(agent_local_dt =
         tutorial_db.employee_large_data.agent_local_dt) AND (start_gmt_ts =
         tutorial_db.employee_large_data.start_gmt_ts)").  The result goes into
         Spool 1 (group_amps), which is built locally on the AMPs.  The
         size of Spool 1 is estimated with no confidence to be 432,251 rows
         (9,956,469,534 bytes).  The estimated time for this step is 6.19
         seconds.
      6) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 6.27 seconds.
    
    Explanation: In the above example,small table is completely fit in the memory resulting in hash join.

  2. Classic / Partitioned hash join: When one table is too large to fit into a single partition memory available for hash processing, the system breaks the table into multiple smaller partition which are small enough to fit in the available memory. Hash join partitions are created on both the joining tables Hash join partitions are created by hashing both table rows on their join columns in such a way that rows from one table's hash join partition can only match with rows in the other table's hash join partition.
    There can be maximum of 50 partition in which table can be divided.

    Explain Plan : Explain plan of 2 table join showing 'Hash join' being utilized by optimizer using Classic/Partitioned Hash Join.
    
    Explanation (Hash Join (Multiple Partition)(Classic Hash join)
    --------------------------------------------------------------------------------
      1) First, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.employee_small_data.
      2) Next, we lock a distinct tutorial_db."pseudo table" for read on a
         RowHash to prevent global deadlock for tutorial_db.employee_large_data.
      3) We lock tutorial_db.employee_small_data for read, and we lock
         tutorial_db.employee_large_data for read.
      4) We execute the following steps in parallel.
           1) We do an all-AMPs RETRIEVE step from
              tutorial_db.employee_small_data by way of an all-rows scan with a
              condition of ("(NOT (tutorial_db.employee_small_data.start_gmt_ts
              IS NULL )) AND (NOT
              (tutorial_db.employee_small_data.agent_local_dt IS NULL ))") into
              Spool 2 (all_amps) (compressed columns allowed) fanned out
              into 8 hash join partitions , which is duplicated on all AMPs.
              The size of Spool 2 is estimated with low confidence to be
              172,980 rows (748,138,500 bytes).  The estimated time for
              this step is 0.34 seconds.
           2) We do an all-AMPs RETRIEVE step from tutorial_db.employee_large_data
              by way of an all-rows scan with a condition of ("(NOT
              (tutorial_db.employee_large_data.start_gmt_ts IS NULL )) AND (NOT
              (tutorial_db.employee_large_data.agent_local_dt IS NULL ))") into
              Spool 3 (all_amps) (compressed columns allowed) fanned out
              into 8 hash join partitions, which is built locally on the
              AMPs.  The size of Spool 3 is estimated with high confidence
              to be 5,400,649 rows (23,357,806,925 bytes).  The estimated
              time for this step is 8.71 seconds.
      5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
         all-rows scan, which is joined to Spool 3 (Last Use) by way of an
         all-rows scan.  Spool 2 and Spool 3 are joined using a hash join
         of 8 partitions, with a join condition of ("(agent_local_dt =
         agent_local_dt) AND (start_gmt_ts = start_gmt_ts)").  The result
         goes into Spool 1 (group_amps), which is built locally on the AMPs.
         The size of Spool 1 is estimated with no confidence to be
         2,161,255 rows (49,782,347,670 bytes).  The estimated time for
         this step is 7.79 seconds.
      6) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 16.50 seconds.
    
    Explanation: In the above example,1 table is small and it is internally divided into 8 partitions.