Differet ways to remove duplicates from Teradata tables

Teradata Removing Duplicates From Table

➠ Users cannot remove exact duplicates using row_number function(or any other function) in Teradata.
➠ Exact duplicates from 1 table can only be removed using other temporary table.
➠ There can be 2 types of duplicates present in Teradata tables.

  1. Complete row duplicates(Exact)
    
    Example: Complete row duplicate
    
      id  name     subject   marks
    ----  -------  --------  -----
     123  Harry    english      90
     123  Harry    english      90
    
  2. Key Columns duplicates
    
    Example: Key Column duplicate
    
      id  name     subject   marks
    ----  -------  --------  -----
     123  Harry    english      90
     123  Harry    maths        95
    
Note: 'id' and 'name' are the key columns of the 'student' table used in examples above.

Table structure used in the example containing duplicates

CREATE MULTISET TABLE student
(
  id INTEGER,
  name VARCHAR(100),
  subject VARCHAR(100),
  marks INTEGER
)primary index(id,name);

Complete row duplicates: Complete row duplicates can be removed by using 3 approaches.



Key columns duplicates: Key columns duplicates can be removed by using below approach.

  1. Creating temporary table
    
    CREATE MULTISET TABLE student_temp
    (
      id INTEGER,
      name VARCHAR(100),
      subject VARCHAR(100),
      marks INTEGER
    )primary index(id,name);
    

  2. Inserting unique record into temp table using qualify function on the key columns(id & name)
    
    INSERT INTO student_temp SELECT * FROM student QUALIFY row_number() over (partition by id,name order by marks desc)=1;
    

  3. Delete records from main table
    
    DELETE FROM student;
    

  4. Inserting record from temp table to main table
    
    INSERT INTO student SELECT * FROM student_temp;
    

  5. Droping set backup table
    
    DROP TABLE student_temp;
    

Identifying Duplicates: Way to identifying duplicates in the table.

SELECT * FROM student QUALIFY count() over (partition by id,name )>1;