Teradata: Removing Duplicates From Table

Pradeep

➠ Exact duplicate rows cannot be deleted in place using the row_number function (or any other function) in Teradata, because nothing in the rows distinguishes one copy from another.
➠ Exact duplicates in a table can only be removed by going through another (temporary) table.
➠ Two types of duplicates can be present in Teradata tables.

Complete row duplicates (Exact)


Example: Complete row duplicate

  id  name     subject   marks
----  -------  --------  -----
 123  Harry    english      90
 123  Harry    english      90

Key column duplicates


Example: Key Column duplicate

  id  name     subject   marks
----  -------  --------  -----
 123  Harry    english      90
 123  Harry    maths        95

Note: 'id' and 'name' are the key columns of the 'student' table used in examples above.

Table structure used in the examples containing duplicates

```
CREATE MULTISET TABLE student
(
  id INTEGER,
  name VARCHAR(100),
  subject VARCHAR(100),
  marks INTEGER
)primary index(id,name);
```
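
To reproduce the examples, the table can be loaded with the sample rows from the listings above. This is only a minimal sketch: the three rows come from the two example listings, and loading them together into one table is an assumption made for illustration.

```sql
-- Two identical rows: a complete row duplicate
-- (accepted because the table is MULTISET)
INSERT INTO student VALUES (123, 'Harry', 'english', 90);
INSERT INTO student VALUES (123, 'Harry', 'english', 90);

-- Same key columns (id, name) but different subject and marks:
-- a key column duplicate
INSERT INTO student VALUES (123, 'Harry', 'maths', 95);
```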

➠ Complete row duplicates: Complete row duplicates can be removed using any of the 3 approaches below.

Approach 1: By creating a new table and renaming it to the main table name
1. Creating new table with unique data
```
CREATE TABLE student_new AS (SELECT DISTINCT * FROM student) WITH DATA AND STATS;
```
2. Drop existing main table
```
DROP TABLE student;
```
3. Rename new table as main table
```
RENAME TABLE student_new TO student;
```
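
One point worth checking with Approach 1: a table created from a subquery does not necessarily inherit the MULTISET attribute or the primary index of the original table, since both can fall back to session and system defaults. A hedged variant of step 1 that states both explicitly, assuming the same (id, name) primary index is wanted on the new table:

```sql
-- Variant of step 1: state the table kind and primary index explicitly
-- instead of relying on session/system defaults.
CREATE MULTISET TABLE student_new AS
(
  SELECT DISTINCT * FROM student
) WITH DATA
PRIMARY INDEX (id, name);
```

Steps 2 and 3 (DROP and RENAME) stay the same with this variant.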

Approach 2: By using temporary SET table

Creating temporary SET table

```
CREATE SET TABLE student_set
(
  id INTEGER,
  name VARCHAR(100),
  subject VARCHAR(100),
  marks INTEGER
)primary index(id,name);
```

Inserting unique records into the temporary SET table (a SET table silently discards duplicate rows during INSERT ... SELECT)

```
INSERT INTO student_set SELECT * FROM student;
```

Delete records from main table
```
DELETE FROM student;
```

Inserting records from the SET table back into the main table

```
INSERT INTO student SELECT * FROM student_set;
```

Dropping the temporary SET table
```
DROP TABLE student_set;
```
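
Before the DELETE step of this approach, it may be worth confirming how many rows the SET table actually removed. A small sketch, to be run after the insert into student_set and before student is emptied; the column aliases table_name and row_cnt are illustrative names only:

```sql
-- Compare row counts of the original table and the de-duplicated copy.
-- The CAST keeps the longer literal in the second branch from being
-- truncated to the type of the first branch.
SELECT CAST('student' AS VARCHAR(30)) AS table_name, COUNT(*) AS row_cnt
FROM student
UNION ALL
SELECT CAST('student_set' AS VARCHAR(30)), COUNT(*)
FROM student_set;
```

The difference between the two counts is the number of exact duplicate rows that will be gone once the data is copied back.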

Approach 3: By using temporary MULTISET table

Creating temporary table

```
CREATE MULTISET TABLE student_temp
(
  id INTEGER,
  name VARCHAR(100),
  subject VARCHAR(100),
  marks INTEGER
)primary index(id,name);
```

Inserting unique records into the temp table using the QUALIFY clause (partitioning by every column keeps one row per group of identical rows)

```
INSERT INTO student_temp
SELECT * FROM student
QUALIFY row_number() over (partition by id,name,subject,marks order by id)=1;
```

Delete records from main table
```
DELETE FROM student;
```

Inserting records from the temp table back into the main table

```
INSERT INTO student SELECT * FROM student_temp;
```

Dropping the temporary table
```
DROP TABLE student_temp;
```
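
In the temporary-table approaches, the main table is empty between the DELETE and the re-insert. One possible safeguard, sketched here under the assumption that the session runs in Teradata (BTET) mode, is to wrap those two steps in an explicit transaction so that a failed insert rolls the DELETE back as well. In ANSI mode the statements would be followed by COMMIT instead of being wrapped in BT/ET.

```sql
BT;                                              -- begin explicit transaction
DELETE FROM student;                             -- empty the main table
INSERT INTO student SELECT * FROM student_temp;  -- reload de-duplicated rows
ET;                                              -- commit both statements together
```

The trade-off is that a full-table DELETE inside an explicit transaction may be slower than a standalone one, so this is a design choice rather than a requirement.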

➠ Key column duplicates: Key column duplicates can be removed using the approach below.

Creating temporary table

```
CREATE MULTISET TABLE student_temp
(
  id INTEGER,
  name VARCHAR(100),
  subject VARCHAR(100),
  marks INTEGER
)primary index(id,name);
```

Inserting unique records into the temp table using the QUALIFY clause on the key columns (id & name); for each key, the row with the highest marks is kept

```
INSERT INTO student_temp
SELECT * FROM student
QUALIFY row_number() over (partition by id,name order by marks desc)=1;
```

Delete records from main table
```
DELETE FROM student;
```

Inserting records from the temp table back into the main table

```
INSERT INTO student SELECT * FROM student_temp;
```

Dropping the temporary table
```
DROP TABLE student_temp;
```
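
Because this approach discards rows that differ in their non-key columns, it can help to preview what would be lost before running the DELETE. A minimal sketch that reuses the same window definition as the insert step and returns the rows that would not be kept:

```sql
-- Rows ranked 2nd or lower within each (id, name) group are the ones
-- the key-column de-duplication would discard.
SELECT *
FROM student
QUALIFY ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY marks DESC) > 1;
```

With the key column example above, this returns the 'english' row with marks 90, since the 'maths' row with marks 95 ranks first. When two rows share the same marks, which one ranks first is arbitrary unless more columns are added to the ORDER BY.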

➠ Identifying Duplicates: A way to identify key column duplicates in the table.

```
SELECT * FROM student QUALIFY count(*) over (partition by id,name )>1;
```
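
The query above flags rows that share the key columns. To list only complete row duplicates, one option is to group by every column and keep the groups that occur more than once. This is a sketch, and the alias row_cnt is an illustrative name:

```sql
-- One output row per distinct duplicated row, with the number of copies.
SELECT id, name, subject, marks, COUNT(*) AS row_cnt
FROM student
GROUP BY id, name, subject, marks
HAVING COUNT(*) > 1;
```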

dbmstutorials.com