Teradata Partitioned Primary Index(PPI) is option in teradata which allows it to divide the data in partition for fast retrieval

Teradata Partition Primary Index(PPI)

PPI Characteristics:
  1. Partitioning can be done on volatile,global temporary and permanent tables.
  2. Partitioning column doesn't have to be part of non unique Primary Index.
  3. Partitioning column must be part of unique Primary Index.
  4. Partitioned column/s will not decide which AMP data should go,it is solely dependent on Primary Index.
  5. Rows within AMP will be sorted by partitioned column/s and not by rowId.
  6. PPI can be defined at multi-level i.e sub-partition within partition (called Multilevel PPI).
  7. Teradata uses partition elimination strategy which helps to avoid Full Table Scan(FTS) for partitioned tables.
  8. Partitioned tables may reduce I/O for range queries by reducing the number of rows to be processed .
  9. Details about the partitions,constraints & conditions can be retrieved from dbc.PartitioningConstraintsV view.
    
    select * from dbc.PartitioningConstraintsV where databasename='tutorial_db' and tablename='employee';
    
PPI disadvantages:
  1. Partitioning can make single row (primary index) accesses to the table slower if partitioning column is not a part of the primary index column.

Rows are sents to their appropriate AMPs based on row hash. By default rows are ordered within AMP by row Id. For example,Primary index for the table employee is emp_no.

Non-Partitioned Index

Below query will result in Full Table Scan(FTS) to satify the query since the rows are distributed across the AMPs.

LOCK ROW FOR ACCESS SELECT * FROM tutorial_db.employee where hire_date between '1981-02-01' AND '1981-02-28';

Partitioned Primary Index : Now Partitioned Primary Index is added on the hiredate while the primary index is same as before i.e emp_no. Rows will go to exactly same AMP as before

Syntax/Example :

CREATE MULTISET TABLE tutorial_db.employee
     (
      emp_no INTEGER,
      emp_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      job_title VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
      manager_id INTEGER,
      hire_date DATE FORMAT 'yyyy-mm-dd',
      salary DECIMAL(18,2),
      commission DECIMAL(18,2),
      dept_no INTEGER)
PRIMARY INDEX ( emp_no )
PARTITION BY range_n(
  hire_date BETWEEN date '1981-01-01' and '1988-12-31' Each interval '1' Month,
  NO range OR UNKNOWN );

***Only difference will be that data is sorted by hire_date and not by row Id.
Non-Partitioned Index

Below query will use partition elimination strategie to avoid Full Table Scan(FTS) as rows are grouped together in their respective partition(monthly) across the AMPs.

LOCK ROW FOR ACCESS SELECT * FROM tutorial_db.employee where hire_date between '1981-02-01' AND '1981-02-28';

 Question: How many partition on each AMP will need to be read by above query?
 Answer:  1

Teradata also support Case(CASE_N) & Range(RANGE_N) as a partitioning option.


Partitions within table

Handling Unknown values: Different conditions and combinations to handle unknown values and store them into separate or one partitions. Number of partition will depend on how conditions are specified while defining PPI, if 'OR' is used between clauses then only 1 partition will be created for unknown & NULL values and if comma(,) is used then 2 partitions will be created for unknown & NULL values.

Note: If unknown values are not handled properly and users try to insert unknown values then users will get below error.

*** Failure 5728 Partitioning violation for table tutorial_db.employee.


Multilevel Partitioned Primary Index(MLPPI) : Multilevel Partitioned primary index allow each partition at a given level to be again partitioned into sub-partitions. MLPPI perform efficient searches by using partition elimination at the various levels or combinations of levels.

Syntax/Example :Below table is first partitioned by year and then within each year rows will be partitoned(sub-partition) by month.

CREATE MULTISET TABLE tutorial_db.pradeep_employee_test
 (
  emp_no INTEGER,
  emp_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
  job_title VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
  manager_id INTEGER,
  hire_date DATE FORMAT 'yyyy-mm-dd',
  hire_year INTEGER,
  hire_month INTEGER,
  salary DECIMAL(18,2),
  commission DECIMAL(18,2),
  dept_no INTEGER)
PRIMARY INDEX ( emp_no )
PARTITION BY (
  RANGE_N(hire_year BETWEEN 1980 AND 1988 EACH 1),
  RANGE_N(hire_month BETWEEN 1 AND 12 Each 1,
  NO RANGE OR UNKNOWN ));