Teradata: Group Consecutive Dates

Teradata: Group Consecutive Dates


Lets try to understand how row grouping will work with the help of a simple example.


Dataset: Lets convert the above dataset into a table for demonstration purpose.

CREATE MULTISET VOLATILE TABLE  manrega_task
(
 task_id INTEGER,
 group_id INTEGER,
 start_dt DATE,
 end_Dt DATE
 )
ON COMMIT PRESERVE ROWS;


--Data
INSERT INTO manrega_task VALUES(1,2,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(2,2,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(3,2,'2018-08-03','2018-08-04');

INSERT INTO manrega_task VALUES(4,5,'2018-08-13','2018-08-14');
INSERT INTO manrega_task VALUES(5,5,'2018-08-14','2018-08-15');

INSERT INTO manrega_task VALUES(6,5,'2018-08-28','2018-08-29');

INSERT INTO manrega_task VALUES(7,2,'2018-08-30','2018-08-31');

INSERT INTO manrega_task VALUES(8,4,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(9,4,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(10,4,'2018-08-03','2018-08-04');



Approach 1: Query to group consecutive days using row_number

--Step 2:
SELECT 
    group_id,
    MIN(start_dt) start_dt,
    MAX(end_dt) end_dt
FROM
(
--Step 1:
  SELECT 
     group_id,
     start_dt,
     start_dt+1 end_dt,
     start_dt - ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY start_dt) grouping_date
  FROM manrega_task
)ref
GROUP BY grouping_date,group_id
ORDER BY 2;

Output:
group_id    start_dt      end_dt
--------  ----------  ----------
       2  2018-08-01  2018-08-04
       5  2018-08-13  2018-08-15
       5  2018-08-28  2018-08-29
       2  2018-08-30  2018-08-31
Explanation: grouping_date will hold same date value for all consecutive dates for the given grouping( partition by group_id in this case) and this is used to group consecutive dates together.

Approach 2: Query to group consecutive days using normalize

--Step 2:
SELECT 
    group_id,
    BEGIN(period_type) start_dt,
    END(period_type) end_dt
FROM
(
--Step 1:
  SELECT NORMALIZE 
      group_id,
      PERIOD(start_dt,end_dt) period_type
  FROM 
  manrega_task
)ref
ORDER BY 2;

Output:
group_id    start_dt      end_dt
--------  ----------  ----------
       2  2018-08-01  2018-08-04
       5  2018-08-13  2018-08-15
       5  2018-08-28  2018-08-29
       2  2018-08-30  2018-08-31
Click here to know more about Normalize operation.