Teradata: Group Consecutive Dates

Pradeep

Teradata: Group Consecutive Dates

This tutorial will explain how to group consecutive dates and preserve gaps.
Consecutive date means end_dt of current row should be same as start_dt of next row or start date of next row should start_dt+1 of current row.
This collapsing of consecutive dates can be done using multiple approaches.
Watch this video on youtube which is explaining below examples for grouping consecutive dates.

Lets try to understand how row grouping will work with the help of a simple example.

Sample Data: Consider below data showing 2 group of Ids having multiple records over different period.


task_id     group_id    start_dt      end_Dt
-------  -----------  ----------  ----------
      1            2  2018-08-01  2018-08-02
      2            2  2018-08-02  2018-08-03
      3            2  2018-08-03  2018-08-04

      4            5  2018-08-13  2018-08-14
      5            5  2018-08-14  2018-08-15

      6            5  2018-08-28  2018-08-29

      7            2  2018-08-30  2018-08-31

Required Output: Grouping all rows with consecutive dates into a single row where end_dt of current row is same as start_dt of next row.
```
group_id    start_dt      end_dt
--------  ----------  ----------
       2  2018-08-01  2018-08-04

       5  2018-08-13  2018-08-15

       5  2018-08-28  2018-08-29

       2  2018-08-30  2018-08-31
```
Result Explanation:
- First 3 rows with group_id=2 got grouped because all these 3 row's start date is consecutive i.e Aug 1, 2, 3.
- 4th & 5th rows with group_id=5 got grouped because both row's start date is consecutive i.e Aug 13, 14.
- 6th row with group_id=5 did not got grouped with any other row because there is no consecutive date row against '2018-08-28'.
- 7th row with group_id=2 did not got grouped with any other row because there is no consecutive date row against '2018-08-30'.

Dataset: Lets convert the above dataset into a table for demonstration purpose.


CREATE MULTISET VOLATILE TABLE  manrega_task
(
 task_id INTEGER,
 group_id INTEGER,
 start_dt DATE,
 end_Dt DATE
 )
ON COMMIT PRESERVE ROWS;


--Data
INSERT INTO manrega_task VALUES(1,2,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(2,2,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(3,2,'2018-08-03','2018-08-04');

INSERT INTO manrega_task VALUES(4,5,'2018-08-13','2018-08-14');
INSERT INTO manrega_task VALUES(5,5,'2018-08-14','2018-08-15');

INSERT INTO manrega_task VALUES(6,5,'2018-08-28','2018-08-29');

INSERT INTO manrega_task VALUES(7,2,'2018-08-30','2018-08-31');

INSERT INTO manrega_task VALUES(8,4,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(9,4,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(10,4,'2018-08-03','2018-08-04');

Approach 1: Query to group consecutive days using row_number


--Step 2:
SELECT 
    group_id,
    MIN(start_dt) start_dt,
    MAX(end_dt) end_dt
FROM
(
--Step 1:
  SELECT 
     group_id,
     start_dt,
     start_dt+1 end_dt,
     start_dt - ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY start_dt) grouping_date
  FROM manrega_task
)ref
GROUP BY grouping_date,group_id
ORDER BY 2;

Output:
group_id    start_dt      end_dt
--------  ----------  ----------
       2  2018-08-01  2018-08-04
       5  2018-08-13  2018-08-15
       5  2018-08-28  2018-08-29
       2  2018-08-30  2018-08-31

Explanation: grouping_date will hold same date value for all consecutive dates for the given grouping( partition by group_id in this case) and this is used to group consecutive dates together.

Approach 2: Query to group consecutive days using normalize


--Step 2:
SELECT 
    group_id,
    BEGIN(period_type) start_dt,
    END(period_type) end_dt
FROM
(
--Step 1:
  SELECT NORMALIZE 
      group_id,
      PERIOD(start_dt,end_dt) period_type
  FROM 
  manrega_task
)ref
ORDER BY 2;

Output:
group_id    start_dt      end_dt
--------  ----------  ----------
       2  2018-08-01  2018-08-04
       5  2018-08-13  2018-08-15
       5  2018-08-28  2018-08-29
       2  2018-08-30  2018-08-31

Click here to know more about Normalize operation.