SQL query to get start and end date from a result set

por | 20 julio, 2022

-1

I am working on one of requirement the raw data is in following format

Requirement – Startdate should be the date when status changed to 1 and enddate should be the 1st date after the record status changed from 1 to any other number.

CustomerStatusDate
A12307/2/2021
A12307/15/2021
A12307/22/2021
A12318/18/2021
A12319/8/2021
A123012/1/2021
A12301/21/2022
A12313/6/2022
A12313/7/2022
A12303/15/2022
B12311/1/2022
B12301/6/2022
C12311/2/2022
C12321/8/2022
C12301/9/2022

expected output

CustomerStartDateEndDate
A1238/18/202112/1/2021
A1239/8/202112/1/2021
A1233/6/20223/15/2022
A1233/7/20223/15/2022
B1231/1/20221/6/2022
C1231/2/20221/8/2022

Query I tried to get the output is below, I am getting the output for Customer B123 and C123, but not for A123 as expected.

Query Explanation – In 1st part of query I am taking all the records with status = 1 and in next part taking only those records where status is not equal to 1, and joining these 2 datasets based on Customer and row number generated.

SELECT A.[Customer],A.StartDate,B.EndDate
    from
    (
         SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
         FROM table1
         WHERE [STATUS] = 1
         GROUP BY Customer,Date,[Status]
     ) A
     LEFT JOIN
     (
        SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
         FROM table1
         WHERE [STATUS] != 1 
         AND Date>(
                    SELECT MIN(Date) AS STARTDATE
                    FROM table1
                    WHERE [STATUS] = 1
                )
     GROUP BY Customer,Date,[Status]
     ) B 
        ON 
        (
            A.[Customer] = B.[Customer] 
            AND A.RowNum = B.RowNum
        )
     ORDER BY A.Startdate

sql-servertsqljoingaps-and-islandsranking-functions

ShareFollow

edited Mar 31 at 12:54

user avatar

Charlieface

32.6k55 gold badges1313 silver badges3333 bronze badges

asked Mar 31 at 12:03

user avatar

BIDeveloper

69733 gold badges1313 silver badges2626 bronze badges

  • 3Images of data doesn’t help us help you. Neither does no explanation of the logic. – Larnu Mar 31 at 12:05
  • 1@Larnu updated the question with data instead of image and added the requirement and explanation – BIDeveloper Mar 31 at 12:21
  • 1Note: this is an «Islands and Gaps» type problem that is best solved using windowing aggregate functions (usually the difference of two of them). Cross Joins and Left Joins can be made to work but are pretty inefficient (O(n^2)) compared to a differential query (O(n)). – RBarryYoung Mar 31 at 12:43

Add a comment

2 Answers

Sorted by:Trending sort available                                               Highest score (default)                                                                   Trending (recent votes count more)                                                                   Date modified (newest first)                                                                   Date created (oldest first)                              

1

First you list the rows where Status = 1 and then use CROSS APPLY to get the corresponding minimum Date where the Status is not equal to 1

select s.[Customer],
       StartDate = s.[Date],
       EndDate   = e.[Date]
from   Table1 s
       cross apply
       (
           select [Date] = min(e.[Date])
           from   Table1 e
           where  e.[Customer] = s.[Customer]
           and    e.[Date]     > s.[Date]
           and    e.[Status]   <> 1
       ) e
where  s.[Status] = 1 
order by s.[Customer], s.[Date]

ShareFollow

answered Mar 31 at 12:29

user avatar

Squirrel

21.6k44 gold badges3535 silver badges3131 bronze badges

Add a comment

1

Here is a more efficient way to do this without a self-join.

WITH cte01only AS
  ( SELECT *, CASE Status WHEN 1 THEN 1 ELSE 0 END AS Status1 FROM table1 ),
  cteDifference AS
  (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY Customer  ORDER BY Date, Status1)
        - ROW_NUMBER() OVER (PARTITION BY Customer, Status1  ORDER BY Date) AS StatusGroup
    FROM cte01only
  ),
  cteGroup AS
  (
    SELECT   Customer, StatusGroup, Status1, MIN(Date) As StartDate
    FROM cteDifference
    GROUP BY Customer, StatusGroup, Status1
  ),
  cteNextDate AS
  (
    SELECT Customer, StatusGroup, Status1, StartDate,
        LEAD(StartDate, 1, NULL) OVER (PARTITION BY Customer  ORDER BY StatusGroup) AS EndDate
    FROM cteGroup
  )
SELECT Customer, StartDate, EndDate
FROM cteNextDate
WHERE Status1 = 1
ORDER BY Customer, StateDate

The key trick here is the second CTE which uses the difference of two ROW_NUMBER() functions to tag the customer records (with the StatusGroup column) into separate partitions by contiguous runs of records whose status is 1 or not 1. After that they can be grouped according to that tag to get the start dates, and then use the LEAD() function to get the following group’s StartDate as the current groupings EndDate.

(There may be a more compact way to express this, but I like to layout each stage as a separate CTE.)

Share