SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value -
in our company, our clients perform various activities log in different tables - interview attendance, course attendance, , other general activities. have database view unions data of these tables giving activityview looks this. can see activities overlap - example while attending interview, client may have been performing cv update activity.
+----------------------+---------------+---------------------+-------------------+ | activity_client_id | activity_type | activity_start_date | activity_end_date | +----------------------+---------------+---------------------+-------------------+ | 112 | interview | 2015-06-01 09:00 | 2015-06-01 11:00 | | 112 | cv updating | 2015-06-01 09:30 | 2015-06-01 11:30 | | 112 | course | 2015-06-02 09:00 | 2015-06-02 16:00 | | 112 | interview | 2015-06-03 09:00 | 2015-06-03 10:00 | +----------------------+---------------+---------------------+-------------------+
each client has "sign date", recorded on client table, when joined our programme. here our sample client:
+-----------+---------------------+ | client_id | client_sign_up_date | +-----------+---------------------+ | 112 | 2015-05-20 | +-----------+---------------------+
i need create report show following columns:
+-----------+---------------------+--------------------------------------------+ | client_id | client_sign_up_date | date_client_completed_5_hours_of_activity | +-----------+---------------------+--------------------------------------------+
we need report in order see how effective our programme is. important aim of programme every client complete @ least 5 hours of activity possible. report tell how long sign take each client achieve figure.
what makes trickier when calculate 5 hours of total activity, must discount overlapping activities:
in sample data above client attended interview between 09:00 , 11:00.
on same day performed cv updating activity 09:30 11:30. our calculation, give them total activity day of 2.5 hours (150 minutes) - count 30 minutes of cv updating interview overlaps 11:00.
so report our sample client give following result:
+-----------+---------------------+--------------------------------------------+ | client_id | client_sign_up_date | date_client_completed_5_hours_of_activity | +-----------+---------------------+--------------------------------------------+ | 112 | 2015-05-20 | 2015-06-02 | +-----------+---------------------+--------------------------------------------+
so question how can create report using select statement ? can work out how writing stored procedure loop through view , write result report table. prefer avoid stored procedure , have select statement give me report on fly.
i using sql server 2005.
this 1 way it:
;with ctern ( select activity_client_id, activity_type, activity_start_date, activity_end_date, row_number() on (partition activity_client_id order activity_start_date) rn activities ), ctediff ( select c1.activity_client_id, c1.activity_type, x.activity_start_date, c1.activity_end_date, datediff(mi, x.activity_start_date, c1.activity_end_date) diff, row_number() on (partition c1.activity_client_id order x.activity_start_date) seq ctern c1 left join ctern c2 on c1.rn = c2.rn + 1 cross apply (select case when c1.activity_start_date < c2.activity_end_date c2.activity_end_date else c1.activity_start_date end) x(activity_start_date) ) select top 1 client_id, client_sign_up_date, activity_start_date, hoursofactivicty ctediff c1 inner join clients c2 on c1.activity_client_id = c2.client_id cross apply (select sum(diff) / 60.0 ctediff c3 c3.seq <= c1.seq) x(hoursofactivicty) hoursofactivicty >= 5 order seq
common table expressions , row_number()
introduced sql server 2005, above query should work version.
the first cte
, i.e. ctern
, produces following output:
client_id activity_type start_date end_date rn 112 interview 2015-06-01 09:00 2015-06-01 11:00 1 112 cv updating 2015-06-01 09:30 2015-06-01 11:30 2 112 course 2015-06-02 09:00 2015-06-02 16:00 3 112 interview 2015-06-03 09:00 2015-06-03 10:00 4
the second cte
, i.e. ctediff
, uses above table expression in order calculate time difference each record, taking consideration overlapps previous record:
client_id activity_type start_date end_date diff seq 112 interview 2015-06-01 09:00 2015-06-01 11:00 120 1 112 cv updating 2015-06-01 11:00 2015-06-01 11:30 30 2 112 course 2015-06-02 09:00 2015-06-02 16:00 420 3 112 interview 2015-06-03 09:00 2015-06-03 10:00 60 4
the final query calculates cumulative sum of time difference , selects first record exceeds 5 hours of activity.
the above query work simple interval overlaps, i.e. when end date of activity overlaps start date of next activity.
Comments
Post a Comment