Sum up values with dynamic grouping conditions

Sum up values with dynamic grouping conditions - sql-server

I'm looking for a way to sum up values with dynamic grouping conditions in only one query, if possible. That also means no UNION ALL.
(The query below is quite easy and UNION ALL wouldn't be expensive at all, but if the source data has to be gathered from a bunch of tables it decreases performace to do all joins twice.)
Example data:
create table data (id int, location nvarchar(1), qty int, grouping tinyint)
insert into data (id, location, qty, grouping) values (1, 'A', 10, 0)
insert into data (id, location, qty, grouping) values (1, 'A', 20, 0)
insert into data (id, location, qty, grouping) values (1, 'B', 15, 0)
insert into data (id, location, qty, grouping) values (2, 'A', 5, 1)
insert into data (id, location, qty, grouping) values (2, 'B', 10, 1)
insert into data (id, location, qty, grouping) values (3, 'B', 20, 1)
Qty should be summed up per location, if grouping is 0, else per id.
Estimated result:
1, A, 30
1, B, 15
2, null, 15
3, null, 20
See SQL-Fiddle

It is possible, using CASE WHEN ...
SELECT id ,
(CASE WHEN grouping = 0 THEN location ELSE NULL END) AS location,
SUM(qty) AS qty
FROM data
GROUP BY id ,(CASE WHEN grouping = 0 THEN location ELSE NULL END)
ORDER BY id
Result:
id location qty
1 A 30
1 B 15
2 NULL 15
3 NULL 20

Related

Find the date when a milestone was achieved

I have people that do many multi-day assignments (date x to date Y). I would like to find the date that they completed a milestone e.g. 50 days work completed.
Data is stored as a single row per Assignment
AssignmentId
StartDate
EndDate
I can sum up the total days they have completed up to a date, but am struggling to see how I would find out the date that a milestone was hit. e.g. How many people completed 50 days in October 2020 showing the date within the month that this occurred?
Thanks in advance
PS. Our database is SQL Server.

As mentioned by prwvious comments, it would be much easier to help you if you could provide example data and table structure in order help you answer this question.
However, guessing a simple DB structure with a table for your peolple, your tasks and the work each user completed, you can get the required sum of days by use of a date table (or cte) which contains a entry for each day and the window function SUM with UNBOUNDED PRECEDING. Following an example:
DECLARE #people TABLE(
id int
,name nvarchar(50)
)
DECLARE #tasks TABLE(
id int
,name nvarchar(50)
)
DECLARE #work TABLE(
people_id int
,task_id int
,task_StartDate date
,task_EndDate date
)
INSERT INTO #people VALUES (1, 'Peter'), (2, 'Paul'), (3, 'Mary');
INSERT INTO #tasks VALUES (1, 'Devleopment'), (2, 'QA'), (3, 'Sales');
INSERT INTO #work VALUES
(1, 1, '2019-04-05', '2019-04-08')
,(1, 1, '2019-05-05', '2019-06-08')
,(1, 1, '2019-07-05', '2019-09-08')
,(2, 2, '2019-04-08', '2019-06-08')
,(2, 2, '2019-09-08', '2019-10-03')
,(3, 1, '2019-11-01', '2019-12-01')
;WITH cte AS(
SELECT CAST('2019-01-01' AS DATE) AS dateday
UNION ALL
SELECT DATEADD(d, 1, dateday)
FROM cte
WHERE DATEADD(d, 1, dateday) < '2020-01-01'
),
cteWorkDays AS(
SELECT people_id, task_id, dateday, 1 AS cnt
FROM #work w
INNER JOIN cte c ON c.dateday BETWEEN w.task_StartDate AND w.task_EndDate
),
ctePeopleWorkdays AS(
SELECT *, SUM(cnt) OVER (PARTITION BY people_id ORDER BY dateday ROWS UNBOUNDED PRECEDING) dayCnt
FROM cteWorkDays
)
SELECT *
FROM ctePeopleWorkdays
WHERE dayCnt = 50
OPTION (MAXRECURSION 0)

The solution depends on how you store your data. The solution below assumes that each worked day exists as a single row in your data model.
The approach below uses a common table expression (cte) to generate a running total (Total) for each person (PersonId) and then filters on the milestone target (I set it to 5 to reduce the sample data size) and target month.
Sample data
create table WorkedDays
(
PersonId int,
TaskDate date
);
insert into WorkedDays (PersonId, TaskDate) values
(100, '2020-09-01'),
(100, '2020-09-02'),
(100, '2020-09-03'),
(100, '2020-09-04'),
(100, '2020-09-05'), -- person 100 worked 5 days by 2020-09-05 = milestone (in september)
(200, '2020-09-29'),
(200, '2020-09-30'),
(200, '2020-10-01'),
(200, '2020-10-02'),
(200, '2020-10-03'), -- person 200 worked 5 days by 2020-10-03 = milestone (in october)
(200, '2020-10-04'),
(200, '2020-10-05'),
(200, '2020-10-06'),
(300, '2020-10-10'),
(300, '2020-10-11'),
(300, '2020-10-12'),
(300, '2020-10-13'),
(300, '2020-10-14'), -- person 300 worked 5 days by 2020-10-14 = milestone (in october)
(300, '2020-10-15'),
(400, '2020-10-20'),
(400, '2020-10-21'); -- person 400 did not reach the milestone yet
Solution
with cte as
(
select wd.PersonId,
wd.TaskDate,
count(1) over(partition by wd.PersonId
order by wd.TaskDate
rows between unbounded preceding and current row) as Total
from WorkedDays wd
)
select cte.PersonId,
cte.TaskDate as MileStoneDate
from cte
where cte.Total = 5 -- milestone reached
and year(cte.TaskDate) = 2020
and month(cte.TaskDate) = 10; -- in october
Result
PersonId MilestoneDate
-------- -------------
200 2020-10-03
300 2020-10-14
Fiddle (also shows the common table expression output).

Search child rows for values

I have something like this:
Transaction Customer
1 Cust1
2 Cust2
3 Cust3
4 Cust4
TransID Code
2 A
2 B
2 D
3 A
4 B
4 C
If I want to be able to do something like "IF Customer 'Cust1' Has code 'A'", how should I best build a view? I want to end up being able to query something like "Select Customer from View where Code in [Some list of codes]" OR "Select Cust1 from View Having Codes in [Some list of codes]"
While I can do something like
Customer | Codes
Cust1 | A, B, D
Etc.
SELECT Transaction from Tbl where Codes like 'A'
This seems to me to be an impractical way to do it.

Here's how I'd do it
;with xact_cust (xact, cust) as
(
select 1, 'cust1' union all
select 2, 'cust2' union all
select 3, 'cust3' union all
select 4, 'cust4'
), xact_code (xact, code) as
(
select 2, 'A' union all
select 2, 'B' union all
select 2, 'D' union all
select 3, 'A' union all
select 4, 'B' union all
select 4, 'C'
)
select Cust, Code
from xact_cust cust
inner join xact_code code
on cust.xact = code.xact
where exists (select 1
from xact_code i
where i.xact = code.xact
and i.code = 'A')
If you NEED the codes serialized into a delimited list, take a look at this article: What this query does to create comma delimited list SQL Server?

Here's another option...
IF OBJECT_ID('tempdb..#CustomerTransaction', 'U') IS NOT NULL
DROP TABLE #CustomerTransaction;
CREATE TABLE #CustomerTransaction (
TransactionID INT NOT NULL PRIMARY KEY,
Customer CHAR(5) NOT NULL
);
INSERT #CustomerTransaction (TransactionID, Customer) VALUES
(1, 'Cust1'), (2, 'Cust2'), (3, 'Cust3'),
(4, 'Cust4'), (5, 'Cust5');
IF OBJECT_ID('tempdb..#TransactionCode', 'U') IS NOT NULL
DROP TABLE #TransactionCode;
CREATE TABLE #TransactionCode (
TransactionID INT NOT NULL,
Code CHAR(1) NOT NULL
);
INSERT #TransactionCode (TransactionID, Code) VALUES
(2, 'A'), (2, 'B'), (2, 'D'), (3, 'A'), (4, 'B'), (4, 'C');
--SELECT * FROM #CustomerTransaction ct;
--SELECT * FROM #TransactionCode tc;
--=============================================================
SELECT
ct.TransactionID,
ct.Customer,
CodeList = STUFF(tcx.CodeList, 1, 1, '')
FROM
#CustomerTransaction ct
CROSS APPLY (
SELECT
', ' + tc.Code
FROM
#TransactionCode tc
WHERE
ct.TransactionID = tc.TransactionID
ORDER BY
tc.Code ASC
FOR XML PATH('')
) tcx (CodeList);
Results...
TransactionID Customer CodeList
------------- -------- -----------
1 Cust1 NULL
2 Cust2 A, B, D
3 Cust3 A
4 Cust4 B, C
5 Cust5 NULL

How to write a case when statement when there are overlaps in T-SQL

I have a table like this
How can I group it to this
Small is the sum of the count when Count <25; Large is the sum of the count when Count>=25; Total is the sum of all counts.

Try it like this...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
ID INT NOT NULL PRIMARY KEY,
nCount int NOT NULL
);
INSERT #TestData (ID, nCount) VALUES
(1, 10), (2, 15), (3, 22), (4, 23),
(5, 25), (6, 27), (7, 30);
--=====================================
WITH
cte_Totals AS (
SELECT
Total = SUM(td.nCount),
Small = SUM(CASE WHEN td.nCount < 25 THEN td.nCount ELSE 0 END),
Large = SUM(CASE WHEN td.nCount >= 25 THEN td.nCount ELSE 0 END)
FROM
#TestData td
)
SELECT
x.[Group],
x.[Count]
FROM
cte_Totals t
CROSS APPLY (VALUES (1, 'Total', t.Total), (2, 'Small', t.Small), (3, 'Large', t.Large) ) x (SortBy, [Group],[Count])
ORDER BY
x.SortBy;
Results...
Group Count
----- -----------
Total 152
Small 70
Large 82
HTH,
Jason

The simplest way is to use CASE:
SELECT
SUM(Count) as Total,
SUM(CASE WHEN Count < 25 THEN Count ELSE 0 END) as Small,
SUM(CASE WHEN Count >= 25 THEN Count ELSE 0 END) as Large
FROM table

Late answer (keep the accepted as is), but I did want to introduce a concept which may be more helpful down the line.
I maintain a generic Tier Table. The following is a simplified example, but you can take the aggregation tiers out of the code, and put it in a table... things change, and you can serve multiple masters.
Sample Data
Declare #YourTable table (ID int,[Count] int)
Insert Into #YourTable values
(1, 10), (2, 15), (3, 22), (4, 23), (5, 25), (6, 27), (7, 30)
Declare #Tier table (Tier varchar(50),Seq int,Title varchar(50),R1 int,R2 int)
Insert Into #Tier values
('MyGroup',1,'Total',0,99999)
,('MyGroup',2,'Small',0,25)
,('MyGroup',3,'Large',25,99999)
The Actual Query
Select T.Title
,[Count] = sum(D.[Count])
From #Tier T
Join #YourTable D on (T.Tier='MyGroup' and D.Count >= T.R1 and D.Count<T.R2)
Group By T.Title,T.Seq
Order By T.Seq
Returns
Title Count
Total 152
Small 70
Large 82
EDIT - There are many ways you can construct this
Example
Declare #YourTable table (ID varchar(50),[Count] int)
Insert Into #YourTable values
('Tywin', 10), ('Tywin', 15), ('Tyrion', 22), ('Bran', 23), ('Ned', 25), ('John', 27), ('Robb', 30)
Declare #Tier table (Tier varchar(50),Seq int,Title varchar(50),R1 int,R2 int,C1 varchar(50),C2 varchar(50))
Insert Into #Tier values
('MyGroup',1,'Total' ,null,null,'a','z')
,('MyGroup',2,'Group 1',null,null,'Tywin,Tyrion',null)
,('MyGroup',3,'Group 2',null,null,'Bran,Ned,John,Robb',null)
Select T.Title
,[Count] = sum(D.[Count])
From #Tier T
Join #YourTable D on T.Tier='MyGroup' and (D.ID between C1 and C2 or patindex('%,'+D.ID+',%',','+C1+',')>0)
Group By T.Title,T.Seq
Order By T.Seq
Returns
Title Count
Total 152
Group 1 47
Group 2 105

How do you validate that range doesn't overlap in a list of data?

I have a list of data :
Id StartAge EndAge Amount
1 0 2 50
2 2 5 100
3 5 10 150
4 6 9 160
I have to set Amount for various age group.
The age group >0 and <=2 need to pay 50
The age group >2 and <=5 need to pay 100
The age group >5 and <=10 need to pay 150
But
The age group >6 and <=9 need to pay 160 is an invalid input because >6 and <=9 already exist on 150 amount range.
I have to validate such kind of invalid input before inserting my data as a bulk.Once 5-10 range gets inserted anything that is within this range should not be accepted by system. For example: In above list, user should be allowed to insert 10-15 age group but any of the following should be checked as invalid.
6-9
6-11
3-5
5-7
If Invalid Input exists on my list I don't need to insert the list.

You could try to insert your data to the temporary table first.
DECLARE #TempData TABLE
(
[Id] TINYINT
,[StartAge] TINYINT
,[EndAge] TINYINT
,[Amount] TINYINT
);
INSERT INTO #TempData ([Id], [StartAge], [EndAge], [Amount])
VALUES (1, 0, 2, 50)
,(2, 2, 5, 100)
,(3, 5, 10, 150)
,(4, 6, 9, 160);
Then, this data will be transferred to your target table using INSERT INTO... SELECT... statement.
INSERT INTO <your target table>
SELECT * FROM #TempData s
WHERE
NOT EXISTS (
SELECT 1
FROM #TempData t
WHERE
t.[Id] < s.[Id]
AND s.[StartAge] < t.[EndAge]
AND s.[EndAge] > t.[StartAge]
);
I've created a demo here

We can use recursive CTE to find how records are chained by end age and start age pairs:
DECLARE #DataSource TABLE
(
[Id] TINYINT
,[StartAge] TINYINT
,[EndAge] TINYINT
,[Amount] TINYINT
);
INSERT INTO #DataSource ([Id], [StartAge], [EndAge], [Amount])
VALUES (1, 0, 2, 50)
,(2, 2, 5, 100)
,(3, 5, 10, 150)
,(4, 6, 9, 160)
,(5, 6, 11, 160)
,(6, 3, 5, 160)
,(7, 5, 7, 160)
,(9, 10, 15, 20)
,(8, 7, 15, 20);
WITH PreDataSource AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [StartAge] ORDER BY [id]) as [pos]
FROM #DataSource
), DataSource AS
(
SELECT [Id], [StartAge], [EndAge], [Amount], [pos]
FROM PreDataSource
WHERE [id] = 1
UNION ALL
SELECT R.[Id], R.[StartAge], R.[EndAge], R.[Amount], R.[pos]
FROM DataSource A
INNER JOIN PreDataSource R
ON A.[Id] < R.[Id]
AND A.[EndAge] = R.[StartAge]
AND R.[pos] =1
)
SELECT [Id], [StartAge], [EndAge], [Amount]
FROM DataSource;
This is giving us, the following output:
Note, that before this, we are using the following statement to prepare the data:
SELECT *, ROW_NUMBER() OVER (PARTITION BY [StartAge] ORDER BY [id]) as [pos]
FROM #DataSource;
The idea is to find records with same start age and to calculated which one is inserted first. Then, in the CTE we are getting only the first.

Assuming you are bulk inserting the mentioned data into a temp table(#tmp) or table variable (#tmp).
If you are working on sql server 2012 try the below.
select *
from(select *,lag(endage,1,0)over(order by endage) as [col1]
from #tmp)tmp
where startage>=col1 and endage>col1
The result of this query should be inserted into your main table.

T-SQL - Filling in the gaps in running balance

I am working on a Data Warehouse project and the client provides daily sales data. On-hand quantities are provided in most lines but are often missing. I need help on how to fill those missing values based on prior OH and sales information.
Here's a sample data:
Line# Store Item OnHand SalesUnits DateKey
-----------------------------------------------
1 001 A 100 20 1
2 001 A 80 10 2
3 001 A null 30 3 --[OH updated with 70 (80-10)]
4 001 A null 5 4 --[OH updated with 40 (70-30)]
5 001 A 150 10 5 --[OH untouched]
6 001 B null 4 1 --[OH untouched - new item]
7 001 B 80 12 2
8 001 B null 10 3 --[OH updated with 68 (80-12]
Lines 1 and 2 are not to be updated because OnHand quantities exist.
Lines 3 and 4 are to be updated based on their preceding rows.
Line 5 is to be left untouched because OnHand is provided.
Line 6 is to be left untouched because it is the first row for Item B
Is there a way I can do this in a set operation? I know I can do it easily using a fast_forward cursor but it will take a long time (15M+ rows).
Thanks for your help!

Test data:
declare #t table(
Line# int, Store char(3), Item char, OnHand int, SalesUnits int, DateKey int
)
insert #t values
(1, '001', 'A', 100, 20, 1),
(2, '001', 'A', 80 , 10, 2),
(3, '001', 'A', null, 30, 3),
(4, '001', 'A', null, 5, 4),
(5, '001', 'A', 150, 10, 5),
(6, '001', 'B', null, 4, 1),
(7, '001', 'B', null, 4, 2),
(8, '001', 'B', 80, 12, 3),
(9, '001', 'B', null, 10, 4)
Script to populate not using cursor:
;with a as
(
select Line#, Store, Item, OnHand, SalesUnits, DateKey, 1 correctdata from #t where DateKey = 1
union all
select t.Line#, t.Store, t.Item, coalesce(t.OnHand, a.onhand - a.salesunits), t.SalesUnits, t.DateKey, t.OnHand from #t t
join a on a.DateKey = t.datekey - 1 and a.item = t.item and a.store = t.store
)
update t
set OnHand = a.onhand
from #t t join a on a.line# = t.line#
where a.correctdata is null
Script to populate using cursor:
declare #datekey int, #store int, #item char, #Onhand int,
#calculatedonhand int, #salesunits int, #laststore int, #lastitem char
DECLARE sales_cursor
CURSOR FOR
SELECT datekey+1, store, item, OnHand -SalesUnits, salesunits
FROM #t sales
order by store, item, datekey
OPEN sales_cursor;
FETCH NEXT FROM sales_cursor
INTO #datekey, #store, #item, #Onhand, #salesunits
WHILE ##FETCH_STATUS = 0
BEGIN
SELECT #calculatedonhand = case when #laststore = #store and #lastitem = #item
then coalesce(#onhand, #calculatedonhand - #salesunits) else null end
,#laststore = #store, #lastitem = #item
UPDATE s
SET onhand=#calculatedonhand
FROM #t s
WHERE datekey = #datekey and #store = store and #item = item
and onhand is null and #calculatedonhand is not null
FETCH NEXT FROM sales_cursor
INTO #datekey, #store, #item, #Onhand, #salesunits
END
CLOSE sales_cursor;
DEALLOCATE sales_cursor;
I recommand you use the cursor version, I doubt you can get a decent performance using the recursive query. I know people in here hate cursors, but when your table has that size, it can be the only solution.