Distinct query in SQL script which is nested - sql-server

I have all data is in one single table and when an entry is inserted, it appends a unique ID for that account. I then have a script which performs several calculations across all entries. However what I want to do is script only distinct entries in the script so I can perform the same calculation but just on the first one it finds for that account, as the info does not change between entries. The script looks as follows
However when I try to nest a sub query, it asks for a group by or aggregate, however I don't want to group by these unique account codes, otherwise I will get 1000's of rows. I just want to aggregate all entries but perform a distinct on the account details. For example age is in every entry and therefore I just need to use 1 entry for the account and not all 10 that are in there, as I will get duplicates.
select
count ([Accountid]) as Total,
round (AVG ([AGE]),2) as AVGAGE,
SUM(CASE WHEN [AGE] BETWEEN 0 AND 4 THEN 1 ELSE 0 END) AS [0-4],
SUM(CASE WHEN [AGE] > 100 THEN 1 ELSE 0 END) AS [Over 100]
from [dbo].[table1]

If I understand your question completely, I suppose that you want something like that;
select
count ([Accountid]) as Total,
round (AVG ([AGE]),2) as AVGAGE,
SUM(CASE WHEN [AGE] BETWEEN 0 AND 4 THEN 1 ELSE 0 END) AS [0-4],
SUM(CASE WHEN [AGE] > 100 THEN 1 ELSE 0 END) AS [Over 100]
from [dbo].[table1] where Accountid NOT IN (select Accountid from [table1] group by Accountid having count(*) > 1)
Hope it helps

Related

SQL Server: add column for rows since value changed

I have a table that contains 3 columns: personID, weeknumber, and event. Event is 0 if there was no event for that person in that week and 1 if there was.
I need to create a new column weekssincelastevent which will be 0 for the week where event=1 and then 1,2,3,4 etc for the weeks afterwards. If there is a later event then it starts from 0 again. E.g.
personID
weeknumber
event
weekssincelastevent
1
1
0
NULL
1
2
0
NULL
1
3
1
0
1
4
0
1
1
5
0
2
1
6
0
3
2
1
0
NULL
2
2
1
0
2
3
0
1
2
4
1
0
2
5
0
1
The column should be NULL before the first events and all values NULL where a personID never has event.
I can't think how to write this in SQL.
The table has ~600m rows (60m personIDs with 100 weeknumbers each, although some personIDs don't have all the weeknumbers).
Many thanks for any insight.
This is a bit of a gaps and island problem here. The first part, in the CTE, puts the data into "groups". Each time there is an event that's a new group. it also calculates the number of weeks that past since the prior week (which is set to 0 for rows hosting an event). Then in the outer query we SUM the number of weeks past in each group, giving the number of weeks that have passed:
WITH Groups AS(
SELECT PersonID,
WeekNumber,
Event,
COUNT(CASE Event WHEN 1 THEN 1 END) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
CASE Event WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC) ELSE 0 END AS WeeksPassed
FROM dbo.YourTable)
SELECT PersonID,
WeekNumber,
Event,
CASE WHEN Events = 0 THEN NULL
ELSE SUM(WeeksPassed) OVER (PARTITION BY PersonID, Events ORDER BY WeekNumber ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
END AS WeekSinceLastEvent
FROM Groups;
db<>fiddle
You can do this with a conditional aggregate within a windowed function:
SELECT t.PersonID,
t.WeekNumber,
t.Event,
WeeksSinceLastEvent = t.WeekNumber - MAX(CASE WHEN t.Event = 1 THEN t.WeekNumber END)
OVER(PARTITION BY t.PersonID ORDER BY t.WeekNumber)
FROM dbo.T AS t;
The key parts are:
CASE WHEN t.Event = 1 THEN t.WeekNumber END Only consider week number where it is a valid event. Since MAX with ignore nulls this will only consider relevant rows
OVER (PARTITION BY t.PersonID ORDER BY t.WeekNumber) - Only consider rows for the current person, where the weeknumber is lower than the current row.
Example on DB<>Fiddle

Selecting same MSSQL table with different condition to get the difference

I have a table FinTrans As
Seq|Ledger|Debit_Credit|Amount
1 |130000|Debit |105
2 |120000|Debit |1456
3 |130000|Credit |500
4 |130000|Debit |9680
5 |130000|Credit |1432
6 |120000|Debit |1628
I want to find (sum of Debit Amount) - (sum of Credit Amount) for each ledger.
For eg.in above case for Ledger 130000
the sum of Debit Amount = 105+9680 = 9785
the sum of Credit Amount = 500 +1432=1537
Difference = 8248
How can I write a SQL query on the same table?
You can put a CASE expression inside an aggregate function. This is called conditional aggregation.
SELECT Ledger, SUM(Amount * CASE WHEN Debit_Credit = 'Credit' THEN -1 ELSE 1 END) As Difference
FROM FinTrans
-- WHERE Ledger = 130000 -- optional
GROUP BY d.Ledger
It works here because of the commutative property, which says you don't have to add up all the credits and debits separately to subtract one from the other; you can do all the additions and subtractions in any order and still end up with the same result.
If you really want to, you can do it this way:
SELECT Ledger,
SUM(CASE WHEN Debit_Credit = 'Debit' THEN Amount ELSE 0 END)
- SUM(CASE WHEN Debit_Credit = 'Credit' THEN Amount ELSE 0 END) As Difference
FROM FinTrans
-- WHERE Ledger = 130000 -- optional
GROUP BY d.Ledger
It more resembles the problem description. But it's more complicated and slower, and again, it's not needed.

Passing values into CASE statement

and thank you all in advance for your help.
I'm trying to take the results from two separate queries and include them in a third query that has a CASE statement. I've had some success but I'm not able to present the results of the third query in the proper order. The purpose of this is to show the employee count for each department under the different managers. So far I can only load separately the manager names and their departments and employee department count totals by department. What I can't figure out is how to get the manager names in and the employee department count in for each manager row. Below are the two source queries I've used so far and the query with the CASE statement. I've also looked at UNPIVOT function with no success yet.
a) This simple query lists each primary manager name. There are also sub managers that will be returned using a hierarchy query later.
select name from employees "Boss" where employeeid in
(‘1’,'5','25','84','85');
b) This query returns the department id count for each main manager (‘1’,'5','25','84','85') as well as all sub-managers.
select departmentid, count(departmentid) COUNT from employees
where departmentid = departmentid and level <= 3
connect by prior employeeid = bossid
start with employeeid = 5
group by departmentid
order by departmentid;
c) Here’s a CASE statement that outputs exactly as desired. The problem here is the select statement currently outputs only the manager names and the manager departments into the columns. What I need to do is output both the manager names and the manager's employee department counts into the individual manager row columns. I've tried to do a separate select of the manager names to get the ‘Boss’ column and another select to include the department counts. But that got messy. Also passing the counts in a second statement would create an additional unwanted column.
select e.name "Boss",
COUNT(CASE WHEN d.departmentid = '1' THEN 1 END) AS "Finance",
COUNT(CASE WHEN d.departmentid = '2' THEN 1 END) AS "HR",
COUNT(CASE WHEN d.departmentid = '3' THEN 1 END) AS "IT",
COUNT(CASE WHEN d.departmentid = '4' THEN 1 END) AS "Marketing",
COUNT(CASE WHEN d.departmentid = '5' THEN 1 END) AS "Sales"
from employees e, departments d
where e.employeeid in (select distinct e.bossid from employees e)
and e.departmentid = d.departmentid (+)
group by e.name
order by e.name;
Boss Finance HR IT Marketing Sales
-------------------- ---------- ---------- ---------- ---------- ----------
Baxter Carney 0 0 0 0 1
Blythe Pierce 0 0 0 0 1
Here's an altered CASE query that loads the employee department counts but unfortunately it loads by department and not by individual manager. That is the problem I'm stuck on right now. How to pass the counts to the right manager and into the right column.
select departmentid "DEPTNO",
COUNT(CASE WHEN departmentid = '1' THEN 1 END) AS "Finance",
COUNT(CASE WHEN departmentid = '2' THEN 1 END) AS "HR",
COUNT(CASE WHEN departmentid = '3' THEN 1 END) AS "IT",
COUNT(CASE WHEN departmentid = '4' THEN 1 END) AS "Marketing",
COUNT(CASE WHEN departmentid = '5' THEN 1 END) AS "Sales"
from employees
where departmentid = departmentid and level <= 3
connect by prior employeeid = bossid
start with employeeid = 5
group by departmentid
order by departmentid
/
DEPTNO Finance HR IT Marketing Sales
3 0 0 1 0 0
5 0 0 0 0 21
And here's for all managers. You can see that it just keeps increasing the individual department count.
DEPTNO Finance HR IT Marketing Sales
1 4 0 0 0 0
2 0 23 0 0 0
3 0 0 20 0 0
4 0 0 0 1 0
5 0 0 0 0 28

How SQL Server iterates over rows when computing several aggregate functions in one select query

I have a SQL Server 2012
And query
SELECT ManagerId,
SUM(CASE WHEN SoldInDay < 30 THEN 1 ELSE 0 END) as badSoldDays,
SUM(CASE WHEN Category = 'PC' THEN 1 ELSE 0 END) as DaysWithSoldPc
FROM SomeTable
GROUP BY ProductId
Some Table Definition
ManagerId | SoldInDay | Category
1 50 PC
1 20 Laptop
2 30 PC
3 40 Laptop
So, question is:
Does it mean that Sql will iterate over all rows twice? so, each aggregate function executes in separate cycle over all rows in table? or it's much smarter?
Doesn't matter what I want to get by this query, it's my dream.
(It appears that your question was addressed by a comment, but for completeness, I've provided an official answer here.)
First, your example SQL will not run. You are including ManagerId in the field list but not the GROUP BY. You will get an error akin to this:
Msg 8120, Level 16, State 1, Line 9 Column '#SomeTable.ManagerID' is
invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
Assuming you meant "ManagerId" instead of "ProductId" in the field list, I reproduced your situation and reviewed the execution plans. It showed only one "Stream Aggregate" operator. You can force it to run over the table twice by separating the aggregations into two different common table expressions (CTEs) and JOINing them back together. In that case, you see two Stream Aggregate operators (one for each run through the table).
Here is the code to generate the execution plans:
DECLARE #SomeTable TABLE
(
ManagerId int,
SoldInDay int,
Category varchar(50)
);
INSERT INTO #SomeTable (ManagerId, SoldInDay, Category) VALUES (1, 50, 'PC');
INSERT INTO #SomeTable (ManagerId, SoldInDay, Category) VALUES (1, 20, 'Laptop');
INSERT INTO #SomeTable (ManagerId, SoldInDay, Category) VALUES (2, 30, 'PC');
INSERT INTO #SomeTable (ManagerId, SoldInDay, Category) VALUES (3, 40, 'Laptop');
/*
This produces an error:
Msg 8120, Level 16, State 1, Line 9
Column '#SomeTable.ManagerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT
ManagerId,
SUM(CASE WHEN SoldInDay < 30 THEN 1 ELSE 0 END) as badSoldDays,
SUM(CASE WHEN Category = 'PC' THEN 1 ELSE 0 END) as DaysWithSoldPc
FROM #SomeTable
GROUP BY ProductId;
*/
SELECT
ManagerId,
SUM(CASE WHEN SoldInDay < 30 THEN 1 ELSE 0 END) as BadSoldDays,
SUM(CASE WHEN Category = 'PC' THEN 1 ELSE 0 END) as DaysWithSoldPc
FROM #SomeTable
GROUP BY ManagerId;
WITH DaysWithSoldPcTable AS
(
SELECT
ManagerId,
SUM(CASE WHEN Category = 'PC' THEN 1 ELSE 0 END) as DaysWithSoldPc
FROM #SomeTable
GROUP BY ManagerId
), BadSoldDaysTable AS
(
SELECT
ManagerId,
SUM(CASE WHEN SoldInDay < 30 THEN 1 ELSE 0 END) as BadSoldDays
FROM #SomeTable
GROUP BY ManagerId
)
SELECT
DaysWithSoldPcTable.ManagerId,
DaysWithSoldPcTable.DaysWithSoldPc,
BadSoldDaysTable.BadSoldDays
FROM DaysWithSoldPcTable
JOIN BadSoldDaysTable
ON DaysWithSoldPcTable.ManagerId = BadSoldDaysTable.ManagerId;

Custom Query Output Ordering with a CASE function

I often find when I am pulling data for analysis, that I group the number of orders a customer has placed into ranges, such as:
1-2
3-5
6-9
10-12
13-15
I do this with a CASE function. However, when you get the query results, the order ranges will be listed like:
1-2
10-12
13-15
3-5
6-9
This easy to correct in Excel when you have 1 query and a few order range groups. However, when you're pulling many queries, it's a pain to correct this over and over.
What is the best way to pull a range and have it ordered correctly?
here's an example of the query I would write:
SELECT
OrderRange = CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END
FROM Orders
GROUP BY CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END
ORDER BY... ?
I'd keep a table of ranges, e.g. (indices not written)
CREATE TABLE Ranges (RangeSet int, MinVal int, MaxVal int, Name varchar(50));
and then e.g.
INSERT INTO ranges VALUES
(1,1,5,'1-5'),(1,6,10,'6-10'),(1,11,-1,'11+'),
(2,1,10,'1-10'),(2,11,20,'11-20'),(2,21,30,'21-30'),(2,31,-1,'31+');
you get the idea. Now you do something like (table and field names free fiction)
SELECT
CustomerID,
count(OrderID) AS OrderCount
FROM Orders
WHERE <whatever, e.g order_date BETWEEN ... AND ...>
GROUP BY CustomerID
HAVING OrderCount>0
as you'd normally would expect, but wrap it in a superquery joining to the Ranges table
SELECT
BaseView.CustomerID as CustomerID,
Ranges.Name as OrderRange
FROM (
SELECT
CustomerID,
count(OrderID) AS OrderCount
FROM Orders
WHERE <whatever, e.g order_date BETWEEN ... AND ...>
GROUP BY CustomerID
HAVING OrderCount>0
) AS BaseView
INNER JOIN Ranges ON
Ranges.RangeSet=<id-of-required-rangeset>
AND BaseView.OrderCount>=Ranges.MinVal
AND (BaseView.OrderCount<=Ranges.MaxVal OR Ranges.MaxVal=-1)
ORDER BY RangeSet.MinVal DESC
;
Now you just have to supply the RangeSet you want to apply, maybe creating a new one on occasion.
Disclaimer: This is a performance-killer
If I'm understanding you correctly you want the list of customers and order ranges ordered from least to highest. You should be able to do that by just ordering by the count(orderID)
SELECT CustomerID,
OrderRange = CASE
WHEN COUNT(OrderID) BETWEEN 1 AND 5 THEN '1-5'
WHEN COUNT(OrderID) BETWEEN 6 AND 10 THEN '6-10'
WHEN COUNT(OrderID) > 10 THEN '10+'
ELSE 'Error'
END ,
FROM Orders
GROUP BY CustomerID
order by count(orderid)
Results:
CustomerId OrderRange
CENTC 1-5
GROSR 1-5
LAZYK 1-5
...
ROMEY 1-5
VINET 1-5
ALFKI 6-10
CACTU 6-10
...
VICTE 6-10
WANDK 6-10
BLONP 10+
GREAL 10+
RICAR 10+
...
QUICK 10+
ERNSH 10+
SAVEA 10+

Resources