SQL COUNT on 3 tables with JOIN

SQL COUNT on 3 tables with JOIN - sql-server

I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.
So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
SUM(CASE WHEN [b].[Status] = #IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = #OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].WarehouseID = [w].[WarehouseID]
WHERE [c].[ClientID] = #ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This produces the output of:
ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1 | ACME Corp. | FooFactory | 22744 | 22699 | 45 | 7888
The output of the count is correct. When I add the Files table to the INNER JOIN then the numbers get inflated. Here is the SQL:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
SUM(CASE WHEN [b].[Status] = #IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = #OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f] -- *NEW*
ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = #ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This gives me the count output below (I've omitted the first 3 columns since they're not relevant):
BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction
19151 | 411961 | 411381 | 580 | 144615
The FilesCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?

A join repeats each row in the left hand table for each row in the right hand table. If you combine multiple joins some rows will be double counted. A solution is to move the count to a subquery. For example:
select *
from table1 t1
join (
select table1_id
, count(*)
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*)
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id

As mentioned by Andomar, I included "as myColumnOne" and "myColumnTwo" besides Count(*), as it is required on SQL Server 2018:
select *
from table1 t1
join (
select table1_id
, count(*) as myColumnOne
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*) as myColumnTwo
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id

Related

Subquery from another table in Case when condition

I am trying to write a query where I want to sum a price column based on the condition which is a subquery.
my query :
select
fund.FundName,
SUM(Case when (
Select Top 1 bitValue
from table 1
where table1.id = Company.id and table1.field = 25
) = 1 then price else 0 end) as 'TotalPrice'
from
Fund left outer join Company on Company.fundId=fund.id
group by
fund.fundName
It throws me error : Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
What is the best alternative way to achieve this.

Hope this Works for your Case:
SELECT
FUND.FUNDNAME,
S.TotalPrice
FROM FUND
LEFT OUTER JOIN COMPANY ON COMPANY.FUNDID=FUND.ID
LEFT JOIN (SELECT CASE WHEN BITVALUE=1 THEN SUM(PRICE) ELSE 0 END as 'TotalPrice',table1.ID
from table 1
where table1.id = Company.id and table1.field = 25 GROUP BY table1.ID
) S ON S.ID=Company.id
GROUP BY
FUND.FUNDNAME

untested obviously with no sample data provided.
select fund.FundName
,SUM(Case when table1.id is not null then price else 0 end) as 'TotalPrice'
from Fund
left outer join Company on Company.fundId = fund.id
left outer join (
select distinct id
from table1
where field = 25
and bitvalue = 1
) table1 on table1.id = Company.id
group by fund.fundName

Find matched column records in one table that may be in multiple columns in a second table

I have two tables, Table 1 with multiple columns, name, ID number, address, etc. And Table 2 with columns, ID number 1 and ID number 2 and a few other columns.
I am trying to get a T-SQL query returning all rows in Table 1 with an indicator showing whether the ID number in Table 1 matches either ID_1 or ID_2 in Table 2. The result set would be all columns from Table 1 , plus the indicator “Matched” if the ID number in Table 1 matches either ID_1 or ID_2 in Table 2.
Table 1: ID | Name | Address |
Table 2: ID_1 | ID_2
Result
T1.ID, Name, Address, ("Matched"/"Unmatched") ...
Also, would it be the same to do the opposite, meaning instead of the result including all rows from Table 1 that have a matching ID in ID_1 or ID_2 in Table 2, the result set would include only records from Table 1 where t1.ID = (T2.ID_1 or T2.ID_2)?
SELECT DISTINCT
CASE
WHEN (table1.ID = table2.ID_1 )
THEN 'Matched'
ELSE 'Unmatched'
END AS Status ,
table1.*
FROM
table1
LEFT JOIN
table2 ON table1.ID = table2.ID_1
UNION
SELECT DISTINCT
CASE
WHEN (table1.ID = table2.ID_2)
THEN 'Matched'
ELSE 'Unmatched'
END AS Status,
table1.*
FROM
table1
LEFT JOIN
table2 ON table1.ID = table2.ID_2

I think that a correlated subquery with an exists condition would be a reasonable solution:
select
t1.*,
case when exists (select 1 from table2 t2 where t1.id in (t2.id_1, t2.id_2))
then 'Matched'
else 'Unmatched'
end matched
from table1 t1
And the other way around:
select
t2.*,
case when exists (select 1 from table1 t1 where t1.id in (t2.id_1, t2.id_2))
then 'Matched'
else 'Unmatched'
end matched
from table2 t2
If you want to "align" the rows based on the match for the whole dataset at once, then you might want to try a full join:
select t1.*, t2.*
from table1 t1
full join table2 t2 on t1.id in (t2.id_1, t2.id_2)

SQL Server: Cross-apply - record 0 for NULL results

T1 is a table of company and their (multiple users), T2 is table of registered users. I counted, for each company in T1, how many of their users are in T2 but need c3 to appear in the result table with #regUser == 0:
T1:
company user
c1 u1
c1 u2
c2 u2
c2 u3
c3 u4
c3 u1
T2:
user
u2
u3
So the resultant table should look like:
company #regUser
c1 1
c2 2
c3 0
With the following code I'm only getting the results for non-null companies:
select t1s.company, count(1)
from (select * from t1) t1s
cross apply (select *
from t2 t2s
where t2s.reguser = t1s.[user]) t12s
group by t1s.company
Thanks

just use left join
select t1.company,count(t2.user)
from t1 left join t2 on t1.user=t2.user
group by t1.company
the subquery is not necessary according to your requirement
but if you want to use apply then you need like below query
select t1s.company, count(t12s.Users)
from (select * from t1) t1s
outer apply (select Users
from t2 t2s
where t2s.Users = t1s.[Users]) t12s
group by t1s.company
output
company #regUser
c1 1
c2 2
c3 0
demolink

You can use a LEFT JOIN to get all the information of the left table with matching information from right table. By using a GROUP BY you can group the rows by company and get the COUNT of registered users for each company:
SELECT t1.company, COUNT(t2.[user]) AS regUser
FROM t1 LEFT JOIN t2 ON t1.[user] = t2.[user]
GROUP BY t1.company
ORDER BY t1.company ASC
You can also use the CROSS APPLY to solve this:
SELECT t1.company, SUM(CASE WHEN t1.[user] = t2.[user] THEN 1 ELSE 0 END) AS regUser
FROM t1 CROSS APPLY t2
GROUP BY t1.company
ORDER BY t1.company ASC
demo on dbfiddle.uk

All you need is a left join :
select company,count(t2.[user])
from t1 left outer join t2 on t1.[user]=t2.[user]
group by company
The question's query is overcomplicated. For example, from (select * from t1) t1s is equivalent to from t1 as t1s.

Just a LEFT JOIN with SUM()
CREATE TABLE T1(
Company VARCHAR(20),
Users VARCHAR(20)
);
CREATE TABLE T2(
Users VARCHAR(20)
);
INSERT INTO T1 VALUES
('c1', 'u1'),
('c1', 'u2'),
('c2', 'u2'),
('c2', 'u3'),
('c3', 'u4'),
('c3', 'u1');
INSERT INTO T2 VALUES
('u2'),
('u3');
SELECT T1.Company,
SUM(CASE WHEN T2.Users IS NULL THEN 0 ELSE 1 END) Cnt
FROM T1 LEFT JOIN T2
ON T1.Users = T2.Users
GROUP BY T1.Company;
Returns:
+---------+-----+
| Company | Cnt |
+---------+-----+
| c1 | 1 |
| c2 | 2 |
| c3 | 0 |
+---------+-----+
Live Demo

Create a stored procedure to aggregate rows

Having a transaction table with the following rows:
Id UserId PlatformId TransactionTypeId
-------------------------------------------------
0 1 3 1
1 1 1 2
2 2 3 2
3 3 2 1
4 2 3 1
How do I write a stored procedure that can aggregate the rows into a new table with the following format?
Id UserId Platforms TransactionTypeId
-------------------------------------------------
0 1 {"p3":1,"p1":1} {"t1":1,"t2":1}
1 2 {"p3":2} {"t2":1,"t1":1}
3 3 {"p2":1} {"t1":1}
So the rows are gouped by User, count each platform/transactionType and store as key/value json string.
Ref: My previous related question

You could use GROUP BY and FOR JSON:
SELECT MIN(ID) AS ID, UserId, MIN(sub.x) AS Platforms, MIN(sub2.x) AS Transactions
FROM tab t
OUTER APPLY (SELECT CONCAT('p', platformId) AS platform, cnt
FROM (SELECT PlatformId, COUNT(*) AS cnt
FROM tab t2 WHERE t2.UserId = t.UserId
GROUP BY PlatformId) s
FOR JSON AUTO) sub(x)
OUTER APPLY (SELECT CONCAT('t', TransactiontypeId) AS Transactions, cnt
FROM (SELECT TransactiontypeId, COUNT(*) AS cnt
FROM tab t2 WHERE t2.UserId = t.UserId
GROUP BY TransactiontypeId) s
FOR JSON AUTO) sub2(x)
GROUP BY UserId;
DBFiddle Demo
Result is a bit different(array of key-value) but please treat it as starting point.

Your sample JSON is not really a json, but since you want it that way:
SELECT u.UserId, plt.pValue, ttyp.ttValue
FROM Users AS [u]
CROSS APPLY (
SELECT '{'+STUFF( (SELECT ',"'+pn.pName+'":'+LTRIM(STR(pn.pCount))
FROM (SELECT p.Name AS pName, COUNT(*) AS pCount
FROM transactions t
left JOIN Platforms p ON p.PlatformID = t.PlatformId
WHERE t.UserId = u.UserId
GROUP BY p.PlatformId, p.Name
) pn
FOR XML PATH('')),1,1,'')+'}'
) plt(pValue)
CROSS APPLY (
SELECT '{'+STUFF( (SELECT ',"'+tty.ttName+'":'+LTRIM(STR(tty.ttCount))
FROM (SELECT tt.Name AS ttName, COUNT(*) AS ttCount
FROM transactions t
left JOIN dbo.TransactionType tt ON tt.TransactionTypeId = t.TransactionTypeID
WHERE t.UserId = u.UserId
GROUP BY tt.TransactionTypeId, tt.Name
) tty
FOR XML PATH('')),1,1,'')+'}'
) ttyp(ttValue)
WHERE EXISTS (SELECT * FROM transactions t WHERE u.UserId = t.UserId)
ORDER BY UserId;
DBFiddle Sample

Limited T-SQL Join

This should be simple enough, but somehow my brain stopped working.
I have two related tables:
Table 1:
ID (PK), Value1
Table 2:
BatchID, Table1ID (FK to Table 1 ID), Value2
Example data:
Table 1:
ID Value1
1 A
2 B
Table 2:
BatchID Table1ID Value2
1 1 100
2 1 101
3 1 102
1 2 200
2 2 201
Now, for each record in Table 1, I'd like to do a matching record on Table 2, but only the most recent one (batch ID is sequential). Result for the above example would be:
Table1.ID Table1.Value1 Table2.Value2
1 A 102
2 B 201
The problem is simple, how to limit join result with Table2. There were similar questions on SO, but can't find anything like mine. Here's one on MySQL that looks similar:
LIMITing an SQL JOIN
I'm open to any approach, although speed is still the main priority since it will be a big dataset.

WITH Latest AS (
SELECT Table1ID
,MAX(BatchID) AS BatchID
FROM Table2
GROUP BY Table1ID
)
SELECT *
FROM Table1
INNER JOIN Latest
ON Latest.Table1ID = Table1.ID
INNER JOIN Table2
ON Table2.BatchID = Latest.BatchID

SELECT id, value1, value2
FROM (
SELECT t1.id, t2.value1, t2.value2, ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.BatchID DESC) AS rn
FROM table1 t1
JOIN table2 t2
ON t2.table1id = t1.id
) q
WHERE rn = 1

Try
select t1.*,t2.Value2
from(
select Table1ID,max(Value2) as Value2
from [Table 2]
group by Table1ID) t2
join [Table 1] t1 on t2.Table1ID = t1.id

Either GROUP BY or WHERE clause that filters on the most recent:
SELECT * FROM Table1 a
INNER JOIN Table2 b ON (a.id = b.Table1ID)
WHERE NOT EXISTS(
SELECT 1 FROM Table2 c WHERE c.Table1ID = a.id AND c.BatchID > b. BatchID
)