Using groupBy to improve my Select (select count) query - sql-server

Let's say we want to see all tasks that haven't been done yet, plus an additional column showing how many open tasks are left for that customer.
I have a table like this in my database:
+------------+--------------------------+-------+
| CustomerID | Task | Done |
+------------+--------------------------+-------+
| 1 | CleanRoom | False |
| 1 | Cleandishes | True |
| 1 | WashClothes | False |
| 2 | TakeDogsOut | True |
| 2 | PlayWithKids | True |
| 3 | HaveFunWithMrSamplesWife | True |
| 3 | CleanMrSamplesCar | False |
+------------+--------------------------+-------+
I need this as returned table:
+------------+-------------------+-------------+
| CustomerID | Task | DoneOverAll |
+------------+-------------------+-------------+
| 1 | CleanRoom | 2 |
| 1 | WashClothes | 2 |
| 3 | CleanMrSamplesCar | 1 |
+------------+-------------------+-------------+
The perfect result would look like the table below (done/total per customer), but I can build that myself once I have the one above.
A question about this: it will probably be a string concatenation task. Should I do it in the SELECT statement, or would it be more advisable to do it in the final application on the client computer?
+------------+-------------------+-------------+
| CustomerID | Task | DoneOverAll |
+------------+-------------------+-------------+
| 1 | CleanRoom | 1/3 |
| 1 | WashClothes | 1/3 |
| 3 | CleanMrSamplesCar | 1/2 |
+------------+-------------------+-------------+
I know I could write it like this:
SELECT
a.CustomerID,
a.Task,
(
Select count(*) from myTable where
CustomerID = a.CustomerID and
Done = 'False'
) as DoneOverAll
FROM myTable as a
WHERE a.Done = 'False'
But I think this is very inefficient, since it would execute a SELECT COUNT for each row in my table. Is there a way to achieve this with a JOIN and GROUP BY or something similar? I'm not familiar with GROUP BY yet.
Okay, I should have tried first. I came up with the following:
Select count(*), CustomerID from myTable group by CustomerID
All I need to do now is to get this into a join.
Okay, got it. Sorry again for not trying first!
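SELECT
a.CustomerID,
a.Task,
b.cnt
FROM myTable as a
LEFT JOIN (
-- count only the open tasks, so cnt matches the DoneOverAll column wanted above
select count(*) AS cnt, CustomerID FROM myTable WHERE Done = 'False' GROUP BY CustomerID
) as b on a.CustomerID = b.CustomerID
WHERE a.Done = 'False'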
One question is left: the string-formatting question above, i.e. whether to build the "1/3" value in the SELECT statement or in the client application.

I'm not sure why Done = False, but this is your logic. :-)
Here's what I would do, without the LEFT JOIN.
SELECT
a.CustomerID,
a.Task,
SUM(CASE WHEN a.Done = 'False' THEN 1 ELSE 0 END) DoneOverAll,
SUM(Case WHEN a.Done = 'True' THEN 1 ELSE 0 END) NotDone
FROM myTable as a
Group By a.CustomerID, a.Task

Calculate the two counts separately, then join them.
;with tempfalse as (
-- rows that are not done yet
SELECT
a.CustomerID,
a.Task,
count(*) as DoneOverAll
FROM myTable as a
WHERE a.Done = 'False'
group by a.CustomerID, a.Task
)
, temptrue as (
-- all rows, done or not
SELECT
a.CustomerID,
a.Task,
count(*) as total
FROM myTable as a
group by a.CustomerID, a.Task
)
SELECT
a.CustomerID,
a.Task,
cast(NULLIF(b.DoneOverAll,0) as varchar (10) ) + '/' + cast(NULLIF(a.total,0) as varchar (10) ) as DoneOverAll
from temptrue as a left join tempfalse b
on a.CustomerID = b.CustomerID and
a.Task = b.Task
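For comparison, here is a minimal sketch of the same report using windowed aggregates instead of a join. It assumes that Done is a BIT column (0 = open, 1 = done), which the question does not state, and it uses a table variable @myTable as a stand-in for the real table. The counts are computed per customer in a derived table, and only then are the open rows filtered.

```sql
-- Sample data from the question; Done is ASSUMED to be a BIT column.
DECLARE @myTable TABLE (CustomerID int, Task varchar(50), Done bit);

INSERT INTO @myTable (CustomerID, Task, Done) VALUES
(1, 'CleanRoom', 0), (1, 'Cleandishes', 1), (1, 'WashClothes', 0),
(2, 'TakeDogsOut', 1), (2, 'PlayWithKids', 1),
(3, 'HaveFunWithMrSamplesWife', 1), (3, 'CleanMrSamplesCar', 0);

SELECT
    t.CustomerID,
    t.Task,
    t.OpenTasks,                                   -- first wanted table
    CAST(t.DoneTasks AS varchar(10)) + '/'
        + CAST(t.TotalTasks AS varchar(10)) AS DoneOverAll  -- "1/3" format
FROM (
    -- The window functions see every row of the customer,
    -- because the Done filter is applied only in the outer query.
    SELECT
        CustomerID,
        Task,
        Done,
        SUM(CASE WHEN Done = 0 THEN 1 ELSE 0 END)
            OVER (PARTITION BY CustomerID) AS OpenTasks,
        SUM(CASE WHEN Done = 1 THEN 1 ELSE 0 END)
            OVER (PARTITION BY CustomerID) AS DoneTasks,
        COUNT(*) OVER (PARTITION BY CustomerID) AS TotalTasks
    FROM @myTable
) AS t
WHERE t.Done = 0
ORDER BY t.CustomerID, t.Task;
```

With the sample rows this returns CleanRoom and WashClothes for customer 1 (2 open, "1/3") and CleanMrSamplesCar for customer 3 (1 open, "1/2"). If the formatting is done on the client instead, the same query can simply return DoneTasks and TotalTasks as numbers.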

Related

SQL Server UPDATE - GROUP BY - MAX

This is SQL Server 2016. I have the following data in only one table:
custID | prodID | title | titleCount | isMasterTitle
--------+--------+--------+-------------+-----------
266 | 191750 | prod01 | 1 | 0
266 | 191750 | prod02 | 4 | 0
266 | 191750 | prod03 | 25 | 0
300 | 20125 | prod04 | 3 | 0
300 | 20125 | prod05 | 15 | 0
I want to group by custID and prodID and set the isMasterTitle field to 1 on the row with the max() titleCount in each group.
So, I want the following:
custID | prodID | title | titleCount | isMasterTitle
--------+--------+----------+------------+---------------
266 | 191750 | prod01 | 1 | 0
266 | 191750 | prod02 | 4 | 0
266 | 191750 | prod03 | 25 | 1
300 | 20125 | prod04 | 3 | 0
300 | 20125 | prod05 | 15 | 1
I'm trying the following:
UPDATE [dbo].[_Variations]
SET isMasterTitle = 1
FROM [dbo].[_Variations] v1
INNER JOIN (SELECT custID, prodID, MAX(titleCount) AS mtitleCount
FROM [_Variations]
GROUP BY custID,prodID) as v2 ON v1.custID = v2.custID and v1.prodID = v2.prodID and v1.titleCount = v2.mtitleCount
try the following:
;with cte
as
(
select isMasterTitle, ROW_NUMBER() over (partition by custID, prodID order by titleCount desc) rn
from #t
)
update cte
set isMasterTitle = 1
where rn = 1
select * from #t
Your given code also works fine.
I would recommend leveraging a powerful feature of SQL Server called the updatable common table expression.
You can build a CTE that uses window functions to identify which row should be updated, and then update it directly; there is no need to join the original table again in the outer query. This makes the query both shorter and more efficient:
with cte as (
select
isMasterTitle,
row_number() over(partition by custID, prodID order by titleCount desc) rn
from [dbo].[_Variations]
)
update cte set isMasterTitle = 1 where rn = 1
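To try the updatable CTE end to end, here is a small self-contained sketch. The temp table #Variations and its column types are assumptions made for the demo; the rows are the ones from the question.

```sql
-- Hypothetical temp table standing in for [dbo].[_Variations].
CREATE TABLE #Variations (
    custID        int,
    prodID        int,
    title         varchar(20),
    titleCount    int,
    isMasterTitle bit
);

INSERT INTO #Variations (custID, prodID, title, titleCount, isMasterTitle) VALUES
(266, 191750, 'prod01',  1, 0),
(266, 191750, 'prod02',  4, 0),
(266, 191750, 'prod03', 25, 0),
(300,  20125, 'prod04',  3, 0),
(300,  20125, 'prod05', 15, 0);

-- Number the rows of each (custID, prodID) group, highest titleCount first,
-- then update through the CTE: only rows with rn = 1 are touched.
WITH ranked AS (
    SELECT
        isMasterTitle,
        ROW_NUMBER() OVER (PARTITION BY custID, prodID
                           ORDER BY titleCount DESC) AS rn
    FROM #Variations
)
UPDATE ranked
SET isMasterTitle = 1
WHERE rn = 1;

-- prod03 and prod05 now have isMasterTitle = 1, all other rows stay 0.
SELECT custID, prodID, title, titleCount, isMasterTitle
FROM #Variations
ORDER BY custID, titleCount;

DROP TABLE #Variations;
```

One difference is worth knowing: if two rows in a group tie for the highest titleCount, ROW_NUMBER() flags exactly one of them (which one is not deterministic), whereas the join on MAX(titleCount) from the question flags all tied rows. Using RANK() instead of ROW_NUMBER() would also flag all ties.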

SQL query to get value having multiple in the same table in SQL Server

Let's say I have a table with many columns like col1, col2, col3, id, variantId, col4, col5 etc
However I am only interested in id, variantId which look like this:
+----------+-----------+
| id | variantId |
+----------+-----------+
| a | 11 |
| a | 12 |
| b | 31 |
| c | 41 |
| c | 54 |
| d | abc |
| e | xyz |
| e | xyz |
+----------+-----------+
I need the distinct ids that have more than one distinct variantId.
In this case I would only get a and c.
You can use group by and having:
select id
from t
group by id
having min(variantId) <> max(variantId);
You can also use:
having count(distinct variantId) > 1
Try group by with a having clause:
select id
from tablename
group by id
having count(distinct variantId) > 1
You can also do it with EXISTS, which can be more efficient when there is a suitable index on id and variantid:
select distinct t.id
from tablename t
where exists (
select 1 from tablename
where id = t.id and variantid <> t.variantid
)
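As a quick check of the HAVING approach against the sample rows, here is a minimal sketch. The table variable @t and the varchar column types are assumptions made for the demo.

```sql
-- Sample rows from the question in a hypothetical table variable.
DECLARE @t TABLE (id varchar(10), variantId varchar(10));

INSERT INTO @t (id, variantId) VALUES
('a', '11'), ('a', '12'),
('b', '31'),
('c', '41'), ('c', '54'),
('d', 'abc'),
('e', 'xyz'), ('e', 'xyz');

-- Keep only the ids that have at least two different variantId values.
SELECT id
FROM @t
GROUP BY id
HAVING COUNT(DISTINCT variantId) > 1
ORDER BY id;
-- returns a and c
```

Note that e is not returned even though it has two rows, because both rows carry the same variantId.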

SUM On Column With Group By SQL

I have following data:
+----------------+--------------+-----+
| StgDescription | ID | Amt |
+----------------+--------------+-----+
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| B | ZA47/ A | 12 |
| B | ZA47/ A | 12 |
| B | ZA47/ B | 10 |
| B | ZA47/ B | 10 |
| B | ZA48/ A | 14 |
| B | ZA48/ F | 10 |
| B | ZA48 /G | 13 |
| B | ZA48 /H | 10 |
| B | ZA48/ I | 15 |
| B | ZA48/ J | 10 |
| B | ZA48/ K | 16 |
| B | ZA48/ L | 10 |
| c | FA01LM100340 | 10 |
| c | PA53 AE | 10 |
+----------------+--------------+-----+
I want to generate a report in the following format. For each StgDescription, the amount should be the sum of Amt over its distinct IDs.
+----------------+-----+
| StgDescription | Amt |
+----------------+-----+
| a | 11 |
| b | 120 |
| c | 20 |
+----------------+-----+
I've written following query to get this result:
WITH CTE AS(
SELECT
distinct
s.StgDescription
,p.ID
,Amt
FROM [DinDb].[dbo].[tblTvlTransaction] t
JOIN tblstgmaster s on t.StgId=s.StgId
JOIN tblProjDocSt p on t.TDocID=p.DocId
JOIN [PdasDb].[dbo].[tblIDmaster] f ON p.ID=f.ID
where OptAuthoDateTime between '2015-07-27 00:00:00' and '2015-09-01 00:00:00')
select StgDescription,sum(AMT) from cte group by StgDescription
Is there a more efficient alternative?
First remove the duplicates in a CTE, then GROUP BY:
WITH cte AS (
SELECT DISTINCT StgDescription, ID, Amt
FROM your_tab
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;
OR:
WITH cte AS (
SELECT StgDescription, ID, Amt
FROM your_tab
GROUP BY StgDescription, ID, Amt
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;
I hope that you get the data from a query, not from a table. It would not be good to store data this redundantly. And it would not be good to name a column ID when it is not the unique identifier for a row in the table.
Your problem with the data is that you have duplicates, which prevents you from getting the sum directly. So use DISTINCT to make your data unique first.
If this data is from a query then simply add DISTINCT after the SELECT keyword. If not, use a derived table (i.e. a subquery) where you select distinct records from the table.
select stgdescription, sum(amt)
from
(
select distinct stgdescription, id, amt
from mydata
) distinct_data
group by stgdescription;
You may want to replace stgdescription with lower(stgdescription), though, if stgdescription can be 'A' or 'a' and you want to treat them the same.
I'd keep it as simple as possible, like this:
select StgDescription, sum(Amt) from
(
select distinct StgDescription, ID, Amt from tablename
) a
group by StgDescription
Hope it helps!
I suspect your duplicates are coming from [tblTvlTransaction]. I would therefore remove this table from the joins and use EXISTS just to check that a matching record is there, so that the only tables in the FROM clause are those you actually need data from:
-- s.Amt assumes Amt is a column of tblstgmaster; qualify it with the table that really holds it
SELECT s.StgDescription, p.ID, s.Amt
FROM tblstgmaster AS s
CROSS JOIN tblProjDocSt AS p
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
AND t.TDocID = p.DocId -- the transaction row is what links s to p
);
The advantage of EXISTS is that it can use a semi-join, which essentially means that rather than pulling back all the rows from the transaction table, it will stop the seek/scan as soon as it finds one matching record. This should leave you without duplicates so you can do the SUM directly:
SELECT s.StgDescription, Amount = SUM(s.Amt)
FROM tblstgmaster AS s
CROSS JOIN tblProjDocSt AS p
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
AND t.TDocID = p.DocId
)
GROUP BY s.StgDescription;
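To see the effect of removing the duplicates before summing, here is a small sketch that runs the DISTINCT-then-SUM approach from the answers above on the sample rows. The table variable @data is a hypothetical stand-in for the result of the joined query, and the column types are assumptions.

```sql
-- Sample rows copied from the question into a hypothetical table variable.
DECLARE @data TABLE (StgDescription varchar(10), ID varchar(20), Amt int);

INSERT INTO @data (StgDescription, ID, Amt) VALUES
('A', 'OA17', 11), ('A', 'OA17', 11), ('A', 'OA17', 11), ('A', 'OA17', 11),
('B', 'ZA47/ A', 12), ('B', 'ZA47/ A', 12),
('B', 'ZA47/ B', 10), ('B', 'ZA47/ B', 10),
('B', 'ZA48/ A', 14), ('B', 'ZA48/ F', 10),
('B', 'ZA48 /G', 13), ('B', 'ZA48 /H', 10),
('B', 'ZA48/ I', 15), ('B', 'ZA48/ J', 10),
('B', 'ZA48/ K', 16), ('B', 'ZA48/ L', 10),
('c', 'FA01LM100340', 10), ('c', 'PA53 AE', 10);

-- Step 1 (derived table): collapse exact duplicate rows.
-- Step 2 (outer query): sum what is left per StgDescription.
SELECT d.StgDescription, SUM(d.Amt) AS Amt
FROM (
    SELECT DISTINCT StgDescription, ID, Amt
    FROM @data
) AS d
GROUP BY d.StgDescription
ORDER BY d.StgDescription;
-- A = 11, B = 120, c = 20
-- (a plain SUM without the DISTINCT step would give 44, 142 and 20)
```

One caveat: DISTINCT cannot tell an unwanted duplicate from two genuinely separate rows that happen to share the same StgDescription, ID and Amt, so this only works when such rows are always duplicates.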

TSQL pivot issue

Hello I have a temp table (#tempResult) that contains results like the following...
-----------------------------------------
| DrugAliasID | Dosage1 | Unit1 | rowID |
-----------------------------------------
| 322 | 10 | MG | 1 |
| 322 | 50 | ML | 2 |
| 441 | 20 | ML | 3 |
| 443 | 15 | ML | 4 |
-----------------------------------------
I'm looking to get the results to be like the following, pivoting the rows that have the same DrugAliasID.
--------------------------------------------------
| DrugAliasID | Dosage1 | Unit1 | Dosage2 | Unit2 |
--------------------------------------------------
| 322 | 10 | MG | 50 | ML |
| 441 | 20 | ML | NULL | NULL |
| 443 | 15 | ML | NULL | NULL |
--------------------------------------------------
So far I have a solution that isn't using pivot. I'm not too good with pivot and was wondering if anyone knew how to use it in this scenario. Or solve it some other way. Thanks
SELECT
tr.drugAliasID,
MIN(trmin.dosage1) AS dosage1,
MIN(trmin.unit1) AS unit1,
MIN(trmax.dosage1) AS dosage2,
MIN(trmax.unit1) AS unit2
FROM
#tempResult tr
JOIN
#tempResult trmin ON trmin.RowID = tr.rowid AND trmin.drugAliasID = tr.drugAliasID
JOIN
#tempResult trmax ON trmax.RowID = tr.rowid AND trmax.drugAliasID = tr.drugAliasID
JOIN
(SELECT
MIN(RowID) AS rowid,
drugAliasID
FROM
#tempResult
GROUP BY
drugAliasID) tr1 ON tr1.rowid = trmin.RowID
JOIN
(SELECT
MAX(RowID) AS rowid,
drugAliasID
FROM
#tempResult
GROUP BY
drugAliasID) tr2 ON tr2.rowid = tr.RowID
GROUP BY
tr.drugAliasID
HAVING
count(tr.drugAliasID) > 1
Assuming your version of SQL Server supports the use of CTEs, you can simplify your query thus:
;with cte as
(select *, row_number() over (partition by drugaliasid order by rowid) rn
from #tempResult
)
select c.drugaliasid, c.dosage1, c.unit1, c2.dosage1 as dosage2, c2.unit1 as unit2
from cte c
left join cte c2 on c.drugaliasid = c2.drugaliasid and c.rn = 1 and c2.rn = 2
where c.rn = 1
This will give you the desired result, without having to use the pivot keyword.
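Another option, sketched here as an alternative and not as what the answer above does, is conditional aggregation: number the rows per DrugAliasID and pick each slot with a CASE inside an aggregate. It is not the PIVOT operator either, but it avoids the self-join and extends to a third dosage by adding two more columns. The column types in the temp table are assumptions.

```sql
-- Sample rows from the question; column types are assumed.
CREATE TABLE #tempResult (
    DrugAliasID int,
    Dosage1     int,
    Unit1       varchar(10),
    rowID       int
);

INSERT INTO #tempResult (DrugAliasID, Dosage1, Unit1, rowID) VALUES
(322, 10, 'MG', 1),
(322, 50, 'ML', 2),
(441, 20, 'ML', 3),
(443, 15, 'ML', 4);

WITH numbered AS (
    -- rn = 1 for the first row of each drug, 2 for the second, and so on
    SELECT
        DrugAliasID,
        Dosage1,
        Unit1,
        ROW_NUMBER() OVER (PARTITION BY DrugAliasID ORDER BY rowID) AS rn
    FROM #tempResult
)
SELECT
    DrugAliasID,
    MAX(CASE WHEN rn = 1 THEN Dosage1 END) AS Dosage1,
    MAX(CASE WHEN rn = 1 THEN Unit1   END) AS Unit1,
    MAX(CASE WHEN rn = 2 THEN Dosage1 END) AS Dosage2,  -- NULL if no 2nd row
    MAX(CASE WHEN rn = 2 THEN Unit1   END) AS Unit2
FROM numbered
GROUP BY DrugAliasID
ORDER BY DrugAliasID;

DROP TABLE #tempResult;
```

For the sample data this yields 322 with 10 MG and 50 ML, and 441 and 443 with NULL in Dosage2 and Unit2, which matches the wanted layout.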

Rebuild window function row_number in sybase

I have a problem that I could easily solve if I had window functions available in Sybase, but I don't:
Consider a table test:
+------------+----------------+-------------+
| Account_Id | Transaction_Id | CaptureDate |
+------------+----------------+-------------+
| 1 | 1 | 2014-01-01 |
| 1 | 2 | 2013-12-31 |
| 1 | 3 | 2015-07-20 |
| 2 | 1 | 2012-02-20 |
| 2 | 2 | 2010-01-10 |
| ... | ... | ... |
+------------+----------------+-------------+
I want a result set containing, for each account, the most recent CaptureDate with the corresponding Transaction_Id. With the window function row_number this would be easy:
select Account_Id, CaptureDate, Transaction_Id from
(select
Account_Id,
CaptureDate,
Transaction_Id,
ROW_NUMBER() OVER(partition by Account_Id order by CaptureDate desc) rn
from test) tbl
where tbl.rn = 1
but my Sybase version does not have this. Obviously, something like
select max(CaptureDate), max(Transaction_Id), Account_Id
from test
group by Account_Id
does not work because it does not always give me the correct Transaction_Id.
How can I do this then in Sybase and not make it terribly verbose?
Thanks!
Try below:
SELECT Account_Id, Transaction_Id, CaptureDate
FROM test a
WHERE CaptureDate = (
SELECT MAX(CaptureDate)
FROM test b
WHERE a.Account_Id = b.Account_Id
)
EDIT 1:
Duplicate CaptureDate was not in your example, so I did not take care of that scenario. Try below:
SELECT Account_Id, Transaction_Id, CaptureDate
FROM test a
WHERE CaptureDate = (
SELECT MAX(CaptureDate)
FROM test b
WHERE a.Account_Id = b.Account_Id
)
AND Transaction_Id =
(
SELECT MAX(Transaction_Id)
FROM test c
WHERE a.Account_Id = c.Account_Id
AND a.CaptureDate = c.CaptureDate
)
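Here is a small sketch that runs the two-subquery version on the sample rows. The temp table #test, the datetime column type and the last inserted row are assumptions; that last row is a hypothetical tie added only to show why the second subquery is needed. One INSERT per row is used so the script does not depend on multi-row VALUES support.

```sql
-- Hypothetical temp table with the sample rows from the question.
CREATE TABLE #test (
    Account_Id     int,
    Transaction_Id int,
    CaptureDate    datetime
)

INSERT INTO #test VALUES (1, 1, '20140101')
INSERT INTO #test VALUES (1, 2, '20131231')
INSERT INTO #test VALUES (1, 3, '20150720')
INSERT INTO #test VALUES (2, 1, '20120220')
INSERT INTO #test VALUES (2, 2, '20100110')
-- Hypothetical extra row: same latest date as transaction 1 of account 2
INSERT INTO #test VALUES (2, 3, '20120220')

SELECT a.Account_Id, a.Transaction_Id, a.CaptureDate
FROM #test a
WHERE a.CaptureDate = (
        -- latest capture date of this account
        SELECT MAX(b.CaptureDate)
        FROM #test b
        WHERE b.Account_Id = a.Account_Id
      )
  AND a.Transaction_Id = (
        -- tie-break: highest transaction on that date
        SELECT MAX(c.Transaction_Id)
        FROM #test c
        WHERE c.Account_Id = a.Account_Id
          AND c.CaptureDate = a.CaptureDate
      )
ORDER BY a.Account_Id

DROP TABLE #test
```

This returns transaction 3 (2015-07-20) for account 1 and transaction 3 (2012-02-20) for account 2. Without the second condition, account 2 would come back twice, once for each of the tied transactions.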

Resources