Remove duplicate rows based on particular field - database

I have a table with data similar to this:
stat_id account_id discount date_from date_to type
1 1 50 2017-10-01 2017-10-31 1
2 2 40 2017-10-01 2017-10-31 1
3 1 0 2017-01-01 2017-11-30 2
I want to get all distinct account_ids, for a given period (date_from <= '2017-10-01' and date_to >= '2017-10-31'), each one with the highest type (type is either 1 or 2)
account_id discount type
1 0 2
2 40 1
I tried various queries, but I couldn't achieve this. What I get is one row with type = 1 and one row with type = 2 for account_id = 1
account_id discount type
1 50 1
1 0 2
2 40 1
I can filter them in my application, but for pure personal entertainment I want to do it in one query. Any help appreciated :)

Try this -
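SELECT account_id, discount, type
FROM YOUR_TABLE
WHERE (account_id, type) IN (SELECT account_id, MAX(type)
                             FROM YOUR_TABLE
                             -- filter here too, so MAX(type) only sees rows covering the period
                             WHERE date_from <= '2017-10-01'
                             AND date_to >= '2017-10-31'
                             GROUP BY account_id)
AND date_from <= '2017-10-01'
AND date_to >= '2017-10-31'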
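As a runnable sketch of the same idea, the snippet below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and joins each row to the highest type per account within the period. The table name stats and the parameter names are assumptions made for illustration; the join form is used instead of the row-value IN because it is accepted by more database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (stat_id INTEGER, account_id INTEGER, discount INTEGER,
                        date_from TEXT, date_to TEXT, type INTEGER);
    INSERT INTO stats VALUES
        (1, 1, 50, '2017-10-01', '2017-10-31', 1),
        (2, 2, 40, '2017-10-01', '2017-10-31', 1),
        (3, 1,  0, '2017-01-01', '2017-11-30', 2);
""")

# The period filter appears twice: once so MAX(type) only considers rows
# that cover the period, and once so the outer rows are restricted as well.
query = """
    SELECT s.account_id, s.discount, s.type
    FROM stats AS s
    JOIN (SELECT account_id, MAX(type) AS max_type
          FROM stats
          WHERE date_from <= :period_start AND date_to >= :period_end
          GROUP BY account_id) AS m
      ON m.account_id = s.account_id AND m.max_type = s.type
    WHERE s.date_from <= :period_start AND s.date_to >= :period_end
    ORDER BY s.account_id
"""
params = {"period_start": "2017-10-01", "period_end": "2017-10-31"}
rows = conn.execute(query, params).fetchall()
print(rows)  # [(1, 0, 2), (2, 40, 1)]
```

The dates are stored as ISO-formatted text, which compares correctly as plain strings.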

Related

How to check values of different rows of a table

I have below sample input table. In real it has lots of records.
Input:
ID Classification
123 1
123 2
123 3
123 4
657 1
657 3
657 4
For each ID, the records should together contain all of the Classification values 1, 2, 3 and 4. If any of these values is missing, that ID's records should be flagged as an exception. The output should be as below.
ID Classification Flag
123 1 0
123 2 0
123 3 0
123 4 0
657 1 1
657 3 1
657 4 1
Can someone please help me with how this can be achieved in SQL Server?
Thanks.
There are a couple of options here; which one performs better is for you to test, since I don't know what indexes you have. One uses conditional aggregation to check that all the values are present; the other uses a subquery and counts the DISTINCT values (as I don't know whether there could be duplicate classifications):
-- Sample data
SELECT *
INTO dbo.YourTable
FROM (VALUES(123,1),
            (123,2),
            (123,3),
            (123,4),
            (657,1),
            (657,3),
            (657,4))V(ID,Classification);
GO
CREATE CLUSTERED INDEX CI_YourIndex ON dbo.YourTable (ID,Classification);
GO
-- Option 1: conditional aggregation. Flag = 0 when 1, 2, 3 and 4 are all present, otherwise 1
SELECT ID,
       Classification,
       CASE WHEN COUNT(CASE YT.Classification WHEN 1 THEN 1 END) OVER (PARTITION BY ID) > 0
             AND COUNT(CASE YT.Classification WHEN 2 THEN 1 END) OVER (PARTITION BY ID) > 0
             AND COUNT(CASE YT.Classification WHEN 3 THEN 1 END) OVER (PARTITION BY ID) > 0
             AND COUNT(CASE YT.Classification WHEN 4 THEN 1 END) OVER (PARTITION BY ID) > 0 THEN 0 ELSE 1
       END AS Flag
FROM dbo.YourTable YT;
GO
-- Option 2: count the distinct required values per ID. Flag = 0 when all four are found, otherwise 1
SELECT ID,
       Classification,
       CASE (SELECT COUNT(DISTINCT sq.Classification)
             FROM dbo.YourTable sq
             WHERE sq.ID = YT.ID
               AND sq.Classification IN (1,2,3,4)) WHEN 4 THEN 0 ELSE 1
       END AS Flag
FROM dbo.YourTable YT;
GO
DROP TABLE dbo.YourTable;
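The distinct-count approach can also be tried outside SQL Server. The sketch below is an assumption-laden port to SQLite through Python's sqlite3 module (the lowercase table and column names are illustrative). It follows the question's expected output, where Flag is 0 for an ID that has all four classifications and 1 for an exception, and it replaces the correlated subquery with a join to a grouped subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, classification INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(123, 1), (123, 2), (123, 3), (123, 4), (657, 1), (657, 3), (657, 4)],
)

# Count the distinct required values per id, then compare against 4.
# LEFT JOIN keeps ids that have none of the required values; their count
# is NULL, so the CASE falls through to the exception branch.
query = """
    SELECT yt.id,
           yt.classification,
           CASE WHEN c.found = 4 THEN 0 ELSE 1 END AS flag
    FROM your_table AS yt
    LEFT JOIN (SELECT id, COUNT(DISTINCT classification) AS found
               FROM your_table
               WHERE classification IN (1, 2, 3, 4)
               GROUP BY id) AS c
           ON c.id = yt.id
    ORDER BY yt.id, yt.classification
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data, every row for ID 123 gets flag 0 and every row for ID 657 gets flag 1, because 657 has no classification 2.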

How to fix Aggregation in Group By, missing aggregation values

I have a table of sales info and want to group by customer, returning the sum, count and max of a few columns. Any ideas, please?
I checked that all the SELECT columns are included in the GROUP BY clause, but detail rows are returned instead of the groupings and aggregate values.
I tried some explicit naming, but that didn't help.
SELECT
customerID AS CUST,
COUNT([InvoiceID]) AS Count_Invoice,
SUM([Income]) AS Total_Income,
SUM([inc2015]) AS Tot_2015_Income,
SUM([inc2016]) AS Tot_2016_Income,
MAX([prodA]) AS prod_A
FROM [table_a]
GROUP BY
customerID, InvoiceID,Income,inc2015, inc2016, prodA
The result has multiple rows per CUST, but there should be one row each for CUST 1, 2, etc. It should look like this:
---------------------------------------------
CUST Count_Invoice Total_Income Tot_2015_Income Tot_2016_Income prod_A
1 2 600 300 300 2
But it is returning this:
======================================
CUST Count_Invoice Total_Income Tot_2015_Income Tot_2016_Income prod_A
1 1 300 300 0 1
1 1 300 0 300 1
2 1 300 0 300 1
2 1 500 0 500 0
3 2 800 0 800 0
3 1 300 0 300 1
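You don't need to group by the other columns, since they are already being aggregated by COUNT, SUM or MAX. Grouping by them as well creates one group per distinct combination of values, which is why you get detail rows back.
So you may try this: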
SELECT customerID as CUST
,count([InvoiceID]) as Count_Invoice
,sum([Income]) as Total_Income
,sum([inc2015]) as Tot_2015_Income
,sum([inc2016]) as Tot_2016_Income
,max([prodA]) as prod_A --- here you are taking Max but in output it seems like sum
FROM [table_a]
Group By customerID
Note: For column prod_A you are using MAX, which gives 1, but your expected result shows 2, which looks like a SUM or COUNT. Please check.
For more information, see the documentation for GROUP BY.
From the description of your expected output, you should be aggregating by customer alone:
SELECT
customerID AS CUST,
COUNT([InvoiceID]) AS Count_Invoice,
SUM([Income]) AS Total_Income,
SUM([inc2015]) AS Tot_2015_Income,
SUM([inc2016]) AS Tot_2016_Income,
MAX([prodA]) AS prod_A
FROM [table_a]
GROUP BY
customerID;
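To see the difference between the two GROUP BY clauses, here is a small sketch using Python's sqlite3 module. The four sample rows and their invoice IDs are made up for illustration, loosely modelled on the detail rows shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (customerID INTEGER, InvoiceID INTEGER, "
    "Income INTEGER, inc2015 INTEGER, inc2016 INTEGER, prodA INTEGER)"
)
# Hypothetical rows: two invoices for each of two customers.
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, 300, 300, 0, 1),
        (1, 102, 300, 0, 300, 1),
        (2, 103, 300, 0, 300, 1),
        (2, 104, 500, 0, 500, 0),
    ],
)

select_list = """
    SELECT customerID AS CUST,
           COUNT(InvoiceID) AS Count_Invoice,
           SUM(Income) AS Total_Income,
           SUM(inc2015) AS Tot_2015_Income,
           SUM(inc2016) AS Tot_2016_Income,
           MAX(prodA) AS prod_A
    FROM table_a
"""

# Grouping by the customer alone gives one row per customer.
per_customer = conn.execute(
    select_list + " GROUP BY customerID ORDER BY customerID"
).fetchall()

# Grouping by every column gives one row per distinct combination,
# which here means one row per invoice.
per_combination = conn.execute(
    select_list
    + " GROUP BY customerID, InvoiceID, Income, inc2015, inc2016, prodA"
    + " ORDER BY customerID, InvoiceID"
).fetchall()

print(per_customer)
print(len(per_combination))
```

The first query returns two rows, one per customer, while the second returns four, which matches the symptom described in the question.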

Daily report by date with mssql for multiple columns

I want to display daily report like this
Fulltime Contract Casual
2018/06/04 1 0 0
2018/06/05 1 0 0
2018/06/06 0 1 1
2018/06/07 2 1 0
2018/06/08 1 1 1
2018/06/09 0 1 1
but what I have is like this
Date Jobtype Meal
2018/06/04 Fulltime 1
2018/06/05 Fulltime 1
2018/06/06 Casual 1
2018/06/06 Contract 1
2018/06/07 Casual 1
2018/06/07 Contract 2
2018/06/08 Casual 1
2018/06/08 Contract 1
2018/06/08 Fulltime 1
2018/06/09 Casual 1
2018/06/09 Contract 1
What I have tried:
select Date, Jobtype,'Meal'=(COUNT(Date))
from CanLog
where WW BETWEEN '2018/06/06' and '2018/06/09'
group by Date, Jobtype
order by 1
I think you can try this:
SELECT clog.WW AS [Date],
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'fulltime') AS Fulltime,
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'contract') AS Contract,
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'casual') AS Casual
FROM CanLog AS clog
WHERE clog.WW BETWEEN '2018/06/06' AND '2018/06/09'
GROUP BY clog.WW -- one row per day; grouping by Jobtype as well would repeat each day
ORDER BY clog.WW
Each SELECT in parentheses counts the rows with the given job type on that day.
It doesn't matter whether you write the job type in lower or upper case, as long as the column uses a case-insensitive collation (the SQL Server default).
I'm not sure where your WW column comes from. I assumed it's the date column. If it's not, please adjust.
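A related technique, conditional aggregation, builds the same report in a single pass over the table instead of three correlated subqueries. The sketch below uses SQLite through Python's sqlite3 module; the raw rows are hypothetical (one row per meal, reconstructed from part of the question's sample), and WW is assumed to hold the date as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CanLog (WW TEXT, Jobtype TEXT)")
# Hypothetical raw log: one row per meal taken.
conn.executemany(
    "INSERT INTO CanLog VALUES (?, ?)",
    [
        ("2018/06/06", "Casual"),
        ("2018/06/06", "Contract"),
        ("2018/06/07", "Casual"),
        ("2018/06/07", "Contract"),
        ("2018/06/07", "Contract"),
        ("2018/06/08", "Casual"),
        ("2018/06/08", "Contract"),
        ("2018/06/08", "Fulltime"),
    ],
)

# Each SUM adds 1 for rows of one job type and 0 for all others,
# so every job type becomes its own column and each day is one row.
query = """
    SELECT WW AS log_date,
           SUM(CASE WHEN Jobtype = 'Fulltime' THEN 1 ELSE 0 END) AS Fulltime,
           SUM(CASE WHEN Jobtype = 'Contract' THEN 1 ELSE 0 END) AS Contract,
           SUM(CASE WHEN Jobtype = 'Casual'   THEN 1 ELSE 0 END) AS Casual
    FROM CanLog
    WHERE WW BETWEEN '2018/06/06' AND '2018/06/08'
    GROUP BY WW
    ORDER BY WW
"""
report = conn.execute(query).fetchall()
for line in report:
    print(line)
```

Note that SQLite compares text case-sensitively by default, so the literals here match the stored values exactly.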
You need to use the UNPIVOT operator and discard the rows where the value equals 0:
select u.date, u.jobtype, u.meal
from canlog
unpivot
(
meal
for jobtype in ( fulltime, contract, casual )
) u
where
[Date] between '2018/06/06' and '2018/06/09'
and meal <> 0;
Result:
date jobtype meal
2018-06-04 Fulltime 1
2018-06-05 Fulltime 1
2018-06-06 Contract 1
2018-06-06 Casual 1
2018-06-07 Fulltime 2
2018-06-07 Contract 1
2018-06-08 Fulltime 1
2018-06-08 Contract 1
2018-06-08 Casual 1
2018-06-09 Contract 1
2018-06-09 Casual 1
SELECT DISTINCT clog.WW AS [Date],
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'fulltime') AS Fulltime,
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'contract') AS Contract,
       (SELECT COUNT(*) FROM CanLog AS c WHERE c.WW = clog.WW AND c.Jobtype = 'casual') AS Casual
FROM CanLog AS clog
WHERE clog.WW BETWEEN '2018/06/06' AND '2018/06/09'
ORDER BY [Date] -- DISTINCT already collapses the rows to one per day, so no GROUP BY is needed

Union and Order By (SQL Server)

Consider a table A and table B like :
Table A:
debit credit row
-----------------------
10 0 1
0 10 1
20 0 2
0 20 2
30 0 3
0 30 3
Table B:
debit credit row
-----------------------
10 0 1
0 10 1
20 0 2
0 20 2
30 0 3
0 30 3
Result:
debit credit row
--------------------
10 0 1
20 0 2
30 0 3
0 10 1
0 20 2
0 30 3
I'm trying to union all table A, B and show debit first, then sort it by row column.
By definition, the individual SELECTs making up a UNION are not allowed to contain an ORDER BY clause. The only ORDER BY clause allowed is at the end of the UNION, and it applies to the entire UNION, making xxx UNION yyy ORDER BY zzz the equivalent of (xxx UNION yyy) ORDER BY zzz.
Meaning:
Invalid (ORDER BY inside an individual SELECT):
Select debit,credit,row
From table a
Where 'condition'
Order by debit, row
Union
Select debit,credit,row
From table b
Where 'condition 2'
Order by debit, row
Valid:
Select debit,credit,row
From table a
Where 'condition'
Union
Select debit,credit,row
From table b
Where 'condition 2'
Order by debit, row
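To get the debit rows first and then sort by row, as in the expected result, the sort key has to be an expression rather than a plain column. One way to allow that is to wrap the UNION in a derived table and order the outer query. The sketch below tries this in SQLite through Python's sqlite3 module; the table names table_a and table_b and the column name row_no (used in place of row to avoid a keyword) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
sample = [(10, 0, 1), (0, 10, 1), (20, 0, 2), (0, 20, 2), (30, 0, 3), (0, 30, 3)]
for name in ("table_a", "table_b"):
    conn.execute(f"CREATE TABLE {name} (debit INTEGER, credit INTEGER, row_no INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", sample)

# UNION removes the duplicates shared by both tables. The outer ORDER BY
# first puts debit rows (debit > 0) ahead of credit rows, then sorts by row_no.
query = """
    SELECT debit, credit, row_no
    FROM (SELECT debit, credit, row_no FROM table_a
          UNION
          SELECT debit, credit, row_no FROM table_b) AS results
    ORDER BY CASE WHEN debit > 0 THEN 0 ELSE 1 END, row_no
"""
ordered = conn.execute(query).fetchall()
for line in ordered:
    print(line)
```

The output lists the three debit rows in row order, followed by the three credit rows in row order.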

Get top X percentage based on cumulative sum

My table looks like this:
ID | ItemID | ItemQualityID | Amount | UnitPrice
My goal is to find the top x% rows for each ItemID + ItemQualityID pair based on Amount cumulative sum and ordered by UnitPrice.
For example:
ID | ItemID | ItemQualityID | Amount | UnitPrice
1 1 1 18 2
2 1 1 1 1
3 1 1 1 1
4 2 1 18 2
5 2 1 1 1
6 2 1 1 1
7 1 1 1 3
If I want the top 10%, the resulting table should contain rows #2, 3, 5 and 6. The total amounts for ItemID 1 and 2 are 21 and 20 respectively, so 10% is about 2 items each. If I want the top 20%, the resulting table should still be the same, because including rows #1 and #4 (Amount 18 each) would take the cumulative amount far beyond 20%. Row #7 has a higher unit price than row #1, so if row #1 is not included, row #7 shouldn't be included either.
Ideally I want the table with all the filtered rows for some other calculations but I will be happy even if I can only get the sum of Amount * UnitPrice of the filtered table. Something like
ItemID | ItemQualityID | Sum
1 1 2
2 1 2
for the above example.
You can use SUM OVER:
DECLARE @percent DECIMAL(5, 2) = .1

;WITH CteSum AS(
    SELECT *,
        -- total Amount of the ItemID + ItemQualityID pair
        TotalSum = SUM(Amount) OVER(PARTITION BY ItemID, ItemQualityID),
        -- running total, cheapest UnitPrice first (ID breaks ties)
        CumSum = SUM(Amount) OVER(PARTITION BY ItemID, ItemQualityID ORDER BY UnitPrice, ID)
    FROM tbl
)
SELECT
    ItemID,
    ItemQualityID,
    [Sum] = SUM(Amount * UnitPrice)
FROM CteSum
WHERE CumSum <= @percent * TotalSum
GROUP BY ItemID, ItemQualityID
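The running-total approach can be checked against the question's sample data. The sketch below runs it in SQLite through Python's sqlite3 module, assuming SQLite 3.25 or later for window function support. The helper name top_percent_totals is hypothetical, and the percentage is passed as a whole number so the comparison stays in integer arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (ID INTEGER PRIMARY KEY, ItemID INTEGER, "
    "ItemQualityID INTEGER, Amount INTEGER, UnitPrice INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 18, 2), (2, 1, 1, 1, 1), (3, 1, 1, 1, 1),
        (4, 2, 1, 18, 2), (5, 2, 1, 1, 1), (6, 2, 1, 1, 1),
        (7, 1, 1, 1, 3),
    ],
)

QUERY = """
    WITH CteSum AS (
        SELECT *,
               SUM(Amount) OVER (PARTITION BY ItemID, ItemQualityID) AS TotalSum,
               SUM(Amount) OVER (PARTITION BY ItemID, ItemQualityID
                                 ORDER BY UnitPrice, ID) AS CumSum
        FROM tbl
    )
    SELECT ItemID, ItemQualityID, SUM(Amount * UnitPrice) AS Total
    FROM CteSum
    WHERE CumSum * 100 <= :percent * TotalSum
    GROUP BY ItemID, ItemQualityID
    ORDER BY ItemID, ItemQualityID
"""

def top_percent_totals(percent):
    """Sum Amount * UnitPrice over rows whose running total stays within percent."""
    return conn.execute(QUERY, {"percent": percent}).fetchall()

print(top_percent_totals(10))  # [(1, 1, 2), (2, 1, 2)]
print(top_percent_totals(20))  # [(1, 1, 2), (2, 1, 2)]
```

Both calls give the same totals, as the question expects, because the next row in each group has Amount 18 and would push the running total far past 20%.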

Resources