If any clause when grouping

Doing a Sum() on a column adds up the values in that column based on the GROUP BY. But let's say I want to sum these values only if none of them is NULL or 0. Then I need a clause that checks whether any of the values is NULL or 0 before it does the sum. How can I implement such a clause?
I'm using SQL Server 2005.
Thanks,
Barry

Let's suppose your table schema is:
myTable( id, colA, value)
Then, one approach is:
Select colA, sum(value)
from myTable
group by colA
having count( id ) = count( nullif( value, 0 ))
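Note that NULLIF is standard SQL, so this should work in SQL Server and in most other RDBMSs.
Explanation:
The COUNT aggregate function counts only non-NULL values. NULLIF(value, 0) turns every 0 into NULL, so count(nullif(value, 0)) counts the rows whose value is neither NULL nor 0. The HAVING clause keeps a group only when that count equals the total number of rows in the group (this assumes id is never NULL).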
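To see the HAVING test in action, here is a minimal sketch that runs the same query from Python. It uses SQLite (via the standard sqlite3 module) only as a stand-in for SQL Server, and the sample rows are made up for illustration; NULLIF and COUNT should behave the same way in both engines.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, colA TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable (colA, value) VALUES (?, ?)",
    [
        ("a", 2), ("a", 3),               # no NULL or 0: group is kept
        ("b", 0), ("b", 2), ("b", 3),     # contains a 0: group is dropped
        ("c", None), ("c", 3), ("c", 4),  # contains a NULL: group is dropped
    ],
)

# COUNT(id) counts every row; COUNT(NULLIF(value, 0)) skips NULLs and zeros.
rows = conn.execute(
    """
    SELECT colA, SUM(value)
    FROM myTable
    GROUP BY colA
    HAVING COUNT(id) = COUNT(NULLIF(value, 0))
    ORDER BY colA
    """
).fetchall()
print(rows)
```

Only group "a" survives the HAVING filter. Groups with a 0 or a NULL are removed from the result entirely rather than being reported with a sum of 0.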

You say that 0+2+3=0 for this case. Assuming that NULL+2+3 should also be zero:
SELECT GroupField,
SUM(Value) * MIN(CASE WHEN COALESCE(Value, 0) = 0 THEN 0 ELSE 1 END)
FROM SumNonZero
GROUP BY GroupField
The above statement gives this result
GroupField (No column name)
case1 5
case2 0
case3 0
with this test data
CREATE TABLE SumNonZero (
GroupField CHAR(5) NOT NULL,
Value INT
)
INSERT INTO SumNonZero(GroupField, Value)
SELECT 'case1', 2
UNION ALL SELECT 'case1', 3
UNION ALL SELECT 'case2', 0
UNION ALL SELECT 'case2', 2
UNION ALL SELECT 'case2', 3
UNION ALL SELECT 'case3', NULL
UNION ALL SELECT 'case3', 3
UNION ALL SELECT 'case3', 4

It makes no sense to eliminate 0 from a SUM because it won't affect the sum.
But you may want to SUM based on another field (GROUP_FIELD stands for whatever column you group by):
select GROUP_FIELD, sum(
case when (OTHER_FIELD > 0) then FIELD
else 0
end)
from TABLE
group by GROUP_FIELD

Related

Creating unique identifier column(1 or zero) Rank () SQL SERVER

I am trying to create a column in SQL SERVER that shows 1 OR 0(zero). I have a column of customer numbers that appear more than once. At the first hit on a unique non-repeated customer number it should show one and if it is repeated then 0(zero). How can I create this ?
CustNumber Unique
25122134 1
25122134 0
25122134 0
25122136 1
25122136 0
The solutions I am considering and trying out now are RANK() and DENSE_RANK().

declare @test table
(
CustNumber int
)
insert into @test values
(25122134),
(25122134),
(25122134),
(25122136),
(25122136)
select
* ,
-- rows in the same partition share a rank but get different row_numbers
case when (row_number() over (partition by CustNumber order by CustNumber)) = 1
then 1 else 0 end as [Unique]
-- the 1st row of each partition is flagged 1, the rest (2..n) are flagged 0
from @test
order by CustNumber, [Unique] desc
-- the flagged row in each group is displayed first

You don't want RANK because that, by definition, produces the same output for identical inputs.
ROW_NUMBER() and a simple CASE expression should do it:
;WITH Numbered as (
SELECT CustNumber,
ROW_NUMBER() OVER (PARTITION BY CustNumber
ORDER BY CustNumber) as rn --Unusual - pick a real column if you have a preference
FROM YourUnnamedTable
)
SELECT CustNumber,CASE WHEN rn = 1 THEN 1 ELSE 0 END as [Unique]
FROM Numbered

Aggregate Function Error on an Expression

What could be wrong with this query:
SELECT
SUM(CASE
WHEN (SELECT TOP 1 ISNULL(StartDate,'01-01-1900')
FROM TestingTable
ORDER BY StartDate Asc) <> '01-01-1900' THEN 1 ELSE 0 END) AS Testingvalue
I get the error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

As koppinjo stated, what your current (broken) query does is check whether there is a NULL value (or StartDate = '01-01-1900') in your table, return either a 1 or a 0 depending on the outcome, and then attempt to SUM that single value.
There are two logically different things you might want:
either getting the number of rows that have a StartDate, or checking whether any row is missing a StartDate.
SELECT --Checking if there is a NULL-value in table
(
CASE WHEN
(SELECT TOP 1 ISNULL(StartDate,'01-01-1900')
FROM TestingTable
ORDER BY StartDate Asc) <> '01-01-1900' THEN 1
ELSE 0
END
) AS TestingValue
SELECT SUM(TestingValue) TestingValue --Give the count of how many non-NULLs there is
FROM
(
SELECT
CASE WHEN
ISNULL(StartDate,'01-01-1900') <> '01-01-1900' THEN 1
ELSE 0
END AS TestingValue
FROM TestingTable
) T
Here is a SQL Fiddle showing both outputs side by side.

Hard to say, but you probably want something like this:
SELECT
SUM(TestingValue)
FROM
(SELECT
CASE
WHEN ISNULL(StartDate,'01-01-1900') <> '01-01-1900'
THEN 1
ELSE 0
END AS TestingValue
FROM TestingTable) t
As your original query is written now, your subquery will return 1 value overall, so your sum would be 1 or 0 always, not to mention it is illegal. To get around that, this SQL will apply the case statement to every row in the TestingTable and insert the result into a derived table (t), then the 'outer' select will sum the results. Hope this helps!
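Here is a small sketch of the derived-table pattern, run against SQLite from Python as a stand-in for SQL Server. The sample dates are invented, and the CASE tests IS NOT NULL directly instead of going through the '01-01-1900' sentinel, which is a simplification on my part. For the narrower goal of counting rows that have a StartDate, COUNT(StartDate) should give the same number, since COUNT skips NULLs.

```python
import sqlite3

# SQLite stands in for SQL Server; the sample dates are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestingTable (StartDate TEXT)")
conn.executemany(
    "INSERT INTO TestingTable VALUES (?)",
    [("2011-03-01",), (None,), ("2012-07-15",)],
)

# Compute the 1/0 flag per row in a derived table, then aggregate outside it.
derived = conn.execute(
    """
    SELECT SUM(TestingValue)
    FROM (SELECT CASE WHEN StartDate IS NOT NULL THEN 1 ELSE 0 END AS TestingValue
          FROM TestingTable) AS t
    """
).fetchone()[0]

# COUNT(column) ignores NULLs, so it yields the same count without a subquery.
direct = conn.execute("SELECT COUNT(StartDate) FROM TestingTable").fetchone()[0]

print(derived, direct)
```

The key point is that the aggregate is applied to a plain column of the derived table, not to an expression that itself contains a subquery, which is what the error message complains about.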

SQL Server reference fields in derived table with unions

I'm having a bit of an issue with some derived tables that I hope someone will be able to help with. What I've got is 2 derived tables inside a select statement that then uses a pivot to display the results horizontally rather than vertically.
What I've got so far is:
SELECT * FROM(
SELECT SUM(Value) AS TotDays, ClassId FROM MainTable GROUP BY ClassId)
Union All
SELECT SUM(NumDays) As TotDays, ClassId FROM (
SELECT CASE WHEN COUNT(SiteId) > 0 THEN 1 ELSE 0 END AS NumDays
FROM Table2 GROUP BY ClassId ) as SUB
) AS a
PIVOT (SUM(TotDays) FROM ClassId
IN ([12],[13],[14],[15]
What I'm trying to do is reference the individual columns rather than using SELECT *, but I don't know how to do it. I can make it work without if I drop everything from the union onwards, but when I put the union in it doesn't work and I have to use SELECT *.
Anyone got any ideas on what's going wrong?
Thanks
Alex

You have a couple of errors in your query. For example, your UNION ALL has sets with a different number of columns, the second half selects ClassId next to an aggregate without grouping by it, and PIVOT takes FOR rather than FROM. Try it this way:
SELECT [12],[13],[14],[15]
FROM ( SELECT SUM(Value) AS TotDays, ClassId
FROM MainTable
GROUP BY ClassId
UNION ALL
SELECT SUM(NumDays) As TotDays, ClassId
FROM ( SELECT CASE WHEN COUNT(SiteId) > 0 THEN 1 ELSE 0 END NumDays,
ClassId
FROM Table2
GROUP BY ClassId) as SUB
GROUP BY ClassId
) AS a
PIVOT (SUM(TotDays) FOR ClassId IN ([12],[13],[14],[15])) AS PT
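If it helps to see what the pivot produces, the sketch below builds the same UNION ALL and then spreads the class ids into columns with conditional aggregation (SUM over CASE). That is a swapped-in technique, not PIVOT itself: it is used here because the example runs on SQLite from Python, and SQLite has no PIVOT operator. The sample rows and the column aliases c12..c15 are made up.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows and aliases c12..c15 are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (ClassId INTEGER, Value INTEGER)")
conn.execute("CREATE TABLE Table2 (ClassId INTEGER, SiteId INTEGER)")
conn.executemany("INSERT INTO MainTable VALUES (?, ?)", [(12, 5), (12, 2), (13, 4)])
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [(12, 100), (13, 200), (13, 201), (14, None)],
)

# Both halves of the UNION ALL expose the same two columns: TotDays, ClassId.
# SUM(CASE ...) per class id plays the role of PIVOT ... FOR ClassId IN (...).
pivot_row = conn.execute(
    """
    SELECT SUM(CASE WHEN ClassId = 12 THEN TotDays END) AS c12,
           SUM(CASE WHEN ClassId = 13 THEN TotDays END) AS c13,
           SUM(CASE WHEN ClassId = 14 THEN TotDays END) AS c14,
           SUM(CASE WHEN ClassId = 15 THEN TotDays END) AS c15
    FROM (
        SELECT SUM(Value) AS TotDays, ClassId
        FROM MainTable
        GROUP BY ClassId
        UNION ALL
        SELECT SUM(NumDays) AS TotDays, ClassId
        FROM (SELECT CASE WHEN COUNT(SiteId) > 0 THEN 1 ELSE 0 END AS NumDays,
                     ClassId
              FROM Table2
              GROUP BY ClassId) AS SUB
        GROUP BY ClassId
    ) AS a
    """
).fetchone()
print(pivot_row)
```

Class 15 has no rows in either table, so its column comes back as NULL (None in Python), while class 14 comes back as 0 because its only SiteId is NULL.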

SQL Server result column in WHERE clause

SELECT
MyColumn = 'something'
FROM table
WHERE MyColumn == 'something'
Possible to use MyColumn in WHERE clause?
EDIT:
Here's full query:
select TOP 10
PremiumYTDCurrent=Sum(CASE WHEN
AASI.Inv_Acctcur>='201101'
and AASI.Inv_Acctcur<='201102'
THEN (AASI.Inv_Premium)*R.[Percent]
ELSE 0 END),
PremiumYTDPrevious=Sum(CASE WHEN
AASI.Inv_Acctcur>='201001'
and AASI.Inv_Acctcur<='201002'
THEN (AASI.Inv_Premium)*R.[Percent]
ELSE 0 END),
R.STAFF, L.Description, L.LINE_OF_BUSINESS
from AAS_Invoice AASI,Invoice I,Revenue_Tracking R, Policy P, Line_Of_Business L
where I.Invoice_No=convert(Char,Convert(int,AASI.Inv_Entry_Num))
and I.Invoice=R.Invoice
and I.POLICY=P.POLICY
and L.LINE_OF_BUSINESS=P.LINE_OF_BUSINESS
and R.Organization IN (SELECT ST.ORGANIZATION FROM Staff ST WHERE ST.STAFF=14407)
and R.Staff=14407
and R.Activity_type='Broker'
and R.[Percent]>0
and PremiumYTDCurrent != 0
group by R.STAFF, L.Description, L.LINE_OF_BUSINESS
order by PremiumYTDCurrent DESC, PremiumYTDPrevious DESC, average_policy DESC

You cannot use the column alias in the WHERE clause. Use the expression instead.
and Sum(CASE WHEN
AASI.Inv_Acctcur>='201101'
and AASI.Inv_Acctcur<='201102'
THEN (AASI.Inv_Premium)*R.[Percent]
ELSE 0 END) <> 0
Edit 1
I did not notice the SUM. An aggregate cannot appear in the WHERE clause.
Add it as a HAVING clause instead, after the GROUP BY and before the ORDER BY.
having Sum(CASE WHEN
AASI.Inv_Acctcur>='201101'
and AASI.Inv_Acctcur<='201102'
THEN (AASI.Inv_Premium)*R.[Percent]
ELSE 0 END) != 0

You could wrap the SQL up in a nested statement, a horrendously simple example being:
SELECT MyMadeUpColumnName, col2 FROM (
SELECT SUM(sillycolumn) AS MyMadeUpColumnName, col2 FROM SomeTable GROUP BY col2
) AS t
WHERE t.MyMadeUpColumnName <> 0
Any column names that you (re)define in the derived table become the actual column names for the parent select.
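To compare the two suggestions side by side (HAVING on the aggregate versus filtering a derived table by its alias), here is a minimal sketch run on SQLite from Python. The Premiums table and its rows are hypothetical and much simpler than the query in the question. Note that SQLite is more lenient about aliases than SQL Server, so this only shows that the two portable forms return the same rows; it does not demonstrate the restriction itself.

```python
import sqlite3

# SQLite stands in for SQL Server; "Premiums" and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Premiums (Staff INTEGER, Premium INTEGER)")
conn.executemany(
    "INSERT INTO Premiums VALUES (?, ?)",
    [(1, 100), (1, 50), (2, 0), (2, 0), (3, 25), (3, -25), (4, 10)],
)

# Option 1: repeat the aggregate expression in a HAVING clause.
having_rows = conn.execute(
    """
    SELECT Staff, SUM(Premium) AS Total
    FROM Premiums
    GROUP BY Staff
    HAVING SUM(Premium) <> 0
    ORDER BY Total DESC
    """
).fetchall()

# Option 2: name the aggregate in a derived table, then filter on the alias.
derived_rows = conn.execute(
    """
    SELECT Staff, Total
    FROM (SELECT Staff, SUM(Premium) AS Total
          FROM Premiums
          GROUP BY Staff) AS t
    WHERE t.Total <> 0
    ORDER BY Total DESC
    """
).fetchall()

print(having_rows)
print(derived_rows)
```

HAVING is usually the shorter option when the filter is on a single aggregate. The derived table tends to read better when the same computed column is needed in several places.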

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.

This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;

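Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work on SQL Server 2012 or later (earlier versions do not accept ORDER BY in the OVER clause of an aggregate). Note that with integer columns the division is integer division: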
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
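One thing worth checking with the SUM / ROW_NUMBER form is integer division. The sketch below runs it on SQLite from Python as a stand-in for SQL Server 2012+ (both truncate when dividing two integers) and adds a second column that multiplies by 1.0 first. The sample rows and the aliases IntAvg and ExactAvg are made up for the example.

```python
import sqlite3

# SQLite stands in for SQL Server 2012+; rows and aliases are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, number INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, 1), (2, 3), (3, 2), (4, 4)])

# IntAvg divides two integers, so the fraction is truncated.
# ExactAvg multiplies by 1.0 first to force a non-integer division.
running = conn.execute(
    """
    SELECT id, number,
           SUM(number) OVER (ORDER BY id)
               / ROW_NUMBER() OVER (ORDER BY id) AS IntAvg,
           1.0 * SUM(number) OVER (ORDER BY id)
               / ROW_NUMBER() OVER (ORDER BY id) AS ExactAvg
    FROM myTable
    ORDER BY id
    """
).fetchall()
for row in running:
    print(row)
```

On the last row the running sum is 10 over 4 rows, so the integer version reports 2 while the exact version reports 2.5.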

A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would likely be the most efficient way to calculate a running average like this.

Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Selects are then about as fast as they can be. Of course, modifications are slower.