SQL Server: How to get aggregate statistics across fields

Not sure if the title is phrased correctly here...
But in any case, for simplicity's sake, I have a table with three fields like below:
USER_ID Number_Of_Apples Number_Of_Pears
ABC1 1 NULL
ABC2 1 NULL
ABC3 NULL 5
ABC4 1 12
I want to know if there's a way to do a 'distinct' query of sorts that will give me the different levels of data per field. So in the example above, I want to see something like:
USER_ID Number_OF_Apples Number_OF_Pears
4 2 3
2 is returned for Number_Of_Apples because we only see 2 possible values (1 and NULL) in our dataset.
I'm wondering if there's an elegant way of doing this if you had 100 fields or more?

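Similar to your other question, you can count distinct values with a COALESCE to make sure the NULLs are counted too. If the count columns are INT, cast them to a string type first; otherwise COALESCE tries to convert 'nullitem' to INT (INT has higher data type precedence than VARCHAR) and the query fails as soon as it meets a NULL:
SELECT COUNT(DISTINCT COALESCE([USER_ID], 'nullitem')) AS [USER_ID],
COUNT(DISTINCT COALESCE(CAST([Number_Of_Apples] AS VARCHAR(20)), 'nullitem')) AS [Number_Of_Apples],
COUNT(DISTINCT COALESCE(CAST([Number_Of_Pears] AS VARCHAR(20)), 'nullitem')) AS [Number_Of_Pears]
FROM mytable
Just add whatever columns you need.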
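With 100 or more columns, typing each expression by hand gets tedious. In SQL Server you could build the SELECT list from sys.columns or INFORMATION_SCHEMA.COLUMNS and run it with sp_executesql. The sketch below shows the idea in Python, with SQLite standing in for SQL Server (an assumption: PRAGMA table_info plays the role of the catalog view). It also swaps the 'nullitem' placeholder for COUNT(DISTINCT col) plus 1 when the column contains any NULL, so no placeholder value or cast is needed. Column names are interpolated as-is, so this assumes they come from the catalog, not from user input.

```python
import sqlite3


def distinct_count_query(table, columns):
    """Build one SELECT that counts distinct values per column, treating NULL as a value."""
    parts = [
        # COUNT(DISTINCT ...) skips NULLs; the MAX(CASE ...) term adds 1 if the column has any NULL
        f'COUNT(DISTINCT "{c}") + MAX(CASE WHEN "{c}" IS NULL THEN 1 ELSE 0 END) AS "{c}"'
        for c in columns
    ]
    return f'SELECT {", ".join(parts)} FROM "{table}"'


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (USER_ID TEXT, Number_Of_Apples INTEGER, Number_Of_Pears INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("ABC1", 1, None), ("ABC2", 1, None), ("ABC3", None, 5), ("ABC4", 1, 12)],
)

# SQLite's PRAGMA table_info stands in for sys.columns / INFORMATION_SCHEMA.COLUMNS
columns = [row[1] for row in conn.execute("PRAGMA table_info(mytable)")]
result = conn.execute(distinct_count_query("mytable", columns)).fetchone()
print(result)  # (4, 2, 3)
```

The output matches the expected 4, 2, 3 from the question, and adding a column to the table needs no change to the query-building code.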

Try this:
SELECT (SELECT COUNT(DISTINCT USER_ID)
FROM mytable) AS USER_ID,
(SELECT COUNT(DISTINCT CASE
WHEN Number_Of_Apples IS NULL THEN -1
ELSE Number_Of_Apples
END)
FROM mytable) AS Number_Of_Apples ,
(SELECT COUNT(DISTINCT CASE
WHEN Number_Of_Pears IS NULL THEN -1
ELSE Number_Of_Pears
END)
FROM mytable) AS Number_Of_Pears
The above query uses a CASE expression to handle NULL values in the Number_Of_Apples and Number_Of_Pears columns: COUNT(DISTINCT ...) ignores NULLs, so they are mapped to -1 first. I've made the assumption that -1 is not a possible value for these two columns.
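The -1 assumption matters: if -1 can occur as real data, it is merged with the NULLs and the count comes out one short. A small check in Python, with SQLite standing in for SQL Server (the table t and its values are hypothetical), compares the sentinel version with a NULL-aware count that needs no sentinel:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
# Hypothetical data in which -1 is a genuine value
conn.executemany("INSERT INTO t VALUES (?)", [(-1,), (None,), (5,)])

# Sentinel approach: NULL is mapped to -1 and collides with the real -1
sentinel = conn.execute(
    "SELECT COUNT(DISTINCT CASE WHEN val IS NULL THEN -1 ELSE val END) FROM t"
).fetchone()[0]

# NULL-aware approach: count the non-NULL values, then add 1 if any NULL exists
null_aware = conn.execute(
    "SELECT COUNT(DISTINCT val) + MAX(CASE WHEN val IS NULL THEN 1 ELSE 0 END) FROM t"
).fetchone()[0]

print(sentinel, null_aware)  # 2 3
```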

Related

SQL CASE WHEN - NULL values returning unwanted rows

I have two case statements running in the same select statement, but here is a simplified example:
SELECT t.Person,
CASE WHEN t.[Order] LIKE 'Test Order 1 CHRG' THEN SUBSTRING(t.[Order], PATINDEX('%[0-9]%', ord.Name), 1) END AS 'Order1',
CASE WHEN t.[Order] LIKE 'Test Order 2 CHRG' THEN SUBSTRING(t.[Order], PATINDEX('%[0-9]%', ord.Name), 1) END AS 'Order2'
The results that I am getting are:
Name Order1 Order2
======================================
Person A 4 NULL
Person A NULL 3
Person B 2 NULL
Person B NULL 3
Person C 1 NULL
Person C NULL 5
Is there a way to ignore NULL value that is being produced by the CASE statements and have the query return only one row for each t.Name? Like this:
Name Order1 Order2
======================================
Person A 4 3
Person B 2 3
Person C 1 5
Thanks in advance!
(Now that I'm not on a Teams call) You're effectively doing a pivot here, which means you need to add some aggregation to eliminate the NULL values so that both values are on the same row. Filling in a few gaps, I suspect you therefore need:
SELECT t.Person,
MAX(CASE WHEN t.[Order] = 'Test Order 1 CHRG' THEN SUBSTRING(t.[Order], PATINDEX('%[0-9]%', ord.Name), 1) END) AS Order1, --Don't use single quotes for aliases
MAX(CASE WHEN t.[Order] = 'Test Order 2 CHRG' THEN SUBSTRING(t.[Order], PATINDEX('%[0-9]%', ord.Name), 1) END) AS Order2 --You didn't need LIKE either, as there was no pattern
FROM dbo.Table1 t
JOIN table2 ord ON t.id = ord.tid
GROUP BY t.Person;
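To see why the MAX is needed, here is a small sketch with Python's sqlite3 module standing in for SQL Server. The orders table and its Digit column are hypothetical simplifications: the digit is stored directly instead of being extracted with SUBSTRING/PATINDEX. MAX ignores NULLs, so once the rows are grouped by Person, each CASE column keeps its single non-NULL value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per person and order
conn.execute("CREATE TABLE orders (Person TEXT, OrderName TEXT, Digit INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Person A", "Test Order 1 CHRG", 4), ("Person A", "Test Order 2 CHRG", 3),
        ("Person B", "Test Order 1 CHRG", 2), ("Person B", "Test Order 2 CHRG", 3),
        ("Person C", "Test Order 1 CHRG", 1), ("Person C", "Test Order 2 CHRG", 5),
    ],
)

# Each CASE yields NULL for the "other" order; MAX skips the NULLs within each Person group
rows = conn.execute(
    """
    SELECT Person,
           MAX(CASE WHEN OrderName = 'Test Order 1 CHRG' THEN Digit END) AS Order1,
           MAX(CASE WHEN OrderName = 'Test Order 2 CHRG' THEN Digit END) AS Order2
    FROM orders
    GROUP BY Person
    ORDER BY Person
    """
).fetchall()
for row in rows:
    print(row)
```

Removing the MAX calls and the GROUP BY brings back the two-rows-per-person output from the question.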
A searched CASE expression is the best fit for excluding the NULL marker from the result of the CASE expression. Keep in mind that when no WHEN condition is true and there is no ELSE, CASE returns NULL.
Below is the basic T-SQL syntax for it:
--Searched CASE expression:
CASE
WHEN Boolean_expression THEN result_expression [ ...n ]
[ ELSE else_result_expression ]
END

How to get the latest not null value from multiple columns in SQL or Azure Synapse

I have data like in the below format
I want output in the below format
Please help me with the SQL code. Thanks !
As I mentioned in the comments, you need to fix whatever is inserting the data so that it stops losing values, which is why they show up as NULL in "newer" rows.
To get the results you want, you're going to have to use row numbering and conditional aggregation, which gets messier the more columns you have; that is why you need to fix the real problem. It will look something like this:
WITH CTE AS(
SELECT GroupingColumn,
NullableCol1,
NullableCol2,
DateColumn,
CASE WHEN NullableCol1 IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY GroupingColumn, CASE WHEN NullableCol1 IS NULL THEN 1 ELSE 0 END ORDER BY DateColumn DESC) END AS NullableCol1RN,
CASE WHEN NullableCol2 IS NOT NULL THEN ROW_NUMBER() OVER (PARTITION BY GroupingColumn, CASE WHEN NullableCol2 IS NULL THEN 1 ELSE 0 END ORDER BY DateColumn DESC) END AS NullableCol2RN
FROM dbo.YourTable)
SELECT GroupingColumn,
MAX(CASE NullableCol1RN WHEN 1 THEN NullableCol1 END) AS NullableCol1,
MAX(CASE NullableCol2RN WHEN 1 THEN NullableCol2 END) AS NullableCol2,
MAX(DateColumn) AS DateColumn
FROM CTE
GROUP BY GroupingColumn;
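A runnable sketch of the same pattern, with Python's sqlite3 standing in for SQL Server/Synapse (it assumes SQLite 3.25 or later for window functions) and hypothetical sample rows, since the question's data was not preserved. Dates are stored as ISO text so they sort chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (GroupingColumn TEXT, NullableCol1 TEXT, NullableCol2 TEXT, DateColumn TEXT)"
)
# Hypothetical rows: later rows have "lost" some values, which became NULL
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        ("g1", "a", "x", "2023-01-01"),
        ("g1", None, "y", "2023-01-02"),
        ("g1", "b", None, "2023-01-03"),
        ("g1", None, None, "2023-01-04"),
        ("g2", "c", None, "2023-02-01"),
    ],
)

QUERY = """
WITH CTE AS (
    SELECT GroupingColumn, NullableCol1, NullableCol2, DateColumn,
           -- newest-first numbering of the non-NULL values; NULL rows get no number
           CASE WHEN NullableCol1 IS NOT NULL THEN ROW_NUMBER() OVER (
               PARTITION BY GroupingColumn, CASE WHEN NullableCol1 IS NULL THEN 1 ELSE 0 END
               ORDER BY DateColumn DESC) END AS NullableCol1RN,
           CASE WHEN NullableCol2 IS NOT NULL THEN ROW_NUMBER() OVER (
               PARTITION BY GroupingColumn, CASE WHEN NullableCol2 IS NULL THEN 1 ELSE 0 END
               ORDER BY DateColumn DESC) END AS NullableCol2RN
    FROM YourTable)
SELECT GroupingColumn,
       MAX(CASE NullableCol1RN WHEN 1 THEN NullableCol1 END) AS NullableCol1,
       MAX(CASE NullableCol2RN WHEN 1 THEN NullableCol2 END) AS NullableCol2,
       MAX(DateColumn) AS DateColumn
FROM CTE
GROUP BY GroupingColumn
ORDER BY GroupingColumn
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

For g1 this returns the latest non-NULL value of each column (b and y) together with the latest date; g2 never had a NullableCol2 value, so that column stays NULL.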

How do I exclude rows when an incremental value starts over?

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to build this result set in SQL Server 2008 R2; on more modern versions it would probably use LEAD/LAG. I am trying to aggregate data based on the sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
Source data is joined tables that look like this (please help me format):
recordID instanceDate moduleID iResult interactionNum
1356 10/6/15 16:14 1 68 1
1357 10/7/15 16:22 1 100 2
1434 10/9/15 16:58 1 52 1
1435 10/11/15 17:00 1 60 2
1436 10/15/15 16:57 1 100 3
1437 10/15/15 16:59 1 100 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here), then interactionNum, and a new group should start when the query finds a row whose interactionNum is less than or equal to that of the previous row (it will usually start over at 1, but not always, so I'd prefer to split on any lower or equal value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.
Not sure how ModuleID was supposed to be used, but I guess you're looking for something like this:
select min(instanceDate) as instanceDate, [moduleID], avg([iResult]) as iResult
from (
select *, row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
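The idea here is to create a running number with row_number for each moduleID, and then use the difference between that and interactionNum as the grouping criterion. This works as long as interactionNum increases by exactly 1 within a sequence: the difference stays constant inside a group and jumps whenever the sequence restarts.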
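Here is the same query run against the sample data from the question, using Python's sqlite3 as a stand-in for SQL Server (it assumes SQLite 3.25+ for ROW_NUMBER, and the dates are rewritten in ISO format so text ordering is chronological). Note that SQLite's AVG returns a float, whereas SQL Server's AVG over an INT column returns an INT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (recordID INTEGER, instanceDate TEXT, moduleID INTEGER, iResult INTEGER, interactionNum INTEGER)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1356, "2015-10-06 16:14", 1, 68, 1),
        (1357, "2015-10-07 16:22", 1, 100, 2),
        (1434, "2015-10-09 16:58", 1, 52, 1),
        (1435, "2015-10-11 17:00", 1, 60, 2),
        (1436, "2015-10-15 16:57", 1, 100, 3),
        (1437, "2015-10-15 16:59", 1, 100, 4),
    ],
)

rows = conn.execute(
    """
    SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate) AS RN
        FROM Table1
    ) X
    -- RN - interactionNum is 0,0 for the first sequence and 2,2,2,2 for the second
    GROUP BY moduleID, RN - interactionNum
    ORDER BY MIN(instanceDate)
    """
).fetchall()
print(rows)
```

The two groups average to 84 and 78, matching the desired output in the question.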
Here is my solution, although it should be said that I think @JamesZ's answer is cleaner.
I created a new field called newinstance, which is 1 wherever interactionNum is 1. I then created a rolling sum(newinstance) called rollinginstance to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance
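The rolling-sum approach above starts a new group only when interactionNum = 1, so a sequence that restarts at 2 would be merged into the previous group. A sketch of the exact rule from the question (a new group whenever interactionNum is less than or equal to the previous row's) that stays within SQL Server 2008 R2 features (ROW_NUMBER and CTEs, no LAG): number the rows, self-join each row to the previous one, flag the breaks, and take a running sum of the flags. It runs here in Python's sqlite3 against hypothetical rows where the second sequence restarts at 2; the table name tmpData stands in for #tmpData:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmpData (recordID INTEGER, instanceDate TEXT, moduleID INTEGER, iResult INTEGER, interactionNum INTEGER)"
)
# Hypothetical rows: the second sequence restarts at 2, not 1
conn.executemany(
    "INSERT INTO tmpData VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-10-06 16:14", 1, 68, 1),
        (2, "2015-10-07 16:22", 1, 100, 2),
        (3, "2015-10-09 16:58", 1, 52, 2),
        (4, "2015-10-11 17:00", 1, 60, 3),
    ],
)

QUERY = """
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate, interactionNum) AS rn
    FROM tmpData
), flagged AS (
    -- a row starts a new group if it is the first row or its interactionNum <= the previous row's
    SELECT cur.*,
           CASE WHEN prev.rn IS NULL OR cur.interactionNum <= prev.interactionNum
                THEN 1 ELSE 0 END AS newgroup
    FROM numbered cur
    LEFT JOIN numbered prev ON prev.moduleID = cur.moduleID AND prev.rn = cur.rn - 1
), grouped AS (
    -- running total of the flags gives each group its own number
    SELECT a.*,
           (SELECT SUM(b.newgroup) FROM flagged b
            WHERE b.moduleID = a.moduleID AND b.rn <= a.rn) AS grp
    FROM flagged a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM grouped
GROUP BY moduleID, grp
ORDER BY MIN(instanceDate)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The correlated running sum is quadratic in the number of rows per module, which is acceptable for small sets; on SQL Server 2012+ LAG and SUM() OVER (ORDER BY ...) would replace the self-join and subquery.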

Use NOT Equal condition in sql?

I want to fetch orders that have a “Received” (ActivityID = 1) activity but not a “Delivered” (ActivityID = 4) activity in the orders table, i.e. orders that have been received but not delivered yet.
My query is:
SELECT OrderID FROM tblOrderActivity
where (tblOrderActivity.ActivityID = 1 AND tblOrderActivity.ActivityID != 4)
GROUP BY OrderID
It is not returning the desired result.
The result should be OrderIDs 2 and 4.
Your query doesn't really make sense. Grouping happens after the WHERE clause, so you're basically getting all orders that have ActivityID = 1 (if ActivityID is 1, it is always not equal to 4, so the second condition never filters anything).
After the WHERE clause is applied, you end up with the following rows:
OrderID ActivityID
1 1
2 1
3 1
4 1
These are the rows that get grouped; no further condition is evaluated.
If 4 is the highest possible ActivityID, you could do the following:
SELECT OrderID
FROM tblOrderActivity
GROUP BY OrderID
HAVING MAX(ActivityID) < 4
The HAVING condition is applied after grouping, which is what you want.
I don't think GROUP BY is needed here. You can use a subquery to find the orders that have not been delivered. Try this:
SELECT *
FROM Yourtable a
WHERE a.ActivityID = 1
AND NOT EXISTS (SELECT 1
FROM yourtable b
WHERE a.OrderID = b.OrderID
AND b.ActivityID = 4)
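A quick comparison of both answers in Python's sqlite3, using hypothetical rows (the question's table was not preserved) in which orders 1 and 3 have been delivered. It also includes a conditional-aggregation variant, not from either answer, that does not rely on 4 being the highest ActivityID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrderActivity (OrderID INTEGER, ActivityID INTEGER)")
# Hypothetical data: orders 1 and 3 have a Delivered (4) row
conn.executemany(
    "INSERT INTO tblOrderActivity VALUES (?, ?)",
    [(1, 1), (1, 4), (2, 1), (2, 2), (3, 1), (3, 4), (4, 1)],
)

having_max = [r[0] for r in conn.execute(
    "SELECT OrderID FROM tblOrderActivity GROUP BY OrderID HAVING MAX(ActivityID) < 4 ORDER BY OrderID"
)]

not_exists = [r[0] for r in conn.execute(
    """
    SELECT a.OrderID FROM tblOrderActivity a
    WHERE a.ActivityID = 1
      AND NOT EXISTS (SELECT 1 FROM tblOrderActivity b
                      WHERE b.OrderID = a.OrderID AND b.ActivityID = 4)
    ORDER BY a.OrderID
    """
)]

# Conditional aggregation: has a Received row and no Delivered row, whatever the other IDs are
conditional = [r[0] for r in conn.execute(
    """
    SELECT OrderID FROM tblOrderActivity
    GROUP BY OrderID
    HAVING SUM(CASE WHEN ActivityID = 1 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ActivityID = 4 THEN 1 ELSE 0 END) = 0
    ORDER BY OrderID
    """
)]

print(having_max, not_exists, conditional)  # [2, 4] [2, 4] [2, 4]
```

Unlike HAVING MAX(ActivityID) < 4, the conditional version also requires a Received row, so an order with only other activities would not be returned.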

If any clause when grouping

Doing a SUM() on a column adds up the values in that column per group. But let's say I want to sum the values only if none of them is NULL or 0; then I need a clause that checks whether any of the values is 0 before doing the sum. How can I implement such a clause?
I'm using sql server 2005.
Thanks,
Barry
Let's suppose your table schema is:
myTable( id, colA, value)
Then, one approach is:
Select colA, sum(value)
from myTable
group by colA
having count( id ) = count( nullif( value, 0 ))
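Note that NULLIF is part of standard SQL rather than a SQL Server-only function, but check that your RDBMS supports it.
Explanation:
COUNT(expression) counts only non-NULL values. NULLIF(value, 0) turns zeros into NULL, so the two counts are equal only when no value in the group is NULL or 0.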
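A small check of the HAVING approach with Python's sqlite3 and hypothetical rows: group a has no 0 or NULL, b contains a 0 and c contains a NULL, so only a should survive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, colA TEXT, value INTEGER)")
# Hypothetical rows: a is clean, b contains a 0, c contains a NULL
conn.executemany(
    "INSERT INTO myTable (colA, value) VALUES (?, ?)",
    [("a", 2), ("a", 3), ("b", 0), ("b", 2), ("c", None), ("c", 4)],
)

rows = conn.execute(
    """
    SELECT colA, SUM(value)
    FROM myTable
    GROUP BY colA
    -- NULLIF maps 0 to NULL and COUNT(expr) skips NULLs,
    -- so the counts match only when no value is 0 or NULL
    HAVING COUNT(id) = COUNT(NULLIF(value, 0))
    ORDER BY colA
    """
).fetchall()
print(rows)  # [('a', 5)]
```

Groups that fail the test are dropped from the result rather than returned with a sum of 0.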
You say that 0+2+3=0 for this case. Assuming that NULL+2+3 should also be zero:
SELECT GroupField,
SUM(Value) * MIN(CASE WHEN COALESCE(Value, 0) = 0 THEN 0 ELSE 1 END)
FROM SumNonZero
GROUP BY GroupField
The above statement gives this result
GroupField (No column name)
case1 5
case2 0
case3 0
with this test data
CREATE TABLE SumNonZero (
GroupField CHAR(5) NOT NULL,
Value INT
)
INSERT INTO SumNonZero(GroupField, Value)
SELECT 'case1', 2
UNION ALL SELECT 'case1', 3
UNION ALL SELECT 'case2', 0
UNION ALL SELECT 'case2', 2
UNION ALL SELECT 'case2', 3
UNION ALL SELECT 'case3', NULL
UNION ALL SELECT 'case3', 3
UNION ALL SELECT 'case3', 4
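Running that statement and test data through Python's sqlite3, standing in for SQL Server, reproduces the result above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SumNonZero (GroupField CHAR(5) NOT NULL, Value INT)")
conn.executemany(
    "INSERT INTO SumNonZero (GroupField, Value) VALUES (?, ?)",
    [("case1", 2), ("case1", 3), ("case2", 0), ("case2", 2), ("case2", 3),
     ("case3", None), ("case3", 3), ("case3", 4)],
)

rows = conn.execute(
    """
    SELECT GroupField,
           -- MIN(...) is 0 if any value in the group is NULL or 0, otherwise 1
           SUM(Value) * MIN(CASE WHEN COALESCE(Value, 0) = 0 THEN 0 ELSE 1 END) AS Total
    FROM SumNonZero
    GROUP BY GroupField
    ORDER BY GroupField
    """
).fetchall()
print(rows)  # [('case1', 5), ('case2', 0), ('case3', 0)]
```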
It makes no sense to eliminate 0 from a SUM, because it won't affect the sum.
But you may want to SUM based on another field:
select GROUP_FIELD, sum(
case when (OTHER_FIELD > 0) then FIELD
else 0
end)
from TABLE
group by GROUP_FIELD
