Prevent Grouping rows by NULL value - sql-server

According to this article, when you group by a column that contains NULLs, all the NULLs are put into one group in the result set.
However, what I want is to prevent grouping rows by NULL value.
The following code gives me one row:
IF(OBJECT_ID('tempdb..#TestTable') IS NOT NULL)
DROP TABLE #TestTable
GO
CREATE TABLE #TestTable ( ID INT, Value INT )
INSERT INTO #TestTable(ID, Value) VALUES
(NULL, 70),
(NULL, 70)
SELECT
ID
, Value
FROM #TestTable
GROUP BY ID, Value
The output is:
ID Value
NULL 70
However, I would like to have two rows. My desired result looks like this:
NULL 70
NULL 70
Is it possible to have two rows with GROUP BY?
UPDATE:
What I need is to count those rows:
SELECT
COUNT(1) AS rows
FROM (SELECT 1 AS foo
FROM #TestTable
GROUP BY ID, Value
)q
Output: 1
But there are actually two rows, so I need the output to be 2.

What you need is a way to make the NULL values in Id unique. The following code does that while still grouping the non-NULL values normally, because a CASE expression with no ELSE returns NULL for them:
group by Id, case when Id is NULL then NewId() end, Value
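To see the trick outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It is a port under assumptions: SQLite has no NEWID(), so the table's implicit rowid stands in as the per-row unique value. The two (1, 70) rows are hypothetical extras that show non-NULL IDs still collapse into one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (ID INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO TestTable (ID, Value) VALUES (?, ?)",
    [(None, 70), (None, 70), (1, 70), (1, 70)],
)

# SQLite has no NEWID(); the implicit rowid is unique per row and plays that role.
# The CASE has no ELSE, so it yields NULL for non-NULL IDs, and those rows
# still collapse into a single (ID, Value) group.
rows = conn.execute(
    """SELECT ID, Value
       FROM TestTable
       GROUP BY ID, CASE WHEN ID IS NULL THEN rowid END, Value
       ORDER BY ID"""
).fetchall()
print(rows)
```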

Assuming you want this behavior because you do want to group by the values of the nullable column (Id in your example), you can use a common table expression to add a ROW_NUMBER only when Id is NULL. This creates an artificial difference between otherwise duplicate groups, like this:
-- Adding some more rows to the table
INSERT INTO #TestTable(ID, Value) VALUES
(NULL, 70),
(NULL, 70),
(1, 70),
(1, 70),
(2, 70);
The query, with the CTE:
WITH CTE AS
(
SELECT Id, Value, IIF(Id IS NULL, ROW_NUMBER() OVER(ORDER BY Id), NULL) As Surrogate
FROM #TestTable
)
SELECT
ID
, Value
FROM CTE
GROUP BY ID, Surrogate, Value
Results:
ID Value
NULL 70
NULL 70
1 70
2 70
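A minimal sketch of the same surrogate approach in SQLite via Python's sqlite3 module. This is an assumption-based port, not SQL Server: IIF is written as an equivalent CASE, and the sketch starts from a fresh table holding two (NULL, 70) rows plus the answer's three non-NULL rows. It also computes the count from the asker's UPDATE by grouping inside a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (ID INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO TestTable (ID, Value) VALUES (?, ?)",
    [(None, 70), (None, 70), (1, 70), (1, 70), (2, 70)],
)

# Surrogate is a distinct row number for NULL IDs and NULL otherwise,
# so only the NULL-ID rows are kept apart when grouping.
CTE = """WITH CTE AS (
    SELECT ID, Value,
           CASE WHEN ID IS NULL THEN ROW_NUMBER() OVER (ORDER BY ID) END AS Surrogate
    FROM TestTable
)"""

rows = conn.execute(
    CTE + " SELECT ID, Value FROM CTE GROUP BY ID, Surrogate, Value ORDER BY ID"
).fetchall()

# The asker's UPDATE: count the groups rather than list them.
group_count = conn.execute(
    CTE + " SELECT COUNT(*) FROM (SELECT 1 FROM CTE GROUP BY ID, Surrogate, Value)"
).fetchone()[0]
print(rows, group_count)
```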

Related

Why does COALESCE return NULL instead of the value I need?

From the table tblQuoteStatusChangeLog I need to check whether column NewQuoteStatusID has one of these values (2, 25 or 202), and pick the earliest TimeStamp.
So if it has value 2, pick its TimeStamp; if it doesn't have value 2, check whether there is a 25 and pick the corresponding TimeStamp; and if not, check for 202 and pick that TimeStamp.
So from tblQuoteStatusChangeLog I need to pick the first row with StatusID 202, because it's the only one that meets the condition.
So I have this query:
SELECT
(SELECT TOP (1) Timestamp
FROM tblQuoteStatusChangeLog
WHERE NewQuoteStatusID = COALESCE (2,25,202) AND ControlNo = tblQuotes.ControlNo
ORDER BY Timestamp DESC) as DateQuoted
FROM tblQuotes
INNER JOIN tblMaxQuoteIDs ON tblQuotes.QuoteID = tblMaxQuoteIDs.MaxQuoteID
where tblQuotes.ControlNo = 50065
But for some reason I get NULL as the result.
What am I missing here?
Thanks
I don't think COALESCE() is the function you want. COALESCE(2, 25, 202) returns its first non-NULL argument, which is always 2. Your sample data doesn't have the value 2, so that is why the subquery returns NULL.
I think you want IN:
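SELECT (SELECT TOP (1) Timestamp
FROM tblQuoteStatusChangeLog
WHERE NewQuoteStatusID IN (2, 25, 202) AND
ControlNo = tblQuotes.ControlNo
ORDER BY Timestamp DESC
) AS DateQuoted
FROM tblQuotes
INNER JOIN tblMaxQuoteIDs ON tblQuotes.QuoteID = tblMaxQuoteIDs.MaxQuoteID
WHERE tblQuotes.ControlNo = 50065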
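The difference between COALESCE and IN can be reproduced with a small sketch in SQLite through Python's sqlite3 module. The rows for ControlNo 50065 are hypothetical, since the question's sample data is not shown, and LIMIT 1 replaces TOP (1). The last query is an assumed reading of the asker's "prefer 2, then 25, then 202, earliest timestamp" rule; it is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblQuoteStatusChangeLog "
    "(ControlNo INTEGER, NewQuoteStatusID INTEGER, Timestamp TEXT)"
)
# Hypothetical rows: ControlNo 50065 has no status 2 or 25, only two 202 entries.
conn.executemany(
    "INSERT INTO tblQuoteStatusChangeLog VALUES (?, ?, ?)",
    [
        (50065, 1, "2017-01-01 09:00"),
        (50065, 202, "2017-01-02 10:00"),
        (50065, 202, "2017-01-03 11:00"),
    ],
)

def scalar(sql):
    """Run a query and return the single value in its first row."""
    return conn.execute(sql).fetchone()[0]

# COALESCE returns its first non-NULL argument: always the constant 2.
coalesce_value = scalar("SELECT COALESCE(2, 25, 202)")

# So this filter is just NewQuoteStatusID = 2; nothing matches and the
# scalar subquery yields NULL (None in Python).
coalesce_ts = scalar(
    """SELECT (SELECT Timestamp FROM tblQuoteStatusChangeLog
               WHERE NewQuoteStatusID = COALESCE(2, 25, 202) AND ControlNo = 50065
               ORDER BY Timestamp DESC LIMIT 1)"""
)

# IN matches any of the three statuses; DESC picks the latest match.
in_ts = scalar(
    """SELECT (SELECT Timestamp FROM tblQuoteStatusChangeLog
               WHERE NewQuoteStatusID IN (2, 25, 202) AND ControlNo = 50065
               ORDER BY Timestamp DESC LIMIT 1)"""
)

# Assumed priority rule: prefer 2, then 25, then 202, earliest Timestamp first.
priority_ts = scalar(
    """SELECT (SELECT Timestamp FROM tblQuoteStatusChangeLog
               WHERE NewQuoteStatusID IN (2, 25, 202) AND ControlNo = 50065
               ORDER BY CASE NewQuoteStatusID WHEN 2 THEN 1 WHEN 25 THEN 2 ELSE 3 END,
                        Timestamp ASC
               LIMIT 1)"""
)
print(coalesce_value, coalesce_ts, in_ts, priority_ts)
```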

Column '' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

This table is for demo purposes; I have a physical table whose values I need to insert into another table, and it has no primary key. My question is: to get all the data in one SELECT statement with both aggregate values (SUM, AVG, etc.) and non-aggregate fields, is the only way to list all the non-aggregate fields in the GROUP BY clause, or is there some other way? What would be the impact of listing a large number of fields in the GROUP BY clause?
Here is the sample:
CREATE TABLE #SummaryData(
[Col_Name] varchar(20) not NULL,
[Col_Date] datetime NULL,
[ColC] [decimal](18, 4) NULL,
[ColD] [decimal](18, 4) NULL,
[ColE] [decimal](18, 4) NULL
)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('BOA' ,'03/10/2017', 2.4507 ,33536.0000 ,0.0073)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('BOA' , '03/11/2017' , 9.9419,47041.0000, 0.0088)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('Merrill Lynch', '03/10/2017', 2.8152, 32371.0000, 0.0042)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('Merrill Lynch', '03/11/2017', 9.9333, 35671.0000, 0.0444)
--NOTE: Next SELECT will be used to INSERT data into another table, so I need all fields
SELECT [Col_Name],[Col_Date],[ColC],
CASE WHEN SUM([ColE]) > 0 THEN SUM([ColD])/SUM([ColE]) ELSE 0 END AS SomeVal , [ColE]
FROM #SummaryData
GROUP BY [Col_Name],[Col_Date],[ColE],[ColC]
If I do not include ColE and ColC in the GROUP BY clause I get:
Msg 8120, Level 16, State 1, Line 21
Column '#SummaryData.Col_Date' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Whenever you use an aggregate function, every non-aggregated column in your SELECT list must appear in the GROUP BY clause. If you want to insert aggregate values, you need to use GROUP BY. That said, why do you need the SUM function? It is only needed if you have duplicate entries to consolidate. The query below avoids SUM and therefore does not need a GROUP BY.
SELECT [Col_Name],[Col_Date],[ColC],
CASE WHEN [ColE] > 0 THEN [ColD]/[ColE] ELSE 0 END AS SomeVal , [ColE]
FROM #SummaryData
If you want to see all of the records, you can't use GROUP BY at all. If you need intermediate values such as SUM(ColE) and SUM(ColD) over the whole table, you can calculate them and store them in variables, and then use the variables however you want:
DECLARE @SumE DECIMAL(18, 4);
SELECT @SumE = SUM(ColE) FROM #SummaryData;
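A minimal sketch of both suggestions, using SQLite through Python's sqlite3 module. SQLite has no DECLARE or @variables, so a one-row CTE holding SUM(ColE) is cross-joined in place of the variable. The dates are rewritten in ISO format and REAL stands in for DECIMAL(18, 4); both are assumptions of this port.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SummaryData "
    "(Col_Name TEXT NOT NULL, Col_Date TEXT, ColC REAL, ColD REAL, ColE REAL)"
)
conn.executemany(
    "INSERT INTO SummaryData VALUES (?, ?, ?, ?, ?)",
    [
        ("BOA", "2017-03-10", 2.4507, 33536.0, 0.0073),
        ("BOA", "2017-03-11", 9.9419, 47041.0, 0.0088),
        ("Merrill Lynch", "2017-03-10", 2.8152, 32371.0, 0.0042),
        ("Merrill Lynch", "2017-03-11", 9.9333, 35671.0, 0.0444),
    ],
)

# Row-level ratio: no aggregate, so no GROUP BY is needed and every row is kept.
per_row = conn.execute(
    """SELECT Col_Name, Col_Date, ColC,
              CASE WHEN ColE > 0 THEN ColD / ColE ELSE 0 END AS SomeVal, ColE
       FROM SummaryData
       ORDER BY Col_Name, Col_Date"""
).fetchall()

# A one-row CTE plays the role of @SumE; cross-joining it lets every
# detail row see the table-wide total without collapsing any rows.
with_total = conn.execute(
    """WITH Totals AS (SELECT SUM(ColE) AS SumE FROM SummaryData)
       SELECT s.Col_Name, s.Col_Date, s.ColE, t.SumE
       FROM SummaryData AS s CROSS JOIN Totals AS t
       ORDER BY s.Col_Name, s.Col_Date"""
).fetchall()
print(per_row)
print(with_total)
```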
That's correct: the GROUP BY clause must contain all the non-aggregated columns.
But why?
A simple demo:
create table emp (empid int , departmentName varchar(15))
go
insert into emp values (1 , 'HR')
insert into emp values (2 , 'HR')
insert into emp values (3 , 'HR')
insert into emp values (4 , 'Sales')
insert into emp values (5 , 'Sales')
insert into emp values (7 , 'Development')
insert into emp values (8 , 'Development')
insert into emp values (9 , 'Development')
insert into emp values (10 , 'Development')
insert into emp values (11 , 'Development')
The desired result is:
countEmpID departmentName
5 Development
3 HR
2 Sales
To achieve that, you must select COUNT(empid) and departmentName, and then GROUP BY the non-aggregated column (departmentName), because that is what forms the groups:
select count (empid) countEmpID, departmentName
from emp
group by departmentName
If you don't put the non-aggregated columns in the GROUP BY clause, the following error is raised:
Msg 8120, Level 16, State 1, Line 15 Column 'emp.departmentName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
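The same demo as a sketch in SQLite via Python's sqlite3 module. One caveat: SQLite is more permissive than SQL Server and does not raise Msg 8120 when a non-aggregated column is missing from GROUP BY (it returns a value from an arbitrary row of the group), so only the correct query is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empid INTEGER, departmentName TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "HR"), (2, "HR"), (3, "HR"), (4, "Sales"), (5, "Sales"),
     (7, "Development"), (8, "Development"), (9, "Development"),
     (10, "Development"), (11, "Development")],
)

# departmentName is not aggregated, so it is listed in GROUP BY:
# each distinct department becomes one group and COUNT runs per group.
counts = conn.execute(
    """SELECT COUNT(empid) AS countEmpID, departmentName
       FROM emp
       GROUP BY departmentName
       ORDER BY countEmpID DESC"""
).fetchall()
print(counts)
```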
Hope it helps.

Window function behaves differently in Subquery/CTE?

I thought the following three SQL statements were semantically the same, and that the database engine would internally expand the second and third queries into the first one.
select ....
from T
where Id = 1;
select *
from
(select .... from T) t
where Id = 1;
select *
from
(select .... from T where Id = 1) t;
However, I found that the window function behaves differently. Here is my code:
-- Prepare test data
with t1 as
(
select *
from (values ( 2, null), ( 3, 10), ( 5, -1), ( 7, null), ( 11, null), ( 13, -12), ( 17, null), ( 19, null), ( 23, 1759) ) v ( id, col1 )
)
select *
into #t
from t1
alter table #t add primary key (id)
go
The following query returns all the rows.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4)))
over (order by id
rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
id col1 lastval
-------------------
2 NULL NULL
3 10 NULL
5 -1 10
7 NULL -1
11 NULL -1
13 -12 -1
17 NULL -12
19 NULL -12
23 1759 -12
Without a CTE/subquery: I then added a condition to return only the row where Id = 19.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
where
id = 19;
However, lastval returns NULL. Why?
With a CTE/subquery: now the condition is applied to the result of the CTE:
with t as
(
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from
#t)
select *
from t
where id = 19;
-- Subquery
select
*
from
(select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t) t
where
id = 19;
Now lastval returns -12, as expected. Why the difference?
The logical processing order of the SELECT statement is important for understanding the results of your first example. According to the Microsoft documentation, the order, from top to bottom, is:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Note that the WHERE clause processing happens logically before the SELECT clause.
The query without the CTE is filtered by where id = 19. Because of the processing order, the WHERE clause is applied before the window function in the SELECT clause. There is only one row with an id of 19, so WHERE reduces the input to that single row before the window function evaluates its ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING frame. Since that frame contains no rows, lastval is NULL.
Compare this to the CTE. The outer query's filter has not yet been applied, so the CTE operates on all of the data, and the ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING frame finds the prior rows. The outer query then applies the filter to these intermediate results and returns just row 19, which already has the correct lastval.
You can think of the CTE as creating a temporary #Table with the CTE data in it. All of the data is logically processed into a separate table before returning data to the outer query. The CTE in your example creates a temporary work table with all of the rows that includes the lastval from the prior rows. Then, the filter in the outer query gets applied and limits the results to id 19.
(In reality, the CTE can shortcut and skip generating data, if it can do so to improve performance without affecting the results. Itzik Ben-Gan has a great example of a CTE that skips processing when it has returned enough data to satisfy the query.)
Consider what happens if you put the filter in the CTE. This should behave exactly like the first example query that you provided. There is only 1 row with an id = 19, so the window function does not find any preceding rows:
with t as ( select id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over ( order by id
rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from #t
where id = 19 -- moved filter inside CTE
)
select *
from t
Window functions operate on the result set after WHERE has been applied, so when you added where id = 19, your result set had only one row. Since your window function specifies ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, there was no preceding row, and the result was NULL.
By using the subquery/cte you are allowing the window function to operate over the unfiltered result set (where the preceding rows exist), then retrieving only those rows from that result set where id = 19.
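To check the WHERE-before-window-function explanation, here is a sketch in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). SQLite has no binary(4) type, so the "carry the last non-NULL value" trick is ported with a zero-padded text prefix built by printf('%010d', id); that substitution is an assumption of this sketch, not the asker's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, None), (3, 10), (5, -1), (7, None), (11, None),
     (13, -12), (17, None), (19, None), (23, 1759)],
)

# Last non-NULL col1 before the current row: prefix each value with a zero-padded id,
# take MAX over the preceding rows (NULL || text is NULL, and MAX skips NULLs),
# then strip the 10-character prefix and cast back to an integer.
LASTVAL = """CAST(substr(MAX(printf('%010d', id) || col1) OVER (
        ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11) AS INTEGER)"""

# No filter: reproduces the asker's first result table.
all_rows = conn.execute(
    f"SELECT id, col1, {LASTVAL} AS lastval FROM t ORDER BY id"
).fetchall()

# WHERE is applied before the window function, so id 19 has an empty frame.
filtered_first = conn.execute(
    f"SELECT id, col1, {LASTVAL} AS lastval FROM t WHERE id = 19"
).fetchall()

# The window function sees every row inside the subquery; the outer WHERE filters afterwards.
filtered_after = conn.execute(
    f"SELECT * FROM (SELECT id, col1, {LASTVAL} AS lastval FROM t) AS sub WHERE id = 19"
).fetchall()
print([r[2] for r in all_rows], filtered_first, filtered_after)
```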
The queries you are comparing are not equivalent.
select id ,
(... ) as lastval
from #t
where id = 19;
returns only one row, so lastval is NULL: the windowed function finds no preceding row to take col1 from.

Get array of records based on two keys in same table

I have tried this query on the following table:
SELECT DISTINCT
a.main_id,
array_agg(distinct a.secondary_id ) AS arr
FROM table1 a JOIN table1 b ON a.secondary_id = b.secondary_id or a.tertiary_id = b.tertiary_id
group by a.main_id, a.secondary_id , b.tertiary_id
I added DISTINCT to omit the duplicates, but I cannot get the whole row as an element of the array, and the query does not even group the rows into arrays according to the requirement described below. I was following this.
Table script:
Create table table1
(
id bigserial NOT NULL,
main_id integer NOT NULL,
secondary_id integer,
tertiary_id integer,
data1 text,
data2 text,
CONSTRAINT table1_pk PRIMARY KEY (main_id)
)
Data:
INSERT INTO table1(
main_id, secondary_id, tertiary_id, data1, data2)
VALUES (1,2,NULL,'data1_1_2_N','data2_1_2_N'),
(2,2,NULL,'data1_2_2_N','data2_2_2_N'),
(3,3,5,'data1_3_3_5','data2_3_3_5'),
(4,3,5,'data1_4_3_5','data2_4_3_5'),
(5,NULL,1,'data1_5_N_1','data2_5_N_1'),
(6,NULL,1,'data1_6_N_1','data2_6_N_1'),
(7,NULL,1,'data1_7_N_1','data2_7_N_1'),
(8,NULL,2,'data1_8_N_2','data2_8_N_2'),
(9,NULL,2,'data1_9_N_2','data2_9_N_2'),
(10,NULL,3,'data1_10_N_3','data2_10_N_3'),
(11,12,12,'data1_11_12_12','data2_11_12_12'),
(12,12,11,'data1_12_12_11','data2_12_12_11')
Requirement:
If secondary_id is equal in two or more rows they should be considered as one set,
else if tertiary_id is equal they can be considered as one set.
Expected Result:
1 | {(1,2,NULL,'data1_1_2_N','data2_1_2_N'),(2,2,NULL,'data1_2_2_N','data2_2_2_N')}
2 | {(3,3,5,'data1_3_3_5','data2_3_3_5'),(4,3,5,'data1_4_3_5','data2_4_3_5')}
3 | {(5,NULL,1,'data1_5_N_1','data2_5_N_1'),(6,NULL,1,'data1_6_N_1','data2_6_N_1'),(7,NULL,1,'data1_7_N_1','data2_7_N_1')}
4 | {(8,NULL,2,'data1_8_N_2','data2_8_N_2'),(9,NULL,2,'data1_9_N_2','data2_9_N_2')}
5 | {(10,NULL,3,'data1_10_N_3','data2_10_N_3')}
6 | {(11,12,12,'data1_11_12_12','data2_11_12_12'),(12,12,11,'data1_12_12_11','data2_12_12_11') }
Version: PostgreSQL 9.3.11
This should produce your output. The trick lies in a conditional GROUP BY clause, which handles the cases where secondary_id and tertiary_id are both set for a record that has a matching record on both of those fields.
select array_agg(distinct t1)
from table1 t1
join table1 t2 on
t1.secondary_id = t2.secondary_id
or t1.tertiary_id = t2.tertiary_id
group by
case
when t1.secondary_id is null or t1.tertiary_id is null
then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id
then t1.secondary_id::TEXT -- #2
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id
then t1.tertiary_id::TEXT -- #3
end
order by 1
The standard case, #1, is when either field is null. We need to group by both columns, so we trick it by concatenating the values of both columns with a # separator and grouping by this concatenated value.
For #2 and #3 we need to cast the grouping value to text so the query is accepted (all branches of a CASE expression must return the same type).
Option #2 covers the case where both values are not null and secondary_id matches between the rows paired by the self-join. Option #3 is analogous, but for a tertiary_id match.
Output:
array_agg
------------------------------------------------------------------------------------------------------------
{"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
{"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
{"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
{"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
{"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
{"(11,11,4,4,data1_11_4_4,data2_11_4_4)","(12,12,4,11,data1_12_4_11,data2_12_4_11)"}
If you'd like to remove the id column from your records, you could use a CTE that selects all columns except id and then refer to that CTE in the FROM clause.
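A sketch of the same conditional GROUP BY ported to SQLite through Python's sqlite3 module, mainly to check the grouping keys against the requirement. Assumptions of this port: SQLite has no array_agg, so group_concat(DISTINCT main_id) lists the members of each set; ifnull(x, '') || ... emulates PostgreSQL's NULL-ignoring concat(); the id, data1 and data2 columns are omitted; and branch #1 tests tertiary_id IS NULL, as the explanation describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 "
    "(main_id INTEGER PRIMARY KEY, secondary_id INTEGER, tertiary_id INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 2, None), (2, 2, None), (3, 3, 5), (4, 3, 5),
     (5, None, 1), (6, None, 1), (7, None, 1), (8, None, 2),
     (9, None, 2), (10, None, 3), (11, 12, 12), (12, 12, 11)],
)

result = conn.execute(
    """SELECT group_concat(DISTINCT t1.main_id)
       FROM table1 t1
       JOIN table1 t2
         ON t1.secondary_id = t2.secondary_id OR t1.tertiary_id = t2.tertiary_id
       GROUP BY CASE
           -- #1: either key is NULL; group by 'secondary#tertiary'
           WHEN t1.secondary_id IS NULL OR t1.tertiary_id IS NULL
               THEN ifnull(t1.secondary_id, '') || '#' || ifnull(t1.tertiary_id, '')
           -- #2: both keys present (#1 caught NULLs) and secondary_id matched
           WHEN t1.secondary_id = t2.secondary_id
               THEN CAST(t1.secondary_id AS TEXT)
           -- #3: both keys present and tertiary_id matched
           WHEN t1.tertiary_id = t2.tertiary_id
               THEN CAST(t1.tertiary_id AS TEXT)
       END"""
).fetchall()

# group_concat returns strings like "1,2" in no guaranteed order, so normalize them.
groups = sorted(sorted(int(x) for x in ids.split(",")) for (ids,) in result)
print(groups)
```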

sql server TOP command with order by clause

CREATE DATABASE TEST
USE TEST
CREATE TABLE TBL_TEMP
(
ID INT,
NAME VARCHAR(100),
CREATED_ON DATETIME
)
INSERT INTO TBL_TEMP VALUES (1, 'A', NULL)
INSERT INTO TBL_TEMP VALUES (2, 'B', NULL)
INSERT INTO TBL_TEMP VALUES (3, 'C', NULL)
INSERT INTO TBL_TEMP VALUES (4, 'D', NULL)
SELECT TOP 1 *
FROM TBL_TEMP
ORDER BY CREATED_ON
Result:
ID NAME CREATED_ON
------------------
2 B NULL
SELECT TOP 1 * FROM TBL_TEMP
Result:
ID NAME CREATED_ON
--------------------
1 A NULL
Why does TOP 1 give two different results? When an ORDER BY clause is used, does it pick a random row, and when it is not used, does it return the proper top record?
Is this some kind of bug in SQL Server 2008?
SQL does not guarantee an order unless you specify an ORDER BY clause, so in the second example you get the first-inserted row by good fortune.
If you specify an ORDER BY clause, the order is still undefined among rows whose sort values are identical. All four CREATED_ON values are NULL, so SQL Server could have returned any one of the four rows.
This is not a bug, but defined behaviour in SQL.
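If a repeatable result is needed, a common fix is to add a unique column as a tiebreaker, for example SELECT TOP 1 * FROM TBL_TEMP ORDER BY CREATED_ON, ID in SQL Server. A minimal sketch of the idea in SQLite via Python's sqlite3 module, where LIMIT 1 plays the role of TOP 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_TEMP (ID INTEGER, NAME TEXT, CREATED_ON TEXT)")
conn.executemany(
    "INSERT INTO TBL_TEMP VALUES (?, ?, NULL)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D")],
)

# Every CREATED_ON is NULL, so ORDER BY CREATED_ON alone ties on all rows.
# Adding the unique ID as a second sort key makes the chosen row deterministic.
first = conn.execute(
    "SELECT * FROM TBL_TEMP ORDER BY CREATED_ON, ID LIMIT 1"
).fetchone()
print(first)
```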
