NOT IN filters out NULL values - sql-server

I was trying to filter out some predefined values from a table named sample, which has two columns, col1 and col2.
My query:
select *
from sample
where (col1 is not null or col2 is not null)
and col1 not in (1,2)
and col2 not in (3,4);
However, the above query filters out every row with a NULL in col1 or col2.
For example, the rows below are filtered out:
col1 col2
---------
7 null
null 8
I get the expected result when I modify the query as follows:
select *
from sample
where (col1 is not null or col2 is not null)
and (col1 not in (1,2) or col1 is null)
and (col2 not in (3,4) or col2 is null);
Why does NOT IN filter out rows with NULL values even though I did not specify NULL in the NOT IN list?

Nothing is equal to NULL, and nothing is not equal to NULL either. An expression like NULL NOT IN (1,2) evaluates to UNKNOWN, which (importantly) is not TRUE, so the WHERE condition is not met for that row. This is why your second query, where you handle the NULLs explicitly, works.
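To see this in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: SQLite applies the same three-valued logic to NOT IN, but it is not SQL Server). The rows are the sample rows from the NOT EXISTS example in this thread; the driver reports UNKNOWN as None.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO sample (col1, col2) VALUES (?, ?)",
    [(1, 3), (2, 4), (3, 5), (7, None), (None, 8)],
)

# NULL NOT IN (1, 2) is UNKNOWN; sqlite3 hands it back as None, not True.
unknown = conn.execute("SELECT NULL NOT IN (1, 2)").fetchone()[0]

# The original query: rows whose predicate is UNKNOWN are dropped by WHERE.
original = conn.execute(
    """SELECT col1, col2 FROM sample
       WHERE (col1 IS NOT NULL OR col2 IS NOT NULL)
         AND col1 NOT IN (1, 2)
         AND col2 NOT IN (3, 4)
       ORDER BY col1"""
).fetchall()

# The fixed query: UNKNOWN OR TRUE is TRUE, so rows with a NULL survive.
fixed = conn.execute(
    """SELECT col1, col2 FROM sample
       WHERE (col1 IS NOT NULL OR col2 IS NOT NULL)
         AND (col1 NOT IN (1, 2) OR col1 IS NULL)
         AND (col2 NOT IN (3, 4) OR col2 IS NULL)
       ORDER BY col1"""
).fetchall()

print(unknown)   # None
print(original)  # [(3, 5)]
print(fixed)     # [(None, 8), (3, 5), (7, None)]
```

SQLite sorts NULLs first in ascending order, which is why (None, 8) leads the fixed result.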
Alternatively, you could use NOT EXISTS. It is perhaps not as intuitive, but it handles NULL values:
WITH VTE AS(
SELECT *
FROM (VALUES(1,3),(2,4),(3,5),(7,NULL),(NULL,8))V(col1,col2))
SELECT *
FROM VTE
WHERE NOT EXISTS(SELECT 1
WHERE Col1 IN (1,2) --FALSE or UNKNOWN returns no row, so NOT EXISTS keeps the outer row
OR Col2 IN (3,4));
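A minimal sketch of the NOT EXISTS pattern, run through Python's sqlite3 module as a stand-in for SQL Server (assumption: SQLite does not accept the V(col1,col2) derived-table alias, so the column list moves onto the CTE name). The inner condition is written as col1 IN (1,2) OR col2 IN (3,4), the negation of the question's two NOT IN conditions. The point is that a FALSE or UNKNOWN inner WHERE produces no row, so NOT EXISTS keeps the outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite puts the column list on the CTE name instead of V(col1, col2).
query = """
WITH VTE(col1, col2) AS (
    VALUES (1, 3), (2, 4), (3, 5), (7, NULL), (NULL, 8)
)
SELECT col1, col2
FROM VTE
WHERE NOT EXISTS (
    -- if this WHERE is FALSE or UNKNOWN, the inner SELECT returns no row,
    -- so NOT EXISTS is TRUE and the outer row is kept
    SELECT 1
    WHERE col1 IN (1, 2)
       OR col2 IN (3, 4)
)
ORDER BY col1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(None, 8), (3, 5), (7, None)]
```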

Try this
SET ANSI_NULLS OFF
select *
from sample
where (col1 is not null or col2 is not null)
and col1 not in (1,2)
and col2 not in (3,4);
ANSI_NULLS ON/OFF:
This option specifies how the = and <> operators treat NULL. When it is ON, comparing a value with NULL yields UNKNOWN, so a query such as WHERE col = NULL returns zero rows. When it is OFF, WHERE col = NULL returns the rows where col is NULL (https://blog.sqlauthority.com/2007/03/05/sql-server-quoted_identifier-onoff-and-ansi_null-onoff-explanation/#:~:text=ANSI%20NULL%20ON%2FOFF%3A,null%20returns%20a%20null%20value.). Note that the setting only affects comparisons in which one operand is a NULL literal or a variable that is NULL.
This is also discussed in https://stackoverflow.com/questions/129077/null-values-inside-not-in-clause#:~:text=NOT%20IN%20returns%200%20records,not%20the%20value%20being%20tested.

Related

How to derive column value based on occurrence of a phrase in Snowflake

I have an input table as below.
I want a derived column with the following logic:
For a given value of COL1, if COL2 contains 'ABC' in any of its rows, then DERIVED_COL is filled with 'ABC_FIXED'; if COL2 does not contain 'ABC' in any row for that COL1 value, then DERIVED_COL is filled with 'ABC_NONFIXED'.
Is this possible in Snowflake?
Just do a self-join with the subset of the same table where col2='ABC'. If the join produces a match, the row is ABC fixed; otherwise it is ABC not fixed.
select orig.col1,orig.col2,
case when abc.col2 is not null then 'ABC_FIXED' else 'ABC_NOTFIXED' end derived_col
from mytable orig
left join (select distinct col1, col2 from mytable where col2='ABC') abc
on abc.col1=orig.col1
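A minimal sketch of the self-join, run through Python's sqlite3 module. The question's input table was not included, so the rows below are hypothetical: group 'A' contains 'ABC' and group 'B' does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
# Hypothetical rows: group 'A' contains 'ABC', group 'B' does not.
conn.executemany(
    "INSERT INTO mytable (col1, col2) VALUES (?, ?)",
    [("A", "ABC"), ("A", "XYZ"), ("B", "XYZ"), ("B", "DEF")],
)

rows = conn.execute(
    """
    SELECT orig.col1, orig.col2,
           CASE WHEN abc.col2 IS NOT NULL THEN 'ABC_FIXED'
                ELSE 'ABC_NOTFIXED' END AS derived_col
    FROM mytable orig
    -- one row per col1 that has 'ABC'; DISTINCT stops duplicate matches
    LEFT JOIN (SELECT DISTINCT col1, col2 FROM mytable WHERE col2 = 'ABC') abc
           ON abc.col1 = orig.col1
    ORDER BY orig.col1, orig.col2
    """
).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps every original row; only the presence or absence of a match decides the label.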
Using windowed function:
SELECT *, CASE WHEN COUNT_IF(COL2 = 'ABC') OVER(PARTITION BY COL1) > 0
THEN 'ABC_FIXED' ELSE 'ABC_NOTFIXED'
END AS DERIVED_COL
FROM tab;
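SQLite has no COUNT_IF, so this sketch swaps in SUM(CASE WHEN col2 = 'ABC' THEN 1 ELSE 0 END), which counts the same rows per partition. It assumes SQLite 3.25 or newer (for window functions) and uses hypothetical rows, since the question's table was not included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT)")
# Hypothetical rows: group 'A' contains 'ABC', group 'B' does not.
conn.executemany(
    "INSERT INTO tab (col1, col2) VALUES (?, ?)",
    [("A", "ABC"), ("A", "XYZ"), ("B", "XYZ"), ("B", "DEF")],
)

# SUM(CASE ...) counts the 'ABC' rows in each col1 partition, like COUNT_IF.
rows = conn.execute(
    """
    SELECT col1, col2,
           CASE WHEN SUM(CASE WHEN col2 = 'ABC' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY col1) > 0
                THEN 'ABC_FIXED' ELSE 'ABC_NOTFIXED'
           END AS derived_col
    FROM tab
    ORDER BY col1, col2
    """
).fetchall()
for row in rows:
    print(row)
```

Unlike the self-join, the window version reads the table once and needs no DISTINCT to avoid duplicates.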

Comparing integers in columns with nulls?

I have a table with two integer columns. NULLs are allowed in each column. The following query works:
select * from table where col1=1 and col2 = 2
but neither of the following two queries works:
select * from table where col1=1 and col2 != 2
or
select * from table where col1=1 and col2 <> 2
I am aware that comparison operators are not supposed to work for columns that contain NULL, but '=' is a comparison operator and the first query above works. I do not understand why the first query works and the second does not. (If you see any typos, just ignore them; I tested this with real code, and any mistakes were introduced when I transcribed it into this question.)
Here are two SQL statements that will let you create a table and insert data into it for testing the above queries.
CREATE TABLE Test (
ID int,
Col1 int,
Col2 int)
and the insert statement:
INSERT INTO test
(id, col1 , col2)
VALUES
(1,1,NULL),
(2,NULL,2),
(3,1,2)
It may help you understand by examining the results of each query predicate individually using the sample data. Note that both conditions must evaluate to TRUE for a row to be returned due to the AND logical operator.
select * from Test where col1=1 and col2 = 2;
VALUES
(1,1,NULL), --col1=1 is TRUE and col2 = 2 is UNKNOWN
(2,NULL,2), --col1=1 is UNKNOWN and col2 = 2 is TRUE
(3,1,2) --col1=1 is TRUE and col2 = 2 is TRUE: row returned because both are TRUE
select * from Test where col1=1 and col2 <> 2;
VALUES
(1,1,NULL), --col1=1 is TRUE and col2 <> 2 is UNKNOWN
(2,NULL,2), --col1=1 is UNKNOWN and col2 <> 2 is FALSE
(3,1,2) --col1=1 is TRUE and col2 <> 2 is FALSE
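The per-row walkthrough above can be reproduced with a minimal sketch in Python's sqlite3 module (an assumption: SQLite stands in for SQL Server; the Test table and rows are the ones from the question). SQLite returns 1 for TRUE, 0 for FALSE and None for UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER, Col1 INTEGER, Col2 INTEGER)")
conn.executemany(
    "INSERT INTO Test (ID, Col1, Col2) VALUES (?, ?, ?)",
    [(1, 1, None), (2, None, 2), (3, 1, 2)],
)

# Each predicate on its own: 1 = TRUE, 0 = FALSE, None = UNKNOWN.
truth = conn.execute(
    "SELECT ID, Col1 = 1, Col2 = 2, Col2 <> 2 FROM Test ORDER BY ID"
).fetchall()
for row in truth:
    print(row)
# (1, 1, None, None)
# (2, None, 1, 0)
# (3, 1, 1, 0)

# WHERE keeps only rows where the whole AND is TRUE.
eq_rows = conn.execute(
    "SELECT ID FROM Test WHERE Col1 = 1 AND Col2 = 2"
).fetchall()
ne_rows = conn.execute(
    "SELECT ID FROM Test WHERE Col1 = 1 AND Col2 <> 2"
).fetchall()
print(eq_rows, ne_rows)  # [(3,)] []
```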
If col2 is either 2 or NULL, as in your comment, then you need to use IS NULL:
select * from table where col1=1 and col2 IS NULL
Or convert the NULL values to some other, unused value:
select * from table where col1=1 and ISNULL(col2, 0) != 2
To me, though, it sounds like those columns should really be bit fields.
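Both workarounds can be checked with a minimal sqlite3 sketch on the question's Test rows. SQLite has no ISNULL function, so COALESCE(Col2, 0) stands in for SQL Server's ISNULL(col2, 0); the sentinel 0 is assumed never to be a real value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER, Col1 INTEGER, Col2 INTEGER)")
conn.executemany(
    "INSERT INTO Test (ID, Col1, Col2) VALUES (?, ?, ?)",
    [(1, 1, None), (2, None, 2), (3, 1, 2)],
)

# Option 1: test for NULL explicitly.
is_null_rows = conn.execute(
    "SELECT ID FROM Test WHERE Col1 = 1 AND Col2 IS NULL"
).fetchall()

# Option 2: map NULL to a sentinel (0, assumed never to be a real value).
# COALESCE stands in for SQL Server's ISNULL here.
sentinel_rows = conn.execute(
    "SELECT ID FROM Test WHERE Col1 = 1 AND COALESCE(Col2, 0) <> 2"
).fetchall()

print(is_null_rows, sentinel_rows)  # [(1,)] [(1,)]
```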

Using IS NOT NULL for multiple columns

I want to check IS NOT NULL for multiple columns in a single SQL statement's WHERE clause. Is there a way to do so?
Also, I don't want to enforce a NOT NULL constraint in the column definitions.
SELECT * FROM AB_DS_TRANSACTIONS
WHERE FK_VIOLATION IS NULL
AND TRANSACTION_ID NOT IN(
SELECT distinct TRANSACTION_ID FROM AB_TRANSACTIONS)
AND COUNTRY_ID IS NOT NULL
AND GEO_CUST_COUNTRY_ID IS NOT NULL
AND INVOICE_DATE IS NOT NULL
AND ABB_GLOBALID IS NOT NULL
AND SALES_ORG_ID IS NOT NULL
AND DIST_ID IS NOT NULL
AND CUSTOMER_ID IS NOT NULL
AND REPORT_UNIT_ID IS NOT NULL
AND CURR_INVOICE IS NOT NULL
AND DIVISION_CODE IS NOT NULL
So instead of writing IS NOT NULL again and again, I want to simplify things.
You can use
SELECT * FROM table1
WHERE NOT (Column1 IS NULL OR
Column2 IS NULL OR
Column3 IS NULL OR
Column4 IS NULL)
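A minimal sqlite3 sketch of why NOT (... IS NULL OR ...) works: by De Morgan's law it equals a chain of IS NOT NULL tests joined with AND, and IS NULL never yields UNKNOWN, so the rewrite is exact. The rows in table1 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Column1, Column2, Column3, Column4)")
# Hypothetical rows: only the first has no NULLs.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("a", "b", "c", "d"), ("a", None, "c", "d"), (None, None, None, None)],
)

# NOT (any column IS NULL) ...
not_any_null = conn.execute(
    """SELECT * FROM table1
       WHERE NOT (Column1 IS NULL OR Column2 IS NULL
                  OR Column3 IS NULL OR Column4 IS NULL)"""
).fetchall()

# ... is the same, by De Morgan's law, as every column IS NOT NULL.
all_not_null = conn.execute(
    """SELECT * FROM table1
       WHERE Column1 IS NOT NULL AND Column2 IS NOT NULL
         AND Column3 IS NOT NULL AND Column4 IS NOT NULL"""
).fetchall()

print(not_any_null)  # [('a', 'b', 'c', 'd')]
```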
As per the OP's comment, updating the answer:
Inserting Rows by Using INSERT and SELECT Subqueries
INSERT INTO Table_A
SELECT column1, column2, column3,column4
FROM Table_B
WHERE NOT (Column1 IS NULL OR
Column2 IS NULL OR
Column3 IS NULL OR
Column4 IS NULL);
Your query, which I was able to shorten by about 50 characters:
SELECT * FROM AB_DS_TRANSACTIONS
WHERE
FK_VIOLATION IS NULL
AND TRANSACTION_ID NOT
IN(SELECT distinct TRANSACTION_ID FROM AB_TRANSACTIONS)
AND
NOT (
COUNTRY_ID IS NULL
OR GEO_CUST_COUNTRY_ID IS NULL
OR INVOICE_DATE IS NULL
OR ABB_GLOBALID IS NULL
OR SALES_ORG_ID IS NULL
OR DIST_ID IS NULL
OR CUSTOMER_ID IS NULL
OR REPORT_UNIT_ID IS NULL
OR CURR_INVOICE IS NULL
OR DIVISION_CODE IS NULL
)
SELECT * FROM AB_DS_TRANSACTIONS
WHERE COALESCE(COUNTRY_ID,GEO_CUST_COUNTRY_ID,INVOICE_DATE,ABB_GLOBALID,SALES_ORG_ID,DIST_ID,CUSTOMER_ID,REPORT_UNIT_ID,CURR_INVOICE,DIVISION_CODE) IS NOT NULL
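It is worth checking which rows the COALESCE version keeps: COALESCE(...) IS NOT NULL is TRUE as soon as any one argument is non-NULL, so it tests "not all NULL" rather than "all not NULL". A minimal sqlite3 sketch with a hypothetical three-column table t shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, A TEXT, B TEXT, C TEXT)")
# Hypothetical rows: no NULLs, some NULLs, all NULLs.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "x", "y", "z"), (2, None, "y", "z"), (3, None, None, None)],
)

# COALESCE(...) IS NOT NULL: at least one column is non-NULL.
any_not_null = conn.execute(
    "SELECT ID FROM t WHERE COALESCE(A, B, C) IS NOT NULL ORDER BY ID"
).fetchall()

# NOT (... IS NULL OR ...): every column is non-NULL.
all_not_null = conn.execute(
    "SELECT ID FROM t WHERE NOT (A IS NULL OR B IS NULL OR C IS NULL) ORDER BY ID"
).fetchall()

print(any_not_null, all_not_null)  # [(1,), (2,)] [(1,)]
```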
I think the syntax you are looking for, to show only those rows where no column is NULL, is this:
SELECT * from table_B
where COLUMN1 is not null and COLUMN2 is not null and COLUMN3 is not null
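In BigQuery (it might work in other databases), I would use the CONCAT function. Make sure to have strings, or cast fields to strings where needed. CONCAT returns NULL if any of its arguments is NULL. (MySQL's CONCAT behaves the same way, but SQL Server's CONCAT treats NULL as an empty string, so the trick does not work there.)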
SELECT * FROM AB_DS_TRANSACTIONS
WHERE FK_VIOLATION IS NULL
AND TRANSACTION_ID NOT IN(
SELECT distinct TRANSACTION_ID FROM AB_TRANSACTIONS)
AND concat(COUNTRY_ID,GEO_CUST_COUNTRY_ID,INVOICE_DATE,ABB_GLOBALID,SALES_ORG_ID,DIST_ID,CUSTOMER_ID,REPORT_UNIT_ID,CURR_INVOICE,DIVISION_CODE) IS NOT NULL
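SQLite's || operator returns NULL when either operand is NULL, like BigQuery's CONCAT, so a minimal sqlite3 sketch can show the idea (assumption: || stands in for CONCAT, and the table t and its rows are hypothetical). The integer column is cast to text, as the answer suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, A TEXT, B TEXT, C INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "x", "y", 7), (2, None, "y", 7), (3, "x", "y", None)],
)

# '||' yields NULL if any operand is NULL; CAST turns the integer column into text.
rows = conn.execute(
    "SELECT ID FROM t WHERE (A || B || CAST(C AS TEXT)) IS NOT NULL ORDER BY ID"
).fetchall()
print(rows)  # [(1,)]
```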
I think one strategy could be to use the LEAST function.
The limitation here is that all the arguments must be of the same type, so it looks like the OP's columns may need to be converted to strings or something similar.
If we can get past that limitation, I think the query below should work to check whether any of the columns are NULL:
SELECT *
FROM AB_DS_TRANSACTIONS
WHERE FK_VIOLATION IS NULL
AND TRANSACTION_ID NOT IN( SELECT distinct TRANSACTION_ID FROM AB_TRANSACTIONS)
AND least(COUNTRY_ID
,GEO_CUST_COUNTRY_ID
,INVOICE_DATE
,ABB_GLOBALID
,SALES_ORG_ID
,DIST_ID
,CUSTOMER_ID
,REPORT_UNIT_ID
,CURR_INVOICE
,DIVISION_CODE
) IS NOT NULL
;
P.S. I'm using Snowflake, where LEAST returns NULL if any of its arguments is NULL.
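SQLite's multi-argument min() behaves like Snowflake's LEAST here (it returns NULL if any argument is NULL), so a minimal sqlite3 sketch with a hypothetical table t can show the check. Not every database's LEAST does this: PostgreSQL and SQL Server ignore NULL arguments, which would break the trick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 5, 2, 9), (2, 5, None, 9), (3, None, None, None)],
)

# Multi-argument min() returns NULL if any argument is NULL.
rows = conn.execute(
    "SELECT ID, min(A, B, C) FROM t ORDER BY ID"
).fetchall()
kept = conn.execute(
    "SELECT ID FROM t WHERE min(A, B, C) IS NOT NULL ORDER BY ID"
).fetchall()
print(rows)  # [(1, 2), (2, None), (3, None)]
print(kept)  # [(1,)]
```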
This seems like the most elegant option
(at least for newer versions of MSSQL):
SELECT * FROM tbl WHERE NOT COL1+COL2+COL3+COL4 IS NULL
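A minimal sqlite3 sketch of the same idea with integer columns (assumption: SQLite stands in for MSSQL and the rows in tbl are hypothetical). Any NULL operand makes the whole sum NULL; with mixed data types the + expression may instead fail to convert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (ID INTEGER, COL1 INTEGER, COL2 INTEGER, COL3 INTEGER, COL4 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, 3, 4), (2, 1, None, 3, 4)],
)

# NOT binds looser than IS, so this reads NOT ((COL1 + ... + COL4) IS NULL).
rows = conn.execute(
    "SELECT ID FROM tbl WHERE NOT COL1 + COL2 + COL3 + COL4 IS NULL"
).fetchall()
print(rows)  # [(1,)]
```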

Select Where clause

I am doing a Select in SQL Server, and say my Select is pulling the data like this:
ID Col1 Col2 Col3 Col4
1 xx xx xx xx
2 null null null null
3 xx xx null null
I only want the records where not all the columns are NULL. In the above data, I don't want the row where ID = 2.
How can I do this in where clause?
Thanks.
Do they all have the same datatype? If so
WHERE COALESCE(Col1, Col2, Col3, Col4) IS NOT NULL
An alternative to Martin's solution:
Where Col1 Is Not Null
Or Col2 Is Not Null
Or Col3 Is Not Null
Or Col4 Is Not Null
Note that if the columns are not all implicitly convertible to the same data type (for example, all varchar or all int), COALESCE can throw an error.
Where not ( Col1 Is Null
And Col2 Is Null
And Col3 Is Null
And Col4 Is Null)
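The three predicates above (COALESCE, the OR chain, and NOT over an AND chain) should all drop only the all-NULL row. A minimal sqlite3 sketch on the question's sample data checks this; the table name t is hypothetical because the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "xx", "xx", "xx", "xx"),
        (2, None, None, None, None),
        (3, "xx", "xx", None, None),
    ],
)

predicates = {
    "coalesce": "COALESCE(Col1, Col2, Col3, Col4) IS NOT NULL",
    "or_chain": "Col1 IS NOT NULL OR Col2 IS NOT NULL OR Col3 IS NOT NULL OR Col4 IS NOT NULL",
    "not_and": "NOT (Col1 IS NULL AND Col2 IS NULL AND Col3 IS NULL AND Col4 IS NULL)",
}

# Each predicate should drop only the all-NULL row (ID = 2).
results = {
    name: conn.execute(f"SELECT ID FROM t WHERE {pred} ORDER BY ID").fetchall()
    for name, pred in predicates.items()
}
for name, ids in results.items():
    print(name, ids)  # each prints [(1,), (3,)]
```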
This makes sure that none of the columns are NULL:
Select Col1, Col2, Col3, Col4
from table
where Col1>''
and Col2>''
and Col3>''
and Col4>''
What this does is select all rows that actually have data in every column. However, if you have char fields that are blank but not NULL and you need to select them, you'll have to change this to >=. I believe this query will also be able to use a covering index.
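A minimal sqlite3 sketch of the > '' versus >= '' difference, using hypothetical rows in a table t: a blank string fails > '' but passes >= '', while NULL fails both because the comparison is UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "xx", "xx", "xx", "xx"),  # fully populated
        (2, "xx", "xx", "", ""),      # blank but not NULL
        (3, "xx", "xx", None, None),  # NULLs
    ],
)

def ids(op):
    # NULL > '' and NULL >= '' are both UNKNOWN, so row 3 never qualifies.
    sql = (f"SELECT ID FROM t WHERE Col1 {op} '' AND Col2 {op} '' "
           f"AND Col3 {op} '' AND Col4 {op} '' ORDER BY ID")
    return conn.execute(sql).fetchall()

strict = ids(">")      # [(1,)]
inclusive = ids(">=")  # [(1,), (2,)]
print(strict, inclusive)
```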

In SQL Server, I need to select one of two columns; can a CASE statement do this?

My table:
Users (userID, col1, col2)
I want to make a generic stored procedure, but I need to return EITHER col1 or col2 in a query.
Can a CASE statement handle this situation?
SELECT userID, col1
FROM Users
OR
SELECT userID, col2
FROM Users
Using CASE:
SELECT t.userid,
CASE
WHEN [something to evaluate why to show col1 vs col2 ] THEN
t.col1
ELSE
t.col2
END
FROM USERS t
Using COALESCE:
SELECT t.userid,
COALESCE(t.col1, t.col2)
FROM USERS t
COALESCE returns the first column value that isn't null, starting from the left.
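A minimal sqlite3 sketch of the COALESCE variant with hypothetical Users rows. Note that it chooses between the columns based on which one is NULL, not on a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (userID INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "a1", "b1"), (2, None, "b2"), (3, None, None)],
)

# COALESCE picks col1 when it is not NULL, otherwise col2 (NULL if both are NULL).
rows = conn.execute(
    "SELECT t.userID, COALESCE(t.col1, t.col2) FROM Users t ORDER BY t.userID"
).fetchall()
print(rows)  # [(1, 'a1'), (2, 'b2'), (3, None)]
```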
Yes, as long as you don't mind returning the same column names:
SELECT userID, ArbitraryCol = CASE WHEN @param = 1 then col1 ELSE col2 END
FROM Users
If you need the column headers to change, then you should use an IF statement:
IF @param = 1
BEGIN
SELECT userID, col1 FROM Users
END
ELSE
BEGIN
SELECT userID, col2 FROM Users
END
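A minimal sketch of the parameter-driven CASE, using Python's sqlite3 module with a ? placeholder standing in for the stored procedure's parameter (assumption: the select_column helper and the Users rows are hypothetical). It also shows that the column header stays ArbitraryCol whichever column is chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (userID INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "a1", "b1"), (2, "a2", "b2")],
)

def select_column(param):
    # The ? placeholder stands in for the stored procedure's parameter.
    cur = conn.execute(
        """SELECT userID,
                  CASE WHEN ? = 1 THEN col1 ELSE col2 END AS ArbitraryCol
           FROM Users ORDER BY userID""",
        (param,),
    )
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

print(select_column(1))  # (['userID', 'ArbitraryCol'], [(1, 'a1'), (2, 'a2')])
print(select_column(2))  # (['userID', 'ArbitraryCol'], [(1, 'b1'), (2, 'b2')])
```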
I think in SQL Server 2000/2005 it is something like this:
select
UserID,
case UserID
when 1 then Col1
when 2 then Col2
else 'boo'
end as SpecialSituation
from
Users
It can, assuming you have a logical expression for when to return col1 instead of col2:
SELECT USERID, CASE WHEN USERTYPE='X' THEN COL1 ELSE COL2 END FROM USERS
If col1 and col2 are of different types, you will need to use CAST or CONVERT to convert one to the other.
