Better way to join null values than picking a magic value? - sql-server

I need to join two tables that are more or less the same (one is a staging table data to go in the other).
Some of the columns are nullable, and when the values are null, the join in my merge statement does not match. (This is normal behavior for nulls.)
The problem is that when they don't match, the row is deleted and recreated, which changes the identity value of the row in the actual table.
I know that I can do something like this to join nulls:
on coalesce(target.SomeId, -9999) = coalesce(source.SomeId, -9999)
But I don't like having to pick out a number that I hope will never be used. (It feels dirty.)
Is there a better way to make a join on a nullable column than using a magic number like this?

Let's go with this:
target.SomeId = source.SomeId
or (target.SomeId is null and source.SomeId is null)
Conceptually, this should make sense: either both values are equal to each other or both values are null. It should also perform better, because wrapping the columns in coalesce makes the predicate non-sargable and tends to force a scan instead of an index seek. I've converted the coalesce style to the form above and seen tremendous performance gains.
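As a rough sketch of how that condition might sit inside the merge statement from the question: the table names dbo.TargetTable and dbo.StagingTable and the Name column are hypothetical, since the question does not show the real schema.

```sql
-- Sketch only: dbo.TargetTable, dbo.StagingTable and Name are assumed names.
MERGE dbo.TargetTable AS target
USING dbo.StagingTable AS source
    ON target.SomeId = source.SomeId
    OR (target.SomeId IS NULL AND source.SomeId IS NULL)
WHEN MATCHED THEN
    -- Row is updated in place, so its identity value is preserved
    UPDATE SET target.Name = source.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SomeId, Name) VALUES (source.SomeId, source.Name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

If more than one staging row matches the same target row (for example several rows with a NULL SomeId and no other key column in the ON clause), MERGE raises an error, so in practice the nullable column would normally be only one part of a wider join key.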

I almost exclusively use the following pattern
ON EXISTS (SELECT target.SomeId INTERSECT SELECT source.SomeId)
after picking it up from Paul White's blog post here.
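A small self-contained sketch of why this works: INTERSECT compares values as "not distinct", so two NULLs count as a match. The table variables and sample rows below are made up for illustration.

```sql
-- Illustrative data only; not taken from the question.
DECLARE @Source TABLE (SomeId INT NULL, Name VARCHAR(10) NOT NULL);
DECLARE @Target TABLE (SomeId INT NULL, Name VARCHAR(10) NOT NULL);
INSERT INTO @Source VALUES (1, 'S-A'), (NULL, 'S-B');
INSERT INTO @Target VALUES (1, 'T-A'), (NULL, 'T-B'), (2, 'T-C');

SELECT s.Name AS SourceName, t.Name AS TargetName
FROM @Source AS s
INNER JOIN @Target AS t
    -- INTERSECT returns a row when both values are equal or both are NULL
    ON EXISTS (SELECT s.SomeId INTERSECT SELECT t.SomeId);
-- Pairs returned: (S-A, T-A) and (S-B, T-B); T-C has no match.
```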

ON ((target.SomeId IS NULL) AND (source.SomeId IS NULL))
OR ((target.SomeId IS NOT NULL) AND (source.SomeId IS NOT NULL)
AND (target.SomeId = source.SomeId))

Assuming you mean columns that aren't part of the joining key, then
...and (isnull(source.ColX, target.ColX) = isnull(target.ColX, source.ColX)
or (source.ColX is null and target.ColX is null))
should cover all possibilities: the first line catches if both values are not null or only one value is not null, and the second line catches if both are null. Pretty ugly, but then that's what happens when you get too many nulls in your system.

The result is strange: it contains the rows produced by the INNER JOIN plus rows that amount to a CROSS JOIN between the NULL IDs of the source and target tables (e.g. {S-C, T-C}, {S-C, T-D}, {S-C, T-E}, {S-D, T-C}, {S-D, T-D}, {S-D, T-E}).
Look at this example:
DECLARE @Source TABLE (SomeId INT NULL, Name VARCHAR(10) NOT NULL);
DECLARE @Target TABLE (SomeId INT NULL, Name VARCHAR(10) NOT NULL);
INSERT @Source VALUES (1,'S-A'),(2,'S-B'),(NULL,'S-C'),(NULL,'S-D');
INSERT @Target VALUES (1,'T-A'),(2,'T-B'),(NULL,'T-C'),(NULL,'T-D'),(NULL,'T-E'),(6,'T-F');
-- Variant 1: equality OR both NULL
SELECT s.*, t.*
FROM @Source s
INNER JOIN @Target t ON s.SomeId = t.SomeId OR s.SomeId IS NULL AND t.SomeId IS NULL;
-- Variant 2: replace NULL with a magic value on both sides
SELECT s.*, t.*
FROM @Source s
INNER JOIN @Target t ON ISNULL(s.SomeId,-9999) = ISNULL(t.SomeId,-9999);
Results:
SomeId      Name       SomeId      Name
----------- ---------- ----------- ----------
1           S-A        1           T-A        <- INNER JOIN
2           S-B        2           T-B        <- INNER JOIN
NULL        S-C        NULL        T-C        <- "CROSS JOIN"
NULL        S-C        NULL        T-D        <- "CROSS JOIN"
NULL        S-C        NULL        T-E        <- "CROSS JOIN"
NULL        S-D        NULL        T-C        <- "CROSS JOIN"
NULL        S-D        NULL        T-D        <- "CROSS JOIN"
NULL        S-D        NULL        T-E        <- "CROSS JOIN"

Special characters?
Else you can try:
on (target.SomeId is null OR source.SomeId is null OR target.SomeId = source.SomeId)

Related

Why is TRY_PARSE so slow?

I have this query that basically returns (right now) only 10 rows as results:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, I want to parse the field FakeData (which, unfortunately, is an nvarchar(70) that can contain different types of data, from DateTime to Surname, etc.) for display and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
The query then takes about 10 times longer to execute.
Where am I going wrong? How can I speed it up?
I can't edit the database; I'm just a customer who reads data.
The TSQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot-in-the-dark that might help. By progressively assessing the nvarchar column if it is a candidate as a date it is possible to reduce the number of uses of that function. Note that a data point established in one apply can then be referenced in a subsequent apply:
CREATE TABLE mytable(
FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
select
t.FakeData, oa3.RealDate
from mytable as t
outer apply (
select len(FakeData) as fd_len
) oa1
outer apply (
select case when oa1.fd_len > 10 then 0
when len(replace(FakeData,'/','')) + 2 = oa1.fd_len then 1
else 0
end as is_candidate
) oa2
outer apply (
select case when oa2.is_candidate = 1 then TRY_PARSE(t.FakeData as date USING 'en-GB') end as RealDate
) oa3
FakeData | RealDate
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc | null
9603200-0297r2-0--824 | null
12/03/1967 | 1967-03-12
12/3/2012 | 2012-03-12
3/3/1812 | 1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu | null
db<>fiddle here
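A further option that may be worth measuring, assuming the dates really are dd/mm/yyyy as guessed above, is TRY_CONVERT with style 103 in place of TRY_PARSE. TRY_PARSE relies on the .NET CLR, whereas TRY_CONVERT does not; whether that translates into a real gain on your data is an assumption to test, not a given. The sketch reuses the mytable sample from this answer.

```sql
-- Sketch only: style 103 means dd/mm/yyyy (British/French).
-- Strings that cannot be converted yield NULL instead of raising an error.
SELECT
    t.FakeData,
    TRY_CONVERT(date, t.FakeData, 103) AS RealDate
FROM mytable AS t;
```

The same expression can also be placed inside the final outer apply so that it only runs for rows flagged as candidates.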

Do I need to handle nulls on LEFT JOINs?

There is a more senior SQL developer (the DBA) at the office who told me that in all the LEFT JOINs of my script, I must handle the scenario where the join column of the left table is possibly null; otherwise, I have to use INNER JOINs. Now, being a noob, I might be wrong here, but I can't see his point, and it left me needlessly confused.
His explanation was, unless the column is non-nullable, either I must
use ISNULL(LeftTable.ColumnA,<replacement value here>) on the ON clause, or
handle null values in the ON clause or the
WHERE clause, either by adding AND LeftTable.ColumnA IS NOT NULL or AND LeftTable.ColumnA IS NULL.
I thought those are unnecessary, since one uses a LEFT JOIN if one does not mind returning null rows from the right table, if the values of the right table join column does not match the left table join column, whether it be using equality or inequality. My intent is that it does not have to be equal to the right table join column values. If the left table join column is null, it is ok for me to return null rows on the right table, as a null is not equal to anything.
What is it that I am not seeing here?
MAJOR EDIT:
So I am adding table definitions and scripts. These are not the exact scripts; they just illustrate the problem. I have removed earlier edits that were incorrect, as I did not have the script in front of me before.
CREATE TABLE dbo.Contact (
ContactID int NOT NULL, --PK
FirstName varchar(10) NULL,
LastName varchar(10) NULL,
StatusID int NULL,
CONSTRAINT PK_Contact_ContactID
PRIMARY KEY CLUSTERED (ContactID)
);
GO
CREATE TABLE dbo.UserGroup (
UserGroupID int NOT NULL, --PK
UserGroup varchar(50) NULL,
StatusID int NULL,
CONSTRAINT PK_UserGroup_UserGroupID
PRIMARY KEY CLUSTERED (UserGroupID)
);
GO
CREATE TABLE dbo.UserGroupContact (
UserGroupID int NOT NULL, --PK,FK
ContactID int NOT NULL, --PK,FK
StatusID int NULL,
CONSTRAINT PK_UserGroupContact_UserGroupContactID
PRIMARY KEY CLUSTERED (UserGroupID, ContactID),
CONSTRAINT FK_UserGroupContact_UserGroupId
FOREIGN KEY (UserGroupId)
REFERENCES [dbo].[UserGroup](UserGroupId),
CONSTRAINT FK_UserGroupContact_ContactId
FOREIGN KEY (ContactId)
REFERENCES [dbo].[Contact](ContactId)
);
GO
CREATE TABLE dbo.Account (
AccountID int NOT NULL, --PK
AccountName varchar(50) NULL,
AccountManagerID int NULL, --FK
Balance int NULL,
CONSTRAINT PK_Account_AccountID
PRIMARY KEY CLUSTERED (AccountID),
CONSTRAINT FK_Account_AccountManagerID
FOREIGN KEY (AccountManagerID)
REFERENCES [dbo].[Contact](ContactId)
);
GO
My original query would look like the one below. When I say "left table", I mean the table on the left of the ON clause in a join. If "right table", it's the table on the right of the ON clause.
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
Notice that in the example script above, the first and second left joins have a nullable column on the left table and a non-nullable column on the right table. The third left join has nullable columns on both the left and right tables.
The suggestion was to "change to inner join or handle NULL condition in where clause" or "There is use of LEFT JOIN but there are non null conditions referenced in the WHERE clause."
The suggestion is to do either of these depending on intent:
a) convert to inner join (not possible as I want unmatched rows from Account table)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
INNER JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
INNER JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
INNER JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
b) handle nulls in WHERE clause (not possible as I want to return rows with nulls on column a.AccountManagerID and on ugc.UserGroupID)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
AND a.AccountManagerID IS NOT NULL
AND ugc.UserGroupID IS NOT NULL
c) handle nulls in ON clause (I settled on this which I thought doesn't make sense because it's redundant)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
AND a.AccountManagerID IS NOT NULL
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
AND a.AccountManagerID IS NOT NULL
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
AND ugc.UserGroupID IS NOT NULL
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
I did not provide example for ISNULL(). Also, I think he was not referring to implicit inner joins.
To recap, how do I handle this suggestion: "There is use of LEFT JOIN but there are non null conditions referenced in the WHERE clause."? He commented it's a "questionable LEFT JOIN logic".
One thing your question doesn't talk about is ANSI NULLs, and whether they're on or off. If ANSI NULLs are on, the comparison NULL = NULL evaluates to UNKNOWN, which a join or WHERE clause treats as no match; if they're off, NULL = NULL evaluates to true.
You can read more about ANSI NULLs here: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-ansi-nulls-transact-sql
So if ANSI NULLs are OFF, you very much care about matching a NULL foreign key to missing row in a join. Your rows with NULL foreign keys are going to match every single row where the left table was all NULLs.
If ANSI NULLs are ON, the LEFT OUTER JOIN will behave as expected, and NULL foreign keys will not match up with NULL primary keys of other missing rows.
If another dev is telling you that you need to be careful about NULLs in OUTER JOINs, that's probably a good indication that the database you're working with has ANSI NULLs OFF.
one uses a LEFT JOIN if one does not mind returning null rows from the right table
Left table LEFT JOIN right table ON condition returns INNER JOIN rows plus unmatched left table rows extended by nulls.
One uses left join if that's what one wants.
the join column of the left table
A join is not on "the join column"--whatever that means. It is on the condition.
That might, say, be one column in the left table being equal to the same-named column in the right. Or be a function of one column in the left table being equal to the same-named column in the right. Or be a boolean function of same-named columns. Or involve/include any of those. Or be any boolean function of any of the input columns.
If the left table join column is null, it is ok for me to return null rows on the right table, as a null is not equal to anything.
It seems you are suffering from a fundamental misconception. The only thing that is "ok for me to return" is the rows you were told to return, for certain possible input.
It's not a matter of, say, coding some condition on some tables because we want certain inner join rows and then accepting whatever null-extended rows we get. If we use a left join, it's because it returns the correct inner join rows & the correct null-extended rows; otherwise we want a different expression.
It is not a matter of, say, a left table row having null meaning that that row must not be part of the inner join & must be null-extended. We have some input; we want some output. If we want the inner join of two tables on some condition no matter how that condition uses nulls or any other input values plus the unmatched left table rows then we left join those tables on that condition; otherwise we want a different expression.
(Your question uses but doesn't explain "handle". You don't tell us the rows you were told to return, for certain possible input. You don't even give us example desired output for example input or your actual output for some query. So we have no way of addressing what your DBA's critique is trying to say about what you ought to do or what you are doing in your queries.)
Going to expand a bit on my comment here; this, however, is guess work based on what we have at the moment.
Based on your current wording, what you've stated is wrong. Let's take these simple tables:
USE Sandbox;
GO
CREATE TABLE Example1 (ID int NOT NULL, SomeValue varchar(10));
GO
CREATE TABLE Example2 (ID int NOT NULL, ParentID int NOT NULL, SomeOtherValue varchar(10));
GO
INSERT INTO Example1
VALUES (1,'abc'),(2,'def'),(3,'bcd'),(4,'zxy');
GO
INSERT INTO Example2
VALUES (1,1,'sadfh'),(2,1,'asdgfkhji'),(3,3,'sdfhdfsbh');
Now, let's have a simple query with a LEFT JOIN:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
ORDER BY E1.ID, E2.ID;
Note that 5 rows are returned. No handling of NULL was required. If you added an OR to the ON it would be nonsensical, as ParentID cannot have a value of NULL.
If, however, we add something to the WHERE for example:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
This now turns the LEFT JOIN into an implicit INNER JOIN. The above would therefore be better written as:
SELECT *
FROM Example1 E1
JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
This, however, may not be the intended output; you may well want unmatched rows (which is why you initially used a LEFT JOIN). There are 2 ways you could do that. The first is to add the criteria to the ON clause:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
AND LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
The other would be to add an OR (don't use ISNULL, it affects SARGability!):
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID;
This, I imagine is what your senior is talking about.
To repeat though:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID;
Makes no sense. E2.ID cannot have a value of NULL, so the clause makes no change to the query, apart from probably making it run slower.
Cleanup:
DROP TABLE Example1;
DROP TABLE Example2;
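Applied to the query in the question, the same idea might look like the sketch below. It assumes the intent is to keep every account with a positive balance, including accounts that have no manager or whose manager is not in user group 10; that intent is a guess from the question text, not something the asker confirmed.

```sql
SELECT
    a.AccountId, a.AccountName, a.Balance,
    ug.UserGroup, ugc.UserGroupID,
    a.AccountManagerID, c.FirstName, c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
    ON a.AccountManagerID = c.ContactID
    AND c.StatusID = 1
LEFT JOIN dbo.UserGroupContact ugc
    ON a.AccountManagerID = ugc.ContactID
    AND ugc.StatusID = 1
    AND ugc.UserGroupID = 10      -- moved from WHERE so unmatched accounts survive
LEFT JOIN dbo.UserGroup ug
    ON ugc.UserGroupID = ug.UserGroupID
    AND ug.StatusID = 1
WHERE
    a.Balance > 0
    -- NOT IN evaluates to UNKNOWN for a NULL manager, so keep NULLs explicitly
    AND (a.AccountManagerID NOT IN (20,30) OR a.AccountManagerID IS NULL);
```

With `ugc.UserGroupID = 10` left in the WHERE clause, every null-extended row is filtered out and the last two LEFT JOINs behave like INNER JOINs, which is probably what the "non null conditions referenced in the WHERE clause" remark refers to.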
In my eyes this is very simple, as far as I understood it.
Let's try with an example.
Imagine you have 2 tables, a master and a details table.
MASTER TABLE "TheMaster"
ID NAME
1 Foo1
2 Foo2
3 Foo3
4 Foo4
5 Foo5
6 Foo6
DETAILS TABLE "TheDetails"
ID ID_FK TheDetailValue
1 1 3
2 1 5
3 3 3
4 5 2
5 5 9
6 3 6
7 1 4
TheDetails table is linked to TheMaster table through the field ID_FK.
Now, imagine to run a query where you need to sum the values of the column TheDetailValue. I would go with something like this:
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster INNER JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
GROUP BY TheMaster.ID, TheMaster.NAME;
You would get a list like this:
ID NAME SumOfTheDetailValue
1 Foo1 12
3 Foo3 9
5 Foo5 11
But what if your query uses a LEFT JOIN instead of an INNER JOIN? For example:
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster LEFT JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
GROUP BY TheMaster.ID, TheMaster.NAME;
The result would be:
ID NAME SumOfTheDetailValue
1 Foo1 12
2 Foo2
3 Foo3 9
4 Foo4
5 Foo5 11
6 Foo6
You will obtain a NULL for each master field having no values in the details table.
How do you exclude these values? Using an IS NOT NULL check!
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster LEFT JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
WHERE (((TheDetails.ID_FK) Is Not Null))
GROUP BY TheMaster.ID, TheMaster.NAME;
...which would take us to these results:
ID NAME SumOfTheDetailValue
1 Foo1 12
3 Foo3 9
5 Foo5 11
...which is exactly what we obtained before using an INNER JOIN.
So, in the end, I guess your colleague is talking about the use of the ISNULL function, in order to exclude the records having no relation in another table.
That's it.
The queries were written in MS Access (as a rapid test) for example purposes only, so the ISNULL function is implemented with "Is Null", which can become "Is Not Null". In your case it's probably something like ISNULL() and/or NOT ISNULL().

Optimizing SQL Function

I'm trying to optimize or completely rewrite this query. It currently takes about 1500ms to run. I know the DISTINCTs are fairly inefficient, as is the UNION. But I'm struggling to figure out exactly where to go from here.
I am thinking that the first select statement might not be needed to return the output of;
[Key | User_ID,(User_ID)]
Note; Program and Program Scenario are both using Clustered Indexes. I can provide a screenshot of the Execution Plan if needed.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID] (@_CompKey INT)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @UserIDs AS VARCHAR(8000);
SET @UserIDs = '';
-- Append each distinct upper-cased user ID, prefixed with ', '
SELECT @UserIDs = @UserIDs + ', ' + x.User_ID
FROM
(SELECT DISTINCT (UPPER(p.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
WHERE p.CompKey = @_CompKey
UNION
SELECT DISTINCT (UPPER(ps.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
LEFT OUTER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL) x
-- Strip the leading ', '
RETURN Substring(@UserIDs, 3, 8000);
END
There are two things happening in this query
1. Locating rows in the [Program] table matching the specified CompKey (@_CompKey)
2. Locating rows in the [Program_Scenario] table that have the same ProgKey as the rows located in (1) above.
Finally, non-null UserIDs from both these sets of rows are concatenated into a scalar.
For step 1 to be efficient, you'd need an index on the CompKey column (clustered or non-clustered)
For step 2 to be efficient, you'd need an index on the join key which is ProgKey on the Program_Scenario table (this likely is a non-clustered index as I can't imagine ProgKey to be PK). Likely, SQL would resort to a loop join strategy - i.e., for each row found in [Program] matching the CompKey criteria, it would need to lookup corresponding rows in [Program_Scenario] with same ProgKey. This is a guess though, as there is not sufficient information on the cardinality and distribution of data.
Ensure the above two indexes are present.
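If they are missing, the two indexes could be created along these lines. The index names are made up, and the INCLUDE columns are an assumption aimed at covering the query so that no key lookups are needed; check them against the real table definitions and existing indexes first.

```sql
-- Step 1: find Program rows by CompKey (index name is hypothetical)
CREATE NONCLUSTERED INDEX IX_Program_CompKey
    ON dbo.Program (CompKey)
    INCLUDE (ProgKey, User_ID);

-- Step 2: find Program_Scenario rows by ProgKey (index name is hypothetical)
CREATE NONCLUSTERED INDEX IX_Program_Scenario_ProgKey
    ON dbo.Program_Scenario (ProgKey)
    INCLUDE (User_ID);
```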
Also, as others have noted the second left outer join is a bit confusing as an inner join is the right way to deal with it.
Per my interpretation, the inner part of the query can be rewritten this way. Also, this is the query you'd ideally run and optimize before tackling the string concatenation part. The DISTINCT is dropped as it is automatic with a UNION. Try this version of the query along with the indexes above and if it provides the necessary boost, then include the string concatenation or the xml STUFF approaches to return a scalar.
SELECT UPPER(p.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
WHERE
p.CompKey = @_CompKey
UNION
SELECT UPPER(ps.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
INNER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE
p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
I am taking a shot in the dark here. I am guessing that the last code you posted is still a scalar function. It also did not have all the logic of your original query. Again, this is a shot in the dark since there is no table definitions or sample data posted.
This might be how this would look as an inline table valued function.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID]
(
@_CompKey INT
) RETURNS TABLE AS RETURN
select MyResult = STUFF(
(
-- FOR XML cannot be applied directly to a UNION, so wrap it in a derived table
SELECT ', ' + x.User_ID
FROM (
SELECT UPPER(p.User_ID) as User_ID
FROM dbo.Program AS p
WHERE p.CompKey = @_CompKey
UNION
SELECT UPPER(ps.User_ID) as User_ID
FROM dbo.Program AS p
LEFT OUTER JOIN dbo.Program_Scenario AS ps ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
) AS x
for xml path ('')
), 1, 2, '') -- strip the leading ', '

Ignore condition in WHERE clause when column is NULL

I have a table where one row (with Type = 'E') is related to another row.
I have written a query to return the COUNT of those related rows. The problem is that there is no explicit relationship (like an ID column that would clearly say which row is related to which). Therefore I am trying to find the relationship based on multiple conditions in the WHERE clause.
The problem is that in a few cases, the columns A and B could be NULL (for records where Type = 'M'). In such cases I would like to ignore that condition, so that only the first 3 conditions are used to determine the relationship.
I have tried CASE Statement but is not working as expected:
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]
,( SELECT COUNT(*)
FROM MyTable T2
WHERE [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
[T2].[A]=CASE WHEN [T2].[A] IS NULL THEN NULL ELSE [T1].[A] END AND
[T2].[B]=CASE WHEN [T2].[B] IS NULL THEN NULL ELSE [T1].[B] END AND
[T2].[Type]='M'
) as TotalCount
FROM MyTable T1
WHERE [T1].[Type] = 'E'
I can't drop that condition entirely, because in some cases the Date and ServiceID are the same and it is A and B that distinguish the records. Luckily, where A and B are NULL, it is the Date and ServiceID that distinguish the two records.
http://sqlfiddle.com/#!3/c98db/1
Many thanks in advance.
You could join the tables and use COUNT and GROUP BY to get the counts. Then you can JOIN [A] and [B] if they are equal or NULL.
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID], count([T2].[ID])
FROM MyTable T1
INNER JOIN MyTable T2 ON [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
([T2].[A]= [T1].[A] OR [T2].[A] IS NULL )AND
([T2].[B]= [T1].[B] OR [T2].[B] IS NULL )AND
[T2].[Type] <> [T1].[Type]
WHERE [T1].[Type] = 'E'
GROUP BY [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The subselect works fine: I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, those rows in the sub-select where luo is null do not appear in the overall results. In other words, where I get 6 results in the sub-select, I only get 2 in the overall results, and it's only those rows where the Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. It's as if SQL Server is not relating NULL to NULL.
How can I fix this? Why wouldn't two fields compare when both values are NULL?
You could use Coalesce (example assuming Store_Number is an integer)
ON
Coalesce(t1.[Store_Number],0) = Coalesce(t2.[Store_Number],0)
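ANSI_NULLS is ON by default, so NULL = NULL does not evaluate to true.
You can change this (if your business case and your database design's usage of NULL require it) with the setting below. Note that, per the SET ANSI_NULLS documentation, it only affects comparisons where one operand is a literal NULL or a variable that is NULL, not comparisons between two columns such as the join above, and the OFF setting is deprecated: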
SET ansi_nulls off
Another basic workaround is:
ON ((t1.[Store_Number] = t2.[Store_Number]) OR
(t1.[Store_Number] IS NULL AND t2.[Store_Number] IS NULL))
Executing your POC:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works:
AND EXISTS (SELECT t1.Store_Number INTERSECT SELECT t2.Store_Number)
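In the query from the question, the column that actually holds NULLs is Last_Updated_On, so the null-safe comparison would presumably go on that condition. A sketch of the full query with only that one line changed, assuming the remaining join columns are never NULL:

```sql
SELECT
    t1.[id] as id,
    t1.[Store_Number] as storeNumber,
    t1.[Date_Of_Inventory] as dateOfInventory,
    t1.[Created_On] as createdOn,
    t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
    SELECT
        [Store_Number],
        [Date_Of_Inventory],
        MAX([Created_On]) co,
        MAX([Last_Updated_On]) luo   -- NULL when no row in the group was ever updated
    FROM [UserData].[dbo].[StoreResponses]
    GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
    t1.[Store_Number] = t2.[Store_Number]
    AND t1.[Created_On] = t2.co
    AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
    -- Matches when both values are equal or both are NULL
    AND EXISTS (SELECT t1.[Last_Updated_On] INTERSECT SELECT t2.luo)
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC;
```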
