I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
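Below is a runnable sketch of the per-installment logic the question describes. It uses SQLite through Python's sqlite3 module, not Snowflake. Everything in it is an assumption for illustration:

- the table name fee_payments;
- the column name idx, used instead of index because index is a keyword in several dialects;
- the sample rows;
- carrying the balance left *after* each payment. The question's CTE instead carries the amount still owed *before* the current installment.

The sketch also assumes (id, idx) is unique and numbered without gaps. A duplicate (id, idx) pair would make the recursive step return extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows: one row per installment, numbered 1..n per fee id,
# with the original fee repeated on every row
conn.executescript("""
CREATE TABLE fee_payments (id INTEGER, idx INTEGER, fees_paid INTEGER, fee_charged INTEGER);
INSERT INTO fee_payments VALUES
  (1, 1, 30, 100), (1, 2, 50, 100), (1, 3, 20, 100),
  (2, 1, 10, 40),  (2, 2, 30, 40);
""")

rows = conn.execute("""
WITH RECURSIVE payments (id, idx, fees_paid, balance) AS (
    -- Anchor: first installment, balance left after paying it
    SELECT id, idx, fees_paid, fee_charged - fees_paid
    FROM fee_payments
    WHERE idx = 1
    UNION ALL
    -- Recursive step: next installment of the same fee
    SELECT t.id, t.idx, t.fees_paid, p.balance - t.fees_paid
    FROM fee_payments t
    JOIN payments p
      ON t.id = p.id
     AND t.idx = p.idx + 1
)
SELECT id, idx, fees_paid, balance FROM payments ORDER BY id, idx
""").fetchall()

for row in rows:
    print(row)
```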
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of your CTE you select a fourth column, fee_charged, which does not exist, and in the recursive part you don't sum the fees paid, among other things. Basically, your logic seems rather strange.
So, creating some sample data that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now, altering your CTE just a little to concatenate the strings together:
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
We get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
This is exactly what I would expect. So, relating this to your "it gets strange when I join to other stuff": either the output of your CTE is not what you expect, or your join to the other table is not working as you expect, or there is a bug in Snowflake.
It all comes down to this: if the CTE results are exactly what you expect, write them into a table and join that table to your other table. That rules out some form of CTE-versus-JOIN bug and lets you debug why your join is not working.
But if your CTE output is not what you expect, then let's debug that.
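Following that advice, here is a minimal sketch in SQLite, run through Python's sqlite3, with idx standing in for index. It does three things:

1. Materializes the CTE output into a table named payments_out. The name is hypothetical.
2. Checks whether (id, idx) is unique in the source.
3. Joins the stored result back to the source.

The duplicate check is an added diagnostic, not part of the answer above. Duplicate (id, idx) rows are one plausible reason a recursive CTE followed by a join returns more, or different, rows than expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER, idx INTEGER, val TEXT);
INSERT INTO data VALUES (1,1,'a'),(2,1,'b'),(1,2,'c'),(2,2,'d'),(1,3,'e'),(2,3,'f');

-- Materialize the CTE output so it can be inspected and joined like any table
CREATE TABLE payments_out AS
WITH RECURSIVE payments (id, idx, val) AS (
    SELECT id, idx, val FROM data WHERE idx = 1
    UNION ALL
    SELECT t.id, t.idx, p.val || t.val
    FROM data t
    JOIN payments p ON t.id = p.id AND t.idx = p.idx + 1
)
SELECT * FROM payments;
""")

# Diagnostic: any (id, idx) pair that appears more than once in the source
dupes = conn.execute(
    "SELECT id, idx, COUNT(*) FROM data GROUP BY id, idx HAVING COUNT(*) > 1"
).fetchall()

cte_rows = conn.execute("SELECT * FROM payments_out ORDER BY id, idx").fetchall()

# Join the stored result back to the source on the full key
joined = conn.execute("""
SELECT d.id, d.idx, d.val, p.val
FROM data d
JOIN payments_out p ON p.id = d.id AND p.idx = d.idx
ORDER BY d.id, d.idx
""").fetchall()

print(dupes)
print(cte_rows)
```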
How do I merge two tables into one in T-SQL? I tried a FULL OUTER JOIN, which joins the two tables, but the result has the "Customer Account" field twice. I need all the columns from Table A and Table B, with the "Customer Account" field only once.
Here is my example in more detail:
Table A, my first table, has 5 columns.
Table B, my second table, has 6 columns.
The output I'm expecting has all the fields of Table A and Table B, but the common field only once.
Thanks a lot.
Add the required (all) columns from t1 and t2 to the SELECT list, and use COALESCE on the join key so that CustomerAccount appears only once:
SELECT COALESCE(t1.customeraccount, t2.customeraccount) as customeraccount,
t1.BasicCardType,
t2.MonthlySet
FROM table1 t1
FULL JOIN table2 t2 ON t1.customeraccount = t2.customeraccount;
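Here is a runnable sketch of the same idea in SQLite, via Python's sqlite3, with hypothetical sample rows. SQLite only added FULL OUTER JOIN in version 3.39.0. To run on older builds, the sketch emulates it as a LEFT JOIN plus the table2 rows that have no match in table1. The COALESCE on the key is what leaves a single customeraccount column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: account 1 only in table1, 2 in both, 3 only in table2
conn.executescript("""
CREATE TABLE table1 (customeraccount INTEGER, BasicCardType TEXT);
CREATE TABLE table2 (customeraccount INTEGER, MonthlySet TEXT);
INSERT INTO table1 VALUES (1, 'Gold'), (2, 'Silver');
INSERT INTO table2 VALUES (2, 'Yes'), (3, 'No');
""")

rows = conn.execute("""
-- Every table1 row, with table2's columns where the account matches
SELECT COALESCE(t1.customeraccount, t2.customeraccount) AS customeraccount,
       t1.BasicCardType, t2.MonthlySet
FROM table1 t1
LEFT JOIN table2 t2 ON t1.customeraccount = t2.customeraccount
UNION ALL
-- Plus table2 rows with no match in table1 (the other half of a FULL JOIN)
SELECT t2.customeraccount, NULL, t2.MonthlySet
FROM table2 t2
WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.customeraccount = t2.customeraccount)
ORDER BY customeraccount
""").fetchall()

for row in rows:
    print(row)
```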
(Edited based on comments): Join the tables on the CustomerAccount ID field (giving you entries that exist in both tables), then add a union for all entries that only exist in table A, then add a union for entries that only exist in table B. In principle:
-- get entries that exist in both tables
select Table_A.CustomerAccount, TableAField1, TableAField2, TableBField1, TableBField2
from Table_A
join Table_B on Table_A.CustomerAccount = Table_B.CustomerAccount
-- get entries that only exist in table_a
union select Table_A.CustomerAccount, TableAField1, TableAField2, null, null
from Table_A
where Table_A.CustomerAccount not in (select CustomerAccount from Table_B)
-- get entries that only exist in table_B
union select Table_B.CustomerAccount, null, null, TableBField1, TableBField2
from Table_B
where Table_B.CustomerAccount not in (select CustomerAccount from Table_A)
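One caveat is worth checking with this pattern. If the subquery column can contain a NULL, NOT IN returns no rows at all, because x NOT IN (..., NULL) evaluates to UNKNOWN rather than true. NOT EXISTS does not have that problem. The small SQLite sketch below, via Python's sqlite3, uses hypothetical tables with a NULL deliberately placed in Table_B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_A (CustomerAccount INTEGER, TableAField1 TEXT);
CREATE TABLE Table_B (CustomerAccount INTEGER, TableBField1 TEXT);
INSERT INTO Table_A VALUES (1, 'a1'), (2, 'a2');
INSERT INTO Table_B VALUES (2, 'b2'), (NULL, 'orphan');
""")

# NOT IN: the NULL in Table_B makes the predicate UNKNOWN for every row
only_a_not_in = conn.execute("""
SELECT CustomerAccount FROM Table_A
WHERE CustomerAccount NOT IN (SELECT CustomerAccount FROM Table_B)
""").fetchall()

# NOT EXISTS: compares row by row, so the NULL simply never matches
only_a_not_exists = conn.execute("""
SELECT a.CustomerAccount FROM Table_A a
WHERE NOT EXISTS (SELECT 1 FROM Table_B b WHERE b.CustomerAccount = a.CustomerAccount)
""").fetchall()

print(only_a_not_in)      # []
print(only_a_not_exists)  # [(1,)]
```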
There is a more senior SQL developer (the DBA) at the office who told me that in all the LEFT JOINs of my script, I must handle the scenario where the join column of the left table is possibly null; otherwise, I have to use INNER JOINs. Being a noob, I might be wrong here, but I can't see his point, and it has left me confused.
His explanation was that, unless the column is non-nullable, I must either
use ISNULL(LeftTable.ColumnA, <replacement value here>) in the ON clause, or
handle null values in the ON clause or the WHERE clause, by adding AND LeftTable.ColumnA IS NOT NULL or AND LeftTable.ColumnA IS NULL.
I thought those were unnecessary, since one uses a LEFT JOIN if one does not mind returning null rows from the right table when the values of the right table join column do not match the left table join column, whether the condition uses equality or inequality. My intent is that the left value does not have to equal the right table join column values. If the left table join column is null, it is ok for me to return null rows on the right table, as a null is not equal to anything.
What is it that I am not seeing here?
MAJOR EDIT:
So I am adding table definitions and scripts. These are not the exact scripts, just enough to illustrate the problem. I have removed earlier edits, which were incorrect because I did not have the script in front of me at the time.
CREATE TABLE dbo.Contact (
ContactID int NOT NULL, --PK
FirstName varchar(10) NULL,
LastName varchar(10) NULL,
StatusID int NULL,
CONSTRAINT PK_Contact_ContactID
PRIMARY KEY CLUSTERED (ContactID)
);
GO
CREATE TABLE dbo.UserGroup (
UserGroupID int NOT NULL, --PK
UserGroup varchar(50) NULL,
StatusID int NULL,
CONSTRAINT PK_UserGroup_UserGroupID
PRIMARY KEY CLUSTERED (UserGroupID)
);
GO
CREATE TABLE dbo.UserGroupContact (
UserGroupID int NOT NULL, --PK,FK
ContactID int NOT NULL, --PK,FK
StatusID int NULL,
CONSTRAINT PK_UserGroupContact_UserGroupContactID
PRIMARY KEY CLUSTERED (UserGroupID, ContactID),
CONSTRAINT FK_UserGroupContact_UserGroupId
FOREIGN KEY (UserGroupId)
REFERENCES [dbo].[UserGroup](UserGroupId),
CONSTRAINT FK_UserGroupContact_ContactId
FOREIGN KEY (ContactId)
REFERENCES [dbo].[Contact](ContactId)
);
GO
CREATE TABLE dbo.Account (
AccountID int NOT NULL, --PK
AccountName varchar(50) NULL,
AccountManagerID int NULL, --FK
Balance int NULL,
CONSTRAINT PK_Account_AccountID
PRIMARY KEY CLUSTERED (AccountID),
CONSTRAINT FK_Account_AccountManagerID
FOREIGN KEY (AccountManagerID)
REFERENCES [dbo].[Contact](ContactId)
);
GO
My original query would look like the one below. When I say "left table", I mean the table on the left of the ON clause in a join; by "right table", I mean the table on the right of the ON clause.
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
Notice that in the example script above, the first and second left joins have a nullable column on the left table and a non-nullable column on the right table. The third left join has nullable columns on both the left and the right table.
The suggestion was to "change to inner join or handle NULL condition in where clause" or "There is use of LEFT JOIN but there are non null conditions referenced in the WHERE clause."
The suggestion is to do either of these depending on intent:
a) convert to inner join (not possible as I want unmatched rows from Account table)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
INNER JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
INNER JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
INNER JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
b) handle nulls in WHERE clause (not possible as I want to return rows with nulls on column a.AccountManagerID and on ugc.UserGroupID)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
AND a.AccountManagerID IS NOT NULL
AND ugc.UserGroupID IS NOT NULL
c) handle nulls in ON clause (I settled on this, although I thought it didn't make sense because it's redundant)
SELECT
a.AccountId,
a.AccountName,
a.Balance,
ug.UserGroup,
ugc.UserGroupID,
a.AccountManagerID,
c.FirstName,
c.LastName
FROM dbo.Account a
LEFT JOIN dbo.Contact c
ON a.AccountManagerID = c.ContactID
AND c.StatusID=1
AND a.AccountManagerID IS NOT NULL
LEFT JOIN dbo.UserGroupContact ugc
ON a.AccountManagerID = ugc.ContactID
AND ugc.StatusID=1
AND a.AccountManagerID IS NOT NULL
LEFT JOIN dbo.UserGroup ug
ON ugc.UserGroupID = ug.UserGroupID
AND ug.StatusID=1
AND ugc.UserGroupID IS NOT NULL
WHERE
a.Balance > 0
AND ugc.UserGroupID = 10
AND a.AccountManagerID NOT IN (20,30)
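To check the redundancy suspected in option c), here is a small SQLite sketch, via Python's sqlite3, using trimmed-down hypothetical tables. It runs the same LEFT JOIN with and without AND a.AccountManagerID IS NOT NULL in the ON clause. A NULL AccountManagerID never satisfies a.AccountManagerID = c.ContactID anyway, so both versions return the same rows. That includes the NULL-extended row for the account with no manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tables: account 101's manager has no Contact row,
# account 102 has no manager at all
conn.executescript("""
CREATE TABLE Contact (ContactID INTEGER, FirstName TEXT, StatusID INTEGER);
CREATE TABLE Account (AccountID INTEGER, AccountManagerID INTEGER);
INSERT INTO Contact VALUES (1, 'Ann', 1), (2, 'Bob', 1);
INSERT INTO Account VALUES (100, 1), (101, 3), (102, NULL);
""")

QUERY = """
SELECT a.AccountID, a.AccountManagerID, c.FirstName
FROM Account a
LEFT JOIN Contact c
  ON a.AccountManagerID = c.ContactID
 AND c.StatusID = 1
 {extra}
ORDER BY a.AccountID
"""

plain = conn.execute(QUERY.format(extra="")).fetchall()
# Option c): the extra predicate cannot change the outcome, because
# NULL = c.ContactID is already not true for a NULL manager
with_check = conn.execute(
    QUERY.format(extra="AND a.AccountManagerID IS NOT NULL")
).fetchall()

print(plain == with_check)  # True
```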
I did not provide an example for ISNULL(). Also, I think he was not referring to implicit inner joins.
To recap, how do I handle this suggestion: "There is use of LEFT JOIN but there are non null conditions referenced in the WHERE clause."? He commented it's a "questionable LEFT JOIN logic".
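To see what the reviewer means, here is a trimmed-down SQLite reproduction of the schema above, via Python's sqlite3. Keys are omitted and the rows are made up:

- account 100's manager belongs to group 10;
- account 101's manager belongs to no group;
- account 102 has no manager.

The WHERE clause tests ugc.UserGroupID = 10 and a.AccountManagerID NOT IN (20,30). Both are UNKNOWN for the NULL-extended rows, so those rows are discarded, and the LEFT JOIN query returns exactly what the INNER JOIN query returns. The last query shows one possible rewrite, *assuming* the intent is to keep unmatched accounts: it moves the group filter into the ON clause and allows a NULL manager explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the tables above (keys and FKs omitted)
conn.executescript("""
CREATE TABLE Contact (ContactID INTEGER, FirstName TEXT, LastName TEXT, StatusID INTEGER);
CREATE TABLE UserGroup (UserGroupID INTEGER, UserGroup TEXT, StatusID INTEGER);
CREATE TABLE UserGroupContact (UserGroupID INTEGER, ContactID INTEGER, StatusID INTEGER);
CREATE TABLE Account (AccountID INTEGER, AccountName TEXT, AccountManagerID INTEGER, Balance INTEGER);
INSERT INTO Contact VALUES (1, 'Ann', 'Lee', 1), (2, 'Bob', 'Kim', 1);
INSERT INTO UserGroup VALUES (10, 'Sales', 1);
INSERT INTO UserGroupContact VALUES (10, 1, 1);
INSERT INTO Account VALUES (100, 'A', 1, 50), (101, 'B', 2, 50), (102, 'C', NULL, 50);
""")

# The original query, parameterized only by join type
ORIGINAL = """
SELECT a.AccountID
FROM Account a
{j} JOIN Contact c ON a.AccountManagerID = c.ContactID AND c.StatusID = 1
{j} JOIN UserGroupContact ugc ON a.AccountManagerID = ugc.ContactID AND ugc.StatusID = 1
{j} JOIN UserGroup ug ON ugc.UserGroupID = ug.UserGroupID AND ug.StatusID = 1
WHERE a.Balance > 0
  AND ugc.UserGroupID = 10
  AND a.AccountManagerID NOT IN (20, 30)
ORDER BY a.AccountID
"""
left_ids = [row[0] for row in conn.execute(ORIGINAL.format(j="LEFT"))]
inner_ids = [row[0] for row in conn.execute(ORIGINAL.format(j="INNER"))]

# One possible rewrite if unmatched accounts really should be kept
kept_ids = [row[0] for row in conn.execute("""
SELECT a.AccountID
FROM Account a
LEFT JOIN Contact c ON a.AccountManagerID = c.ContactID AND c.StatusID = 1
LEFT JOIN UserGroupContact ugc ON a.AccountManagerID = ugc.ContactID
                              AND ugc.StatusID = 1
                              AND ugc.UserGroupID = 10
LEFT JOIN UserGroup ug ON ugc.UserGroupID = ug.UserGroupID AND ug.StatusID = 1
WHERE a.Balance > 0
  AND (a.AccountManagerID IS NULL OR a.AccountManagerID NOT IN (20, 30))
ORDER BY a.AccountID
""")]

print(left_ids, inner_ids, kept_ids)
```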
One thing your question doesn't talk about is ANSI NULLs, whether they're on or off. If ANSI NULLs are on, comparing NULL = NULL returns UNKNOWN (which an ON or WHERE clause treats as not true), but if they're off, NULL = NULL returns true.
You can read more about ANSI NULLs here: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-ansi-nulls-transact-sql
So if ANSI NULLs are OFF, you very much need to care about a NULL foreign key matching a missing row in a join: your rows with NULL foreign keys are going to match every single row where the other side is all NULLs.
If ANSI NULLs are ON, the LEFT OUTER JOIN will behave as expected, and NULL foreign keys will not match up with NULL primary keys of other missing rows.
If another dev is telling you that you need to be careful about NULLs in OUTER JOINs, that's probably a good indication that the database you're working with has ANSI NULLs OFF.
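A quick way to see the standard (ANSI NULLs ON) behaviour is SQLite, via Python's sqlite3. SQLite has no ANSI_NULLS setting and always follows the standard: NULL = NULL yields NULL (UNKNOWN). So a LEFT JOIN on = never pairs a NULL foreign key with a NULL key on the other side. For contrast, the sketch also uses SQLite's IS operator, which is a null-safe equality. The tables lhs and rhs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is UNKNOWN (returned as None), while IS treats two NULLs as equal
eq, is_ = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()

conn.executescript("""
CREATE TABLE lhs (id INTEGER, fk INTEGER);
CREATE TABLE rhs (k INTEGER, label TEXT);
INSERT INTO lhs VALUES (10, 1), (11, NULL);
INSERT INTO rhs VALUES (1, 'one'), (NULL, 'null key');
""")

# Standard equality: row 11's NULL fk matches nothing and is NULL-extended
standard = conn.execute(
    "SELECT l.id, r.label FROM lhs l LEFT JOIN rhs r ON l.fk = r.k ORDER BY l.id"
).fetchall()

# Null-safe equality, for contrast: row 11 now pairs with the NULL key
null_safe = conn.execute(
    "SELECT l.id, r.label FROM lhs l LEFT JOIN rhs r ON l.fk IS r.k ORDER BY l.id"
).fetchall()

print(eq, is_, standard, null_safe)
```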
one uses a LEFT JOIN if one does not mind returning null rows from the right table
Left table LEFT JOIN right table ON condition returns INNER JOIN rows plus unmatched left table rows extended by nulls.
One uses left join if that's what one wants.
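That definition can be checked directly. A small SQLite sketch, via Python's sqlite3 with hypothetical tables l and r, builds the same result twice:

- once with LEFT JOIN;
- once as the inner join rows UNION ALL the left rows that have no match, padded with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE l (id INTEGER, x TEXT);
CREATE TABLE r (id INTEGER, y TEXT);
INSERT INTO l VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO r VALUES (1, 'p'), (1, 'q'), (3, 'r');
""")

left_join = conn.execute("""
SELECT l.id, l.x, r.y FROM l LEFT JOIN r ON l.id = r.id
ORDER BY 1, 3
""").fetchall()

# Same result by definition: inner join rows plus unmatched left rows.
# The NOT EXISTS must use exactly the same condition as the ON clause.
rebuilt = conn.execute("""
SELECT l.id, l.x, r.y FROM l JOIN r ON l.id = r.id
UNION ALL
SELECT l.id, l.x, NULL FROM l
WHERE NOT EXISTS (SELECT 1 FROM r WHERE l.id = r.id)
ORDER BY 1, 3
""").fetchall()

print(left_join == rebuilt)  # True
```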
the join column of the left table
A join is not on "the join column"--whatever that means. It is on the condition.
That might, say, be one column in the left table being equal to the same-named column in the right. Or be a function of one column in the left table being equal to the same-named column in the right. Or be a boolean function of same-named columns. Or involve/include any of those. Or be any boolean function of any of the input columns.
If the left table join column is null, it is ok for me to return null rows on the right table, as a null is not equal to anything.
It seems you are suffering from a fundamental misconception. The only thing that is "ok for me to return" is the rows you were told to return, for certain possible input.
It's not a matter of, say, coding some condition on some tables because we want certain inner join rows and then accepting whatever null-extended rows we get. If we use a left join, it's because it returns the correct inner join rows & the correct null-extended rows; otherwise we want a different expression.
It is not a matter of, say, a left table row having null meaning that that row must not be part of the inner join & must be null-extended. We have some input; we want some output. If we want the inner join of two tables on some condition no matter how that condition uses nulls or any other input values plus the unmatched left table rows then we left join those tables on that condition; otherwise we want a different expression.
(Your question uses but doesn't explain "handle". You don't tell us the rows you were told to return, for certain possible input. You don't even give us example desired output for example input, or your actual output for some query. So we have no way of addressing what your DBA's critique is trying to say about what you ought to do or what you are doing in your queries.)
Going to expand a bit on my comment here; this, however, is guesswork based on what we have at the moment.
Based on your current wording, what you've stated is wrong. Let's take these simple tables:
USE Sandbox;
GO
CREATE TABLE Example1 (ID int NOT NULL, SomeValue varchar(10));
GO
CREATE TABLE Example2 (ID int NOT NULL, ParentID int NOT NULL, SomeOtherValue varchar(10));
GO
INSERT INTO Example1
VALUES (1,'abc'),(2,'def'),(3,'bcd'),(4,'zxy');
GO
INSERT INTO Example2
VALUES (1,1,'sadfh'),(2,1,'asdgfkhji'),(3,3,'sdfhdfsbh');
Now, let's have a simple query with a LEFT JOIN:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
ORDER BY E1.ID, E2.ID;
Note that 5 rows are returned. No handling of NULL was required. If you added an OR to the ON clause, it would be nonsensical, as ParentID cannot have a value of NULL.
If, however, we add something to the WHERE for example:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
This now turns the LEFT JOIN into an implicit INNER JOIN. The above would therefore be better written as:
SELECT *
FROM Example1 E1
JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
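The examples above can be reproduced in SQLite via Python's sqlite3. SQLite has no LEFT() string function, so substr(SomeOtherValue, 1, 1) stands in for LEFT(SomeOtherValue, 1). Otherwise the tables and rows are the ones above. The sketch confirms two things: the plain LEFT JOIN returns 5 rows, and adding the WHERE filter leaves exactly the rows the INNER JOIN version returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Example1 (ID INTEGER NOT NULL, SomeValue TEXT);
CREATE TABLE Example2 (ID INTEGER NOT NULL, ParentID INTEGER NOT NULL, SomeOtherValue TEXT);
INSERT INTO Example1 VALUES (1,'abc'),(2,'def'),(3,'bcd'),(4,'zxy');
INSERT INTO Example2 VALUES (1,1,'sadfh'),(2,1,'asdgfkhji'),(3,3,'sdfhdfsbh');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Plain LEFT JOIN: parents 2 and 4 come back NULL-extended
plain_left = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
ORDER BY E1.ID, E2.ID
""")

# A WHERE test on E2 is UNKNOWN for the NULL-extended rows, so they are discarded
left_where = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE substr(E2.SomeOtherValue, 1, 1) = 's'
ORDER BY E1.ID, E2.ID
""")

inner_where = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE substr(E2.SomeOtherValue, 1, 1) = 's'
ORDER BY E1.ID, E2.ID
""")

print(len(plain_left), left_where == inner_where)  # 5 True
```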
This, however, may not be the intended output; you may well want unmatched rows (which is why you initially used a LEFT JOIN). There are two ways you could do that. The first is to add the criteria to the ON clause:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
AND LEFT(E2.SomeOtherValue,1) = 's'
ORDER BY E1.ID, E2.ID;
The other would be to add an OR to the WHERE clause (don't use ISNULL; it affects SARGability!):
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE LEFT(E2.SomeOtherValue,1) = 's'
OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID;
This, I imagine, is what your senior is talking about.
To repeat though:
SELECT *
FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID;
This makes no sense. E2.ID cannot have a value of NULL, so the clause makes no change to the query, apart from probably making it run slower.
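One caveat on the two alternatives: they coincide on this sample data, but not in general. Consider a parent row that has children, none of which pass the filter:

- The ON-clause version keeps that parent, NULL-extended.
- The version with OR E2.ID IS NULL in the WHERE clause drops it. The parent did match a child in the join, so its row has a non-NULL E2.ID and then fails the filter.

The SQLite sketch below (via Python's sqlite3, with substr standing in for LEFT) adds one hypothetical parent, 5, whose only child starts with 'x', to show the difference. It also confirms that OR E2.ID IS NULL inside the ON clause changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Example1 (ID INTEGER NOT NULL, SomeValue TEXT);
CREATE TABLE Example2 (ID INTEGER NOT NULL, ParentID INTEGER NOT NULL, SomeOtherValue TEXT);
INSERT INTO Example1 VALUES (1,'abc'),(2,'def'),(3,'bcd'),(4,'zxy');
INSERT INTO Example2 VALUES (1,1,'sadfh'),(2,1,'asdgfkhji'),(3,3,'sdfhdfsbh');
-- Hypothetical extra parent whose only child does NOT start with 's'
INSERT INTO Example1 VALUES (5,'qrs');
INSERT INTO Example2 VALUES (4,5,'xyz');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Filter in the ON clause: parent 5 survives, NULL-extended
on_filter = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
                     AND substr(E2.SomeOtherValue, 1, 1) = 's'
ORDER BY E1.ID, E2.ID
""")

# Filter in WHERE with OR E2.ID IS NULL: parent 5 matched a child, which then fails
where_or = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
WHERE substr(E2.SomeOtherValue, 1, 1) = 's' OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID
""")

plain_left = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID
ORDER BY E1.ID, E2.ID
""")

# E2.ID is declared NOT NULL, so this OR never adds a match
redundant_or = run("""
SELECT E1.ID, E2.ID FROM Example1 E1
LEFT JOIN Example2 E2 ON E1.ID = E2.ParentID OR E2.ID IS NULL
ORDER BY E1.ID, E2.ID
""")

print(on_filter)
print(where_or)
```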
Cleanup:
DROP TABLE Example1;
DROP TABLE Example2;
In my eyes this is very simple, as far as I understood it.
Let's try with an example.
Imagine you have two tables, a master table and a details table.
MASTER TABLE "TheMaster"
ID NAME
1 Foo1
2 Foo2
3 Foo3
4 Foo4
5 Foo5
6 Foo6
DETAILS TABLE "TheDetails"
ID ID_FK TheDetailValue
1 1 3
2 1 5
3 3 3
4 5 2
5 5 9
6 3 6
7 1 4
TheDetails table is linked to TheMaster table through the field ID_FK.
Now imagine running a query where you need to sum the values of the column TheDetailValue. I would go with something like this:
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster INNER JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
GROUP BY TheMaster.ID, TheMaster.NAME;
You would get a list like this:
ID NAME SumOfTheDetailValue
1 Foo1 12
3 Foo3 9
5 Foo5 11
But what if your query uses a LEFT JOIN instead of an INNER JOIN? For example:
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster LEFT JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
GROUP BY TheMaster.ID, TheMaster.NAME;
The result would be:
ID NAME SumOfTheDetailValue
1 Foo1 12
2 Foo2
3 Foo3 9
4 Foo4
5 Foo5 11
6 Foo6
You will obtain a NULL sum for each master row that has no rows in the details table.
How do you exclude these rows? With a null test on the details key:
SELECT TheMaster.ID, TheMaster.NAME, Sum(TheDetails.TheDetailValue) AS SumOfTheDetailValue
FROM TheMaster LEFT JOIN TheDetails ON TheMaster.ID = TheDetails.ID_FK
WHERE (((TheDetails.ID_FK) Is Not Null))
GROUP BY TheMaster.ID, TheMaster.NAME;
...which would take us to these results:
ID NAME SumOfTheDetailValue
1 Foo1 12
3 Foo3 9
5 Foo5 11
...which is exactly what we obtained earlier with the INNER JOIN.
So, in the end, I guess your colleague is talking about using a null test to exclude the records that have no related row in the other table.
That's it.
For example purposes only, the queries were made in MS Access (a quick test), where the null test is written "Is Null", which can become "Is Not Null". In T-SQL the equivalent predicates are IS NULL and IS NOT NULL. Note that the T-SQL ISNULL() function is something else: it replaces a NULL with a given value rather than testing for one.
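The Access results above can be reproduced with a small SQLite sketch via Python's sqlite3. It uses this answer's sample data and the T-SQL-style IS NOT NULL predicate instead of Access's parenthesized Is Not Null. It checks three things:

- the sums 12, 9 and 11;
- the NULL sums for Foo2, Foo4 and Foo6 with the LEFT JOIN;
- that filtering the LEFT JOIN with IS NOT NULL returns the same rows as the INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TheMaster (ID INTEGER, NAME TEXT);
CREATE TABLE TheDetails (ID INTEGER, ID_FK INTEGER, TheDetailValue INTEGER);
INSERT INTO TheMaster VALUES (1,'Foo1'),(2,'Foo2'),(3,'Foo3'),(4,'Foo4'),(5,'Foo5'),(6,'Foo6');
INSERT INTO TheDetails VALUES (1,1,3),(2,1,5),(3,3,3),(4,5,2),(5,5,9),(6,3,6),(7,1,4);
""")

SUMS = """
SELECT m.ID, m.NAME, SUM(d.TheDetailValue)
FROM TheMaster m {join} TheDetails d ON m.ID = d.ID_FK
{where}
GROUP BY m.ID, m.NAME
ORDER BY m.ID
"""

inner = conn.execute(SUMS.format(join="INNER JOIN", where="")).fetchall()
left = conn.execute(SUMS.format(join="LEFT JOIN", where="")).fetchall()
# LEFT JOIN, then drop the masters that found no detail row
left_filtered = conn.execute(
    SUMS.format(join="LEFT JOIN", where="WHERE d.ID_FK IS NOT NULL")
).fetchall()

print(inner)
print(left)
```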
I have a query:
SELECT c.somecolumn,p.someothercolumn
FROM table1 co
INNER JOIN table2 p(NOLOCK) ON co.COLUMN = p.COLUMN
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
WHERE co.filterColumn = 1
Table2 is a junction table and the join between table1 and table2 is on a column without distinct values (that’s the requirement and can't be changed) and hence there are cross joins.
This query returns 180 million records.
Record count:
table 1: 2 190 561
table 2: 568 277
table 3: 300 150
How can I optimize the above query?
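The join column is not unique on either side, so the output size can be estimated before running the full query. For each join key value k, multiply the number of table1 rows with key k by the number of table2 rows with key k, then sum over all k. The small SQLite sketch below checks that formula against an actual COUNT(*). It runs via Python's sqlite3 with hypothetical data and makes these assumptions:

- k stands in for COLUMN;
- only the table1 to table2 join is modelled. If column11 is not unique in table3, that join multiplies rows further.

On the real tables, the per-key query shows which key values produce most of the 180 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; k stands in for the non-unique join column COLUMN
conn.executescript("""
CREATE TABLE table1 (k INTEGER, filterColumn INTEGER);
CREATE TABLE table2 (k INTEGER, someothercolumn TEXT);
INSERT INTO table1 VALUES (1,1),(1,1),(1,0),(2,1),(3,1);
INSERT INTO table2 VALUES (1,'a'),(1,'b'),(1,'c'),(2,'d'),(4,'e');
""")

actual = conn.execute("""
SELECT COUNT(*) FROM table1 co JOIN table2 p ON co.k = p.k
WHERE co.filterColumn = 1
""").fetchone()[0]

# Output rows contributed by each key: (matching table1 rows) * (matching table2 rows)
per_key = conn.execute("""
SELECT a.k, a.n1 * b.n2 AS out_rows
FROM (SELECT k, COUNT(*) AS n1 FROM table1 WHERE filterColumn = 1 GROUP BY k) a
JOIN (SELECT k, COUNT(*) AS n2 FROM table2 GROUP BY k) b ON a.k = b.k
ORDER BY out_rows DESC
""").fetchall()
estimated = sum(n for _, n in per_key)

print(actual, estimated, per_key)
```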
Make sure you at least have indexes on the join columns that include the columns you're returning. For example, in table2 you should have a nonclustered index keyed on p.COLUMN that includes p.someothercolumn; for table3, key on c.column11 and include c.somecolumn. You should also have an index on table1.filterColumn.
Consider also that you have to return 180 million rows to the caller, and that takes time. Try inserting the data into a throwaway table instead, to keep the network transfer time out of the equation.
These would ideally be the required indexes:
For table1 - filtered index on COLUMN and column11 where co.filterColumn = 1
For table2 - Index on COLUMN include someothercolumn
For table3 - Index on column11 include somecolumn
SELECT c.somecolumn
,tmp.someothercolumn
FROM table1 co
INNER JOIN table3 c(NOLOCK) ON co.column11 = c.column11
AND co.filterColumn = 1
CROSS APPLY (SELECT TOP (1) p.SomeOtherColumn
FROM table2 p(NOLOCK)
WHERE p.COLUMN = co.Column) tmp
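Note that this rewrite changes the result, not only the speed. It returns at most one table2 value per table1 row, and without an ORDER BY inside the APPLY, which value is picked is not guaranteed. It is only a valid replacement if any single matching value will do.

SQLite has no CROSS APPLY, so the sketch below (via Python's sqlite3, hypothetical data, table3 omitted) stands in a correlated subquery with ORDER BY ... LIMIT 1. The ORDER BY is added to make the pick deterministic. Like CROSS APPLY, the sketch drops table1 rows that have no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; k stands in for COLUMN
conn.executescript("""
CREATE TABLE table1 (k INTEGER, filterColumn INTEGER);
CREATE TABLE table2 (k INTEGER, someothercolumn TEXT);
INSERT INTO table1 VALUES (1,1),(1,1),(1,0),(2,1),(3,1);
INSERT INTO table2 VALUES (1,'a'),(1,'b'),(1,'c'),(2,'d'),(4,'e');
""")

# Original shape: every matching table2 row is returned (fan-out)
full_join = conn.execute("""
SELECT co.rowid, p.someothercolumn
FROM table1 co JOIN table2 p ON co.k = p.k
WHERE co.filterColumn = 1
""").fetchall()

# CROSS APPLY (SELECT TOP (1) ...) stand-in: one value per table1 row
one_per_row = conn.execute("""
SELECT co.rowid,
       (SELECT p.someothercolumn FROM table2 p
         WHERE p.k = co.k
         ORDER BY p.someothercolumn
         LIMIT 1) AS someothercolumn
FROM table1 co
WHERE co.filterColumn = 1
  AND EXISTS (SELECT 1 FROM table2 p WHERE p.k = co.k)
ORDER BY co.rowid
""").fetchall()

print(len(full_join), one_per_row)
```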
Given the following two rows of data:
ColumnA ColumnB ColumnC ColumnD
33 10298 11588 4474.32
33 10298 11588 2237.16
How do I go about writing a T-SQL query that removes only the first data row, where ColumnA to ColumnC are the same and the value in ColumnD is double that of the second data row?
It doesn't have to be particularly performant, as I am only removing approximately 500 rows.
Something along these lines should work:
DELETE FROM t2
FROM table t1
inner join
table t2
on
t1.ColumnA = t2.ColumnA and
t1.ColumnB = t2.ColumnB and
t1.ColumnC = t2.ColumnC and
t1.ColumnD * 2 = t2.ColumnD
This assumes that if you have 3 rows where their ratios between columnD values are 1 : 2 : 4, you want to delete both the 2 and 4 rows. If that's not the case, please consider such a situation and let me know what should happen there.
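To check the 1 : 2 : 4 behaviour described above, here is a sketch in SQLite via Python's sqlite3. SQLite has no DELETE ... FROM ... JOIN syntax, so the same self-join is placed in a rowid IN (...) subquery that selects the t2 side. The table T and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (A INTEGER, B INTEGER, C INTEGER, D INTEGER);
-- Group (1,2,3) has D values in ratio 1 : 2 : 4, group (5,5,5) has no doubled pair
INSERT INTO T VALUES (1,2,3,4), (1,2,3,8), (1,2,3,16), (5,5,5,7), (5,5,5,9);
""")

# Same self-join as the T-SQL answer, deleting the t2 side (the doubled row)
conn.execute("""
DELETE FROM T
WHERE rowid IN (
    SELECT t2.rowid
    FROM T t1
    JOIN T t2
      ON t1.A = t2.A AND t1.B = t2.B AND t1.C = t2.C
     AND t1.D * 2 = t2.D
)
""")

remaining = conn.execute("SELECT * FROM T ORDER BY A, D").fetchall()
print(remaining)
```

Both the 8 and the 16 rows are removed, since each is double another row in the same group.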
(See the T-SQL DELETE documentation.)
Complete script:
create table T (A int,B int, C int, D int)
insert into T(A,B,C,D)
values (1,2,3,4),(1,2,3,8)
delete from t2
from t t1
inner join
t t2
on
t1.A = t2.A and t1.B = t2.B and t1.C = t2.C and t1.D * 2 = t2.D
select * from T
Result:
A B C D
----------- ----------- ----------- -----------
1 2 3 4
Try this solution:
delete from YourTable
from YourTable t1
where exists (select 1 from YourTable t2 where t1.ColumnA=t2.ColumnA and t1.ColumnB=t2.ColumnB and t1.ColumnC=t2.ColumnC and t1.ColumnD=t2.ColumnD*2)
Alternatively, if you want to delete from that table without using it twice in a join, use an EXISTS predicate instead of the join, or join a derived table.