Left Join containing where clause INSIDE join - sql-server

Let's say we have the following table structure:
CREATE TABLE #Person
(
PersonId INT,
Name VARCHAR(50)
)
CREATE TABLE #Address
(
AddressId INT IDENTITY(1,1),
PersonId INT
)
And we insert two person records:
INSERT INTO #Person (PersonId, Name) VALUES (1, 'John Doe')
INSERT INTO #Person (PersonId, Name) VALUES (2, 'Jane Doe')
But we only insert an address record for John:
INSERT INTO #Address (PersonId) VALUES (1)
If I execute the following two queries, I get different results:
SELECT *
FROM #Person p
LEFT JOIN #Address a
ON p.PersonId = a.PersonId AND a.PersonId IS NULL
PersonId | Name | AddressId | PersonId
1 | John Doe | NULL | NULL
2 | Jane Doe | NULL | NULL
VS
SELECT *
FROM #Person p
LEFT JOIN #Address a
ON p.PersonId = a.PersonId
WHERE a.PersonId IS NULL
PersonId | Name | AddressId | PersonId
2 | Jane Doe | NULL | NULL
Why are the queries returning different results?

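In the first query, no #Address row can satisfy the ON condition, because a.PersonId cannot equal p.PersonId and be NULL at the same time. Nothing is joined, so the query returns every row from the #Person table with NULLs for the #Address columns (a typical left join). In the second query, the WHERE clause is applied after the join, so it removes the rows that found a match and leaves only the people without an address.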

First:
All (two) records are taken from Person and 0 records are joined from Address, because no address can match with PersonId = NULL. No additional filter is applied after that, so you see both records from Person.
Second:
All (two) records are taken from Person, and one of them is joined to the Address with PersonId = 1. Then your WHERE filter is applied, and the record that was joined to PersonId = 1 disappears.

The ON clause defines which rows from the two tables count as a match.
The WHERE clause filters the rows after the join.
The 1st query returns 2 rows because a LEFT JOIN returns all the rows from the left table, regardless of whether the right table has a match.
The 2nd query returns 1 row because, for PersonId = 1, the #Address table contains a matching record, so a.PersonId is NOT NULL and the row is filtered out.
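The difference described in the answers above can be reproduced with a small script. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and plain tables named Person and Address instead of the temp tables from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER, Name TEXT);
    CREATE TABLE Address (AddressId INTEGER PRIMARY KEY AUTOINCREMENT, PersonId INTEGER);
    INSERT INTO Person (PersonId, Name) VALUES (1, 'John Doe'), (2, 'Jane Doe');
    INSERT INTO Address (PersonId) VALUES (1);
""")

# Condition inside ON: it only decides which Address rows count as a match.
# No Address row can match, so every Person row survives with NULL columns.
on_rows = conn.execute("""
    SELECT p.PersonId, p.Name, a.AddressId
    FROM Person p
    LEFT JOIN Address a
      ON p.PersonId = a.PersonId AND a.PersonId IS NULL
    ORDER BY p.PersonId
""").fetchall()

# Condition in WHERE: it runs after the join and drops the matched row.
where_rows = conn.execute("""
    SELECT p.PersonId, p.Name, a.AddressId
    FROM Person p
    LEFT JOIN Address a
      ON p.PersonId = a.PersonId
    WHERE a.PersonId IS NULL
    ORDER BY p.PersonId
""").fetchall()

print(on_rows)     # both people, no address attached
print(where_rows)  # only the person without an address
```

The second form is the usual way to find rows in the left table that have no counterpart in the right table.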

Make it a habit to read your SQL query starting from the WHERE condition and then look at the joins; this gives a clearer picture of what will be returned.
In the second query you said WHERE a.PersonId IS NULL, so the SELECT and the join on the given criteria happen first, and the filter is then applied to the joined rows.
That is how the query is evaluated, hence the different result sets.
In contrast, in the query without a WHERE clause, rows from the left table (p) do not have to exist in (a), yet the ON condition also requires the matching row in (a) to have a NULL PersonId. No row can satisfy both, so nothing is joined and every row of (p) is returned with NULLs.

Related

what to use instead of union to join same results based on two where clauses

I have two queries that work as expected for example
Query 1
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
table output
Name | ID | Product | QUestion
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
query 2
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
table output
Name | ID | Product | QUestion
0659e103-b33d-4603 |12347|Apple | Problem at store?
As you can see, query 1 filters on ProductID and query 2 filters on TypeID, and the two filters give me different outputs,
so I used a UNION to combine both as follows:
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
union
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
which gives me the desired output:
Name | ID | Product | QUestion
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
0659e103-b33d-4603 |12347|Apple | Problem at store?
Is there a better way to do this? My query will grow, and I would not like to repeat the same thing over and over. Is there a way to optimize the query?
Note: I cannot combine ProductID and TypeID with AND in the same condition, because that does not give accurate results.
You could use OR since you are querying the same table.
SELECT Name
,ID
,Product
,Question
FROM TABLE1
WHERE (
Id = 9
AND ProductID = 30628
)
OR (
Id = 9
AND TypeID = 2
)
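As a quick check of the OR approach, the sketch below compares it with the UNION version. It assumes SQLite (via Python's sqlite3) instead of SQL Server, and a hypothetical table Questions with columns RowId, GroupId, ProductID and TypeID; the shared Id = 9 condition is factored out of the two OR branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questions (RowId INTEGER, GroupId INTEGER, ProductID INTEGER, TypeID INTEGER);
    INSERT INTO Questions VALUES
        (12356, 9, 30628, 1),
        (12456, 9, 30628, 1),
        (12347, 9, 11111, 2),
        (12999, 9, 11111, 3),
        (13000, 8, 30628, 2);
""")

# Original approach: one SELECT per condition, combined with UNION.
union_rows = [r[0] for r in conn.execute("""
    SELECT RowId FROM Questions WHERE GroupId = 9 AND ProductID = 30628
    UNION
    SELECT RowId FROM Questions WHERE GroupId = 9 AND TypeID = 2
    ORDER BY RowId
""")]

# Single pass over the table: common predicate once, alternatives in OR.
or_rows = [r[0] for r in conn.execute("""
    SELECT RowId FROM Questions
    WHERE GroupId = 9 AND (ProductID = 30628 OR TypeID = 2)
    ORDER BY RowId
""")]

print(union_rows == or_rows, or_rows)
```

One difference to keep in mind: UNION removes duplicate result rows, while the OR form returns each qualifying table row exactly once, so the two only agree when the selected columns identify a row uniquely.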
If you have a growing number of OR conditions you could use a temp table/variable and inner join to profit from a set based operation.
The inner join will only return matching rows.
CREATE TABLE #SomeTable(Id INT NOT NULL, ProductID INT NULL, TypeID INT NULL)
-- Insert all conditions you want to match.
INSERT INTO #SomeTable(Id, ProductID, TypeId)
VALUES (9, 30628, NULL)
, (9, NULL, 2)
SELECT x.Name
,x.ID -- qualified, because both tables have an ID column
,x.Product
,x.Question
FROM TABLE1 x
INNER JOIN #SomeTable y ON
x.ID = y.ID -- Since ID is Not null in the temp table
AND (y.ProductID IS NULL OR y.ProductID = x.ProductID)
AND (y.TypeID IS NULL OR y.TypeID = x.TypeID)
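The criteria-table idea can be tried out as follows. This is a sketch under assumptions: SQLite through Python's sqlite3 rather than SQL Server, and hypothetical names Questions and Criteria. It also shows one thing to watch for: a row that satisfies more than one criteria row is returned once per match unless DISTINCT is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questions (RowId INTEGER, GroupId INTEGER, ProductID INTEGER, TypeID INTEGER);
    INSERT INTO Questions VALUES
        (12356, 9, 30628, 1),
        (12347, 9, 11111, 2),
        (12500, 9, 30628, 2),  -- satisfies both criteria rows
        (12999, 9, 11111, 3);  -- satisfies neither
    CREATE TEMP TABLE Criteria (GroupId INTEGER NOT NULL, ProductID INTEGER, TypeID INTEGER);
    -- NULL in a criteria column means "any value".
    INSERT INTO Criteria VALUES (9, 30628, NULL), (9, NULL, 2);
""")

join_sql = """
    SELECT {distinct} q.RowId
    FROM Questions q
    INNER JOIN Criteria c
       ON q.GroupId = c.GroupId
      AND (c.ProductID IS NULL OR c.ProductID = q.ProductID)
      AND (c.TypeID IS NULL OR c.TypeID = q.TypeID)
    ORDER BY q.RowId
"""

plain = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
deduped = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]

print(plain)    # row 12500 appears twice, once per matching criteria row
print(deduped)  # each qualifying row once
```

Adding a new condition then means inserting one more row into the criteria table instead of editing the query.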
You can use a CASE WHEN expression with a self join.
Something like this:
SELECT t1_2.Name,
t1_2.ID,
t1_2.Product,
t1_2.Question,
(CASE WHEN (t1.Id = 9 AND t1.ProductID = 30628) THEN t1.ID
WHEN (t1.Id = 9 AND t1.TypeID = 2) THEN t1.ID
ELSE NULL END) AS IDcalc
FROM table_1 t1 LEFT JOIN table_1 t1_2
ON t1.ID = t1_2.ID
WHERE (CASE WHEN (t1.Id = 9 AND t1.ProductID = 30628) THEN t1.ID
WHEN (t1.Id = 9 AND t1.TypeID = 2) THEN t1.ID
ELSE NULL END) IS NOT NULL
You can use any table in the join.
In terms of query performance, OR is the better choice as long as you query only one table; if more tables are involved, you should use a temp table or CASE WHEN in your query.

SQL Server : left outer join on two relations to 2 tables

I have 2 tables with primary keys, and a third table (possibly many more) that references these 2 primary tables and holds some extra values for one or both primary keys.
I need to write SQL that always delivers a result with as much information as possible by joining these 3 tables. Best result: all 3 tables joined. Medium result: at least one of the primary keys (or both) is selected. Worst result: all columns are null.
The main idea is to have a combination of two primary tables and many extra tables, which could be empty but should still allow results from the tables that have values.
I tried to start with 3 tables but got stuck on the second join.
It works for me only when I join the first table. Joining the second one produces an error.
What SQL statement should I use in place of the ? below?
http://sqlfiddle.com/#!18/7438b/3
CREATE TABLE [AGENCIES]
(
[AGENCY_NAME] [CHAR](9),
id INT IDENTITY(1,1) NOT NULL PRIMARY KEY
);
CREATE TABLE [PERSONS]
(
[NAME] [CHAR](9),
id INT IDENTITY(1,1) NOT NULL PRIMARY KEY
);
CREATE TABLE [AGENCY_PERSON]
(
agency_id INT FOREIGN KEY REFERENCES agencies(id),
person_id INT FOREIGN KEY REFERENCES persons(id),
[TITLE] [CHAR](9) NULL,
id INT IDENTITY(1,1) NOT NULL PRIMARY KEY
);
INSERT INTO agencies (AGENCY_NAME)
VALUES ('AgencyOne'), ('AgencyTwo'), ('Agency3');
INSERT INTO persons (name)
VALUES ('PersonOne'), ('PersonTwo'), ('Person3');
INSERT INTO AGENCY_PERSON (agency_id, person_id, title)
VALUES (1, 1, 'TitleOne'), (1, 2, 'TitleTwo');
SELECT * FROM AGENCY_PERSON;
-- works fine for one primary table
SELECT [AGENCY_NAME], [TITLE]
FROM agencies
LEFT OUTER JOIN [AGENCY_PERSON] ON [AGENCY_PERSON].agency_id = agencies.id
WHERE [AGENCY_NAME] = 'AgencyOne';
-- error for two primary tables: Msg 4104 - The multi-part identifier "agencies.id" could not be bound.
SELECT [AGENCY_NAME], [TITLE], persons.name
FROM agencies, persons
LEFT OUTER JOIN [AGENCY_PERSON] ON [AGENCY_PERSON].agency_id = agencies.id
AND [AGENCY_PERSON].person_id = persons.id
WHERE [AGENCY_NAME] = 'AgencyOne';
-- select ? 'AgencyOne' - all records exist
-- AgencyOne, TitleOne, PersonOne
-- select ? 'TitleTwo' - both records on primary tables exist, but no in join table
-- AgencyOne, TitleTwo, NULL
-- select ? 'Agency3' - one of primary tables exist
-- Agency3, NULL, NULL
-- select ? 'Title3' - one of primary tables exist
-- NULL, Title3, NULL
-- select ? 'AgencyX' - nothing exists
-- NULL, NULL, NULL
forpas gave a good answer, but it works in reverse: the primary tables are joined to the extra table, which requires the extra tables to exist and have values. What I need is the opposite: the extra tables should be joined to the primaries. For example, there could be more extra tables like PERSON_PHONE, PERSON_ADDRES or AGENCY_PERSON_LOCATION. As soon as an agency or person exists (but there are no values in these extra tables), the result should be a row with the existing agency and person, and nulls in all other columns from the joined tables.
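Your code would work if you did not use the old-style comma (cross) join. An explicit JOIN binds more tightly than the comma, so the ON clause can see only persons and AGENCY_PERSON, and agencies.id cannot be bound: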
from agencies, persons
So write it like this:
select a.[AGENCY_NAME], ap.[TITLE], p.name
from agencies as a cross join persons as p
left outer join [AGENCY_PERSON] as ap
on ap.agency_id = a.id and ap.person_id = p.id
where a.[AGENCY_NAME] = 'AgencyOne';
I used aliases for all the tables involved, and I qualified every column with the alias of the table it belongs to.
Results:
> AGENCY_NAME | TITLE | name
> :---------- | :-------- | :--------
> AgencyOne | TitleOne | PersonOne
> AgencyOne | TitleTwo | PersonTwo
> AgencyOne | null | Person3
I'm not sure if this is what you want as output but I believe you see now how you can join all 3 tables.
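The explicit CROSS JOIN followed by a LEFT OUTER JOIN can be checked with a short script. This is a sketch that assumes SQLite (via Python's sqlite3) in place of SQL Server; the helper name rows_for is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agencies (id INTEGER PRIMARY KEY, agency_name TEXT);
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE agency_person (agency_id INTEGER, person_id INTEGER, title TEXT);
    INSERT INTO agencies (agency_name) VALUES ('AgencyOne'), ('AgencyTwo'), ('Agency3');
    INSERT INTO persons (name) VALUES ('PersonOne'), ('PersonTwo'), ('Person3');
    INSERT INTO agency_person VALUES (1, 1, 'TitleOne'), (1, 2, 'TitleTwo');
""")

def rows_for(agency_name):
    """Every (agency, person) pair for one agency, with the title when a link row exists."""
    return conn.execute("""
        SELECT a.agency_name, ap.title, p.name
        FROM agencies AS a
        CROSS JOIN persons AS p
        LEFT OUTER JOIN agency_person AS ap
          ON ap.agency_id = a.id AND ap.person_id = p.id
        WHERE a.agency_name = ?
        ORDER BY p.id
    """, (agency_name,)).fetchall()

print(rows_for("AgencyOne"))  # two titled rows plus Person3 without a title
print(rows_for("Agency3"))    # all three persons, no titles
```

Because both primary tables are on the left side of the outer join, an agency with no rows in the link table still produces output, which is the behavior asked for in the question.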
In case you want only the matching rows of the tables, then you should do inner joins:
select a.[AGENCY_NAME], ap.[TITLE], p.name
from [AGENCY_PERSON] as ap
inner join agencies as a on ap.agency_id = a.id
inner join persons as p on ap.person_id = p.id
where a.[AGENCY_NAME] = 'AgencyOne';
Results:
> AGENCY_NAME | TITLE | name
> :---------- | :-------- | :--------
> AgencyOne | TitleOne | PersonOne
> AgencyOne | TitleTwo | PersonTwo
See the demo.

Lookup delimited values in a table in sql-server

In table A I have a column city-id (varchar(30)) that holds values such as 1,2,3 or 2,4.
The description of each value is stored in another table B, e.g.
1 Amsterdam
2 The Hague
3 Maastricht
4 Rotterdam
How do I join table A with table B to get the descriptions in one or more rows?
Assuming this is what you meant:
Table A:
id
-------
1
2
3
Table B:
id | Place
-----------
1 | Amsterdam
2 | The Hague
3 | Maastricht
4 | Rotterdam
Keep id column in both tables as auto increment, and PK.
Then just do a simple inner join.
select * from A inner join B on (A.id = B.id);
The ideal way to deal with such scenarios is to have a normalized table, as Collin suggests. In case that can't be done, here is a way to go about it:
You need a table-valued function to split the comma-separated value. If you are on SQL Server 2016 or later, there is a built-in STRING_SPLIT function; if not, you would need to create one as shown in this link.
create table dbo.sCity(
CityId varchar(30)
);
create table dbo.sCityDescription(
CityId int
,CityDescription varchar(30)
);
insert into dbo.sCity values
('1,2,3')
,('2,4');
insert into dbo.sCityDescription values
(1,'Amsterdam')
,(2,'The Hague')
,(3,'Maastricht')
,(4,'Rotterdam');
select ctds.CityDescription
,sst.Value as 'CityId'
from dbo.sCity ct
cross apply dbo.SplitString(CityId,',') sst
join dbo.sCityDescription ctds
on sst.Value = ctds.CityId;
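For comparison, the normalized layout recommended above can be sketched like this. The example assumes SQLite through Python's sqlite3 instead of SQL Server, does the splitting in application code rather than with a split function, and introduces a hypothetical link table sCityLink.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sCity (CityId TEXT);
    CREATE TABLE sCityDescription (CityId INTEGER, CityDescription TEXT);
    INSERT INTO sCity VALUES ('1,2,3'), ('2,4');
    INSERT INTO sCityDescription VALUES
        (1, 'Amsterdam'), (2, 'The Hague'), (3, 'Maastricht'), (4, 'Rotterdam');
    -- Hypothetical link table: one row per (source row, city id).
    CREATE TABLE sCityLink (SourceRowId INTEGER, CityId INTEGER);
""")

# Split each delimited value once and store the parts as separate rows.
for source_rowid, delimited in conn.execute("SELECT rowid, CityId FROM sCity").fetchall():
    conn.executemany(
        "INSERT INTO sCityLink VALUES (?, ?)",
        [(source_rowid, int(part)) for part in delimited.split(",")],
    )

# With the values normalized, the lookup is an ordinary join.
rows = conn.execute("""
    SELECT l.SourceRowId, d.CityDescription
    FROM sCityLink l
    JOIN sCityDescription d ON d.CityId = l.CityId
    ORDER BY l.SourceRowId, l.CityId
""").fetchall()

print(rows)
```

Once the link table exists, no string splitting is needed at query time.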

Get array of records based on two keys in same table

I have tried this on the following table,
SELECT DISTINCT
a.main_id,
array_agg(distinct a.secondary_id ) AS arr
FROM table1 a JOIN table1 b ON a.secondary_id = b.secondary_id or a.tertiary_id = b.tertiary_id
group by a.main_id, a.secondary_id , b.tertiary_id
I added DISTINCT to omit the duplicates, but I cannot get the whole row as an element of the array, and the query does not even group the rows into arrays according to the requirement below. I was following this.
Table script:
Create table table1
(
id bigserial NOT NULL,
main_id integer NOT NULL,
secondary_id integer,
tertiary_id integer,
data1 text,
data2 text,
CONSTRAINT table1_pk PRIMARY KEY (main_id)
)
Data:
INSERT INTO table1(
main_id, secondary_id, tertiary_id, data1, data2)
VALUES (1,2,NULL,'data1_1_2_N','data2_1_2_N'),
(2,2,NULL,'data1_2_2_N','data2_2_2_N'),
(3,3,5,'data1_3_3_5','data2_3_3_5'),
(4,3,5,'data1_4_3_5','data2_4_3_5'),
(5,NULL,1,'data1_5_N_1','data2_5_N_1'),
(6,NULL,1,'data1_6_N_1','data2_6_N_1'),
(7,NULL,1,'data1_7_N_1','data2_7_N_1'),
(8,NULL,2,'data1_8_N_2','data2_8_N_2'),
(9,NULL,2,'data1_9_N_2','data2_9_N_2'),
(10,NULL,3,'data1_10_N_3','data2_10_N_3'),
(11,12,12,'data1_11_12_12','data2_11_12_12'),
(12,12,11,'data1_12_12_11','data2_12_12_11')
Requirement:
If secondary_id is equal in two or more rows they should be considered as one set,
else if tertiary_id is equal they can be considered as one set.
Expected Result:
1 | {(1,2,NULL,'data1_1_2_N','data2_1_2_N'),(2,2,NULL,'data1_2_2_N','data2_2_2_N')}
2 | {(3,3,NULL,'data1_3_3_N','data2_3_3_N'),(4,3,NULL,'data1_4_3_N','data2_4_3_N')}
3 | {(5,NULL,1,'data1_5_N_1','data2_5_N_1'),(6,NULL,1,'data1_6_N_1','data2_6_N_1'),(7,NULL,1,'data1_7_N_1','data2_7_N_1')}
4 | {(8,NULL,2,'data1_8_N_2','data2_8_N_2'),(9,NULL,2,'data1_9_N_2','data2_9_N_2')}
5 | {(10,NULL,3,'data1_10_N_3','data2_10_N_3')}
6 | {(11,12,12,'data1_11_12_12','data2_11_12_12'),(12,12,11,'data1_12_12_11','data2_12_12_11') }
Version "PostgreSQL 9.3.11"
This should achieve your output. The trick lies in the conditional GROUP BY clause, which handles the cases where a record has matching records on both secondary_id and tertiary_id.
select array_agg(distinct t1)
from table1 t1
join table1 t2 on
t1.secondary_id = t2.secondary_id
or t1.tertiary_id = t2.tertiary_id
group by
case
when t1.secondary_id is null or t1.tertiary_id is null
then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id
then t1.secondary_id::TEXT -- #2
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id
then t1.tertiary_id::TEXT -- #3
end
order by 1
The standard case, #1, is when either of the fields is null. We need to group by both columns, and we do that by concatenating the values of both columns with a # mark and grouping by the concatenated value.
For #2 and #3 we need to cast the grouping value to type text to make it go through (all branches of a CASE expression must return the same type).
Option #2 covers the case when both values are not null and secondary_id matches between the rows paired by the self join. Option #3 is analogous, but for a tertiary_id match.
Output:
array_agg
------------------------------------------------------------------------------------------------------------
{"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
{"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
{"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
{"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
{"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
{"(11,11,4,4,data1_11_4_4,data2_11_4_4)","(12,12,4,11,data1_12_4_11,data2_12_4_11)"}
If you'd like to get rid of column id from your record, you could use a CTE and select all columns but id and then refer to that CTE in from clause.

Query to find the record with most matching columns, where the number of columns and names of columns is unknown?

I have two tables, X and Y, with identical schema but different records. Given a record from X, I need a query to find the closest matching record in Y that contains NULL values for non-matching columns. Identity columns should be excluded from the comparison. For example, if my record looked like this:
------------------------
id | col1 | col2 | col3
------------------------
0 |'abc' |'def' | 'ghi'
And table Y looked like this:
------------------------
id | col1 | col2 | col3
------------------------
6 |'abc' |'def' | 'zzz'
8 | NULL |'def' | NULL
Then the closest match would be record 8, since where the columns don't match, there are NULL values. 6 WOULD have been the closest match, but the 'zzz' disqualified it.
What's unique about this problem is that the schema of the tables is unknown besides the id column and the data types. There could be 4 columns, or there could be 7 columns. We just don't know - it's dynamic. All we know is that there is going to be an 'id' column and that the columns will be strings, either varchar or nvarchar.
What is the best query in this case to pick the closest matching record out of Y, given a record from X? I'm actually writing a function. The input is an integer (the id of a record in X) and the output is an integer (the id of a record in Y, or NULL). I'm an SQL novice, so a brief explanation of what's happening in your solution would help me greatly.
"There could be 4 columns, or there could be 7 columns.... I'm actually writing a function."
This is an impossible task for a function. A query over an arbitrary table structure needs dynamic SQL, and SQL Server user-defined functions cannot execute dynamic SQL. A stored procedure can, but a function cannot.
However, the query below shows a way to use FOR XML, and some decomposing of the XML, to unpivot rows into column names and values, which can then be compared. The technique and the queries can be incorporated into a stored procedure.
MS SQL Server 2008 Schema Setup:
-- this is the data table to match against
create table t1 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t1
select 6, 'abc', 'def', 'zzz' union all
select 8, null , 'def', null;
-- this is the data with the row you want to match
create table t2 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t2
select 0, 'abc', 'def', 'ghi';
GO
Query 1:
;with unpivoted1 as (
select n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select (select * from t2 where id=0 for xml path(''), type)) x(xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
), unpivoted2 as (
select x.id,
n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select id,(select * from t1 where id=outr.id for xml path(''), type) from t1 outr) x(id,xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
)
select TOP(1) WITH TIES
B.id,
sum(case when A.value=B.value then 1 else 0 end) matches
from unpivoted1 A
join unpivoted2 B on A.colname = B.colname
group by B.id
having max(case when A.value <> B.value then 1 end) is null
ORDER BY matches DESC; -- most matching columns first
Results:
| ID | MATCHES |
----------------
| 8 | 1 |
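The matching rule itself (a candidate is disqualified by any non-NULL value that differs, and otherwise scored by the number of equal columns) can be written out in a few lines. This is only an illustration of the rule in Python, not a replacement for the query: the function name closest_match is made up, rows are assumed to arrive as dictionaries, and the values of the record being matched are assumed to be non-NULL.

```python
from typing import Dict, List, Optional

def closest_match(record: Dict[str, object],
                  candidates: List[Dict[str, object]],
                  id_column: str = "id") -> Optional[object]:
    """Return the id of the candidate with the most equal columns, or None.

    A candidate column holding None (NULL) is neutral; a non-None value
    that differs from the record disqualifies the candidate.
    """
    best_id, best_score = None, -1
    for candidate in candidates:
        score = 0
        disqualified = False
        for column, value in record.items():
            if column == id_column:
                continue  # identity columns are excluded from the comparison
            other = candidate.get(column)
            if other is None:
                continue  # NULL neither helps nor hurts
            if other == value:
                score += 1
            else:
                disqualified = True
                break
        if not disqualified and score > best_score:
            best_id, best_score = candidate[id_column], score
    return best_id

x_row = {"id": 0, "col1": "abc", "col2": "def", "col3": "ghi"}
y_rows = [
    {"id": 6, "col1": "abc", "col2": "def", "col3": "zzz"},
    {"id": 8, "col1": None, "col2": "def", "col3": None},
]
print(closest_match(x_row, y_rows))
```

Unlike the XML-based query, this sketch also accepts a candidate whose columns are all NULL (with a score of 0), so the two are not strictly equivalent.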
