Result of various Joins on a scenario - sql-server

I have two tables, test1 and test2, each with a single column containing some values.
I have applied inner and outer joins but am confused by the output.
Create table test1
( id int)
insert into test1 values (1)
insert into test1 values (1)
insert into test1 values (1)
Create table test2
( id int)
insert into test2 values (1)
insert into test2 values (1)
insert into test2 values (NULL)
select a.id from test1 a inner join test2 b on a.id = b.id
I was expecting
1
1
NULL
as the output of the inner join, left join and right join.
But the actual output was:
1
1
1
1
1
1
Could you please help me understand how each of the joins behaves here?

Each of the three 1s in test1 joined with each of the two 1s in test2, which yields the 3x2 = 6 rows in your result set. Nothing distinguishes the first, second and third 1 in test1, and nothing distinguishes the first and second 1 in test2.
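Also, keep in mind that none of the following conditions is true:
NULL = 1
NULL <> 1
NULL = NULL
NULL <> NULL
Any comparison with a NULL on either side evaluates to UNKNOWN rather than TRUE, because a NULL represents an unknown value. A join or WHERE clause keeps only the rows for which its condition is TRUE, so a NULL never matches anything, not even another NULL.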
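So the output you expected does not match how joins work. It seems you expected the first row of test1 to be paired with the first row of test2, and so on. SQL does no such positional pairing: the entire logic of the join is in the ON clause, which matched every 1 with every 1 as explained above. For the same reason, test1 LEFT JOIN test2 also returns six 1s, and test1 RIGHT JOIN test2 returns those six rows plus one extra row for the NULL in test2, with NULL in the test1 column.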
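A minimal sketch, assuming SQLite via Python's built-in sqlite3 module as a stand-in for SQL Server (both apply the same NULL comparison rules here). It rebuilds test1 and test2, counts the inner-join rows, emulates the right join as test2 LEFT JOIN test1 (older SQLite versions lack RIGHT JOIN), and evaluates the NULL comparisons directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test1 (id INTEGER)")
cur.execute("CREATE TABLE test2 (id INTEGER)")
cur.executemany("INSERT INTO test1 VALUES (?)", [(1,), (1,), (1,)])
cur.executemany("INSERT INTO test2 VALUES (?)", [(1,), (1,), (None,)])

# Every 1 in test1 matches every 1 in test2: 3 x 2 rows
inner = cur.execute(
    "SELECT a.id FROM test1 a INNER JOIN test2 b ON a.id = b.id"
).fetchall()

# test2 LEFT JOIN test1 returns the same rows as test1 RIGHT JOIN test2
right_like = cur.execute(
    "SELECT a.id, b.id FROM test2 b LEFT JOIN test1 a ON a.id = b.id"
).fetchall()

# Comparisons involving NULL yield NULL (UNKNOWN), which Python shows as None
null_cmp = cur.execute("SELECT NULL = NULL, NULL <> NULL, NULL = 1").fetchone()

print(len(inner))       # 6
print(len(right_like))  # 7
print(null_cmp)         # (None, None, None)
```

The seventh row in the right-join emulation is test2's NULL, which matches nothing and is padded with NULL on the test1 side.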

Related

Get array of records based on two keys in same table

I have tried this on the following table,
SELECT DISTINCT
a.main_id,
array_agg(distinct a.secondary_id ) AS arr
FROM table1 a JOIN table1 b ON a.secondary_id = b.secondary_id or a.tertiary_id = b.tertiary_id
group by a.main_id, a.secondary_id , b.tertiary_id
I added DISTINCT to remove the duplicates, but I cannot get the whole row as an element of the array, and the query does not even group the rows into arrays according to the requirement below. I was following this.
Table script:
Create table table1
(
id bigserial NOT NULL,
main_id integer NOT NULL,
secondary_id integer,
tertiary_id integer,
data1 text,
data2 text,
CONSTRAINT table1_pk PRIMARY KEY (main_id)
)
Data:
INSERT INTO table1(
main_id, secondary_id, tertiary_id, data1, data2)
VALUES (1,2,NULL,'data1_1_2_N','data2_1_2_N'),
(2,2,NULL,'data1_2_2_N','data2_2_2_N'),
(3,3,5,'data1_3_3_5','data2_3_3_5'),
(4,3,5,'data1_4_3_5','data2_4_3_5'),
(5,NULL,1,'data1_5_N_1','data2_5_N_1'),
(6,NULL,1,'data1_6_N_1','data2_6_N_1'),
(7,NULL,1,'data1_7_N_1','data2_7_N_1'),
(8,NULL,2,'data1_8_N_2','data2_8_N_2'),
(9,NULL,2,'data1_9_N_2','data2_9_N_2'),
(10,NULL,3,'data1_10_N_3','data2_10_N_3'),
(11,12,12,'data1_11_12_12','data2_11_12_12'),
(12,12,11,'data1_12_12_11','data2_12_12_11')
Requirement:
If secondary_id is equal in two or more rows they should be considered as one set,
else if tertiary_id is equal they can be considered as one set.
Expected Result:
1 | {(1,2,NULL,'data1_1_2_N','data2_1_2_N'),(2,2,NULL,'data1_2_2_N','data2_2_2_N')}
2 | {(3,3,5,'data1_3_3_5','data2_3_3_5'),(4,3,5,'data1_4_3_5','data2_4_3_5')}
3 | {(5,NULL,1,'data1_5_N_1','data2_5_N_1'),(6,NULL,1,'data1_6_N_1','data2_6_N_1'),(7,NULL,1,'data1_7_N_1','data2_7_N_1')}
4 | {(8,NULL,2,'data1_8_N_2','data2_8_N_2'),(9,NULL,2,'data1_9_N_2','data2_9_N_2')}
5 | {(10,NULL,3,'data1_10_N_3','data2_10_N_3')}
6 | {(11,12,12,'data1_11_12_12','data2_11_12_12'),(12,12,11,'data1_12_12_11','data2_12_12_11') }
Version "PostgreSQL 9.3.11"
This should produce your output. The trick lies in the conditional GROUP BY clause, which handles the cases where a record has both secondary_id and tertiary_id set and has a matching record on both of those fields.
select array_agg(distinct t1)
from table1 t1
join table1 t2 on
t1.secondary_id = t2.secondary_id
or t1.tertiary_id = t2.tertiary_id
group by
case
when t1.secondary_id is null or t1.tertiary_id is null
then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id
then t1.secondary_id::TEXT -- #2
when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id
then t1.tertiary_id::TEXT -- #3
end
order by 1
Case #1 handles rows where either field is NULL. There we need to group by both columns, so we concatenate both values with a # separator and group by the resulting string (concat treats a NULL argument as an empty string).
In #2 and #3 the grouping value is cast to text, because all branches of a CASE expression must return the same type.
Case #2 covers rows where both values are non-NULL and secondary_id matches between the rows paired by the self-join. Case #3 is analogous, but for a tertiary_id match.
Output:
array_agg
------------------------------------------------------------------------------------------------------------
{"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
{"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
{"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
{"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
{"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
{"(11,11,12,12,data1_11_12_12,data2_11_12_12)","(12,12,12,11,data1_12_12_11,data2_12_12_11)"}
If you'd like to drop the id column from the records, you could use a CTE that selects every column except id and then refer to that CTE in the FROM clause.
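The conditional GROUP BY can be tried without PostgreSQL. The sketch below is a SQLite stand-in run through Python's sqlite3 module, which is an assumption: SQLite has no array_agg or concat, so it collects main_id values with group_concat(DISTINCT ...) and builds the case #1 key with ifnull(...) || '#' || ifnull(...) (in SQLite, || with a NULL operand returns NULL, unlike PostgreSQL's concat). Only the key columns from the question's data are kept:

```python
import sqlite3

# (main_id, secondary_id, tertiary_id) from the question's INSERT
rows = [
    (1, 2, None), (2, 2, None), (3, 3, 5), (4, 3, 5),
    (5, None, 1), (6, None, 1), (7, None, 1), (8, None, 2),
    (9, None, 2), (10, None, 3), (11, 12, 12), (12, 12, 11),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (main_id INTEGER PRIMARY KEY, "
    "secondary_id INTEGER, tertiary_id INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

query = """
SELECT group_concat(DISTINCT t1.main_id)
FROM table1 t1
JOIN table1 t2
  ON t1.secondary_id = t2.secondary_id
  OR t1.tertiary_id = t2.tertiary_id
GROUP BY
  CASE
    -- #1: either id is NULL, group by both values joined with '#'
    WHEN t1.secondary_id IS NULL OR t1.tertiary_id IS NULL
      THEN ifnull(t1.secondary_id, '') || '#' || ifnull(t1.tertiary_id, '')
    -- #2: both present and secondary_id matched in the self-join
    WHEN t1.secondary_id = t2.secondary_id
      THEN CAST(t1.secondary_id AS TEXT)
    -- #3: both present and tertiary_id matched
    WHEN t1.tertiary_id = t2.tertiary_id
      THEN CAST(t1.tertiary_id AS TEXT)
  END
"""

# group_concat order is unspecified, so sort within and across groups
groups = sorted(
    sorted(int(x) for x in ids.split(","))
    for (ids,) in conn.execute(query)
)
print(groups)  # [[1, 2], [3, 4], [5, 6, 7], [8, 9], [10], [11, 12]]
```

The six groups correspond to the six rows of the expected result in the question.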

SQL Join one-to-many tables, selecting only most recent entries

This is my first post, so I apologise if it's in the wrong section!
I'm joining two tables with a one-to-many relationship using their respective ID numbers: but I only want to return the most recent record for the joined table and I'm not entirely sure where to even start!
My original code for returning everything is shown below:
SELECT table_DATES.[date-ID], *
FROM table_CORE LEFT JOIN table_DATES ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
ORDER BY [table_CORE].[core-ID], [table_DATES].[iteration];
This returns a group of records: showing every matching ID between table_CORE and table_DATES:
table_CORE date-ID iteration
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
3 3 1
4 4 1
But I need to return only the date with the maximum value in the "iteration" field as shown below
table_CORE date-ID iteration Additional data
1 1 3 MoreInfo
2 2 2 MoreInfo
3 3 1 MoreInfo
4 4 1 MoreInfo
I really don't even know where to start - obviously it's going to be a JOIN query of some sort - but I'm not sure how to get the subquery to return only the highest iteration for each item in table 2's ID field?
Hope that makes sense - I'll reword if it comes to it!
--edit--
I'm wondering how to integrate that when I need all the fields from table 1 (table_CORE in this case) and all the fields from table 2 (table_DATES) joined as well?
Both tables have additional fields that will need to be merged.
I'm pretty sure I can just add the fields into the "SELECT" and "GROUP BY" clauses, but there are around 40 fields altogether (and typing all of them will be tedious!)
Try using the MAX aggregate function like this with a GROUP BY clause.
SELECT
[ID1],
[ID2],
MAX([iteration])
FROM
table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE
table_CORE.[core-ID] Like '*' --LIKE '%something%' ??
GROUP BY
[ID1],
[ID2]
Your example field names don't match your sample query so I'm guessing a little bit.
Just to make sure that I have everything you’re asking for right, I am going to restate some of your question and then answer it.
[Images not reproduced: the source tables table_core and table_dates, and the current and desired outputs.]
In order to make that happen all you need to do is use a subquery (or a CTE) as a “cross-reference” table. (I used temp tables to recreate your data example and _ in place of the - in your column names).
--Loading the example data
create table #table_core
(
core_id int not null
)
create table #table_dates
(
date_id int not null
, iteration int not null
, additional_data varchar(25) null
)
insert into #table_core values (1), (2), (3), (4)
insert into #table_dates values (1,1, 'More Info 1'),(1,2, 'More Info 2'),(1,3, 'More Info 3'),(2,1, 'More Info 4'),(2,2, 'More Info 5'),(3,1, 'More Info 6'),(4,1, 'More Info 7')
--select query needed for desired output (using a CTE)
; with iter_max as
(
select td.date_id
, max(td.iteration) as iteration_max
from #table_dates as td
group by td.date_id
)
select tc.*
, td.*
from #table_core as tc
left join iter_max as im on tc.core_id = im.date_id
-- LEFT here too, so core rows without any dates are not dropped
left join #table_dates as td on im.date_id = td.date_id
and im.iteration_max = td.iteration
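To check the CTE approach end to end, here is a minimal sketch in Python's sqlite3 module (SQLite standing in for SQL Server, an assumption). It reuses the answer's sample data; core_id 5 is a hypothetical extra row with no dates, added only to show that LEFT JOINs on both steps keep such rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_core (core_id INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE table_dates (date_id INTEGER NOT NULL, "
    "iteration INTEGER NOT NULL, additional_data TEXT)"
)
# core_id 5 is hypothetical: a core row with no dates at all
conn.executemany("INSERT INTO table_core VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])
conn.executemany(
    "INSERT INTO table_dates VALUES (?, ?, ?)",
    [(1, 1, "More Info 1"), (1, 2, "More Info 2"), (1, 3, "More Info 3"),
     (2, 1, "More Info 4"), (2, 2, "More Info 5"), (3, 1, "More Info 6"),
     (4, 1, "More Info 7")],
)

query = """
WITH iter_max AS (
    -- highest iteration per date_id
    SELECT date_id, MAX(iteration) AS iteration_max
    FROM table_dates
    GROUP BY date_id
)
SELECT tc.core_id, td.iteration, td.additional_data
FROM table_core AS tc
LEFT JOIN iter_max AS im ON tc.core_id = im.date_id
LEFT JOIN table_dates AS td
       ON td.date_id = im.date_id AND td.iteration = im.iteration_max
ORDER BY tc.core_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each core_id appears once with its highest iteration; core_id 5 comes back as (5, None, None).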
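select *
from
(
-- date-ID is already in *; listing it again would duplicate a column name in the derived table
SELECT *
, row_number() over (partition by [table_CORE].[core-ID] order by [table_DATES].[iteration] desc) as rn
FROM table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
-- the Access-style LIKE '*' filter is omitted: in SQL Server it matches only a literal '*'
) tt
where tt.rn = 1
ORDER BY [core-ID]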
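The ROW_NUMBER() approach keeps every column of both tables without listing them, which addresses the concern about typing out 40 fields. A minimal sketch, again in SQLite via Python's sqlite3 (an assumption; window functions need SQLite 3.25 or later), with hypothetical lowercase table names and a hypothetical extra_field column standing in for the remaining fields:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_core (core_id INTEGER, extra_field TEXT)")
conn.execute("CREATE TABLE table_dates (date_id INTEGER, iteration INTEGER)")
conn.executemany(
    "INSERT INTO table_core VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)
conn.executemany(
    "INSERT INTO table_dates VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (4, 1)],
)

query = """
SELECT *
FROM (
    -- number each core row's dates from the highest iteration down
    SELECT tc.*, td.*,
           ROW_NUMBER() OVER (PARTITION BY tc.core_id
                              ORDER BY td.iteration DESC) AS rn
    FROM table_core AS tc
    LEFT JOIN table_dates AS td ON tc.core_id = td.date_id
) AS tt
WHERE tt.rn = 1
ORDER BY tt.core_id
"""
result = conn.execute(query).fetchall()
# each row: (core_id, extra_field, date_id, iteration, rn)
for row in result:
    print(row)
```

Only the row numbered 1 per core_id survives, so every field of both tables comes through for the latest iteration.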

Ignore condition in WHERE clause when column is NULL

I have a table where one row (with Type = 'E') is related to another row.
I have written a query to return the COUNT of those related rows. The problem is that there is no explicit relationship (such as an ID column that would clearly say which row is related to which). Therefore I am trying to find the relationship based on multiple conditions in the WHERE clause.
The problem is that in a few cases, columns A and B can be NULL (for records where Type = 'M'). In such cases I would like to ignore those conditions, so that only the first three conditions determine the relationship.
I have tried a CASE expression, but it is not working as expected:
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]
,( SELECT COUNT(*)
FROM MyTable T2
WHERE [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
[T2].[A]=CASE WHEN [T2].[A] IS NULL THEN NULL ELSE [T1].[A] END AND
[T2].[B]=CASE WHEN [T2].[B] IS NULL THEN NULL ELSE [T1].[B] END AND
[T2].[Type]='M'
) as TotalCount
FROM MyTable T1
WHERE [T1].[Type] = 'E'
I can't drop those conditions entirely, because in some cases Date and ServiceID are the same and only A and B distinguish the records. Luckily, where A and B are NULL, Date and ServiceID are what distinguish the two records.
http://sqlfiddle.com/#!3/c98db/1
Many thanks in advance.
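You could join the table to itself and use COUNT with GROUP BY to get the counts. Then match [A] and [B] when they are equal or when the M row's value is NULL. (The CASE version fails because, when [T2].[A] is NULL, it becomes [T2].[A] = NULL, which is never true.)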
SELECT [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID], count([T2].[ID])
FROM MyTable T1
INNER JOIN MyTable T2 ON [T1].[AlphaId]=[T2].[AlphaId] AND
[T1].[Date]=[T2].[Date] AND
[T1].[ServiceID]=[T2].[ServiceID] AND
([T2].[A]= [T1].[A] OR [T2].[A] IS NULL )AND
([T2].[B]= [T1].[B] OR [T2].[B] IS NULL )AND
[T2].[Type] <> [T1].[Type]
WHERE [T1].[Type] = 'E'
GROUP BY [T1].[ID],[T1].[AlphaId],[T1].[Type],[T1].[A],[T1].[B],[T1].[Date],[T1].[ServiceID]
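A minimal sketch in Python's sqlite3 (SQLite standing in for SQL Server, an assumption) that compares the question's CASE filter with the answer's OR ... IS NULL join. The rows are hypothetical, since the SQL Fiddle data is not reproduced here: one E row, an M row with matching A and B, an M row with NULL A and B, and an M row whose A differs. The GROUP BY is shortened to T1.ID for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, AlphaId TEXT, Type TEXT, "
    "A INTEGER, B INTEGER, Date TEXT, ServiceID INTEGER)"
)
# Hypothetical rows: ID 1 is the E row; IDs 2-4 are candidate M rows
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "X", "E", 10, 20, "2020-01-01", 5),
        (2, "X", "M", 10, 20, "2020-01-01", 5),      # A and B match
        (3, "X", "M", None, None, "2020-01-01", 5),  # A and B NULL: should count
        (4, "X", "M", 11, 20, "2020-01-01", 5),      # A differs: should not count
    ],
)

# The question's version: NULL = NULL is never true, so row 3 is missed
case_query = """
SELECT T1.ID,
       (SELECT COUNT(*) FROM MyTable T2
        WHERE T1.AlphaId = T2.AlphaId AND T1.Date = T2.Date
          AND T1.ServiceID = T2.ServiceID
          AND T2.A = CASE WHEN T2.A IS NULL THEN NULL ELSE T1.A END
          AND T2.B = CASE WHEN T2.B IS NULL THEN NULL ELSE T1.B END
          AND T2.Type = 'M') AS TotalCount
FROM MyTable T1
WHERE T1.Type = 'E'
"""

# The answer's version: a NULL on the M row is accepted explicitly
join_query = """
SELECT T1.ID, COUNT(T2.ID)
FROM MyTable T1
INNER JOIN MyTable T2
   ON T1.AlphaId = T2.AlphaId AND T1.Date = T2.Date
  AND T1.ServiceID = T2.ServiceID
  AND (T2.A = T1.A OR T2.A IS NULL)
  AND (T2.B = T1.B OR T2.B IS NULL)
  AND T2.Type <> T1.Type
WHERE T1.Type = 'E'
GROUP BY T1.ID
"""

case_result = conn.execute(case_query).fetchall()
join_result = conn.execute(join_query).fetchall()
print(case_result)  # [(1, 1)]
print(join_result)  # [(1, 2)]
```

Because the answer uses an INNER JOIN, an E row with no related M row drops out of the result entirely; a LEFT JOIN would keep it, with COUNT([T2].[ID]) returning 0.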

Query with Left Outer Join

I'm having trouble figuring this out.
According to Jeff Atwood's "A Visual Explanation of SQL Joins", a left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain NULL.
The left table (TableA) doesn't have duplicates. The right table (TableB) has one or two entries for each client number; PrimaryTP marks one of them as primary with 1, and the other has 0.
I shouldn't have to include the line And B.PrimaryTP = 1 because TableA doesn't have duplicates. Yet if I leave it out I get duplicate client numbers. Why?
Can you help me understand how this works? It's very confusing to me. The logic of And B.PrimaryTP = 1 escapes me, yet it seems to work. Still, I'm wary of trusting it if I don't understand it. Or do I have a logic error hidden in the query?
SELECT A.ClientNum --returns a list with no duplicate client numbers
FROM (...<TableA>
) as A
Left Outer Join
<TableB> as B
on A.ClientNum = B.ClientNum
--eliminate mismatch of (ClientNum <> FolderNum)
Where A.ClientNum Not In
(
Select ClientNum From <TableB>
Where ClientNum Is Not Null
And ClientNum <> IsNull(FolderNum, '')
)
--eliminate case where B.PrimaryTP <> 1
And B.PrimaryTP = 1
The difference between an INNER JOIN and a LEFT JOIN is just that the LEFT JOIN still returns the rows in Table A when there are no corresponding rows in Table B.
But it's still a JOIN, which means that if there is more than one corresponding row in Table B, it will join the row from Table A to each one of them.
So if you want to make sure that you get no more than one result for each row in Table A, you have to make sure that no more than one row from Table B is found - hence the And B.PrimaryTP = 1.
If you have one client number in A and two matches in Table B, then you will get duplicates.
Suppose you have the following data,
Table-A(client Num) Table-B(client Num)
1 2
2 2
The left Join Results
Table-A(client Num) Table-B(client Num)
1 (null)
2 2
2 2
This is the cause of the duplicates. So you need to take distinct values from Table B, or apply DISTINCT to the result set.
I shouldn't have to include the line And B.PrimaryTP = 1 because TableA doesn't have duplicates. Yet if I leave it out I get duplicate client numbers. Why?
Because both rows in the right table match a row in the left table. There is no way for SQL Server to output a triangular result; it must show the columns from both tables for every joined row. And this is true for INNER JOIN as well.
DECLARE @a TABLE(a INT);
DECLARE @b TABLE(b INT);
INSERT @a VALUES(1),(2);
INSERT @b VALUES(1),(1);
SELECT a.a, b.b FROM @a AS a
LEFT OUTER JOIN @b AS b ON a.a = b.b;
SELECT a.a, b.b FROM @a AS a
INNER JOIN @b AS b ON a.a = b.b;
Results:
a b
-- ----
1 1
1 1
2 NULL
a b
-- --
1 1
1 1
The link you gave explains the joins very well. The problem is that, although table A has no duplicates, one record in A can have two matching records in B (in some cases), so that record appears twice in the result. To avoid this you can use either a DISTINCT clause or a GROUP BY clause.
The LEFT OUTER JOIN will give you all the records from A with all the matching records from B. The difference with an INNER JOIN is that if there are no matching records in B, an INNER join will omit the record from A entirely, while the LEFT join will then still include a row with the results from A.
In your case, however, you may also want to check out the DISTINCT keyword.
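One more point bears on the question of a hidden logic error: because And B.PrimaryTP = 1 sits in the WHERE clause, clients with no row in TableB (where B.PrimaryTP is NULL) are filtered out, so the LEFT JOIN behaves like an INNER JOIN. Moving the condition into the ON clause keeps them. A minimal sketch in Python's sqlite3 (SQLite standing in for SQL Server), with hypothetical data and the question's subquery and NOT IN filter left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ClientNum INTEGER)")
conn.execute("CREATE TABLE TableB (ClientNum INTEGER, PrimaryTP INTEGER)")
# Hypothetical data: client 1 has two B rows, client 2 has one, client 3 has none
conn.executemany("INSERT INTO TableA VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO TableB VALUES (?, ?)", [(1, 1), (1, 0), (2, 1)])


def clients(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


join = "SELECT A.ClientNum FROM TableA A LEFT JOIN TableB B ON A.ClientNum = B.ClientNum"

plain = clients(join + " ORDER BY A.ClientNum")
in_where = clients(join + " WHERE B.PrimaryTP = 1 ORDER BY A.ClientNum")
in_on = clients(join + " AND B.PrimaryTP = 1 ORDER BY A.ClientNum")

print(plain)     # [1, 1, 2, 3]  client 1 duplicated
print(in_where)  # [1, 2]        client 3 lost
print(in_on)     # [1, 2, 3]
```

Whether client 3 should appear depends on what the query is for; if it should, the ON form is the one to use.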

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The subselect works fine: I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, the rows in the subselect where luo is NULL do not appear in the overall results. In other words, where I get 6 results in the subselect, I only get 2 in the overall results, and it's only those rows where Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. It's as if SQL Server does not consider NULL equal to NULL.
How can I fix this? Why wouldn't two fields compare when both values are NULL?
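You could use COALESCE on the column that can be NULL, mapping NULL to a sentinel value that never occurs in the data (example assuming Last_Updated_On is a datetime and 1900-01-01 never appears), in place of AND t1.[Last_Updated_On] = t2.luo:
AND
COALESCE(t1.[Last_Updated_On], '19000101') = COALESCE(t2.luo, '19000101')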
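With ANSI_NULLS ON, which is the default, NULL = NULL does not evaluate to TRUE (it is UNKNOWN), so NULL never equals NULL.
You can change this with the session setting below (a SET option, not a hint), but only for comparisons against a NULL literal or a NULL variable. It does not affect comparisons between two columns or expressions, such as the ones in this join, and ANSI_NULLS OFF is deprecated:
SET ansi_nulls off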
Another alternative is to spell out the NULL case explicitly:
AND ((t1.[Last_Updated_On] = t2.luo) OR
(t1.[Last_Updated_On] IS NULL AND t2.luo IS NULL))
Running your POC with that setting:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works, because INTERSECT treats two NULLs as equal:
AND EXISTS (SELECT t1.[Last_Updated_On] INTERSECT SELECT t2.luo)
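The difference between the plain = join and the NULL-aware forms can be reproduced with a minimal sketch in Python's sqlite3 (SQLite standing in for SQL Server, an assumption; SQLite's INTERSECT also treats two NULLs as the same value). The StoreResponses rows are hypothetical and reduced to the columns involved, with the MAX(Created_On) part of the subselect left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StoreResponses (id INTEGER, Store_Number INTEGER, "
    "Date_Of_Inventory TEXT, Last_Updated_On TEXT)"
)
# Hypothetical rows: inventory 2020-01-01 was updated once; 2020-02-01 never was
conn.executemany(
    "INSERT INTO StoreResponses VALUES (?, ?, ?, ?)",
    [
        (1, 123, "2020-01-01", None),
        (2, 123, "2020-01-01", "2020-01-05"),
        (3, 123, "2020-02-01", None),
    ],
)

base = """
SELECT t1.id
FROM StoreResponses t1
JOIN (SELECT Store_Number, Date_Of_Inventory, MAX(Last_Updated_On) AS luo
      FROM StoreResponses
      GROUP BY Store_Number, Date_Of_Inventory) t2
  ON t1.Store_Number = t2.Store_Number
 AND t1.Date_Of_Inventory = t2.Date_Of_Inventory
 AND {condition}
ORDER BY t1.id
"""

conditions = {
    "plain =": "t1.Last_Updated_On = t2.luo",
    "OR IS NULL": "(t1.Last_Updated_On = t2.luo OR "
                  "(t1.Last_Updated_On IS NULL AND t2.luo IS NULL))",
    "EXISTS INTERSECT": "EXISTS (SELECT t1.Last_Updated_On "
                        "INTERSECT SELECT t2.luo)",
}
results = {
    name: [r[0] for r in conn.execute(base.format(condition=cond))]
    for name, cond in conditions.items()
}
print(results)  # {'plain =': [2], 'OR IS NULL': [2, 3], 'EXISTS INTERSECT': [2, 3]}
```

Row 1, the superseded version with a NULL Last_Updated_On, is excluded by all three conditions, because its NULL is compared with a non-NULL maximum; only row 3, where both sides are NULL, depends on the NULL-aware form.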
