T-SQL How to join to multiple tables depending on value - sql-server

So I have some complicated production tables. What it comes down to, however, is that I'd like to be able to join to multiple tables depending on where the value I'm seeking is stored. Specifically, say I have an employee ID, "JHDOE", and I want to join it to the table where I can get that employee's name. The main table for employees is "Table A":
Notice that the field ID_2 does NOT have the value "JHDOE". Instead, it has "DOEJH". Well, there is another table that actually has the value I'm seeking, "Table B":
In this table, we do see "JHDOE" so at first I tried something like this:
from TableStart as start
left join TableB as b on
case
when start.EMP_ID like '[0-9]%'
then b.ID
else b.ID_2
end = start.EMP_ID
But this created other problems. So what I'd like to do is join to EITHER table, or at least something to the same effect. One method I tried was this:
from TableStart as start
left join (select a.Name, a.ID, a.ID_2
from TableA as a
union
select b.Name, b.ID, b.ID_2
from TableB as b) names on
case
when start.EMP_ID like '[0-9]%'
then names.ID
else names.ID_2
end = start.EMP_ID
The result set for the union looks like this:
On my production data, this same scenario resulted in a blank. I suppose it doesn't know which row to join to? So I think I need to do something like pivot the rows into columns and then do an OR... but I'm not sure. I would be greatly appreciative of any guidance or instruction.

Instead of using a CASE expression in the join, I think an IN operator would work. It functions as an OR (EMP_ID = ID or EMP_ID = ID_2). If you LEFT JOIN from TableStart to both TableA and TableB and wrap each pair of columns from TableA and TableB in COALESCE, you get the first non-null value.
This assumes that, when both TableA and TableB have a match for a TableStart row, you are fine with taking the first non-null value per column. That can result in a mixture of values from TableA and TableB if TableA has some null values.
select
start.*
, coalesce (a.Name, b.Name) as [Name]
, coalesce (a.ID, b.ID) as [ID]
, coalesce (a.ID_2, b.ID_2) as [ID_2]
from TableStart as start
left join TableA as a on start.EMP_ID in (a.ID, a.ID_2)
left join TableB as b on start.EMP_ID in (b.ID, b.ID_2)
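A minimal sketch of how the IN plus COALESCE join behaves, using table variables in place of the real tables. The sample rows are hypothetical (only the IDs "JHDOE" and "DOEJH" come from the question), so treat this as an illustration under those assumptions rather than the poster's data.

```sql
-- Hypothetical sample data: names and numeric IDs are made up for illustration.
DECLARE @TableStart TABLE (EMP_ID varchar(10));
DECLARE @TableA TABLE (Name varchar(50), ID varchar(10), ID_2 varchar(10));
DECLARE @TableB TABLE (Name varchar(50), ID varchar(10), ID_2 varchar(10));

INSERT INTO @TableStart (EMP_ID) VALUES ('JHDOE'), ('22222'), ('NOPE');
INSERT INTO @TableA (Name, ID, ID_2) VALUES ('John Doe', '11111', 'DOEJH'),
                                            ('Jane Roe', '22222', 'ROEJA');
INSERT INTO @TableB (Name, ID, ID_2) VALUES ('John Doe', '11111', 'JHDOE');

-- 'JHDOE' matches only @TableB (via ID_2), so its columns come from b.
-- '22222' matches only @TableA (via ID), so its columns come from a.
-- 'NOPE'  matches neither table, so Name, ID and ID_2 are all NULL.
SELECT s.EMP_ID,
       COALESCE(a.Name, b.Name) AS [Name],
       COALESCE(a.ID,   b.ID)   AS [ID],
       COALESCE(a.ID_2, b.ID_2) AS [ID_2]
FROM @TableStart AS s
LEFT JOIN @TableA AS a ON s.EMP_ID IN (a.ID, a.ID_2)
LEFT JOIN @TableB AS b ON s.EMP_ID IN (b.ID, b.ID_2);
```

If a TableStart row matched more than one row in either table, the joins would return one output row per combination, so the key columns are assumed to be unique here.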

Related

Need an IFF statement to work where the data is not in a joining table, so it is not labeled as NULL

SELECT DISTINCT
ATO.Agent,
ATO.Bottler_ID,
ATO.Account_number,
ATO.DELIVERY_DATE,
IFF (ATO.Account_number IS NULL,0,1) AS Opportunity
from ATO
left join OOM on ATO.Account_number = OOM.Account_number
The account number itself will never be NULL. If it does not appear in the OOM table, I just want a "0" in a new Opportunity column, but if it IS there, I need a 1.
Just use a CASE expression to implement the logic, based on whether OOM.Account_number is null or not.
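A sketch of that suggestion, reusing the table and column names from the question. The key point is that the test looks at the column from the right-hand table (OOM), since ATO.Account_number is never NULL; this assumes OOM.Account_number is itself never NULL in matched rows.

```sql
SELECT DISTINCT
    ATO.Agent,
    ATO.Bottler_ID,
    ATO.Account_number,
    ATO.DELIVERY_DATE,
    -- OOM.Account_number is NULL only when the left join found no matching row
    CASE WHEN OOM.Account_number IS NULL THEN 0 ELSE 1 END AS Opportunity
FROM ATO
LEFT JOIN OOM ON ATO.Account_number = OOM.Account_number;
```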
This query does a left join and replaces the nulls from unmatched rows with the desired logic:
with data as (select $1 id, $2 name from values(1,'a'), (2,'b') )
, reference as (select $1 id from values(1))
select a.*, iff(b.id is null, 'no ref', 'has ref')
from data a
left join reference b
on a.id=b.id
The query you posted should work as expected. Can you show us what results you get, and what results you expect?

MSSQL can't understand what's happening with the action "having count(*) lesser than <some field of other table>"

I've tried to understand one part of an exercise I'm doing and just couldn't get it.
There's a part where 'T' is selected, grouped by 'a', and then it's redirected to "having count(*) < T3.a",
and I don't know how to approach it.
I've tried googling this sort of thing to see if there are similar examples, but all the other examples used plain numbers, e.g. "having count(*) < 5", and not whole fields for the comparison.
The exercise is this:
MSSQL exercise
create table T(a int, b int);
insert into T values(1,2);
insert into T values(1,1);
insert into T values(2,3);
insert into T values(2,4);
insert into T values(3,4);
insert into T values(4,5);
select T3.b, (select count(T5.a)
from T T5
where T5.a = T3.b)
from (select T1.a as a, T2.b as b
from T T1, T T2
where T1.b < T2.a) as T3
where not exists (select T4.a
from T T4
group by T4.a
having count(*) < T3.a);
I thought that the having count(*) was comparing each grouped value to each value of T3.a in each row, and that if all rows met the criteria then the value would be selected, but I somehow get different results.
Can someone please explain to me what is really going on behind this "having count(*) < T3.a" operation?
Thank you in advance.
To repeat myself from the comments, a HAVING is like a WHERE for aggregate functions. You cannot use an aggregate function in the WHERE, for example WHERE SUM(SomeColumn) > 5, so you need to put it in the HAVING: HAVING SUM(SomeColumn) > 5. This would return only the groups where the SUM of the column SomeColumn is greater than 5.
For your expression, HAVING COUNT(*) < T3.a, it would only return groups where the value of COUNT(*) is less than the value of T3.a.
Let's break this down into its separate parts.
First the FROM
from (select T1.a as a, T2.b as b
from T T1, T T2
where T1.b < T2.a) as T3
This uses the old-style comma join syntax, in which the join condition goes in the WHERE clause. It can be rewritten as an explicit join:
from (select T1.a as a, T2.b as b
from T T1
join T T2 on T1.b < T2.a
) as T3
If we analyze what it does, we see that it resembles what is known as a triangular join: every row is self-joined to every row whose a is greater than its own b. This was commonly done when window aggregates were not available.
WHERE
where not exists (select T4.a
from T T4
group by T4.a
having count(*) < T3.a);
This is a correlated subquery: T3.a is a reference to the outer query.
What this predicate says is: for this particular row, there must be no rows in the subquery.
The subquery itself says: take all rows in T, group them by a and count, then only include rows for which the count is less than the outer reference a.
Note that because it is an EXISTS, the actual selected value is not used. I suspect this may not have been the intention.
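To see what the correlated subquery does for a single outer row, it can help to run it on its own with a literal in place of T3.a. The following is a small sketch against the table T from the exercise; the variable @outer_a is a hypothetical stand-in for the outer reference.

```sql
-- Group sizes in T: a=1 -> 2 rows, a=2 -> 2 rows, a=3 -> 1 row, a=4 -> 1 row
select T4.a, count(*) as cnt
from T T4
group by T4.a;

-- The subquery with the outer reference replaced by a variable.
-- With @outer_a = 2 it returns the groups a=3 and a=4 (count 1 < 2),
-- so NOT EXISTS is false and an outer row with T3.a = 2 is discarded.
-- With @outer_a = 1 it returns nothing (no count is below 1),
-- so NOT EXISTS is true and an outer row with T3.a = 1 is kept.
declare @outer_a int = 2;

select T4.a
from T T4
group by T4.a
having count(*) < @outer_a;
```

With this data, only outer rows where T3.a = 1 pass the NOT EXISTS predicate.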
SELECT
select T3.b, (select count(T5.a)
from T T5
where T5.a = T3.b)
We then take b from the first join, and the count from a subquery of all matching T rows. Again, this was common when window aggregates were not available.
So the whole thing can be rewritten as follows:
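select T2.b, (select count(T5.a)
from T T5
where T5.a = T2.b)
from T T1
join T T2 on T1.b < T2.a
-- NOT EXISTS (... having count(*) < T1.a) holds only when no group is smaller than T1.a,
-- i.e. T1.a is at most the smallest group size (assuming a is never NULL)
where T1.a <= (select min(g.cnt)
from (select count(*) as cnt from T group by a) as g);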
There is something not quite right about the logic in your query, but without knowing what the original intention was, and without seeing the table and column names, I cannot say. The triangular join in particular looks very suspect.

How to get data from other table within a group by query?

I have tried to group records from one table that share the same SerialNo. I also want to show a column from another table that is related to the first table through SerialNo.
I have a table 1:
And table 2:
My Query is:
select CIT_SERIALNUMBER, COUNT(CIT_ID)
as Cases from Table_2 where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15'AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER
having COUNT(CIT_ID)>1 order by min(CIT_CREATED) desc
Here is the result table:
In the query above I’ve got only CIT_SERIALNUMBER records from Table_2. But I also want to get the data from Table_1 column ComputerName. So, the expected result is:
Note: Tables 1 and 2 can be joined on the columns T1_Serial and CIT_SERIALNUMBER.
Please help me to re-write the sql query to achieve the expected result above.
If I understood your column names correctly, try this:
select CIT_SERIALNUMBER, ComputerName, COUNT(CIT_ID)
as Cases from Table_2 join Table_1 on Table_2.CIT_SERIALNUMBER=Table_1.Serial where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15' AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER, ComputerName
having COUNT(CIT_ID)>1 order by min(CIT_CREATED) desc
Try this:
SELECT A.ComputerName,
B.CIT_SERIALNUMBER,
COUNT(B.CIT_ID) AS Cases
FROM table_1 A
INNER JOIN Table_2 B ON A.T1_Serial = B.CIT_SERIALNUMBER
WHERE B.CIT_SOURCEID LIKE '%E_One%'
AND (B.CIT_CREATED BETWEEN '2018-01-15' AND '2019-06-15')
AND B.CIT_SERIALNUMBER IS NOT NULL
GROUP BY A.ComputerName,B.CIT_SERIALNUMBER
HAVING COUNT(B.CIT_ID) > 1
ORDER BY MIN(B.CIT_CREATED) DESC;
It looks odd, but I've got a solution for this:
I select all duplicated records from Table_2 first
Then I join Table_1 with result set of Table_2 to view column from both tables
Then I use another select to select data from result set above and group by all records.
Here is the query:
select z.CIT_SERIALNUMBER, z.ComputerName, z.Cases from (
SELECT y.CIT_SERIALNUMBER, x.ComputerName, y.Cases
FROM Table_1 x
right JOIN (
select CIT_SERIALNUMBER, COUNT(CIT_ID)
as Cases from Table_2 where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15'AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER
having COUNT(CIT_ID)>1
) y ON y.CIT_SERIALNUMBER = x.SerialNo) z group by CIT_SERIALNUMBER, z.ComputerName, z.Cases
Result set:

sql insert query clause

I have a table A which contains fields
ChangeID DistributionID OutletBrandID
and Table B contains
ID DistributionID OutletBrandID
I need to insert data in table A from table B only if the distributionID and OutletBrandID combination doesn't exist already. Therefore I can't simply use the IN clause as it needs to be a combination.
Assuming that ChangeID and ID should match between the tables:
INSERT INTO TableA (ChangeID, DistributionID, OutletBrandID)
SELECT b.ID, b.DistributionID, b.OutletBrandID FROM TableB b
LEFT OUTER JOIN TableA a ON a.DistributionID=b.DistributionID
AND a.OutletBrandID = b.OutletBrandID
WHERE
a.OutletBrandID IS NULL
AND
a.DistributionID IS NULL
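The same anti-join can also be written with NOT EXISTS, which states the "combination doesn't exist already" condition directly. This is a sketch under the same assumption as above, namely that TableB.ID maps to TableA.ChangeID.

```sql
INSERT INTO TableA (ChangeID, DistributionID, OutletBrandID)
SELECT b.ID, b.DistributionID, b.OutletBrandID
FROM TableB AS b
WHERE NOT EXISTS (
    -- a row in TableA with the same (DistributionID, OutletBrandID) pair
    SELECT 1
    FROM TableA AS a
    WHERE a.DistributionID = b.DistributionID
      AND a.OutletBrandID = b.OutletBrandID
);
```

Neither form removes duplicates within TableB itself: if TableB holds the same combination twice, both rows are inserted.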

OVER (ORDER BY Col) generates 'Sort' operation

I'm working on a query that needs to do filtering, ordering and paging according to the user's input. Now I'm testing a case that's really slow; upon inspection of the query plan, a 'Sort' is taking 96% of the time.
The datamodel is really not that complicated, the following query should be clear enough to understand what's happening:
WITH OrderedRecords AS (
SELECT
A.Id
, A.col2
, ...
, B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
*
FROM OrderedRecords WHERE RowNumber Between x AND y
A is a table containing about 100k records, but it will grow to tens of millions in the field, while B is a category-type table with 5 items (and this will never grow any bigger than perhaps a few more). There are clustered indexes on A.Id and B.Id.
Performance is really dreadful and I'm wondering if it's possible to remedy this somehow. If, for example, the ordering is on A.Id instead of B.col1, everything is pretty darn fast. Perhaps I can optimize B.col1 with some sort of index.
I already tried putting an index on the field itself, but this didn't help. Probably because the number of distinct items in table B is very small (in itself & compared to A).
Any ideas?
I think this may be part of the problem:
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
Your LEFT JOIN is going to logically act like an INNER JOIN because of the WHERE clause you have in place, since only certain B.ID rows are going to be returned. If that's your intent, then go ahead and use an inner join, which may help the optimizer realize that you are looking for a restricted number of rows.
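A small sketch of why the two joins return the same rows once the WHERE clause filters on B.Id. The table variables and their contents are hypothetical miniatures of A and B, and the B.SomeThing condition is left out to keep the example short.

```sql
-- Hypothetical miniature versions of A and B, only to illustrate the point.
DECLARE @A TABLE (Id int PRIMARY KEY, BId int NULL);
DECLARE @B TABLE (Id int PRIMARY KEY, col1 varchar(10));

INSERT INTO @A (Id, BId) VALUES (1, 10), (2, 20), (3, NULL);
INSERT INTO @B (Id, col1) VALUES (10, 'x'), (20, 'y');

-- LEFT JOIN with a filter on B in the WHERE clause:
-- the row A.Id = 3 gets B.Id = NULL from the outer join, and
-- NULL IN (10, 20) is not true, so the WHERE clause discards it.
SELECT a.Id, b.col1
FROM @A AS a
LEFT JOIN @B AS b ON a.BId = b.Id
WHERE b.Id IN (10, 20);

-- INNER JOIN returns the same two rows (A.Id = 1 and A.Id = 2).
SELECT a.Id, b.col1
FROM @A AS a
INNER JOIN @B AS b ON a.BId = b.Id
WHERE b.Id IN (10, 20);
```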
I suggest trying the following.
For the B table create index:
create index IX_B_1 on B (col1, Id, SomeThing)
For the A table create index:
create index IX_A_1 on A (col2, BId) include (Id, ...)
In the include, put all other columns of table A that are listed in the SELECT of the OrderedRecords CTE.
However, as you can see, index IX_A_1 takes a lot of space, possibly about as much as the table data itself.
So, as an alternative, you may try omitting the extra columns from the include part of the index:
create index IX_A_2 on A (col2, BId) include (Id)
but in this case you will have to slightly modify your query:
;WITH OrderedRecords AS (
SELECT
AId = A.Id
, A.col2
-- remove other A columns from here
, bid = B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
R.*, A.OtherColumns
FROM OrderedRecords R
join A on A.Id = R.AId
WHERE R.RowNumber Between x AND y
