SQL Server Joining Between Two Tables Based on Closest Value

SQL Server Joining Between Two Tables Based on Closest Value - sql-server

I have two sets of data, as shown below.
Table 1:
enter code here
ID Value_1
1 233.67
2 83.28
3 84.49
4 1234.83
Table 2:
NewID Value_3 Value_4
5 NULL 83
6 NULL 85
7 NULL 235
I want to join the two tables in such a way that the resulting data set would look like below.
ID NewID Value_1 Value_2
1 7 233.67 235
2 5 83.28 83
3 6 84.49 85
4 NULL 1234.83 NULL
I know that using the ROUND command would cause future problems. Do any of you know how I could create the above resulting set?

Something like this:
DECLARE #Table1 TABLE
(
Id int,
Value1 float
)
INSERT INTO #Table1
VALUES
(1, 233.67),
(2, 83.28),
(3, 84.49),
(4, 1234.83)
DECLARE #Table2 TABLE
(
NewId int,
Value2 float
)
INSERT INTO #Table2
VALUES
(5, 83),
(6, 85),
(7, 235)
SELECT DISTINCT
t1.Id,
FIRST_VALUE(t2.NewId) OVER (PARTITION BY t1.Id ORDER BY ABS(t1.Value1 - t2.Value2) ASC) AS NewId,
t1.Value1,
FIRST_VALUE(t2.Value2) OVER (PARTITION BY t1.Id ORDER BY ABS(t1.Value1 - t2.Value2) ASC) AS Value2
FROM #Table1 t1
CROSS JOIN #Table2 t2
ORDER BY t1.Id
But you will get a result for ID=4 too.

This approach avoids a cross join, which requires every combination to be tested. It also works with SQL Server 2005 and up. It works by calculating a lower and upper bound between the midpoints of each record in Table2 and then joining on Table1 where Value_1 is between the midpoints. If Value_1 lands right on the boundary this code will round up (choose the higher of Value_4 to match).
--Load Table1
select 1 ID, convert(float,233.67) Value_1 into #Table1
insert into #Table1 select 2, 83.28
insert into #Table1 select 3, 84.49
insert into #Table1 select 4, 1234.83
--Load Table2
select 5 NewID, null Value_3, convert(float,83) Value_4 into #Table2
insert into #Table2 select 6, NULL, 85
insert into #Table2 select 7, NULL, 235
;with cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from #Table2
)
Select #Table1.ID,
NewTable2.NewID,
#Table1.Value_1,
NewTable2.Value_4 Value_2
from #Table1
full join
(
select Table2.NewID,
Table2.Value_3,
Table2.Value_4,
Table2Prev.Value_4 + (Table2.Value_4 - Table2Prev.Value_4) / 2.0 LowerBound,
Table2.Value_4 + (Table2Next.Value_4 - Table2.Value_4) / 2.0 UpperBound
from cte_Table2 Table2
left join cte_Table2 Table2Prev
on Table2.OrderNum = Table2Prev.OrderNum + 1
left join cte_Table2 Table2Next
on Table2.OrderNum = Table2Next.OrderNum - 1
) NewTable2
on (#Table1.Value_1 < UpperBound or UpperBound is null)
and (#Table1.Value_1 >= LowerBound or LowerBound is null)
order by 1
I noticed that you did not show a match for ID of 4 in your expected output. If you need some kind of range to exclude matches then you would have to add that restriction to the ON conditions of the join. For example you could exclude values that are outside a threshold of 10 by making sure you use a FULL JOIN and add this to the ON conditions:
and abs(#Table1.Value_1-NewTable2.Value_4) < 10.0
It occurs to me though that maybe what you are really trying to do is a type of join where not only is the value in Table1 matched with it's closest value in Table2 but the value in Table2 is also the closest value in Table1. In this case, you would have to build a sub-query for Table1 with the boundary conditions and then check for it's closest match as well. Like this:
;with cte_Table1 as
(
select *, ROW_NUMBER() over (order by Value_1) OrderNum
from #Table1
),
cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from #Table2
)
Select NewTable1.ID,
NewTable2.NewID,
NewTable1.Value_1,
NewTable2.Value_4 Value_2
from
(
select Table1.ID,
Table1.Value_1,
Table1Prev.Value_1 + (Table1.Value_1 - Table1Prev.Value_1) / 2.0 LowerBound,
Table1.Value_1 + (Table1Next.Value_1 - Table1.Value_1) / 2.0 UpperBound
from cte_Table1 Table1
left join cte_Table1 Table1Prev
on Table1.OrderNum = Table1Prev.OrderNum + 1
left join cte_Table1 Table1Next
on Table1.OrderNum = Table1Next.OrderNum - 1
) NewTable1
full join
(
select Table2.NewID,
Table2.Value_3,
Table2.Value_4,
Table2Prev.Value_4 + (Table2.Value_4 - Table2Prev.Value_4) / 2.0 LowerBound,
Table2.Value_4 + (Table2Next.Value_4 - Table2.Value_4) / 2.0 UpperBound
from cte_Table2 Table2
left join cte_Table2 Table2Prev
on Table2.OrderNum = Table2Prev.OrderNum + 1
left join cte_Table2 Table2Next
on Table2.OrderNum = Table2Next.OrderNum - 1
) NewTable2
on (NewTable1.Value_1 < NewTable2.UpperBound or NewTable2.UpperBound is null)
and (NewTable1.Value_1 >= NewTable2.LowerBound or NewTable2.LowerBound is null)
and (NewTable2.Value_4 < NewTable1.UpperBound or NewTable1.UpperBound is null)
and (NewTable2.Value_4 >= NewTable1.LowerBound or NewTable1.LowerBound is null)
order by 1
Now the records in the tables will only match if the values are the closest match both ways. This will ensure that each record in Table1 will be matched to at most 1 value in Table2. Because it's a full join you can get null values on either side... depending on which table is smaller.
One more thing to mention... this code might not handle duplicate values the way you need. If you have more than one of the same value in either Value_1 or Value_4, it will find a match to one of them but not both. If you want that... then you would have to change your common table expressions to:
;with cte_Table1 as
(
select *, ROW_NUMBER() over (order by Value_1) OrderNum
from (select distinct Value_1 from #Table1) tbl1
),
cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from (select distinct Value_4 from #Table2) tbl2
)
Then update your sub-queries to only output the values with the boundaries. This would give you best matches with between the unique values. Then you could join to Table1 and Table2 ON Table1.Value_1 = NewTable1.Value1 and Table2.Value_4 = NewTable2.Value_4 to get the rest of the fields.
Certainly some optimizations can be made if you have very large tables... like breaking out some of these sub-queries into indexed temporary tables.

Related

T-SQL Left join twice

Query below works as planned, it shows exactly the way i joined it, and that is fine, but problem with it, is that if you have more "specialization" tables for users, something like "Mail type" or anything that user can have more then one data ... you would have to go two left joins for each and "give priority" via ISNULL (in this case)
I am wondering, how could I avoid using two joins and "give" priority to TypeId 2 over TypeId 1 in a single join, is that even possible?
if object_id('tempdb..#Tab1') is not null drop table #Tab1
create table #Tab1 (UserId int, TypeId int)
if object_id('tempdb..#Tab2') is not null drop table #Tab2
create table #Tab2 (TypeId int, TypeDescription nvarchar(50))
insert into #Tab1 (UserId, TypeId)
values
(1, 1),
(1, 2)
insert into #Tab2 (TypeId, TypeDescription)
values
(1, 'User'),
(2, 'Admin')
select *, ISNULL(t2.TypeDescription, t3.TypeDescription) [Role]
from #Tab1 t1
LEFT JOIN #Tab2 t2 on t1.TypeId = t2.TypeId and
t2.TypeId = 2
LEFT JOIN #Tab2 t3 on t1.TypeId = t3.TypeId and
t3.TypeId = 1

The first problem is determining priority. In this case, you could use the largest TypeId, but that does not seem like a great idea. You could add another column to serve as a priority ordinal instead.
From there, it is a top 1 per group query:
using top with ties and row_number():
select top 1 with ties
t1.UserId, t1.TypeId, t2.TypeDescription
from #Tab1 t1
left join #Tab2 t2
on t1.TypeId = t2.TypeId
order by row_number() over (
partition by t1.UserId
order by t2.Ordinal
--order by t1.TypeId desc
)
using common table expression and row_number():
;with cte as (
select t1.UserId, t1.TypeId, t2.TypeDescription
, rn = row_number() over (
partition by t1.UserId
order by t2.Ordinal
--order by t1.TypeId desc
)
from #Tab1 t1
left join #Tab2 t2
on t1.TypeId = t2.TypeId
)
select UserId, TypeId, TypeDescription
from cte
where rn = 1
rextester demo for both: http://rextester.com/KQAV36173
both return:
+--------+--------+-----------------+
| UserId | TypeId | TypeDescription |
+--------+--------+-----------------+
| 1 | 2 | Admin |
+--------+--------+-----------------+

Actually I don't think you don't need a join at all. But you have to take the max TypeID without respect to the TypeDescription, since these differences can defeat a Group By. So a workaround is to take the Max without TypeDescription initially, then subquery the result to get the TypeDescription.
SELECT dT.*
,(SELECT TypeDescription FROM #Tab2 T2 WHERE T2.TypeId = dT.TypeId) [Role] --2. Subqueries TypeDescription using the Max TypeID
FROM (
select t1.UserId
,MAX(T1.TypeId) [TypeId]
--, T1.TypeDescription AS [Role] --1. differences will defeat group by. Subquery for value later in receiving query.
from #Tab1 t1
GROUP BY t1.UserId
) AS dT
Produces Output:
UserId TypeId Role
1 2 Admin

How to join two Tables which have no unique ID

I am trying to Left join Table 1 to table 2
Table1 Table2
ID Data ID Data2
1 r 1 q
2 t 1 a
3 z 2 x
1 u 3 c
After i have left joined this two Tables i would like get something like this
Table1+2
ID Data Data2
1 r a
2 t x
3 z c
1 u q
and NOT
Table1+2
ID Data Data2
1 r q
2 t x
3 z c
1 u q
and my question is: is there any possibility to tell table 2 that if u have used something for table 1, dont use it and give me next value. do i have to make it im T-SQL or to and new column where i can list if this id exists then write 2 if not 1(Number data). How can i solve this problem?
Ty u in advance.

Since there's no uniqueness in the ID on both tables, lets add some uniqueness to it.
So it can be used to join on.
The window function ROW_NUMBER can be used for that.
An example solution that gives the expected result:
DECLARE #TestTable1 TABLE (ID INT, Data VARCHAR(1));
DECLARE #TestTable2 TABLE (ID INT, Data VARCHAR(1));
INSERT INTO #TestTable1 VALUES (1,'r'),(2,'t'),(3,'z'),(1,'u');
INSERT INTO #TestTable2 VALUES (1,'q'),(1,'a'),(2,'x'),(3,'c');
select
t1.ID, t1.Data,
t2.Data as Data2
from (
select ID, Data,
row_number() over (partition by ID order by Data) as rn
from #TestTable1
) t1
left join (
select ID, Data,
row_number() over (partition by ID order by Data) as rn
from #TestTable2
) t2 on t1.ID = t2.ID and t1.rn = t2.rn;
Note: Because of the LEFT JOIN, this does assume the amount of same ID's in table2 are equal or lower can those on table1. But you can change that to a FULL JOIN if that's not the case.
Returns :
ID Data Data2
1 r a
1 u q
2 t x
3 z c
To get the other result could have been achieved in different ways.
This is actually a more common situation.
Where one wants all from Table 1, but only get one value from Table 2 for each record of Table 1.
1) A top 1 with ties in combination with a order by rownumber()
select top 1 with ties
t1.ID, t1.Data,
t2.Data as Data2
from #TestTable1 t1
left join #TestTable2 t2 on t1.ID = t2.ID
order by row_number() over (partition by t1.ID, t1.Data order by t2.Data desc);
The top 1 with ties will only show those where the row_number() = 1
2) Using the row_number in a subquery:
select ID, Data, Data2
from (
select
t1.ID, t1.Data,
t2.Data as Data2,
row_number() over (partition by t1.ID, t1.Data order by t2.Data desc) as rn
from #TestTable1 t1
left join #TestTable2 t2 on t1.ID = t2.ID
) q
where rn = 1;
3) just a simple group by and a max :
select t1.ID, t1.Data, max(t2.Data) as Data2
from #TestTable1 t1
left join #TestTable2 t2 on t1.ID = t2.ID
group by t1.ID, t1.Data
order by 1,2;
All 3 give the same result:
ID Data Data2
1 r q
1 u q
2 t x
3 z c

Use ROW_NUMBER
DECLARE #Table1 TABLE (ID INT, DATA VARCHAR(10))
DECLARE #Table2 TABLE (ID INT, DATA VARCHAR(10))
INSERT INTO #Table1
VALUES
(1, 'r'),
(2, 't'),
(3, 'z'),
(1, 'u')
INSERT INTO #Table2
VALUES
(1, 'q'),
(1, 'a'),
(2, 'x'),
(3, 'c')
SELECT
A.*,
B.DATA
FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY(SELECT NULL)) RowId FROM #Table1) A INNER JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY(SELECT NULL)) RowId FROM #Table2) B ON A.ID = B.ID AND
A.RowId = B.RowId

SQL Server: How to select top rows of a group based on value of the column of that group?

I have two tables like below.
table 1
id rem
1 2
2 1
table 2
id value
1 abc
1 xyz
1 mno
2 mnk
2 mjd
EDIT:
#output
id value
1 abc
1 xyz
2 mnk
What i want to do is select top 2 rows of table2 with id one as rem value is 2 for id 1 and top 1 row with id 2 as its rem value is 1 and so on. I am using MS sqlserver 2012 My whole scenario is more complex than this. Please help.
Thank you.
EDIT : I know that i should have given what i have done and how i am doing it but for this particular part i don't have idea for starting. I could do this by using while loop for each unique id but i want to do it in one go if possible.

First, SQL tables represent unordered sets. There is no specification of which values you get, unless you include an order by.
For this purpose, I would go with row_number():
select t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by id order by id) as seqnum
from table2 t2
) t2
on t1.id = t2.id and t2.seqnum <= t1.rem;
Note: The order by id in the windows clause should be based on which rows you want. If you don't care which rows, then order by id or order by (select null) is fine.

Try This:
DECLARE #tbl1 TABLE (id INT, rem INT)
INSERT INTO #tbl1 VALUES (1, 2), (2, 1)
DECLARE #tbl2 TABLE (id INT, value VARCHAR(10))
INSERT INTO #tbl2 VALUES (1, 'abc'), (1, 'xyz'),
(1, 'mno'), (2, 'mnk'), (2, 'mjd')
SELECT * FROM #tbl1 -- your table 1
SELECT * FROM #tbl2 -- your table 2
SELECT id,value,rem FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY T.ID) rowid,
T.id,T.value,F.rem FROM #tbl2 T LEFT JOIN #tbl1 F ON T.id = F.id ) A WHERE rowid = 1
-- your required output
Hope it helps.

SQL Server: How do I get the highest value not set of an int column?

Let's take an example. These are the rows of the table I want get the data:
The column I'm talking about is the reference one. The user can set this value on the web form, but the system I'm developing must suggest the lowest reference value still not used.
As you can see, the smallest value of this column is 35. I could just take the smaller reference and sum 1, but, in that case, the value 36 is already used. So, the value I want is 37.
Is there a way to do this without a loop verification? This table will grow so much.

This is for 2012+
DECLARE #Tbl TABLE (id int, reference int)
INSERT INTO #Tbl
( id, reference )
VALUES
(1, 49),
(2, 125),
(3, 35),
(4, 1345),
(5, 36),
(6, 37)
SELECT
MIN(A.reference) + 1 Result
FROM
(
SELECT
*,
LEAD(reference) OVER (ORDER BY reference) Tmp
FROM
#Tbl
) A
WHERE
A.reference - A.Tmp != -1
Result: 37

Here is yet another place where the tally table is going to prove invaluable. In fact it is so useful I keep a view on my system that looks like this.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a cross join E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a cross join E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
Next of course we need some sample data and table to hold it.
create table #Something
(
id int identity
, reference int
, description varchar(10)
)
insert #Something (reference, description)
values (49, 'data1')
, (125, 'data2')
, (35, 'data3')
, (1345, 'data4')
, (36, 'data5')
, (7784, 'data6')
Now comes the magic of the tally table.
select top 1 t.N
from cteTally t
left join #Something s on t.N = s.reference
where t.N >= (select MIN(reference) from #Something)
and s.id is null
order by t.N

This is ugly, but should get the job done:
select
top 1 reference+1
from
[table]
where
reference+1 not in (select reference from [table])
order by reference

I used a table valued express to get the next value. I first left outer joined the table to itself (shifting the key in the join by +1). I then looked only at rows that had no corresponding match (b.ID is null). The minimum a.ReferenceID + 1 gives us the answer we are looking for.
create table MyTable
(
ID int identity,
Reference int,
Description varchar(20)
)
insert into MyTable values (10,'Data')
insert into MyTable values (11,'Data')
insert into MyTable values (12,'Data')
insert into MyTable values (15,'Data')
-- Find gap
;with Gaps as
(
select a.Reference+1 as 'GapID'
from MyTable a
left join MyTable b on a.Reference = b.Reference-1
where b.ID is null
)
select min(GapID) as 'NewReference'
from Gaps
NewReference
------------
13
I hope the code was clearer than my description.

CREATE TABLE #T(ID INT , REFERENCE INT, [DESCRIPTION] VARCHAR(50))
INSERT INTO #T
SELECT 1,49 , 'data1' UNION ALL
SELECT 2,125 , 'data2' UNION ALL
SELECT 3,35 , 'data3' UNION ALL
SELECT 4,1345, 'data4' UNION ALL
SELECT 5,36 , 'data5' UNION ALL
SELECT 6,7784, 'data6'
SELECT TOP 1 REFERENCE + 1
FROM #T T1
WHERE
NOT EXISTS
(
SELECT 1 FROM #T T2 WHERE T2.REFERENCE = T1.REFERENCE + 1
)
ORDER BY T1.REFERENCE
--- OR
SELECT MIN(REFERENCE) + 1
FROM #T T1
WHERE
NOT EXISTS
(
SELECT 1 FROM #T T2 WHERE T2.REFERENCE = T1.REFERENCE + 1
)

How about using a Tally table. The following illustrates the concept. It would be better to use a persisted numbers table as opposed to the cte however the code below illustrates the concept.
For further reading as to why you should use a persisted table, check out the following link: sql-auxiliary-table-of-numbers
DECLARE #START int = 1, #END int = 1000
CREATE TABLE #TEST(UsedValues INT)
INSERT INTO #TEST(UsedValues) VALUES
(1),(3),(5),(7),(9),(11),(13),(15),(17)
;With NumberSequence( Number ) as
(
Select #start as Number
union all
Select Number + 1
from NumberSequence
where Number < #end
)
SELECT MIN(Number)
FROM NumberSequence n
LEFT JOIN #TEST t
ON n.Number = t.UsedValues
WHERE UsedValues IS NULL
OPTION ( MAXRECURSION 1000 )

You could try using a descending order:
SELECT DISTINCT reference
FROM `Resultsados`
ORDER BY `reference` ASC;
As far as I know, there is no way to do this without a loop. To prevent multiple values from returning be sure to use DISTINCT.

Combine two tables in SQL Server

I have tow tables with the same number of rows
Example:
table a:
1,A
2,B
3,C
table b:
AA,BB
AAA,BBB,
AAAA,BBBB
I want a new table made like that in SQL SErver:
1,A,AA,BB
2,B,AAA,BBB
3,C,AAAA,BBBB
How do I do that?

In SQL Server 2005 (or newer), you can use something like this:
-- test data setup
DECLARE #tablea TABLE (ID INT, Val CHAR(1))
INSERT INTO #tablea VALUES(1, 'A'), (2, 'B'), (3, 'C')
DECLARE #tableb TABLE (Val1 VARCHAR(10), Val2 VARCHAR(10))
INSERT INTO #tableb VALUES('AA', 'BB'),('AAA', 'BBB'), ('AAAA', 'BBBB')
-- define CTE for table A - sort by "ID" (I just assumed this - adapt if needed)
;WITH DataFromTableA AS
(
SELECT ID, Val, ROW_NUMBER() OVER(ORDER BY ID) AS RN
FROM #tablea
),
-- define CTE for table B - sort by "Val1" (I just assumed this - adapt if needed)
DataFromTableB AS
(
SELECT Val1, Val2, ROW_NUMBER() OVER(ORDER BY Val1) AS RN
FROM #tableb
)
-- create an INNER JOIN between the two CTE which just basically selected the data
-- from both tables and added a new column "RN" which gets a consecutive number for each row
SELECT
a.ID, a.Val, b.Val1, b.Val2
FROM
DataFromTableA a
INNER JOIN
DataFromTableB b ON a.RN = b.RN
This gives you the requested output:

You could do a rank over the primary keys, then join on that rank:
SELECT RANK() OVER (table1.primaryKey),
T1.*,
T2.*
FROM
SELECT T1.*, T2.*
FROM
(
SELECT RANK() OVER (table1.primaryKey) [rank], table1.* FROM table1
) AS T1
JOIN
(
SELECT RANK() OVER (table2.primaryKey) [rank], table2.* FROM table2
) AS T2 ON T1.[rank] = T2.[rank]

Your query is strange, but in Oracle you can do this:
select a.*, tb.*
from a
, ( select rownum rn, b.* from b ) tb -- temporary b - added rn column
where a.c1 = tb.rn -- assuming first column in a is called c1
if there is not column with numbers in a you can do same trick twice
select ta.*, tb.*
from ( select rownum rn, a.* from a ) ta
, ( select rownum rn, b.* from b ) tb
where ta.rn = tb.rn
Note: be aware that this can generate random combination, for example
1 A AA BB
2 C A B
3 B AAA BBB
because there is no order by in ta and tb

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

SQL Server Joining Between Two Tables Based on Closest Value - sql-server

Related

T-SQL Left join twice

How to join two Tables which have no unique ID

SQL Server: How to select top rows of a group based on value of the column of that group?

SQL Server: How do I get the highest value not set of an int column?

Combine two tables in SQL Server

Categories

Resources