How to query rows with nearest column values? - sql-server

How should I construct T-SQL statement in order to achieve following:
from this
table1:
display_term|occurence
A|1
A|4
B|3
B|9
retrieve this
table2:
display_term|occurence
A|4
B|3
The "nearest distance" between A and B is 1 and it can be seen in resulting table.
i.e. I want to query the nearest (column "occurrence") distinct(column "display_term") records.
Thanks in advance

For just two terms, you can do something like:
declare #T table (display_term char(1) not null,occurence int not null)
insert into #T (display_term,occurence) values
('A',1),
('A',4),
('B',3),
('B',9)
select top 1
*
from
#T t1
cross join
#T t2
where
t1.display_term = 'A' and
t2.display_term = 'B'
order by ABS(t1.occurence - t2.occurence)
Which produces:
display_term occurence display_term occurence
------------ ----------- ------------ -----------
A 4 B 3
(You can search for UNPIVOT based solutions if you need exactly the result set you asked for)
It's not clear from your question whether this needs to extend to more terms - there's no obvious way to re-interpret the requirement for more terms, so I've left that undone for now.
UNPIVOT based solution if exact result set required. Setup #T as above:
select display_term,occurence from (
select top 1
t1.occurence as A,
t2.occurence as B
from
#T t1
cross join
#T t2
where
t1.display_term = 'A' and
t2.display_term = 'B'
order by ABS(t1.occurence - t2.occurence)
) t unpivot (occurence for display_term in (A,B)) as u
Result:
display_term occurence
------------------------------------ -----------
A 4
B 3

You said you want the nearest. It's not clear exactly what that means. Your example above shows the maximum of each display term. If you want the maximum value for each display term, you want to use aggregation. This is achieved by using the Group By and the Max method.
SELECT display_term, Max(occurence) as MaxOccurrence
FROM TABLE_NAME
GROUP BY display_term

If a single row result will work for you, this will do it:
SELECT a.display_term AS adt,
a.occurence As aoc,
b.display_term AS bdt,
b.occurence AS boc,
ABS(a.occurence - b.occurence) AS distance
FROM my_table a, my_table b
WHERE a.display_term = 'A'
AND b.display_term = 'B'
AND ABS(a.occurence - b.occurence) = (
SELECT MIN(ABS(a.occurence - b.occurence))
FROM my_table a, my_table b
WHERE a.display_term = 'A'
AND b.display_term = 'B'
)

Related

How to check for a specific condition by looping through every record in SQL Server?

I do have following table
ID Name
1 Jagan Mohan Reddy868
2 Jagan Mohan Reddy869
3 Jagan Mohan Reddy
Name column size is VARCHAR(55).
Now for some other task we need to take only 10 varchar length i.e. VARCHAR(10).
My requirement is to check that after taking the only 10 bits length of Name column value for eg if i take Name value of ID 1 i.e. Jagan Mohan Reddy868 by SUBSTRING(Name, 0,11) if it equals with another row value. here in this case the final value of SUBSTRING(Jagan Mohan Reddy868, 0,11) is equal to Name value of ID 3 row whose Name is 'Jagan Mohan Reddy'. I need to make a list of those kind rows. Can somebody help me out on how can i achieve in SQL Server.
My main check is that the truncated values of my Name column should not match with any non truncated values of Name column. If so i need to get those records.
Assuming I understand the question, I think you are looking for something like this:
Create and populate sample data (Please save us this step in your future questions)
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(15)
)
INSERT INTO #T VALUES
('Hi, I am Zohar.'),
('Hi, I am Peled.'),
('Hi, I am Z'),
('I''m Zohar peled')
Use a cte with a self inner join to get the list of ids that match the first 10 chars:
;WITH cte as
(
SELECT T2.Id As Id1, T1.Id As Id2
FROM #T T1
INNER JOIN #T T2 ON LEFT(T1.Name, 10) = t2.Name AND T1.Id <> T2.Id
)
Select the records from the original table, inner joined with a union of the Id1 and Id2 from the cte:
SELECT T.Id, Name
FROM #T T
INNER JOIN
(
SELECT Id1 As Id
FROM CTE
UNION
SELECT Id2
FROM CTE
) U ON T.Id = U.Id
Results:
Id Name
----------- ---------------
1 Hi, I am Zohar.
3 Hi, I am Z
Try this
SELECT Id,Name
FROM(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Name, LEFT(Name,11) ORDER BY ID) RN
FROM Tbale1 T
) Tmp
WHERE Tmp.RN = 1
loop over your column for all the values and put your substring() function inside this loop and I think in Sql index of string starts from 1 instead of 0. If you pass your string to charindex() like this
CHARINDEX('Y', 'Your String')
thus you will come to know whether it is starting from 0 or 1
and you can save your substring value as value of other column with length 10
I hope it will help you..
I think this should cover all the cases you are looking for.
-- Create Table
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(55)
)
-- Create Data
INSERT INTO #T VALUES
('Jagan Mohan Reddy868'),
('Jagan Mohan Reddy869'),
('Jagan Mohan Reddy'),
('Mohan Reddy'),
('Mohan Reddy123551'),
('Mohan R')
-- Get Matching Items
select *, SUBSTRING(name, 0, 11) as ShorterName
from #T
where SUBSTRING(name, 0, 11) in
(
-- get all shortnames with a count > 1
select SUBSTRING(name, 0, 11) as ShortName
from #T
group by SUBSTRING(name, 0, 11)
having COUNT(*) > 1
)
order by Name, LEN(Name)

How to query number based SQL Sets with Ranges in SQL

What I'm looking for is a way in MSSQL to create a complex IN or LIKE clause that contains a SET of values, some of which will be ranges.
Sort of like this, there are some single numbers, but also some ranges of numbers.
EX: SELECT * FROM table WHERE field LIKE/IN '1-10, 13, 24, 51-60'
I need to find a way to do this WITHOUT having to specify every number in the ranges separately AND without having to say "field LIKE blah OR field BETWEEN blah AND blah OR field LIKE blah.
This is just a simple example but the real query will have many groups and large ranges in it so all the OR's will not work.
One fairly easy way to do this would be to load a temp table with your values/ranges:
CREATE TABLE #Ranges (ValA int, ValB int)
INSERT INTO #Ranges
VALUES
(1, 10)
,(13, NULL)
,(24, NULL)
,(51,60)
SELECT *
FROM Table t
JOIN #Ranges R
ON (t.Field = R.ValA AND R.ValB IS NULL)
OR (t.Field BETWEEN R.ValA and R.ValB AND R.ValB IS NOT NULL)
The BETWEEN won't scale that well, though, so you may want to consider expanding this to include all values and eliminating ranges.
You can do this with CTEs.
First, create a numbers/tally table if you don't already have one (it might be better to make it permanent instead of temporary if you are going to use it a lot):
;WITH Numbers AS
(
SELECT
1 as Value
UNION ALL
SELECT
Numbers.Value + 1
FROM
Numbers
)
SELECT TOP 1000
Value
INTO ##Numbers
FROM
Numbers
OPTION (MAXRECURSION 1000)
Then you can use a CTE to parse the comma delimited string and join the ranges with the numbers table to get the "NewValue" column which contains the whole list of numbers you are looking for:
DECLARE #TestData varchar(50) = '1-10,13,24,51-60'
;WITH CTE AS
(
SELECT
1 AS RowCounter,
1 AS StartPosition,
CHARINDEX(',',#TestData) AS EndPosition
UNION ALL
SELECT
CTE.RowCounter + 1,
EndPosition + 1,
CHARINDEX(',',#TestData, CTE.EndPosition+1)
FROM CTE
WHERE
CTE.EndPosition > 0
)
SELECT
u.Value,
u.StartValue,
u.EndValue,
n.Value as NewValue
FROM
(
SELECT
Value,
SUBSTRING(Value,1,CASE WHEN CHARINDEX('-',Value) > 0 THEN CHARINDEX('-',Value)-1 ELSE LEN(Value) END) AS StartValue,
SUBSTRING(Value,CASE WHEN CHARINDEX('-',Value) > 0 THEN CHARINDEX('-',Value)+1 ELSE 1 END,LEN(Value)- CHARINDEX('-',Value)) AS EndValue
FROM
(
SELECT
SUBSTRING(#TestData, StartPosition, CASE WHEN EndPosition > 0 THEN EndPosition-StartPosition ELSE LEN(#TestData)-StartPosition+1 END) AS Value
FROM
CTE
)t
)u INNER JOIN ##Numbers n ON n.Value BETWEEN u.StartValue AND u.EndValue
All you would need to do once you have that is query the results using an IN statement, so something like
SELECT * FROM MyTable WHERE Value IN (SELECT NewValue FROM (/*subquery from above*/)t)

Refactoring T-SQL nested SELECT query to use in a case statement

I have the following table:
maker model type
B 1121 pc
A 1233 pc
E 1260 pc
A 1752 laptop
A 1276 printer
D 1288 printer
I need to receive a result in the form: maker, pc. If a particular maker has models in a given type, I need to concatenate the word 'yes' with the number of models in parentheses. Ex. yes(1) for maker 'A'. So, how can I avoid the following duplication?
CASE
WHEN SELECT COUNT(*) WHERE ... > 0
THEN 'yes(' + CAST((SELECT COUNT(*) WHERE ...) AS varchar) + ')'
This is not a real world problem. I just need to understand how to save a subquery result to use it in a branch statement. The result of this branch statement may contain the subquery result itself.
Creating tables:
create table #t (maker varchar(100), model varchar(100), type varchar(100) );
insert into #t ( maker, model, type ) values
( 'B', '1121', 'pc'),
( 'A', '1233', 'pc'),
( 'E', '1260', 'pc');
Query in easy steps:
;with
totals as (
select maker, type,
count( * ) as n
from
#t
group by
maker, type
) ,
maker_type as (
select distinct maker, type
from #t
)
select
mm.*, t.n,
case when t.n is null then 'No' else 'Yes' end as yes_no
from
maker_type mm
left outer join
totals t
on mm.maker = t.maker and
mm.type = t.type
Results:
maker type n yes_no
----- ---- - ------
A pc 1 Yes
B pc 1 Yes
E pc 1 Yes
I don't extend solution concatenating strings because I see that you know how to do it. Be free to change first or second CTE query to match yours requirements.

set difference in SQL query

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?
If you would like to represent Set Difference (A-B) in SQL, here is solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
# (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
You don't need to return data from the nested subqueries. I'm not sure this will make a difference withiut indexing but it's easier to read.
And EXISTS/JOIN is probably nicer IMHO then using IN
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll need a computed column as marc_s said at least
And 2 indexes: one on (computed, C) and another on (C, computed)
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
If you are looking for just set difference between table1 and table2,
the below query is simple that gives the rows that are in table1, but not in table2, such that both tables are instances of the same schema with column names as
columnone, columntwo, ...
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in col1
and columntwo not in col2
...
);

How to group ranged values using SQL Server

I have a table of values like this
978412, 400
978813, 20
978834, 50
981001, 20
As you can see the second number when added to the first is 1 number before the next in the sequence. The last number is not in the range (doesnt follow a direct sequence, as in the next value). What I need is a CTE (yes, ideally) that will output this
978412, 472
981001, 20
The first row contains the start number of the range then the sum of the nodes within. The next row is the next range which in this example is the same as the original data.
From the article that Josh posted, here's my take (tested and working):
SELECT
MAX(t1.gapID) as gapID,
t2.gapID-MAX(t1.gapID)+t2.gapSize as gapSize
-- max(t1) is the specific lower bound of t2 because of the group by.
FROM
( -- t1 is the lower boundary of an island.
SELECT gapID
FROM gaps tbl1
WHERE
NOT EXISTS(
SELECT *
FROM gaps tbl2
WHERE tbl1.gapID = tbl2.gapID + tbl2.gapSize + 1
)
) t1
INNER JOIN ( -- t2 is the upper boundary of an island.
SELECT gapID, gapSize
FROM gaps tbl1
WHERE
NOT EXISTS(
SELECT * FROM gaps tbl2
WHERE tbl2.gapID = tbl1.gapID + tbl1.gapSize + 1
)
) t2 ON t1.gapID <= t2.gapID -- For all t1, we get all bigger t2 and opposite.
GROUP BY t2.gapID, t2.gapSize
Check out this MSDN Article. It gives you a solution to your problem, if it will work for you depends on the ammount of data you have and your performance requirements for the query.
Edit:
Well using the example in the query, and going with his last solution the second way to get islands (first way resulted in an error on SQL 2005).
SELECT MIN(start) AS startGroup, endGroup, (endgroup-min(start) +1) as NumNodes
FROM (SELECT g1.gapID AS start,
(SELECT min(g2.gapID) FROM #gaps g2
WHERE g2.gapID >= g1.gapID and NOT EXISTS
(SELECT * FROM #gaps g3
WHERE g3.gapID - g2.gapID = 1)) as endGroup
FROM #gaps g1) T1 GROUP BY endGroup
The thing I added is (endgroup-min(start) +1) as NumNodes. This will give you the counts.

Resources