Comparing the length of two similar strings and picking the longest - sql-server

I am trying to compare two strings and pick the longest if they are similar. I have managed to pick the longest by using the following code:
SELECT D.RID, ProductID, Product, [Length] FROM (
SELECT RID, MAX([Length]) AS theLength FROM SortData GROUP BY RID)
AS X INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength
But I am now trying to make sure that the code only picks the longest string if it is like the word it is being compared to. I have attempted the following code in a few ways, but I would be grateful if somebody could help me:
SELECT D.RID, D.ProductID, Product, [Length] FROM (
SELECT RID, Product, MAX([Length]) AS theLength FROM SortData GROUP BY RID)
AS X INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength WHERE
D.Product LIKE Product
Using this code I get the following errors:
Msg 8120, Level 16, State 1, Line 3
Column 'SortData.Product' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Msg 209, Level 16, State 1, Line 5
Ambiguous column name 'Product'.
Msg 209, Level 16, State 1, Line 2
Ambiguous column name 'Product'.
Example of the data I would like to pick from:
1 Sam
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
4 Ben
...
I would expect the output list to be like:
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
...
To clarify what I am trying to do in the context of this example: I am trying to compare the two names, and if they are alike (e.g. x.Name LIKE Name), pick the longest.
As requested, here is further test data:
1 Hydrogen
1 Hydrogen Oxide
1 Carbon Monoxide
2 Carbon
2 Carbon
2 Carbon Dioxide
3 Carbon Monoxide
3 Carbon Dioxide
3 Oxygen
4 Hydrogen Dioxide
The desired results are:
1 Hydrogen Oxide
1 Carbon Monoxide
2 Carbon Dioxide
3 Carbon Monoxide
3 Oxygen
4 Hydrogen Dioxide

Perhaps another option: The WITH TIES clause in concert with Row_Number()
Example
Select Top 1 with ties *
From YourTable
Order By Row_Number() over (Partition by ID Order By Len(Name) desc)

Your query doesn't come close to your sample data and output. So I built this around the sample data provided to demonstrate one way of solving this.
declare @Something table
(
Col1 int
, Col2 varchar(20)
)
insert @Something values
(1, 'Sam')
, (1, 'Samantha')
, (2, 'Oliver')
, (3, 'Ollie')
select x.Col1
, x.Col2
from
(
select *
, RowNum = ROW_NUMBER() over(partition by Col1 order by LEN(Col2) desc)
from @Something
) x
where x.RowNum = 1
---EDIT---
To demonstrate that this code still returns the desired output from your new sample data...
declare @Something table
(
Col1 int
, Col2 varchar(20)
)
insert @Something values
(1, 'Sam')
, (1, 'Samantha')
, (2, 'Oliver')
, (3, 'Ollie')
, (4, 'Benjamin')
, (4, 'Ben')
select x.Col1
, x.Col2
from
(
select *
, RowNum = ROW_NUMBER() over(partition by Col1 order by LEN(Col2) desc)
from @Something
) x
where x.RowNum = 1
This returns:
1 Samantha
2 Oliver
3 Ollie
4 Benjamin
Since you claim this still doesn't work you need to provide an example of how or why this doesn't work. You keep mentioning LIKE but have not explained or demonstrated how that comes into play here. Help me understand the problem and I can help you find a solution.

I ended up figuring it out and using the following code:
SELECT D.RID, ProductID, D.Product, [Length] FROM
(
SELECT RID, MAX([Length]) AS theLength
FROM SortData GROUP BY RID
) AS X
INNER JOIN SortData AS D ON D.RID = X.RID AND D.[Length] = X.theLength
WHERE D.Product LIKE Product
GO
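For readers who want to experiment with the "similar" requirement, here is a minimal sketch under one assumed interpretation: a product is dropped when a longer product with the same RID starts with it. It runs against SQLite through Python's sqlite3 module purely as a stand-in for SQL Server; the table and column names follow the question, while the prefix rule and the NOT EXISTS form are assumptions, not the asker's method. In T-SQL the same idea would use LEN() and + in place of length() and ||.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SortData (RID INTEGER, Product TEXT)")
conn.executemany(
    "INSERT INTO SortData VALUES (?, ?)",
    [
        (1, "Hydrogen"), (1, "Hydrogen Oxide"), (1, "Carbon Monoxide"),
        (2, "Carbon"), (2, "Carbon"), (2, "Carbon Dioxide"),
    ],
)

# Assumption: "similar" means the shorter product is a prefix of a longer
# product in the same RID. A row survives only if no such longer row exists.
query = """
SELECT DISTINCT d.RID, d.Product
FROM SortData AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM SortData AS o
    WHERE o.RID = d.RID
      AND length(o.Product) > length(d.Product)
      AND o.Product LIKE d.Product || '%'
)
ORDER BY d.RID, d.Product
"""
longest_rows = conn.execute(query).fetchall()
print(longest_rows)
```

Note that LIKE treats % and _ inside the product text as wildcards, so names containing those characters would need escaping.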

Related

TSQL : Find PAIR Sequence in a table

I have the following table in T-SQL (there are other columns too, but no identity or primary key column):
Oid Cid
1 a
1 b
2 f
3 c
4 f
5 a
5 b
6 f
6 g
7 f
In the above example I would like to highlight that the following Oids are duplicates when the Cid column values are treated as "pairs":
Oid:
1 (1 matches Oid: 5)
2 (2 matches Oid: 4 and 7)
Please note that the match for Oid 2 does not include Oid 6, since Oid 6 is also paired with the letter 'g'.
Is it possible to write a query, without using a WHILE loop, that highlights the Oids as above, along with the count of other matches that exist in the database?
I am trying to find patterns within the dataset relating to these two columns. Thank you in advance.
Here is a worked example - see comments for explanation:
--First set up your data in a table variable
declare @oidcid table (Oid int, Cid char(1));
insert into @oidcid values
(1,'a'),
(1,'b'),
(2,'f'),
(3,'c'),
(4,'f'),
(5,'a'),
(5,'b'),
(6,'f'),
(6,'g'),
(7,'f');
--This cte gets a table with all of the cids in order, for each oid
with cte as (
select distinct Oid, (select Cid + ',' from @oidcid i2
where i2.Oid = i.Oid order by Cid
for xml path('')) Cids
from @oidcid i
)
select Oid, cte.Cids
from cte
inner join (
-- Here we get just the lists of cids that appear more than once
select Cids, Count(Oid) as OidCount
from cte group by Cids
having Count(Oid) > 1 ) as gcte on cte.Cids = gcte.Cids
-- And when we list them, we are showing the oids with duplicate cids next to each other
Order by cte.Cids
select o1.Cid, o1.Oid, o2.Oid
, count(*) over (partition by o1.Cid) + 1 as [cnt]
from YourTable o1
join YourTable o2
on o1.Cid = o2.Cid
and o1.Oid < o2.Oid
order by o1.Cid, o1.Oid, o2.Oid
Maybe like this, then:
WITH CTE AS
(
SELECT Cid, oid
,ROW_NUMBER() OVER (PARTITION BY cid ORDER BY cid) AS RN
,SUM(1) OVER (PARTITION BY oid) AS maxRow2
,SUM(1) OVER (PARTITION BY cid) AS maxRow
FROM oid
)
SELECT * FROM CTE WHERE maxRow != 1 AND maxRow2 = 1
ORDER BY oid
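As a further sketch for the pair-matching question, two Oids can be treated as duplicates when they share every Cid and have the same number of Cids. The version below is an assumption-laden illustration, not one of the posted answers: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, a hypothetical table name OidCid, and it assumes no (Oid, Cid) row is repeated. The query itself uses only joins, GROUP BY and HAVING, so it should carry over to T-SQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OidCid (Oid INTEGER, Cid TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO OidCid VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "f"), (3, "c"), (4, "f"),
     (5, "a"), (5, "b"), (6, "f"), (6, "g"), (7, "f")],
)

# Two Oids match when they have the same number of Cids and every Cid of one
# also appears under the other (assumes no duplicate (Oid, Cid) rows).
query = """
WITH counts AS (
    SELECT Oid, COUNT(*) AS n FROM OidCid GROUP BY Oid
)
SELECT a.Oid AS Oid1, b.Oid AS Oid2
FROM OidCid AS a
JOIN OidCid AS b ON b.Cid = a.Cid AND a.Oid < b.Oid
JOIN counts AS ca ON ca.Oid = a.Oid
JOIN counts AS cb ON cb.Oid = b.Oid
WHERE ca.n = cb.n
GROUP BY a.Oid, b.Oid, ca.n
HAVING COUNT(*) = ca.n
ORDER BY a.Oid, b.Oid
"""
matching_pairs = conn.execute(query).fetchall()
print(matching_pairs)
```

Each returned row is one matching pair, so Oid 6 never appears: it shares 'f' with Oids 2, 4 and 7 but has a different Cid count.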

Displaying sorted hierarchy rows in SQL server?

Assuming I have this table (c is a child of parent p):
c p
------
40 0
2 3
2 40
3 1
7 2
1 0
Here 0 means root. I want the rows to be returned in this order:
c p
------
1 0
3 1
2 3
40 0
2 40
7 2
That's because we have 2 roots (1, 40) and 1 < 40.
So we start at 1 and then display all its descendants below it.
Then we get to 40 and apply the same logic again.
Question:
How can I do it?
I've succeeded in displaying it recursively and finding the level of hierarchy (not sure if it helps, though):
WITH cte(c, p) AS (
SELECT 40, 0 UNION ALL
SELECT 2,3 UNION ALL
SELECT 2,40 UNION ALL
SELECT 3,1 UNION ALL
SELECT 7,2 UNION ALL
SELECT 1,0
) , cte2 AS(
SELECT c,
p,
PLevel = 1
FROM cte
WHERE p = 0
UNION ALL
SELECT cte.c,
cte.p,
PLevel = cte2.PLevel + 1
FROM cte
INNER JOIN cte2
ON cte2.c = cte.p
)
SELECT *
FROM cte2
Full SQL fiddle
You have almost done it. Just add a rank to identify each group and then sort the data on it.
Also, as you are working with a more complex hierarchy, we need to change the [level] value. It is no longer a number; it holds the full path to the current element, where / separates each parent from its child. For example, the following string:
/1/5/4/1/
represents the hierarchy below:
1
--> 5
--> 4
--> 1
I got the idea from the hierarchyid type. You may want to consider storing hierarchies using it, as it has handy built-in functions for working with such structures.
Here is a full working example with the new data:
DECLARE @DataSource TABLE
(
[c] TINYINT
,[p] TINYINT
);
INSERT INTO @DataSource ([c], [p])
VALUES (1,0)
,(3, 1)
,(2, 3)
,(5,1)
,(7, 2)
,(40, 0)
,(2, 40);
WITH DataSource ([c], [p], [level], [rank])AS
(
SELECT [c]
,[p]
,CAST('/' AS VARCHAR(24))
,ROW_NUMBER() OVER (ORDER BY [c] ASC)
FROM @DataSource
WHERE [p] = 0
UNION ALL
SELECT DS.[c]
,DS.[p]
,CAST(DS1.[level] + CAST(DS.[c] AS VARCHAR(3)) + '/' AS VARCHAR(24))
,DS1.[rank]
FROM @DataSource DS
INNER JOIN DataSource DS1
ON DS1.[c] = DS.[p]
)
SELECT [c]
,[p]
FROM DataSource
ORDER BY [Rank]
,CAST([level] AS hierarchyid);
Again, pay attention to the node (7,2), which participates in both groups (even in your example). I guess this is just sample data and you have a way to define where the node should be included.
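The path idea above can also be tried without the hierarchyid type. The following is a rough sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and a zero-padded text path as a substitute for the hierarchyid cast; the three-digit padding is an assumption that only holds while every id is below 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataSource (c INTEGER, p INTEGER)")
conn.executemany(
    "INSERT INTO DataSource VALUES (?, ?)",
    [(1, 0), (3, 1), (2, 3), (5, 1), (7, 2), (40, 0), (2, 40)],
)

# Each row carries the zero-padded path from its root, so a plain text sort
# on the path lists every root followed by its descendants, depth first.
query = """
WITH RECURSIVE tree(c, p, path) AS (
    SELECT c, p, printf('/%03d', c)
    FROM DataSource
    WHERE p = 0
    UNION ALL
    SELECT d.c, d.p, tree.path || printf('/%03d', d.c)
    FROM DataSource AS d
    JOIN tree ON tree.c = d.p
)
SELECT c, p FROM tree ORDER BY path
"""
ordered_rows = conn.execute(query).fetchall()
print(ordered_rows)
```

Because node 2 has two parents, the row (7, 2) is reached through both of them and therefore appears once under each root.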

How to find Hierarchical order in SQL server

Assume an organization assigns employees to do annual reviews of other employees. Each ReviewID (who is an employee) can get reviewed by multiple EmployeeIDs. An employee can start a review only if the particular ReviewID has completed all of his own reviews or has none pending.
Sample Data code:
CREATE TABLE FindOrder
(
EmployeeID int
,ReviewID int
)
insert findorder
values (1,3), (1,10), (1,12), (2,3), (2,5), (2,7), (3,0), (4,6), (5, 3), (6,0), (7,0), (10,0), (12,5)
EmployeeIDs that have nothing to review (ReviewID = 0) should be my first set (3, 6, 7, 10). The EmployeeIDs who can start their reviews now are 4 and 5 (they should be my second set), as they need to review 6 and 3, who don't have pending ReviewIDs. EmployeeIDs 1 and 2 are not included here, because 1 has ReviewID 12, who has not completed all his reviews, and so on.
Please let me know if I am still not clear.
I want to find the order levels such that level 0 is (6,10,7,3), level 1 is (5, 4), level 2 is (2, 12), level 3 is (1).
I tried this cte to find order:
;WITH CTE AS
(
SELECT EmployeeID, ReviewID, 0 AS [Level] FROM FindOrder WHERE ReviewID = 0
UNION ALL
SELECT NN.EmployeeID, NN.ReviewID, [Level]+1 FROM FindOrder nn
JOIN CTE ON NN.ReviewID=CTE.EmployeeID
)
SELECT * FROM CTE
But I get EmployeeID 1 in level 1 and level 3. EmployeeID 1 should not appear in level 1, because not everyone Employee 1 has to review has completed their reviews, i.e. Employee 12 has not completed his review.
In general, the new subset of data in the recursive query above should have filtered out EmployeeIDs 1 and 2.
It's a little tricky to explain, but I hope I am clear now :(
It looks like your level should actually be the longest path of reviews needed for a given employee. For example, employee one has the following paths...
1->3
1->10
1->12->5->3
The level for this employee is the longest path, and if I'm understanding your question, the only one you care about. Try this...
;WITH CTE AS
(
SELECT EmployeeID, ReviewID, 0 AS [Level] FROM FindOrder WHERE ReviewId = '0'
UNION ALL
SELECT NN.EmployeeID, NN.ReviewID, [Level]+1 FROM FindOrder nn
JOIN CTE ON NN.ReviewID=CTE.EmployeeID
)
SELECT EmployeeId, MAX(Level) AS Level FROM CTE
GROUP BY EmployeeID
ORDER BY MAX(Level)

CTE Recursion to get tree hierarchy

I need to get an ordered hierarchy of a tree, in a specific way. The table in question looks a bit like this (all ID fields are uniqueidentifiers, I've simplified the data for sake of example):
EstimateItemID EstimateID ParentEstimateItemID ItemType
-------------- ---------- -------------------- --------
1 A NULL product
2 A 1 product
3 A 2 service
4 A NULL product
5 A 4 product
6 A 5 service
7 A 1 service
8 A 4 product
Graphical view of the tree structure (* denotes 'service'):
A
___/ \___
/ \
1 4
/ \ / \
2 7* 5 8
/ /
3* 6*
Using this query, I can get the hierarchy (just pretend 'A' is a uniqueidentifier, I know it isn't in real life):
DECLARE @EstimateID uniqueidentifier
SELECT @EstimateID = 'A'
;WITH temp as(
SELECT * FROM EstimateItem
WHERE EstimateID = @EstimateID
UNION ALL
SELECT ei.* FROM EstimateItem ei
INNER JOIN temp x ON ei.ParentEstimateItemID = x.EstimateItemID
)
SELECT * FROM temp
This gives me the children of EstimateID 'A', but in the order that they appear in the table, i.e.:
EstimateItemID
--------------
1
2
3
4
5
6
7
8
Unfortunately, what I need is an ordered hierarchy with a result set that follows the following constraints:
1. each branch must be grouped
2. records with ItemType 'product' and NULL parent are the top node
3. records with ItemType 'product' and non-NULL parent grouped after top node
4. records with ItemType 'service' are bottom node of a branch
So, the order that I need the results, in this example, is:
EstimateItemID
--------------
1
2
3
7
4
5
8
6
What do I need to add to my query to accomplish this?
Try this:
;WITH items AS (
SELECT EstimateItemID, ItemType
, 0 AS Level
, CAST(EstimateItemID AS VARCHAR(255)) AS Path
FROM EstimateItem
WHERE ParentEstimateItemID IS NULL AND EstimateID = @EstimateID
UNION ALL
SELECT i.EstimateItemID, i.ItemType
, Level + 1
, CAST(Path + '.' + CAST(i.EstimateItemID AS VARCHAR(255)) AS VARCHAR(255))
FROM EstimateItem i
INNER JOIN items itms ON itms.EstimateItemID = i.ParentEstimateItemID
)
SELECT * FROM items ORDER BY Path
With Path, the rows are sorted by their parent nodes.
If you want to sort child nodes by ItemType within each level, then you can play with Level and a SUBSTRING of the Path column.
Here SQLFiddle with sample of data
This is an add-on to Fabio's great idea above. As I said in my reply to his original post, I have re-posted his idea using more common data, table names, and fields to make it easier for others to follow.
Thank you Fabio! Great name by the way.
First some data to work with:
CREATE TABLE tblLocations (ID INT IDENTITY(1,1), Code VARCHAR(1), ParentID INT, Name VARCHAR(20));
INSERT INTO tblLocations (Code, ParentID, Name) VALUES
('A', NULL, 'West'),
('A', 1, 'WA'),
('A', 2, 'Seattle'),
('A', NULL, 'East'),
('A', 4, 'NY'),
('A', 5, 'New York'),
('A', 1, 'NV'),
('A', 7, 'Las Vegas'),
('A', 2, 'Vancouver'),
('A', 4, 'FL'),
('A', 5, 'Buffalo'),
('A', 1, 'CA'),
('A', 10, 'Miami'),
('A', 12, 'Los Angeles'),
('A', 7, 'Reno'),
('A', 12, 'San Francisco'),
('A', 10, 'Orlando'),
('A', 12, 'Sacramento');
Now the recursive query:
-- Note: The 'Code' field isn't used, but you could add it to display more info.
;WITH MyCTE AS (
SELECT ID, Name, 0 AS TreeLevel, CAST(ID AS VARCHAR(255)) AS TreePath
FROM tblLocations T1
WHERE ParentID IS NULL
UNION ALL
SELECT T2.ID, T2.Name, TreeLevel + 1, CAST(TreePath + '.' + CAST(T2.ID AS VARCHAR(255)) AS VARCHAR(255)) AS TreePath
FROM tblLocations T2
INNER JOIN MyCTE itms ON itms.ID = T2.ParentID
)
-- Note: The 'replicate' function is not needed. Added it to give a visual of the results.
SELECT ID, Replicate('.', TreeLevel * 4)+Name 'Name', TreeLevel, TreePath
FROM MyCTE
ORDER BY TreePath;
I believe that you need to add the following to the results of your CTE...
BranchID = some kind of identifier that uniquely identifies the branch. Forgive me for not being more specific, but I'm not sure what identifies a branch for your needs. Your example shows a binary tree in which all branches flow back to the root.
ItemTypeID where (for example) 0 = Product and 1 = service.
Parent = identifies the parent.
If those exist in the output, I think you should be able to use the output from your query as either another CTE or as the FROM clause in a query. Order by BranchID, ItemTypeID, Parent.
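The BranchID and ItemTypeID ordering described above could look something like the sketch below. It assumes that a branch is identified by its top-level (NULL parent) ancestor, and it breaks ties on EstimateItemID rather than on Parent; both are assumptions made for illustration. SQLite, through Python's sqlite3 module, stands in for SQL Server, and integer ids replace the uniqueidentifiers as in the question's simplified data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EstimateItem ("
    "EstimateItemID INTEGER, EstimateID TEXT, "
    "ParentEstimateItemID INTEGER, ItemType TEXT)"
)
conn.executemany(
    "INSERT INTO EstimateItem VALUES (?, ?, ?, ?)",
    [
        (1, "A", None, "product"), (2, "A", 1, "product"),
        (3, "A", 2, "service"), (4, "A", None, "product"),
        (5, "A", 4, "product"), (6, "A", 5, "service"),
        (7, "A", 1, "service"), (8, "A", 4, "product"),
    ],
)

# BranchID (assumed meaning): the id of the top-level ancestor of each item.
# Within a branch, products sort before services, then by item id.
query = """
WITH RECURSIVE items(EstimateItemID, ItemType, BranchID) AS (
    SELECT EstimateItemID, ItemType, EstimateItemID
    FROM EstimateItem
    WHERE ParentEstimateItemID IS NULL AND EstimateID = ?
    UNION ALL
    SELECT i.EstimateItemID, i.ItemType, items.BranchID
    FROM EstimateItem AS i
    JOIN items ON items.EstimateItemID = i.ParentEstimateItemID
)
SELECT EstimateItemID
FROM items
ORDER BY BranchID,
         CASE ItemType WHEN 'product' THEN 0 ELSE 1 END,
         EstimateItemID
"""
item_order = [row[0] for row in conn.execute(query, ("A",))]
print(item_order)
```

On the sample rows this reproduces the order requested in the question; whether the tie-break rule is right for deeper trees depends on how a branch is really defined.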

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
. . .
. . .
The first 3 rows of the Average column are NULL because they do not have 3 previous rows. Row 4 in the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
A simple self join would seem to perform much better than a row referencing subquery
Generate 10k rows of test data:
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3
group by nr.id, nr.Number
On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
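On SQL Server 2012 and later, a window frame can express "the previous 3 rows" directly, with no self join or correlated subquery. The sketch below is only an illustration of that technique: it runs on SQLite through Python's sqlite3 module as a stand-in, reusing the RowsToAverage test data from above. One difference to keep in mind is that SQLite's AVG returns a float, whereas T-SQL's AVG over an int column returns an int.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RowsToAverage (ID INTEGER NOT NULL, Number INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO RowsToAverage VALUES (?, ?)",
    [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)],
)

# The frame covers the 3 rows before the current one; the CASE leaves the
# first 3 rows NULL because they do not have a full set of previous rows.
query = """
SELECT ID, Number,
       CASE WHEN ROW_NUMBER() OVER (ORDER BY ID) > 3
            THEN AVG(Number) OVER (
                     ORDER BY ID
                     ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage
FROM RowsToAverage
ORDER BY ID
"""
moving_rows = conn.execute(query).fetchall()
for row in moving_rows:
    print(row)
```

Rows 4 and 5 come out as 2 and 3, matching the table in the question.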
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Performance of selects is as fast as it goes. Of course, modifications are slower.
