Iterate over XML "cousins" with SQL Server

I have the below XML document stored in a TSQL variable with XML type:
<root>
<parent>
<child>Alice</child>
<child>Bob</child>
<child>Carol</child>
</parent>
<house>
<room><id>1</id></room>
<room><id>2</id></room>
<room><id>3</id></room>
</house>
</root>
I would like to iterate over "cousin" nodes (that is, nodes whose parents are siblings) and insert in a table one row per iteration and one column per cousin. So the result would be something like this:
Child | Room
------------
Alice | 1
Bob | 2
Carol | 3
(I know for a fact that there are as many rooms as children).
I feel like this is a simple task, but can't seem to find a way. I am a beginner in SQL Server and XPath, and probably lack the terminology to look for documentation.
What I've tried so far is to iterate over, say the child elements, and try to read the matching room element from there using ROW_NUMBER to pick the room I want:
INSERT INTO children (child, room)
SELECT
child = T.Item.value('(../parent/child/text())[' + (ROW_NUMBER() OVER(ORDER BY T.Item)) + ']', 'VARCHAR(10)'),
room = T.Item.value('(id/text())[1]', 'CHAR(1)')
FROM
@XML.nodes('root/house/room') AS T(Item)
But SQL Server complains that value() accepts only a string literal as its first argument (what kind of limitation is that??).
Any idea of how I could do that simply?

Not sure if this is the best or shortest way, but it produces the output you require:
DECLARE @x XML='
<root>
<parent>
<child>Alice</child>
<child>Bob</child>
<child>Carol</child>
</parent>
<house>
<room><id>1</id></room>
<room><id>2</id></room>
<room><id>3</id></room>
</house>
</root>';
;WITH childs AS (
    SELECT
        n.e.value('.','NVARCHAR(128)') AS child,
        id=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM
        @x.nodes('/root/parent/child') AS n(e)
),
room_ids AS (
    SELECT
        n.e.value('.','NVARCHAR(128)') AS room_id,
        id=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM
        @x.nodes('/root/house/room/id') AS n(e)
)
SELECT
    c.child,
    r.room_id
FROM
    childs AS c
    INNER JOIN room_ids AS r ON
        r.id=c.id
ORDER BY
    c.id;

As a pure XPath query you might do this:
SELECT The.Room.value('id[1]','varchar(max)'),
       The.Room.value('let $r:=. return (../../parent/child[position()=$r]/text())[1]','varchar(max)')
FROM @YourXML.nodes('/root/house/room') AS The(Room)
But - to be honest - I'd prefer TT.'s solution. This solution relies on the rooms being numbered from 1 to n without gaps. TT.'s solution would work even with unsorted room numbers...
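The error in the question comes from building the XQuery string with +. A minimal sketch of a workaround, assuming the same XML is held in a variable named @x: keep the XQuery a string literal and pass the position in through sql:column(). The alias n and the column i are made up for this example, and the rows are paired purely by position, so it still assumes there are as many rooms as children.

```sql
DECLARE @x XML = N'
<root>
  <parent>
    <child>Alice</child>
    <child>Bob</child>
    <child>Carol</child>
  </parent>
  <house>
    <room><id>1</id></room>
    <room><id>2</id></room>
    <room><id>3</id></room>
  </house>
</root>';

-- n.i is a 1-based counter with one row per <child> element (hypothetical names).
-- sql:column() exposes that counter inside the XQuery literal, so the
-- first argument of value() stays a plain string literal.
SELECT
    child = @x.value('(/root/parent/child[position() = sql:column("n.i")]/text())[1]', 'VARCHAR(10)'),
    room  = @x.value('(/root/house/room[position() = sql:column("n.i")]/id/text())[1]', 'VARCHAR(10)')
FROM (
    SELECT i = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM @x.nodes('/root/parent/child') AS c(e)
) AS n
ORDER BY n.i;
```

Prefixing the SELECT with INSERT INTO children (child, room) would load the rows into the table from the question.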

Related

How to do an SQL Server Recursive CTE

I have a table where each record contains a unique Numeric id and 2 parent ids (mother, father). I would like to find a way to list the parents(2), grandparents(4), great grandparents(8) and so on down to a specified level. Before I give up on pure SQL and do it in Python, can anyone tell me a way to do this?

You can try something like the query below: pass in the child's numericId to get its parents, and the recursion then walks up to the higher levels.
You can limit the depth with the parentlevel filter.
DECLARE @childNumericId AS INT -- set this to the id of the starting child
;WITH CTE_Ancestory AS (
    SELECT numericId AS child, ParentId1 as father, parentId2 as mother, 1 as parentlevel
    FROM tableName
    WHERE numericId = @childNumericId
    UNION ALL
    SELECT t.NumericId AS child, t.ParentId1 as father, t.parentId2 as mother, c.parentlevel + 1 AS parentLevel
    FROM tableName AS t
    INNER JOIN CTE_Ancestory AS c ON t.numericId IN (c.father, c.mother)
)
SELECT *
from CTE_Ancestory
Where parentlevel < 4 -- number of levels you need
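To make the recursion concrete, here is a small self-contained sketch of the same idea. The table variable @people, its sample rows and the @maxLevel variable are invented for illustration; only the column names follow the answer above. Level 0 is the starting child, level 1 the parents, level 2 the grandparents, and so on.

```sql
-- Hypothetical sample data: person 1 has parents 2 and 3, who each have two parents.
DECLARE @people TABLE (numericId INT PRIMARY KEY, ParentId1 INT NULL, parentId2 INT NULL);
INSERT INTO @people (numericId, ParentId1, parentId2) VALUES
    (1, 2, 3),
    (2, 4, 5),
    (3, 6, 7),
    (4, NULL, NULL), (5, NULL, NULL), (6, NULL, NULL), (7, NULL, NULL);

DECLARE @childNumericId INT = 1;  -- starting child
DECLARE @maxLevel INT = 2;        -- 1 = parents, 2 = grandparents, ...

WITH ancestors AS (
    -- anchor: the starting child itself, at level 0
    SELECT p.numericId, p.ParentId1, p.parentId2, 0 AS lvl
    FROM @people AS p
    WHERE p.numericId = @childNumericId
    UNION ALL
    -- recursive step: the parents of every row found so far
    SELECT p.numericId, p.ParentId1, p.parentId2, a.lvl + 1
    FROM @people AS p
    INNER JOIN ancestors AS a ON p.numericId IN (a.ParentId1, a.parentId2)
    WHERE a.lvl < @maxLevel       -- stop climbing once the requested depth is reached
)
SELECT numericId, lvl
FROM ancestors
WHERE lvl > 0                     -- leave out the starting child
ORDER BY lvl, numericId;
```

Filtering inside the recursive member, rather than only in the final SELECT, keeps the recursion from walking levels that would be discarded anyway.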

Find/replace string values based upon table values

I have a somewhat unusual need to find/replace values in a string from values in a separate table.
Basically, I need to standardize a bunch of addresses, and one of the steps is to replace things like St, Rd or Blvd with Street, Road or Boulevard. I was going to write a function with a bunch of nested REPLACE() statements, but this is 1) inefficient; and 2) not practical. There are over 500 possible abbreviations for street types according to the USPS website.
What I'd like to do is something akin to:
REPLACE(Address1, Col1, Col2) where col1 and col2 are abbreviation and full street type in a separate table.
Anyone have any insight into something like this?

You can do such replacements using a recursive CTE. Something like this:
with r as (
      select r.*, row_number() over (order by id) as seqnum
      from replacements r
     ),
     cte as (
      -- anchor: apply the first replacement to every address
      select replace(t.address, r.col1, r.col2) as address, seqnum as i
      from t cross join
           r
      where r.seqnum = 1
      union all
      -- recursive step: apply replacement number i + 1
      select replace(cte.address, r.col1, r.col2), i + 1
      from cte join
           r
           on r.seqnum = cte.i + 1
     )
select cte.*
from (select cte.*, max(i) over () as maxi
      from cte
     ) cte
where maxi = i;
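That said, this is basically iteration. It will be quite expensive to do this on a table where there are 500 replacements per row. Note also that the default recursion limit is 100, so with 500 replacements you would need to append OPTION (MAXRECURSION 0) to the query.
SQL is probably not the best tool for this.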
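A runnable sketch of the recursive-CTE replacement on invented sample data may help. The table variables @t and @replacements and their rows are assumptions made for this example; the column names col1 and col2 follow the question. The explicit CAST keeps the string column the same type in both halves of the CTE, which a recursive CTE requires.

```sql
-- Hypothetical sample data
DECLARE @t TABLE (id INT PRIMARY KEY, address VARCHAR(200));
DECLARE @replacements TABLE (id INT PRIMARY KEY, col1 VARCHAR(20), col2 VARCHAR(20));

INSERT INTO @t (id, address) VALUES (1, '12 Main St'), (2, '7 Oak Rd');
INSERT INTO @replacements (id, col1, col2) VALUES
    (1, ' St',   ' Street'),
    (2, ' Rd',   ' Road'),
    (3, ' Blvd', ' Boulevard');

WITH r AS (
    -- number the replacements 1..n so they can be applied one per recursion step
    SELECT col1, col2, ROW_NUMBER() OVER (ORDER BY id) AS seqnum
    FROM @replacements
),
cte AS (
    -- anchor: replacement 1 applied to every address
    SELECT t.id, CAST(REPLACE(t.address, r.col1, r.col2) AS VARCHAR(200)) AS address, r.seqnum AS i
    FROM @t AS t
    CROSS JOIN r
    WHERE r.seqnum = 1
    UNION ALL
    -- recursive step: replacement i + 1 applied to the result of step i
    SELECT cte.id, CAST(REPLACE(cte.address, r.col1, r.col2) AS VARCHAR(200)), cte.i + 1
    FROM cte
    INNER JOIN r ON r.seqnum = cte.i + 1
)
SELECT id, address
FROM cte
WHERE i = (SELECT COUNT(*) FROM @replacements)  -- keep only the fully processed rows
OPTION (MAXRECURSION 0);                        -- lift the default limit of 100 levels
```

Plain REPLACE matches substrings, so an address that already contains " Street" would be mangled by the " St" rule; real address data would need stricter matching than this sketch attempts.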

Sql find Row with longest String and delete the Rest

I am currently working on a table with approx. 7.5 million rows and 16 columns. One of the columns is an internal identifier (let's call it ID) we use at my university. Another column contains a string.
ID is NOT a unique index for a row, so one identifier can appear more than once in the table - the only difference between the rows being the string.
For each ID, I need to keep only the row with the longest string and delete every other row from the original table. Unfortunately I am more of a SQL novice, and I am really stuck at this point. So if anyone could help, this would be really nice.

Take a look at this sample:
SELECT * INTO #sample FROM (VALUES
(1, 'A'),
(1,'Long A'),
(2,'B'),
(2,'Long B'),
(2,'BB')
) T(ID,Txt)
DELETE S FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LEN(Txt) DESC) RN
FROM #sample) S
WHERE RN!=1
SELECT * FROM #sample
Results:
ID Txt
-- ------
1 Long A
2 Long B

It might be possible just in SQL, but the way I know how to do it would be a two-pass approach using application code - I assume you have an application you are writing.
The first pass would be something like:
SELECT theid, COUNT(*) AS num, MAX(LEN(thestring)) AS keepme FROM thetable GROUP BY theid HAVING COUNT(*) > 1
Then you'd loop through the results in whatever language you're using and delete every row with that ID except the one whose string has the returned length. The language I know is PHP, so I'll use it for my example, but the method would be the same in any language (for brevity, I'm skipping error checking, prepared statements, and such, and not testing - please use carefully):
// An aggregate count has to be filtered with HAVING, not WHERE
$sql = 'SELECT theid, COUNT(*) AS num, MAX(LEN(thestring)) AS keepme FROM thetable GROUP BY theid HAVING COUNT(*) > 1';
$result = sqlsrv_query($resource, $sql);
while ($row = sqlsrv_fetch_object($result)) {
    // keepme is a length, so compare it against LEN(thestring)
    $delete = 'DELETE FROM thetable WHERE theid = '.$row->theid.' AND LEN(thestring) <> '.$row->keepme;
    // do not reuse $result here, the outer loop is still reading from it
    sqlsrv_query($resource, $delete);
}
You didn't say what you would want to do if two strings are the same length, so this solution does not deal with that at all - I'm assuming that each ID will only have one longest string.
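If ties in length do occur, one way to handle them in pure SQL is to add a tie-breaker to the ROW_NUMBER ordering. This is only a sketch on invented data: the temp table #demo and its rows are assumptions, and breaking ties alphabetically on Txt is an arbitrary choice that should be replaced with whatever rule fits the real data.

```sql
-- Hypothetical sample data; ID 2 has two strings of the same length
CREATE TABLE #demo (ID INT, Txt VARCHAR(50));
INSERT INTO #demo (ID, Txt) VALUES
    (1, 'A'),
    (1, 'Long A'),
    (2, 'B'),
    (2, 'Long B'),
    (2, 'Long C');

WITH ranked AS (
    SELECT ID, Txt,
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY LEN(Txt) DESC, Txt) AS RN  -- Txt breaks ties in length
    FROM #demo
)
DELETE FROM ranked   -- deleting through the CTE removes the rows from #demo
WHERE RN > 1;

SELECT ID, Txt FROM #demo ORDER BY ID;

DROP TABLE #demo;
```

Note that LEN ignores trailing spaces; DATALENGTH may be the better measure if those matter.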

how to select data row from a comma separated value field

My question is similar, though not identical, to this question:
How to SELECT parts from a comma-separated field with a LIKE statement
but I have not seen any answer there, so I am posting my question again.
I have the following table:
╔════════════╦═════════════╗
║ VacancyId ║ Media ║
╠════════════╬═════════════╣
║ 1 ║ 32,26,30 ║
║ 2 ║ 31, 25,20 ║
║ 3 ║ 21,32,23 ║
╚════════════╩═════════════╝
I want to select the rows whose Media contains 30, 21 or 40.
So in this case the output should be the first and the third row.
How can I do that?
I have tried Media LIKE '30', but that does not return any rows. Also, I need to search for more than one value in that field.
My database is SQL Server.
Thank you

Storing comma-separated values in a database column is never a good idea. If it is feasible, move them into a separate table, as this is most probably a 1:n relationship.
If that is not feasible, there are the following possible ways to do this.
If the number of values to match is going to stay the same, you can use a series of LIKE conditions combined with OR/AND, depending on your requirement.
Ex.-
WHERE
Media LIKE '%21%'
OR Media LIKE '%30%'
OR Media LIKE '%40%'
However, the above query will also catch every value that merely contains 21, so rows with values like 1210 or 210 will be returned as well. To overcome this you can use the following trick, which hampers performance because it applies functions to the column in the WHERE clause, and that makes the query non-SARGable.
But here it goes,
-- Declare a variable holding the value to match; for multiple values, use multiple variables.
DECLARE @valueSearch VARCHAR(10) = '21'
-- Then do the matching in the WHERE clause
WHERE
(',' + RTRIM(Media) + ',') LIKE '%,' + @valueSearch + ',%'
If the number of values to match is going to change, you might want to look into a Full-Text Index.
If you decide to go that way, once the Full-Text Index is in place you can do the following to get what you want,
Ex.-
WHERE
CONTAINS(Media, '"21" OR "30" OR "40"')

The best way I can suggest is to first convert the comma-separated values into a table using This link, so that you end up with a table you can query like below.
SELECT * FROM Table
WHERE Media in('30','28')
This will work.
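On SQL Server 2016 or later (database compatibility level 130 and up), the splitting step can be sketched with the built-in STRING_SPLIT function instead of a hand-written splitter. The table variable @vacancies below is a stand-in for the table in the question.

```sql
-- Hypothetical stand-in for the question's table
DECLARE @vacancies TABLE (VacancyId INT PRIMARY KEY, Media VARCHAR(100));
INSERT INTO @vacancies (VacancyId, Media) VALUES
    (1, '32,26,30'),
    (2, '31, 25,20'),
    (3, '21,32,23');

-- STRING_SPLIT returns one row per list element in a column named value.
-- LTRIM/RTRIM removes the stray spaces seen in row 2.
-- DISTINCT collapses vacancies that match more than one searched value.
SELECT DISTINCT v.VacancyId, v.Media
FROM @vacancies AS v
CROSS APPLY STRING_SPLIT(v.Media, ',') AS s
WHERE LTRIM(RTRIM(s.value)) IN ('30', '21', '40')
ORDER BY v.VacancyId;
```

This still scans and splits every row, so a normalised table with one row per VacancyId/Media pair remains the better long-term fix.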

You can use this, but the performance is inevitably poor. You should, as others have said, normalise this structure.
WHERE
',' + media + ',' LIKE '%,21,%'
OR ',' + media + ',' LIKE '%,30,%'
Etc, etc...

If you are certain that any Media value containing the string 30 will be one you wish to return, you just need to include wildcards in your LIKE statement:
SELECT *
FROM Table
WHERE Media LIKE '%30%'
Bear in mind though that this would also return a record with a Media value of 298,300,302 for example, so if this is problematic for you, you'll need to consider a more sophisticated method, like:
SELECT *
FROM Table
WHERE Media LIKE '%,30,%'
OR Media LIKE '30,%'
OR Media LIKE '%,30'
OR Media = '30'
If there might be spaces in the strings (as in your question), you'll also want to strip these out:
SELECT *
FROM Table
WHERE REPLACE(Media,' ','') LIKE '%,30,%'
OR REPLACE(Media,' ','') LIKE '30,%'
OR REPLACE(Media,' ','') LIKE '%,30'
OR REPLACE(Media,' ','') = '30'
Edit: I actually prefer Coder of Code's solution to this:
SELECT *
FROM Table
WHERE ',' + LTRIM(RTRIM(REPLACE(Media,' ',''))) + ',' LIKE '%,30,%'
You mention that you wish to search for multiple strings in this field, which is also possible, either matching any of the values (OR) or all of them (AND):
SELECT *
FROM Table
WHERE Media LIKE '%30%'
OR Media LIKE '%28%'
SELECT *
FROM Table
WHERE Media LIKE '%30%'
AND Media LIKE '%28%'

I agree that storing comma-separated values like that is not a good idea. But if you have to,
I think using an inline function will give better performance:
Select VacancyId, Media from (
Select 1 as VacancyId, '32,26,30' as Media
union all
Select 2, '31,25,20'
union all
Select 3, '21,32,23'
) asa
CROSS APPLY dbo.udf_StrToTable(Media, ',') tbl
where CAST(tbl.Result as int) in (30,21,40)
Group by VacancyId, Media
Output is;
VacancyId Media
----------- ---------
1 32,26,30
3 21,32,23
and our inline function script is;
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[udf_StrToTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].udf_StrToTable
GO
CREATE FUNCTION udf_StrToTable (@List NVARCHAR(MAX), @Delimiter NVARCHAR(1))
RETURNS TABLE
With Encryption
AS
RETURN
( WITH Split(stpos,endpos)
  AS(
      SELECT 0 AS stpos, CHARINDEX(@Delimiter,@List) AS endpos
      UNION ALL
      SELECT CAST(endpos+1 as int), CHARINDEX(@Delimiter,@List,endpos+1)
      FROM Split
      WHERE endpos > 0
  )
  SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as inx,
         SUBSTRING(@List,stpos,COALESCE(NULLIF(endpos,0),LEN(@List)+1)-stpos) Result
  FROM Split
)
GO

This solution uses a recursive CTE to identify the position of each comma within the string, then uses SUBSTRING to return all strings between the commas.
I've left some unnecessary code in place to help you get your head round what it's doing. You can strip it down to provide exactly what you need.
DROP TABLE #TMP
CREATE TABLE #TMP(ID INT, Vals CHAR(100))
INSERT INTO #TMP(ID,VALS)
VALUES
(1,'32,26,30')
,(2,'31, 25,20')
,(3,'21,32,23')
;WITH cte
AS
(
SELECT
ID
,VALS
,0 POS
,CHARINDEX(',',VALS,0) REM
FROM
#TMP
UNION ALL
SELECT ID,VALS,REM,CHARINDEX(',',VALS,REM+1)
FROM
cte c
WHERE CHARINDEX(',',VALS,REM+1) > 0
UNION ALL
SELECT ID,VALS,REM,LEN(VALS)
FROM
cte c
WHERE POS+1 < LEN(VALS) AND CHARINDEX(',',VALS,REM+1) = 0
)
,cte_Clean
AS
(
SELECT ID,CAST(REPLACE(LTRIM(RTRIM(SUBSTRING(VALS,POS+1,REM-POS))),',','') AS INT) AS VAL FROM cte
WHERE POS <> REM
)
SELECT
ID
FROM
cte_Clean
WHERE
VAL = 32
ORDER BY ID

Recursive query with CTE - SUM of child columns for a given parent

I have a forum database that stores forum information in a single table. The forum allows for unlimited subforums.
Table name - forums
| ForumID | ParentForumID | Name | Description | TopicCount | ReplyCount | LastPost |
Given a ForumID as a parameter I am trying to SUM the TopicCount and ReplyCount for all child entries. I am also trying to return the latest LastPost, which is specified as DATETIME.
I've searched google and this forum and understand I should be using a recursive CTE but am having some difficulty understanding the syntax. Here is my CTE - work in progress.
WITH CTE (ForumID, ParentForumID)
AS
(
SELECT ForumID AS Descendant, ParentForumID as Ancestor
FROM forums
UNION ALL
SELECT e.Ancestor
FROM
CTE as e
INNER JOIN CTE AS d
ON Descendant = d.ParentForumID
)
SELECT e.Descendant, SUM(TopicCount) AS topics, SUM(ReplyCount) AS replys
FROM CTE e
WHERE e.Ancestor = 1
Where 1 = Parameter for the forum ID.
Thanks in advance for the help!

You're doing OK - you're quite close :-)
Basically, you need to:
1. define the initial forum to be picked before the CTE
2. create an "anchor" query that selects that forum
3. then iterate over all children and sum up the TopicCount and ReplyCount counters
So your code should look something like this:
DECLARE @RootForumID INT
SET @RootForumID = 1 -- or whatever you want...
;WITH CTE AS
(
    -- define the "anchor" query - select the chosen forum
    SELECT
        ForumID, TopicCount, ReplyCount, LastPost
    FROM
        dbo.forums
    WHERE
        ForumID = @RootForumID
    UNION ALL
    -- select the child rows
    SELECT
        f.ForumID, f.TopicCount, f.ReplyCount, f.LastPost
    FROM
        dbo.forums f
    INNER JOIN
        CTE on f.ParentForumID = CTE.ForumID
)
SELECT
    SUM(TopicCount) AS topics,
    SUM(ReplyCount) AS replys,
    MAX(LastPost) AS 'Latest Post'
FROM
    CTE
Of course, you could wrap this into a stored procedure that would take the initial "root" ForumID as a parameter.
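A minimal sketch of such a stored procedure, assuming the dbo.forums table from the question. The procedure name usp_GetForumTotals is made up for this example. Like the query above, it includes the root forum's own counters in the totals.

```sql
-- Hypothetical procedure name; relies on the dbo.forums table from the question
CREATE PROCEDURE dbo.usp_GetForumTotals
    @RootForumID INT
AS
BEGIN
    SET NOCOUNT ON;

    WITH CTE AS
    (
        -- anchor: the chosen forum
        SELECT ForumID, TopicCount, ReplyCount, LastPost
        FROM dbo.forums
        WHERE ForumID = @RootForumID
        UNION ALL
        -- recursive step: every direct child of a forum already selected
        SELECT f.ForumID, f.TopicCount, f.ReplyCount, f.LastPost
        FROM dbo.forums AS f
        INNER JOIN CTE ON f.ParentForumID = CTE.ForumID
    )
    SELECT
        SUM(TopicCount) AS topics,
        SUM(ReplyCount) AS replys,
        MAX(LastPost)   AS LatestPost
    FROM CTE;
END
GO

-- Usage: totals for forum 1 and all of its subforums
EXEC dbo.usp_GetForumTotals @RootForumID = 1;
```

To count only the descendants, the anchor could instead select the rows WHERE ParentForumID = @RootForumID.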