How to retrieve rows with XML column that have specific child node?

I have a table in SQL Server with an XML column:
CREATE TABLE dbo.XmlData (
id INT IDENTITY,
data XML
);
The XML stored in the data column is dynamic: only some rows contain a certain child node. Additionally, this node appears in different locations within the XML structure.
Example 1, /RootNode/FindMe:
<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>Two</ChildNodeTwo>
<ChildNodeThree>Three</ChildNodeThree>
<FindMe>FindMe</FindMe>
</RootNode>
Example 2, /RootNode/ChildNodeThree/Deeper/FindMe and /RootNode/ChildNodeTwo/FindMe :
<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>
<FindMe>FindMe</FindMe>
</ChildNodeTwo>
<ChildNodeThree>
<Deeper>
<FindMe>FindMe</FindMe>
</Deeper>
</ChildNodeThree>
</RootNode>
There are rows whose XML column doesn't have the <FindMe/> node.
I need to write a query that retrieves only the rows that have the <FindMe/> node, no matter where it is located within the XML structure.

The first thing to know is that //FindMe triggers a deep search, meaning: find this node wherever it exists in the XML. A single / at the beginning starts the search at the root of the document, and no leading slash at all continues from the current (context) node.
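To make the difference between the three path styles concrete, here is a minimal sketch against an XML variable. The variable name @x and the sample document are made up for illustration; only the path expressions matter.

```sql
-- Hypothetical sample document: one FindMe directly under the root element, one deeper
DECLARE @x XML = N'<RootNode>
  <FindMe>top</FindMe>
  <ChildNodeTwo><FindMe>deep</FindMe></ChildNodeTwo>
</RootNode>';

SELECT
    @x.exist('//FindMe')               AS AnywhereInDocument, -- deep search: 1
    @x.exist('/RootNode/FindMe')       AS DirectChildOfRoot,  -- absolute path: 1
    @x.exist('/FindMe')                AS AsRootElement,      -- FindMe is not the root element: 0
    @x.value('count(//FindMe)', 'int') AS FindMeCount;        -- both occurrences: 2

-- A path without a leading slash is evaluated relative to the context node
-- that .nodes() hands over (here: ChildNodeTwo)
SELECT n.exist('FindMe') AS FindMeBelowChildNodeTwo           -- 1
FROM @x.nodes('/RootNode/ChildNodeTwo') AS T(n);
```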
The following examples show some ways to query for a node name.
A dummy table for tests:
CREATE TABLE dbo.DummyTbl (
id INT IDENTITY,
Remark VARCHAR(100),
data XML
);
INSERT INTO DummyTbl VALUES
('FindMe is on second level',
'<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>Two</ChildNodeTwo>
<ChildNodeThree>Three</ChildNodeThree>
<FindMe>FindMe 1</FindMe>
</RootNode>')
,('FindMe is two times somewhere deeper',
'<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>
<FindMe>FindMe 2a</FindMe>
</ChildNodeTwo>
<ChildNodeThree>
<Deeper>
<FindMe>FindMe 2b</FindMe>
</Deeper>
</ChildNodeThree>
</RootNode>')
,('FindMe does not exist',
'<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>
</ChildNodeTwo>
<ChildNodeThree>
<Deeper>
<FindMeNot>Something else</FindMeNot>
</Deeper>
</ChildNodeThree>
</RootNode>')
,('FindMe exists, but is empty',
'<RootNode>
<ChildNodeOne>One</ChildNodeOne>
<ChildNodeTwo>
</ChildNodeTwo>
<ChildNodeThree>
<Deeper>
<FindMe/>
</Deeper>
</ChildNodeThree>
</RootNode>');
Different queries:
--All FindMe nodes (including the empty one)
SELECT d.id,d.Remark
,f.value('.','nvarchar(max)') FindMeNode
FROM dbo.DummyTbl AS d
CROSS APPLY d.data.nodes('//FindMe') AS A(f)
--All IDs where there is at least one "FindMe"-node
SELECT d.id,d.Remark
FROM dbo.DummyTbl AS d
WHERE d.data.exist('//FindMe')=1
GO
--All IDs where there is at least one "FindMe"-node, but not the empty one
SELECT d.id,d.Remark
FROM dbo.DummyTbl AS d
WHERE d.data.exist('//FindMe/text()')=1
--Find IDs, where there is at least one empty FindMe node
SELECT d.id,d.Remark
FROM dbo.DummyTbl AS d
WHERE d.data.exist('//FindMe[empty(text())]')=1
--Now with a variable name to search for
DECLARE @SearchForName NVARCHAR(100)='FindMe';
SELECT d.id,d.Remark
FROM dbo.DummyTbl AS d
WHERE d.data.exist('//*[local-name()=sql:variable("@SearchForName")]/text()')=1
GO
--Clean up
DROP TABLE dbo.DummyTbl;

Related

Group by based on specific column contains value from list of values

I have one table, myTable:
ID  Content
1   Hello, this is the test content
2   Hi, test content.
I have a list of values: ["Hello","Hi","Yes","content"]
Now I have to find the occurrences of each value in the Content column of myTable. The resulting table should contain each value and its count in that column. One row of myTable can contain more than one of the values, and the search must be case-insensitive.
The output should look like this:
Value    Count
Hello    1
Hi       1
Yes      0
content  2
I want to write an optimal SQL Server query for this.

Assuming you are using SQL Server 2016 or above, you could try converting your list to a table-like structure, then perform a left join against your table and count the matches.
For instance:
CREATE TABLE MyTable (
ID INT CONSTRAINT PK_MyTable PRIMARY KEY,
Content NVARCHAR(MAX)
);
INSERT INTO MyTable (ID,CONTENT) VALUES
(1,'Hello, this is the test content'),
(2,'Hi, test content.');
DECLARE @MyList NVARCHAR(MAX);
SET @MyList='["Hello","Hi","Yes","content"]';
SELECT
List.Value,
COUNT(MyTable.Content) AS [Count]
FROM OPENJSON(@MyList) List --Turn the JSON array into rows, one per value
LEFT JOIN MyTable ON '.' + UPPER(MyTable.Content) + '.' LIKE '%[^a-z]' + UPPER(List.Value) +'[^a-z]%'
GROUP BY List.Value;
You can try it on this fiddle.
Please note that there is room for improvement, such as using a full-text index instead of this ugly LIKE pattern.
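To show what the padded LIKE pattern in the join is doing, here is a small sketch that compares it with a plain substring match. It assumes a case-insensitive collation (the common default); the variable @Content and the word list are made up for illustration.

```sql
-- Hypothetical sample text; the dots added on both sides guarantee that a word at the
-- very start or end of the text still has a non-letter neighbour to match [^a-z]
DECLARE @Content NVARCHAR(100) = N'Hi, test content.';

SELECT
    w.Word,
    CASE WHEN '.' + @Content + '.' LIKE '%[^a-z]' + w.Word + '[^a-z]%'
         THEN 1 ELSE 0 END AS WholeWordMatch,
    CASE WHEN @Content LIKE '%' + w.Word + '%'
         THEN 1 ELSE 0 END AS SubstringMatch
FROM (VALUES (N'Hi'), (N'content'), (N'con')) AS w(Word);
```

Under that collation assumption, 'con' should only show up as a substring match, because it is followed by the letter 't' inside 'content', while 'Hi' and 'content' match as whole words.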
See also:
Search for “whole word match” with SQL Server LIKE pattern

SQL unique PK for grouped data in SP

I am trying to build a temp table with grouped data from multiple tables (in a stored procedure). I have succeeded in building the data set, but I have a requirement that each grouped row has a unique ID. I know there are ways to generate unique IDs for each row; the problem is that I need the ID for a given row to be the same on each run, regardless of the number of rows returned.
Example:
1st run:
ID Column A Column B
1 apple 15
2 orange 10
3 grape 11
2nd run:
ID Column A Column B
3 grape 11
The reason I want this is that I am sending this data up to SOLR, and when I do a delta I need to get the same ID back for the same row, since it is trying to re-index that row.
Is there any way I can do this?

Not sure if this will help, not entirely confident of your wider picture, but ...
As your new data is assembled, log each [column a] value in a table of your own.
Give that table an IDENTITY column to do the numbering for you.
Now you can join any new data sets to your lookup table and you'll have a persistent number for each column A.
You just need to ensure that each time you query new data, you add new values to the lookup table.
create table dbo.myRef(
idx int identity(1,1)
,[A] nvarchar(100)
)
General draft as below ...
--- just simulating some input data here
with cte as (
select 'apple' as [A], 15 as [B]
UNION
select 'orange' as [A], 10 as [B]
UNION
select 'banana' as [A], 4 as [B]
)
select * into #temp from cte;
-- Put any new values into the lookup table
-- and they will be assigned a new index number by the identity column
insert into dbo.myRef([A])
select distinct [A]
from #temp where [A] not in (select [A] from dbo.myRef)
-- now pull your original data for output, joining to the lookup table to get a ref number.
select T.*,R.idx
from #temp T
inner join
dbo.myRef R
on T.[A] = R.[A]

Sorry for the late reply, I was stuck with something else; however, I solved my own issue.
I built two temp tables: one with all the data from the various tables (#master) and another (#final) to house all the grouped data, with an empty column for the ID.
Next I did a concat(column1, '-', column2, '-', column3) on three columns from #master and updated the #final table based on the type.
This gave me the same concatenated IDs on each run.
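A minimal sketch of that idea follows. The poster's actual tables are not shown, so every column name and sample row here is a hypothetical stand-in; only the pattern (group first, then derive the ID from the grouping columns) comes from the description above.

```sql
-- Hypothetical stand-ins for the poster's #master and #final tables
CREATE TABLE #master (Fruit NVARCHAR(50), Region NVARCHAR(50), Grade NVARCHAR(10), Qty INT);
CREATE TABLE #final  (ID NVARCHAR(200) NULL, Fruit NVARCHAR(50), Region NVARCHAR(50),
                      Grade NVARCHAR(10), TotalQty INT);

INSERT INTO #master (Fruit, Region, Grade, Qty) VALUES
    (N'apple', N'north', N'A', 10),
    (N'apple', N'north', N'A', 5),
    (N'grape', N'south', N'B', 11);

-- Group the data first, leaving the ID column empty
INSERT INTO #final (Fruit, Region, Grade, TotalQty)
SELECT Fruit, Region, Grade, SUM(Qty)
FROM #master
GROUP BY Fruit, Region, Grade;

-- Derive the ID from the grouping columns, so the same group gets the same ID on every run
UPDATE #final
SET ID = CONCAT(Fruit, '-', Region, '-', Grade);

SELECT ID, TotalQty FROM #final ORDER BY ID;

DROP TABLE #master;
DROP TABLE #final;
```

Such an ID is only unique if the combination of the three columns is unique and the values cannot contain the separator in a way that makes two different combinations collide.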

How can I insert unique HierarchyId path without having to specify left and right sibling?

I am using HierarchyId in SQL to store the data. I am following the tutorial from here:
http://www.codeproject.com/Tips/740553/Hierarchy-ID-in-SQL-Server
The examples that are mentioned in the sample are explicitly specifying position of the node:
DECLARE @parent HierarchyId = (SELECT Node FROM H WHERE Name = 'Thuru')
DECLARE @jhony HierarchyId = (SELECT Node FROM H WHERE name = 'Johnny')
INSERT INTO H (Node,ID,Name) VALUES (@parent.GetDescendant(@jhony,NULL), 3, 'Robert')
This code tells SQL Server which nodes are the siblings of the new node, which is OK. However, all I want is to insert a node at ANY position in the tree under a PARTICULAR parent, which means I want to be able to use something like:
DECLARE @parent HierarchyId = HierarchyId::GetRoot()
INSERT INTO H (Node,ID,Name) VALUES (@parent.GetDescendant(NULL,NULL),2,'Johnny')
In other words, as long as the node is inserted under the correct parent, we do not care about its horizontal position.
When I tried GetDescendant(NULL,NULL) for multiple inserts for the same parent, it gives the same path /1/ to every child. Why is that?
Also, I came across the following link: https://technet.microsoft.com/en-us/library/bb677212%28v=sql.105%29.aspx. It shows an example of storing the last inserted child for a particular parent and then using it as a reference before doing any further inserts into the DB. Is it the standard method for doing inserts into a table with a hierarchy to get uniqueness in the path?

When I tried GetDescendant(NULL,NULL) for multiple inserts for the same parent, it gives the same path /1/ to every child. Why is that?
A given instance of HierarchyId doesn't keep track of all of the descendants that it has. Indeed, I can do something like the following:
declare @a hierarchyid = '/1/', @b hierarchyid = '/1/1/';
select @b.IsDescendantOf(@a); --should be 1
The thing to note in the example is that I created both @a and @b out of whole cloth (that is, I didn't create @b using the GetDescendant method). The point of the arguments to the GetDescendant method is to tell it where in the list of siblings you'd like to place the new node. If you don't care (and it seems that you don't, based on your comments), the second argument will always be NULL (which is saying "make the new entry the last one in the list in a breadth-first traversal").
All of this has been a long-winded way to say that if you pass NULL for both arguments, it's going to assume that there are currently no descendants under that particular instance of HierarchyId and so the one that you're asking for will be the first. Another way to think about it is that the GetDescendant method is deterministic (that is to say, given the same arguments, it will return the same answer every time).
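A short sketch of that determinism, using nothing but variables (the variable names are made up for illustration):

```sql
DECLARE @parent hierarchyid = hierarchyid::GetRoot();

-- Same arguments, same answer: both calls produce the first child position
DECLARE @first hierarchyid = @parent.GetDescendant(NULL, NULL);
DECLARE @again hierarchyid = @parent.GetDescendant(NULL, NULL);

-- Passing the existing child as the first argument yields the position after it
DECLARE @second hierarchyid = @parent.GetDescendant(@first, NULL);

SELECT @first.ToString()  AS FirstCall,   -- /1/
       @again.ToString()  AS SecondCall,  -- /1/ again, no table is consulted
       @second.ToString() AS AfterFirst;  -- /2/
```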
Is it the standard method for doing inserts into a table with a hierarchy to get uniqueness in the path?
It seems reasonable to me. I think about it this way: I'm going to call GetDescendant with the first argument being the last existing immediate descendant in a breadth-first traversal (possibly NULL if there are no existing descendants) and the second argument of NULL (since I'm just tacking it onto the end).
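As a rough sketch of that approach: look up the current last child of the parent, then ask for a descendant after it. The temp table #H is a hypothetical stand-in for the question's H table, and the loop, the IDs, and the names are invented for the demo. The locking hints are one possible way to keep two sessions from picking the same last child, not the only one.

```sql
-- Hypothetical stand-in for the question's H table
CREATE TABLE #H (Node hierarchyid PRIMARY KEY, ID INT, Name NVARCHAR(50));
INSERT INTO #H (Node, ID, Name) VALUES (hierarchyid::GetRoot(), 1, N'Thuru');

DECLARE @parent hierarchyid = hierarchyid::GetRoot();
DECLARE @lastChild hierarchyid;
DECLARE @i INT = 2;

WHILE @i <= 4
BEGIN
    BEGIN TRANSACTION;

    -- Last existing immediate child of @parent (NULL when there is none yet)
    SELECT @lastChild = MAX(Node)
    FROM #H WITH (UPDLOCK, HOLDLOCK)
    WHERE Node.GetAncestor(1) = @parent;

    -- Tack the new node onto the end of the sibling list
    INSERT INTO #H (Node, ID, Name)
    VALUES (@parent.GetDescendant(@lastChild, NULL), @i, CONCAT(N'Child ', @i));

    COMMIT;
    SET @i += 1;
END

-- Expect the root / followed by /1/, /2/ and /3/
SELECT Node.ToString() AS NodePath, ID, Name FROM #H ORDER BY Node;

DROP TABLE #H;
```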

I don't really like cursors, for all the usual reasons. However, a set-based INSERT INTO..SELECT works fine for IDENTITY columns and SEQUENCES but not for HIERARCHYID. Therefore I have used this approach for multiple inserts into a hierarchy with a common parent. It could be cascaded where you have multiple levels.
-- This would add all employees who are managers from an
-- Employee table (with employee_id and isManager columns)
-- as descendants of an existing root node in an OrgChart table
BEGIN TRAN
DECLARE @root hierarchyid
DECLARE @lastNode hierarchyid
DECLARE @employee_id INT
SELECT @root = hierarchyid::GetRoot() FROM [dbo].[OrgChart]
SELECT @lastNode = NULL -- GetDescendant(NULL, NULL) for the first descendant
-- Have to use a cursor because using set based INSERT INTO..SELECT
-- with hierarchy gives each row the same hierarchyid
DECLARE c CURSOR FOR
    SELECT employee_id
    FROM [dbo].[Employees]
    WHERE [isManager] = 1
OPEN c
FETCH NEXT FROM c INTO @employee_id
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO [dbo].[OrgChart](
        orgID,
        [effective_start_date],
        effective_end_date,
        employee_id
    )
    SELECT
        @root.GetDescendant(@lastNode, NULL),
        GETDATE(),
        NULL,
        @employee_id
    -- Get the hierarchyid you have just added (the last child of the root)
    -- so you can add the next one after it
    SELECT @lastNode = MAX(orgID)
    FROM [dbo].[OrgChart]
    WHERE orgID.GetAncestor(1) = @root
    FETCH NEXT FROM c INTO @employee_id
END
CLOSE c
DEALLOCATE c
COMMIT

Sql Server - Capture an XML query and save it to a table?

I would like to run an (extensive) query that produces one line of XML. This XML represents about 12 tables' worth of relational data. When I "delete" a row from the top level table I would like to "capture" the state of the data at that moment (in XML), save it to an archive table, then delete all the child table data and finally mark the top level table/row as "isDeleted=1". We have other data hanging off of the parent table that CANNOT be deleted and cannot lose the relation to the top table.
I have most of the XML worked out - see my post here on that can of worms.
Now, how can I capture that into an archive table?
CREATE TABLE CampaignArchive(CampaignID int, XmlData XML)
INSERT INTO CampaignArchive(CampaignID, XmlData)
SELECT CampaignID, (really long list of columns) FROM (all my tables) FOR XML PATH ...
This just does not work. :)
Any suggestions?
TIA

You need a subquery to wrap all that XML creation into one single scalar that goes into column XmlData. Again, use TYPE to create a scalar of type XML not a string:
INSERT INTO CampaignArchive(CampaignID, XmlData)
SELECT CampaignID, (
select (really long list of columns)
FROM (all my tables) FOR XML PATH ..., type);
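For a self-contained picture of that pattern, here is a small sketch with table variables. The real schema is not shown in the question, so the tables @Campaign, @Item and @Archive and all their columns are hypothetical; the point is only that the correlated subquery with FOR XML ..., TYPE yields one XML value per outer row.

```sql
-- Hypothetical miniature schema: one parent table, one child table, one archive table
DECLARE @Campaign TABLE (CampaignID INT, Name NVARCHAR(50));
DECLARE @Item     TABLE (CampaignID INT, ItemName NVARCHAR(50));
DECLARE @Archive  TABLE (CampaignID INT, XmlData XML);

INSERT INTO @Campaign VALUES (1, N'Spring Sale'), (2, N'Autumn Sale');
INSERT INTO @Item     VALUES (1, N'Banner'), (1, N'Flyer'), (2, N'Poster');

INSERT INTO @Archive (CampaignID, XmlData)
SELECT c.CampaignID,
       (   -- The whole subquery collapses into a single XML scalar for this campaign
           SELECT c.Name AS [Name],
                  (   -- Nested child rows, also typed as XML so they are not escaped
                      SELECT i.ItemName AS [ItemName]
                      FROM @Item AS i
                      WHERE i.CampaignID = c.CampaignID
                      FOR XML PATH('Item'), TYPE
                  ) AS [Items]
           FOR XML PATH('Campaign'), TYPE
       )
FROM @Campaign AS c;

SELECT CampaignID, XmlData FROM @Archive;
```

Without TYPE on the inner query, its result would be treated as a string and embedded as escaped text rather than as nested elements.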

Ordering SQL Results based on Input Params

In conjunction with the fn_split function, I'm returning a list of results from a table based on comma separated values.
The Stored Procedure T-SQL is as follows:
SELECT ProductCode, ProductDesc, ImageAsset, PriceEuros, PriceGBP, PriceDollars,
replace([FileName],' ','_') as [filename],
ID as FileID, weight
from Products
LEFT OUTER JOIN Assets on Assets.ID = Products.ImageAsset
where ProductCode COLLATE DATABASE_DEFAULT IN
(select [value] from fn_split(@txt,','))
and showOnWeb = 1
I pass the following into the @txt param (as an example):
ABC001,ABC009,ABC098,ABC877,ABC723
This all works fine, however the results are not returned in any particular order - I need the products returning in the 'SAME ORDER' as the input param.
Unfortunately this is a live site with a built schema, so I can't change anything on it (but I wish I could) - otherwise I would make it more sensible.

If all of the references that are passed in the @txt param are unique, you could use CharIndex to find their position within the param, e.g.:
order by charindex(ProductCode, @txt)
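One caveat beyond uniqueness: if one code is a prefix or substring of another, the plain CHARINDEX lookup can find it inside the longer code. A variation (my suggestion, not part of the answer above) is to wrap both the code and the list in the delimiter. The sketch below uses a made-up list and an inline VALUES table instead of the real Products table.

```sql
-- Hypothetical input where ABC001 is a prefix of ABC0011
DECLARE @txt VARCHAR(200) = 'ABC0011,ABC001,ABC098';

SELECT p.ProductCode,
       CHARINDEX(p.ProductCode, @txt)                         AS NaivePosition,
       CHARINDEX(',' + p.ProductCode + ',', ',' + @txt + ',') AS DelimitedPosition
FROM (VALUES ('ABC001'), ('ABC0011'), ('ABC098')) AS p(ProductCode)
ORDER BY DelimitedPosition;
-- NaivePosition is 1 for both ABC001 and ABC0011 (a tie),
-- while DelimitedPosition gives 1, 9 and 16 in the order of the input list
```

In the original query the same ORDER BY expression would go at the end of the statement; depending on the column's collation it may need the same COLLATE DATABASE_DEFAULT clause as the IN filter.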

In the stored procedure, I would create a table which has a numeric key which is the PK for the temp table and set to auto-increment. Then, I would insert the results of fn_split into that table, so that you have the parameters as they are ordered in the list (the auto-increment will take care of that).
Then, join the result set to this temp table, ordering by the numeric key.
If the entries in the parameter list are not unique, then after inserting the records into the temp table, I would delete any duplicates, except for the record where the id is the minimum for that particular parameter.
