Does the `nodes()` method keep the document order? - sql-server

Does the nodes() method of the xml data type return nodes in document order?
For example, if there are data like:
declare @xml xml;
set @xml = '<Fruits><Apple /><Banana /><Orange /><Pear /></Fruits>';
which is queried as
select T.c.query('.')
from @xml.nodes('/Fruits/*') T(c);
will elements be returned in document order? Order of rows returned by select is known to be undefined if order by clause is omitted. Is it the case for select ... from ... .nodes(), or is it exceptional?

Yes, nodes() generates a row set in document order. The operator used in the query plan to do this is the Table Valued Function XML Reader.
Table-valued Function XML Reader inputs an XML BLOB as a parameter and
produces a row set representing XML nodes in XML document order. Other
input parameters may restrict XML nodes returned to a subset of XML
document.
But a query without order by has an undefined order so there are no guarantees.
One way to work around that is to use the id generated by the table valued function in row_number() over() clause and use the generated number in the order by.
select X.q
from
(
select T.c.query('.') as q,
row_number() over(order by T.c) as rn
from @xml.nodes('/Fruits/*') T(c)
) as X
order by X.rn
It is not possible to use T.c in an order by directly. Trying that will give you
Msg 493, Level 16, State 1, Line 19
The column 'c' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.
The error message does not mention that the column can also be used with row_number(), but it can. That could very well be a bug that might get fixed, in which case the code above will fail. But up until SQL Server 2012 it works just fine.
A way to get a guaranteed order without relying on the undocumented use of row_number would be to use a table of numbers where you extract the nodes by position.
select T.c.query('.') as q
from Numbers as N
cross apply @xml.nodes('/Fruits/*[sql:column("N.Number")]') as T(c)
where N.Number between 1 and @xml.value('count(/Fruits/*)', 'int')
order by N.Number
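The Numbers table used above is not defined in the answer. A minimal sketch of one way to create it, assuming a table named Numbers with a single int column Number, and assuming 1000 rows are enough to cover the largest node count you expect (the row limit and the use of sys.all_objects as a row source are both assumptions):

```sql
-- Hypothetical helper table; the name and column match the query above
create table Numbers (Number int not null primary key);

-- Fill it with 1..1000; sys.all_objects is only used as a convenient row source
insert into Numbers (Number)
select top (1000) row_number() over (order by (select null))
from sys.all_objects;
```

Any other way of producing a gap-free sequence starting at 1 should work just as well.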

Related

Is there a way to pass a variable as a singleton when querying XML?

First time I'm working with XML, and seem to have trouble passing a variable to the singleton of XML query. The error that shows when I execute the query is "The argument 1 of the XML data type method "value" must be a string literal"
I have set the variable to data type varchar. What am I missing here?
----Inside LOOP----
SELECT @count = @count + 1
select
a.XMLData.value('(//FlightInfo/TailNumber/text())['''+ @count +''']', 'varchar(100)') TaiNumber
from
XMLwithOpenXML a
Is there an easier way to go through XML nodes to amend those nodes that have incorrect/missing data?
I'm trying to traverse the "TailNumber" nodes, and then using some kind of join or where clause to update the node value with the correct TailNumber
AFAIK, SQL Server still doesn't support the position() function as an output, only as a filtering criterion. As such, there is no easy way to number the nodes in their natural order. All solutions I've seen calculate the position of each node by counting all preceding nodes, producing essentially a semi-cartesian which kills all hopes for performance on a decently-sized XML.
You can, however, use a rather dirty trick which I employed some 8 years ago. You can generate a monotonic sequence of numbers and use them to find the nodes in the corresponding positions. The code below illustrates the approach:
-- Sample data
declare @t table (
Id int identity(1,1) primary key,
XMLData xml not null
);
insert into @t (XMLData)
values (N'<r>
<Item>First Value</Item>
<Item>Second</Item>
</r>'),
(N'<r>
<Item>Another One</Item>
<Item>123456</Item>
<Item>The last</Item>
</r>');
declare @NodeCount int;
-- Get the largest amount of nodes among the rows of interest
select @NodeCount = max(t.XMLData.value('count(/r/Item)', 'int'))
from @t t;
select t.Id, n.RN, i.c.value('./text()[1]', 'varchar(100)') as [NodeValue]
from @t t
cross join (
-- Generate a number sequence for each /r/Item node in every row
select top (@NodeCount) row_number() over(order by (select null)) as [RN]
from sys.all_objects
) n
-- Get N-th node
cross apply t.XMLData.nodes('/r/Item[position() = sql:column("n.RN")]') i(c)
order by t.Id, n.RN;
Just be careful about running it against a table with several million rows and / or big XML blobs. It runs pretty neat in this example, but I have no idea how efficient it will be in real-world cases. If your XMLs are big, or you have plenty of rows to query, a better solution might be to pre-process the XML by adding node numbers as attributes using a language other than SQL. A CLR function, for example, might be much better for this.
P.S. Still, should be faster than a loop...
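As for the error in the question itself: value() requires its XQuery argument to be a string literal, so the variable cannot be concatenated into it. A sketch of one way around that, reading the variable from inside the XQuery with sql:variable(). The table variable @t and its sample data are made up for illustration, and the path would need adapting to the real FlightInfo/TailNumber structure:

```sql
-- Sample data, assumed for illustration only
declare @t table (Id int identity(1,1) primary key, XMLData xml not null);
insert into @t (XMLData)
values (N'<r><Item>First Value</Item><Item>Second</Item></r>');

declare @count int = 2;

-- The XQuery stays a string literal; the variable is read via sql:variable().
-- The trailing [1] is there because value() needs a statically known singleton.
select t.XMLData.value('(/r/Item[position() = sql:variable("@count")]/text())[1]', 'varchar(100)') as NodeValue
from @t t;
```

This still picks one node per call, so for walking all nodes the set-based query above remains the better fit.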

T-SQL equivalent of IEnumerable.Zip()

In a T-SQL stored procedure, when supplied with two tables each of which has the same number of rows, how can I pair-wise match the rows based on row order rather than a join criteria?
Basically, an equivalent of .NET's IEnumerable.Zip() method?
I'm using SQL Server 2016.
Background
The purpose of the stored procedure is to act as an integration adapter between two other applications. I do not control the source code for either application.
The "client" application contains extensibility objects which can be configured to invoke a stored procedure in an SQL Server database. The configuration options for the extensibility point allow me to name a stored procedure which will be invoked, and provide a statically configured list of named parameters and their associated values, which will be passed to the stored procedure. Only scalar parameters are supported, not table-valued parameters.
The stored procedure needs to collect data from the "server" application (which is exposed through an OLE-DB provider) and transform it into a suitable result set for consumption by the client application.
For maintenance reasons, I want to avoid storing any configuration in the adapter database. I want to write generic, flexible logic in the stored procedure, and pass all necessary configuration information as parameters to that stored procedure.
The configuration information that's needed for the stored procedure is, essentially, equivalent to the following table variable schema:
DECLARE @TableOfServerQueryParameterValues AS TABLE (
tag NVARCHAR(50),
filterexpr NVARCHAR(500)
)
This table can then be used as the left-hand side of JOIN and CROSS APPLY queries in the stored proc which are run against the "server" application interfaces.
The problem I encountered is that I did not know of any way of passing a table of parameter info from the client application, because its extensibility points only include scalar parameter support.
So, I thought I would pass two scalar parameters. One would be a comma-separated list of tag values. The other would be a comma-separated list of filterexpr values.
Inside the stored proc, it's easy to use STRING_SPLIT to convert each of those parameters into a single-column table. But then I needed to match the two columns together into a two-column table, which I could then use as the basis for INNER JOIN or CROSS APPLY to query the server application.
The best solution I've come up with so far is selecting each table into a table variable, using the ROW_NUMBER() function to assign a row number, and then joining the two tables together by matching on the extra ROW_NUMBER column. Is there an easier way to do it than that? It would be nice not to have to declare all the columns in the table variables.
Your suggestion of using row_number seems sound.
Instead of table variables you can use subqueries or CTEs; there should be little difference overall, though avoiding the table variable reduces the number of passes you need to make & avoids the additional code to maintain.
select a.*, b.* --specify whatever columns you want to return
from (
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableA
) a
full outer join --use a full outer join if you have different numbers of rows in the tables & want
--results from the larger table with nulls from the smaller for the bonus rows
--otherwise use an inner join to only get matches for both tables
(
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableB
) b
on b.r = a.r
Update
Regarding @PanagiotisKanavos's comment on passing structured data, here's a simple example of how you could convert a value passed as an xml type to table data:
declare @tableA xml = '<TableA>
<row><col1>x</col1><col2>Anne</col2><col3>Droid</col3></row>
<row><col1>y</col1><col2>Si</col2><col3>Borg</col3></row>
<row><col1>z</col1><col2>Roe</col2><col3>Bott</col3></row>
</TableA>'
select row_number() over (order by aRow) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
You may get a performance boost over the above by using the following. This creates a dummy column allowing us to do an order by where we don't care about the order. This should be faster than the above as ordering by 1 will be simpler than sorting based on the xml type.
select row_number() over (order by ignoreMe) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
cross join (select 1) a(ignoreMe)
If you do care about the order, you can order by the data's fields, as such:
select row_number() over (order by x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') ) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)

Is it possible to avoid subquery in select when concatenating columns?

I have a "main" table containing an id (plus some other columns) and an aka table which joins to it by the [main id] column to main.id. The following query returns some columns from main along with a column of concatenated comma-separated "lastName"s from aka:
SELECT m.id, m.name,
(SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) [akas]
FROM main m
This works fine, but I'd like to know if there is a way to avoid doing this in a subquery?
Using CROSS APPLY you could move subquery from SELECT list:
SELECT m.id, m.name,
(SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) [akas]
FROM main m;
to:
SELECT m.id, m.name, s.akas
FROM main m
CROSS APPLY (SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) AS s(akas)
Notes:
You could refer to s.akas multiple times
You could add WHERE s.akas ...
A long subquery in the SELECT list could be less readable
If it is possible that the correlated subquery returns no rows, you need to use OUTER APPLY instead.
Generally speaking, there is nothing wrong with a sub-query from a technical point of view...
You might prefer an APPLY due to readability or multi reference.
Whenever you put a sub-query directly into the column's list like here:
SELECT Column1
,Column2
,(SELECT x FROM y) AS Column3
,[...]
... this sub-select must deliver
just one column
of just one row
Using FOR XML PATH(''),TYPE lets the result be one single value of type XML. This makes it possible to return many rows/columns "as one". Without the ,TYPE it will be the XML "as text". The concatenation trick with XML is possible due to a speciality of the generation of XML with empty tag names and return "as text". But in any case: The returned value will be just one bit of information, therefore fitting into a column list.
Whenever you expect more than one row, you'd have to force this to be one bit of data (like - often seen! - SELECT TOP 1 x FROM y ORDER BY SomeSortKey, which brings back the first or the last or ...)
Any other attempt to get 1:n data needs a JOIN or an APPLY. With scalar data, as in your case, there will actually be no difference whether you use a sub-select or an APPLY.
Since you have an arbitrary number of records combining for the final string, what you have is the best option to do this in SQL. Generally, you are expected to return one row per item, and if you want a CSV string then build that in your client code.

Sql Server Xml Value method -- Distinct Values for one column

I have the below query I am trying to return distinct value from the second .value method.
Here is what I have tried. I tried adding 'distinct-values(.)' to return only distinct but it is still returning the same results as a normal '.' How can I select distinct values from just one column?
;WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
SELECT
a.value('.', 'NVARCHAR(50)') AS Visitor
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
Here is the normal result
Here is what I am trying to get
Xml Like this
<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>
The problem comes from using cross apply with nodes() twice. Instead of relying on nodes() with a cross apply, you could use the query() method followed by value() to show what is in the XML directly. What you have not shown is where that Id comes from. Are you determining it at run time from the XML itself, joining to yet another table, or reading it from a part of the XML that is not shown? In essence, the first nodes() call is cross applied and says: "I have two values in that node hierarchy, here they are." Then you cross apply a different node, and it returns the same thing twice. You must be careful about exactly what cross apply is doing when you use it twice. I can show the difference, but without knowing how you relate back to 1 (are you just hunting for the int after "visitor" somehow?) I don't know how to represent exactly what you want.
EDIT: Okay it is what I thought then. Now my code may be longer than some and I will admit there may be an easier way to do this however I would do three things:
Keep your cross apply with nodes because nodes is useful in that it will repeat rows you need to count on. However I would add an artificial flag for the name you use for the node. Then I would union together two select statements using the nodes.
I would then use a nested select as a from statement and then determine row number with a windowed function based on the flags I just set.
I would then nest that again and then use the very same row number as the Id of the row number and then I would do some syntactic pivoting based on a max(case when) based on the flags I arbitrarily set.
I usually prefer cte's but since your XML namespace has a 'with' beginning and the first cte does as well I forgot how the syntax is to work around that. Nested Selects IMHO can get hairy when there are multiple so I choose CTE's usually but in this case I did a nested select inside of another nested select. I hope this helps:
declare @xml xml = '<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>'
;
declare @XmlTable table ( xmlDocument xml);
insert into @XmlTable values (@xml);
WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
select
pos as Id
, max(case when Listing = 'Visitors' then Value end) as Visitors
, max(case when Listing = 'Senders' then Value end) as Senders
from
(
select
*
, row_number() over(partition by Listing order by Value) as pos
from
(
SELECT
'Visitors' as Listing
, a.value('.', 'NVARCHAR(50)') AS Value
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
union
SELECT
'Senders'
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
) as u
) as listing
group by pos

Questions About Sort Order Using XQuery in SQL Server

Is the result of a SELECT like the one below using XQuery in SQL Server guaranteed to be in document order, that is in the original order the nodes are listed in the xml string?
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
I ask because in general SELECT statements do not guarantee an order unless one is provided.
Also, if this order is (or is not) guaranteed, is that documented somewhere officially, maybe on Microsoft's web site?
Last, is it possible to sort opposite of document order or do other strange sorts and queries based on the original document order from within the SELECT statement?
The
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
is an example of an XML "sequence". By definition, without a sort order, the select should always come back in the document's original order.
As already mentioned, you can change the sort order. Here is one example:
SELECT t.i.value('.', 'int') as x, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rn
FROM @x.nodes('/hey/@i') t(i)
order by rn desc
Here is some information about sequences on the Microsoft site:
http://msdn.microsoft.com/en-us/library/ms179215(v=sql.105).aspx
Here is a more general discussion of a sequence in Xquery.
http://en.wikibooks.org/wiki/XQuery/Sequences
I realize that my original answer above is incorrect after reading the page on the Microsoft site I referred to above. That page says that you need a comma between elements to construct a sequence. The example given is not a "sequence". However, my original info about changing the sort order stands :)
I think that the same rules of SELECT apply here, no matter whether you're selecting from an ordinary table or from XML. There's a selection part and a projection part, and the engine can take different paths to retrieve your data (from the middle sideways, for example). Unfortunately I can't find any official document to support that.
And for sure, there's no intrinsic table/document order that you can access and manipulate.
You can add order by clause to the select statement.
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT [column1] = t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
order by column1 desc
I know what you mean about this. I suspect the order would be document order, but the documentation does not make it clear, and relying on it implicitly just isn't nice.
One way to be confident about the order would be:
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>';
select @x.value('(/hey[sql:column("id")]/@i)[1]', 'int'), id
from (
select row_number() over (order by (select 0)) as id
from @x.nodes('/hey') t(i)
) t
order by id
This would then give you a way to answer your other question, i.e. getting the values in reverse, or some other, order.
N.B. This is going to be much slower than just using nodes as the size of your XML increases.
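For instance, a sketch of the reverse-of-document-order case, assuming the same sample XML as above. Only the final order by changes. Each value is looked up by its position number, so the result does not depend on the order in which nodes() happens to return rows:

```sql
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>';

-- id is just 1..n; the value is fetched by position, then sorted descending
select @x.value('(/hey[sql:column("id")]/@i)[1]', 'int') as i, id
from (
    select row_number() over (order by (select 0)) as id
    from @x.nodes('/hey') t(i)
) t
order by id desc;
```

The same performance caveat applies, since every row triggers its own positional lookup into the XML.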

Resources