Questions About Sort Order Using XQuery in SQL Server - sql-server

Is the result of a SELECT like the one below using XQuery in SQL Server guaranteed to be in document order, that is in the original order the nodes are listed in the xml string?
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
I ask because in general SELECT statements do not guarantee an order unless one is provided.
Also, if this order is (or is not) guaranteed, is that documented somewhere officially, maybe on Microsoft's web site?
Last, is it possible to sort opposite of document order or do other strange sorts and queries based on the original document order from within the SELECT statement?

The
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
is an example of an XML "sequence". By definition, without an explicit sort order, the SELECT should always come back in the document's original order.
As already mentioned, you can change the sort order. Here is one example:
SELECT t.i.value('.', 'int') as x, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rn
FROM @x.nodes('/hey/@i') t(i)
order by rn desc
Here is some information about sequences on the Microsoft site:
http://msdn.microsoft.com/en-us/library/ms179215(v=sql.105).aspx
Here is a more general discussion of sequences in XQuery:
http://en.wikibooks.org/wiki/XQuery/Sequences
I realize that my original answer above is incorrect after reading the page on the Microsoft site I referred to above. That page says that you need a comma between elements to construct a sequence. The example given is not a "sequence". However, my original info about changing the sort order stands :)

I think that the same rules of SELECT apply here, no matter whether you're selecting from an ordinary table or from XML. There's a selection part and a projection part, and the engine can take different paths to retrieve your data (from the middle sideways, for example). Unfortunately I can't find any official document to support that.
And for sure, there's no intrinsic table/document order that you can access and manipulate.

You can add order by clause to the select statement.
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT [column1] = t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
order by column1 desc

I know what you mean about this. I suspect the order would be document order, but the documentation does not make it clear, and relying on it implicitly just isn't nice.
One way to be confident about the order would be:
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>';
select @x.value('(/hey[sql:column("id")]/@i)[1]', 'int'), id
from (
select row_number() over (order by (select 0)) as id
from @x.nodes('/hey') t(i)
) t
order by id
This would then give you a way to answer your other question, i.e. getting the values in reverse, or some other, order.
N.B. This is going to be much slower than just using nodes as the size of your XML increases.

Related

Is there a way to pass a variable as a singleton when querying XML?

First time I'm working with XML, and seem to have trouble passing a variable to the singleton of XML query. The error that shows when I execute the query is "The argument 1 of the XML data type method "value" must be a string literal"
I have set the variable to data type varchar. What am I missing here?
----Inside LOOP----
SELECT @count = @count + 1
select
a.XMLData.value('(//FlightInfo/TailNumber/text())['''+ @count +''']', 'varchar(100)') TaiNumber
from
XMLwithOpenXML a
Is there an easier way to go through XML nodes to amend those nodes that have incorrect/missing data?
I'm trying to traverse the "TailNumber" nodes, and then using some kind of join or where clause to update the node value with the correct TailNumber
AFAIK, SQL Server still doesn't support the position() function as an output, only as a filtering criterion. As such, there is no easy way to number the nodes in their natural order. All solutions I've seen calculate the position of each node by counting all preceding nodes, which essentially produces a semi-Cartesian product and kills all hope of good performance on a decently sized XML.
You can, however, use a rather dirty trick which I employed some 8 years ago. You can generate a monotonic sequence of numbers and use them to find nodes in the corresponding positions. The code below illustrates the approach:
-- Sample data
declare @t table (
Id int identity(1,1) primary key,
XMLData xml not null
);
insert into @t (XMLData)
values (N'<r>
<Item>First Value</Item>
<Item>Second</Item>
</r>'),
(N'<r>
<Item>Another One</Item>
<Item>123456</Item>
<Item>The last</Item>
</r>');
declare @NodeCount int;
-- Get the largest number of nodes among the rows of interest
select @NodeCount = max(t.XMLData.value('count(/r/Item)', 'int'))
from @t t;
select t.Id, n.RN, i.c.value('./text()[1]', 'varchar(100)') as [NodeValue]
from @t t
cross join (
-- Generate a number sequence for each /r/Item node in every row
select top (@NodeCount) row_number() over(order by (select null)) as [RN]
from sys.all_objects
) n
-- Get N-th node
cross apply t.XMLData.nodes('/r/Item[position() = sql:column("n.RN")]') i(c)
order by t.Id, n.RN;
Just be careful about running it against a table with several million rows and / or big XML blobs. It runs pretty neat in this example, but I have no idea how efficient it will be in real-world cases. If your XMLs are big, or you have plenty of rows to query, a better solution might be to pre-process the XML by adding node numbers as attributes using a language other than SQL. A CLR function, for example, might be much better for this.
P.S. Still, should be faster than a loop...
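On the original question of passing a variable into the XQuery expression: the value() method does require a string literal, but the XQuery extension function sql:variable() can reference a T-SQL variable from inside that literal. A minimal sketch, assuming an int variable and the same /r/Item sample shape used above (the names @x and @n are made up for illustration):

```sql
-- Hypothetical sample document, same shape as the sample data above
declare @x xml = N'<r><Item>First Value</Item><Item>Second</Item></r>';

-- Position of the node to read; an int so it can be compared with position()
declare @n int = 2;

-- The XQuery string stays a literal; the variable is pulled in by sql:variable()
select @x.value('(/r/Item[position() = sql:variable("@n")]/text())[1]', 'varchar(100)') as NodeValue;
```

Inside a loop only @n changes between iterations, so no string concatenation is needed.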

Alternative to ROW_NUMBER() to get row position?

I have a dynamically generated query with a potentially complex ORDER BY clause. I need to retrieve the row number into a column for further processing. All the documentation I've been able to find points me to ROW_NUMBER() but, unless I'm missing something, I need to rewrite the query to move the ORDER BY clause from this:
SELECT ...
FROM ...
JOIN ...
WHERE ...
ORDER BY ...
... to this:
SELECT ..., ROW_NUMBER() OVER(ORDER BY ...) AS RN
FROM ...
JOIN ...
WHERE ...
I can certainly do that but it involves tweaking some convoluted code that's shared by other modules that do not need this.
Is there a variable or function that just retrieves the row position in the current result set?
Another approach I've seen people use is the following:
IF (OBJECT_ID('tempdb..#tempresult') IS NOT NULL)
DROP TABLE #tempresult;
CREATE TABLE #tempresult (
idx INT IDENTITY(1,1),
...
);
INSERT #tempresult ...
SELECT ...
FROM ...
JOIN ...
WHERE ...
ORDER BY ...
idx is the row position we are looking for.
However, I'm not sure whether this performs better; it depends on your case.
The temp table could be replaced with a table variable if necessary, and a PRIMARY KEY on idx could be added.
Generally I would go for ROW_NUMBER(), as it is the better option overall.
You can do it like this:
select *, row_number() over(order by (select null))
from MyTable
order by Col1
The ORDER BY in the OVER clause is separate from the ORDER BY clause for the query.
order by (select null) gets round the problem of having to supply a column for ordering the row_number() function.
If you have concerns about performance, you should do some testing for your situation and post another question if there is a problem.
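One caveat worth illustrating: with order by (select null) the numbers are assigned in whatever order the engine happens to produce the rows, so they are not guaranteed to line up with the final ORDER BY. If the row number must match the output position, a sketch like the following repeats the sort key inside OVER. The table variable and its columns are hypothetical sample data:

```sql
-- Hypothetical sample data; Col1 is unique so the numbering is deterministic
declare @MyTable table (Col1 int, Col2 varchar(10));
insert into @MyTable (Col1, Col2) values (30, 'c'), (10, 'a'), (20, 'b');

-- The same key appears in OVER(...) and in the query's ORDER BY,
-- so RN follows the order in which the rows are returned
select Col1, Col2,
       row_number() over (order by Col1) as RN
from @MyTable
order by Col1;
```

If the sort key is not unique, adding a tie-breaking column to both ORDER BY lists keeps the numbering stable.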
For the record, this precise feature does not seem to be implemented in SQL Server at the time of writing, not even in the latest versions. You need to use other techniques like ROW_NUMBER() or the temporary table trick explained in the accepted answer.
(Oracle, for instance, has the ROWNUM pseudo-column.)

Does the `nodes()` method keep the document order?

Does the nodes() method of the xml data type return nodes in document order?
For example, if there are data like:
declare @xml xml;
set @xml = '<Fruits><Apple /><Banana /><Orange /><Pear /></Fruits>';
which is queried as
select T.c.query('.')
from @xml.nodes('/Fruits/*') T(c);
will elements be returned in document order? Order of rows returned by select is known to be undefined if order by clause is omitted. Is it the case for select ... from ... .nodes(), or is it exceptional?
Yes, nodes() generates a row set in document order. The operator used in the query plan to do this is the Table Valued Function XML Reader.
Table-valued Function XML Reader inputs an XML BLOB as a parameter and
produces a row set representing XML nodes in XML document order. Other
input parameters may restrict XML nodes returned to a subset of XML
document.
But a query without order by has an undefined order so there are no guarantees.
One way to work around that is to use the id generated by the table valued function in row_number() over() clause and use the generated number in the order by.
select X.q
from
(
select T.c.query('.') as q,
row_number() over(order by T.c) as rn
from @xml.nodes('/Fruits/*') T(c)
) as X
order by X.rn
It is not possible to use T.c in an order by directly. Trying that will give you
Msg 493, Level 16, State 1, Line 19
The column 'c' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.
The error message does not mention that T.c should work with row_number(), but it does. That could very well be a bug that gets fixed, in which case the code above will fail. But up until SQL Server 2012 it works just fine.
A way to get a guaranteed order without relying on the undocumented use of row_number would be to use a table of numbers where you extract the nodes by position.
select T.c.query('.') as q
from Numbers as N
cross apply @xml.nodes('/Fruits/*[sql:column("N.Number")]') as T(c)
where N.Number between 1 and @xml.value('count(/Fruits/*)', 'int')
order by N.Number
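The query above assumes that a Numbers table already exists. A self-contained sketch of the same idea follows, using a table variable as a stand-in for that table (the name @Numbers and the limit of 100 rows are assumptions for illustration). Because the order comes from an ordinary integer column, reversing document order is just a matter of sorting descending:

```sql
declare @xml xml = '<Fruits><Apple /><Banana /><Orange /><Pear /></Fruits>';

-- Hypothetical stand-in for a permanent numbers table
declare @Numbers table (Number int primary key);

insert into @Numbers (Number)
select top (100) row_number() over (order by (select null))
from sys.all_objects;

-- Fetch each child of Fruits by its position, then sort on that position
select N.Number, T.c.query('.') as q
from @Numbers as N
cross apply @xml.nodes('/Fruits/*[sql:column("N.Number")]') as T(c)
where N.Number between 1 and @xml.value('count(/Fruits/*)', 'int')
order by N.Number desc;  -- descending gives reverse document order
```

As noted for the other position-based approaches, this is likely to be slower than a plain nodes() call as the XML grows.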

Sql Server Xml Value method -- Distinct Values for one column

I have the below query I am trying to return distinct value from the second .value method.
Here is what I have tried. I tried adding 'distinct-values(.)' to return only distinct but it is still returning the same results as a normal '.' How can I select distinct values from just one column?
;WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
SELECT
a.value('.', 'NVARCHAR(50)') AS Visitor
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
Here is the normal result
Here is what I am trying to get
The XML looks like this
<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>
The problem is the cross apply with nodes() listed twice. Instead of relying on nodes() with a cross apply, use the query() method followed by the value() method to show what is in the XML directly. However, you are not showing where you get that Id from. Are you determining it at run time from the XML itself, joining to yet another table, or is there another part of the XML that is not shown? In essence, the first nodes() call is cross applying and saying: "I have two values in that node hierarchy, here they are." Then you cross apply a different node, and it returns the same thing twice. You must be careful about exactly what cross apply is doing when you use it twice. I can show the difference, but without knowing how you relate back to 1 (are you just hunting for the int after "Visitor"?) I don't know how to represent exactly what you want.
EDIT: Okay it is what I thought then. Now my code may be longer than some and I will admit there may be an easier way to do this however I would do three things:
Keep your cross apply with nodes because nodes is useful in that it will repeat rows you need to count on. However I would add an artificial flag for the name you use for the node. Then I would union together two select statements using the nodes.
I would then use a nested select as a from statement and then determine row number with a windowed function based on the flags I just set.
I would then nest that again and then use the very same row number as the Id of the row number and then I would do some syntactic pivoting based on a max(case when) based on the flags I arbitrarily set.
I usually prefer CTEs, but since your XML namespace declaration begins with WITH and the first CTE does as well, I forgot the syntax to work around that. Nested selects IMHO can get hairy when there are several of them, so I usually choose CTEs, but in this case I did a nested select inside another nested select. I hope this helps:
declare @xml xml = '<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>'
;
declare @XmlTable table ( xmlDocument xml);
insert into @XmlTable values (@xml);
WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
select
pos as Id
, max(case when Listing = 'Visitors' then Value end) as Visitors
, max(case when Listing = 'Senders' then Value end) as Senders
from
(
select
*
, row_number() over(partition by Listing order by Value) as pos
from
(
SELECT
'Visitors' as Listing
, a.value('.', 'NVARCHAR(50)') AS Value
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
union
SELECT
'Senders'
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
) as u
) as listing
group by pos

Order Of Execution of the SQL query

I am confused about the order of execution of this query. Please explain it to me.
I am confused about when the join is applied, when the function is called, when the new column is added by the CASE, and when the serial number is added. Please explain the order of execution of all of this.
select Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
EP.FirstName,Ep.LastName,[dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole) as RoleName,
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate,
(CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
where EP.EventId = 13
and userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
This is the function which is called from the above query.
CREATE function [dbo].[GetBookingRoleName]
(
@UserId as integer,
@BookingId as integer
)
RETURNS varchar(20)
as
begin
declare @RoleName varchar(20)
if @BookingId = -1
Select Top 1 @RoleName=R.RoleName From UserRoles UR inner join Roles R on UR.RoleId=R.RoleId Where UR.UserId=@UserId and R.RoleId not in(1,2)
else
Select @RoleName= RoleName From Roles where RoleId = @BookingId
return @RoleName
end
Queries are generally processed in the following order (SQL Server). I have no idea if other RDBMSs do it this way.
FROM [MyTable]
ON [MyCondition]
JOIN [MyJoinedTable]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
SQL is a declarative language. The result of a query must be what you would get if you evaluated as follows (from Microsoft):
Logical Processing Order of the SELECT statement
The following steps show the logical
processing order, or binding order,
for a SELECT statement. This order
determines when the objects defined in
one step are made available to the
clauses in subsequent steps. For
example, if the query processor can
bind to (access) the tables or views
defined in the FROM clause, these
objects and their columns are made
available to all subsequent steps.
Conversely, because the SELECT clause
is step 8, any column aliases or
derived columns defined in that clause
cannot be referenced by preceding
clauses. However, they can be
referenced by subsequent clauses such
as the ORDER BY clause. Note that the
actual physical execution of the
statement is determined by the query
processor and the order may vary from
this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
The optimizer is free to choose any order it feels appropriate to produce the best execution time. Given any SQL query, it is basically impossible for anybody to claim to know the execution order. If you add detailed information about the schema involved (exact table and index definitions) and the estimated cardinalities (size of data and selectivity of keys), then one can take a guess at the probable execution order.
Ultimately, the only correct 'order' is the one described in the actual execution plan. See Displaying Execution Plans by Using SQL Server Profiler Event Classes and Displaying Graphical Execution Plans (SQL Server Management Studio).
A completely different thing though is how do queries, subqueries and expressions project themselves into 'validity'. For instance if you have an aliased expression in the SELECT projection list, can you use the alias in the WHERE clause? Like this:
SELECT a+b as c
FROM t
WHERE c=...;
Is the use of the c alias valid in the WHERE clause? The answer is NO. Queries form a syntax tree, and a lower branch of the tree cannot reference something defined higher in the tree. This is not necessarily an order of 'execution'; it is more of a syntax parsing issue. It is equivalent to writing this code in C#:
void Select (int a, int b)
{
if (c = ...) then {...}
int c = a+b;
}
Just as this code won't compile in C# because the variable c is used before it is defined, the SELECT above won't compile properly because the alias c is referenced lower in the tree than where it is actually defined.
Unfortunately, unlike the well-known rules of C/C# language parsing, the SQL rules for how the query tree is built are somewhat esoteric. There is a brief mention of them in Single SQL Statement Processing, but I don't know of any source with a detailed discussion of how they are created and what order is valid and what is not. I'm not saying there aren't good sources; I'm sure some of the good SQL books out there cover this topic.
Note that the syntax tree order does not match the visual order of the SQL text. For example, the ORDER BY clause is usually the last in the SQL text, but as a syntax tree it sits above everything else (it sorts the output of the SELECT, so it sits above the SELECTed columns, so to speak) and as such it is valid to reference the c alias:
SELECT a+b as c
FROM t
ORDER BY c;
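A common way around the WHERE restriction is to define the alias one level down, in a derived table, so that the outer WHERE sits above it in the tree. A minimal sketch, assuming a hypothetical table variable @t with integer columns a and b:

```sql
-- Hypothetical sample data
declare @t table (a int, b int);
insert into @t (a, b) values (1, 2), (3, 4), (5, 6);

-- The alias c is defined inside the derived table d,
-- so the outer WHERE and ORDER BY can both reference it
select d.c
from (
    select a + b as c
    from @t
) as d
where d.c > 5
order by d.c;
```

The inner query plays the role of the SELECT a+b as c example above, and the outer query only ever sees the column c.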
A SQL query is declarative, not imperative, so you cannot know which part of the statement is executed first. However, since SQL is evaluated by query engines, and most engines follow a similar process to obtain the results, you may have to understand how the query engine works internally to understand some SQL execution behavior.
Julia Evans has a great post explaining this; it is worth checking out:
https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/
SQL is a declarative language, meaning that it tells the SQL engine what to do, not how. This is in contrast to an imperative language such as C, in which how to do something is clearly laid out.
This means that not all statements will execute as expected. Of particular note are boolean expressions, which may not evaluate from left-to-right as written. For example, the following code is not guaranteed to execute without a divide by zero error:
SELECT 'null' WHERE 1 = 1 OR 1 / 0 = 0
The reason for this is the query optimizer chooses the best (most efficient) way to execute a statement. This means that, for example, a value may be loaded and filtered before a transforming predicate is applied, causing an error. See the second link above for an example
See: here and here.
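One way to avoid depending on evaluation order at all is to make the risky expression safe by itself. A minimal sketch, assuming a hypothetical table variable with num and den columns: NULLIF turns a zero divisor into NULL, so the division yields NULL instead of raising an error, whichever predicate the engine evaluates first.

```sql
-- Hypothetical sample data; the second row has a zero divisor
declare @t table (num int, den int);
insert into @t (num, den) values (10, 2), (5, 0);

-- nullif(den, 0) returns NULL when den = 0, and num / NULL is NULL,
-- so no divide-by-zero error can occur regardless of predicate order
select num, den
from @t
where den = 0
   or num / nullif(den, 0) > 1;
```

This does not change how the optimizer orders the predicates; it only removes the expression that could fail.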
"Order of execution" is probably a bad mental model for SQL queries. It's hard to actually write a single query that would depend on order of execution (this is a good thing). Instead you should think of all join and where clauses as happening simultaneously (almost like a template).
That said, you could display the execution plan, which should give you insight into it.
However, since it's not clear why you want to know the order of execution, I'm guessing you're trying to get a mental model for this query so you can fix it in some way. This is how I would "translate" your query. Although I've done well with this kind of analysis, there's some grey area in how precise it is.
FROM AND WHERE CLAUSE
Give me all the Event Participants rows. from [3rdi_EventParticipants]
Also give me all the Event Signup rows that match the Event Participants rows on SignUpID inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
But only for Event 13 EP.EventId = 13
And only if the user id has a record in the user roles table where the role id is not in 1,2,19,20,21,22
userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
SELECT CLAUSE
For each of the rows give me a unique ID
Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
The participants First Name EP.FirstName
The participants Last Name Ep.LastName
The Booking Role name GetBookingRoleName
Go look in the Event Dates table and give me the first eventDate you find where the EventId = 13
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate
Finally translate the GetBookingRoleName in Category. I don't have a table for this so I'll map it manually (CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category
So a couple of notes here. You're not ordering by anything when you select TOP. You should probably have an ORDER BY there. You could also just as easily put that in your FROM clause, e.g.
from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId,
(select top 1 convert(varchar(10),eventDate,103) as EventDate
from [3rdi_EventDates] where EventId=13
Order by eventDate) dates
There is a logical order to the evaluation of the query text, but the database engine can choose in what order to execute the query components based on what is most efficient. The logical text parsing order is listed below. That is, for example, why you can't use an alias from the SELECT clause in a WHERE clause. As far as the query parsing process is concerned, the alias doesn't exist yet.
FROM
ON
OUTER
WHERE
GROUP BY
CUBE | ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
See the Microsoft documentation (see "Logical Processing Order of the SELECT statement") for more information on this.
Simplified order for T-SQL -> SELECT statement:
1) FROM
2) Cartesian product
3) ON
4) Outer rows
5) WHERE
6) GROUP BY
7) HAVING
8) SELECT
9) Evaluation phase in SELECT
10) DISTINCT
11) ORDER BY
12) TOP
As far as I have seen so far, the same order applies in SQLite.
Source => SELECT (Transact-SQL)
... of course there are (rare) exceptions.
