Can You Optimize XML Operations in SQL Server? - sql-server

I am generating and sending XML events from the database through Service Broker using SQLCLR, and it works great. However, looking at the query plan, I am a little shocked at some of the statistics: small transformations seem to cost quite a bit of CPU time.
All the examples I see online optimize the table the XML sits in by adding an index (etc.)... but there is no table for me (I am simply generating the XML).
As such...
Q: Is there a way to "optimize" these kinds of "generational" statements?
Maybe some approaches are better than others?
I have yet to see anything online about this.
Thanks.
SAMPLES OF EXPENSIVE STATEMENTS:
DECLARE @CurrentId UNIQUEIDENTIFIER = (SELECT @Event.value('(/Event/@auditId)[1]', 'UNIQUEIDENTIFIER'));
SET @Event.modify('replace value of (/Event/@auditId)[1] with sql:variable("@NewId")');
EVENT XML:
An event would look like...
<Event auditId="FE4D0A4C-388B-E611-9B4D-0050569B733D" force="false" CreatedOn="2016-10-05T20:14:20.020">
  <DataSource machineName="ABC123">DatabaseName</DataSource>
  <Topic>
    <Filter>TOPIC/ENTITY/ACTION</Filter>
  </Topic>
  <Name>Something.Created</Name>
  <Contexts>
    <Context>
      <Name>TableName</Name>
      <Key>
        <IssueId>000</IssueId>
      </Key>
    </Context>
  </Contexts>
</Event>

An XML index will not help you with this. There are very rare situations where this kind of index helps at all: the effect is high if you read from your XML with a full, explicit path, but the moment you use XQuery or any kind of navigation, it can even make things worse.
.modify() is quite heavy. In this special case it could be faster to rebuild the XML as such (you know more about it than the engine does):
DECLARE @xml XML=N'
<Event auditId="FE4D0A4C-388B-E611-9B4D-0050569B733D" force="false" CreatedOn="2016-10-05T20:14:20.020">
  <DataSource machineName="ABC123">DatabaseName</DataSource>
  <Topic>
    <Filter>TOPIC/ENTITY/ACTION</Filter>
  </Topic>
  <Name>Something.Created</Name>
  <Contexts>
    <Context>
      <Name>TableName</Name>
      <Key>
        <IssueId>000</IssueId>
      </Key>
    </Context>
  </Contexts>
</Event>';

DECLARE @NewId UNIQUEIDENTIFIER=NEWID();

SELECT @NewId AS [@auditId]
      ,e.value('@force','nvarchar(max)') AS [@force] --read this as string to avoid expensive conversions
      ,e.value('@CreatedOn','nvarchar(max)') AS [@CreatedOn] --same here
      ,e.query('*') AS [node()] --read "as-is"
FROM @xml.nodes('/Event') AS A(e)
FOR XML PATH('Event');
There is - for sure! - no general approach to make XML operations faster. If one existed, it would be the one and only way to do it...
I'd monitor the system, pick out the most expensive calls, and try to modify them one by one.
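As a starting point for that monitoring, here is a minimal sketch against the plan cache using the standard DMVs (adjust the TOP and the ORDER BY to your workload):

--Find the statements with the highest total CPU time in the plan cache
SELECT TOP (20)
       qs.execution_count
      ,qs.total_worker_time/qs.execution_count AS avg_cpu_time
      ,SUBSTRING(st.text,(qs.statement_start_offset/2)+1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset)/2)+1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;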

Related

Having an issue with XML containing nested objects to POCO (de)serialisation (VB.Net)

I've created an SP in SQL Server that returns its results as XML. I decided to do this as the information has contacts and addresses in it, and I wanted to reduce the amount of data I get.
<Accounts>
  <Account>
    <Company />
    <ContactNumber />
    <Addresses>
      <Address>
        <Line1 />
        ....
      </Address>
    </Addresses>
    <Contacts>
      <Contact>
        <Name />
        ....
      </Contact>
    </Contacts>
  </Account>
</Accounts>
I have found SqlCommand.ExecuteXmlReader but I'm confused as to how to deserialise this into my POCO. Can someone point me at what my next step is? (The POCO was created by the Paste XML As Classes menu item in VS2019.)
My Google-fu is letting me down, as I'm not sure what I should be looking for to help me understand how to deserialize the XML into something that will allow me to use For Each Account In Accounts style logic.
Any help is greatly appreciated.
PS: The XML above is just a sample to show what I'm doing. The actual data is over 70 fields, and with two FK joins the initial 40,000 rows are well in excess of 1.8 million once selected as a normal dataset.
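For context, nested XML of this shape would typically come from correlated subqueries with FOR XML PATH; a minimal sketch of such an SP body, with all table and column names hypothetical:

--Hypothetical tables dbo.Account, dbo.Address, dbo.Contact
SELECT a.Company
      ,a.ContactNumber
      ,(SELECT ad.Line1
        FROM dbo.Address AS ad
        WHERE ad.AccountId = a.AccountId
        FOR XML PATH('Address'), ROOT('Addresses'), TYPE)
      ,(SELECT c.Name
        FROM dbo.Contact AS c
        WHERE c.AccountId = a.AccountId
        FOR XML PATH('Contact'), ROOT('Contacts'), TYPE)
FROM dbo.Account AS a
FOR XML PATH('Account'), ROOT('Accounts');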
EDIT: Just in case someone stumbles on this and is in the same situation I was in:
When preparing a sample record for Paste XML As Classes, make sure you have more than one record if you are expecting something similar to my example above (the generated class changes to support more than one record).
You also get very different results when searching for "deserialize" rather than "serialize" during research. This small change resulted in me figuring out the issue.

XML Attributes to SQL

I've seen posts that are similar to what I'm about to ask, but I can't find anything that actually solves my problem. I've seen (and duplicated) getting XML tags into SQL, but not getting the attributes of the tags into SQL.
Background: We use a program that runs "events" based on each event's schedule. The schedules are in XML. Ultimately, I'm trying to compare the last time a given event ran to the last time it should have run, based on its schedule. Once I get the schedules out of XML and into a table (or tables), I'm confident I can take it from there. But, I'm STRUGGLING with that first step.
Below is the XML I'm trying to get into a table (or tables). Any help would be greatly appreciated!!
<Schedule LastModified="2016-06-27T21:02:10.6041531Z" TimeZone="(UTC-06:00) Central Time (US & Canada)" ConvertedToUTC="True" Type="Weekly">
  <Beginning StartDate="2016-05-26T22:26:00.0000000" />
  <Block BlockType="AllDay" Interval="10" IntervalType="Minute" SetType="Inclusive" Start="15:00:00" End="17:00:00" Duration="02:00:00" />
  <Interval Type="Weekly" RecurEveryX="1" Sunday="False" Monday="True" Tuesday="True" Wednesday="True" Thursday="True" Friday="True" Saturday="False" />
  <Ending Type="NoEndDate" />
</Schedule>
I don't know where you're having your issues... Reading this XML is rather trivial (I did not code all of the values, but you'll get the idea):
DECLARE @mockup TABLE(YourXML XML);
INSERT INTO @mockup VALUES
(N'<Schedule LastModified="2016-06-27T21:02:10.6041531Z" TimeZone="(UTC-06:00) Central Time (US & Canada)" ConvertedToUTC="True" Type="Weekly">
  <Beginning StartDate="2016-05-26T22:26:00.0000000" />
  <Block BlockType="AllDay" Interval="10" IntervalType="Minute" SetType="Inclusive" Start="15:00:00" End="17:00:00" Duration="02:00:00" />
  <Interval Type="Weekly" RecurEveryX="1" Sunday="False" Monday="True" Tuesday="True" Wednesday="True" Thursday="True" Friday="True" Saturday="False" />
  <Ending Type="NoEndDate" />
</Schedule>');

SELECT m.YourXML.value(N'(/Schedule/@LastModified)[1]',N'datetime') AS Schedule_LastModified
      ,m.YourXML.value(N'(/Schedule/@TimeZone)[1]',N'nvarchar(max)') AS Schedule_TimeZone
      ,m.YourXML.value(N'(/Schedule/Beginning/@StartDate)[1]',N'datetime') AS Beginning_StartDate
      ,m.YourXML.value(N'(/Schedule/Block/@BlockType)[1]',N'nvarchar(max)') AS Block_BlockType
      ,m.YourXML.value(N'(/Schedule/Block/@Interval)[1]',N'int') AS Block_Interval
      ,m.YourXML.value(N'(/Schedule/Interval/@Type)[1]',N'nvarchar(max)') AS Interval_Type
FROM @mockup AS m;
If this is not your solution, please edit your question: add your own attempt, the wrong output, and the expected output. Your explanation did not make it all clear to me...

Where is the data from the DataTable items saved in the ECM eRoom database?

I am trying to retrieve data out of the ECM eRoom database (which isn't documented, as far as I know).
I have an eRoom with a custom "Database" and some fields.
When I query the Objects table I find the "Database" row
select * from [dbo].[Objects] where internalId = 1234567
and the rows for the entries
select top 10 * from [dbo].[Objects] where parentInternalId = 1234567
but I don't find any field with the values of the entries, only a column named NonSearchableProperties that is just full of hex data.
My question(s):
How could I retrieve the values?
Is it possible to retrieve them with mere SQL?
What is the simplest way?
This is not a silver bullet, but it is okay for my use case.
After long googling and a lot of test scripts, I found some answers; they were hard to come by, probably due to the fact that the system is soon end-of-life and the documentation is not easy to read. Here are my findings.
Is it possible to retrieve them with mere SQL?
As far as I could find out, no! (Please correct me if I'm wrong.)
How could I retrieve the values?
With the eRoom API (on the server there are some sample programs to query the data/objects under <installation-path>\eRoom Server\Toolkit\Samples, in C++, VB, VBScript, ... all a bit too much overhead), or with the eRoom XML Query Language (exql) over SOAP calls.
What is the simplest way?
After a lot of tests, searching in forums, and many experiments with SoapUI, I found that queries with exql seem to be the simplest way to retrieve data, if you understand the structure.
Here are some resources that were helpful:
(very) basic info on exql from the manufacturer: https://eroom.asce.org/eRoomHelp/en/XML_Help/Introduction.htm
(disclaimer: I didn't find it very helpful, but it shows at least some basics)
a short 9-page developer guide: https://developer-content.emc.com/developer/downloads/eRoomXMLCapabilitiesUseCaseProgramDashboard.pdf (the last example on page 8 helped me understand how to set up the query, with a lot of imagination)
But for this to work, don't forget to activate Allow XML queries and commands from external applications in the Site Settings.
TIP 1:
You can always go deeper, you just need to know the right XML element; <Database>, <Cells> and <DBCell> can help you go deeper.
TIP 2:
Don't query too much data, since such a query will likely run into timeouts.
Update 1:
Just to save time for anyone who is looking, this "query" returns all rows (properties) for the Database(s) created in an eRoom root.
(Don't forget to set the facility and room in the URL, e.g. http://server/eroomxml/facilities/TEST/Rooms/TestRoom, although it could also be set in the query.)
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:er="http://www.eroom.com/eRoomXML/2003/700">
  <soapenv:Header/>
  <soapenv:Body>
    <er:ExecuteXMLCommand>
      <er:eRoomXML>
        <er:command er:select="HomePage/Items">
          <er:getproperties>
            <er:Item>
              <Database>
                <Rows>
                  <Item>
                    <Cells>
                      <DBCell>
                        <Content>
                        </Content>
                      </DBCell>
                    </Cells>
                  </Item>
                </Rows>
              </Database>
            </er:Item>
          </er:getproperties>
        </er:command>
      </er:eRoomXML>
    </er:ExecuteXMLCommand>
  </soapenv:Body>
</soapenv:Envelope>

What is the meaning of the "Missing Index Impact %" in a SQL Server 2008 execution plan?

I was just examining an estimated execution plan in SSMS. I noticed that a query had a query cost of 99% (relative to the batch). I then examined the plan displayed below. That cost was almost entirely coming from a "Clustered Index Delete" on Table A. However, the Missing Index recommendation is for Table B, and the Missing Index Impact is said to be 95%.
The query is a DELETE statement (obviously) which relies on a nested loops INNER JOIN with Table B. If nearly all the cost according to the plan is coming from the DELETE operation, why would the index suggestion be on Table B, which -- even though it was a scan -- had a cost of only 0%? Is the impact of 95% an impact against the negligible cost of the scan (listed as 0%) and not the overall cost of the query (said to be nearly ALL of the batch)?
Please explain IMPACT if possible. Here is the plan:
This is query 27 in the batch.
Probably the impact it is showing you actually belongs to an entirely different statement (1-26).
This seems to be a problem with the way that the impacts are displayed for estimated plans in SSMS.
The two batches below contain the same two statements with the order reversed. Notice that in the first case it claims both statements would be helped equally with an impact of 99.938, and in the second, 49.9818.
So it is showing the estimated impact for the first instance encountered of that missing index -- not the one that actually relates to the statement.
I don't see this issue in the actual execution plans and the correct impact is actually shown in the plan XML next to each statement even in the estimated plan.
I've added a Connect item report about this issue here. (Though possibly you have encountered another issue as 10% impact seems to be the cut off point for the missing index details being included in the plan and it is difficult to see how that would be possible for the same reasons as described in the question)
Example Data
CREATE TABLE T1
(
X INT,
Y CHAR(8000)
)
INSERT INTO T1
(X)
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY @@spid)
FROM sys.all_objects o1,
sys.all_objects o2
Batch 1
SELECT *
FROM T1
WHERE X = -1
SELECT *
FROM T1
WHERE X = -1
UNION ALL
SELECT *
FROM T1
Batch 2
SELECT *
FROM T1
WHERE X = -1
UNION ALL
SELECT *
FROM T1
SELECT *
FROM T1
WHERE X = -1
The XML for the first plan (heavily truncated) is below, showing that the correct information is in the plan itself.
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML>
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1">
          <QueryPlan>
            <MissingIndexes>
              <MissingIndexGroup Impact="99.938">
                <MissingIndex Database="[tempdb]" Schema="[dbo]" Table="[T1]">
                  <ColumnGroup Usage="EQUALITY">
                    <Column Name="[X]" ColumnId="1" />
                  </ColumnGroup>
                </MissingIndex>
              </MissingIndexGroup>
            </MissingIndexes>
          </QueryPlan>
        </StmtSimple>
      </Statements>
      <Statements>
        <StmtSimple StatementCompId="2">
          <QueryPlan>
            <MissingIndexes>
              <MissingIndexGroup Impact="49.9818">
                <MissingIndex Database="[tempdb]" Schema="[dbo]" Table="[T1]">
                  <ColumnGroup Usage="EQUALITY">
                    <Column Name="[X]" ColumnId="1" />
                  </ColumnGroup>
                </MissingIndex>
              </MissingIndexGroup>
            </MissingIndexes>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>
Assuming that the interpretation of the missing index Impact % is identical or similar to that of the avg_user_impact column from the sys.dm_db_missing_index_group_stats system view, then the Impact % represents (more or less):
Average percentage benefit that user queries could experience if this
missing index group was implemented. The value means that the query
cost would on average drop by this percentage if this missing index
group was implemented.
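For reference, that DMV side can be inspected directly. A minimal sketch joining the three missing-index DMVs (these are standard system views; the weighting in the ORDER BY is just one common heuristic):

--avg_user_impact is the same "query cost would drop by this %" figure
SELECT d.statement AS table_name
      ,d.equality_columns
      ,d.inequality_columns
      ,d.included_columns
      ,s.avg_user_impact
      ,s.user_seeks
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;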
Thanks for the information, everyone. Martin Smith, I believe, did find a bug as a result of this, though I am not sure if it is the same bug as what I am seeing. In fact, I am not sure if my issue is a bug or by design. Let me elaborate on some new observations:
In looking through this rather large execution plan (62 queries), I noticed that the Missing Index recommendation (and respective Impact %) that I mentioned in the original question is listed on nearly every query in the 62-query batch. Oddly, many of these queries do not even touch the table the index is recommended for! After observing this, I opened the XML and searched for the element 'MissingIndexes', which showed about 10 different missing indexes, all with varying Impact %'s, naturally. Why the execution plan does not show this visually and instead shows just one missing index, I do not know. I presume either 1) it is a bug, or 2) it only shows the missing index with the HIGHEST Impact % -- which is the one I see riddled throughout my entire plan.
A suggestion if you are experiencing this too: get comfortable with the XML rather than the visual execution plan. Search for the XML element 'MissingIndexes' and match it up with the statements to get proper results.
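A minimal sketch of that approach against the plan cache (an illustration, not the exact plan in question; the showplan namespace declaration is required for the XQuery to match):

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT mi.grp.value('@Impact','float') AS impact
      ,mi.grp.value('(MissingIndex/@Table)[1]','nvarchar(128)') AS table_name
      ,st.text AS statement_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY qp.query_plan.nodes('//MissingIndexGroup') AS mi(grp)
ORDER BY impact DESC;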
I also read from Microsoft (http://technet.microsoft.com/en-us/library/ms345524(v=sql.105).aspx)
that the missing index stats come from a group of DMVs. If the Impact % is in fact from these DMVs, then I would also presume that the Impact % is based on MUCH more than just the query/statement in the execution plan where the index is recommended. So take it with a grain of salt, and use them wisely based on your own knowledge of your database.
I am going to leave this open-ended and not mark anything as an "answer" just yet. Feel free to chime in, folks!
Thanks again.
Okay, so let me see if I can clarify here.
There will still be costs for those other operations; the 0% is because the DELETE in a loop is taking the vast majority of your processor and IO time. That doesn't, however, mean those other operations don't have processor/memory/IO costs that can be improved on this query by adding that index. Especially since you are doing a loop, you are essentially matching to TableB for one record, then deleting out of TableA, over and over. Therefore, having an index that makes it easier to match those rows will speed up your delete.
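As an illustration only -- the actual join column is not shown in the question, so every name below is hypothetical -- the kind of index that recommendation points at would look like:

--Hypothetical: lets each loop iteration seek into TableB instead of scanning it
CREATE NONCLUSTERED INDEX IX_TableB_JoinKey
    ON dbo.TableB (JoinKey);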

Optimizing XQuery projection

I'm getting some horrific performance from an XQuery projection in SQL Server.
What would be the best way to write the following transformation?
select DocumentData.query(
'<object type="dynamic">
  <state>
    <OrderTotal type="decimal">
    {fn:sum(
      for $A in /object[1]/state[1]/OrderDetails[1]/object/state[1]
      return ($A/ItemPrice[1] * $A/Quantity[1]))}
    </OrderTotal>
    <CustomerId type="guid">
    {xs:string(/object[1]/state[1]/CustomerId[1])}
    </CustomerId>
    <Details type="collection">
    {/object[1]/state[1]/OrderDetails[1]/object}
    </Details>
  </state>
</object>') as DocumentData
from documents
(I know the code is a bit out of context.)
If I check the execution plan for this code, there are about 10+ joins going on.
Should I break this down to use for $var for each level in the structure?
For more context, this is what I'm trying to accomplish:
http://rogeralsing.com/2011/03/02/linq-to-sqlxml-projections/
I'm writing a "LINQ to XQuery translator" / NoSQL document DB emulator; filtering works like a charm, but projections suffer from performance problems.
This article is quite useful:
Performance Optimizations for the XML Data Type in SQL Server 2005
In particular it recommends that instead of writing paths of the form...
/object[1]/state[1]/CustomerId[1]
you should instead write...
(/object/state/CustomerId)[1]
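Applied to the projection above, the CustomerId expression, for example, would change as follows; a minimal sketch (assuming each document carries a single object/state pair):

--Before: a positional predicate at every step forces step-by-step navigation
--  xs:string(/object[1]/state[1]/CustomerId[1])
--After: one path expression, with a single [1] applied to the result
select DocumentData.query(
'<CustomerId type="guid">
 {xs:string((/object/state/CustomerId)[1])}
 </CustomerId>') as DocumentData
from documents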
