The Best Way to shred XML data into SQL Server database columns

What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so:
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
However, I find that this gets very slow for even moderately sized XML data.

I stumbled across this question while having a very similar problem: I had been running a query processing a 7.5MB XML file (approximately 10,000 nodes) for around 3.5 to 4 hours before finally giving up.
However, after a little more research I found that, once I had typed the XML using a schema and created an XML index (I'd bulk inserted the file into a table), the same query completed in ~0.04ms.
How's that for a performance improvement!
Code to create a schema:
IF EXISTS ( SELECT * FROM sys.xml_schema_collections where [name] = 'MyXmlSchema')
DROP XML SCHEMA COLLECTION [MyXmlSchema]
GO
DECLARE @MySchema XML
SET @MySchema =
(
SELECT * FROM OPENROWSET
(
BULK 'C:\Path\To\Schema\MySchema.xsd', SINGLE_CLOB
) AS xmlData
)
CREATE XML SCHEMA COLLECTION [MyXmlSchema] AS @MySchema
GO
Code to create the table with a typed XML column:
CREATE TABLE [dbo].[XmlFiles] (
[Id] [uniqueidentifier] NOT NULL,
-- Data from CV element
[Data] xml(CONTENT dbo.[MyXmlSchema]) NOT NULL,
CONSTRAINT [PK_XmlFiles] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Code to create the index:
CREATE PRIMARY XML INDEX PXML_Data
ON [dbo].[XmlFiles] (Data)
There are a few things to bear in mind though. SQL Server's implementation of XML Schema doesn't support xsd:include. This means that if you have a schema which references other schemas, you'll have to copy all of these into a single schema and add that.
Also I would get an error:
XQuery [dbo.XmlFiles.Data.value()]: Cannot implicitly atomize or apply 'fn:data()' to complex content elements, found type 'xs:anyType' within inferred type 'element({http://www.mynamespace.fake/schemas}:SequenceNumber,xs:anyType) ?'.
if I tried to navigate above the node I had selected with the nodes function. E.g.
SELECT
C.value('CVElementId[1]', 'INT') AS [CVElementId]
,C.value('../SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level/CVElement') AS T(C)
I found that the best way to handle this was to call nodes() on the parent element first and then use OUTER APPLY to, in effect, perform an "outer join" on the XML:
SELECT
C.value('CVElementId[1]', 'INT') AS [CVElementId]
,B.value('SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level') AS T(B)
OUTER APPLY
B.nodes ('CVElement') AS S(C)
Hope that that helps someone as that's pretty much been my day.
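The parent-then-child pattern can be tried without a schema or a table. The following is a minimal sketch against an untyped XML variable; the sample document and its values are made up for illustration, and only the element names come from the answer above.

```sql
-- Hypothetical sample document; element names follow the answer above.
DECLARE @Data xml;
SET @Data = N'
<CVSet>
  <Level>
    <SequenceNumber>1</SequenceNumber>
    <CVElement><CVElementId>10</CVElementId></CVElement>
    <CVElement><CVElementId>11</CVElementId></CVElement>
  </Level>
  <Level>
    <SequenceNumber>2</SequenceNumber>
  </Level>
</CVSet>';

-- Shred the parent first, then the children relative to each parent,
-- so no '../' navigation is needed to reach SequenceNumber.
SELECT
    B.value('(SequenceNumber/text())[1]', 'INT') AS [Level],
    C.value('(CVElementId/text())[1]', 'INT')    AS [CVElementId]
FROM @Data.nodes('/CVSet/Level') AS T(B)
OUTER APPLY B.nodes('CVElement') AS S(C);
```

The second Level has no CVElement children, so OUTER APPLY still returns a row for it with a NULL CVElementId, whereas CROSS APPLY would drop that Level.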

In my case I'm running SQL Server 2005 SP2 (9.0).
The only thing that helped was adding OPTION ( OPTIMIZE FOR ( @your_xml_var = NULL ) ).
The explanation is at the link below.
Example:
INSERT INTO #tbl (Tbl_ID, Name, Value, ParamData)
SELECT 1,
tbl.cols.value('name[1]', 'nvarchar(255)'),
tbl.cols.value('value[1]', 'nvarchar(255)'),
tbl.cols.query('./paramdata[1]')
FROM @xml.nodes('//root') as tbl(cols) OPTION ( OPTIMIZE FOR ( @xml = NULL ) )
https://connect.microsoft.com/SQLServer/feedback/details/562092/an-insert-statement-using-xml-nodes-is-very-very-very-slow-in-sql2008-sp1

I'm not sure what the best method is. I used the OPENXML construction:
INSERT INTO Test
SELECT Id, Data
FROM OPENXML (@XmlDocument, '/Root/blah',2)
WITH (Id int '@ID',
Data varchar(10) '@DATA')
To speed it up, you can create XML indexes. You can create an index specifically to optimize value() function performance. You can also use typed XML columns, which perform better.
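OPENXML works on a document handle rather than on the XML itself, so the query above needs to be wrapped in calls to sp_xml_preparedocument and sp_xml_removedocument. A minimal sketch of the full round trip, assuming a made-up sample document that matches the '/Root/blah' path and the ID and DATA attributes:

```sql
DECLARE @Xml xml;
DECLARE @XmlDocument int;   -- document handle expected by OPENXML

-- Hypothetical sample data for illustration only.
SET @Xml = N'<Root><blah ID="1" DATA="first" /><blah ID="2" DATA="second" /></Root>';

-- Parse the document once and obtain a handle to the in-memory tree.
EXEC sp_xml_preparedocument @XmlDocument OUTPUT, @Xml;

SELECT Id, Data
FROM OPENXML (@XmlDocument, '/Root/blah', 2)
WITH (Id   int         '@ID',
      Data varchar(10) '@DATA');

-- Always release the handle, otherwise the parsed document stays in memory.
EXEC sp_xml_removedocument @XmlDocument;
```

To load a table, put an INSERT INTO ... in front of the SELECT, as in the answer above.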

We had a similar issue here. Our DBA (SP, you the man) took a look at my code, made a little tweak to the syntax, and we got the speed we had been expecting. It was unusual because my select from XML was plenty fast, but the insert was way slow. So try this syntax instead:
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value(N'(@column1/text())[1]', 'varchar(20)'),
Rows.n.value(N'(@column2/text())[1]', 'nvarchar(100)'),
Rows.n.value(N'(@column3/text())[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
So specifying text() in the path really seems to make a difference in performance. It took our insert of 2K rows from 'I must have written that wrong - let me stop it' to about 3 seconds, which was 2x faster than the raw insert statements we had been running through the connection.
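Note that text() selects the text-node children of an element, and attributes have no such children, so the technique applies when the values are stored as element content. A minimal sketch, assuming hypothetical element-centric XML (the Root wrapper, the column1 to column3 element names, and the sample values are made up for illustration):

```sql
DECLARE @xml xml;
SET @xml = N'
<Root>
  <Rows><column1>abc</column1><column2>first row</column2><column3>1</column3></Rows>
  <Rows><column1>def</column1><column2>second row</column2><column3>2</column3></Rows>
</Root>';

-- (element/text())[1] asks for the first text node directly, instead of
-- making SQL Server atomize the whole element.
SELECT
    Rows.n.value(N'(column1/text())[1]', 'varchar(20)')   AS column1,
    Rows.n.value(N'(column2/text())[1]', 'nvarchar(100)') AS column2,
    Rows.n.value(N'(column3/text())[1]', 'int')           AS column3
FROM @xml.nodes('/Root/Rows') AS Rows(n);
```

Using an explicit path such as '/Root/Rows' instead of '//Rows' also avoids a descendant search, which may help further.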

I wouldn't claim this is the "best" solution, but I've written a generic SQL CLR procedure for this exact purpose - it takes a "tabular" Xml structure (such as that returned by FOR XML RAW) and outputs a resultset.
It does not require any customization / knowledge of the structure of the "table" in the Xml, and turns out to be extremely fast / efficient (although this wasn't a design goal). I just shredded a 25MB (untyped) xml variable in under 20 seconds, returning 25,000 rows of a pretty wide table.
Hope this helps someone:
http://architectshack.com/ClrXmlShredder.ashx

This isn't an answer, more an addition to this question - I have just come across the same problem and I can give figures as edg asks for in the comment.
My test has xml which results in 244 records being inserted - so 244 nodes.
The code that I am rewriting takes on average 0.4 seconds to run (10 tests, ranging from 0.344 to 0.56 seconds). Performance is not the main reason the code is being rewritten, but the new code needs to perform as well or better. The old code loops over the XML nodes, calling a stored procedure to insert once per iteration.
The new code is pretty much just a single sp; pass the xml in; shred it.
Tests with the new code switched in show the new sp takes on average 3.7 seconds - almost 10 times slower.
My query is in the form posted in this question;
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
The execution plan appears to show that for each column, SQL Server is doing a separate "Table Valued Function [XMLReader]" returning all 244 rows, joining them all back up with Nested Loops (Inner Join). So in my case, where I am shredding from and inserting into about 30 columns, this appears to happen separately 30 times.
I am going to have to dump this code; I don't think any optimisation is going to get over this method being inherently slow. I am going to try the sp_xml_preparedocument/OPENXML method and see if the performance is better for that. If anyone comes across this question from a web search (as I did), I would highly advise you to do some performance testing before using this type of shredding in SQL Server.

There is an XML Bulk Load COM object (.NET example).
From MSDN:
You can insert XML data into a SQL Server database by using an INSERT statement and the OPENXML function; however, the Bulk Load utility provides better performance when you need to insert large amounts of XML data.

My current solution for large XML sets (> 500 nodes) is to use SQL Bulk Copy (System.Data.SqlClient.SqlBulkCopy): I load the XML into memory with a DataSet and then pass the table to SqlBulkCopy (defining an XML schema helps).
Obviously there are pitfalls, such as needlessly using a DataSet and loading the whole document into memory first. I would like to go further in the future and implement my own IDataReader to bypass the DataSet method, but currently the DataSet is "good enough" for the job.
Basically I never found a solution to my original question regarding the slow performance of that type of XML shredding. It could be slow due to the typed xml queries being inherently slow, or something to do with transactions and the SQL Server log. I guess the typed xml functions were never designed for operating on non-trivial node sizes.
XML Bulk Load: I tried this and it was fast but I had trouble getting the COM dll to work under 64bit environments and I generally try to avoid COM dlls that no longer appear to be supported.
sp_xml_preparedocument/OPENXML: I never went down this road so would be interested to see how it performs.

Related

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.

I'm not sure that WHERE IN will be able to make full use of an index on [Group], or use it at all. However, if you had a second table containing the GUID values, and furthermore if that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

tsql - joining two tables from different databases hosted on separate servers

I have two databases sitting on different physical servers, and they are linked. I need to join DB1.T1 with DB2.T2 and create an id. The problem is performance. My senior insists on using a function, which I have created below.
IF OBJECT_ID (N'dbo.getTRId', N'FN') IS NOT NULL
DROP FUNCTION dbo.getTRId;
GO
CREATE FUNCTION dbo.getTRId (@gcPRef bigint)
RETURNS varchar (100)
WITH EXECUTE AS CALLER --may not be necessary. not sure.
AS
BEGIN
DECLARE @TRID varchar (100);
SELECT @TRID = CONVERT(varchar (12), hu.PropId)
+ '_'+ CONVERT(varchar (12), c.WSId)
FROM [172.29.110.133].DB1.dbo.checks c
join [172.29.110.133].DB1.[dbo].VHier
ON VHier.xx= c.xx
join [172.29.110.133].DB1.[dbo].rvc
ON rvc.xx= VHier.xx
AND rvc.yy= VHier.yy
join [172.29.110.133].DB1.[dbo].HUNIT hu
ON c.xx= hu.xx
WHERE c.CheckId = @gcPRef;
RETURN (@TRID);
END;
GO
I use the query below to query each checkid using the function above.
select dbo.getTRId(guestCheckPRef), guestCheckid from DB2.Guest_CHECKS GC
where GC.closeBusinessDate = '2014-06-25'
A couple of things you may like to know:
DB1 and DB2 are hosted on different physical servers.
I am not a DBA so please let me know if I am doing anything wrong.
Approximately 45,000 records are created daily, so this is the number of rows involved.
I have already tried joining them without involving a function. It takes forever: in 30 seconds, only 450 records are returned. I cannot keep tables locked for a long time.
CONSTRAINT [DB1.PK_CHECK] PRIMARY KEY CLUSTERED
CONSTRAINT [DB2.XPKGUEST_CHECKS] PRIMARY KEY NONCLUSTERED
I do not know if constraints are playing a role here. DB2.GUEST_CHECKS.guestCheckPRef is NOT even a FK here. guestCheckPRef is PK in DB1.CHECK.
performance is very poor. I need to return DB2.propid + DB2.wsid + DB1.guestCheckid.
This is all I can give for now. Any suggestion is appreciated. It does not have to be done with a function.
Thanks in advance. Regards.Oz.

Here are a few things to try or consider:
Have you checked that the query is using the best available indexes? You could try running the query through the query analyser to see if there's any indexes you could add to improve performance.
What version of SQL Server are you running? Depending on the version you might be able to replicate the table from one server to the other to alleviate the cost of running a query across your network.
I notice that several of the joins are across to the other server - could you consolidate all of those joins into a single view that is optimised using indexes - may result in less network traffic.
Try putting your function on the other server and calling it from the first server to see if there's any performance improvement.

Doing a "select" in a function is generally considered "not a good idea". The select in the function will be repeated once for each row in the result set, which is probably why the performance is bad.
Erp. This was supposed to be a comment, not an answer. To turn this into a proper answer, rewrite the query as a join, without using the function. (I.e. take the contents of the function and integrate it into a single join.)
Your example query should look something like this:
;with getTRID as
(SELECT c.CheckId,
CONVERT(varchar (12), hu.PropId)
+ '_'+ CONVERT(varchar (12), c.WSId) AS TRID
FROM [172.29.110.133].DB1.dbo.checks c
join [172.29.110.133].DB1.[dbo].VHier
ON VHier.xx= c.xx
join [172.29.110.133].DB1.[dbo].rvc
ON rvc.xx= VHier.xx
AND rvc.yy= VHier.yy
join [172.29.110.133].DB1.[dbo].HUNIT hu
ON c.xx= hu.xx)
select getTRID.TRID, guestCheckid from DB2.Guest_CHECKS GC
inner join getTRID ON getTRID.CheckId = GC.guestCheckPRef
where GC.closeBusinessDate = '2014-06-25'
N.B. I'm working from memory here so, please, no flames for syntax errors! Thx.
Steve G.

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, but when I try to execute a simple query, it takes ~3 min to finish execution. Since I am using the same query in my C# solution, which needs to execute it 100+ times, the overall performance of the solution is badly hit.
This is the query that I am using in a proc:
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select @Id
Any help to improve the performance would be great. Thanks.

What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the leading column in the composite index: if you have no index on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book that are hierarchical and multi-level. If you were looking for something, then you'd manually look under 'T' for TestID then there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes, then you'll find it useful to review all the queries that hit the table, add indexes that suit them, and ensure that one of those indexes is a clustered index (rather than non-clustered).
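To see the index and the query side by side, here is a minimal sketch on a throwaway temp table. The #ExecutionLog table and its three rows are hypothetical stand-ins for the real 40-million-row table; only the column names and the index definition come from the discussion above.

```sql
-- Hypothetical miniature of the ExecutionLog table, just to show the shape.
CREATE TABLE #ExecutionLog (
    ExecutionID bigint NOT NULL,
    TestID      int    NOT NULL
);

INSERT INTO #ExecutionLog (ExecutionID, TestID) VALUES (1, 50881);
INSERT INTO #ExecutionLog (ExecutionID, TestID) VALUES (2, 50882);
INSERT INTO #ExecutionLog (ExecutionID, TestID) VALUES (3, 50881);

-- TestID leads because the query filters on it; ExecutionID follows so the
-- maximum for a given TestID sits at the end of that TestID's index range.
CREATE INDEX IX_ExecutionLog_TestID ON #ExecutionLog (TestID, ExecutionID);

DECLARE @Id bigint;
SELECT @Id = MAX(ExecutionID) FROM #ExecutionLog WHERE TestID = 50881;
SELECT @Id AS MaxExecutionID;   -- 3 for this sample data

DROP TABLE #ExecutionLog;
```

On a large table, comparing the actual execution plan before and after creating the index should show whether the table scan has been replaced by an index seek.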

Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single, query, that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
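A table-valued parameter version could look roughly like the following. This is a sketch only: the type name dbo.TestIdList and the procedure name dbo.GetLatestExecutions are made up for illustration, and table-valued parameters require SQL Server 2008 or later.

```sql
-- Hypothetical table type holding the test IDs to look up.
CREATE TYPE dbo.TestIdList AS TABLE (TestID int NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure: returns the latest ExecutionID per requested test.
CREATE PROCEDURE dbo.GetLatestExecutions
    @TestIds dbo.TestIdList READONLY   -- TVPs must be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT el.TestID, MAX(el.ExecutionID) AS LatestExecutionID
    FROM ExecutionLog AS el
    INNER JOIN @TestIds AS t ON t.TestID = el.TestID
    GROUP BY el.TestID;
END
GO

-- Usage from T-SQL: fill a variable of the table type and pass it in.
DECLARE @Ids dbo.TestIdList;
INSERT INTO @Ids (TestID) VALUES (50881), (50882), (50883);
EXEC dbo.GetLatestExecutions @TestIds = @Ids;
```

From C#, the same parameter can be supplied as a DataTable with SqlDbType.Structured, so the 100+ separate calls collapse into one round trip.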

Preferred way to access data within XML columns in SQL Server

Background
Recently I've started to use XML a lot more as a column in SQL Server 2005. During a bit of downtime yesterday, I noticed that two of the link tables I used are really just in the way, and it bores me to tears having to write yet more supporting structure code for a couple of joins.
To actually generate the data for these two link tables, I pass in two XML fields to my stored procedure, which writes the main record, breaks the two XML variables down into #tables and inserts them into the actual tables with the new SCOPE_IDENTITY() from the master record.
After some thought, I decided to do away with those tables altogether and just store the XML in XML fields. Now I understand there are some pitfalls here, like general querying performance and the fact that GROUP BY doesn't work on XML data. The query is also generally a bit of a mess, but overall I like that I can now work with XElement when I get the data back.
Also, this stuff isn't going to get changed. It's a one shot affair, so I don't have to worry about modification.
I am wondering about the best way to actually get at this data. A lot of my queries involve getting a master record based upon the criteria of a child or even a subchild record. Most of the sprocs in the database do this but on a far more elaborate scale, usually requiring UDFs and Subqueries to work effectively but I have knocked up a trivial example to test querying some data...
INSERT INTO Customers VALUES ('Tom', '', '<PhoneNumbers><PhoneNumber Type="1" Value="01234 456789" /><PhoneNumber Type="2" Value="01746 482954" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Andy', '', '<PhoneNumbers><PhoneNumber Type="2" Value="07948 598348" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Mike', '', '<PhoneNumbers><PhoneNumber Type="3" Value="02875 482945" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Steve', '', '<PhoneNumbers></PhoneNumbers>')
Now I can see two ways of grabbing it.
Method 1
DECLARE @PhoneType INT
SET @PhoneType = 2
SELECT ct.*
FROM Customers ct
WHERE ct.PhoneNumbers.exist('/PhoneNumbers/PhoneNumber[@Type=sql:variable("@PhoneType")]') = 1
Really? sql:variable feels a bit unwholesome, but it does work. It is, however, distinctly more difficult to access the data in a more meaningful way.
Method 2
SELECT ct.*, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') = @PhoneType
This is more like it. Already I can easily expand it to do joins and all other good stuff. I've used CROSS APPLY before on a table valued function, and it was very good. The execution plan for this as opposed to the previous query is seriously more advanced. Admittedly I haven't done any indexing and whatnot on these tables, but it's 97% of the entire batch cost.
Method 2 (expanded)
SELECT ct.ID, ct.CustomerName, ct.Notes, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') IN (SELECT ID FROM PhoneTypes)
Nice IN clause here. I can also do something like pt.PhoneType = 'Work'
Finally
So I'm essentially obtaining the results that I want, but is there anything I should be aware of when using this mechanism to interrogate small amounts of XML data? Will it fall down on performance during elaborate searches? And is the storage of such markup style data too much of an overhead?
Side note
I've used things like sp_xml_preparedocument and OPENXML in the past just to pass lists into sprocs, but this is like a breath of fresh air in comparison!

One approach we've taken for some of our key items of information stored inside an XML column is to "surface" them as computed, persisted properties on the "parent" table. This is done using a little stored function.
It works great, because the value is computed only once every time the XML changes - as long as it's not changing, there's no recomputation, the value is stored on the table like any other column.
It's also great since it can be indexed! So if you're searching and/or joining on such a field - that works like a charm!
So you basically need a stored function along the lines of this:
CREATE FUNCTION [dbo].[GetPhoneNo1](@DataXML XML)
RETURNS VARCHAR(50)
WITH SCHEMABINDING
AS BEGIN
DECLARE @result VARCHAR(50)
SELECT
@result = @DataXML.value('(/PhoneNumbers/PhoneNumber[@Type="1"]/@Value)[1]', 'VARCHAR(50)')
RETURN @result
END
If you don't have a phone number of type 1, you'll just get back a NULL.
Then, you need to extend your parent table with a computed, persisted column:
ALTER TABLE dbo.Customers
ADD PhoneNumberType1 AS dbo.GetPhoneNo1(PhoneNumbers) PERSISTED
As you can see - it works just fine for single entries, but unfortunately, you cannot surface a whole list of properties. But if you have some key items, like ID's or something, that you expect most of your rows to have, this can be a very nice and slick way to get at that information more easily and more efficiently.
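As a rough sketch of the indexing step mentioned above: once the computed column exists, it can be indexed and searched like any other column. The index name here is made up for illustration, and the example assumes the dbo.Customers table, its ID and CustomerName columns, and the PhoneNumberType1 column from this thread; SQL Server will only accept the index if it considers the function deterministic, which is why the function is created WITH SCHEMABINDING.

```sql
-- Assumes dbo.Customers already has the PhoneNumberType1 computed column
-- described above; the index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_Customers_PhoneNumberType1
    ON dbo.Customers (PhoneNumberType1);

-- A search on the surfaced value can now use the index instead of
-- evaluating XQuery against every row's XML.
SELECT ID, CustomerName
FROM dbo.Customers
WHERE PhoneNumberType1 = '01234 456789';
```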

Tune Slow SQL Query

I have an app running on my SQL Server that is starting to slow down on a specific task. I ran SQL Profiler and noticed that the following query is taking an enormous amount of time (1-2 minutes). I don't have access to the code to change the query.
Is there anything I can tune/change in the database? The PC10000 table in the statement below has approx. 119000 records. I also have the execution plan attached.
SELECT TOP 25
zProjectID, zTaskID, zTransactionNumber, zTransactionDate, zUserID,
zCostCategoryDDL, zCostCategoryString, zSubCostCategory, zSubCostCategoryString,
zDepartmentID, zJournalEntry, zPostingDate, zSalesPostingDate, zPeriodNumber,
zTransactionDescription, zBillingDescriptionLine1, zBillingDescriptionLine2,
zBillingDescriptionLine3, zBillingDescriptionLine4, zSalesAccountIndex,
zSalesAccountString, zDistDocumentTypeDDL, zDistDocumentNumber, zDistSequenceNumber,
zSalesDocumentTypeDDL, zSalesDocumentNumber, zSalesLineNumber, zDistHistoryYear,
zSeriesDDL, zSourceDoc, zWebSource, zOrigDocumentNumber, zOrigDocumentDate,
zOrigID, zOrigName, zExpenseStatusDDL, zApprovalUserIDCost, zAccountIndex,
zAccountNumberString, zBillingStatusDDL, zApprovalUserIDBilling, zBillingWorkQty,
zBillingWorkAmt, zQty, zQtyBilled, zUnitCost,
zUnitPrice, zRevenueAmt, zOriginatingRevenueAmt, zCostAmtEntered, zCostAmt,
zOriginatingCostAmt, zPayGroupID, zPayrollStatusDDL, zTotalTimeStatusDDL,
zEmployeeID, zHoursEntered, zHoursPaid, zPayRecord, zItemID, zItemDescription,
zUofM, zItemQty, zBurdenStatusDDL, zUserDefinedDate, zUserDefinedDate2,
zUserDefinedString, zUserDefinedString2, zUserDefinedCurrency,
zUserDefinedCurrency2, zNoteIndex, zImportType, DEX_ROW_ID
FROM
DBServer.dbo.pc10000
WHERE
(zDistDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')
or zSalesDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283'))
ORDER BY
zProjectID ASC ,zTaskID ASC ,zTransactionNumber ASC

The biggest problem you have looks to be due to lack of suitable indexes.
You can see that because of the presence of Table Scans within the execution plan.
Table Scans hit performance as they mean the whole table is being scanned for data that matches the given clauses in the query.
I'd recommend you add an index on BACHNUMB in GL10001.
You may also want to try indexes on zDistDocumentNumber and zSalesDocumentNumber in PC10000, but I think the GL10001 index is the main one.
"IN" clauses are typically quite expensive compared to other techniques, but as you can't change the query itself, there's nothing you can do about that.
Without a doubt, you need to add suitable indexes.
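The suggested indexes could be created along these lines. This is a sketch under assumptions: the index names are made up, the tables are assumed to live in the dbo schema, and including JRNENTRY in the GL10001 index is an optional extra so the subquery can be answered from the index alone.

```sql
-- Index names are hypothetical; run in the database that holds these tables.

-- Supports the subquery filter on BACHNUMB and covers the JRNENTRY it returns.
CREATE NONCLUSTERED INDEX IX_GL10001_BACHNUMB
    ON dbo.GL10001 (BACHNUMB) INCLUDE (JRNENTRY);

-- Support the two IN lookups against PC10000.
CREATE NONCLUSTERED INDEX IX_PC10000_zDistDocumentNumber
    ON dbo.PC10000 (zDistDocumentNumber);

CREATE NONCLUSTERED INDEX IX_PC10000_zSalesDocumentNumber
    ON dbo.PC10000 (zSalesDocumentNumber);
```

It is worth checking the execution plan again afterwards to confirm that the table scans have gone.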

The query is doing 2 table scans on the GL10001 table. From a quick look at the query (which is a bit hard to read) I would see if you have an index on the BACHNUMB column.

The execution plan shows pretty clearly that actually locating the rows is what's taking all the time (no cumbersome bookmark lookups, or aggregation/rearrange tasks), so it's quite positively going to be a question of indexing. Hover over the table scans in the execution plan and check 'object' in the tooltip to see what columns are being used. See to it that they're indexed.
You might also want to run a trace to sample some live data, and feed that to the Database Tuning Advisor.

You could rewrite those sub-selects as a join, and add an index to GP01..GL10001 on BACHNUMB and JRNENTRY

Since you can't change the query, the best thing you could do is make sure you have indexes on the columns that you're using for your joins (and subqueries). If you can think of a better query plan, you could provide that to SQL Server instead of letting it calculate its own (this is a very rare case).

Replace the OR with a UNION ALL of two queries; this should get rid of those spools.
I.e. run the query once with something like this:
SELECT ....
WHERE zDistDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')
UNION ALL
SELECT ...
WHERE zSalesDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')

In addition to adding indexes, you can also convert the IN statements to EXISTS... something along these lines:
SELECT TOP 25 ....
FROM GP01.dbo.pc10000 parent
WHERE EXISTS
(
SELECT child.*
FROM GP01..GL10001 child
WHERE BACHNUMB = 'PMCHK00004283'
and parent.zDistDocumentNumber = child.JRNENTRY
)
OR EXISTS
(
SELECT child2.*
FROM GP01..GL10001 child2
WHERE BACHNUMB = 'PMCHK00004283'
and parent.zSalesDocumentNumber = child2.JRNENTRY
)
ORDER BY zProjectID ASC ,zTaskID ASC ,zTransactionNumber ASC
