SQL SELECT using XML input - sql-server

I've currently got a C# application that responds to HTTP requests. The body of the HTTP request (XML) is passed to SQL Server, where the database engine executes the appropriate instruction. One of the instructions loads information about invoices using the id of the customer (InvoiceLoad):
<InvoiceLoad ControlNumber="12345678901">
<Invoice>
<CustomerID>johndoe@gmail.com</CustomerID>
</Invoice>
</InvoiceLoad>
I need to perform a SELECT operation against the invoice table (which contains the associated email address).
I've tried using:
SELECT 'Date', 'Status', 'Location'
FROM Invoices
WHERE Email_Address = Invoice.A.value(.)
using an xml.nodes('InvoiceLoad/Invoice/CustomerId') Invoice(A)
command.
However, as this query may run THOUSANDS of times per minute, I want to make it as fast as possible. I'm hearing that one way to do this may be to use CROSS APPLY (which I have never used). Is that the solution? If not, how exactly would I go about making this query as fast as possible? Any and all suggestions are greatly appreciated!

I don't see why you would need a call to .nodes() at all - from what I understand, each XML fragment has just a single entry - right?
So given this XML:
<InvoiceLoad ControlNumber="12345678901">
<Invoice>
<CustomerID>johndoe@gmail.com</CustomerID>
</Invoice>
</InvoiceLoad>
you can use this SQL to get the value of the <CustomerID> node:
DECLARE @xmlvar XML
SET @xmlvar = '<InvoiceLoad ControlNumber="12345678901">
<Invoice>
<CustomerID>johndoe@gmail.com</CustomerID>
</Invoice>
</InvoiceLoad>'

SELECT
    @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)')
and you can join this against your customer table or whatever you need to do.
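For example, here is a minimal sketch against the asker's Invoices table, assuming the Date, Status, Location and Email_Address columns named in the question. Extracting the value into a scalar variable first keeps the predicate sargable, so an index on Email_Address can be used:

-- Extract once, then filter with a plain scalar comparison
-- (assumes @xmlvar holds the request XML as above)
DECLARE @customer VARCHAR(100)
SET @customer = @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)')

SELECT [Date], Status, Location
FROM Invoices
WHERE Email_Address = @customer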
If you have the XML stored in a table, and you always need to extract that value from <CustomerID>, you could also think about creating a computed, persisted column on that table that would extract that e-mail address into a separate column, which you could then use for easy joining. This requires a little bit of work - a stored function taking the XML as input - but it's really quite a nice way to "surface" certain important snippets of data from your XML.
Step 1: create your function
CREATE FUNCTION dbo.ExtractCustomer (@input XML)
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS BEGIN
    DECLARE @result VARCHAR(255)

    SELECT @result = @input.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(255)')

    RETURN @result
END
So given your XML, you get the one <CustomerID> node and extract its "inner text" and return it as a VARCHAR(255).
Step 2: add a computed, persisted column to your table
ALTER TABLE dbo.YourTableWithTheXML
ADD CustomerID AS dbo.ExtractCustomer(YourXmlColumnHere) PERSISTED
Now, your table that has the XML column has a new column - CustomerID - which will automagically contain the contents of the <CustomerID> as a VARCHAR(255). The value is persisted, i.e. as long as the XML doesn't change, it doesn't have to be re-computed. You can use that column like any other on your table, and you can even index it to speed up any joins on it!
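For example, a plain nonclustered index on the computed column makes joins and lookups on it cheap (the index name here is illustrative):

-- Hypothetical index name; the persisted column behaves like any ordinary VARCHAR column
CREATE NONCLUSTERED INDEX IX_YourTableWithTheXML_CustomerID
ON dbo.YourTableWithTheXML (CustomerID)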

Related

How to get an XML structure from a SQL Server stored procedure

I am working on a VB.NET application, and management wants me to change the application's data source from SQL Server to XML.
I have a class called WebData.vb in the old application. I need to somehow replace the stored procedures in it and make it read XML, so I was thinking of getting the XML structure from the result set returned by the stored procedure. I looked online, and for a normal select statement you can do something like this:
FOR XML PATH('Get_Order'), ROOT('Get_Orders')
I am looking for something like
EXEC dbo.spMML_GET_ORDERS_FOR_EXPORT
FOR XML PATH('Get_Order'), ROOT('Get_Orders')
so that, now that I have the structure, I can pass the data to a DataTable and then return that DataTable to the method.
Also, if there is an alternative way of creating an XML stored procedure, please let me know. Thanks, coders.
Assuming you can't modify the stored proc (due to other dependencies or some other reason) so that its SELECT uses the FOR XML syntax, you can use INSERT ... EXEC to insert the results of the stored proc into a temp table or table variable, then apply your FOR XML to a query of those results.
Something like this should work:
DECLARE @Data TABLE (...) -- Define table to match results of stored proc
INSERT @Data
EXEC dbo.spMML_GET_ORDERS_FOR_EXPORT
SELECT * FROM @Data FOR XML PATH('Get_Order'), ROOT('Get_Orders')
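For illustration, here is the same pattern with a hypothetical three-column result set (the table variable must match the proc's actual output exactly, or INSERT ... EXEC will fail):

-- Hypothetical columns; match these to what spMML_GET_ORDERS_FOR_EXPORT really returns
DECLARE @Data TABLE (OrderID INT, CustomerName VARCHAR(100), OrderDate DATETIME)

INSERT @Data
EXEC dbo.spMML_GET_ORDERS_FOR_EXPORT

SELECT OrderID, CustomerName, OrderDate
FROM @Data
FOR XML PATH('Get_Order'), ROOT('Get_Orders')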
There are a few methods. One is adding namespaces using WITH XMLNAMESPACES('<uri>' AS <alias>); XMLNAMESPACES embeds the appropriate XML namespace markers in your output for use by other applications (which hopefully is a factor here), making documentation a little easier.
Depending on how your application uses it, you can use FOR XML RAW, PATH, AUTO, or EXPLICIT in your query, as well as XQuery methods, but for your needs, stick to the simpler methods like XML PATH or XML AUTO.
XML PATH is very flexible; however, you lose the straightforward identification of the column datatypes.
XMLNAMESPACE
WITH XMLNAMESPACES('dbo.MyTableName' AS SQL)
SELECT DENSE_RANK() OVER (ORDER BY Name ASC) AS 'Management_ID'
, Name AS [Name]
, Began AS [Team/#Began]
, Ended AS [Team/#Ended]
, Team AS [Team]
, [Role]
FROM dbo.SSIS_Owners
FOR XML PATH, ELEMENTS, ROOT('SQL')
XML AUTO
Because you might want to return the data to the database, I suggest using XML AUTO with XMLSCHEMA, which keeps the SQL datatypes in the XML.
SELECT DENSE_RANK() OVER (ORDER BY Name ASC) AS 'Management_ID'
, Name AS [Name]
, Began AS [Team/#Began]
, Ended AS [Team/#Ended]
, Team AS [Team]
, [Role]
FROM dbo.SSIS_Owners
FOR XML AUTO, ELEMENTS, XMLSCHEMA('SSIS_Owners')
The downside is that XMLNAMESPACES is not an option here, but you can get around that through solutions like XML SCHEMA COLLECTIONS, or in the query itself, as I showed.
You can also just use XML PATH directly without the namespace, but again, that depends on how your application uses the XML files you are producing.
Also note how I defined the embedded attributes. A learning point here: think about the query in the same order that the XML should appear. That is why I defined the Team attributes (Team/@Began, Team/@Ended) before stating what the text for that node was.
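For reference, the XML PATH query above produces output shaped roughly like this (values invented; note that SQL Server repeats the namespace declaration on every row element):

<SQL>
  <row xmlns:SQL="dbo.MyTableName">
    <Management_ID>1</Management_ID>
    <Name>Alice</Name>
    <Team Began="2010" Ended="2012">Team A</Team>
    <Role>Owner</Role>
  </row>
</SQL>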
Lastly, I think you'll find Paparazzi has a question that covers quite a bit of this: TSQL FOR XML PATH Attribute On , Type.

How to optimize replace and find function?

I am trying to create a function that can replace certain words with hyperlinks in SQL. When I call the function in a query, it takes a really long time to execute, more than 2-3 minutes. I assume this is because the tag_library table has around 600,000 records, and iterating through that large a number consumes a lot of processing time.
CREATE FUNCTION dbo.ReplaceTags(@body VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    SELECT @body = REPLACE(@body, name, '' + name + '')
    FROM Tag_Library

    RETURN @body
END
article table (id, title, body)
1, Story1, At the same time there is a list consisting of: DUCHS, EUROC, GLSPE and WODST. Only two of the tags have covered with the prices in the last three months - GROSV at 99.11 on 8 October and JUBIL at 0s on 11 September.
tag_library table (id, name)
1,DRYDN33
2,DUCHS
3,DRYDN33
4,DRYDN15
5,EUROC
6,DRYDN15
7,GROSV
Hence, I am writing to seek some advice: is there a way to make this SQL function optimal, or would it be better to change this function into an insert trigger?
Please advise, if possible.
Just a thought, I did not test it: change your query to this one:
SELECT
    @body = REPLACE(@body, name, '' + name + '')
FROM
    Tag_Library
WHERE
    @body LIKE '%' + name + '%'
This should filter the Tag_Library table down to those records which are present in the input string, so SQL Server does not have to process lots of unnecessary replaces. But it will not prevent a full table/index scan to check the table!
You can improve this solution by storing the required tags per article in a separate table (and updating that table via triggers when the source records/tables change). In that case you can use joins to filter the Tag_Library table instead of the LIKE operator, but it requires extra code to maintain the dictionary.
You're focusing on the wrong thing. The problem is that this is a scalar function, and they perform miserably. You should change it to a table-valued function that returns a single row and use APPLY.
See, for example:
http://dataeducation.com/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post/
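To make the article's pattern concrete, here is a minimal sketch of the inline table-valued function approach, with hypothetical names. The iterative all-tags REPLACE loop cannot be expressed as a single inline expression, so a single hard-coded replacement stands in for the hyperlink markup here:

-- Inline TVF returning one row; the optimizer can inline this, unlike a scalar UDF
CREATE FUNCTION dbo.ReplaceTagsInline (@body VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
    (SELECT REPLACE(@body, 'DUCHS', '[DUCHS]') AS body)
GO

-- Used with APPLY instead of calling a scalar UDF once per row
SELECT a.id, r.body
FROM article a
CROSS APPLY dbo.ReplaceTagsInline(a.body) r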

How can I insert 100k+ rows of XML into SQL Server 2012 in a fast way

I have the following scenario/requirements, and I am not sure of the best way to address them as fast as possible; I'm looking for guidance on which features to use, and examples of them, if available.
I will receive anywhere between 10k and 100k entities (in XML format) from a web service, and I want to upsert them (some rows might exist, others might not).
Here are some of the requirements:
The source of the XML is a web service that I'm calling from C# code - actually two different methods. For one of the methods, the return schema will be something flat that I can map directly to one of my tables. For the other, it will return an XML representation that I might need to work with in C# in order to map it to flat entities for my tables. In that scenario, would it be best to do the modifications needed and then write the result to an XML file to use as the source?
The returned XML can contain up to 150k entities, which may or may not exist in my tables yet, so I'm looking to upsert them. The files, when written to disk, can weigh up to 20 megabytes. I asked if they could do JSON instead of XML, but apparently that's not a choice.
The SQL database is on a different server than my IIS server, so I'd rather avoid having SQL Server retrieve the XML from a file; I'd rather pass it from C# as a string or as a table-valued parameter.
The tables are rather simple and don't have indexes other than the PK ones.
I've never been big on XML, although it got way easier with LINQ to XML, which I was initially using to parse each record and send individual inserts, but the performance was just bad. So based on some research I've been doing, I'm thinking I could:
Upsert from SQL Server through MERGE statements.
Pass the whole XML as a parameter and use OPENXML as the source in the MERGE statement.
Or somehow generate a table-valued parameter in C# and pass that to SQL to use in the MERGE.
I read in this similar question (which didn't have access to upsert/MERGE) that instead of trying to upsert directly from the XML, it might be better to insert everything into a temporary table and do the merge/upsert against the temporary table.
Would this work and be considerably fast?
If anyone has had a similar scenario, can you share your thoughts/ideas about what combination of features would be best?
Thanks.
You are on the right track. I have a similar setup using XML to transfer data between an online portal and the client-server application. The rest of the setup is very similar to what you have.
The fact that your tables are not indexed is a bit of a concern if you are comparing any fields that are not PK fields, regardless of how you index the temp tables. It is important to have either one index with all of the fields used in the merge match clause, or an index for each of them - I find the former yields better performance. Beyond that, using an XML parameter, OpenXML and temp tables is the way to go.
The following code has not been tested, so it may need a bit of debugging, but it will put you on the right track. A couple of notes: if all of the fields in the OpenXML WITH clause are attributes, then you can drop the last parameter (i.e. ", 2") and the field source specifiers (i.e. "@id" for the detail table). Although the data in your description is flat, in which case you will only need one table, I often need to import into linked records, so I have included a simple master-detail relationship example in the code below, just for the sake of completeness.
CREATE PROCEDURE usp_ImportFromXML (@data XML) AS
BEGIN
/*
<root>
<data>
<match_field_1>1</match_field_1>
<match_field_2>val2</match_field_2>
<data_1>val3</data_1>
<data_2>val4</data_2>
<detail_records>
<detail_data id="detailID1">
<detail_1>blah1</detail_1>
<detail_2>blah2</detail_2>
</detail_data>
<detail_data id="detailID2">
<detail_1>blah3</detail_1>
<detail_2>blah4</detail_2>
</detail_data>
</detail_records>
</data>
<data>
...
</data>
</root>
*/
DECLARE @iDoc INT
EXEC sp_xml_preparedocument @iDoc OUTPUT, @data

SELECT * INTO #temp
FROM OpenXML(@iDoc, '/root/data', 2) WITH (
    match_field_1 INT,
    match_field_2 VARCHAR(50),
    data_1 VARCHAR(50),
    data_2 VARCHAR(50)
)

SELECT * INTO #detail
FROM OpenXML(@iDoc, '/root/data/detail_records/detail_data', 2) WITH (
    match_field_1 INT '../../match_field_1',
    match_field_2 VARCHAR(50) '../../match_field_2',
    detail_id VARCHAR(50) '@id',
    detail_1 VARCHAR(50),
    detail_2 VARCHAR(50)
)

EXEC sp_xml_removedocument @iDoc

CREATE INDEX IX_temp ON #temp (match_field_1, match_field_2)
CREATE INDEX IX_detail ON #detail (match_field_1, match_field_2, detail_id)
MERGE data_table a
USING #temp ta
    ON ta.match_field_1 = a.match_field_1 AND ta.match_field_2 = a.match_field_2
WHEN MATCHED THEN
    UPDATE SET data_1 = ta.data_1, data_2 = ta.data_2
WHEN NOT MATCHED THEN
    INSERT (match_field_1, match_field_2, data_1, data_2)
    VALUES (ta.match_field_1, ta.match_field_2, ta.data_1, ta.data_2);

MERGE detail_table a
USING (SELECT d.*, p._key
       FROM #detail d
       JOIN data_table p ON d.match_field_1 = p.match_field_1
                        AND d.match_field_2 = p.match_field_2) ta
    ON a.id = ta.detail_id AND a.parent_key = ta._key
WHEN MATCHED THEN
    UPDATE SET detail_1 = ta.detail_1, detail_2 = ta.detail_2
WHEN NOT MATCHED THEN
    INSERT (parent_key, id, detail_1, detail_2)
    VALUES (ta._key, ta.detail_id, ta.detail_1, ta.detail_2);

DROP TABLE #temp
DROP TABLE #detail
END
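A hypothetical invocation, with the XML normally built in C# and passed in as the single parameter:

-- Sample call; the values mirror the comment block above
EXEC usp_ImportFromXML @data = N'<root>
  <data>
    <match_field_1>1</match_field_1>
    <match_field_2>val2</match_field_2>
    <data_1>val3</data_1>
    <data_2>val4</data_2>
  </data>
</root>'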
Use (3). Prepare the data for the upsert in C#. C# is made for this kind of algorithmic work; it is both the right tool and the faster one. T-SQL is not. You do not want to shred XML in T-SQL for very high performance work, because it burns CPU like crazy. Instead, use the fast TDS protocol to send a TVP or bulk data.
Then send the data to the server using either a TVP or a bulk insert (SqlBulkCopy) into a temp table. The latter technique is great for very many rows (>10k?). Bulk insert uses special TDS features; it does not use SQL batches to transfer the data. It does not get faster than this.
Then use the MERGE statement as you described. Use big batch sizes, potentially all rows in one batch.
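A rough sketch of the TVP route, with a hypothetical table type and target names (the SqlBulkCopy route would target an ordinary temp table instead, followed by the same MERGE):

-- Hypothetical names throughout; the type's columns must match the entity shape
CREATE TYPE dbo.EntityList AS TABLE (
    entity_id INT PRIMARY KEY,
    field_1   VARCHAR(50),
    field_2   VARCHAR(50)
)
GO

CREATE PROCEDURE dbo.usp_UpsertEntities (@entities dbo.EntityList READONLY) AS
BEGIN
    MERGE target_table t
    USING @entities s
        ON s.entity_id = t.entity_id
    WHEN MATCHED THEN
        UPDATE SET field_1 = s.field_1, field_2 = s.field_2
    WHEN NOT MATCHED THEN
        INSERT (entity_id, field_1, field_2)
        VALUES (s.entity_id, s.field_1, s.field_2);
END

From C#, the TVP is passed as a SqlParameter with SqlDbType.Structured, typically backed by a DataTable or an IEnumerable<SqlDataRecord>.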
The best way I've found is to bulk insert into a temp table from your C# code, then issue the MERGE once the data is in SQL Server. I have an example on my blog: SQL Server Bulk Upsert.
I use this in production to insert millions of rows daily, and have yet to find a faster way to do it. Give it a try, I think you will be impressed with the performance of the solution.

SQL Server 2000: search throughout the database

Somehow some records in my table are getting updated with a value of xyz in a certain column. Out of hundreds of stored procedures, functions and triggers, how can I determine which code is doing this? Is there a way to search through each and every script of code in the database?
Please help.
One approach is to check syscomments, which:
Contains entries for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure within the database. The text column contains the original SQL definition statements.
e.g. select text from syscomments
If you are having trouble finding that literal string, the values could be coming from a table, or they could be concatenated within a routine.
Try this
Select text from syscomments
where CharIndex('x', text) > 0
and CharIndex('y', text) > 0
and CharIndex('z', text) > 0
That might help you either find the right routine, or further indicate that the values are coming from a table.
This is going to be nearly impossible to do in SQL Server 2000 because the update might very well be from a variable that has that value or a join to another table that has that value and not hard-coded into the stored proc, trigger etc. The update could also be coming from a DTS package, a job, a piece of dynamic code run by the app or even from query analyzer, so the code itself may not be recorded inthe datbase anywhere.
Perhaps a better approach might be to create an audit table for the table in question and have it record the user and the code from the spid that generated the change as well as the old and new values. You'll have to wait until it happens again, but then you would know exactly what changed the value and what value to put it back to if need be.
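A minimal sketch of that idea, with hypothetical table and column names; the DBCC INPUTBUFFER trick shown below can be folded into the trigger to capture the offending statement as well:

-- Hypothetical audit table and trigger for a single watched column
create table dbo.YourTable_Audit (
    audit_id   int identity(1,1) primary key,
    changed_at datetime default getdate(),
    changed_by varchar(128) default suser_sname(),
    pk_col     int,
    old_value  varchar(255),
    new_value  varchar(255)
)
GO

create trigger tr_YourTable_Audit on dbo.YourTable
after update
as
begin
    -- Record old/new values only for rows where the watched column actually changed
    insert dbo.YourTable_Audit (pk_col, old_value, new_value)
    select d.pk_col, d.some_column, i.some_column
    from deleted d
    join inserted i on i.pk_col = d.pk_col
    where isnull(d.some_column, '') <> isnull(i.some_column, '')
end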
Alternatively, you could run Profiler on the system until it happens, but Profiler tends to hurt performance and is not usually a good idea to run on a production system. If the update is happening very often, it might be an acceptable alternative.
Here's a hint as to how you might get some of the info you want for the eventual trigger code you write:
create table #temp (eventtype nvarchar(1000), parameters int, eventinfo nvarchar(4000), myspid int)

declare @myspid int
select @myspid = @@spid

insert #temp (eventtype, parameters, eventinfo)
exec ('dbcc inputbuffer (@@spid)')

update #temp
set myspid = @myspid

select hostname, program_name, eventinfo
from #temp t
join sysprocesses s on t.myspid = s.spid
where spid = @myspid
You might also use SQL Profiler to trace updates of the given table/column.

Concatenating rows from different tables into one field

In a project using an MSSQL 2005 database, we are required to log all data-manipulating actions in a logging table. One field in that table is supposed to contain the row as it was before the change. We have a lot of tables, so I was trying to write a stored procedure that would gather up all the fields in one row of a given table, concatenate them somehow, and then write a new log entry with that information.
I already tried using FOR XML PATH and it worked, but the client doesn't like the XML notation; they want a CSV field.
Here's what I had with FOR XML PATH:
DECLARE @foo varchar(max);
SET @foo = (SELECT * FROM table WHERE id = 5775 FOR XML PATH(''));
The values for "table", "id" and the actual id (here: 5775) would later be passed in via the call to the stored procedure.
Is there any way to do this without getting XML notation and without knowing in advance which fields are going to be returned by the SELECT statement?
We used FOR XML PATH and, as you've discovered, it works very well. Since storing XML properly is one of SQL Server's features, the CSV requirement makes little sense.
What you could try is a stored proc that reads the XML back out in CSV format (fake it). I would. Since you won't likely be reading the data that much compared to saving it, the overhead is negligible.
How about:
Set @Foo = Stuff(
    ( Select ',' + MyCol1 + ',' + MyCol2 ...
      From Table
      Where Id = 5775
      For Xml Path('')
    ), 1, 1, '')
This will produce a CSV line (presuming the inner SQL returns a single row). Now, this solves the second part of your question. As to the first part, "without knowing in advance which fields", there is no way to do this without dynamic SQL; i.e., you have to build the SQL statement as a string on the fly. If you are going to do that, you might as well build the entire CSV result on the fly.
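A rough sketch of that dynamic-SQL route follows. The table name and id are hard-coded for illustration, and every column is converted to VARCHAR with NULLs blanked so the concatenation survives; treat it as a starting point, not a finished proc:

-- Build "col1 + ',' + col2 + ..." from the catalog, then run it
DECLARE @cols VARCHAR(MAX), @sql NVARCHAR(MAX), @foo VARCHAR(MAX);

SELECT @cols = ISNULL(@cols + ' + '','' + ', '')
             + 'ISNULL(CONVERT(VARCHAR(MAX), ' + QUOTENAME(COLUMN_NAME) + '), '''')'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourTable';

SET @sql = N'SELECT @result = ' + @cols + N' FROM YourTable WHERE id = 5775';

EXEC sp_executesql @sql, N'@result VARCHAR(MAX) OUTPUT', @result = @foo OUTPUT;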
