T-SQL xquery .modify method using a wildcard - sql-server

I am working in SQL Server 2014. I have created a stored procedure which does its processing thing, and at the end, takes the final query output and formats it as XML. Due to the nature of the logic in the procedure, sometimes a node must be deleted from the final XML output. Here's a sample of the XML output (for brevity I have not included the root nodes; hopefully they won't be required to answer my question):
<DALink>
<InteractingIngredients>
<InteractingIngredient>
<IngredientID>1156</IngredientID>
<IngredientDesc>Lactobacillus acidophilus</IngredientDesc>
<HICRoot>Lactobacillus acidophilus</HICRoot>
<PotentiallyInactive>Not necessary</PotentiallyInactive>
<StatusCode>Live</StatusCode>
</InteractingIngredient>
</InteractingIngredients>
<ActiveProductsExistenceCode>Exist</ActiveProductsExistenceCode>
<IngredientTypeBasisCode>1</IngredientTypeBasisCode>
<AllergenMatch>Lactobacillus acidophilus</AllergenMatch>
<AllergenMatchType>Ingredient</AllergenMatchType>
</DALink>
<ScreenDrug>
<DrugID>1112894</DrugID>
<DrugConceptType>RxNorm_SemanticClinicalDr</DrugConceptType>
<DrugDesc>RxNorm LACTOBACILLUS ACIDOPHILUS/PECTIN ORAL capsule</DrugDesc>
<Prospective>true</Prospective>
</ScreenDrug>
In the procedure I have code that inspects the XML structure above and deletes a node when it shouldn't be there, because it's easier to modify the XML than to tweak the query output. Here is a sample of that code:
SET @xml_Out.modify('declare namespace ccxsd="http://schemas.foobar.com/CC/v1_3"; delete //ccxsd:InteractingIngredients[../../../ccxsd:ScreenDrug/ccxsd:DrugConceptType="OneThing"]');
SET @xml_Out.modify('declare namespace ccxsd="http://schemas.foobar.com/CC/v1_3"; delete //ccxsd:InteractingIngredients[../../../ccxsd:ScreenDrug/ccxsd:DrugConceptType="AnotherThing"]');
SET @xml_Out.modify('declare namespace ccxsd="http://schemas.foobar.com/CC/v1_3"; delete //ccxsd:InteractingIngredients[../../../ccxsd:ScreenDrug/ccxsd:DrugConceptType="SomethingElse"]');
SET @xml_Out.modify('declare namespace ccxsd="http://schemas.foobar.com/CC/v1_3"; delete //ccxsd:InteractingIngredients[../../../ccxsd:ScreenDrug/ccxsd:DrugConceptType="SomethingElseAgain"]');
SET @xml_Out.modify('declare namespace ccxsd="http://schemas.foobar.com/CC/v1_3"; delete //ccxsd:InteractingIngredients[../../../ccxsd:ScreenDrug/ccxsd:DrugConceptType="RxNorm*"]');
The final command is the one I can't figure out how to make work. All I need to do is to look for instances where the element "DrugConceptType" starts with the string "RxNorm", because there are multiple versions of the string that can possibly occur.
I have Googled and StackOverFlowed at length, but perhaps because of my inexperience in this area I didn't ask the question correctly.
Is there a relatively easy way to re-write the final .modify statement above to use a wildcard after "RxNorm"?

Removing the root node is actually a problem here, since you are using the namespace prefix "ccxsd" and your XML does not show it.
Anyway, rather than writing declare namespace ... over and over, you could use this as the first line of your statement:
;WITH XMLNAMESPACES('http://schemas.fdbhealth.com/CC/v1_3' AS ccxsd)
That said, since each .modify() call is a single statement, it doesn't make much difference here.
As for your actual question about a wildcard:
This would delete every element whose content (including the text of its descendants) starts with "RxNorm":
SET @xml.modify('delete //*[fn:substring(.,1,6)="RxNorm"]');
Be aware that the namespaces are missing here; I cannot test it against your actual XML.
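To see which elements that first expression would actually hit, here is a rough Python analogue using the standard library's ElementTree. It is only a sketch under stated assumptions: the fragment is trimmed and wrapped in a hypothetical <root> element (ElementTree needs a single root, SQL Server does not), and string_value() approximates XPath's string value by concatenating descendant text while skipping whitespace-only text. It suggests the bare //* test matches DrugConceptType and DrugDesc, not InteractingIngredients, which is why the predicate needs to be anchored on the ScreenDrug sibling.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment from the question, wrapped in a hypothetical <root>.
FRAGMENT = """
<DALink>
  <InteractingIngredients>
    <InteractingIngredient>
      <IngredientID>1156</IngredientID>
      <IngredientDesc>Lactobacillus acidophilus</IngredientDesc>
    </InteractingIngredient>
  </InteractingIngredients>
  <AllergenMatch>Lactobacillus acidophilus</AllergenMatch>
</DALink>
<ScreenDrug>
  <DrugID>1112894</DrugID>
  <DrugConceptType>RxNorm_SemanticClinicalDr</DrugConceptType>
  <DrugDesc>RxNorm LACTOBACILLUS ACIDOPHILUS/PECTIN ORAL capsule</DrugDesc>
</ScreenDrug>
"""

def string_value(elem):
    # Approximation of an element's XPath string value: all descendant text
    # joined together, skipping whitespace-only text (loosely mimicking
    # SQL Server not preserving insignificant whitespace by default).
    return "".join(t for t in elem.itertext() if t.strip())

def elements_starting_with(root, prefix):
    # Analogue of //*[fn:substring(.,1,6)="RxNorm"]: every element whose
    # string value begins with the prefix.
    return [e.tag for e in root.iter() if string_value(e)[:len(prefix)] == prefix]

root = ET.fromstring("<root>" + FRAGMENT + "</root>")
print(elements_starting_with(root, "RxNorm"))  # prints ['DrugConceptType', 'DrugDesc']
```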
EDIT: A simplified working example:
You will have to check this against your actual XML (with the root and namespace).
DECLARE @xml_Out XML=
'<DALink>
<InteractingIngredients>
<InteractingIngredient>
<IngredientID>1156</IngredientID>
<IngredientDesc>Lactobacillus acidophilus</IngredientDesc>
<HICRoot>Lactobacillus acidophilus</HICRoot>
<PotentiallyInactive>Not necessary</PotentiallyInactive>
<StatusCode>Live</StatusCode>
</InteractingIngredient>
</InteractingIngredients>
<ActiveProductsExistenceCode>Exist</ActiveProductsExistenceCode>
<IngredientTypeBasisCode>1</IngredientTypeBasisCode>
<AllergenMatch>Lactobacillus acidophilus</AllergenMatch>
<AllergenMatchType>Ingredient</AllergenMatchType>
</DALink>
<ScreenDrug>
<DrugID>1112894</DrugID>
<DrugConceptType>RxNorm_SemanticClinicalDr</DrugConceptType>
<DrugDesc>RxNorm LACTOBACILLUS ACIDOPHILUS/PECTIN ORAL capsule</DrugDesc>
<Prospective>true</Prospective>
</ScreenDrug>';
SET @xml_Out.modify('delete //InteractingIngredients[fn:substring((../../ScreenDrug/DrugConceptType)[1],1,6)="RxNorm"]');
SELECT @xml_Out;
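The logic of that predicate can be mirrored outside SQL Server to check the intent: "if the first ScreenDrug/DrugConceptType next to DALink starts with RxNorm, drop InteractingIngredients." The following is a minimal Python/ElementTree sketch, assuming the simplified, namespace-free layout of the example above and a hypothetical <root> wrapper; the function name is illustrative only, and str.startswith plays the role of fn:substring(...,1,6)="RxNorm".

```python
import xml.etree.ElementTree as ET

# Simplified fragment from the answer, wrapped in a hypothetical <root>.
XML = """<root>
<DALink>
  <InteractingIngredients>
    <InteractingIngredient><IngredientID>1156</IngredientID></InteractingIngredient>
  </InteractingIngredients>
  <AllergenMatch>Lactobacillus acidophilus</AllergenMatch>
</DALink>
<ScreenDrug>
  <DrugID>1112894</DrugID>
  <DrugConceptType>RxNorm_SemanticClinicalDr</DrugConceptType>
</ScreenDrug>
</root>"""

def delete_if_concept_type_starts_with(root, prefix):
    # Like (../../ScreenDrug/DrugConceptType)[1]: the first match under the
    # grandparent of InteractingIngredients. A missing node compares as "".
    concept = root.find("ScreenDrug/DrugConceptType")
    if concept is None or not (concept.text or "").startswith(prefix):
        return 0
    removed = 0
    for dalink in root.findall("DALink"):
        # findall returns a list, so removing while looping is safe.
        for node in dalink.findall("InteractingIngredients"):
            dalink.remove(node)
            removed += 1
    return removed

root = ET.fromstring(XML)
print(delete_if_concept_type_starts_with(root, "RxNorm"))  # prints 1
print(root.find("DALink/InteractingIngredients"))           # prints None
```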

Related

Loading data in Snowflake using bind variables

We're building dynamic data loading statements for Snowflake using the Python interface.
We want to create a stage at query runtime and use that stage in a subsequent statement. Table and stage names are made dynamic using bind variables.
Yet we can't seem to find the correct syntax, even though we tried everything on https://docs.snowflake.com/en/user-guide/python-connector-api.html
COPY INTO IDENTIFIER( %(table_name)s )(SRC, LOAD_TIME, ROW_HASH)
FROM (SELECT t.$1, CURRENT_TIMESTAMP(0), MD5(t.$1) FROM "'%(stage_name)s'" t)
PURGE = TRUE;
Is this even possible? Does it work for anyone?
Your code does not create a stage as you mentioned, and you don't need to create one: use the table stage or the user stage instead. The SQL below uses the table stage.
You also need to change your syntax a little and use a more Pythonic approach: f-strings.
sql = f"""COPY INTO {table_name} (SRC, LOAD_TIME, ROW_HASH)
FROM (SELECT t.$1, CURRENT_TIMESTAMP(0), MD5(t.$1) FROM @%{table_name} t)
PURGE = TRUE"""
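One caveat with f-strings is that, unlike bind variables, they paste text into the statement verbatim, so the table name must be trusted or validated first. A minimal sketch follows, assuming table names are plain unquoted Snowflake identifiers (letter or underscore first, then letters, digits, _ or $); the regex, the build_copy_sql helper and the RAW_EVENTS name are our own illustrations, not part of the Snowflake connector.

```python
import re

# Assumption: only simple unquoted identifiers are accepted. This guard is
# our own, not a connector feature.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

def build_copy_sql(table_name: str) -> str:
    # Reject anything that is not a plain identifier before interpolating it.
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    # @%<table> is the table's own stage, so no CREATE STAGE is needed.
    return f"""COPY INTO {table_name} (SRC, LOAD_TIME, ROW_HASH)
FROM (SELECT t.$1, CURRENT_TIMESTAMP(0), MD5(t.$1) FROM @%{table_name} t)
PURGE = TRUE"""

print(build_copy_sql("RAW_EVENTS"))
```

The resulting string would then be passed to cursor.execute() as usual; anything that fails the check raises before reaching Snowflake.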

Updating the xml data nodes by passing the SQL variable

I have the following XML, and I need to update the node specified by a parameter:
<DOPremium>
<BasePremium>337500</BasePremium>
<TotalPremium>337500</TotalPremium>
<NettPremium>337500</NettPremium>
<GrossPremium>337500</GrossPremium>
<OptionId>0</OptionId>
</DOPremium>
<DOPremium>
<BasePremium>337500</BasePremium>
<TotalPremium>337500</TotalPremium>
<NettPremium>337500</NettPremium>
<GrossPremium>337500</GrossPremium>
<OptionId>1</OptionId>
</DOPremium>
<DOPremium>
<BasePremium>337500</BasePremium>
<TotalPremium>337500</TotalPremium>
<NettPremium>337500</NettPremium>
<GrossPremium>337500</GrossPremium>
<OptionId>2</OptionId>
</DOPremium>
I'm trying to update the respective nodes based on which DOPremium object is selected, but I'm not able to do that. Can someone tell me where I'm going wrong?
SET @NewXmlValue = N' <BasePremium>[sql:variable("@R15_premium")]</BasePremium>'
SET @DataXml.modify('delete /*/Premiums/DOPremium/BasePremium[sql:variable("@OptionID")]')
SET @DataXml.modify('insert sql:variable("@NewXmlValue") into (/*/Premiums/DOPremium[sql:variable("@OptionID")])[1]')
-- Add TotalPremium
SET @NewXmlValue = N' <TotalPremium>[sql:variable("@R15_premium")]</TotalPremium>'
SET @DataXml.modify('delete /*/Premiums/DOPremium/TotalPremium[sql:variable("@OptionID")]')
SET @DataXml.modify('insert sql:variable("@NewXmlValue") into (/*/Premiums/DOPremium[sql:variable("@OptionID")])[1]')
OK, first of all, this can't work:
SET @NewXmlValue = N'<BasePremium>[sql:variable("@R15_premium")]</BasePremium>'
sql:variable() is only interpreted as a function in an XQuery operation. This isn't an XQuery operation, so it will just textually insert sql:variable(...). If you want an actual XML node with the text value of the variable, you have to be a little more roundabout:
SET @NewXmlValue = '';
SET @NewXmlValue = (SELECT @NewXmlValue.query('<BasePremium>{sql:variable("@R15_premium")}</BasePremium>'));
This approach (and others) can be found in the docs. (In this very simple case concatenating the strings in T-SQL also works, of course, but in general that's not a good idea because it doesn't take care of escaping the XML when necessary.)
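The escaping problem is language-independent, so a small Python sketch can show it. The value below is hypothetical and chosen to contain XML-special characters. Building the element through an XML API (ElementTree here, analogous to the .query() constructor above) escapes the text, whereas naive concatenation yields a string that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical value containing characters that are special in XML.
value = "5 < 6 & 7"

# Naive string concatenation: no escaping happens.
naive = "<BasePremium>" + value + "</BasePremium>"

# Building the node through an XML API escapes the text automatically.
elem = ET.Element("BasePremium")
elem.text = value
escaped = ET.tostring(elem, encoding="unicode")
print(escaped)  # prints <BasePremium>5 &lt; 6 &amp; 7</BasePremium>

# The naive string is not well-formed XML, so a parser rejects it.
try:
    ET.fromstring(naive)
except ET.ParseError as err:
    print("naive string rejected:", err)
```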
The syntax for selecting the desired DOPremium node also needs work: /BasePremium[sql:variable("@OptionID")] is legal, but it means "the BasePremium node that, numbering sequentially from 1, is at position @OptionID". If @OptionID is supposed to match the text of OptionId, that's not the way to write it.
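The difference between a positional predicate and a value predicate can be checked with Python's ElementTree, whose limited XPath supports both [n] and [child='text']. This is a sketch, assuming the DOPremium nodes sit under the Premiums element implied by the question's path, with the other children trimmed. With an option id of 1, the positional form picks the first DOPremium (OptionId 0), not the one whose OptionId is 1.

```python
import xml.etree.ElementTree as ET

# The DOPremium nodes from the question (other children trimmed).
XML = """<Premiums>
  <DOPremium><BasePremium>337500</BasePremium><OptionId>0</OptionId></DOPremium>
  <DOPremium><BasePremium>337500</BasePremium><OptionId>1</OptionId></DOPremium>
  <DOPremium><BasePremium>337500</BasePremium><OptionId>2</OptionId></DOPremium>
</Premiums>"""

premiums = ET.fromstring(XML)
option_id = 1

# Positional predicate: the option_id-th DOPremium, counting from 1.
by_position = premiums.find(f"DOPremium[{option_id}]")
# Value predicate: the DOPremium whose OptionId child has that text.
by_value = premiums.find(f"DOPremium[OptionId='{option_id}']")

print(by_position.findtext("OptionId"))  # prints 0
print(by_value.findtext("OptionId"))     # prints 1
```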
If your intent was to say "change the contents of the BasePremium element of the DOPremium node whose OptionId text equals @OptionID to the value @R15_premium", here's how you do that (well, one way to do that):
SET #DataXml.modify('
replace value of (
/*
/Premiums
/DOPremium[child::OptionId/.=sql:variable("@OptionID")]
/BasePremium
/text()
)[1]
with sql:variable("@R15_premium")')
And something similar for TotalPremium. You can, of course, also replace entire nodes, but that seems unnecessary here.
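The same "find DOPremium by OptionId, then overwrite the text" idea, applied to both BasePremium and TotalPremium, can be sketched in Python with ElementTree. This is an illustration rather than the T-SQL solution: the <Root> wrapper stands in for the unknown /* element, and replace_premiums is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Root> stands in for the /* element in the question's path.
XML = """<Root><Premiums>
  <DOPremium><BasePremium>337500</BasePremium><TotalPremium>337500</TotalPremium><OptionId>0</OptionId></DOPremium>
  <DOPremium><BasePremium>337500</BasePremium><TotalPremium>337500</TotalPremium><OptionId>1</OptionId></DOPremium>
</Premiums></Root>"""

def replace_premiums(root, option_id, new_value):
    # Select the DOPremium whose OptionId text equals option_id, then
    # overwrite the text of its BasePremium and TotalPremium children.
    target = root.find(f"Premiums/DOPremium[OptionId='{option_id}']")
    if target is None:
        return False  # no matching DOPremium: leave the document unchanged
    for name in ("BasePremium", "TotalPremium"):
        target.find(name).text = str(new_value)
    return True

root = ET.fromstring(XML)
replace_premiums(root, 1, 400000)
print(ET.tostring(root, encoding="unicode"))
```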

Solr deletions with custom full import

I'm trying to use the DataImportHandler to keep my index in sync with a SQL database (what I would think is a pretty vanilla thing to do). Since my database will be pretty large, I want to use incremental imports using this method http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport so the calls are of the form http://localhost:8983/solr/Items/dataimport?command=full-import&clean=false. This works perfectly well for adding items.
I have a separate DeletedItems table in my database which contains the primary keys of the items that have been deleted from the Items table, along with when they were deleted. As part of the DataImport call, I had hoped to be able to delete the relevant items from my index based on a query along the lines of
SELECT Id FROM DeletedItems WHERE WasDeletedOn > '${dataimporter.last_index_time}'
but I can't figure out how to do this. The link above alludes to it with the cryptic
In this case it means obviously that in case you also want to use deletedPkQuery then when running the delta-import command is still necessary.
but setting deletedPkQuery to the above SQL query doesn't seem to work. I then read that deletedPkQuery only works with delta-imports, so I am forced to make two requests to my solr server as part of the sync process? This doesn't seem right as the operations are parameterized by the dataimporter.last_index_time property, which changes. Both steps would need to be done in one "atomic" action, surely? Any ideas?
You need to use the DataImportHandler's special commands:
https://wiki.apache.org/solr/DataImportHandler#Special_Commands
With these commands you can alter the boost of, or delete, a document coming from the result set of the full-import query. Be aware that you must set the $skipDoc field so that the document doesn't get indexed again, and that you must repeat the id in the $deleteDocById field.
You can use a UNION query:
select
id,
text,
'false' as [$deleteDocById],
'false' as [$skipDoc]
from [rows to update or add]
Union Select
id,
'' as text,
id as [$deleteDocById],
'true' as [$skipDoc]
from [rows to delete]
or a CASE WHEN expression:
select
id,
text,
CASE
when deleted = 1 then id
else 'false'
END as [$deleteDocById],
CASE
when deleted = 1 then 'true'
else 'false'
END as [$skipDoc]
from [rows to update, add or delete]
Where updated > '${dih.last_index_time}'
The deletedPkQuery is run as part of the regular call to delta-import, so you don't have to run anything twice (and when doing a full-import, there's no need to run deletedPkQuery, since the whole index is cleared before importing anyway).
The deletedPkQuery should be configured on the same element as your main query. Be sure to match the field names exactly as well, and that the id produced by your deletedPkQuery matches the one provided by the main query.
There's a minimal example on solr.pl for importing and deleting documents using the same deleted-entries table structure you have here:
<entity
name="album"
query="SELECT * from albums"
deletedPkQuery="SELECT deleted_id as id FROM deletes WHERE deleted_at > '${dataimporter.last_index_time}'"
>
Also make sure that the format of the deleted_at-field is comparable against the value produced by last_index_time. The default is yyyy-MM-dd HH:mm:ss.
And lastly, remember that the last_index_time property isn't available before the second time the task is run, since there's no "previous index time" the first time an index is populated (but the deletedPkQuery shouldn't run before that anyway).
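On the point about deleted_at being comparable with last_index_time: if deleted_at happens to be stored as text, the default yyyy-MM-dd HH:mm:ss format compares correctly as a plain string, because its fields are zero-padded and ordered from most to least significant. A small Python sketch with hypothetical timestamps illustrates this, along with how a differently formatted date breaks the comparison.

```python
from datetime import datetime

# DIH's default last_index_time format (yyyy-MM-dd HH:mm:ss) as a strftime pattern.
DIH_FORMAT = "%Y-%m-%d %H:%M:%S"

last_index_time = datetime(2013, 7, 29, 11, 20, 53).strftime(DIH_FORMAT)

# Hypothetical deleted_at values, stored as text in the same format.
deleted_at_text = [
    datetime(2013, 7, 29, 9, 5, 0).strftime(DIH_FORMAT),
    datetime(2013, 7, 30, 8, 0, 0).strftime(DIH_FORMAT),
]

# String comparison agrees with chronological order for this format.
newer = [t for t in deleted_at_text if t > last_index_time]
print(newer)  # prints ['2013-07-30 08:00:00']

# A US-style date sorts before it as a string, although it is a day later.
print("07/30/2013 08:00:00" > last_index_time)  # prints False
```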

SQL Server XQuery with Default Namespace

I've got some XML Data in a SQL Server Table in an XML Column as follows:
<AffordabilityResults>
<matchlevel xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">IndividualMatch</matchlevel>
<searchdate xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">2013-07-29T11:20:53</searchdate>
<searchid xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">{E40603B5-B59C-4A6A-92AB-98DE83DB46E7}</searchid>
<calculatedgrossannual xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">13503</calculatedgrossannual>
<debtstress xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">
<incomedebtratio>
<totpaynetincome>0.02</totpaynetincome>
<totamtunsecured>0.53</totamtunsecured>
<totamtincsec>0.53</totamtincsec>
</incomedebtratio>
</debtstress>
</AffordabilityResults>
You'll note that some of the elements have an xmlns attribute and some don't...
I need to write queries to return the data - and more importantly show a business analyst how to write her own queries to get the data she needs so I want it to be as simple as possible.
I can query the data easily using the WITH XMLNAMESPACES element as follows:
WITH XMLNAMESPACES (N'urn:callcredit.co.uk/soap:affordabilityapi2' as x )
SELECT
ResponseXDoc.value('(/AffordabilityResults/x:matchlevel)[1]','varchar(max)' ) AS MatchLevel
, ResponseXDoc.value('(/AffordabilityResults/x:debtstress/x:incomedebtratio/x:totamtunsecured)[1]','nvarchar(max)' ) AS UnsecuredDebt
FROM [NewBusiness].[dbo].[t_TacResults]
But adding the x: part to the query makes it look overly complicated, and I want to keep it simple for the business analyst.
I tried adding:
WITH XMLNAMESPACES (DEFAULT 'urn:callcredit.co.uk/soap:affordabilityapi2' )
and removing the x: from the XQuery - but this returns null (possibly because of the lack of the xmlns on the root element?)
Is there any way I can simplify these queries either with or without the default namespace?
If namespaces are not important in your use case, you could use the namespace wildcard selector *:, which selects nodes both without a namespace and with any namespace.
An example query could be
(/*:AffordabilityResults/*:matchlevel)[1]
The business analyst will still have to add the selector in front of every node test, but it's the same "prefix" all the time and the only error to be expected is forgetting to use it somewhere.
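For comparison, Python's ElementTree (3.8+) has an equivalent {*} wildcard, which makes it easy to see the namespace layout of this document. The sketch uses a trimmed copy of the question's XML. It shows that the root element is in no namespace, which is consistent with the question's guess about why the DEFAULT namespace returned null, and that the xmlns on debtstress carries down to its unprefixed descendants.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the AffordabilityResults document from the question.
XML = """<AffordabilityResults>
  <matchlevel xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">IndividualMatch</matchlevel>
  <debtstress xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">
    <incomedebtratio><totamtunsecured>0.53</totamtunsecured></incomedebtratio>
  </debtstress>
</AffordabilityResults>"""

root = ET.fromstring(XML)

# The root element has no namespace.
print(root.tag)  # prints AffordabilityResults

# The children are namespaced, so a plain name does not match them...
print(root.find("matchlevel"))  # prints None

# ...while {*} matches any namespace or none, much like *: in XQuery.
print(root.findtext("{*}matchlevel"))  # prints IndividualMatch
print(root.findtext("{*}debtstress/{*}incomedebtratio/{*}totamtunsecured"))  # prints 0.53
```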

How do I search a "Property Bag" table in SQL?

I have a basic "property bag" table that stores attributes about my primary table "Card." So when I want to start doing some advanced searching for cards, I can do something like this:
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE dbo.CardProperty.IdPrp = 3 AND dbo.CardProperty.Value = 'Fiend'
INTERSECT
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE (dbo.CardProperty.IdPrp = 10 AND (dbo.CardProperty.Value = 'Wind' OR dbo.CardProperty.Value = 'Fire'))
What I need to do is to extract this idea into some kind of stored procedure, so that ideally I can pass in a list of property/value combinations and get the results of the search.
Initially this is going to be a "strict" search, meaning that results must match all of the conditions in the query, but I'd also like a "loose" search that matches cards satisfying any of the conditions.
I can't quite seem to wrap my head around this one. My previous version of this generated a massive SQL query with a lot of AND/OR clauses to execute, but I'm hoping to do something a little more elegant this time. How do I go about doing this?
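The strict/loose distinction comes down to counting how many of the requested property filters each card satisfies. The following is a small Python sketch of those semantics, using hypothetical CardProperty rows and the question's two filters (property 3 = 'Fiend', property 10 in 'Wind'/'Fire'). Strict means every filter is satisfied, which matches the INTERSECT in the question. Loose means at least one filter is satisfied. In SQL, the same counting idea is commonly expressed with GROUP BY and HAVING.

```python
from collections import defaultdict

# Hypothetical CardProperty rows: (IdCrd, IdPrp, Value).
rows = [
    (1, 3, "Fiend"), (1, 10, "Wind"),
    (2, 3, "Fiend"), (2, 10, "Water"),
    (3, 3, "Dragon"), (3, 10, "Fire"),
]

# Each filter is (property id, set of accepted values).
filters = [(3, {"Fiend"}), (10, {"Wind", "Fire"})]

def search(rows, filters, strict=True):
    # For every card, record which filters at least one of its rows satisfies.
    satisfied = defaultdict(set)
    for card, prop, value in rows:
        for i, (want_prop, values) in enumerate(filters):
            if prop == want_prop and value in values:
                satisfied[card].add(i)
    # Strict: all filters satisfied. Loose: at least one.
    needed = len(filters) if strict else 1
    return sorted(card for card, hits in satisfied.items() if len(hits) >= needed)

print(search(rows, filters, strict=True))   # prints [1]
print(search(rows, filters, strict=False))  # prints [1, 2, 3]
```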
It seems to me that you have an EAV model here.
If you're using SQL Server 2005 and up, I'd suggest you use the XML data type for this:
http://weblogs.sqlteam.com/mladenp/archive/2006/10/14/14032.aspx
It makes searching much easier with the built-in XML querying capabilities.
If you can't change your model, then look at this:
http://weblogs.sqlteam.com/davidm/articles/12117.aspx
