Alternate nodes in a SQL Server XQuery path? - sql-server

Is there any way to have a sort of "alternates group" like in Regular Expressions, in an XQuery path in SQL Server?
I have this query...
SELECT Q.ROWID QUEUEID, Q.DOCUMENTPACKAGETYPE,
B.R.value('@_ID', 'NAME') PARTYID,
B.R.value('@_BorrowerID', 'NAME') BORROWERID,
B.R.value('@_Name', 'NAME') NAME,
B.R.value('@_EmailAddress', 'NAME') EMAILADDRESS
FROM docutech.QUEUE_EX Q
CROSS APPLY Q.DATA.nodes('LOAN_PUSHBACK_PACKAGE/EVENT_DATA/ESIGN/PARTY') AS B(R)
WHERE Q.REASONFORPUSHBACK = 'DocumentDistribution' AND B.R.value('@_Type', 'NAME') = 'Borrower'
But what I need is for the ESIGN node in the CROSS APPLY path to match either ESIGN or ECLOSE. So I am looking to do something like the following (thinking in RegEx terms)...
CROSS APPLY Q.DATA.nodes('LOAN_PUSHBACK_PACKAGE/EVENT_DATA/(ESIGN)|(ECLOSE)/PARTY') AS B(R)
Is there any way to do something like this? I'd really hate to have to repeat the same query twice, just for that simple difference, though maybe XQuery doesn't support options like that?
Actually, I just found I can use an asterisk, which will match both, but I'd LIKE to be able to limit it to those known node values if possible. If not, I guess that will do.

I think I understand what you need. Here is a conceptual example for you.
The XPath predicate expression is checking that the element names at a particular level belong to a sequence of specified names. The <SomethingElse> element is not a member of the sequence, that's why its data is not retrieved.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<LOAN_PUSHBACK_PACKAGE>
<EVENT_DATA>
<ESIGN>
<PARTY _Name="one"/>
</ESIGN>
<ECLOSE>
<PARTY _Name="two"/>
</ECLOSE>
<SomethingElse>
<PARTY _Name="three"/>
</SomethingElse>
</EVENT_DATA>
</LOAN_PUSHBACK_PACKAGE>');
-- DDL and sample data population, end
SELECT c.value('@_Name','VARCHAR(20)') AS [Name]
FROM @tbl
CROSS APPLY xmldata.nodes('/LOAN_PUSHBACK_PACKAGE/EVENT_DATA/*[local-name(.)=("ESIGN","ECLOSE")]/PARTY') AS t(c);
Output
+------+
| Name |
+------+
| one |
| two |
+------+
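If you prefer to spell the test out, the same filter can also be written with an explicit "or" in the predicate. This is only a sketch against a copy of the sample document above; the @doc variable and the ParentNode column are names made up for the illustration.

```sql
-- Hypothetical sample document, mirroring the structure used above
DECLARE @doc XML = N'<LOAN_PUSHBACK_PACKAGE>
  <EVENT_DATA>
    <ESIGN><PARTY _Name="one"/></ESIGN>
    <ECLOSE><PARTY _Name="two"/></ECLOSE>
    <SomethingElse><PARTY _Name="three"/></SomethingElse>
  </EVENT_DATA>
</LOAN_PUSHBACK_PACKAGE>';

-- Keep only PARTY nodes whose parent element is named ESIGN or ECLOSE;
-- local-name(..) reports which of the two parents each row came from
SELECT p.value('@_Name', 'VARCHAR(20)')         AS [Name],
       p.value('local-name(..)', 'VARCHAR(20)') AS ParentNode
FROM @doc.nodes('/LOAN_PUSHBACK_PACKAGE/EVENT_DATA/*[local-name()="ESIGN" or local-name()="ECLOSE"]/PARTY') AS t(p);
```

Returning the parent name alongside the value can be handy if you later need to treat the ESIGN and ECLOSE rows differently without running the query twice.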

Related

Split a database table column value of type VARCHAR into multiple values using a delimiter?

Can you split a database table column value of type VARCHAR into multiple values using a delimiter?
So when fetching data I would like to split values like text|another|other using pipe | as a delimiter.
We are using stored procedures and T-SQL. I have a somewhat basic understanding of SQL, but my understanding is that this should be possible using STRING_SPLIT https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15.
Can anyone assist me with more details?
It is definitely possible, and pretty straightforward with STRING_SPLIT. Even before SQL Server 2016, when STRING_SPLIT was introduced, it was possible, just not as pretty.
An example with your string would be something like this:
DECLARE @t TABLE (ID INT, StringVal VARCHAR(1000));
INSERT @t
VALUES (1, 'text|another|other');
SELECT *
FROM @t
CROSS APPLY STRING_SPLIT(StringVal, '|');
The results of this are
ID| StringVal | value
--------------------------------
1 | text|another|other | text
1 | text|another|other | another
1 | text|another|other | other
So to answer your question, yes it is definitely possible. However, you should certainly take the comment discussion into consideration.
Well, as Panagiotis Kanavos said in a comment, the best and easiest approach would be to do this in the 'Transform' part of your ETL (Extract, Transform, Load).
When you do this, you don't even have to deal with SQL: since you are on SQL Server, I assume you use SSIS. You can call an external C# or .NET script to do all the work.
To answer your question: yes, T-SQL provides a lot of built-in functions for dealing with strings. You can use SUBSTRING() + CHARINDEX(), or STRING_SPLIT, or even write your own.
Microsoft doc about String Functions : https://learn.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql?view=sql-server-ver15
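As a rough illustration of the SUBSTRING() + CHARINDEX() route, here is a minimal sketch that peels the first token off a pipe-delimited string. The @s variable and the column aliases are made up for the example.

```sql
-- Hypothetical input value, same shape as in the question
DECLARE @s VARCHAR(100) = 'text|another|other';

-- Appending the delimiter guarantees CHARINDEX finds one,
-- even when the string contains no pipe at all
SELECT SUBSTRING(@s, 1, CHARINDEX('|', @s + '|') - 1)       AS FirstToken,
       SUBSTRING(@s, CHARINDEX('|', @s + '|') + 1, LEN(@s)) AS Remainder;
```

Repeating this step on the remainder (in a loop or a recursive CTE) is how splitting was typically done by hand before STRING_SPLIT was available.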
The short answer is: You can do it.
The longer answer is: You shouldn't do it, because it violates the principle of database normalization. Normalizing up to 3NF is (arguably) the bare minimum you should stick to.
If the data enters your database from an external source, Transform it in your ETL process before it actually enters your database.
Your database should then store the data in relational form, and other transformations to fit your frontend should be done upon retrieval to make it fit your consuming applications. You can read up on the ANSI-SPARC Architecture if you so desire.
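To make the "store it in relational form" advice a bit more concrete, here is a minimal sketch, assuming SQL Server 2016 or later for STRING_SPLIT. The table variables @Source and @ItemValue and their columns are hypothetical names, not anything from your schema.

```sql
-- Hypothetical staging data holding the delimited strings
DECLARE @Source TABLE (ID INT PRIMARY KEY, StringVal VARCHAR(1000));
INSERT INTO @Source (ID, StringVal)
VALUES (1, 'text|another|other');

-- Hypothetical child table: one row per (item, value) pair
DECLARE @ItemValue TABLE (
    ItemID INT          NOT NULL,
    Val    VARCHAR(100) NOT NULL,
    PRIMARY KEY (ItemID, Val)
);

-- Split once, at load time, instead of on every read
INSERT INTO @ItemValue (ItemID, Val)
SELECT s.ID, ss.value
FROM @Source AS s
CROSS APPLY STRING_SPLIT(s.StringVal, '|') AS ss;

-- Individual values can now be filtered and joined directly
SELECT ItemID, Val
FROM @ItemValue
WHERE Val = 'another';
```

With the values stored one per row, lookups no longer need any string handling at query time.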
Please check the below answer. This might help :
DECLARE @Table TABLE (ID int, SampleData Varchar(100))
INSERT @Table
(ID,SampleData)
VALUES
(1, 'text|another|other'),
(2, 'sample|data|again'),
(3, 'this|should|help')
SELECT
A.ID AS ID,
Split.a.value('.', 'VARCHAR(100)') AS SampleData
FROM (
SELECT
ID,
CAST ('<M>' + REPLACE(SampleData, '|', '</M><M>') + '</M>' AS XML) AS SampleData
FROM
@Table A
) AS A
CROSS APPLY SampleData.nodes ('/M') AS Split(a)
The Output:-
ID SampleData
1 text
1 another
1 other
2 sample
2 data
2 again
3 this
3 should
3 help
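One thing to watch with the XML approach: the CAST fails if the data contains characters that are special in XML, such as & or <. A possible workaround, sketched here with a made-up @s variable, is to let FOR XML PATH escape the text before wrapping it in the <M> tags.

```sql
-- Hypothetical input containing characters that are special in XML
DECLARE @s VARCHAR(100) = 'R&D|a<b|plain';

SELECT m.n.value('.', 'VARCHAR(100)') AS Part
FROM (
    -- SELECT ... AS [*] FOR XML PATH('') returns the text with & and < entitized,
    -- so the CAST to XML below no longer fails on them
    SELECT CAST('<M>'
                + REPLACE((SELECT @s AS [*] FOR XML PATH('')), '|', '</M><M>')
                + '</M>' AS XML) AS x
) AS src
CROSS APPLY src.x.nodes('/M') AS m(n);
```

The value() call turns the entities back into the original characters, so the parts should come out as they were typed.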

not able to identify difference between same value

I have data in a table column. I SELECT DISTINCT on that column, and I also apply LTRIM(RTRIM(col_name)) in the SELECT, but I still get duplicate values.
How can we identify why it is happening and how we can avoid it?
I tried RTRIM, LTRIM, UPPER function. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc​tion
Decommissioned
Non-Production
Unsupported Edition
Looks like there's a Unicode character in there somewhere. I copied and pasted the values out initially as a varchar, and did the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Produc​tion'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning that there is a Unicode character between the c and t.
So, let's find out what it is:
SELECT 'Pre-Produc​tion', N'Pre-Produc​tion',
UNICODE(SUBSTRING(N'Pre-Produc​tion',11,1));
The UNICODE function returns 8203, which is a zero-width space. I assume you want to remove these, so you can update your data by doing:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
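If you want to see which rows are affected before updating, something along these lines may help. It is a sketch only: the @inv table variable stands in for SQLInventory, and the explicit binary collation is an assumption on my part, added because some collations may treat the zero-width space as ignorable in comparisons.

```sql
-- Hypothetical stand-in for SQLInventory, with one clean and one polluted value
DECLARE @inv TABLE (serverstatus NVARCHAR(100));
INSERT INTO @inv (serverstatus)
VALUES (N'Pre-Production'),
       (N'Pre-Produc' + NCHAR(8203) + N'tion');

-- A binary collation compares code points exactly,
-- so the zero-width space cannot be skipped by linguistic rules
SELECT serverstatus,
       LEN(serverstatus) AS CharCount,
       REPLACE(serverstatus COLLATE Latin1_General_BIN2, NCHAR(8203), N'') AS Cleaned
FROM @inv
WHERE serverstatus COLLATE Latin1_General_BIN2 LIKE N'%' + NCHAR(8203) + N'%';
```

Only the row containing the hidden character should come back, and its CharCount will be one higher than the visible text suggests.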
(I also suggest you might therefore want a lookup table for your statuses, with a foreign key, so that this can't happen again.)
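I deal with this type of thing all the time. For stuff like this, NGrams8K, PatReplace8K and PATINDEX are your best friends. (NGrams8K and PatReplace8K are user-defined functions that you have to install yourself; they are not built into SQL Server.)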
Putting what you posted in a table variable we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Produc​tion'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID txt
----------- -------------------------------------------
10 Pre-Produc​tion
To identify the bad character we can use NGrams8k like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID txt position token
------ ----------------- -------------------- ---------
10 Pre-Produc​tion 11 ?
PatReplace8K makes cleaning up stuff like this quite easy and quick. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString NewString
------------------ ----------------
Pre-Produc?tion Pre-Production
To fix the problem you can use patreplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;

Need to be able to check column by one by one and for a certain string but can't figure out a way to do so

Is there a way for me to have a string and check it against multiple columns in another table, in order starting from 1, until I get a match?
I have table with a few fields
Medicine
-----------
Advil
Tylenol
Midol
I need to check each medicine above against another table, going through its columns in order.
MedsToTry1 | MedsToTry2 | MedsToTry3 | MedsToTry4 | MedsToTry5 | MedsToTry6
-----------|------------|------------|------------|------------|-----------
NotAdvil   | Advil      | Null       | Null       | Null       | Null
Tylenol    | Ibuprofen  | NotTylenol | Null       | Null       | Null
NotMidol   | NotAdvil   | Ibuprofen  | Midol      | Null       | Null
So I have to go through each one of the fields in the first table and search for them in the 'MedsToTry1' field if not there then on 'MedsToTry2' and so on until found.
I've tried concatenating all the strings in the MedsToTry fields and searching for the string in there, but that doesn't guarantee the columns are checked in order, and I need 'MedsToTry1' to be checked first.
I tried to use COALESCE but it returns the fields on MedsToTry1 since they're all not null but won't go to MedsToTry2 to see if it's there.
Is there a way for me to do this? Have a String and check within multiple columns in another table in order starting from 1 until I get a match?
If I need to provide more information please let me know. I'm pretty new to SQL, so I'll take any and all help I can get.
Thank you.
Your table is, quite probably, not the real source table. If you can query the real source instead, that would be much better. My guess, again, is that this is the result of a pivot operation, where a row-wise table is transformed into a side-by-side or column-wise format.
You did not state the expected output. For the next time, or to improve this question, please look at my code and try to prepare a stand-alone example to reproduce your issue yourself. This time I've done it for you:
First we have to declare mockup-tables to simulate your issue:
DECLARE @tblA TABLE(Medicine VARCHAR(100));
INSERT INTO @tblA VALUES
('Advil')
,('Tylenol')
,('Midol');
DECLARE @tblB TABLE(RowId INT,MedsToTry1 VARCHAR(100),MedsToTry2 VARCHAR(100),MedsToTry3 VARCHAR(100),MedsToTry4 VARCHAR(100),MedsToTry5 VARCHAR(100),MedsToTry6 VARCHAR(100));
INSERT INTO @tblB VALUES
(1,'NotAdvil','Advil',Null,Null ,Null,Null)
,(2,'Tylenol','Ibuprofen','NotTylenol',Null ,Null,Null)
,(3,'NotMidol','NotAdvil','Ibuprofen','Midol',Null,Null);
--This is the query (use your own table names to test this in your environment)
SELECT B.RowId
,C.*
FROM @tblB b
CROSS APPLY (VALUES(1,b.MedsToTry1)
,(2,b.MedsToTry2)
,(3,b.MedsToTry3)
,(4,b.MedsToTry4)
,(5,b.MedsToTry5)
,(6,b.MedsToTry6)) C(MedRank,Medicin)
WHERE EXISTS(SELECT 1 FROM @tblA A WHERE A.Medicine=C.Medicin);
The idea in short:
The trick with CROSS APPLY (VALUES... will return each name-numbered column (MedsToTry1, MedsToTry2...) in one row, together with a rank. This way we do not lose the information of the sort order or the position within the table.
The WHERE will reduce the set to rows, where there exists a corresponding medicine in the other table.
The result
RowId MedRank Medicin
1 2 Advil
2 1 Tylenol
3 4 Midol
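If a row can contain more than one matching medicine and you only want the first one in column order, a variation like the following might do it. This is a sketch under that assumption; the sample data is altered so that row 1 has two matches (Advil in column 2 and Tylenol in column 3).

```sql
DECLARE @tblA TABLE (Medicine VARCHAR(100));
INSERT INTO @tblA VALUES ('Advil'), ('Tylenol'), ('Midol');

DECLARE @tblB TABLE (RowId INT, MedsToTry1 VARCHAR(100), MedsToTry2 VARCHAR(100), MedsToTry3 VARCHAR(100),
                     MedsToTry4 VARCHAR(100), MedsToTry5 VARCHAR(100), MedsToTry6 VARCHAR(100));
INSERT INTO @tblB VALUES
 (1, 'NotAdvil', 'Advil',     'Tylenol',    NULL,    NULL, NULL)  -- two candidates in this row
,(2, 'Tylenol',  'Ibuprofen', 'NotTylenol', NULL,    NULL, NULL)
,(3, 'NotMidol', 'NotAdvil',  'Ibuprofen',  'Midol', NULL, NULL);

SELECT b.RowId, m.MedRank, m.Medicin
FROM @tblB AS b
CROSS APPLY (
    -- Unpivot the six columns with their position, keep only known medicines,
    -- then take the one with the lowest position
    SELECT TOP (1) v.MedRank, v.Medicin
    FROM (VALUES (1, b.MedsToTry1), (2, b.MedsToTry2), (3, b.MedsToTry3),
                 (4, b.MedsToTry4), (5, b.MedsToTry5), (6, b.MedsToTry6)) AS v(MedRank, Medicin)
    WHERE EXISTS (SELECT 1 FROM @tblA AS a WHERE a.Medicine = v.Medicin)
    ORDER BY v.MedRank
) AS m
ORDER BY b.RowId;
```

For row 1 this should return Advil with MedRank 2 and skip Tylenol in column 3. Rows without any match drop out; switching to OUTER APPLY would keep them with NULLs.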

Parse XML for a column value in a SQL query

I've set up a SQL fiddle to mimic the tables that I currently have which can be found here: http://sqlfiddle.com/#!6/7675e/5
I have 2 tables that I would like to join (Things and ThingData) which is easy enough, but I would like 1 of the columns to be coming from a value that is pulled from parsed XML in one of the columns in ThingData.
Ideally the output would look something like this:
thingID | thingValue | xmlValue
1 | aaa | a
As you can see in the fiddle, I'm able to parse a single XML string at a time, but I'm unsure of how to go from here to using the parsing stuff as a column in a join. Any help would be greatly appreciated.
I updated your SQL Fiddle to demonstrate; the number one issue you're facing is that you're not using an XML type for your XML column. That's going to cause headaches down the road as people shove crap in that column :P
http://sqlfiddle.com/#!6/4c674/2
I have made a change to your query that gets executed.
DECLARE @xml xml
SET @xml = (select thingDataXML from ThingData where thingDataID = 3)
;WITH XMLNAMESPACES(DEFAULT 'http://www.testing.org/a/b/c/d123')
SELECT
t.[thingID]
, t.[thingValue]
, CONVERT(XML,td.[thingDataXML]).value('(/anItem/a1/b1/c1/text())[1]','Varchar(1)') as xmlValue
FROM Things t
join ThingData td
on td.[thingDataID] = t.[thingDataID]
This should join your main table to the table containing the XML and then retrieve the one value from the XML for you. Note that if you would like to return multiple values for one row, you will need to either concatenate the results or use a cross join, so that you can combine the data with the other data set and apply further logic.
EDIT:
The answer to your question is that both have a use case. The cross join he showed above creates an accessible table to query, which uses a little more memory while the query runs but can be accessed many times, whereas the plain XQuery value function just reads a node, which only works for reading a single value from the XML.
Cross join would be the solution if you would like to construct tabular data out of your xml.
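For what it's worth, the "tabular data out of your XML" variant usually looks something like the sketch below, using CROSS APPLY with nodes(). The @ThingData table variable and the XML content are assumptions: the element names are only inferred from the path used in the query above, and the real fiddle data may differ.

```sql
-- Hypothetical stand-in for ThingData, with the XML stored as text as in the question
DECLARE @ThingData TABLE (thingDataID INT, thingDataXML NVARCHAR(MAX));
INSERT INTO @ThingData (thingDataID, thingDataXML)
VALUES (1, N'<anItem xmlns="http://www.testing.org/a/b/c/d123"><a1><b1><c1>a</c1><c1>b</c1></b1></a1></anItem>');

;WITH XMLNAMESPACES (DEFAULT 'http://www.testing.org/a/b/c/d123')
SELECT td.thingDataID,
       c.n.value('(./text())[1]', 'VARCHAR(10)') AS xmlValue
FROM @ThingData AS td
-- Convert the text column to XML once per row...
CROSS APPLY (SELECT CONVERT(XML, td.thingDataXML) AS x) AS conv
-- ...then emit one output row per c1 element
CROSS APPLY conv.x.nodes('/anItem/a1/b1/c1') AS c(n);
```

With the sample row above this should produce two rows for thingDataID 1, one per c1 element, which can then be joined to Things like any other table.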

How to get 2 data sources to merge in SSIS?

I am fairly new to SSIS. I came across a situation where I have to use a data flow task. The data source is MS SQL Server 2008 and the destination is a SharePoint list. I gave a SQL query for the data source object as
SELECT Customer_ID, Project_ID, Project_Name, Project_Manager_ID, Project_Manager_Name, DeliveryManager1_ID, DM1NAME FROM dbo.LifeScience_Project
WHERE (Customer_ID IN ('1200532', '1200632', '1207916', '1212121', '1217793', '1219351', '1219417', '1219776'))
Now, this is the problem. The customer IDs in the WHERE clause need to come from a different data source. That would make it look something like
SELECT Customer_ID, Project_ID, Project_Name, Project_Manager_ID, Project_Manager_Name, DeliveryManager1_ID, DM1NAME FROM dbo.LifeScience_Project
WHERE Customer_ID IN (select customer_id from [Database2].Customer_Master)
Please guide me to implement this.
I would run one query in an Execute SQL Task that builds a list of customer IDs and stores the result in a variable:
Declare @s varchar(max)
Set @s =''
SELECT @s = @s + '''' + Cast(customer_id as varchar(20)) + ''','
FROM (select customer_id from [Database2].Customer_Master ) As T
Select @s
Then in your dataflow task, the source query would be parameterized, using the variable from the first part.
SELECT Customer_ID, Project_ID, Project_Name, Project_Manager_ID, Project_Manager_Name, DeliveryManager1_ID, DM1NAME FROM dbo.LifeScience_Project
WHERE (Customer_ID IN (?))
You would have to use a Lookup Component within your data flow task. So for example, write your first query within an OLEDB Component by choosing SQL Command from the Data Access Mode, just pulling everything, i.e.,
SELECT Customer_ID, Project_ID, Project_Name, Project_Manager_ID, Project_Manager_Name, DeliveryManager1_ID, DM1NAME
FROM dbo.LifeScience_Project
Then connect this source to a Lookup Component and within the Lookup component, instead of choosing a lookup table, select the SQL option and type your second query in there:
select customer_id
from [Database2].Customer_Master
Now do a lookup matching the two Customer_ID fields and only direct output on successful matches. If there are specific ID's you want you can add that to the second query like so:
select customer_id
from [Database2].Customer_Master
WHERE customer_id IN('someid', 'someid2',...)
That's how I would do it.
EDIT:
In response to Sushant's additional questions for clarification.
(1) You want to use the MATCH output when directing your rows, not the NO MATCH output.
(2) Yes that is correct, make sure for your data source it points to the other database (Non-SQL2008 one).
(3) In columns Tab you simply match Customer_ID to Customer_ID;
(4) In the Advanced tab you don't need to do anything. In the Error tab you can choose to ignore errors, since you don't care about the non-matches.
(5) Yes that is the correct flow - the prompt when connecting to Sharepoint is asking if you want to redirect the MATCHED output from the Lookup or the UNMATCHED output. You want to choose the MATCHED output as I mentioned in point (1).
