Split data from strings into columns - sql-server

I have a column with a long string. The data needs to be split into columns, and the strings vary in length and do not always contain the same number of columns. I am not sure how to do this, so I am looking for advice.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
In some cases the string might contain only some of the medical conditions, not all of them.
I need to split it into columns where the column name is the text between the tildes (e.g. MedCond1) and the value is the text to the right of the second tilde but before the pipe, ending up like this:
MedCond1  MedCond2  MedCond3  MedCond4  MedCond5  MedCond6  MedCond7  MedCond8
========  ========  ========  ========  ========  ========  ========  ========
35.1      24        110       64        100       88        21        79
I need to do this for many rows in a large table. As I said, not all the columns are always present, but the names never change: one row might have MedCond1 through MedCond8, while another has only MedCond3, MedCond4 and MedCond7.
Here is a query I wrote that does roughly what I want, but it is not dynamic, so it sometimes picks up extra bits of the string along with the value:
select MainCol, case when charindex('MedCond1', MainCol) > 0 then
substring(MainCol, charindex('MedCond1', MainCol) + 9, 4) end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up together with part of the following string, because the length passed to substring is hard-coded. The value is sometimes 4 characters long with a decimal place and sometimes 2 characters long with no decimal place. I would like to make this dynamic. The pipe marks the end of the data I need, and the start is marked by the tilde at the end of the column name.
Thanks for any thoughts on making this dynamic
Andrew

This data looks like a table itself. It could have been stored in SQL Server as xml. SQL Server supports xml fields and allows querying them. In fact, one could try to convert this string to XML, then try to query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace `|` with <item> tags and `~` with <tag> tags
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
            replace(item,'|','</tag></item><item><tag>'),
            '~','</tag><tag>' )
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select different tags and display them as fields
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is :
-------------------- -------------------- --------------------
VS5                  MedCond1             35.4
VS4                  MedCond2             16
VS1                  MedCond3             155
VS2                  MedCond4             70
SPO2                 MedCond5             100
VS3                  MedCond6             64
FiO2                 MedCond7             21
MAP                  MedCond8             98
NULL                 NULL                 NULL
It would be better to perform this conversion when loading the data, though. It is easier, for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can modify this query too, to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml);
with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item,'|','</tag></item><item><tag>'),'~','</tag><tag>' ) + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items;
-- Query the new table from now on
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
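To get from these name/value rows to the one-row-per-record layout asked for in the question, a conditional aggregate can be layered on top of the XML split. The following is only a sketch: the id column, the two sample rows and the alias names condName and condValue are assumptions made for illustration, and it assumes the data never contains XML-special characters such as & or <, which would make the cast to xml fail.

```sql
-- Hypothetical sample: an id column is added so the rows can be grouped again.
DECLARE @medTable TABLE (id INT IDENTITY(1,1), item NVARCHAR(2000));
INSERT INTO @medTable (item)
VALUES (N'VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|'),
       (N'VS1~MedCond3~110|FiO2~MedCond7~21|');

WITH items AS (
    -- Same replacement trick as above: one <item> per pipe-delimited entry
    SELECT id,
           xmlField = CAST('<item><tag>'
                      + REPLACE(REPLACE(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>')
                      + '</tag></item>' AS XML)
    FROM @medTable
),
pairs AS (
    -- Second tag is the column name, third tag is the value
    SELECT i.id,
           condName  = y.item.value('(tag/text())[2]', 'nvarchar(20)'),
           condValue = y.item.value('(tag/text())[3]', 'nvarchar(20)')
    FROM items AS i
    CROSS APPLY i.xmlField.nodes('item') AS y(item)
)
SELECT id,
       MAX(CASE WHEN condName = 'MedCond1' THEN condValue END) AS MedCond1,
       MAX(CASE WHEN condName = 'MedCond2' THEN condValue END) AS MedCond2,
       MAX(CASE WHEN condName = 'MedCond3' THEN condValue END) AS MedCond3,
       MAX(CASE WHEN condName = 'MedCond7' THEN condValue END) AS MedCond7
FROM pairs
GROUP BY id
ORDER BY id;
```

Conditions that are missing from a row simply come back as NULL, because no name/value pair exists for them. One CASE expression is needed per column, which is workable here since the set of names is fixed.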

OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round-trip via a text file.
The approach uses the following steps:
1. Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
2. Round-trip the data from SQL Server to a text file and back
3. Separate the data into columns on the tilde ~ symbol delimiter
4. Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to concatenate multiple lines within the same row with other lines in different rows.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 and 3 as the pipe symbol. I've included the additional step 1 using newlines only to make the process easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA    MedCondTitle   MedCondValue
------  -------------  -------------
VS5     MedCond1       35.4
VS4     MedCond2       16
VS1     MedCond3       155
VS2     MedCond4       70
SPO2    MedCond5       100
VS3     MedCond6       64
FiO2    MedCond7       21
MAP     MedCond8       98
Step 4 (Pivot): Now you have the trivially simple step of pivoting rows to columns, which can be achieved with one expression per column of the form:
SUM(CASE WHEN MedCondTitle='MedCond1' THEN MedCondValue ELSE 0 END) as MedCond1
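For comparison, the hard-coded length in the question's own query can also be made dynamic with plain string functions, by looking for the next pipe after the column name. This is a minimal sketch that reuses the MedTable and MainCol names from the question; the alias tagPos is made up for illustration, and the search term includes both tildes on the assumption that this avoids partial matches on similar names.

```sql
SELECT t.MainCol,
       CASE WHEN s.tagPos > 0
            THEN SUBSTRING(
                     t.MainCol,
                     -- value starts right after '~MedCond1~'
                     s.tagPos + LEN('~MedCond1~'),
                     -- length runs up to the next pipe; a pipe is appended so the
                     -- search still succeeds when the value is last with no pipe
                     CHARINDEX('|', t.MainCol + '|', s.tagPos + LEN('~MedCond1~'))
                       - (s.tagPos + LEN('~MedCond1~')))
       END AS MedCond1
FROM MedTable AS t
CROSS APPLY (SELECT CHARINDEX('~MedCond1~', t.MainCol) AS tagPos) AS s;
```

Rows that do not contain MedCond1 return NULL for the column. The same expression would have to be repeated for each of the other names.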

Related

Use result of sql query to replace text on multiple xml files

I have a table in SQL like this:
CD_MATERIAL | CD_IDENTIFICACAO
1 | 002323
2 | 00322234
... | ...
AND SO ON (5000+ lines)
I need to use that data to search and replace across multiple external XML files in a folder. The tags in those XML files hold numbers like CD_IDENTIFICACAO from the SQL query, and I need to replace each one with the corresponding CD_MATERIAL (e.g. 002323 becomes 1).
I used this query to extract all the CD_IDENTIFICACAO values for use in Notepad++:
declare @result varchar(max)
select @result = COALESCE(@result + '', '') + CONCAT('(',CD_IDENTIFICACAO,')|') from TBL_MATERIAIS WHERE CD_IDENTIFICACAO <> '' ORDER BY CD_MATERIAL
select @result
That would bring me ex.:
(1TEC45D025)|(1TEC800039)|(999999999)|(542251)|(2TEC58426)|(234852)
and I changed the parameters to get the replacement string, e.g.:
(? 2000)|(? 2001)|(? 2002)|(? 2003)|(? 2004)|(? 2005)
but I don't know how to add an incrementing number in front of each "?" so that Notepad++ would understand it (the search and replace would have 5000+ alternatives, so adding the increments manually is not practical).
I was able to find a workaround for this. I used this query to get all the find-and-replace terms I needed (one pair per line):
select concat('<cProd>',cd_identificacao,'</cProd>'), concat('<cProd>',cd_material,'</cProd>') from tbl_materiais where cd_identificacao <> '' order by cd_material
That would result in:
<cProd>1TEC460054</cProd> <cProd>1</cProd>
<cProd>1TEC240035</cProd> <cProd>2</cProd>
(I included the tag as well, to make sure nothing else could be replaced, since there were many number combinations that could lead to an incorrect replacement.)
Then I pasted the result into a text file and used Notepad++ to replace the space between columns 1 and 2 with \r\n, which results in:
<cProd>1TEC460054</cProd>
<cProd>1</cProd>
<cProd>1TEC240035</cProd>
<cProd>2</cProd>
Then I used the "Ecobyte Replace Text" tool: I pasted my result file as a new selection in the bottom frame and loaded all my files into a new replace group in the top frame (in the group's properties you can change the directory and options), then executed the replacement. It worked perfectly.
Thx.

Splitting contents of one sql column into 3 columns based on certain characters that always happen in the value

I'm trying to form a SQL query, using SQL Server 2014 without creating a function. I do not have permissions on the database to create functions so I have to do it with a query only.
I have a column named Test with the example value of:
Accounting -> Add Missing functionality in Payable -> Saving a blank Missing row
I want my query to return the information (of varying length) between the two arrows (->). I have tried the right, left, substring, charindex and patindex functions and various combinations of each.
Basically the query needs to be SUBSTRING(Test, CHARINDEX(' -> ', TEST) +3, <some length here>)
The length is the part I'm having a hard time figuring out. I need the full length minus the first part before and including the first -> which evaluates to:
Add Missing functionality in Payable -> Saving a blank Missing row
From that result, I need to remove everything after and including the ->, which would then leave me with:
Add Missing functionality in Payable
At the end of the day, I want to split this one column up into 3 like so:
Domain | Feature | Test
------------------------------------------------------------------------------
Accounting | Add Missing functionality in Payable | Saving a blank Missing row
Can anyone show me how to do this query, without having to write a function? Any suggestions would be greatly appreciated as I have been working on this one portion of the query for the better part of 4 hours now. Thank you in advance for your help. Have a great day!!
I tried the following query and it is working fine for me:
DECLARE @X as varchar(1000)
SET @X = 'Accounting -> Add Missing functionality in Payable -> Saving a blank Missing row'
SELECT SUBSTRING(@X,1,CHARINDEX('->',@X) - 1) AS Domain,
SUBSTRING(@X,CHARINDEX('->',@X) + 2,LEN(SUBSTRING(@X,CHARINDEX('->',@X) + 2,LEN(@X))) - LEN(SUBSTRING(@X,LEN(@X) - CHARINDEX('>-',REVERSE(@X)) ,LEN(@X)))) AS Feature,
SUBSTRING(@X,LEN(@X) - CHARINDEX('>-',REVERSE(@X)) + 2 ,LEN(@X)) AS Test
You have to use this query:
SELECT SUBSTRING([Test],1,CHARINDEX('->',[Test]) - 1) AS Domain,
SUBSTRING([Test],CHARINDEX('->',[Test]) + 2,LEN(SUBSTRING([Test],CHARINDEX('->',[Test]) + 2,LEN([Test]))) - LEN(SUBSTRING([Test],LEN([Test]) - CHARINDEX('>-',REVERSE([Test])) ,LEN([Test])))) AS Feature,
SUBSTRING([Test],LEN([Test]) - CHARINDEX('>-',REVERSE([Test])) + 2 ,LEN([Test])) AS Test
FROM MyTable --Replace MyTable with your table name
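The nested SUBSTRING/REVERSE arithmetic above can be hard to follow. As an alternative sketch, the two arrow positions can be computed once with CROSS APPLY and then reused, which needs no function and no special permissions. The table variable and the aliases pos1 and pos2 are made up for illustration, and the sketch assumes every row contains both arrows, as the question states; a row without them would raise an invalid length error.

```sql
-- Hypothetical stand-in for the real table
DECLARE @MyTable TABLE (Test VARCHAR(1000));
INSERT INTO @MyTable (Test)
VALUES ('Accounting -> Add Missing functionality in Payable -> Saving a blank Missing row');

SELECT LTRIM(RTRIM(LEFT(t.Test, a.pos1 - 1)))                           AS Domain,
       LTRIM(RTRIM(SUBSTRING(t.Test, a.pos1 + 2, b.pos2 - a.pos1 - 2))) AS Feature,
       LTRIM(RTRIM(SUBSTRING(t.Test, b.pos2 + 2, LEN(t.Test))))         AS Test
FROM @MyTable AS t
-- position of the first arrow
CROSS APPLY (SELECT CHARINDEX('->', t.Test) AS pos1) AS a
-- position of the second arrow, searching from just past the first one
CROSS APPLY (SELECT CHARINDEX('->', t.Test, a.pos1 + 2) AS pos2) AS b;
```

LTRIM and RTRIM strip the spaces around the arrows; they are used instead of TRIM because the question targets SQL Server 2014.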

Query for pattern separated by new lines

I have a table (defect) where a column stores text. Each line in this text represents a version. (This is a ClearQuest database running on Microsoft SQL Server, accessed via JDBC.)
For example, the following data shows the versions in which each fix was made:
defect version_fixed
1 2015.1.1
2 2015.1.1\n2015.1.13
3 2015.1.12\n2015.1.1
4 2015.1.12\n2015.1.1\n2015.1.13
5 2015.1.13\n2015.1.10
5 2015.1.100
As you see the version is not stored in an order. It can appear anywhere.
I am interested in all rows where version_fixed contains "2015.1.1". But my query either returns too many rows or skips some:
version_fixed like '%2015.1.1%' (also gets row 5, as it matches the pattern)
version_fixed like '%2015.1.1\n' (does not return anything)
I am looking for a query that returns the exact list for 2015.1.1:
defect version_fixed
1 2015.1.1
2 2015.1.1\n2015.1.13
3 2015.1.12\n2015.1.1
4 2015.1.12\n2015.1.1\n2015.1.13
How can I query for text that matches "the exact string, delimited by a new line or the end of the text"? What is the correct way to escape a new line?
Side note: my current solution is to fetch all records (including the unwanted ones) and then filter out the incorrect results.
You could try this. It relies on Sql Server adding the newline to the string when you break the line.
create table defect( version_fixed varchar(max) )
insert into defect( version_fixed )
values ( '2015.1.1' )
, ( '2015.1.1
2015.1.13' )
, ( '2015.1.12
2015.1.1' )
, ( '2015.1.12
2015.1.1
2015.1.13')
, ( '2015.1.13
2015.1.10' )
, ( '2015.1.100' )
-- break to a new line and Sql Server will include the newline character in the string
select * from defect where version_fixed like '%2015.1.1
%' or version_fixed like '%2015.1.1'
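You can do it as below:
WHERE '\' + version_fixed + '\' LIKE '%2015.1.1\%'
This solution depends on your sample data: it assumes the separator is stored as the literal two characters \n rather than as a real line break.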
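If the column holds real line breaks, another option is to build the line feed with CHAR(10) instead of typing a line break inside the literal, and to wrap both the column and the search term in delimiters so that only whole lines match. This is a sketch under assumptions: the table variable and sample rows are made up, and the separator is assumed to be a line feed (any carriage returns are stripped first).

```sql
-- Hypothetical sample data; CHAR(10) is the line feed used as separator.
DECLARE @defect TABLE (defect INT, version_fixed VARCHAR(MAX));
INSERT INTO @defect (defect, version_fixed) VALUES
    (1, '2015.1.1'),
    (2, '2015.1.1' + CHAR(10) + '2015.1.13'),
    (3, '2015.1.12' + CHAR(10) + '2015.1.1'),
    (5, '2015.1.13' + CHAR(10) + '2015.1.10'),
    (6, '2015.1.100');

DECLARE @version VARCHAR(50) = '2015.1.1';

SELECT defect, version_fixed
FROM @defect
-- Surround the column with line feeds so the first and last lines
-- are delimited on both sides, just like the lines in the middle.
WHERE CHAR(10) + REPLACE(version_fixed, CHAR(13), '') + CHAR(10)
      LIKE '%' + CHAR(10) + @version + CHAR(10) + '%';
-- Matches defects 1, 2 and 3 only.
```

Because the search term is an ordinary variable, it could be supplied as a bound parameter from JDBC rather than being concatenated into the query text.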

Attempting to run a while loop in my select statement under cases in SQL Server 2012

The Data
Let us say I have a field in SQL that consists of multi-line Information, each of which consists of i topics, each topic consisting of m points of information. Topics are prefaced with 'i.' and information with a dash. It looks something like:
________________________________________________
|Number | Information
|===============================================
|1 | 1. Topic 1.1
| | -Info 1.1.1
| | - ... [more info]
| | 2. Topic 1.2
| | -Info 1.2.1
| | - ...[more info]
| | ... [more topics]
|_______|_____________________________
|2 | 1. Topic 2.1
|....and so on
The Current System
What I am doing with this information is parsing each topic out into its own column, then unpivoting those columns and searching for topics that contain a given keyword @keyword.
Currently the code reads something like:
Select
Number
,Case When Information LIKE '%1. %2. %'
Then substring (Information, charindex('1.',Information),
charindex('2.', Information) -(charindex('1.',Information)+2) )
Else Information
End as [Topic1]
,Case When Information LIKE '%2. %3. %'
Then substring (Information, charindex('2.',Information),
charindex('3.', Information) -(charindex('2.',Information)+2) )
Else 'N/A'
End as [Topic2]
...repeat 2nd case for each set of numbers up to '%20. %21. %'
The only reason the first one is different is because if it doesn't match the pattern then I want to grab the whole field so that I don't miss anything. I then unpivot the Topic fields that I just created into a general [Topic] field, and then utilize a WHERE [Topic] LIKE '%' +@keyword+'%' to pull out any particular topics and their associated case number to output as my final table. The cases can have anywhere from 1 to 40+ topics attached, with 1-7 attached info fields per topic.
The Desired Modification
Notice: To make the code easier to read, I will not be writing my substring code in proper syntax, instead opting to write substring(Information,ci(@Iter), ci(@Iter+1)-ci(@Iter)) to denote the substring running from the position given by '(iter).' to the position given by '(iter+1).'
What I would like to do is to perform the following:
Declare @Iter smallint
Declare @Result varchar(max)
Select
Number
, Set @Iter=1
Set @Result = ' '
Case When Information LIKE '%'+@keyword+'%' --keyword chosen at front end
Then While @Iter < @n --@n set by the user from front end
Begin
Case When Information LIKE '%' + cast(@Iter as varchar(5))
+ '. %'+cast((@Iter+1) as varchar(5))+'. %'
and substring(Information,ci(@Iter), ci(@Iter+1)-ci(@Iter) )
LIKE '%'+@keyword+'%'
Then Set @Result = @Result +substring(Information,ci(@Iter),
ci(@Iter+1)-ci(@Iter) )
Else Set @Result = @Result end
Set @Iter = @Iter +1
End
Else ' ' end [Result]
The Explanation
In case what I want isn't clear, I'll run through what I'm trying to accomplish
I want to output a list of case numbers that include Topics that include the keyword.
For each case in the list I want to output only those topics that include the keyword.
I want to allow the end user of the report to choose how many Topics in each case they'll search.
I don't want to have to create a table with a column for each Topic when I can't know how many the user will want to create.
Due to these considerations it feels like a loop would be the best option, but there are problems in trying to accomplish that.
The Problem
SQL server won't allow me to utilize a loop in my Select statement--Incorrect syntax near 'While'.
The place where the information comes from prohibits normalization of the information in the table I'm searching
Even if it didn't I am barred from creating my own permanent tables at work, so I can't normalize the data for all incoming data
I am also not allowed to write my own stored procedures.
If there is any way (for example through a cte) to implement these changes, I'm open to hearing them! I'm mostly looking at ways to make the code less daunting looking (20 cases to produce 20 fields in my current cte looks scary, which then needs 3 ctes just to unpack properly [unpivot, removal of certain cases meeting certain conditions, combination into a workable output table])
Thanks in advance for reading this and helping!
I think you're working too hard.
If all you need are topic names and numbers, isn't it easier to split the Information column by newlines and then collect all lines that start with a number rather than a dash? By then, you will have a list of strings that look like:
Topic 1.1
Topic 2.1
And then it's easy to just match the lines against the keyword?
Something like this untested SQL:
select SUBSTRING(s.value, 1, PATINDEX('% %', s.value) - 1) AS topicId
, SUBSTRING(s.value, PATINDEX('% %', s.value), LEN(s.value)) AS topicText
from [table that would make Codd cry] t
cross apply STRING_SPLIT(t.Information, CHAR(13)) s
where s.value LIKE '[0-9]%' -- Starts with a number
AND s.value LIKE '%' + @keyword + '%' -- matches the keyword
Note that STRING_SPLIT requires SQL Server 2016 (database compatibility level 130) or later, so it is not available on SQL Server 2012. If you cannot create functions either, there are string-splitting CTEs you can find on the net to do the job for you.
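On SQL Server 2012, one way to split lines without STRING_SPLIT, functions or permanent tables is to turn each line break into an XML element and shred the result with nodes(). The sketch below is untested against the real data and rests on several assumptions: the table variable, sample rows and alias names are made up, lines are assumed to be separated by a line feed (carriage returns are stripped), and the text is assumed to contain no XML-special characters such as & or <, which would make the cast fail.

```sql
DECLARE @keyword VARCHAR(50) = 'Topic 1.2';

-- Hypothetical stand-in for the real table
DECLARE @cases TABLE (Number INT, Information VARCHAR(MAX));
INSERT INTO @cases (Number, Information) VALUES
    (1, '1. Topic 1.1' + CHAR(10) + '-Info 1.1.1' + CHAR(10)
      + '2. Topic 1.2' + CHAR(10) + '-Info 1.2.1'),
    (2, '1. Topic 2.1' + CHAR(10) + '-Info 2.1.1');

SELECT c.Number, l.line AS Topic
FROM @cases AS c
-- wrap every line in <l>...</l>
CROSS APPLY (SELECT CAST('<l>'
                  + REPLACE(REPLACE(c.Information, CHAR(13), ''), CHAR(10), '</l><l>')
                  + '</l>' AS XML) AS doc) AS x
-- one row per line
CROSS APPLY x.doc.nodes('/l') AS n(lineNode)
CROSS APPLY (SELECT LTRIM(n.lineNode.value('(./text())[1]', 'varchar(max)')) AS line) AS l
WHERE l.line LIKE '[0-9]%'               -- topic lines start with a number
  AND l.line LIKE '%' + @keyword + '%';  -- and must contain the keyword
```

Like the query above, this matches the keyword against the topic line only, not against the dash-prefixed info lines beneath it.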

Loop 5 records at a time and assign it to variable

I have a table of 811 records. I want to get five records at a time and assign them to a variable. The next time the Foreach Loop task runs in SSIS, it should take the next five records and overwrite the variable. I tried using a cursor but couldn't find a solution. Any help will be highly appreciated. My table looks like this, for example:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
6 Opq66
7 Rst77
. .
. .
. .
I want the query to take the first five names, as follows, and assign them to a variable:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
Then the next loop iteration takes another five names and overwrites the variable value, and so on until the last record is consumed.
Taking ltn's answer into consideration this is how you can achieve limiting the rows in SSIS.
The Design will look like
Step 1 : Create the variables
Name DataType
Count int
Initial int
Final int
Step 2 : For the 1st Execute SQL Task write the sql to store the count
Select count(*) from YourTable
In the General tab of this task Select the ResultSet as Single Row.
In the ResultSet tab map the result to the variable
ResultName VariableName
0 User::Count
Step 3 : In the For Loop container enter the expression as shown below
Step 4 : Inside the For Loop drag an Execute SQL Task and write the expression
In Parameter Mapping map the initial variable
VariableName Direction DataType ParameterName ParameterSize
User::Initial Input NUMERIC 0 -1
Result Set tab
Result Name Variable Name
0 User::Final
Inside the DFT you can write the SQL to get the particular rows.
Click on Parameters and select the variables Initial and Final.
If your data will not be updated between paging cycles and the sort order is always the same, then you could try an approach similar to:
CREATE PROCEDURE TEST
(
    @StartNumber INT,
    @TakeNumber INT
)
AS
SELECT TOP(@TakeNumber)
    *
FROM (
    SELECT
        RowNumber = ROW_NUMBER() OVER(ORDER BY IDField DESC),
        NameField
    FROM
        TableName
) AS X
WHERE RowNumber >= @StartNumber
ORDER BY RowNumber -- without this, TOP is not guaranteed to return the next rows in sequence
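On SQL Server 2012 or later, the same paging can be written more directly with OFFSET ... FETCH. This is a sketch under assumptions: the question does not state the server version, and the table variable here is a hypothetical stand-in built from the question's sample rows.

```sql
-- Hypothetical table matching the question's sample data
DECLARE @Servers TABLE (ServerId INT PRIMARY KEY, ServerName VARCHAR(50));
INSERT INTO @Servers (ServerId, ServerName) VALUES
    (1, 'Abc11'), (2, 'Cde22'), (3, 'Fgh33'), (4, 'Ijk44'),
    (5, 'Lmn55'), (6, 'Opq66'), (7, 'Rst77');

DECLARE @StartNumber INT = 6;  -- first row of the batch (1-based)
DECLARE @TakeNumber  INT = 5;  -- batch size

SELECT ServerId, ServerName
FROM @Servers
ORDER BY ServerId                    -- a stable sort order is required for paging
OFFSET (@StartNumber - 1) ROWS
FETCH NEXT @TakeNumber ROWS ONLY;
-- With the seven sample rows this returns ServerId 6 and 7.
```

In the SSIS loop, @StartNumber would advance by the batch size on each iteration (1, 6, 11, ...), and the final batch simply returns fewer rows.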
