Parse Full Name into separate column Name fields - sql-server

SSIS project, SQL Server 2014. I have a full name string in a single column, with the parts separated by commas, as input, and I need to parse last name, first name, and middle name (where they exist) into separate output columns. Can this be done in the SELECT?
I have seen solutions looking for specific parts of strings etc, but nothing that splits into 1 to 3 columns depending on the string in that particular row. For this integration, I can assume 1st position is last name, next is first if it exists and next is middle if it exists.

To sort of flesh out the comments, you can use a Derived Column transformation to generate your name parts from the full name. Any parts that don't exist will get blank spaces (not NULLS) in the output.
The syntax is TOKEN(character_expression, delimiter_string, occurrence)
Or, in your case:
LastName | <add as new column> | TOKEN(FullName, ",", 1)
FirstName | <add as new column> | TOKEN(FullName, ",", 2)
MiddleName | <add as new column> | TOKEN(FullName, ",", 3)
It should look something like this. It's a similar thing I did with table names:
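As for whether this can be done in the SELECT itself: the sketch below is one possible T-SQL approach that works on SQL Server 2014, which has no STRING_SPLIT. It is not the TOKEN method described above; it swaps in the common trick of casting the delimited string to XML and reading the parts by position. The table variable @Names and its sample rows are made up for illustration, and the cast will fail if a name contains XML-special characters such as & or <.

```sql
-- Hypothetical sample data; @Names and its rows are assumptions for illustration.
DECLARE @Names TABLE (FullName nvarchar(200));
INSERT INTO @Names (FullName)
VALUES (N'Smith,John,Allen'), (N'Jones,Mary'), (N'Prince');

SELECT
    n.FullName,
    -- Position 1 is last name, 2 is first name, 3 is middle name.
    -- A missing part comes back as NULL.
    LTRIM(RTRIM(x.parts.value('(/p)[1]', 'nvarchar(100)'))) AS LastName,
    LTRIM(RTRIM(x.parts.value('(/p)[2]', 'nvarchar(100)'))) AS FirstName,
    LTRIM(RTRIM(x.parts.value('(/p)[3]', 'nvarchar(100)'))) AS MiddleName
FROM @Names AS n
CROSS APPLY (
    -- Turn 'a,b,c' into '<p>a</p><p>b</p><p>c</p>' and cast it to xml.
    SELECT CAST('<p>' + REPLACE(n.FullName, ',', '</p><p>') + '</p>' AS xml) AS parts
) AS x;
```

Unlike the Derived Column approach, missing parts are returned as NULL here, so wrap the expressions in ISNULL(..., '') if empty strings are preferred.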

Related

SQL Server - REPLACE - Matching string with old substring edits entire field?

I recently had a request come through to remove some Agent names from the guest surname field in a client's database.
Eg. 'John Smith -Wotif'
When testing using the following UPDATE statement, the entire field was wiped rather than just the specific string.
UPDATE GUEST
SET SURNAME = REPLACE(' -Wotif',' -Wotif','')
WHERE SURNAME LIKE '% -Wotif'
I've since found that simply using the column name as the matching string will allow the full statement to work (even if already specified in the SET section), but I can't work out where the logic of the original statement effectively says 'wipe these fields entirely'.
Unless specified otherwise, surely the '' replacement only applies to the value contained within the substring, regardless of whether the string and substring match?
The first argument in the REPLACE function is the full string that you want to search. So you should be referencing the SURNAME field rather than specifying part of the string.
REPLACE(SURNAME,' -Wotif','')
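A quick way to see the difference, using a made-up variable @Surname in place of the SURNAME column:

```sql
-- @Surname stands in for the SURNAME column; the value is illustrative.
DECLARE @Surname varchar(50) = 'John Smith -Wotif';

SELECT
    -- The string being searched is the literal itself, so the whole thing
    -- is replaced and the result is always an empty string.
    REPLACE(' -Wotif', ' -Wotif', '') AS LiteralAsFirstArgument,
    -- The string being searched is the surname, so only the suffix goes.
    REPLACE(@Surname, ' -Wotif', '')  AS ColumnAsFirstArgument;
```

The first column is empty for every row, which is why the original UPDATE wiped the field; the second returns 'John Smith'.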
Your UPDATE command should look like this:
UPDATE GUEST
SET SURNAME = REPLACE(SURNAME, 'FindValue' , 'ReplaceWithValue')
WHERE SURNAME LIKE '% -Wotif'
If you want to find ' -Wotif' (including the leading space) and replace it with an empty string, the UPDATE command should look like this:
UPDATE GUEST
SET SURNAME = REPLACE(SURNAME, ' -Wotif' , '')
WHERE SURNAME LIKE '% -Wotif'

Use result of sql query to replace text on multiple xml files

I have a SQL table like this:
CD_MATERIAL | CD_IDENTIFICACAO
1 | 002323
2 | 00322234
... | ...
AND SO ON (5000+ lines)
I need to use that data to search and replace across multiple external XML files in a folder. The tags in those XML files contain numbers like CD_IDENTIFICACAO from the query, and I need to replace each one with the corresponding CD_MATERIAL (e.g. 002323 becomes 1).
I used this query to extract all the cd_identificacao to use on Notepad++:
declare @result varchar(max)
select @result = COALESCE(@result + '', '') + CONCAT('(',CD_IDENTIFICACAO,')|') from TBL_MATERIAIS WHERE CD_IDENTIFICACAO <> '' ORDER BY CD_MATERIAL
select @result
That would bring me ex.:
(1TEC45D025)|(1TEC800039)|(999999999)|(542251)|(2TEC58426)|(234852)
and changed the parameters to get the replace ex.:
(? 2000)|(? 2001)|(? 2002)|(? 2003)|(? 2004)|(? 2005)
but I don't know how to add an incrementing number to each "?" so that Notepad++ would understand it (the search and replace would have 5000+ entries, so it's not practical to add the increments manually).
I was able to get a workaround for this. I used this query to get all the find and replace terms I needed (one pair per line):
select concat('<cProd>',cd_identificacao,'</cProd>'), concat('<cProd>',cd_material,'</cProd>') from tbl_materiais where cd_identificacao <> '' order by cd_material
That would result in:
<cProd>1TEC460054</cProd> <cProd>1</cProd>
<cProd>1TEC240035</cProd> <cProd>2</cProd>
(I added the tag too, to make sure no other information could be replaced, as there were many number combinations that could lead to an incorrect replacement.)
Then I pasted it into a txt file and used Notepad++ to replace the space between column 1 and column 2 with \r\n, which would result in:
<cProd>1TEC460054</cProd>
<cProd>1</cProd>
<cProd>1TEC240035</cProd>
<cProd>2</cProd>
Then I used the "Ecobyte Replace Text" tool: I pasted my result file as a new selection in the bottom frame, loaded all my files into a new replace group in the top frame (in the group's properties you can change the directory and options), and executed the replacement. It worked perfectly.
Thx.
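The Notepad++ step could probably be skipped by having the query emit the find and replace terms on alternating rows directly. The following is only a sketch: the table variable @Materiais and its rows are stand-ins for TBL_MATERIAIS, and it assumes the replace tool expects one find term followed by one replace term per pair of lines, as in the output above.

```sql
-- Stand-in for TBL_MATERIAIS; names and values are illustrative.
DECLARE @Materiais TABLE (cd_material int, cd_identificacao varchar(50));
INSERT INTO @Materiais (cd_material, cd_identificacao)
VALUES (1, '1TEC460054'), (2, '1TEC240035'), (3, '');

-- Each material yields two rows: the find term (Seq 1), then the replace term (Seq 2).
SELECT v.Line
FROM @Materiais AS m
CROSS APPLY (VALUES
    (1, CONCAT('<cProd>', m.cd_identificacao, '</cProd>')),
    (2, CONCAT('<cProd>', m.cd_material, '</cProd>'))
) AS v (Seq, Line)
WHERE m.cd_identificacao <> ''
ORDER BY m.cd_material, v.Seq;
```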

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns, and the strings vary in length and do not always contain the same number of columns. I'm not exactly sure how to do this, so I was looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
And in some cases the string might not have all the medical conditions just some of them.
I need to split it into columns where the column name is the text between the tildes (e.g. MedCond1) and the value is whatever sits to the right of the second tilde but before the pipe, ending up like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.1 24 110 64 100 88 21 79
I need to do this for a lot of rows within a large table and as I said not all the columns are always present but they will not be different names, you might have med cond 1- 8, then in another set have med cond 3, 4, 7.
Here is a query I created that is kind of what I want but not dynamic so it is picking up the values with some extra bits of the string
select MainCol, case when charindex('MedCond1', MainCol) > 0 then
substring(MainCol, charindex('MedCond1', MainCol) + 9, 4) end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up with an additional part of the string because the substring length is hard-coded. The value is sometimes 4 characters long with a decimal place, sometimes 2 long with no decimal place. I would like to make this dynamic. The pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name.
Thanks for any thoughts on making this dynamic
Andrew
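One way the hard-coded length could be made dynamic, staying close to the query above, is to look up the position of the next pipe with the three-argument form of CHARINDEX. This is a sketch only: @MedTable and its rows are invented sample data, and it assumes every value is terminated by a pipe, as in the sample string.

```sql
-- Invented sample data standing in for MedTable.
DECLARE @MedTable TABLE (MainCol varchar(500));
INSERT INTO @MedTable (MainCol)
VALUES ('VS5~MedCond1~35.4|VS4~MedCond2~16|'),
       ('VS4~MedCond2~16|VS5~MedCond1~33|'),
       ('VS4~MedCond2~16|');

SELECT
    t.MainCol,
    CASE WHEN p.StartPos > 0 THEN
        -- 'MedCond1~' is 9 characters long, so the value starts at StartPos + 9
        -- and runs up to (not including) the next pipe.
        SUBSTRING(t.MainCol,
                  p.StartPos + 9,
                  CHARINDEX('|', t.MainCol, p.StartPos) - p.StartPos - 9)
    END AS MedCond1
FROM @MedTable AS t
CROSS APPLY (SELECT CHARINDEX('MedCond1~', t.MainCol) AS StartPos) AS p;
```

Searching for 'MedCond1~' with the trailing tilde keeps the match from also hitting a longer name such as MedCond10. Rows without MedCond1 return NULL.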
This data looks like a table itself. It could have been stored in SQL Server as xml. SQL Server supports xml fields and allows querying them. In fact, one could try to convert this string to XML, then try to query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace each `|` with <item> tags and each `~` with <tag> tags
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
            replace(item,'|','</tag></item><item><tag>'),
            '~','</tag><tag>' )
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select different tags and display them as fields
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is:
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
The final NULL row comes from the trailing pipe, which produces an empty last item.
It would be better to perform this conversion when loading the data, though. It's easier, for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can also modify this query to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml);
with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item,'|','</tag></item><item><tag>'),'~','</tag><tag>' ) + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items
-- Query the new table from now on
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round trip via a text file.
The approach uses the following steps:
1. Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
2. Round-trip the data from SQL Server to a text file and back
3. Separate the data into columns on the tilde ~ symbol delimiter
4. Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. For example, REPLACE(column, '|', CHAR(13)) (CHAR(13) is a carriage return; use CHAR(13) + CHAR(10) if your export tool expects a CRLF row delimiter) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to concatenate multiple lines within the same row with other lines in different rows.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 & 3 as the pipe symbol. I've put in the additional step 1 using newlines only to make it easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you'd have a trivially simple step of pivoting rows to columns, which can be achieved with a statement of the form:
SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) AS MedCond1
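A fuller sketch of the pivot step might look like the following. The table variable @Imported stands in for whatever table step 3 loads into, and RowId is an assumed key identifying which original row each line came from; the round trip would need to carry something like it for the GROUP BY to work. The sketch uses MAX without an ELSE branch so that absent conditions come back as NULL rather than 0, which is a design choice and not part of the outline above.

```sql
-- Stand-in for the table loaded in step 3; RowId is an assumed source-row key.
DECLARE @Imported TABLE (
    RowId        int,
    ColA         varchar(20),
    MedCondTitle varchar(20),
    MedCondValue decimal(9, 2)
);
INSERT INTO @Imported (RowId, ColA, MedCondTitle, MedCondValue)
VALUES (1, 'VS5', 'MedCond1', 35.4),
       (1, 'VS4', 'MedCond2', 16),
       (2, 'VS1', 'MedCond3', 155);

-- One output row per source row, one column per condition name.
SELECT
    RowId,
    MAX(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue END) AS MedCond1,
    MAX(CASE WHEN MedCondTitle = 'MedCond2' THEN MedCondValue END) AS MedCond2,
    MAX(CASE WHEN MedCondTitle = 'MedCond3' THEN MedCondValue END) AS MedCond3
FROM @Imported
GROUP BY RowId
ORDER BY RowId;
```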

T-SQL wildcard not operator ^ not working

T-SQL Not wildcard:
SELECT * FROM Customers
WHERE City LIKE 'A[^a]%';
It returns: 'Aachen'
So what is the meaning of the ^ operator here? The same result comes back if I use
WHERE City LIKE 'A[a]%';
I know I can use 'A[!a]%' and it will work; my question is, why ^ then?
From here:
The Caret Wildcard Character [^]:
The Caret Wildcard Character is used to search for any single
character not within the specified range [^a-c] or set [^abc].
To find all employees with a 3 characters long first name that begins
with ‘Ja’ and the third character is not ‘n’:
SELECT FirstName, MiddleName, LastName
FROM Person.Person
WHERE FirstName LIKE 'Ja[^n]'
Here is a screenshot depicting that it is working as expected:
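The behavior can also be checked with a small self-contained script. The table variable @Customers and the city names are sample data chosen for illustration:

```sql
-- Sample data for illustration only.
DECLARE @Customers TABLE (City varchar(50));
INSERT INTO @Customers (City)
VALUES ('Aachen'), ('Albuquerque'), ('Anchorage');

-- Second character must NOT be 'a': returns Albuquerque and Anchorage.
SELECT City FROM @Customers WHERE City LIKE 'A[^a]%';

-- Second character must be 'a': returns Aachen.
SELECT City FROM @Customers WHERE City LIKE 'A[a]%';
```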

Building dynamic query for Sql Server 2008 when table name contains " ' "

I need to fetch a table's TOP_PK, IDENT_CURRENT, IDENT_INCR and IDENT_SEED, for which I am building a dynamic query as below:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{1}]') AS CURRENT_IDENT, IDENT_INCR('[{1}]') AS IDENT_ICREMENT, IDENT_SEED('[{1}]') AS IDENT_SEED", pPrimaryKey, pTableName)
Here pPrimaryKey is the name of the table's primary key column and pTableName is the name of the table.
Now, I am facing a problem when the table name contains a " ' " character (for example, KIL'1).
When i am using above logic and building query it would be as below:
SELECT (SELECT TOP 1 [ID] FROM [KIL'1]) AS TOP_PK, IDENT_CURRENT('[KIL'1]') AS CURRENT_IDENT, IDENT_INCR('[KIL'1]') AS IDENT_ICREMENT, IDENT_SEED('[KIL'1]') AS IDENT_SEED
Here, by executing above query i am getting error as below:
Incorrect syntax near '1'.
Unclosed quotation mark after the character string ') AS IDENT_SEED'.
So, can anyone please show me the best way to solve this problem?
Escape a single quote by doubling it: KIL'1 becomes KIL''1.
If a string already has adjacent single quotes, two becomes four, or four becomes eight... it can get a little hard to read, but it works :)
Using string methods from .NET, your statement could be:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{2}]') AS CURRENT_IDENT, IDENT_INCR('[{2}]') AS IDENT_ICREMENT, IDENT_SEED('[{2}]') AS IDENT_SEED", pPrimaryKey, pTableName, pTableName.Replace("'","''"))
EDIT:
Note that the string replace is now only on a new, third substitution string. (I've taken out the string replace for pPrimaryKey, and for the first occurrence of pTableName.) So now, single quotes are only doubled, when they will be within other single quotes.
You need to replace every single quote with two single quotes: http://beyondrelational.com/modules/2/blogs/70/posts/10827/understanding-single-quotes.aspx
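The same idea can be sketched in T-SQL itself, for cases where the statement is assembled on the server rather than in .NET. The variables @TableName, @PrimaryKey, @Literal and @Sql are hypothetical names for illustration. QUOTENAME adds the brackets, and the single quotes are doubled only for the copy that ends up inside a string literal, as in the answer above.

```sql
-- Hypothetical inputs; KIL'1 is the table name from the question.
DECLARE @TableName  sysname = N'KIL''1';
DECLARE @PrimaryKey sysname = N'ID';

-- [KIL'1] with the quote doubled, for use inside '...' in the generated SQL.
DECLARE @Literal nvarchar(300) = REPLACE(QUOTENAME(@TableName), '''', '''''');

DECLARE @Sql nvarchar(max) =
      N'SELECT (SELECT TOP 1 ' + QUOTENAME(@PrimaryKey)
    + N' FROM ' + QUOTENAME(@TableName) + N') AS TOP_PK, '
    + N'IDENT_CURRENT(''' + @Literal + N''') AS CURRENT_IDENT';

-- Inspect the generated statement instead of executing it here.
SELECT @Sql AS GeneratedSql;
```

The generated text contains FROM [KIL'1] for the identifier and IDENT_CURRENT('[KIL''1]') for the string literal.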
