Working on String concatenation with parsename function in sql server - sql-server

I have a employee table and the sample data in it is like this. I am using sql server 2008.
CREATE TABLE employee (name nvarchar(255))
insert into employee (name) values ('Alex,AlexMartin'),
('John,John'),
('Mayr,Mayr'),
('Shel,Sheila'),
('corolla,corolla,corolla3'),
('Mary4,Mary,Mary'),
('Justin,Justin,Justin'),
('Sara,Sara,Sara,Sara'),
('clarence,clarence,clarence458,clarence,clarence'),
('fiesta,fiesta,fiesta,fiesta,fiesta'),
('scorpio1,scorpio,scorpio,scorpio4,scorpio')
I want to delete a value if all the values in the string are same example: John,John should be replaced by 'John'. If all the names in string are not equal like Shel,Sheila it should retain both the values.
For this I am using
update employee set name=(select PARSENAME(REPLACE(name, ',', '.'), 2)) where (select PARSENAME(REPLACE(name, ',', '.'), 2))
like (select PARSENAME(REPLACE(name, ',', '.'), 1)) but it is changing Mary4,Mary,Mary to Mary. I tried combinations for 5, 4, and 3 names but there is no use. In fact for five names this code is not at all working. Is there any efficient way to do this?

This will get the data in the format you're looking for:
CREATE TABLE #Employee (Name NVarChar(255));
INSERT INTO #Employee (Name) VALUES ('Alex,AlexMartin'),
('John,John'),
('Mayr,Mayr'),
('Shel,Sheila'),
('corolla,corolla,corolla3'),
('Mary4,Mary,Mary'),
('Justin,Justin,Justin'),
('Sara,Sara,Sara,Sara'),
('clarence,clarence,clarence458,clarence,clarence'),
('fiesta,fiesta,fiesta,fiesta,fiesta'),
('scorpio1,scorpio,scorpio,scorpio4,scorpio'),
('Another');
SELECT Name, CASE
WHEN CHARINDEX(Name, ',') = 0 THEN Name
WHEN
REPLACE(REPLACE(Name, LEFT(Name, CHARINDEX(',', Name) - 1), ''), ',', '') = ''
THEN LEFT(Name, CHARINDEX(',', Name) - 1)
ELSE Name
END Result
FROM #Employee;
DROP TABLE #Employee;
It works by getting the first name in the comma-separated list. By your requirements all of the items have to be identical in order to condense the list, so it doesn't matter which element in the list we use for comparison.
All occurrences of the first item are removed from the list. Then all of the commas are removed. If the resulting value is empty (i.e. '') then we know all of the items are identical. In that case the first element is used as the Result value. Otherwise, we return the original list unchanged.
EDIT: Some of your data mustn't have a , in it, so I've updated the answer to take care of that. It will just return the same input if a delimiter doesn't exist.

Related

SQL Server CONCAT adds blank spaces when used

I've tried to use CONCAT function of some fields in a table; in order to get a string that I need to compare onto another field from different table.
However when I use the function it's like it random adds spaces between the fields and then I cannot use this result to compare.
I've tried:
SELECT CONCAT([STC_GL-STC].[ZZGL_Desc_Group_5D],'-',
[STC_GL-STC].[ZZCostCentreGroup],'-',
AS RESULT
FROM [STC_GL-STC];
As an example of result:
'Compras - RM -MATERIA PRIMA -'
(Please note the blank spaces in the second and third (-).
I would need to obtain:
'Compras - RM-MATERIA PRIMA-'
I've checked the values in the fields and there is no blank spaces at the end on fields ZZGL_Desc_Group_5D , ZZCostCentreGroup.
I've also tried:
SELECT CONCAT_WS('-',[ZZGL_Desc_Group_5D],[ZZCostCentreGroup]) AS RESULT
FROM [STC_GL-STC]
With same result.
And finally I tried to remove blank spaces using RTRIM and LTRIM using the following:
SELECT CONCAT(LTRIM(RTRIM([STC_GL-STC].[ZZGL_Desc_Group_5D])),
LTRIM(RTRIM('-')),
LTRIM(RTRIM([STC_GL-STC].[ZZCostCentreGroup]))) AS RESULT
FROM [STC_GL-STC]
ORDER BY RESULT ASC;
And even with LTRIM and RTRIM functions on that field, I still getting the same result.
How to get rid of this behaviour and of the blank spaces? Is there another way to build that string?
Kind Regards and many thanks in advance,
Long time ago I created a udf function to remove white spaces.
It is based on the 'magic' of the XML xs:token data type.
udf
/*
1. All invisible TAB, Carriage Return, and Line Feed characters will be replaced with spaces.
2. Then leading and trailing spaces are removed from the value.
3. Further, contiguous occurrences of more than one space will be replaced with a single space.
*/
CREATE FUNCTION dbo.udf_tokenize(#input VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN (SELECT CAST('<r><![CDATA[' + #input + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)'));
END
Test harness
-- DDL and sample data population, start
DECLARE #mockTbl TABLE (ID INT IDENTITY(1,1), col_1 VARCHAR(100), col_2 VARCHAR(100));
INSERT INTO #mockTbl (col_1, col_2)
VALUES (' FL ', ' Miami')
, (' FL ', ' Fort Lauderdale ')
, (' NY ', ' New York ')
, (' NY ', '')
, (' NY ', NULL);
-- DDL and sample data population, end
SELECT *
, col_1n = dbo.udf_tokenize(col_1)
, col_2n = dbo.udf_tokenize(col_2)
, CONCAT_WS('-', dbo.udf_tokenize(col_1), dbo.udf_tokenize(col_2)) AS RESULT
FROM #mockTbl;

Divide personal names in separate columns

I have one table which is called #MyNames.This table in one column contain full name of persons (Name space Surname). In this column we can distinct name from surname with space with them.From left side we have name and after space we have surname.Below you can see table:
So my intention is to divided this column with name MyFullName into two separate columns called ,,FirstName" and ,,LastName''. In order to do this i try with this code:
SELECT LEFT(MyFullName,charindex(' ', MyFullName) - 1) AS FirstName, RIGHT(MyFullName,charindex(' ', MyFullName)-1 ) AS LastName
from #MyNames
Output from this code is not good and I have results like table below :
So can anybody help me how to fix this code and get result correctly and to have fist name in one column and second name in other column?
Obviously the surname clause isn't working correctly.
Instead of using RIGHT(MyFullName,charindex(' ', MyFullName)-1 ) consider SUBSTRING(MyFullName,charindex(' ', MyFullName) + 1, 100) (or whatever field length is).
A better approach (though less easy to read imo) is to replace the first name component with blank '' characters.
STUFF(MyFullName,1,charindex(' ', MyFullName),'')
These should both work
SELECT LEFT(MyFullName,charindex(' ', MyFullName)-1) AS FirstName,
SUBSTRING(MyFullName,charindex(' ', MyFullName)+1, 100) AS LastName
from #MyNames
SELECT LEFT(MyFullName,charindex(' ', MyFullName)-1) AS FirstName,
STUFF(MyFullName,1,charindex(' ', MyFullName),'') AS LastName
from #MyNames
A small correction in your query should fix the problem.
SELECT LEFT(MyFullName,charindex(' ', MyFullName) - 1) AS FirstName
, RIGHT(MyFullName,len(MyFullName) - charindex(' ', MyFullName)) AS LastName
from #MyNames

SQL Server 2016 How to use a simple Regular Expression in T-SQL?

I have a column with the name of a person in the following format: "LAST NAME, FIRST NAME"
Only Upper Cases Allowed
Space after comma optional
I would like to use a regular expression like: [A-Z]+,[ ]?[A-Z]+ but I do not know how to do this in T-SQL. In Oracle, I would use REGEXP_LIKE, is there something similar for SQL Server 2016?
I need something like the following:
UPDATE table
SET is_correct_format = 'YES'
WHERE REGEXP_LIKE(table.name,'[A-Z]+,[ ]?[A-Z]+');
First, case sensitivity depends on the collation of the DB, though with LIKE you can specify case comparisons. With that... here is some Boolean logic to take care of the cases you stated. Though, you may need to add additional clauses if you discover some bogus input.
declare #table table (Person varchar(64), is_correct_format varchar(3) default 'NO')
insert into #table (Person)
values
('LowerCase, Here'),
('CORRECTLY, FORMATTED'),
('CORRECTLY,FORMATTEDTWO'),
('ONLY FIRST UPPER, LowerLast'),
('WEGOT, FormaNUMB3RStted'),
('NoComma Formatted'),
('CORRECTLY, TWOCOMMA, A'),
(',COMMA FIRST'),
('COMMA LAST,'),
('SPACE BEFORE COMMA , GOOD'),
(' SPACE AT BEGINNING, GOOD')
update #table
set is_correct_format = 'YES'
where
Person not like '%[^A-Z, ]%' --check for non characters, excluding comma and spaces
and len(replace(Person,' ','')) = len(replace(replace(Person,' ',''),',','')) + 1 --make sure there is only one comma
and charindex(',',Person) <> 1 --make sure the comma isn't at the beginning
and charindex(',',Person) <> len(Person) --make sure the comma isn't at the end
and substring(Person,charindex(',',Person) - 1,1) <> ' ' --make sure there isn't a space before comma
and left(Person,1) <> ' ' --check preceeding spaces
and UPPER(Person) = Person collate Latin1_General_CS_AS --check collation for CI default (only upper cases)
select * from #table
The tsql equivalent could look like this. I'm not vouching for the efficiency of this solution.
declare #table as table(name varchar(20), is_Correct_format varchar(5))
insert into #table(name) Values
('Smith, Jon')
,('se7en, six')
,('Billy bob')
UPDATE #table
SET is_correct_format = 'YES'
WHERE
replace(name, ', ', ',x')
like (replicate('[a-z]', charindex(',', name) - 1)
+ ','
+ replicate('[a-z]', len(name) - charindex(',', name)) )
select * from #table
The optional space is hard to solve, so since it's next to a legal character I'm just replacing with another legal character when it's there.
TSQL does not provide the kind of 'repeating pattern' of * or + in regex, so you have to count the characters and construct the pattern that many times in your search pattern.
I split the string at the comma, counted the alphas before and after, and built a search pattern to match.
Clunky, but doable.

Keep only desired characters and separate with semicolon in T-SQL

The problem:
I have text data imported into the db with a lot of unwanted characters. I need to keep only 4 capital letter strings within the imported text string. Example:
1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)
this could be in one column in one row of my table. What I need to extract from the string is:
MIBD and LKRE
And the result should preferably be the desired strings separated with semicolons.
It should be applied to the whole column and I cannot know how many of these 4 upper case letter strings might appear in one row.
Went through all sorts of function like PATINDEX etc. but really do not know how to approach it. thanks for any help!
try this, it assumes that the four char code is always preceded by ;# . As PATINDEX is case insensitive I have added additional check to verify that all the four character are capital.
DECLARE #MyTable Table( ID INT, MyString VARCHAR(8000))
INSERT INTO #MyTable
VALUES
(1, '1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)')
,(2, ';#DBCC (This is a nice name);#2056;#LLC (Very nice name indeed) ;#ABCD')
,(3, ';#AaaA;#OPQR;1234 (and) ;#WXYZ')
,(4, ';#abc this empty string without any code')
;WITH CTE AS
(
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+6, '') AS MyString
FROM #MyTable m
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
UNION ALL
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+6, '') AS MyString
FROM CTE c
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
)
SELECT c.ID,
STUFF(( SELECT '; ' + NewString
FROM CTE c1
WHERE c1.ID = c.ID
AND ASCII(SUBSTRING(NewString, 1, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- first char
AND ASCII(SUBSTRING(NewString, 2, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- second char
AND ASCII(SUBSTRING(NewString, 3, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- third char
AND ASCII(SUBSTRING(NewString, 4, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- fourth char
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') -- use the value clause to hanlde xml character issue like, &,",>,<
,1,1,'') AS CodeList
FROM CTE c
GROUP BY ID
OPTION (MAXRECURSION 0);
I came to something like this so far:
ALTER FUNCTION CleanData
(
-- Parameters here
#Text AS VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
WHILE PATINDEX('%[0-9#;()]%', #Text) > 0
BEGIN
SET #Text = STUFF(#Text, PATINDEX('%[0-9#;()]%', #Text), 1, '')
END
RETURN #Text
END
But what I get is the Initials and the characters in parantheses as the PATINDEX cannot differ between the upper and lower case. Maybe it might be helpful for somebody else

Parsing through a column to flip names using patindex

So I have a database of customers. I run SELECT * FROM MyTable it gives me back several columns, one of which is the name. Looks like this:
"Doe, John"
"Jones, Bill"
"Smith, Mike"
"Johnson, Bob"
"Harry Smith"
"Black, Linda"
"White, Laura"
etc. Some are last name, first name. Others are first name last name.
My boss wants me to flip the names so they are all first then last.
So I ran this:
SELECT SUBSTRING(Column_Name, CHARINDEX(',', Column_Name) + 2, LEN(Name) - CHARINDEX(',', Column_Name) + 1) + ' ' + SUBSTRING(Column_Name, 1, CHARINDEX(',', Column_Name) - 1) FROM MyTable
The problem is that when I run that, it only runs the names until it finds one it doesn't need to flip. So in the example above, it would only give me the first four names, not all of them.
It was suggested to me that I could use the PATINDEX() to pull out all of the names. I don't know how to use this and was hoping I could get some help with it.
I suspect your code has TRY/CATCH or you are otherwise swallowing/suppressing/ignoring errors. You should get 4 rows back and then a big ugly error message:
Msg 537, Level 16, State 2
Invalid length parameter passed to the LEFT or SUBSTRING function.
The problem is that your expression assumes that , always exists. You need to cater for that either by filtering out the rows that don't contain a , (though this is not very dependable, since the expression could be attempted before the filter), or the following way, where you make different decisions about how to reassemble the string based on whether a , is found or not:
DECLARE #x TABLE(y VARCHAR(255));
INSERT #x VALUES
('Doe, John'),
('Jones, Bill'),
('Smith, Mike'),
('Johnson, Bob'),
('Harry Smith'),
('Black, Linda'),
('White, Laura');
SELECT LTRIM(SUBSTRING(y, COALESCE(NULLIF(CHARINDEX(',',y)+2,2),1),255))
+ RTRIM(' ' + LEFT(y, COALESCE(NULLIF(CHARINDEX(',' ,y)-1,-1),0)))
FROM #x;
Results:
John Doe
Bill Jones
Mike Smith
Bob Johnson
Harry Smith
Linda Black
Laura White
You don't need PATINDEX in this case, although it could be used. I'd take your expression to flip the names and put it in a CASE expression.
DECLARE #MyTable TABLE
(
Name VARCHAR(64) NOT NULL
);
INSERT #MyTable(Name)
VALUES
('Doe, John'),
('Jones, Bill'),
('Smith, Mike'),
('Johnson, Bob'),
('Harry Smith'),
('Black, Linda'),
('White, Laura');
SELECT
CASE
WHEN CHARINDEX(',', Name, 1) = 0 THEN Name
ELSE SUBSTRING(Name, CHARINDEX(',', Name) + 2, LEN(Name) - CHARINDEX(',', Name) + 1)
+ ' ' + SUBSTRING(Name, 1, CHARINDEX(',', Name) - 1)
END AS [Name]
FROM #MyTable;
The first condition simply returns the original value if no comma was used.

Resources