String split based on condition - sql-server

I have a few strings containing numbers like the ones below, around 3,000 records in total.
Column
------------
Cell 233567-3455
Cell123-4567
Cell#123-7449
Local 456-0987
1 616 468-7796
1234567-5x2345
234/625-1234
(C)755-7442
5732878-2
5721899-23
6712909-3
7894200-234
2144-57238
5673893/588218
437-4737-5772
How can I find the records like the ones below?
Column
-------------
5732878-2
5721899-23
6712909-3
7894200-234
Once I find these, I need to split each one into two parts:
1st column | 2nd column
------------- | -------------
5732878 | 5732872
5721899 | 5721823
6712909 | 6712903
7894200 | 7894234
I tried to solve this using PATINDEX and CHARINDEX, but somehow it's not working. Please help.

I don't know your filtering logic to get to your intermediate set, but this should get your expected final result set. I assumed you only want records where the length of the string to the left of the hyphen is greater than the length on the right and also exclude records with more than 1 hyphen.
SELECT LEFT(telephone, CHARINDEX('-', telephone)-1) AS [1stTelephone],
STUFF(
--get the string before the hyphen
LEFT(telephone, CHARINDEX('-', telephone)-1),
--get the starting location of chars we are going to replace
LEN(LEFT(telephone, CHARINDEX('-', telephone)))-LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--get the length of the section we are replacing
LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--replace that section with the string after the hyphen
RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)
) AS [2nd telephone]
FROM your_table
WHERE LEN(LEFT(telephone, CHARINDEX('-', telephone))) > LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))))
AND len(telephone) - len(REPLACE(telephone, '-', '')) = 1
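The filter above still lets through rows such as Cell123-4567, because it only compares lengths. A minimal sketch of a stricter variant follows, assuming the wanted rows consist of digits and exactly one hyphen. The inline VALUES list and the column aliases are hypothetical stand-ins for the real table; STUFF is used for the second number because it returns NULL rather than raising an error when the start position is invalid.

```sql
-- Hypothetical sample rows standing in for the real table.
SELECT v.telephone,
       s.base_number AS first_number,
       -- Overwrite the tail of the base number with the suffix.
       STUFF(s.base_number,
             LEN(s.base_number) - LEN(s.suffix) + 1,
             LEN(s.suffix),
             s.suffix) AS second_number
FROM (VALUES ('Cell123-4567'), ('5732878-2'), ('7894200-234'),
             ('2144-57238'), ('437-4737-5772'), ('5673893/588218')) AS v(telephone)
-- NULLIF turns "no hyphen" (0) into NULL so LEFT never receives -1.
CROSS APPLY (SELECT NULLIF(CHARINDEX('-', v.telephone), 0) AS pos) AS p
CROSS APPLY (SELECT LEFT(v.telephone, p.pos - 1)          AS base_number,
                    SUBSTRING(v.telephone, p.pos + 1, 30) AS suffix) AS s
WHERE REPLACE(v.telephone, '-', '') NOT LIKE '%[^0-9]%'              -- digits only
  AND LEN(v.telephone) - LEN(REPLACE(v.telephone, '-', '')) = 1      -- exactly one hyphen
  AND LEN(s.base_number) > LEN(s.suffix);                            -- left part is longer
```

With these sample rows, only 5732878-2 and 7894200-234 should pass the filter, giving 5732872 and 7894234 as the second numbers.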

Somewhat dirty method (looks specifically for 7 digits followed by hyphen followed by any number of digits):
SELECT BasePhone AS Phone1, LEFT(BasePhone, 7-LEN(OtherPhoneEnd)) + OtherPhoneEnd AS Phone2
FROM (
SELECT LEFT(Telephone, 7) AS BasePhone, SUBSTRING(Telephone,9,7) AS OtherPhoneEnd
FROM Telephones
WHERE Telephone LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-%'
) AS t -- SQL Server requires an alias on a derived table

Based on the information you have given, I assumed that you want numbers with a hyphen (-) at the 8th position. Try this:
create table #TelNo (
Tel varchar(30)
)
insert #TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
select Tel, LEFT(Tel, Len(tel) - len(suffix)) + suffix [SecondTel] from (
select substring(Tel, 1, 7) [Tel], substring(Tel, 9, 10) [suffix] from #TelNo
where CHARINDEX('-', Tel) = 8
)a

You could use something like this:
DDL
use tempdb
create table TelNo (
Tel varchar(30)
)
insert TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
Code
select Tel,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-2)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-3)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-4)
else Tel
end Tel1,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-3) + right(Tel, 1)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-5) + right(Tel, 2)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-7) + right(Tel, 3)
else NULL
end Tel2
from TelNo

Related

Expression to find multiple spaces in string

We handle a lot of sensitive data, and I would like to mask passenger names by keeping only the first and last letter of each name part and joining them with three asterisks (***).
For example: the name 'John Doe' will become 'J***n D***e'
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(PassengerName, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts; there is no real limit. How can I find the indices of all spaces within an expression? Or should I tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare @t table(n nvarchar(20));
insert into @t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from @t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Voilà. No sensitive information is available without access to the reference table, and that access would be severely restricted.
A second method is a reversible encryption algorithm. This would allow the value to be stored as gibberish, but with the right knowledge it could be transformed back into a meaningful value. Typical methods for this are the public key encryption algorithms devised by Rivest, Shamir, and Adleman (RSA).
If you want to keep the first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manage the process. However, I view this as the least secure and least desirable approach.
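The reference-table option described in this answer could be sketched roughly as follows. All table, column, and role names here are hypothetical, chosen only to illustrate the shape of the design; the real schema and permission model would differ.

```sql
-- Hypothetical lookup table: the only place the clear-text name is stored.
CREATE TABLE dbo.PassengerIdentity (
    PassengerId   INT IDENTITY(1,1) PRIMARY KEY,
    PassengerName NVARCHAR(200) NOT NULL
);

-- Hypothetical business table: carries the id, never the name.
CREATE TABLE dbo.Purchase (
    PurchaseId  INT IDENTITY(1,1) PRIMARY KEY,
    PassengerId INT NOT NULL REFERENCES dbo.PassengerIdentity (PassengerId),
    Amount      DECIMAL(10,2) NOT NULL
);

-- Hypothetical role for general users: they can read purchases
-- but are denied access to the table that holds the names.
CREATE ROLE ReportingUsers;
GRANT SELECT ON OBJECT::dbo.Purchase TO ReportingUsers;
DENY  SELECT ON OBJECT::dbo.PassengerIdentity TO ReportingUsers;
```

Under this layout, a query against dbo.Purchase exposes only PassengerId, so no masking expression is needed for general reporting.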
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') WITHIN GROUP (ORDER BY ItemNumber) AS MaskedFullName -- ItemNumber is the ordinal returned by DelimitedSplit8K
FROM Masking
GROUP BY Fullname;
Edit: Never mind, the OP has commented that they are using 2008, so STRING_AGG is out of the question. @iamdave, however, has posted an answer which is very similar to my own, just done the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (@var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare @var_part varchar(100)
declare @var_return varchar(100)
declare @n_position smallint
set @var_return = ''
set @n_position = 1
WHILE @n_position<>0
BEGIN
SET @n_position = CHARINDEX(' ', @var_name)
IF @n_position = 0
SET @n_position = LEN(@var_name)
SET @var_part = SUBSTRING(@var_name, 1, @n_position)
SET @var_name = SUBSTRING(@var_name, @n_position+1, LEN(@var_name))
if @var_part<>''
SET @var_return = @var_return + stuff(@var_part, 2, len(@var_part)-2, replicate('*',len(@var_part)-2)) + ' '
END
RETURN(@var_return)
END

TSQL stuff for xml path and sub query

I have to replace the CSV data in a column with the corresponding ids, also in CSV format.
I have a problem with this query:
select t0.code , t0.categories, t0.departement, (
SELECT Stuff((
SELECT N', ' + CONVERT(varchar, id_categorie) FROM tcategories t1 WHERE t0.departement = t1.departement COLLATE French_CI_AI and categorie IN (t0.tcategories)
FOR XML PATH(''),TYPE).value('text()[1]','varchar(max)'),1,1,N'')) as id_colonne
FROM #codes_reductions t0 where categories is not null
Here is the result :
code | categories | departement | id_colonne
AIRSTREAM | 'A','B','BA' | JMQ | NULL
If I replace 'and categorie IN (t0.tcategories)' with 'and categorie IN ('A','B','BA')', the query works correctly.
Here is the result :
code | categories | departement | id_colonne
AIRSTREAM | 'A','B','BA' | JMQ | 128, 129, 260
I tried to use COLLATE French_CI_AI on my column, but without success. Any ideas?
... categorie IN (t0.tcategories) ....
The cause of your problem is that the t0.tcategories column stores those values ('A','B','BA') as a single string, not as an array / list / table of values. So SQL Server compares two strings: s1 IN (s2) is equivalent to s1 = s2, which means categorie IN (t0.tcategories) tests whether the string A equals the whole string 'A','B','BA', and that is false.
Example (note: double single quotes ('') are used to define an empty string while four single quotes ('''') are used to define a string with a single quote: SELECT '''' --> '):
DECLARE @categories VARCHAR(1000);
SET @categories = '''A'',''B'',''BA'''
SELECT @categories AS ColA, CASE WHEN 'A' IN (@categories) THEN 'TRUE' ELSE 'FALSE' END AS Col2
/*
ColA Col2
------------ -----
'A','B','BA' FALSE
*/
In the short term, the solution is to use one of the following conditions:
Condition 1 (if categorie already contains the single quotes): ... t0.tcategories LIKE '%' + categorie + '%' ....
Condition 2 (if categorie doesn't contain the single quotes): ... t0.tcategories LIKE '%''' + categorie + '''%' ....
Example:
DECLARE @categories VARCHAR(1000);
SET @categories = '''A'',''B'',''BA'''
SELECT @categories AS ColA, CASE WHEN @categories LIKE '%''A''%' THEN 'TRUE' ELSE 'FALSE' END AS Col2
/*
ColA Col2
------------ -----
'A','B','BA' TRUE
*/
Second note: this works as long as no individual value in the t0.tcategories column itself includes a single quote (example: 'B'A' / 'B''A').
In the medium/long term, you should store every single value from the tReduction.tCategories column separately, using another table:
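To show how the second LIKE condition slots into the original STUFF / FOR XML PATH query, here is a small self-contained sketch. The table variables @tcategories and @codes and their sample rows are assumptions made up for the demonstration; the CSV column is called categories here, following the result header in the question.

```sql
-- Hypothetical stand-in for tcategories.
DECLARE @tcategories TABLE (id_categorie INT, categorie VARCHAR(10), departement VARCHAR(10));
INSERT INTO @tcategories VALUES
    (128, 'A', 'JMQ'), (129, 'B', 'JMQ'), (260, 'BA', 'JMQ'), (300, 'C', 'JMQ');

-- Hypothetical stand-in for the table holding the quoted CSV string.
DECLARE @codes TABLE (code VARCHAR(20), categories VARCHAR(100), departement VARCHAR(10));
INSERT INTO @codes VALUES ('AIRSTREAM', '''A'',''B'',''BA''', 'JMQ');

SELECT t0.code,
       t0.categories,
       -- Strip the leading ', ' (2 characters) from the concatenated list.
       STUFF((SELECT ', ' + CONVERT(VARCHAR(10), t1.id_categorie)
              FROM @tcategories t1
              WHERE t1.departement = t0.departement
                -- Wrap each category in quotes so A does not match inside 'BA'.
                AND t0.categories LIKE '%''' + t1.categorie + '''%'
              ORDER BY t1.id_categorie
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 2, '') AS id_colonne
FROM @codes t0;
```

For the single sample row this should yield 128, 129, 260 in id_colonne, with category C left out because 'C' does not occur in the stored string.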
CREATE TABLE dbo.ReductionCategory (
... pk ...,
ReductionCode INT NOT NULL REFERENCES dbo.tReduction(ReductionCode), -- FK
CategoryCode INT NOT NULL REFERENCES dbo.tCategories(CategoryCode) -- FK
)
Thus, condition becomes
... categorie /*CategoryCode*/
IN (
SELECT rc.CategoryCode FROM dbo.ReductionCategory rc
WHERE rc.ReductionCode = t0.ReductionCode
) ....

TSQL Search Box First Name Surname order priority

I have a front-end search box where the user can search for someone by first name, middle name, surname or job title, and the bulk of the back-end code looks like this:
SELECT TOP 50 * FROM (SELECT [EmployeeId], SUM(MatchOrder) as MatchOrder
FROM (SELECT
[EmployeeId],
CASE WHEN A.[EmployeeFieldId] = 4 Then 15 --Surname
WHEN A.[EmployeeFieldId] in (1, 2) Then 15 --PreferredName, FirstName
WHEN A.[EmployeeFieldId] = 3 Then 5 --MiddleName
WHEN A.[EmployeeFieldId] = 5 Then 20 --JobTitle
ELSE 3
END as MatchOrder
FROM [latest].[EmployeeAttributes] A
WHERE (' + @search + ')
) internal
GROUP BY EmployeeId) A
join dbo.vwEmployees E on E.EmployeeId = A.EmployeeId -- TEMP
ORDER BY 2 DESC'
Each EmployeeId is given a score (MatchOrder), totalled according to how many of the above criteria are met (e.g. a first name match plus a surname match = 30), and the results are ordered by that score for display in the front end. The problem is that if someone's first name and surname are very similar, e.g. Patrick Patterson, and I search for "Pat Rice", then Patrick Patterson (30 pts) appears above Patrick Rice (30 pts) because the term "Pat" is matched twice.
I'd like it either to lower the score when a match is made twice, or to modify my CASE expression to somehow handle this (a nested CASE?).
Do you know how I can combat this? Any help would be appreciated.
Thanks
Since [EmployeeFieldId] is always mapped to the same [MatchOrder], you should be able to control this by including [EmployeeFieldId] in the "internal" result set and slapping a DISTINCT clause on the SELECT:
SELECT DISTINCT
[EmployeeId],
[EmployeeFieldId],
CASE WHEN A.[EmployeeFieldId] = 4 Then 15 --Surname
WHEN A.[EmployeeFieldId] in (1, 2) Then 15 --PreferredName, FirstName
WHEN A.[EmployeeFieldId] = 3 Then 5 --MiddleName
WHEN A.[EmployeeFieldId] = 5 Then 20 --JobTitle
ELSE 3
END as MatchOrder
FROM [latest].[EmployeeAttributes] A
WHERE (' + @search + ')
That way, each employee will get each field ID applied towards their score at most once.
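Note that DISTINCT only collapses rows with the same field ID, so a first name and a surname that both match the same search term would still count twice. One possible alternative, sketched below under stated assumptions, is to score each search term separately and keep only the best match per term. The table variables, the Value column, and the @Terms list are all hypothetical; the original code builds its WHERE clause dynamically, so this would need adapting.

```sql
-- Hypothetical attribute rows: field 2 = FirstName, field 4 = Surname.
DECLARE @EmployeeAttributes TABLE (EmployeeId INT, EmployeeFieldId INT, Value VARCHAR(50));
INSERT INTO @EmployeeAttributes VALUES
    (1, 2, 'Patrick'), (1, 4, 'Patterson'),
    (2, 2, 'Patrick'), (2, 4, 'Rice');

-- Hypothetical list of search terms, one row per word typed by the user.
DECLARE @Terms TABLE (Term VARCHAR(50));
INSERT INTO @Terms VALUES ('Pat'), ('Rice');

SELECT per_term.EmployeeId,
       SUM(per_term.BestScore) AS MatchOrder
FROM (
    -- One row per employee and term: the best-scoring field that term matched.
    SELECT A.EmployeeId,
           T.Term,
           MAX(CASE WHEN A.EmployeeFieldId = 4       THEN 15  -- Surname
                    WHEN A.EmployeeFieldId IN (1, 2) THEN 15  -- PreferredName, FirstName
                    WHEN A.EmployeeFieldId = 3       THEN 5   -- MiddleName
                    WHEN A.EmployeeFieldId = 5       THEN 20  -- JobTitle
                    ELSE 3
               END) AS BestScore
    FROM @EmployeeAttributes A
    JOIN @Terms T ON A.Value LIKE T.Term + '%'
    GROUP BY A.EmployeeId, T.Term
) AS per_term
GROUP BY per_term.EmployeeId
ORDER BY MatchOrder DESC;
```

With this sample data, Patrick Rice scores 30 (one match for each term) while Patrick Patterson scores 15, because "Pat" is counted only once however many fields it matches.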

Get COUNT() of rows where first 3 digits of column are alike

I have a result set of codes that are usually three digits followed by up to 2 digits like 012.34 or 123.45. The first three digits define a general category of group, and the digits following the decimal place define more specific qualities. There could be 77 012.xx numbers, and there are hundreds of unique 3 digit group definitions, followed by a varying number of digits per entry.
Does anyone know how to write a quick query to achieve this?
Assuming it is in a varchar column since you're storing 012.34...
SELECT LEFT(someColumn,3), COUNT(*)
FROM someTable
GROUP BY LEFT(someColumn,3)
HAVING COUNT(*) > 5 -- per your comments
ORDER BY LEFT(someColumn,3)
If it's not, then you'd do this:
SELECT LEFT(CONVERT(VARCHAR(10),someColumn),3), COUNT(*)
FROM someTable
GROUP BY LEFT(CONVERT(VARCHAR(10),someColumn),3)
ORDER BY LEFT(CONVERT(VARCHAR(10),someColumn),3)
@rypress, these look strikingly similar to ICD-9 diagnosis codes for other respiratory tuberculosis. Is this correct? If so, you will get both categories and subcategories in your result, which may change your counts (012 -> 012.0 -> 012.00, 012.02, ...).
Sample Data:
IF OBJECT_ID(N'TempICD') > 0
BEGIN
DROP TABLE TempICD
END
CREATE TABLE TempICD (ICD VARCHAR(10))
INSERT INTO TempICD
VALUES ('012'),('012.0'),('012.00'),('012.01'),('012.02'),
('012.03'),('012.04'),('012.05'),('012.05'),
('013'),('013.0'),('013.00'),('013.01'),('013.02'),
('013.03'),(NULL)
Query to get Category with 6 or more line items (Including Category and Sub Category):
SELECT LEFT(ICD, 3) AS ICDs,
COUNT(1) AS ICDCount
FROM TempICD
GROUP BY LEFT(ICD, 3)
HAVING COUNT(*) > 5
ORDER BY LEFT(ICD, 3)
Query to get Category with 6 or more line items (Excluding Category and Sub Category):
SELECT SUBSTRING(ICD, 1, CHARINDEX('.', ICD + '.') - 1) AS ICDs,
SUM(CASE
WHEN LEN(SUBSTRING(ICD, CHARINDEX('.', ICD) + 1, LEN(ICD))) = 2 THEN 1
ELSE 0
END) AS ICDCount
FROM TempICD
WHERE ICD IS NOT NULL
GROUP BY SUBSTRING(ICD, 1, CHARINDEX('.', ICD + '.') - 1)
HAVING SUM(CASE
WHEN LEN(SUBSTRING(ICD, CHARINDEX('.', ICD) + 1, LEN(ICD))) = 2 THEN 1
ELSE 0
END) > 5
Cleanup Script:
IF OBJECT_ID(N'TempICD') > 0
BEGIN
DROP TABLE TempICD
END
This may also help you.
I assume that the column is Decimal Data type.
SELECT CAST([COLUMN] AS INT) [GROUP],
COUNT(*) [COUNT]
FROM [TABLE] T
GROUP BY CAST([COLUMN] AS INT)
It depends on whether your result set is numeric or character data. If it is not numeric, you can use string operations.
select left(resultSet,charindex('.',resultset)) , count(*)
from x
group by left(resultSet,charindex('.',resultset))
order by left(resultSet,charindex('.',resultset))
With CHARINDEX you will get a correct 'cut' even when the leading part is not the usual 3 digits.
If your result set is numeric/float, you can use the FLOOR function:
select floor(resultset),count(*)
from x
group by floor(resultset)
order by floor(resultset)

Make unique column in SQL

I have a table which contains duplicate data.
This is my current table:
Id Name
1 shahin Zen
2 shahin Zen & Aaron Henley
3 Fred Sayz feat. Antonia Lucas
4 Fred Sayz feat. Lawrence Alexander
5 Fred Sayz feat. Sibel
Note: I cannot use DISTINCT because the names do not fully match.
I want to build a table from this table like this:
ID Name
1 shahin
2 Fred
Has anyone solved this kind of problem?
Thanks in advance.
If you just want to get the distinct first words of the rows:
select distinct substring(Name, 0, charindex(' ', Name, 0))
from myTable
You can also restrict the query to rows that contain a space character by adding a WHERE clause:
where charindex(' ', Name, 0) > 0
If you just need the first names, try this:
SELECT
LEFT(name, CHARINDEX(' ', name))
FROM Table1
GROUP BY LEFT(name, CHARINDEX(' ', name))
You need to account for those records that don't have a space...
Select Distinct Left(name,CharIndex(' ',name+' '))
From myTable
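Putting the pieces together, the sketch below uses the name + ' ' trick from the last answer and also renumbers the distinct first words, as the question's expected output suggests. The inline VALUES rows copy the question's sample data; the CROSS APPLY alias and the numbering by lowest original Id are assumptions made for illustration.

```sql
SELECT ROW_NUMBER() OVER (ORDER BY MIN(v.Id)) AS ID,   -- number groups by first appearance
       f.FirstWord AS Name
FROM (VALUES (1, 'shahin Zen'),
             (2, 'shahin Zen & Aaron Henley'),
             (3, 'Fred Sayz feat. Antonia Lucas'),
             (4, 'Fred Sayz feat. Lawrence Alexander'),
             (5, 'Fred Sayz feat. Sibel')) AS v(Id, Name)
-- Appending ' ' guarantees CHARINDEX finds a space, even for one-word names;
-- RTRIM removes the space that LEFT keeps at the end.
CROSS APPLY (SELECT RTRIM(LEFT(v.Name, CHARINDEX(' ', v.Name + ' '))) AS FirstWord) AS f
GROUP BY f.FirstWord;
```

For the sample rows this should return shahin with ID 1 and Fred with ID 2.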
