Make unique column in SQL - sql-server

I have a table that contains duplicate data.
This is my current table:
Id Name
1 shahin Zen
2 shahin Zen & Aaron Henley
3 Fred Sayz feat. Antonia Lucas
4 Fred Sayz feat. Lawrence Alexander
5 Fred Sayz feat. Sibel
Note: I cannot use DISTINCT because the names do not match fully.
I want to make a table from this table, like this:
ID Name
1 shahin
2 Fred
Has anyone solved this kind of problem?
Thanks in advance.

If you just want to get the distinct first words of the rows:
select distinct substring(Name, 0, charindex(' ', Name, 0))
from myTable
You can also restrict the query to rows that contain a space character by adding a WHERE clause:
where charindex(' ', Name, 0) > 0

If you just need the first names, try this:
SELECT
LEFT(name, CHARINDEX(' ', name))
FROM Table1
GROUP BY LEFT(name, CHARINDEX(' ', name))

You need to account for those records that don't have a space...
Select Distinct Left(name,CharIndex(' ',name+' '))
From myTable
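To also produce the new ID column the question asks for, one possible sketch groups on the first word and numbers the groups with ROW_NUMBER. The table variable @myTable and the extra one-word row 'Madonna' are made up for illustration, and the multi-row VALUES syntax assumes SQL Server 2008 or later. The -1 drops the trailing space that LEFT would otherwise keep.

```sql
-- Sample data from the question, plus one hypothetical row without a space.
DECLARE @myTable TABLE (Id int, Name varchar(100));
INSERT @myTable (Id, Name) VALUES
    (1, 'shahin Zen'),
    (2, 'shahin Zen & Aaron Henley'),
    (3, 'Fred Sayz feat. Antonia Lucas'),
    (4, 'Fred Sayz feat. Lawrence Alexander'),
    (5, 'Fred Sayz feat. Sibel'),
    (6, 'Madonna');

-- Appending ' ' guarantees CHARINDEX finds a space, so one-word names survive.
-- ROW_NUMBER orders the groups by the smallest original Id in each group.
SELECT ROW_NUMBER() OVER (ORDER BY MIN(Id)) AS ID,
       LEFT(Name, CHARINDEX(' ', Name + ' ') - 1) AS Name
FROM @myTable
GROUP BY LEFT(Name, CHARINDEX(' ', Name + ' ') - 1);
```

With this sample data the query should return three rows: shahin, Fred and Madonna, numbered 1 to 3. Whether 'shahin' and 'Shahin' fall into the same group depends on the column's collation.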

Related

String split based on condition

I have a few strings with numbers like this, around 3000 records in total.
Column
------------
Cell 233567-3455
Cell123-4567
Cell#123-7449
Local 456-0987
1 616 468-7796
1234567-5x2345
234/625-1234
(C)755-7442
5732878-2
5721899-23
6712909-3
7894200-234
2144-57238
5673893/588218
437-4737-5772
How can I find records like the ones below?
Column
-------------
5732878-2
5721899-23
6712909-3
7894200-234
Once I find this, I need to split those into two parts
1st Column. | 2nd column
------------- |
5732878 | 5732872
5721899 | 5721823
6712909 | 6712903
7894200 | 7894234
I tried to fix this using PATINDEX and CHARINDEX,
but somehow it's not working. Please help.
I don't know your filtering logic to get to your intermediate set, but this should get your expected final result set. I assumed you only want records where the length of the string to the left of the hyphen is greater than the length on the right and also exclude records with more than 1 hyphen.
SELECT LEFT(telephone, CHARINDEX('-', telephone)-1) AS [1stTelephone],
STUFF(
--get the string before the hyphen
LEFT(telephone, CHARINDEX('-', telephone)-1),
--get the starting location of chars we are going to replace
LEN(LEFT(telephone, CHARINDEX('-', telephone)))-LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--get the length of the section we are replacing
LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--replace that section with the string after the hyphen
RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)
) AS [2nd telephone]
FROM your_table
WHERE LEN(LEFT(telephone, CHARINDEX('-', telephone))) > LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))))
AND len(telephone) - len(REPLACE(telephone, '-', '')) = 1
Somewhat dirty method (looks specifically for 7 digits followed by hyphen followed by any number of digits):
SELECT BasePhone AS Phone1, LEFT(BasePhone, 7-LEN(OtherPhoneEnd)) + OtherPhoneEnd AS Phone2
FROM (
SELECT LEFT(Telephone, 7) AS BasePhone, SUBSTRING(Telephone,9,7) AS OtherPhoneEnd
FROM Telephones
WHERE Telephone LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-%'
) AS t
Based on the information you have given, I assumed that you want numbers with a hyphen (-) at the 8th position. Try this:
create table #TelNo (
Tel varchar(30)
)
insert #TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
select Tel, LEFT(Tel, Len(tel) - len(suffix)) + suffix [SecondTel] from (
select substring(Tel, 1, 7) [Tel], substring(Tel, 9, 10) [suffix] from #TelNo
where CHARINDEX('-', Tel) = 8
)a
You could use something like this:
DDL
use tempdb
create table TelNo (
Tel varchar(30)
)
insert TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
Code
select Tel,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-2)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-3)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-4)
else Tel
end Tel1,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-3) + right(Tel, 1)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-5) + right(Tel, 2)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-7) + right(Tel, 3)
else NULL
end Tel2
from TelNo
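The answers above either hard-code the suffix length or assume a seven-digit base. A compact variant under the same assumption (seven digits, a hyphen at position 8, and a digits-only suffix of one to seven characters) could look like the sketch below. The table variable @TelNo and the aliases Base and Suffix are illustrative names only. STUFF is used for the overlay because it returns NULL rather than raising an error when the computed start position is out of range.

```sql
-- Sample values from the question, including one with a non-digit suffix.
DECLARE @TelNo TABLE (Tel varchar(30));
INSERT @TelNo (Tel) VALUES
    ('5732878-2'), ('5721899-23'), ('6712909-3'), ('7894200-234'),
    ('2144-57238'), ('5673893/588218'), ('437-4737-5772'), ('1234567-5x2345');

SELECT p.Base AS Tel1,
       -- Overwrite the last LEN(Suffix) digits of the base with the suffix.
       STUFF(p.Base, 8 - LEN(p.Suffix), LEN(p.Suffix), p.Suffix) AS Tel2
FROM @TelNo AS t
CROSS APPLY (SELECT LEFT(t.Tel, 7)          AS Base,
                    SUBSTRING(t.Tel, 9, 30) AS Suffix) AS p
WHERE t.Tel LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-%'  -- seven digits, then a hyphen
  AND p.Suffix NOT LIKE '%[^0-9]%'                        -- suffix holds digits only
  AND LEN(p.Suffix) BETWEEN 1 AND 7;                      -- suffix fits inside the base
```

On the sample values this should keep only the four rows the question lists and pair 5732878 with 5732872, 5721899 with 5721823, 6712909 with 6712903, and 7894200 with 7894234.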

Expression to find multiple spaces in string

We handle a lot of sensitive data and I would like to mask passenger names using only the first and last letter of each name part, joined by three asterisks (***).
For example: the name 'John Doe' will become 'J***n D***e'
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(CardHolderNameFromPurchase, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts; there is no real limit to it. How can I find the indices of all spaces within an expression? Or should I maybe tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare @t table(n nvarchar(20));
insert into @t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from @t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Voilà. No sensitive information is available without access to the reference table, and that would be severely restricted.
A second method is a reversible encryption algorithm. This would allow the value to be gibberish, but with the right knowledge, it could be transformed back into a meaningful value. Typical methods for this are the public key encryption algorithms devised by Rivest, Shamir, and Adleman (RSA encryption).
If you want to do first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manage the process. However, I view this as the least secure and least desirable approach.
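The reference-table option described above could be laid out roughly as follows. This is only a sketch: the table names dbo.Passenger and dbo.Purchase, the Amount column and the role ReportingRole are all hypothetical, and a real design would need its own permission model.

```sql
-- The only place the clear-text name is stored (hypothetical schema).
CREATE TABLE dbo.Passenger (
    PassengerId   int IDENTITY(1,1) PRIMARY KEY,
    PassengerName nvarchar(200) NOT NULL
);

-- Generally accessible tables carry the id, never the name.
CREATE TABLE dbo.Purchase (
    PurchaseId  int IDENTITY(1,1) PRIMARY KEY,
    PassengerId int NOT NULL REFERENCES dbo.Passenger (PassengerId),
    Amount      decimal(10, 2) NOT NULL
);

-- Lock the reference table down for a hypothetical reporting role.
CREATE ROLE ReportingRole;
DENY SELECT ON dbo.Passenger TO ReportingRole;
GRANT SELECT ON dbo.Purchase TO ReportingRole;
```

Members of the role can then query purchases by PassengerId but cannot resolve an id back to a name.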
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') WITHIN GROUP (ORDER BY ItemNumber) AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Never mind, OP has commented they are using 2008, so STRING_AGG is out of the question. @iamdave, however, has posted an answer which is very similar to my own, just done the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (@var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare @var_part varchar(100)
declare @var_return varchar(100)
declare @n_position smallint
set @var_return = ''
set @n_position = 1
WHILE @n_position<>0
BEGIN
SET @n_position = CHARINDEX(' ', @var_name)
IF @n_position = 0
SET @n_position = LEN(@var_name)
SET @var_part = SUBSTRING(@var_name, 1, @n_position)
SET @var_name = SUBSTRING(@var_name, @n_position+1, LEN(@var_name))
if @var_part<>''
SET @var_return = @var_return + stuff(@var_part, 2, len(@var_part)-2, replicate('*',len(@var_part)-2)) + ' '
END
RETURN(@var_return)
END

How to concatenate the cells in a row?

There are many questions about how to concatenate multiple rows into a varchar, but can you concatenate all the cells in a row into a varchar?
Example : A table with 3 columns
| Id | FirstName | LastName |
| 1 | John | Doe |
| 2 | Erik | Foo |
return the following
"1, John, Doe"
"2, Erik, Foo"
You know which table you are working on.
Note 1 : Assume that you don't know the name of the columns when you write your query.
Note 2 : I would like to avoid dynamic SQL (if possible)
The only thing I can think of is setting NOCOUNT to ON and outputting the results to text instead of a grid. That can be done without knowing the number of columns and without dynamic SQL.
SET NOCOUNT ON;
;WITH Test (Id, FirstName, LastName)
AS (
SELECT 1, 'John', 'Doe'
UNION ALL
SELECT 2, 'Erik', 'Foo'
)
SELECT *
FROM Test
Will return you this:
1,John,Doe
2,Erik,Foo
Here is the basic version of this. Converting this to a dynamic sql solution when the columns are unknown is going to be very tricky. You will need to use sql to dynamically generate a query similar to this. Any table that doesn't have a primary key, or a unique index would be nearly impossible because you wouldn't know what column to use as your group by. It also becomes more tricky because you don't know what datatype(s) you are working with. You would also need to be certain to add some logic to handle single quotes and NULL. This is an interesting challenge for sure. If I have time this weekend I may try to work something up for the dynamic version of this.
with Something(Id, FirstName, LastName) as
(
select 1, 'John', 'Doe' union all
select 2, 'Erik', 'Foo'
)
select STUFF((select cast(s2.Id as varchar(5)) + ', ' + s2.FirstName + ', ' + s2.LastName
from Something s2
where s2.Id = s.Id
for xml path('')), 1, 0, '') as Stuffed
from Something s
group by Id

Separating firstname surname

If the original field looks like paul#yates then this syntax picks out the surname correctly
substring(surname,CHARINDEX('#',surname+'#')+1,LEN(name3))
however if the field is paul#b#yates then the surname looks like #b#yates. I want the middle letter to be dropped so it picks only the surname out.
any ideas?
You can;
;with T(name) as (
select 'paul#yates' union
select 'paul#b#yates'
)
select
right(name, charindex('#', reverse(name) + '#') - 1)
from T
>>
yates
yates
You could reverse the array, split it until you find the first "#", take that part and reverse it again.
If this is Java, there should be an array.reverse function; otherwise you may need to write it on your own.
You could also cut the string into pieces until there are no more "#" signs left and then take the last part (the substring should return "-1" or something), but I like my first idea better.
Here's an example for you
declare @t table (name varchar(max));
insert @t select
'john' union all select
'john#t#bill' union all select
'joe#public';
select firstname=left(name,-1+charindex('#',name+'#')),
surname=case when name like '%#%' then
stuff(name,1,len(name)+1-charindex('#',reverse(name)+'#'),'')
end
from @t;
-- results
FIRSTNAME SURNAME
john (null)
john bill
joe public

Select only string with number SQL Server

I need to select only the strings in my table, but the table contains numbers and strings together.
Ex:
ID Name
1 Jacke11
2 Andre
3 Rodrigo11
4 55555
My select needs to return only IDs 1, 2, 3.
Thanks
SELECT ID
FROM YourTable
WHERE ISNUMERIC(Name + '.0e0') = 0
As an alternative to Joe's very fine ISNUMERIC solution, you can use PATINDEX to make sure you have an alpha character:
SELECT ID
FROM YourTable
WHERE PATINDEX('%[a-z]%', name) > 0
This may be slightly faster since it will stop searching the string as soon as it gets to the first alpha character.
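A third variation, shown here only as a sketch against the sample data from the question, is to keep every row whose name contains at least one non-digit character. The table variable @YourTable is a stand-in for the real table. Unlike the PATINDEX version, this also keeps names made of digits plus spaces or punctuation, which may or may not be what you want.

```sql
-- Sample data from the question.
DECLARE @YourTable TABLE (ID int, Name varchar(50));
INSERT @YourTable (ID, Name) VALUES
    (1, 'Jacke11'),
    (2, 'Andre'),
    (3, 'Rodrigo11'),
    (4, '55555');

-- [^0-9] matches any single character that is not a digit,
-- so purely numeric names such as '55555' are filtered out.
SELECT ID
FROM @YourTable
WHERE Name LIKE '%[^0-9]%';
```

For the four sample rows this should return IDs 1, 2 and 3.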
