Related
We handle a lot of sensitive data and I would like to mask passenger names using only the first and last letter of each name part and join these by three asterisks (***),
For example: the name 'John Doe' will become 'J***n D***e'
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(CardHolderNameFromPurchase, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts, there is no real limit to it. How should can I find the indices of all spaces within an expression? Or should I maybe tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare #t table(n nvarchar(20));
insert into #t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from #t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
#str nvarchar(4000) = ' ' -- String to split.
,#delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,#num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in #str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest #str length.
,t(t) as (select top (select len(isnull(#str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(#str,''),t,1) = #delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(#delimiter,isnull(#str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(#str,s,l) as item
from l
) a
where rn = #num
or #num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Viola. No sensitive information is available without access to the reference table, and that would be severely restricted.
A second method is a reversible compression algorithm. This would allow the the value to be gibberish, but with the right knowledge, it could be transformed back into a meaningful value. Typical methods for this are the public key encryption algorithms devised by Rivest, Shamir, and Adelman (RSA encoding).
If you want to do first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters, when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manager the process. However, I view this as the least secure and least desirable approach.
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Nevermind, OP has commented they are using 2008, so STRING_AGG is out of the question. #iamdave, however, has posted an answer which is very similar to my own, just do it the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (#var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare #var_part varchar(100)
declare #var_return varchar(100)
declare #n_position smallint
set #var_return = ''
set #n_position = 1
WHILE #n_position<>0
BEGIN
SET #n_position = CHARINDEX(' ', #var_name)
IF #n_position = 0
SET #n_position = LEN(#var_name)
SET #var_part = SUBSTRING(#var_name, 1, #n_position)
SET #var_name = SUBSTRING(#var_name, #n_position+1, LEN(#var_name))
if #var_part<>''
SET #var_return = #var_return + stuff(#var_part, 2, len(#var_part)-2, replicate('*',len(#var_part)-2)) + ' '
END
RETURN(#var_return)
END
I'd like to concatenate three columns - street, street number and city to one column "adress". The strange thing is that I cannot do it for some reason.
This is what I have tried so far:
SELECT street,
street_num,
city,
isnull(street,'') + '' + isnull(street_num,'') + '' + isnull(city,'') AS tst1, --doesnt work
concat(isnull(street,''),' ',isnull(street_num,''), ' ', isnull(city,'')) AS tst2, --doesnt work
(street_num + ' ' + street) AS tst3, --does work
(street_num + ' ' + city) AS tst4, --does work
(city + ' ' + street) AS tst5 --doesnt work
FROM [DB].[dbo].[adresses]
Note that + or concat doesnt work, it only shows the first column, street in these cases. However, if I start with street number and add street or city, it does work. But if I try to add third column, it is not shown.
If it helps, the table was pulled from Oracle by OPENQUERY and the table structure is as follows:
street VARCHAR(100), null
street_num VARCHAR(50), null
city VARCHAR(100), null
I am on MSSQL 2014.
EDIT
As asked in the comments, i cant show the data as I am dealing with addresses of our customers. Below are two dummy records plus expected result (adress) as example:
street | street_num | city | adress
--------------------------------------------------------------------
avenida pino alto | 45 | avila | avenida pino alto 45 avila
rue de abaixo | 86 | madrid | rue de abaixo 86 madrid
Furthermore, If i copy the records and do something like this, it works of course.
SELECT 'avenida pino alto' + ' ' + '45' + ' ' + 'avila'
Based on comments, it seems that your street column contains some char/data that causes problems.
I have no idea what it could be, but you can try to find out like this:
select top 10
street,
len(street) as streetCharLen,
cast(street as varbinary(500)) as streetBytes
from [DB].[dbo].[adresses]
Then compare what the different columns tell you.
Here's a quick sample:
declare #t table (
id int,
thestring varchar(50)
)
insert into #t values (1, 'test')
select thestring,
len(thestring) as slen,
cast(thestring as varbinary(100)) as sbytes
from #t
If in this sample, the slen is not 4, or the sbytes contains something that does not map back to one of the characters that I see when selecting, then something is wrong with the string.
Use convert(varchar,[exp]):
SELECT street,
street_num,
city,
isnull(convert(varchar,street),'') + '' + isnull(convert(varchar,street_num),'') + '' + isnull(convert(varchar,city),'') AS tst1
FROM [DB].[dbo].[adresses]
Try as follows.
select isnull((convert varchar(250),street),'')+isnull((convert varchar(250),[street number]),'')+
isnull((convert varchar(250),[city]),'') as 'Adress'
from .......(your query)
I'm using SQL Server 2008. I have a source table with a few columns (A, B) containing string data to split into a multiple columns. I do have function that does the split already written.
The data from the Source table (the source table format cannot be modified) is used in a View being created. But I need to have my View have already split data for Column A and B from the Source table. So, my view will have extra columns that are not in the Source table.
Then the View populated with the Source table is used to Merge with the Other Table.
There two questions here:
Can I split column A and B from the Source table when creating a View, but do not change the Source Table?
How to use my existing User Defined Function in the View "Select" statement to accomplish this task?
Idea in short:
String to split is also shown in the example in the commented out section. Pretty much have Destination table, vStandardizedData View, SP that uses the View data to Merge to tblStandardizedData table. So, in my Source column I have column A and B that I need to split before loading to tblStandardizedData table.
There are five objects that I'm working on:
Source File
Destination Table
vStandardizedData View
tblStandardizedData table
Stored procedure that does merge
(Update and Insert) form the vStandardizedData View.
Note: all the 5 objects a listed in the order they are supposed to be created and loaded.
Separately from this there is an existing UDFunction that can split the string which I was told to use
Example of the string in column A (column B has the same format data) to be split:
6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park
Desired result:
User-defined function returns a table variable:
CREATE FUNCTION [Schema].[udfStringDelimeterfromTable]
(
#sInputList VARCHAR(MAX) -- List of delimited items
, #Delimiter CHAR(1) = ',' -- delimiter that separates items
)
RETURNS #List TABLE (Item VARCHAR(MAX)) WITH SCHEMABINDING
/*
* Returns a table of strings that have been split by a delimiter.
* Similar to the Visual Basic (or VBA) SPLIT function. The
* strings are trimmed before being returned. Null items are not
* returned so if there are multiple separators between items,
* only the non-null items are returned.
* Space is not a valid delimiter.
*
* Example:
SELECT * FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456, efh,,hi', ',')
*
* Test:
DECLARE #Count INT, #Delim CHAR(10), #Input VARCHAR(128)
SELECT #Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456', ',')
PRINT 'TEST 1 3 lines:' + CASE WHEN #Count=3
THEN 'Worked' ELSE 'ERROR' END
SELECT #DELIM=CHAR(10)
, #INPUT = 'Line 1' + #delim + 'line 2' + #Delim
SELECT #Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable](#Input, #Delim)
PRINT 'TEST 2 LF :' + CASE WHEN #Count=2
THEN 'Worked' ELSE 'ERROR' END
What I'd ask you, is to read this: How to create a Minimal, Complete, and Verifiable example.
In general: If you use your UDF, you'll get table-wise data. It was best, if your UDF would return the item together with a running number. Otherwise you'll first need to use ROW_NUMBER() OVER(...) to create a part number in order to create your target column names via string concatenation. Then use PIVOT to get the columns side-by-side.
An easier approach could be a string split via XML like in this answer
A quick proof of concept to show the principles:
DECLARE #tbl TABLE(ID INT,YourValues VARCHAR(100));
INSERT INTO #tbl VALUES
(1,'6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park')
,(2,'Other addr1, one more addr, and another one, and even one more');
WITH Casted AS
(
SELECT *
,CAST('<x>' + REPLACE(YourValues,',','</x><x>') + '</x>' AS XML) AS AsXml
FROM #tbl
)
SELECT *
,LTRIM(RTRIM(AsXml.value('/x[1]','nvarchar(max)'))) AS Address1
,LTRIM(RTRIM(AsXml.value('/x[2]','nvarchar(max)'))) AS Address2
,LTRIM(RTRIM(AsXml.value('/x[3]','nvarchar(max)'))) AS Address3
,LTRIM(RTRIM(AsXml.value('/x[4]','nvarchar(max)'))) AS Address4
,LTRIM(RTRIM(AsXml.value('/x[5]','nvarchar(max)'))) AS Address5
FROM Casted
If your values might include forbidden characters (especially <,> and &) you can find an approach to deal with this in the linked answer.
The result
+----+---------------------+-----------------+--------------------+-------------------+----------+
| ID | Address1 | Address2 | Address3 | Address4 | Address5 |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 1 | 6667 Mission Street | 4567 7rd Street | 65 Sully Pond Park | NULL | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 2 | Other addr1 | one more addr | and another one | and even one more | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
Hi I am currently doing a project in which the database needs to sort the lot numbers
prefix is nvarchar
lotnum is int
suffix is nvarchar
I have managed to convert the lot number
code i used is
Select (case when prefix is null then '' else prefix end) +
CONVERT ( nvarchar , ( lotnumber ) ) +(case when suffix is null then '' else suffix end)
(values in the database are a1a,1a,1,2,100)
when I order by lotnumber I get
a1a
1a
1
2
100
then prefix to the order by
and get this result
1
a1a
1a
2
100
I have added the suffix as well and returns the same result
I need to order it as follows
1
1a
2
100
a1a
Please could someone help me on this
Have you tried ordering by all three columns?
ORDER BY prefix, lotnum, suffix
By the way, I can see you're using SQL Server. To make things more portable, I'd recommend to use COALESCE and CAST instead of a CASE/WHEN and CONVERT for prefix and lotnum. Full query may look like this.
SELECT
COALESCE(prefix, '')
+ CAST(lotnum AS NVARCHAR)
+ COALESCE(suffix, '') AS lot_number
FROM
YourTable
ORDER BY
COALESCE(prefix, '')
,lotnum
,COALESCE(suffix, '')
I'm doing some reporting against a silly database and I have to do
SELECT [DESC] as 'Description'
FROM dbo.tbl_custom_code_10 a
INNER JOIN dbo.Respondent b ON CHARINDEX(',' + a.code + ',', ',' + b.CC10) > 0
WHERE recordid = 116
Which Returns Multiple Rows
Palm
Compaq
Blackberry
Edit *
Schema is
Respondent Table (At a Glance) ...
*recordid lname fname address CC10 CC11 CC12 CC13*
116 Smith John Street 1,4,5, 1,3,4, 1,2,3, NULL
Tbl_Custom_Code10
*code desc*
0 None
1 Palm
10 Samsung
11 Treo
12 HTC
13 Nokia
14 LG
15 HP
16 Dash
Result set will always be 1 row, so John Smith: | 646-465-4566 | Has a Blackberry, Palm, Compaq | Likes: Walks on the beach, Rainbows, Saxophone
However I need to be able to use this within another query ... like
Select b.Name, c.Number, d.MulitLineCrap FROM Tables
How can I go about this, Thanks in advance ...
BTW I could also do it in LINQ if any body had any ideas ...
Here is one way to make a comma-separated list based on a query (just replace the query inside the first WITH block). Now, how that joins up with your query against b and c, I have no idea. You'll need to supply a more complete question - including specifics on how many rows come back from the second query and whether "MultilineCrap" is the same for each of those rows or if it depends on data in b/c.
;WITH x([DESC]) AS
(
SELECT d FROM (VALUES('Palm'),('Compaq'),('Blackberry')) AS x(d)
)
SELECT STUFF((SELECT ',' + [DESC]
FROM x
FOR XML PATH(''), TYPE).value(N'./text()[1]', N'varchar(max)'),1,1,'');
EDIT
Given the new requirements, perhaps this is the best way:
CREATE FUNCTION dbo.GetMultiLineCrap
(
#s VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE #x VARCHAR(MAX) = '';
SELECT #x += ',' + [desc]
FROM dbo.tbl_custom_code_10
WHERE ',' + #s LIKE '%,' + RTRIM(code) + ',%';
RETURN (SELECT STUFF(#x, 1, 1, ''));
END
GO
SELECT r.LName, r.FName, MultilineCrap = dbo.GetMultiLineCrap(r.CC10)
FROM dbo.Respondent AS r
WHERE recordid = 116;
Please use aliases that make a little bit of sense, instead of just serially applying a, b, ,c, etc. Your queries will be easier to read, I promise.