How to concatenate the cells in a row? - sql-server

There are many questions about how to concatenate multiple rows into a varchar, but can you concatenate all the cells in a row into a varchar?
Example: a table with 3 columns
| Id | FirstName | LastName |
| 1 | John | Doe |
| 2 | Erik | Foo |
should return the following:
"1, John, Doe"
"2, Erik, Foo"
You know which table you are working on.
Note 1: Assume that you don't know the names of the columns when you write your query.
Note 2: I would like to avoid dynamic SQL (if possible).

The only thing I can think of is setting NOCOUNT ON and outputting the results to text instead of a grid, with the Results to Text output format set to comma delimited. That works without knowing the number of columns and avoids dynamic SQL.
SET NOCOUNT ON;
;WITH Test (Id, FirstName, LastName)
AS (
SELECT 1, 'John', 'Doe'
UNION ALL
SELECT 2, 'Erik', 'Foo'
)
SELECT *
FROM Test
This returns:
1,John,Doe
2,Erik,Foo

Here is the basic version of this. Converting it to a dynamic SQL solution when the columns are unknown is going to be very tricky: you would need SQL that dynamically generates a query similar to this one. Any table that doesn't have a primary key or a unique index would be nearly impossible, because you wouldn't know which column to use in your GROUP BY. It also gets trickier because you don't know what data type(s) you are working with, and you would need to add some logic to handle single quotes and NULL. This is an interesting challenge for sure. If I have time this weekend I may try to work something up for the dynamic version.
with Something(Id, FirstName, LastName) as
(
select 1, 'John', 'Doe' union all
select 2, 'Erik', 'Foo'
)
select STUFF((select cast(s2.Id as varchar(5)) + ', ' + s2.FirstName + ', ' + s2.LastName
from Something s2
where s2.Id = s.Id
for xml path('')), 1, 0, '') as Stuffed
from Something s
group by Id
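For the no-dynamic-SQL requirement in the question, one option (not taken from either answer above) is to serialize each row to XML with `SELECT t.* FOR XML PATH('')` and read the element values back with `nodes()`, so no column name appears anywhere in the query. A minimal sketch, assuming SQL Server 2005 or later; `#Test` and `#RowText` are hypothetical temp tables used only to make the example self-contained:

```sql
-- Hypothetical sample table; stands in for the table you already know
CREATE TABLE #Test (Id int, FirstName varchar(50), LastName varchar(50));
INSERT INTO #Test VALUES (1, 'John', 'Doe');
INSERT INTO #Test VALUES (2, 'Erik', 'Foo');

-- Each row becomes <Id>1</Id><FirstName>John</FirstName><LastName>Doe</LastName>,
-- then the element values are joined with ', ' (STUFF drops the leading separator)
SELECT STUFF((SELECT ', ' + n.c.value('text()[1]', 'nvarchar(max)')
              FROM r.doc.nodes('*') AS n(c)
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, '') AS RowText
INTO #RowText
FROM #Test AS t
-- t.* serializes every column of the current row without naming any of them
CROSS APPLY (SELECT t.* FOR XML PATH(''), TYPE) AS r(doc);

SELECT RowText FROM #RowText;
```

NULL columns are left out of the XML, so they simply disappear from the text instead of leaving an empty slot, and non-string types come back in their XML representation. The values come out in column order in practice, but FOR XML concatenation without an ORDER BY is not formally guaranteed to preserve it.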

Related

Expression to find multiple spaces in string

We handle a lot of sensitive data and I would like to mask passenger names using only the first and last letter of each name part, joined with three asterisks (***).
For example, the name 'John Doe' becomes 'J***n D***e'.
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(PassengerName, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, a passenger name can have more than two parts; there is no real limit. How can I find the indices of all spaces within an expression? Or should I tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare @t table(n nvarchar(20));
insert into @t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from @t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
GO
If you consider PassengerName to be sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Voilà. No sensitive information is available without access to the reference table, and that access would be severely restricted.
A second method is a reversible encryption algorithm. This would allow the value to be gibberish, but with the right knowledge (the key), it could be transformed back into a meaningful value. Typical methods for this are public-key encryption algorithms such as the one devised by Rivest, Shamir, and Adleman (RSA).
If you want to use the first and last letters of names, I would be really careful with Asian names. Many of them consist of only two or three letters when written in Latin script, which isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manage the process. However, I view this as the least secure and least desirable approach.
This uses Jeff Moden's DelimitedSplit8K, as well as STRING_AGG, new in SQL Server 2017. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') WITHIN GROUP (ORDER BY ItemNumber) AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Never mind, the OP has commented that they are using 2008, so STRING_AGG is out of the question. @iamdave, however, has posted an answer which is very similar to my own, just doing it the "old-fashioned" XML way.
If you are on SQL Server 2016 or later (STRING_SPLIT also requires database compatibility level 130), you can use the built-in STRING_SPLIT to split the names into rows on spaces, do your string formatting, and then roll the parts back up to name level using FOR XML PATH. Note that STRING_SPLIT does not guarantee that the parts are returned in their original order.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
STUFF((SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')), 1, 1, '') [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
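Because the question's author is on SQL Server 2008, where STRING_SPLIT is not available, the split step can instead use the XML trick that appears elsewhere on this page, combined with the same FOR XML PATH roll-up. A minimal sketch, assuming the names contain no XML special characters (`&`, `<`, `>`); `#dataset` mirrors the table above and `#masked` is a hypothetical results table:

```sql
-- Hypothetical sample data mirroring the answer above
CREATE TABLE #dataset (id int IDENTITY(1,1), name varchar(50));
INSERT INTO #dataset (name) VALUES ('John Smith'), ('Edgar Allen Poe'), ('One Two Three Four');

SELECT d.id,
       d.name,
       -- Mask each part as first letter + '***' + last letter, then join with spaces
       STUFF((SELECT ' ' + LEFT(p.part, 1) + '***' + RIGHT(p.part, 1)
              FROM s.doc.nodes('/x') AS x(n)
              CROSS APPLY (SELECT x.n.value('.', 'varchar(50)') AS part) AS p
              WHERE p.part <> ''   -- skip empty parts caused by repeated spaces
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS out_name
INTO #masked
FROM #dataset AS d
-- Turn 'John Smith' into <x>John</x><x>Smith</x> (works on 2008, no STRING_SPLIT)
CROSS APPLY (SELECT CAST('<x>' + REPLACE(d.name, ' ', '</x><x>') + '</x>' AS xml) AS doc) AS s;

SELECT id, name, out_name FROM #masked;
```

As with the other FOR XML roll-ups here, the parts come back in their original order in practice, but that is not formally guaranteed without an ORDER BY on a position value.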
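You can do that using this function. It masks the interior of each name part with one asterisk per hidden character and leaves parts of one or two letters unchanged; replace the REPLICATE(...) call with '***' if you want the exact format from the question.
create function [dbo].[fnMaskName] (@var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare @var_part varchar(100)
declare @var_return varchar(100)
declare @n_position smallint
set @var_return = ''
set @n_position = 1
WHILE @n_position<>0
BEGIN
-- Position of the next space; with no space left, take the rest of the string
SET @n_position = CHARINDEX(' ', @var_name)
IF @n_position = 0
SET @n_position = LEN(@var_name)
-- RTRIM drops the space that ended this part (avoids double spaces in the output)
SET @var_part = RTRIM(SUBSTRING(@var_name, 1, @n_position))
SET @var_name = SUBSTRING(@var_name, @n_position+1, LEN(@var_name))
-- Only mask parts longer than 2 characters; a negative STUFF length would return NULL
if LEN(@var_part) > 2
SET @var_part = stuff(@var_part, 2, len(@var_part)-2, replicate('*',len(@var_part)-2))
if @var_part<>''
SET @var_return = @var_return + @var_part + ' '
END
-- Drop the trailing separator
RETURN(RTRIM(@var_return))
END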

Incorrect syntax near "case"

I have a table with data like this:
ID | Fill
---------------
1 | ####
2 | ####Y
3 | ####Y245
I want to insert the above data into another table and expecting the result table to be:
ID | Fill
----------------
1 | (Space)
2 | Y
3 | Y245
That is, when I find ####, it should be replaced by spaces (four space characters, since there are four #).
Here is how I'm trying to do this:
insert into table1
(
id
,case
when contains(substring([fill],1,4),'####') then ' '+substring([fill],5,100)
else [fill]
end
)
select
id
,convert(char(100),[col1]+[col2]+[col3]+[col4])
from
table2
However, it shows a syntax error near "case". What am I doing wrong, and how can I achieve the desired result?
Just use replace()
insert into destination_table (col1)
select replace(col1, '#', ' ' ) from source_table
If # occurs, it will be replaced. If not, then the original string is used.
The CASE expression is in the column list of the INSERT statement, which may only contain target column names, so it is not valid there; expressions belong in the SELECT list.
You could just use a simple replace to achieve this
INSERT INTO table1 (id, fill)
select id, replace(fill, '####', '    ') from table2
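For comparison, the question's own CASE approach also works once the expression moves from the column list into the SELECT list and the full-text CONTAINS predicate is replaced with a plain prefix test. A minimal sketch with hypothetical temp tables `#table1` and `#table2` that hold only the `id` and `fill` columns from the question:

```sql
-- Hypothetical source/destination tables mirroring the question
CREATE TABLE #table2 (id int, fill varchar(100));
CREATE TABLE #table1 (id int, fill varchar(100));
INSERT INTO #table2 VALUES (1, '####');
INSERT INTO #table2 VALUES (2, '####Y');
INSERT INTO #table2 VALUES (3, '####Y245');

-- The column list names only target columns; the CASE goes in the SELECT list
INSERT INTO #table1 (id, fill)
SELECT id,
       CASE
           WHEN LEFT(fill, 4) = '####' THEN '    ' + SUBSTRING(fill, 5, 100)  -- 4 spaces for 4 '#'
           ELSE fill
       END
FROM #table2;

-- DATALENGTH shows the stored spaces (LEN would ignore trailing ones)
SELECT id, fill, DATALENGTH(fill) AS bytes FROM #table1;
```

Unlike REPLACE, this only rewrites a leading `####` and leaves any `#` later in the string alone.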

Split Single Column into multiple and Load it to a Table or a View

I'm using SQL Server 2008. I have a source table with a few columns (A, B) containing string data to split into multiple columns. I already have a function written that does the split.
The data from the Source table (the source table format cannot be modified) is used in a View being created. But I need my View to already contain the split data for columns A and B from the Source table. So, my view will have extra columns that are not in the Source table.
Then the View populated from the Source table is used to Merge with the Other Table.
There are two questions here:
Can I split column A and B from the Source table when creating a View, but do not change the Source Table?
How to use my existing User Defined Function in the View "Select" statement to accomplish this task?
Idea in short:
The string to split is also shown in the example in the commented-out section. I have a Destination table, a vStandardizedData View, and an SP that uses the View data to Merge into the tblStandardizedData table. In my Source, I have columns A and B that I need to split before loading into the tblStandardizedData table.
There are five objects that I'm working on:
Source File
Destination Table
vStandardizedData View
tblStandardizedData table
Stored procedure that does the merge (update and insert) from the vStandardizedData View
Note: all 5 objects are listed in the order they are supposed to be created and loaded.
Separately from this, there is an existing user-defined function that can split the string, which I was told to use.
Example of the string in column A (column B has the same format data) to be split:
6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park
Desired result:
User-defined function returns a table variable:
CREATE FUNCTION [Schema].[udfStringDelimeterfromTable]
(
@sInputList VARCHAR(MAX) -- List of delimited items
, @Delimiter CHAR(1) = ',' -- delimiter that separates items
)
RETURNS @List TABLE (Item VARCHAR(MAX)) WITH SCHEMABINDING
/*
* Returns a table of strings that have been split by a delimiter.
* Similar to the Visual Basic (or VBA) SPLIT function. The
* strings are trimmed before being returned. Null items are not
* returned so if there are multiple separators between items,
* only the non-null items are returned.
* Space is not a valid delimiter.
*
* Example:
SELECT * FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456, efh,,hi', ',')
*
* Test:
DECLARE @Count INT, @Delim CHAR(10), @Input VARCHAR(128)
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456', ',')
PRINT 'TEST 1 3 lines:' + CASE WHEN @Count=3
THEN 'Worked' ELSE 'ERROR' END
SELECT @DELIM=CHAR(10)
, @INPUT = 'Line 1' + @delim + 'line 2' + @Delim
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable](@Input, @Delim)
PRINT 'TEST 2 LF :' + CASE WHEN @Count=2
THEN 'Worked' ELSE 'ERROR' END
First, please read this: How to create a Minimal, Complete, and Verifiable example.
In general: if you use your UDF, you'll get the data as rows. It would be best if your UDF returned each item together with a running number. Otherwise, you'll first need to use ROW_NUMBER() OVER(...) to create a part number, so you can build your target column names via string concatenation. Then use PIVOT to get the columns side by side.
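The ROW_NUMBER/PIVOT route described above can be sketched as follows, assuming the split step has already produced one row per item together with its position. `#parts` is a hypothetical stand-in for that output (its `Nr` column is what ROW_NUMBER() would have to supply if the UDF does not return a position), `#Pivoted` is a hypothetical results table, and only three target columns are listed:

```sql
-- Hypothetical split output: one row per item, with its position (Nr) in the original string
CREATE TABLE #parts (ID int, Nr int, Item varchar(100));
INSERT INTO #parts VALUES
    (1, 1, '6667 Mission Street'), (1, 2, '4567 7rd Street'), (1, 3, '65 Sully Pond Park'),
    (2, 1, 'Other addr1'), (2, 2, 'one more addr');

-- Build the target column name from the position, then PIVOT the rows side by side
SELECT ID, [Address1], [Address2], [Address3]
INTO #Pivoted
FROM (SELECT ID, 'Address' + CAST(Nr AS varchar(10)) AS ColName, Item
      FROM #parts) AS src
PIVOT (MAX(Item) FOR ColName IN ([Address1], [Address2], [Address3])) AS pvt;

SELECT * FROM #Pivoted;
```

IDs with fewer items simply get NULL in the remaining columns. If the positions have to come from ROW_NUMBER() without a reliable ordering column, the numbering may not match the order of the items in the original string.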
An easier approach could be a string split via XML like in this answer
A quick proof of concept to show the principles:
DECLARE @tbl TABLE(ID INT,YourValues VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park')
,(2,'Other addr1, one more addr, and another one, and even one more');
WITH Casted AS
(
SELECT *
,CAST('<x>' + REPLACE(YourValues,',','</x><x>') + '</x>' AS XML) AS AsXml
FROM @tbl
)
SELECT *
,LTRIM(RTRIM(AsXml.value('/x[1]','nvarchar(max)'))) AS Address1
,LTRIM(RTRIM(AsXml.value('/x[2]','nvarchar(max)'))) AS Address2
,LTRIM(RTRIM(AsXml.value('/x[3]','nvarchar(max)'))) AS Address3
,LTRIM(RTRIM(AsXml.value('/x[4]','nvarchar(max)'))) AS Address4
,LTRIM(RTRIM(AsXml.value('/x[5]','nvarchar(max)'))) AS Address5
FROM Casted
If your values might include forbidden characters (especially <,> and &) you can find an approach to deal with this in the linked answer.
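The escaping idea mentioned above can be sketched like this: running the value through `FOR XML PATH('')` first turns `&`, `<` and `>` into entities, so wrapping it in `<x>` tags no longer breaks the CAST. `#Addresses` and `#Split` are hypothetical temp tables, and the sample address is invented to contain those characters:

```sql
-- Hypothetical sample row containing characters that would break a plain CAST to XML
CREATE TABLE #Addresses (ID int, YourValues varchar(100));
INSERT INTO #Addresses VALUES (1, 'A & B Street, 12 <North> Road');

WITH Casted AS
(
    SELECT ID,
           -- FOR XML PATH('') with the [*] alias returns the value with &, < and > escaped,
           -- so the string is safe to wrap in <x> tags before the CAST
           CAST('<x>' + REPLACE((SELECT YourValues AS [*] FOR XML PATH('')), ',', '</x><x>') + '</x>' AS xml) AS AsXml
    FROM #Addresses
)
SELECT ID,
       LTRIM(RTRIM(AsXml.value('(/x)[1]', 'nvarchar(max)'))) AS Address1,
       LTRIM(RTRIM(AsXml.value('(/x)[2]', 'nvarchar(max)'))) AS Address2
INTO #Split
FROM Casted;

SELECT * FROM #Split;
```

Reading the values back with `.value()` turns the entities into the original characters again.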
The result:
+----+---------------------+-----------------+--------------------+-------------------+----------+
| ID | Address1 | Address2 | Address3 | Address4 | Address5 |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 1 | 6667 Mission Street | 4567 7rd Street | 65 Sully Pond Park | NULL | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 2 | Other addr1 | one more addr | and another one | and even one more | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+

Join tables by column names, convert string to column name

I have a table which stores one row per survey.
Each survey has about 70 questions; each column represents one question.
SurveyID Q1 Q2 Q3 ...
1 Yes Good Bad ...
I want to pivot this so it reads
SurveyID Question Answer
1 Q1 Yes
1 Q2 Good
1 Q3 Bad
... ... .....
I use CROSS APPLY to achieve this:
SELECT t.[SurveyID]
, x.question
, x.Answer
FROM tbl t
CROSS APPLY
(
select 1 as QuestionNumber, 'Q1' as Question , t.Q1 As Answer union all
select 2 as QuestionNumber, 'Q2' as Question , t.Q2 As Answer union all
select 3 as QuestionNumber, 'Q3' as Question , t.Q3 As Answer) x
This works, but I don't want to write this 70 times, so I have this select statement:
select ORDINAL_POSITION
, COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'mytable'
This gives me the list of column and position of column in the table.
So I hope I can somehow join 2nd statement with the 1st statement where by column name. However I am comparing content within a column and a column header here. Is it doable? Is there other way of achieving this?
Hope you can guide me please?
Thank you
Instead of CROSS APPLY, you should use UNPIVOT for this query.
MS SQL Server 2008 Schema Setup:
CREATE TABLE Test_Table(SurveyID INT, Q1 VARCHAR(10)
, Q2 VARCHAR(10), Q3 VARCHAR(10), Q4 VARCHAR(10))
INSERT INTO Test_Table VALUES
(1 , 'Yes', 'Good' , 'Bad', 'Bad')
,(2 , 'Bad', 'Bad' , 'Yes' , 'Good')
Query 1:
SELECT SurveyID
,Questions
,Answers
FROM Test_Table t
UNPIVOT ( Answers FOR Questions IN (Q1,Q2,Q3,Q4))up
Results:
| SurveyID | Questions | Answers |
|----------|-----------|---------|
| 1 | Q1 | Yes |
| 1 | Q2 | Good |
| 1 | Q3 | Bad |
| 1 | Q4 | Bad |
| 2 | Q1 | Bad |
| 2 | Q2 | Bad |
| 2 | Q3 | Yes |
| 2 | Q4 | Good |
If you need to perform this kind of operation on lots of similar tables that have differing numbers of columns, an UNPIVOT approach alone can be tiresome because you have to manually change the list of columns (Q1, Q2, Q3, etc.) each time.
The CROSS APPLY based query in the question also suffers from similar drawbacks.
The solution to this, as you've guessed, involves using meta-information maintained by the server to tell you the list of columns you need to operate on. However, rather than requiring some kind of join as you suspect, what is needed is Dynamic SQL, that is, a SQL query that creates another SQL query on-the-fly.
This is done essentially by concatenating string (varchar) information in the SELECT part of the query, including values from columns which are available in your FROM (and join) clauses.
With Dynamic SQL (DSQL) approaches, you often use the system metadata views as your starting point. The INFORMATION_SCHEMA views are available in SQL Server, but you're better off using the object catalog views (such as sys.tables and sys.columns) for this.
A prototype DSQL solution to generate the code for your CROSS APPLY approach would look something like this:
-- Create a variable to hold the generated SQL code
-- First, add the static code at the start:
declare @SQL nvarchar(max) =
' SELECT t.[SurveyID]
, x.question
, x.Answer
FROM MySurveyTable t
CROSS APPLY
(
'
-- Holds one "union all select ..." branch per question column
declare @Branches nvarchar(max) = N''
-- This syntax appends to the variable for every row in the query results; it's a little like looping over all the rows.
select @Branches +=
' union all select ' + cast(C.column_id as nvarchar(10))
+ ' as QuestionNumber, N''' + replace(C.name, '''', '''''')
+ ''' as Question, t.' + quotename(C.name)
+ ' as Answer' + char(13) + char(10)
from sys.columns C
inner join sys.tables T on C.object_id = T.object_id
where T.name = 'MySurveyTable'
and C.name <> 'SurveyID' -- skip the key column, which is not a question
-- Remove the leading " union all " (11 characters), then add the closing bracket and alias
set @SQL = @SQL + stuff(@Branches, 1, 11, '') + ') x'
print @SQL
-- To also execute (run) the dynamically generated SQL and get your desired
-- row-based output at the same time, pass the string to EXEC in parentheses;
-- EXEC @SQL without parentheses would try to run a procedure named by the variable's value
exec (@SQL)
A similar approach could be used to dynamically write SQL for the UNPIVOT approach.
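The dynamic UNPIVOT variant can be sketched as below: it builds the `IN (...)` list from catalog metadata instead of typing the ~70 column names. Assumptions: `#Survey` is a hypothetical temp table shaped like `Test_Table` above (so its columns are read from `tempdb.sys.columns`), every question column has the same data type (UNPIVOT requires this), and results go into a hypothetical `#Unpivoted` table:

```sql
-- Hypothetical survey table (a temp table so the sketch is self-contained)
CREATE TABLE #Survey (SurveyID int, Q1 varchar(10), Q2 varchar(10), Q3 varchar(10), Q4 varchar(10));
INSERT INTO #Survey VALUES (1, 'Yes', 'Good', 'Bad', 'Bad'), (2, 'Bad', 'Bad', 'Yes', 'Good');
CREATE TABLE #Unpivoted (SurveyID int, Question sysname, Answer varchar(10));

-- Build "[Q1],[Q2],[Q3],[Q4]" from the catalog; temp tables live in tempdb
DECLARE @cols nvarchar(max);
SELECT @cols = STUFF((SELECT N',' + QUOTENAME(c.name)
                      FROM tempdb.sys.columns AS c
                      WHERE c.object_id = OBJECT_ID(N'tempdb..#Survey')
                        AND c.name <> N'SurveyID'
                      ORDER BY c.column_id
                      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, N'');

-- Temp tables created in this scope are visible inside the dynamic batch
DECLARE @SQL nvarchar(max) = N'INSERT INTO #Unpivoted (SurveyID, Question, Answer)
SELECT SurveyID, Question, Answer
FROM #Survey
UNPIVOT (Answer FOR Question IN (' + @cols + N')) AS up;';
EXEC sys.sp_executesql @SQL;

SELECT SurveyID, Question, Answer FROM #Unpivoted ORDER BY SurveyID, Question;
```

For a permanent table, the same query against `sys.columns` with `OBJECT_ID(N'dbo.YourTable')` should work; note that UNPIVOT silently drops NULL answers.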

Joining a table based on comma separated values

How can I join two tables, where one of the tables has multiple comma separated values in one column that reference an id in another column?
1st table
Name | Course Id
====================
Zishan | 1,2,3
Ellen | 2,3,4
2nd table
course id | course name
=======================
1 | java
2 | C++
3 | oracle
4 | dot net
Maybe this ugliness works; I have not checked the results:
select names.name, courses.course_name
from names inner join courses
on ',' + names.course_ids + ',' like '%,' + cast(courses.course_id as nvarchar(20)) + ',%'
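A self-contained check of this LIKE join, using the sample data from the question. `#names`, `#courses` and `#joined` are hypothetical temp tables, and the lists are assumed to have no spaces after the commas. Wrapping both sides in commas is what stops course 1 from matching an id such as 11:

```sql
-- Hypothetical tables holding the sample data from the question
CREATE TABLE #names (name varchar(50), course_ids varchar(100));
CREATE TABLE #courses (course_id int, course_name varchar(50));
INSERT INTO #names VALUES ('Zishan', '1,2,3'), ('Ellen', '2,3,4');
INSERT INTO #courses VALUES (1, 'java'), (2, 'C++'), (3, 'oracle'), (4, 'dot net');

-- ',1,2,3,' LIKE '%,2,%' matches; ',1,2,3,' LIKE '%,4,%' does not
SELECT n.name, c.course_name
INTO #joined
FROM #names AS n
INNER JOIN #courses AS c
    ON ',' + n.course_ids + ',' LIKE '%,' + CAST(c.course_id AS varchar(20)) + ',%';

SELECT name, course_name FROM #joined ORDER BY name, course_name;
```

The leading `%` prevents any index on the list column from being used, so this scales poorly; it is a workaround for the unnormalized design rather than a replacement for it.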
First of all, your database structure is not normalized, and it should be. Since it is already set up this way, here's how to solve the issue.
You'll need a function to split your string first (the code in this answer is MySQL syntax, not SQL Server):
CREATE FUNCTION SPLIT_STRING(str VARCHAR(255), delim VARCHAR(12), pos INT) RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1), delim, '');
Then you'll need to create a view in order to make up for your structure:
CREATE VIEW database.viewname AS
SELECT SPLIT_STRING(CourseID, ',', 1) as firstField,
SPLIT_STRING(CourseID, ',', 2) as secondField,
SPLIT_STRING(CourseID, ',', 3) as thirdField
FROM 1stTable;
Here the third argument (1, 2, 3) is the position of the item in your list.
Now that you have a view which generates your separated fields, you can make a normal join on your view; just use your view like you would use a table.
SELECT *
FROM database.viewname v
JOIN 2ndTable c ON c.course_id IN (v.firstField, v.secondField, v.thirdField)
However, since you probably won't always have exactly 3 values in the course id column of your first table, you'll need to tweak it a little more.
Inspiration of my answer from:
SQL query to split column data into rows
and
Equivalent of explode() to work with strings in MySQL
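The normalized structure recommended at the start of this answer would store one row per student/course pair, after which a plain join does the work and no splitting is needed. A minimal sketch with hypothetical table and column names (`#Student`, `#Course`, `#StudentCourse`, `#Enrollment`), written in T-SQL like the rest of this page rather than the MySQL used above:

```sql
-- Hypothetical normalized schema: one row per (student, course) pair
CREATE TABLE #Student (StudentId int PRIMARY KEY, Name varchar(50));
CREATE TABLE #Course (CourseId int PRIMARY KEY, CourseName varchar(50));
CREATE TABLE #StudentCourse (StudentId int, CourseId int, PRIMARY KEY (StudentId, CourseId));

INSERT INTO #Student VALUES (1, 'Zishan'), (2, 'Ellen');
INSERT INTO #Course VALUES (1, 'java'), (2, 'C++'), (3, 'oracle'), (4, 'dot net');
-- '1,2,3' and '2,3,4' from the question become one row per course
INSERT INTO #StudentCourse VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4);

SELECT s.Name, c.CourseName
INTO #Enrollment
FROM #Student AS s
JOIN #StudentCourse AS sc ON sc.StudentId = s.StudentId
JOIN #Course AS c ON c.CourseId = sc.CourseId;

SELECT Name, CourseName FROM #Enrollment ORDER BY Name, CourseName;
```

In a permanent schema, the junction table would normally also carry foreign keys to the student and course tables.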
SELECT f.name,s.course_name FROM table1 AS f
INNER JOIN table2 as s ON f.course_id IN (s.course_id)
Use the query below:
Select * from table_2 t2 INNER JOIN table_1 t1 on t1.[Course Id] = t2.[course id]
