Pivot and concatenate values from column in SQL Server - sql-server

I have a table with these columns:
ID | Name | Value
------------------
1 | Test1 | 0
2 | Test2 | 1
3 | Test3 | 0
4 | Test4 | 0
5 | Test5 | 1
I want the Value column pivoted and concatenated into a single string:
01001

The code below gives the expected result:
SELECT @Result = @Result + CAST(VALUE AS VARCHAR)
FROM #TmpTestingTable
Or you can use STUFF with FOR XML PATH:
SELECT STUFF(
( SELECT CAST(VALUE AS VARCHAR)
FROM #TmpTestingTable
FOR XML PATH ('')
), 1, 0, '')
As a sample, I inserted the rows into a temporary table and executed the code:
CREATE TABLE #TmpTestingTable (ID INT, Name VARCHAR (20), Value INT)
INSERT INTO #TmpTestingTable (ID, Name, Value) VALUES
(1 , 'Test1' , 0),
(2 , 'Test2' , 1),
(3 , 'Test3' , 0),
(4 , 'Test4' , 0),
(5 , 'Test5' , 1)
DECLARE @Result AS VARCHAR (100) = '';
-- using variable approach
SELECT @Result = @Result + CAST(VALUE AS VARCHAR)
FROM #TmpTestingTable
SELECT @Result
-- using STUFF approach
SELECT STUFF(
( SELECT CAST(VALUE AS VARCHAR)
FROM #TmpTestingTable
FOR XML PATH ('')
), 1, 0, '')
DROP TABLE #TmpTestingTable

Use FOR XML to concatenate. It is important that you also include an ORDER BY; otherwise you have no control over the order of the values and risk getting them in an arbitrary order.
SELECT
(SELECT CAST([VALUE] AS CHAR(1))
FROM yourtable
ORDER BY ID
FOR XML PATH ('')
)
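On SQL Server 2017 or later, the same ordered concatenation can probably be written with STRING_AGG instead of FOR XML. This is only a sketch: it recreates the #TmpTestingTable sample from the earlier answer and assumes that ordering by ID is what you want.

```sql
-- Assumes SQL Server 2017+ (STRING_AGG is not available in earlier versions).
CREATE TABLE #TmpTestingTable (ID INT, Name VARCHAR(20), Value INT);
INSERT INTO #TmpTestingTable (ID, Name, Value) VALUES
(1, 'Test1', 0), (2, 'Test2', 1), (3, 'Test3', 0), (4, 'Test4', 0), (5, 'Test5', 1);

-- An empty separator glues the values together; WITHIN GROUP fixes the order.
SELECT STRING_AGG(CAST(Value AS VARCHAR(1)), '') WITHIN GROUP (ORDER BY ID) AS Result
FROM #TmpTestingTable;

DROP TABLE #TmpTestingTable;
```

With the five sample rows this should return 01001.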

SELECT GROUP_CONCAT(Value SEPARATOR '') FROM Table
EDIT:
This does not work on SQL Server; GROUP_CONCAT is a MySQL function. Have a look at the question "Simulating group_concat MySQL function in Microsoft SQL Server 2005?" for ways to emulate it.

Related

SQL Server group by count eliminate duplicates [duplicate]

How do I get:
id Name Value
1 A 4
1 B 8
2 C 9
to
id Column
1 A:4, B:8
2 C:9
No CURSOR, WHILE loop, or User-Defined Function needed.
Just need to be creative with FOR XML and PATH.
[Note: This solution only works on SQL 2005 and later. Original question didn't specify the version in use.]
CREATE TABLE #YourTable ([ID] INT, [Name] CHAR(1), [Value] INT)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'A',4)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'B',8)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (2,'C',9)
SELECT
[ID],
STUFF((
SELECT ', ' + [Name] + ':' + CAST([Value] AS VARCHAR(MAX))
FROM #YourTable
WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
FROM #YourTable Results
GROUP BY ID
DROP TABLE #YourTable
On SQL Server 2017 (previewed as SQL Server vNext) or Azure SQL Database, you can use STRING_AGG as below:
SELECT id, STRING_AGG(CONCAT(name, ':', [value]), ', ')
FROM #YourTable
GROUP BY id
Using FOR XML PATH will not concatenate exactly as you might expect: it replaces "&" with "&amp;" and also encodes "<" and ">" as "&lt;" and "&gt;".
...maybe a few other things, not sure.
I came across a workaround for this. You need to replace:
FOR XML PATH('')
)
with:
FOR XML PATH(''),TYPE
).value('(./text())[1]','VARCHAR(MAX)')
...or NVARCHAR(MAX) if that's what you're using.
why the hell doesn't SQL have a concatenate aggregate function? this is a PITA.
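To see the difference side by side, here is a small sketch. The table variable @Fruit and its rows are made up for illustration; the two columns differ only in whether TYPE and .value() are used.

```sql
-- Hypothetical sample data containing XML special characters.
DECLARE @Fruit TABLE (Name VARCHAR(50));
INSERT INTO @Fruit (Name) VALUES ('Oranges & Lemons'), ('1 < 2');

SELECT
    -- Plain FOR XML PATH(''): the result is XML text, so & and < are entity-encoded.
    STUFF((SELECT ', ' + Name FROM @Fruit ORDER BY Name
           FOR XML PATH('')), 1, 2, '') AS Encoded,
    -- TYPE plus .value(): the XML is converted back to a plain string, decoding the entities.
    STUFF((SELECT ', ' + Name FROM @Fruit ORDER BY Name
           FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '') AS Decoded;
```

The Encoded column should contain &lt; and &amp;, while the Decoded column should contain the original < and & characters.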
I ran into a couple of problems when I tried converting Kevin Fairchild's suggestion to work with strings containing spaces and special XML characters (&, <, >) which were encoded.
The final version of my code (which doesn't answer the original question but may be useful to someone) looks like this:
CREATE TABLE #YourTable ([ID] INT, [Name] VARCHAR(MAX), [Value] INT)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'Oranges & Lemons',4)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'1 < 2',8)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (2,'C',9)
SELECT [ID],
STUFF((
SELECT ', ' + CAST([Name] AS VARCHAR(MAX))
FROM #YourTable WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE
/* Use .value to decode XML entities, e.g. &gt; &lt; etc. */
).value('.','VARCHAR(MAX)')
,1,2,'') as NameValues
FROM #YourTable Results
GROUP BY ID
DROP TABLE #YourTable
Rather than using a space as a delimiter and replacing all the spaces with commas, it just pre-pends a comma and space to each value then uses STUFF to remove the first two characters.
The XML encoding is taken care of automatically by using the TYPE directive.
Another option for SQL Server 2005 and above, using a recursive CTE:
---- test data
declare @t table (OUTPUTID int, SCHME varchar(10), DESCR varchar(10))
insert @t select 1125439 ,'CKT','Approved'
insert @t select 1125439 ,'RENO','Approved'
insert @t select 1134691 ,'CKT','Approved'
insert @t select 1134691 ,'RENO','Approved'
insert @t select 1134691 ,'pn','Approved'
---- actual query
;with cte(outputid,combined,rn)
as
(
select outputid, SCHME + ' ('+DESCR+')', rn=ROW_NUMBER() over (PARTITION by outputid order by schme, descr)
from @t
)
,cte2(outputid,finalstatus,rn)
as
(
select OUTPUTID, convert(varchar(max),combined), 1 from cte where rn=1
union all
select cte2.outputid, convert(varchar(max),cte2.finalstatus+', '+cte.combined), cte2.rn+1
from cte2
inner join cte on cte.OUTPUTID = cte2.outputid and cte.rn=cte2.rn+1
)
select outputid, MAX(finalstatus) from cte2 group by outputid
Install the SQLCLR Aggregates from http://groupconcat.codeplex.com
Then you can write code like this to get the result you asked for:
CREATE TABLE foo
(
id INT,
name CHAR(1),
Value CHAR(1)
);
INSERT INTO dbo.foo
(id, name, Value)
VALUES (1, 'A', '4'),
(1, 'B', '8'),
(2, 'C', '9');
SELECT id,
dbo.GROUP_CONCAT(name + ':' + Value) AS [Column]
FROM dbo.foo
GROUP BY id;
Eight years later... Microsoft SQL Server vNext Database Engine has finally enhanced Transact-SQL to directly support grouped string concatenation. The Community Technical Preview version 1.0 added the STRING_AGG function and CTP 1.1 added the WITHIN GROUP clause for the STRING_AGG function.
Reference: https://msdn.microsoft.com/en-us/library/mt775028.aspx
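A short sketch of what the WITHIN GROUP clause adds, assuming SQL Server 2017 or later. It recreates the #YourTable sample used elsewhere in this thread, with the rows deliberately inserted out of order.

```sql
-- Assumes SQL Server 2017+; #YourTable mirrors the sample data from this thread.
CREATE TABLE #YourTable ([ID] INT, [Name] CHAR(1), [Value] INT);
INSERT INTO #YourTable ([ID], [Name], [Value]) VALUES (1, 'B', 8), (1, 'A', 4), (2, 'C', 9);

-- WITHIN GROUP controls the order of the items inside each concatenated string.
SELECT [ID],
       STRING_AGG(CONCAT([Name], ':', [Value]), ', ')
           WITHIN GROUP (ORDER BY [Name]) AS NameValues
FROM #YourTable
GROUP BY [ID];
-- With these rows: ID 1 gives A:4, B:8 and ID 2 gives C:9

DROP TABLE #YourTable;
```

Without WITHIN GROUP the order of the items inside each string is not guaranteed.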
SQL Server 2005 and later allow you to create your own custom aggregate functions, including for things like concatenation- see the sample at the bottom of the linked article.
This is just an addition to Kevin Fairchild's post (very clever by the way). I would have added it as a comment, but I don't have enough points yet :)
I was using this idea for a view I was working on; however, the items I was concatenating contained spaces, so I modified the code slightly to not use spaces as delimiters.
Again thanks for the cool workaround Kevin!
CREATE TABLE #YourTable ( [ID] INT, [Name] CHAR(1), [Value] INT )
INSERT INTO #YourTable ([ID], [Name], [Value]) VALUES (1, 'A', 4)
INSERT INTO #YourTable ([ID], [Name], [Value]) VALUES (1, 'B', 8)
INSERT INTO #YourTable ([ID], [Name], [Value]) VALUES (2, 'C', 9)
SELECT [ID],
REPLACE(REPLACE(REPLACE(
(SELECT [Name] + ':' + CAST([Value] AS VARCHAR(MAX)) as A
FROM #YourTable
WHERE ( ID = Results.ID )
FOR XML PATH (''))
, '</A><A>', ', ')
,'<A>','')
,'</A>','') AS NameValues
FROM #YourTable Results
GROUP BY ID
DROP TABLE #YourTable
An example would be
In Oracle you can use LISTAGG aggregate function.
Original records
name type
------------
name1 type1
name2 type2
name2 type3
Sql
SELECT name, LISTAGG(type, '; ') WITHIN GROUP(ORDER BY type)
FROM table
GROUP BY name
Result in
name type
------------
name1 type1
name2 type2; type3
This kind of question is asked here very often, and the solution is going to depend a lot on the underlying requirements:
https://stackoverflow.com/search?q=sql+pivot
and
https://stackoverflow.com/search?q=sql+concatenate
Typically, there is no SQL-only way to do this without either dynamic sql, a user-defined function, or a cursor.
Just to add to what Cade said, this is usually a front-end display concern and should therefore be handled there. I know that sometimes it's easier to write something 100% in SQL for things like file export or other "SQL only" solutions, but most of the time this concatenation should be handled in your display layer.
Don't need a cursor... a while loop is sufficient.
------------------------------
-- Setup
------------------------------
DECLARE @Source TABLE
(
id int,
Name varchar(30),
Value int
)
DECLARE @Target TABLE
(
id int,
Result varchar(max)
)
INSERT INTO @Source(id, Name, Value) SELECT 1, 'A', 4
INSERT INTO @Source(id, Name, Value) SELECT 1, 'B', 8
INSERT INTO @Source(id, Name, Value) SELECT 2, 'C', 9
------------------------------
-- Technique
------------------------------
INSERT INTO @Target (id)
SELECT id
FROM @Source
GROUP BY id
DECLARE @id int, @Result varchar(max)
SET @id = (SELECT MIN(id) FROM @Target)
WHILE @id is not null
BEGIN
SET @Result = null
SELECT @Result =
CASE
WHEN @Result is null
THEN ''
ELSE @Result + ', '
END + s.Name + ':' + convert(varchar(30),s.Value)
FROM @Source s
WHERE id = @id
UPDATE @Target
SET Result = @Result
WHERE id = @id
SET @id = (SELECT MIN(id) FROM @Target WHERE @id < id)
END
SELECT *
FROM @Target
Let's get very simple:
SELECT stuff(
(
select ', ' + x from (SELECT 'xxx' x union select 'yyyy') tb
FOR XML PATH('')
)
, 1, 2, '')
Replace this line:
select ', ' + x from (SELECT 'xxx' x union select 'yyyy') tb
With your query.
You can improve performance significantly this way if most groups contain only one row:
SELECT
[ID],
CASE WHEN COUNT(*) = 1 THEN
-- single-row group: build the value directly, no correlated subquery needed
MAX([Name]) + ':' + CAST(MAX([Value]) AS VARCHAR(MAX))
ELSE
STUFF((
SELECT ', ' + [Name] + ':' + CAST([Value] AS VARCHAR(MAX))
FROM #YourTable
WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'')
END AS NameValues
FROM #YourTable Results
GROUP BY ID
I didn't see any CROSS APPLY answers, and there is no need for XML extraction either. Here is a slightly different version of what Kevin Fairchild wrote. It's faster and easier to use in more complex queries:
select T.ID
,MAX(X.cl) NameValues
from #YourTable T
CROSS APPLY
(select STUFF((
SELECT ', ' + [Name] + ':' + CAST([Value] AS VARCHAR(MAX))
FROM #YourTable
WHERE (ID = T.ID)
FOR XML PATH(''))
,1,2,'') [cl]) X
GROUP BY T.ID
Using STUFF and FOR XML PATH to concatenate rows into a string, grouping first by one column and then by two columns:
CREATE TABLE #YourTable ([ID] INT, [Name] CHAR(1), [Value] INT)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'A',4)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'B',8)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (1,'B',5)
INSERT INTO #YourTable ([ID],[Name],[Value]) VALUES (2,'C',9)
-- retrieve each unique ID and concatenate its name:value pairs into one column
SELECT
[ID],
STUFF((
SELECT ', ' + [Name] + ':' + CAST([Value] AS VARCHAR(MAX)) -- CONCATENATES EACH NAME : VALUE SET
FROM #YourTable
WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
FROM #YourTable Results
GROUP BY ID
SELECT
[ID],[Name] , --these are acting as the group by clause
STUFF((
SELECT ', '+ CAST([Value] AS VARCHAR(MAX)) -- CONCATENATES THE VALUES FOR EACH ID NAME COMBINATION
FROM #YourTable
WHERE (ID = Results.ID and Name = results.[name] )
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
FROM #YourTable Results
GROUP BY ID, name
DROP TABLE #YourTable
Using Replace Function and FOR JSON PATH
SELECT T3.DEPT, REPLACE(REPLACE(T3.ENAME,'{"ENAME":"',''),'"}','') AS ENAME_LIST
FROM (
SELECT DEPT, (SELECT ENAME AS [ENAME]
FROM EMPLOYEE T2
WHERE T2.DEPT=T1.DEPT
FOR JSON PATH,WITHOUT_ARRAY_WRAPPER) ENAME
FROM EMPLOYEE T1
GROUP BY DEPT) T3
If you have CLR enabled, you could use the Group_Concat library from GitHub.
Another example without the garbage: ",TYPE).value('(./text())[1]','VARCHAR(MAX)')"
WITH t AS (
SELECT 1 n, 1 g, 1 v
UNION ALL
SELECT 2 n, 1 g, 2 v
UNION ALL
SELECT 3 n, 2 g, 3 v
)
SELECT g
, STUFF (
(
SELECT ', ' + CAST(v AS VARCHAR(MAX))
FROM t sub_t
WHERE sub_t.g = main_t.g
FOR XML PATH('')
)
, 1, 2, ''
) cg
FROM t main_t
GROUP BY g
Input-output is
************************* -> *********************
* n * g * v * * g * cg *
* - * - * - * * - * - *
* 1 * 1 * 1 * * 1 * 1, 2 *
* 2 * 1 * 2 * * 2 * 3 *
* 3 * 2 * 3 * *********************
*************************
I used this approach which may be easier to grasp. Get a root element, then concat to choices any item with the same ID but not the 'official' name
Declare @IdxList as Table(id int, choices varchar(max),AisName varchar(255))
Insert into @IdxList(id,choices,AisName)
Select IdxId,''''+Max(Title)+'''',Max(Title) From [dbo].[dta_Alias]
where IdxId is not null group by IdxId
Update @IdxList
set choices=choices +','''+Title+''''
From @IdxList JOIN [dta_Alias] ON id=IdxId And Title <> AisName
where IdxId is not null
Select * from @IdxList where choices like '%,%'
For all my healthcare folks out there:
SELECT
s.NOTE_ID
,STUFF ((
SELECT
' ' + [note_text] -- separator goes first so STUFF strips only the leading space
FROM
HNO_NOTE_TEXT s1
WHERE
(s1.NOTE_ID = s.NOTE_ID)
ORDER BY [line] ASC
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,
1,
1,
'') AS NOTE_TEXT_CONCATINATED
FROM
HNO_NOTE_TEXT s
GROUP BY NOTE_ID

How to coalesce many rows into one?

I am using SSMS 2008 R2 and am simply trying to coalesce many rows into one. This should be simple I think, but it is currently repeating data in each row. Consider:
create table test
(
Name varchar(30)
)
insert test values('A'),('B'),('C')
select * from test
select distinct Name, coalesce(Name + ', ', '')
from test
How can I rewrite this to achieve one row like:
A,B,C
SELECT STUFF(( SELECT ', ' + Name
from #test
FOR XML PATH(''), TYPE).
value('.','NVARCHAR(MAX)'),1,2,'')
RESULT: A, B, C
I am sure your real rows do not look exactly like this, so below is a slightly different data set, with an Id column, and how you would do the same kind of concatenation per Id.
Test Data
create table #test
(
Id INT,
Name varchar(30)
)
insert #test values
(1,'A'),(1,'B'),(1,'C'),(2,'E'),(2,'F'),(2,'G')
Query
select t.Id
, STUFF(( SELECT ', ' + Name
from #test
WHERE Id = T.Id
FOR XML PATH(''), TYPE).
value('.','NVARCHAR(MAX)'),1,2,'') AS List
FROM #test t
GROUP BY t.Id
Result Set
╔════╦═════════╗
║ Id ║ List ║
╠════╬═════════╣
║ 1 ║ A, B, C ║
║ 2 ║ E, F, G ║
╚════╩═════════╝
In SQL Server Transact-SQL, the easiest way to accomplish this is the following.
A table like this:
create table #foo
(
id int not null identity(1,1) primary key clustered ,
name varchar(32) not null ,
)
insert #foo (name) values ( 'a' )
insert #foo (name) values ( 'b' )
insert #foo (name) values ( 'c' )
insert #foo (name) values ( 'd' )
go
can be flattened using this seemingly dubious (but documented) technique:
declare @text varchar(max) = ''
select @text = @text
+ case len(@text)
when 0 then ''
else ','
end
+ t.name
from #foo t
select list_of_names = @text
go
yielding
list_of_names
-------------
a,b,c,d
Easy!
In the old days of SQL Server 7.0 and SQL Server 2000, I used to do this with a variable and COALESCE, like this:
DECLARE @List VARCHAR(8000)
SELECT @List = COALESCE(@List + ',', '') + CAST(Color AS VARCHAR)
FROM NameColorTable
SELECT @List
But since SQL Server 2005 introduced FOR XML PATH, this is the way I prefer:
SELECT STUFF((
select ','+ cast(Color as nvarchar(255))
from NameColorTable b
WHERE a.Name = b.Name
FOR XML PATH('')
)
,1,1,'') AS COLUMN2
FROM NameColorTable a
GROUP BY a.Name
I have a blog post about this here:
https://koukia.ca/stuff-vs-coalesce-in-tsql-for-concatenating-row-values-aefb078536f8#.f4iggl22y
Here is the standard solution for concatenation using XML PATH()
SELECT
STUFF(
(
SELECT
',' + Name
FROM test
FOR XML PATH(''),TYPE
).value('.','VARCHAR(MAX)'
), 1, 1, ''
) As list
Another option is the CONCAT() function introduced with SQL Server 2012.
I used CONCAT() in the sample below:
declare @namelist nvarchar(max)
select @namelist = concat(isnull(@namelist+',',''), name) from test2
select @namelist

Assigning select results into variable in stored procedure in mssql?

I have the following select:
SELECT School_Type,COUNT(ID) from Schools where City_ID = 1 group by School_Type
I get results:
10 | 3
20 | 4
30 | 14
I want to put results that are:
type 10 to variable @ElementarySchools
type 20 to variable @HighSchools
type 30 to variable @ProfessionalSchools
and get this result back from the Stored Procedure.
How do I do this ?
Something like this? :)
declare @val varchar(max) = ''
select @val = @val + rtrim(foryear) + ' | ' + RTRIM( COUNT(*)) + ',' from mytable
group by ForYear
select @val
Using a table variable like this:
declare @tmp table (School_Type int, School_Count int)
insert into @tmp
SELECT School_Type,COUNT(ID) from Schools where City_ID = 1 group by School_Type
select @ElementarySchools=School_Count from @tmp where School_Type=10
select @HighSchools=School_Count from @tmp where School_Type=20
select @ProfessionalSchools=School_Count from @tmp where School_Type=30
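The three counts can probably also be collected in a single pass with conditional aggregation, without the intermediate table. A minimal sketch: the table variable @Schools and its rows are made-up stand-ins for the Schools table in the question.

```sql
-- Hypothetical sample data standing in for the Schools table from the question.
DECLARE @Schools TABLE (ID INT, City_ID INT, School_Type INT);
INSERT INTO @Schools (ID, City_ID, School_Type) VALUES
(1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 1, 30), (5, 1, 30), (6, 1, 30), (7, 2, 10);

DECLARE @ElementarySchools INT, @HighSchools INT, @ProfessionalSchools INT;

-- COUNT ignores NULLs, so each CASE counts only the rows of one school type.
SELECT
    @ElementarySchools   = COUNT(CASE WHEN School_Type = 10 THEN ID END),
    @HighSchools         = COUNT(CASE WHEN School_Type = 20 THEN ID END),
    @ProfessionalSchools = COUNT(CASE WHEN School_Type = 30 THEN ID END)
FROM @Schools
WHERE City_ID = 1;

SELECT @ElementarySchools   AS ElementarySchools,    -- 2 with the sample rows
       @HighSchools         AS HighSchools,          -- 1
       @ProfessionalSchools AS ProfessionalSchools;  -- 3
```

Inside a stored procedure, the three variables would typically be declared as OUTPUT parameters rather than local variables, so the caller can read them back.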

SQL Server CSV per row

I have Data Like:
StudentID | Course
1 | .NET
1 | SQL Server
1 | Ajax
2 | Java
2 | JSP
2 | Struts
I want the query to get the Output data Like the following.
StudentID | Course
1 | .NET, SQL Server, Ajax
2 | Java, JSP, Struts
In SQL Server 2005+, the easiest method is to use the FOR XML trick:
-- YourTable is a placeholder for your table name
SELECT t.StudentID, STUFF((SELECT ', ' + t1.Course
FROM YourTable t1
WHERE t1.StudentID = t.StudentID
FOR XML PATH('')), 1, 2, '') AS Courses
FROM YourTable t
GROUP BY t.StudentID
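On older versions you can use a scalar function instead (as written, this sample needs SQL Server 2008 or later because of VARCHAR(MAX) and the multi-row VALUES clause):
create table tbl (StudentID int, course varchar(10))
insert into tbl values (1,'.NET'),(1, 'SQL Server'), (1, 'Ajax'),(2,'Java'),(2,'JSP'),(2,'Struts')
GO
CREATE FUNCTION dbo.GetCourses(@id INTEGER)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Result VARCHAR(MAX)
SET @Result = ''
-- add ', ' before every course except the first one
SELECT @Result = @Result + CASE WHEN @Result = '' THEN '' ELSE ', ' END + [course] FROM tbl WHERE StudentID = @id
RETURN @Result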
END
GO
SELECT DISTINCT StudentID, dbo.GetCourses(StudentID) FROM tbl
GO
drop table tbl
drop function dbo.GetCourses

Find non-ASCII characters in varchar columns using SQL Server

How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column would be great.
I am doing something like this now, but it is not working
select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'
For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:
The identity field for that record. (This will allow the whole record to be reviewed with another query.)
The column name
The text with the invalid character
Id | FieldName | InvalidText |
----+-----------+-------------------+
25 | LastName | Solís |
56 | FirstName | François |
100 | Address1 | 123 Ümlaut street |
Invalid characters would be any outside the range of SPACE (decimal 32) through ~ (decimal 126).
Here is a solution for the single column search using PATINDEX.
It also displays the StartPosition, InvalidCharacter and ASCII code.
select line,
patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) as [Position],
substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1) as [InvalidCharacter],
ascii(substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1)) as [ASCIICode]
from staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) >0
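Here is a self-contained way to try the pattern on a couple of rows. The table variable @Names and its contents are invented for illustration, and CHAR(237) is assumed to map to "í" (as it does in code page 1252). As far as I understand, the binary collation matters because under many non-binary collations the range !-~ is evaluated by sort order rather than by character code.

```sql
-- Hypothetical sample data; CHAR(237) is assumed to be i-acute in code page 1252.
DECLARE @Names TABLE (Id INT, LastName VARCHAR(50));
INSERT INTO @Names (Id, LastName) VALUES
(1, 'Smith'),
(25, 'Sol' + CHAR(237) + 's');

-- Returns only rows containing a character outside SPACE..~, with its position.
SELECT Id,
       LastName,
       PATINDEX('%[^ !-~]%' COLLATE Latin1_General_BIN, LastName) AS Position
FROM @Names
WHERE PATINDEX('%[^ !-~]%' COLLATE Latin1_General_BIN, LastName) > 0;
```

Under those assumptions only the row with Id 25 should come back, with Position 4.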
I've been running this bit of code with success
declare @UnicodeData table (
data nvarchar(500)
)
insert into
@UnicodeData
values
(N'Horse�')
,(N'Dog')
,(N'Cat')
select
data
from
@UnicodeData
where
data collate LATIN1_GENERAL_BIN != cast(data as varchar(max))
Which works well for known columns.
For extra credit, I wrote this quick script to search all nvarchar columns in a given table for Unicode characters.
declare
@sql varchar(max) = ''
,@table sysname = 'mytable' -- enter your table here
;with ColumnData as (
select
RowId = row_number() over (order by c.COLUMN_NAME)
,c.COLUMN_NAME
,ColumnName = '[' + c.COLUMN_NAME + ']'
,TableName = '[' + c.TABLE_SCHEMA + '].[' + c.TABLE_NAME + ']'
from
INFORMATION_SCHEMA.COLUMNS c
where
c.DATA_TYPE = 'nvarchar'
and c.TABLE_NAME = @table
)
select
@sql = @sql + 'select FieldName = ''' + c.ColumnName + ''', InvalidCharacter = [' + c.COLUMN_NAME + '] from ' + c.TableName + ' where ' + c.ColumnName + ' collate LATIN1_GENERAL_BIN != cast(' + c.ColumnName + ' as varchar(max)) ' + case when c.RowId <> (select max(RowId) from ColumnData) then ' union all ' else '' end + char(13)
from
ColumnData c
-- check
-- print @sql
exec (@sql)
I'm not a fan of dynamic SQL but it does have its uses for exploratory queries like this.
try something like this:
DECLARE @YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
INSERT @YourTable VALUES (1, 'ok','ok','ok');
INSERT @YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
INSERT @YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
INSERT @YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
INSERT @YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
INSERT @YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));
--if you have a Numbers table use that, otherwise make one using a CTE
WITH AllNumbers AS
( SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<1000
)
SELECT
pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
UNION
SELECT
pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
UNION
SELECT
pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
order by 1
OPTION (MAXRECURSION 1000);
OUTPUT:
pk BadValueColumn BadValue
----------- -------------- --------------------
2 Col1 BA¶D
3 Col2 ¶BAD
4 Col3 B¶AD
5 Col1 ¶BAD
5 Col3 ¶BAD
6 Col1 BAD¶
6 Col2 B¶AD
6 Col3 BAD¶¶¶
(8 row(s) affected)
This script searches for non-ascii characters in one column. It generates a string of all valid characters, here code point 32 to 127. Then it searches for rows that don't match the list:
declare @str varchar(255); -- 96 characters, each preceded by the escape character '|', need 192 characters
declare @i int;
set @str = '';
set @i = 32;
while @i <= 127
begin
set @str = @str + '|' + char(@i);
set @i = @i + 1;
end;
select col1
from YourTable
where col1 like '%[^' + @str + ']%' escape '|';
Running the various solutions on some real-world data (12M rows, varchar length ~30, around 9k dodgy rows, no full-text index in play), the PATINDEX solution is the fastest, and it also selects the most rows.
(pre-ran km. to set the cache to a known state, ran the 3 processes, and finally ran km again - the last 2 runs of km gave times within 2 seconds)
patindex solution by Gerhard Weiss -- Runtime 0:38, returns 9144 rows
select dodgyColumn from myTable fcc
WHERE patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,dodgyColumn ) >0
the substring-numbers solution by MT. -- Runtime 1:16, returned 8996 rows
select dodgyColumn from myTable fcc
INNER JOIN dbo.Numbers32k dn ON dn.number<(len(fcc.dodgyColumn ))
WHERE ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))<32
OR ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))>127
udf solution by Deon Robertson -- Runtime 3:47, returns 7316 rows
select dodgyColumn
from myTable
where dbo.udf_test_ContainsNonASCIIChars(dodgyColumn , 1) = 1
There is a user-defined function available on the web called 'Parse Alphanumeric'. Google "UDF parse alphanumeric" and you should find the code for it. This user-defined function removes all characters that don't fit between 0-9, a-z, and A-Z.
Select * from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name
That should bring back any records that have a last_name with invalid chars. Your bonus points question is a bit more of a challenge, but I think a CASE expression could handle it. This is somewhat pseudocode; I'm not entirely sure it would work.
Select id, case when udf_parsealpha(ar.last_name) <> ar.last_name then 'last name'
when udf_parsealpha(ar.first_name) <> ar.first_name then 'first name'
when udf_parsealpha(ar.Address1) <> ar.Address1 then 'Address1'
end,
case when udf_parsealpha(ar.last_name) <> ar.last_name then ar.last_name
when udf_parsealpha(ar.first_name) <> ar.first_name then ar.first_name
when udf_parsealpha(ar.Address1) <> ar.Address1 then ar.Address1
end
from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name or
udf_parsealpha(ar.first_name) <> ar.first_name or
udf_parsealpha(ar.Address1) <> ar.Address1
I wrote this in the forum post box...so I'm not quite sure if that'll function as is, but it should be close. I'm not quite sure how it will behave if a single record has two fields with invalid chars either.
As an alternative, you should be able to change the from clause away from a single table and into a subquery that looks something like:
select id,fieldname,value from (
Select id,'last_name' as 'fieldname', last_name as 'value'
from Staging.APARMRE1 ar
Union
Select id,'first_name' as 'fieldname', first_name as 'value'
from Staging.APARMRE1 ar
---(and repeat unions for each field)
) AS fields -- a derived table needs an alias in T-SQL
where udf_parsealpha(value) <> value
The benefit here is that for every additional column you only need to extend the union statement, whereas the CASE version of this script needs that comparison added three times per column.
To find which field has invalid characters:
SELECT * FROM Staging.APARMRE1 FOR XML AUTO, TYPE
You can test it with this query:
SELECT top 1 'char 31: '+char(31)+' (hex 0x1F)' field
from sysobjects
FOR XML AUTO, TYPE
The result will be:
Msg 6841, Level 16, State 1, Line 3 FOR XML could not serialize the
data for node 'field' because it contains a character (0x001F) which
is not allowed in XML. To retrieve this data using FOR XML, convert it
to binary, varbinary or image data type and use the BINARY BASE64
directive.
This is very useful when you write XML files and get invalid-character errors when validating them.
Here is a UDF I built to detect columns with extended ASCII characters. It is quick, and you can extend the character set you want to check. The second parameter allows you to switch between checking anything outside the standard character set or allowing an extended set:
create function [dbo].[udf_ContainsNonASCIIChars]
(
@string nvarchar(4000),
@checkExtendedCharset bit
)
returns bit
as
begin
-- substring positions are 1-based, and the last character must be checked too
declare @pos int = 1;
declare @char varchar(1);
declare @return bit = 0;
while @pos <= len(@string)
begin
select @char = substring(@string, @pos, 1)
if ascii(@char) < 32 or ascii(@char) > 126
begin
if @checkExtendedCharset = 1
begin
if ascii(@char) not in (9,124,130,138,142,146,150,154,158,160,170,176,180,181,183,184,185,186,192,193,194,195,196,197,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,216,217,218,219,220,221,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,248,249,250,251,252,253,254,255)
begin
select @return = 1;
select @pos = (len(@string) + 1)
end
else
begin
select @pos = @pos + 1
end
end
else
begin
select @return = 1;
select @pos = (len(@string) + 1)
end
end
else
begin
select @pos = @pos + 1
end
end
return @return;
end
USAGE:
select Address1
from PropertyFile_English
where dbo.udf_ContainsNonASCIIChars(Address1, 1) = 1
