Calculate the row value based on different formula - sql-server

I want to calculate the cell values based on the given formula in each row.
Input record is:
Expected ouput is:

First thing is your formula is not correct. As I understand there is unique row for each sl. And to do the sum of col1 and col2 correct formula is (col1 + col2). So correct that thing first.
After that you may implement this using dynamic sql as
create table tab_sum ( id int, col1 int, col2 int, col varchar(max) )
insert into tab_sum ( id, col1, col2, col )
values ( 1, 3, 5, '(col1 + col2)' )
, ( 2, 4, 6, '(col1 + col2)' )
DECLARE #query NVARCHAR(MAX) = '';
WITH CTE_DistinctFormulas AS
(
SELECT DISTINCT col as Formula FROM tab_sum
)
SELECT #query = #query + '
UNION ALL
SELECT id, col1, col2, col, ' + Formula + ' AS CalcValue FROM tab_sum WHERE col = ''' + Formula + ''' '
FROM CTE_DistinctFormulas;
SET #query = STUFF(#query,1,12,'');
PRINT #query;
EXEC (#query);
Or if editing formula is not an option for you then you may try this, But make sure you have only one row for each id and its corresponding values otherwise it will sum all of the similar rows in one.
create table tab_sum ( id int, col1 int, col2 int, col varchar(max) )
insert into tab_sum ( id, col1, col2, col )
values ( 1, 3, 5, 'sum(col1 + col2)' )
, ( 2, 4, 6, 'sum(col1 + col2)' )
DECLARE #query NVARCHAR(MAX) = '';
WITH CTE_DistinctFormulas AS
(
SELECT DISTINCT col as Formula FROM tab_sum
)
SELECT #query = #query + '
UNION ALL
SELECT id, col1, col2, col, ' + Formula + ' AS CalcValue FROM tab_sum WHERE col = ''' + Formula + ''' group by id, col1, col2, col '
FROM CTE_DistinctFormulas;
SET #query = STUFF(#query,1,12,'');
PRINT #query;
EXEC (#query);

I think you want something like this:
select *,case when row_cal like '%1%' and row_cal like '%2%' then value1 + value2
when row_cal like '%1%' and row_cal like '%3%' then value1 + value3
end as result from YourTable

Did you have a look at the case/ when expression? See MS doc: CASE
So your query would be somthing like:
SELECT sl, audience_1, audience_2, audience_3,
CASE WHEN audience_3 > 0 THEN audience_1 + audience_3
ELSE audience_1 + audience_2
END audience_total
FROM audience
;
You did not provide the criteria, so I had to guess :-)
(I did not run the query since I had no server handy :-) )
UPDATE after comment by OP:
If you have 71 different columns for audience in the same table you have a differnt issue imho.
Then my solution would be:
split out the audience number to a detail table with sl,
audience_type and audience_count
create a audience_calc_config table with two columns calc_method and audience_type
for each different calculation and audience type add an entry to the above
table
remove formula from original table and replace with appropriate
calc_method
Then run a simple select with a JOIN, GROUP BY sl and SUM(audience)
(Of course it would be even nicer to split the audience_calc to a
master/ detail, so you can have a foreign key...)

Related

TSQL: Need to have new column instead of next row

Sorry if I asked in wrong way, I am new to SQL.
I have one temp table where I am inserting values after joining different tables
then,I came across where I need to update that temp table with 3 field( colm1 as datetime,colm2 datetime,colm3 money)
for this i created field in temp table .
now,
select colm1,colm2,sum(colm3)
from OrginalTable
where userid=#userid
group by colm1,colm2
lets suppose this gives result like:
here after updating into Temp table, if multiple data then i need to do like below till 15 times:
Colm1,Com2,Colm3, colm1,colm2,colm3,
instead of creating new row I need this to be displayed as new field with in 1 row
execution result should be all in 1 row.
I am really confused whether I can do what is expected or not, if there is way please somebody give me examples or make it solution to this.
Lastly, appreciate your help
You could use UNPIVOT and then Dynamic PIVOT
DEMO
Setup:
create table temp (
col1 integer, col2 integer, col3 integer
)
insert into temp values (1, 1, 1000);
insert into temp values (1, 2, 500);
insert into temp values (2, 3, 800);
insert into temp values (2, 4, 700);
insert into temp values (3, 1, 1100);
UNPIVOT Query: You need a row_number to create a fake column name. But then overwrite the row to 1 for everyone, so all columns appear in a single row
WITH cte as (
SELECT row_number() over (order by col1, col2) as rn,
col1, col2, col3
FROM temp
)
SELECT 1 as rn,
'c_' +
CAST(rn AS VARCHAR(16)) + '_' +
myCol as myCol,
myVal
INTO temp2
FROM
(SELECT rn, col1, col2, col3
FROM cte) p
UNPIVOT
(myVal FOR myCol IN
(col1, col2, col3)
)AS unpvt;
SELECT * FROM temp2;
Dynamic Pivot:
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX);
SET #cols = STUFF((SELECT distinct ',' + QUOTENAME(c.myCol)
FROM temp2 c
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set #query = 'SELECT rn, ' + #cols + ' from
(
select rn
, myCol
, myVal
from temp2
) x
pivot
(
max(myVal)
for myCol in (' + #cols + ')
) p ';
EXEC sp_executesql #query
Output: Unpivot and Dynamic Pivot results
that sound like something you should do on your UI layer not for db.
Otherwise you need something like
SELECT t1.*, t2.*, ....... , t15.*
FROM yourTable as t1,
yourTable as t2,
......
yourTable as t15
WHERE t1.col1 >= t2.col1 and t1.col2 > t2.col2
AND t2.col1 >= t3.col1 and t2.col2 > t3.col2
....
AND t14.col1 >= t14.col15 and t14.col2 > t15.col2
The problem is you need to know there are 15 rows, so if that number change your query wont work.
That is why you should do it on UI instead

SQL Pivot table without aggregate

I have a number of text files that are in a format similar to what is shown below.
ENTRY,1,000000,Widget 4000,1,,,2,,
FIELD,Type,A
FIELD,Component,Widget 4000
FIELD,Vendor,Acme
ENTRY,2,000000,PRODUCT XYZ,1,,,3,
FIELD,Type,B
FIELD,ItemAssembly,ABCD
FIELD,Component,Product XYZ - 123
FIELD,Description1,Product
FIELD,Description2,XYZ-123
FIELD,Description3,Alternate Part #440
FIELD,Vendor,Contoso
They have been imported into a table with VARCHAR(MAX) as the only field. Each ENTRY is a "new" item, and all the subsequent FIELD rows are properties of that item. The data next to the FIELD is the column name of the property. The data to the right of the property is the data I want to display.
The desired output would be:
ENTRY Type Component Vendor ItemAssembly Description1
1,000000,Widget 4000 A Widget 4000 Acme
2,000000,Product XYZ B Product XYZ-123 Contoso ABCD Product
I've got the column names using the code below (there are several tables that I have UNIONed together to list all the property names).
select #cols =
STUFF (
(select Distinct ', ' + QUOTENAME(ColName) from
(SELECT
SUBSTRING(ltrim(textFileData),CHARINDEX(',', textFileData, 1)+1,CHARINDEX(',', textFileData, CHARINDEX(',', textFileData, 1)+1)- CHARINDEX(',', textFileData, 1)-1) as ColName
FROM [MyDatabase].[dbo].[MyTextFile]
where
(LEFT(textFileData,7) LIKE #c)
UNION
....
) A
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
Is a Pivot table the best way to do this? No aggregation is needed. Is there a better way to accomplish this? I want to list out data next to the FIELD name in a column format.
Thanks!
Here is the solution in SQL fiddle:
http://sqlfiddle.com/#!3/8f0b0/8
Prepare raw data in format (entry, field, value), use dynamic SQL to make pivot on unknown column count.
MAX() for string is enough to simulate "without aggregate" behavior in this case.
create table t(data varchar(max))
insert into t values('ENTRY,1,000000,Widget 4000,1,,,2,,')
insert into t values('FIELD,Type,A')
insert into t values('FIELD,Component,Widget 4000')
insert into t values('FIELD,Vendor,Acme ')
insert into t values('ENTRY,2,000000,PRODUCT XYZ,1,,,3,')
insert into t values('FIELD,Type,B')
insert into t values('FIELD,ItemAssembly,ABCD')
insert into t values('FIELD,Component,Product XYZ - 123')
insert into t values('FIELD,Description1,Product ')
insert into t values('FIELD,Description2,XYZ-123 ')
insert into t values('FIELD,Description3,Alternate Part #440')
insert into t values('FIELD,Vendor,Contoso');
create type preparedtype as table (entry varchar(max), field varchar(max), value varchar(max))
declare #prepared preparedtype
;with identified as
(
select
row_number ( ) over (order by (select 1)) as id,
substring(data, 1, charindex(',', data) - 1) as type,
substring(data, charindex(',', data) + 1, len(data)) as data
from t
)
, tree as
(
select
id,
(select max(id)
from identified
where type = 'ENTRY'
and id <= i.id) as parentid,
type,
data
from identified as i
)
, pivotsrc as
(
select
p.data as entry,
substring(c.data, 1, charindex(',', c.data) - 1) as field,
substring(c.data, charindex(',', c.data) + 1, len(c.data)) as value
from tree as p
inner join tree as c on c.parentid = p.id
where p.id = p.parentid
and c.parentid <> c.id
)
insert into #prepared
select * from pivotsrc
declare #dynamicPivotQuery as nvarchar(max)
declare #columnName as nvarchar(max)
select #columnName = ISNULL(#ColumnName + ',','')
+ QUOTENAME(field)
from (select distinct field from #prepared) AS fields
set #dynamicPivotQuery = N'select * from #prepared
pivot (max(value) for field in (' + #columnName + ')) as result'
exec sp_executesql #DynamicPivotQuery, N'#prepared preparedtype readonly', #prepared
Here your are, this comes back exactly as you need it. I love tricky SQL :-). This is a real ad-hoc singel-statement call.
DECLARE #tbl TABLE(OneCol VARCHAR(MAX));
INSERT INTO #tbl
VALUES('ENTRY,1,000000,Widget 4000,1,,,2,,')
,('FIELD,Type,A')
,('FIELD,Component,Widget 4000')
,('FIELD,Vendor,Acme ')
,('ENTRY,2,000000,PRODUCT XYZ,1,,,3,')
,('FIELD,Type,B')
,('FIELD,ItemAssembly,ABCD')
,('FIELD,Component,Product XYZ - 123')
,('FIELD,Description1,Product ')
,('FIELD,Description2,XYZ-123 ')
,('FIELD,Description3,Alternate Part #440')
,('FIELD,Vendor,Contoso');
WITH OneColumn AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS inx
,CAST('<root><r>' + REPLACE(OneCol,',','</r><r>') + '</r></root>' AS XML) AS Split
FROM #tbl AS tbl
)
,AsParts AS
(
SELECT inx
,Each.part.value('/root[1]/r[1]','varchar(max)') AS Part1
,Each.part.value('/root[1]/r[2]','varchar(max)') AS Part2
,Each.part.value('/root[1]/r[3]','varchar(max)') AS Part3
,Each.part.value('/root[1]/r[4]','varchar(max)') AS Part4
,Each.part.value('/root[1]/r[5]','varchar(max)') AS Part5
FROM OneColumn
CROSS APPLY Split.nodes('/root') AS Each(part)
)
,TheEntries AS
(
SELECT DISTINCT *
FROM AsParts
WHERE Part1='ENTRY'
)
SELECT TheEntries.Part2 + ',' + TheEntries.Part3 + ',' + TheEntries.Part4 AS [ENTRY]
,MyFields.AsXML.value('(fields[1]/field[Part2="Type"])[1]/Part3[1]','varchar(max)') AS [Type]
,MyFields.AsXML.value('(fields[1]/field[Part2="Component"])[1]/Part3[1]','varchar(max)') AS Component
,MyFields.AsXML.value('(fields[1]/field[Part2="Vendor"])[1]/Part3[1]','varchar(max)') AS Vendor
,MyFields.AsXML.value('(fields[1]/field[Part2="ItemAssembly"])[1]/Part3[1]','varchar(max)') AS ItemAssembly
,MyFields.AsXML.value('(fields[1]/field[Part2="Description1"])[1]/Part3[1]','varchar(max)') AS Description1
FROM TheEntries
CROSS APPLY
(
SELECT *
FROM AsParts AS ap
WHERE ap.Part1='FIELD' AND ap.inx>TheEntries.inx
AND ap.inx < ISNULL((SELECT TOP 1 nextEntry.inx FROM TheEntries AS nextEntry WHERE nextEntry.inx>TheEntries.inx ORDER BY nextEntry.inx DESC),10000000)
ORDER BY ap.inx
FOR XML PATH('field'), ROOT('fields'),TYPE
) AS MyFields(AsXML)

SQL Server Concatenate Query Results

I am new to sql server.
I have a query that retrieves the desired data from various tables.
The result looks like this with other columns, that contain things like name, removed for simplicity.
id xvalue
1 x
1 y
1 z
2 x
2 y
2 z
3 x
3 y
3 z
I would like to wrap the query with a select to concatenate the result set into an new result set like this.
id xvalue
1 x,y,z
2 x,y,z
3 x,y,z
I have tried to figure out how use the for xml path option and cannot seem to find the
correct syntax.
We have two tables in SQL one with columns (id,mdf,hist) another table with columns (id,hist). First table called historial, second table is resultado. Then we need to concat hist column when the mfn is the same and copy the result to the column hist in resultado table.
DECLARE #iterator INT
SET #iterator = 1
WHILE (#iterator < 100) /* Number of rows */
BEGIN
DECLARE #ListaIncidencias VARCHAR(MAX) = ''
SELECT #ListaIncidencias = #ListaIncidencias + ';' + p.hist
FROM historial p WHERE mfn = #iterator
SET #ListaIncidencias = SUBSTRING(#ListaIncidencias,2,LEN(#ListaIncidencias))
SELECT #ListaIncidencias as 'Incidencias'
INSERT INTO resultado(hist) VALUES(#ListaIncidencias) /* Insert results in new table */
SET #iterator = #iterator + 1
END
Try this
CREATE TABLE #Tmp
(
id INT ,
xValue VARCHAR(10)
)
INSERT INTO #Tmp VALUES ( 1, 'x' )
INSERT INTO #Tmp VALUES ( 1, 'y' )
INSERT INTO #Tmp VALUES ( 1, 'Z' )
INSERT INTO #Tmp VALUES ( 2, 'A' )
INSERT INTO #Tmp VALUES ( 2, 'B' )
SELECT id ,
( STUFF(( SELECT DISTINCT
',' + CAST(xValue AS VARCHAR(20))
FROM #Tmp
WHERE id = t.id
FOR
XML PATH('')
), 1, 1, '') ) AS Data
FROM #Tmp t
GROUP BY id
Something like this should do the trick for you. SQL Server should really just have an easy syntax for this. But it'll give you a comma and space separated list of the xValues.
SELECT id
,STUFF(
(SELECT DISTINCT
', ' + t.xvalue
FROM tableName t
WHERE t.id = t1.id
ORDER BY ', ' + t.xvalue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS xvalue
FROM tableName t1
GROUP BY id
select id,Ids=Stuff((SELECT ',' + xvalue FROM t t1 WHERE t1.id=t.id
FOR XML PATH (''))
, 1, 1, '' )
from t
GROUP BY id
FIDDLE
There is very new functionality in Azure SQL Database and SQL Server 2016 to handle this exact scenario. Example:
select id, STRING_AGG(xvalue, ',') as xvalue
from some_table
group by id
STRING_AGG - https://msdn.microsoft.com/en-us/library/mt790580.aspx

Extracting data column-wise from table in SQL Server

I currently have a piece of code that pivots a table in which the row data is inserted dynamically. The code is shown below:
CREATE TABLE Table1
([empname] varchar(6), [empqual] varchar(10), [emprank] int, [empexp] int)
INSERT INTO Table1
([empname], [empqual], [emprank], [empexp])
VALUES
('Joyce', 'UNIVERSITY', 1, 11),
('Angela', 'MASTERS', 2, 10),
('Lily', 'MASTERS', 3, 9),
('Sasha', 'UNIVERSITY', 3, 9),
('Harry', 'UNIVERSITY', 3, 9)
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX)
SET #cols = STUFF((SELECT distinct ',' + 'Column' + CONVERT(VARCHAR,Row_Number() OVER (Order By emprank))
FROM Table1 c
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
SET #query = '
SELECT *
FROM
(
SELECT ''Column'' + CONVERT(VARCHAR,Row_Number() OVER (Order By emprank)) AS Columns,
CONVERT(VARCHAR,empname) as e
FROM Table1
) p
PIVOT
(
MAX (e)
FOR Columns IN
(
' + #cols + ' )
) as pvt
UNION
SELECT *
FROM
(
SELECT ''Column'' + CONVERT(VARCHAR,Row_Number() OVER (Order By emprank)) AS Columns,
CONVERT(VARCHAR,empqual) as e
FROM Table1
) p
PIVOT
(
MAX (e)
FOR Columns IN
(
' + #cols + ' )
) as pvt
UNION
SELECT *
FROM
(
SELECT ''Column'' + CONVERT(VARCHAR,Row_Number() OVER (Order By emprank)) AS Columns,
CONVERT(VARCHAR,emprank) as e
FROM Table1
) p
PIVOT
(
MAX (e)
FOR Columns IN
(
' + #cols + ' )
) as pvt
UNION
SELECT *
FROM
(
SELECT ''Column'' + CONVERT(VARCHAR,Row_Number() OVER (Order By emprank)) AS Columns,
CONVERT(VARCHAR,empexp) as e,
FROM Table1
) p
PIVOT
(
MAX (e)
FOR Columns IN
(
' + #cols + ' )
) as pvt
'
EXECUTE (#query)
The result of the above code is as shown below:
Column1 Column2 Column3 Column4 Column5
1 2 3 3 3
11 10 9 9 9
Joyce Angela Lily Sasha Harry
UNIVERSITY MASTERS MASTERS UNIVERSITY UNIVERSITY
Now, my application requires that I display each of the columns in this table separately, i.e. each of the columns, and not rows, of this table needs to be exported from this table and transferred, possibly into a temporary table, from which it can be displayed easily.
I am well aware of the fact that relational DBs are designed in such a way so as to consider rows, not columns, as individual entities. However, I am constrained by the application on which I am working, which requires code that extracts the data in this table column-wise so that they can be displayed separately.
How would I go about doing this?
This is a terminology overlap.
In SQL, "row" means (roughly) a single entity, and "column" means a property on the entities.
In UI, "row" means data arranged horizontally, and "column" means data arranged vertically.
Your requirement is that you should display your entities vertically. So, retrieve your entities (SQL rows) and then add code in your application to display this data in UI columns. It's unfortunate and perhaps confusing that the terms are the same here, but remember, your database structure (and choice of terminology) is completely irrelevant to your UI layout.
As to what code in your application is required to display the data... well, you haven't even told us what language it's in, so I can't help there.

Find non-ASCII characters in varchar columns using SQL Server

How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column would be great.
I am doing something like this now, but it is not working
select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'
For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:
The identity field for that record. (This will allow the whole record to be reviewed with another query.)
The column name
The text with the invalid character
Id | FieldName | InvalidText |
----+-----------+-------------------+
25 | LastName | Solís |
56 | FirstName | François |
100 | Address1 | 123 Ümlaut street |
Invalid characters would be any outside the range of SPACE (3210) through ~ (12710)
Here is a solution for the single column search using PATINDEX.
It also displays the StartPosition, InvalidCharacter and ASCII code.
select line,
patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) as [Position],
substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1) as [InvalidCharacter],
ascii(substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1)) as [ASCIICode]
from staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) >0
I've been running this bit of code with success
declare #UnicodeData table (
data nvarchar(500)
)
insert into
#UnicodeData
values
(N'Horse�')
,(N'Dog')
,(N'Cat')
select
data
from
#UnicodeData
where
data collate LATIN1_GENERAL_BIN != cast(data as varchar(max))
Which works well for known columns.
For extra credit, I wrote this quick script to search all nvarchar columns in a given table for Unicode characters.
declare
#sql varchar(max) = ''
,#table sysname = 'mytable' -- enter your table here
;with ColumnData as (
select
RowId = row_number() over (order by c.COLUMN_NAME)
,c.COLUMN_NAME
,ColumnName = '[' + c.COLUMN_NAME + ']'
,TableName = '[' + c.TABLE_SCHEMA + '].[' + c.TABLE_NAME + ']'
from
INFORMATION_SCHEMA.COLUMNS c
where
c.DATA_TYPE = 'nvarchar'
and c.TABLE_NAME = #table
)
select
#sql = #sql + 'select FieldName = ''' + c.ColumnName + ''', InvalidCharacter = [' + c.COLUMN_NAME + '] from ' + c.TableName + ' where ' + c.ColumnName + ' collate LATIN1_GENERAL_BIN != cast(' + c.ColumnName + ' as varchar(max)) ' + case when c.RowId <> (select max(RowId) from ColumnData) then ' union all ' else '' end + char(13)
from
ColumnData c
-- check
-- print #sql
exec (#sql)
I'm not a fan of dynamic SQL but it does have its uses for exploratory queries like this.
try something like this:
DECLARE #YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
INSERT #YourTable VALUES (1, 'ok','ok','ok');
INSERT #YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
INSERT #YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
INSERT #YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
INSERT #YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
INSERT #YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));
--if you have a Numbers table use that, other wise make one using a CTE
WITH AllNumbers AS
( SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<1000
)
SELECT
pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM #YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
UNION
SELECT
pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM #YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
UNION
SELECT
pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM #YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
order by 1
OPTION (MAXRECURSION 1000);
OUTPUT:
pk BadValueColumn BadValue
----------- -------------- --------------------
2 Col1 BA¶D
3 Col2 ¶BAD
4 Col3 B¶AD
5 Col1 ¶BAD
5 Col3 ¶BAD
6 Col1 BAD¶
6 Col2 B¶AD
6 Col3 BAD¶¶¶
(8 row(s) affected)
This script searches for non-ascii characters in one column. It generates a string of all valid characters, here code point 32 to 127. Then it searches for rows that don't match the list:
declare #str varchar(128);
declare #i int;
set #str = '';
set #i = 32;
while #i <= 127
begin
set #str = #str + '|' + char(#i);
set #i = #i + 1;
end;
select col1
from YourTable
where col1 like '%[^' + #str + ']%' escape '|';
running the various solutions on some real world data - 12M rows varchar length ~30, around 9k dodgy rows, no full text index in play, the patIndex solution is the fastest, and it also selects the most rows.
(pre-ran km. to set the cache to a known state, ran the 3 processes, and finally ran km again - the last 2 runs of km gave times within 2 seconds)
patindex solution by Gerhard Weiss -- Runtime 0:38, returns 9144 rows
select dodgyColumn from myTable fcc
WHERE patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,dodgyColumn ) >0
the substring-numbers solution by MT. -- Runtime 1:16, returned 8996 rows
select dodgyColumn from myTable fcc
INNER JOIN dbo.Numbers32k dn ON dn.number<(len(fcc.dodgyColumn ))
WHERE ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))<32
OR ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))>127
udf solution by Deon Robertson -- Runtime 3:47, returns 7316 rows
select dodgyColumn
from myTable
where dbo.udf_test_ContainsNonASCIIChars(dodgyColumn , 1) = 1
There is a user defined function available on the web 'Parse Alphanumeric'. Google UDF parse alphanumeric and you should find the code for it. This user defined function removes all characters that doesn't fit between 0-9, a-z, and A-Z.
Select * from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name
That should bring back any records that have a last_name with invalid chars for you...though your bonus points question is a bit more of a challenge, but I think a case statement could handle it. This is a bit psuedo code, I'm not entirely sure if it'd work.
Select id, case when udf_parsealpha(ar.last_name) <> ar.last_name then 'last name'
when udf_parsealpha(ar.first_name) <> ar.first_name then 'first name'
when udf_parsealpha(ar.Address1) <> ar.last_name then 'Address1'
end,
case when udf_parsealpha(ar.last_name) <> ar.last_name then ar.last_name
when udf_parsealpha(ar.first_name) <> ar.first_name then ar.first_name
when udf_parsealpha(ar.Address1) <> ar.last_name then ar.Address1
end
from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name or
udf_parsealpha(ar.first_name) <> ar.first_name or
udf_parsealpha(ar.Address1) <> ar.last_name
I wrote this in the forum post box...so I'm not quite sure if that'll function as is, but it should be close. I'm not quite sure how it will behave if a single record has two fields with invalid chars either.
As an alternative, you should be able to change the from clause away from a single table and into a subquery that looks something like:
select id,fieldname,value from (
Select id,'last_name' as 'fieldname', last_name as 'value'
from Staging.APARMRE1 ar
Union
Select id,'first_name' as 'fieldname', first_name as 'value'
from Staging.APARMRE1 ar
---(and repeat unions for each field)
)
where udf_parsealpha(value) <> value
Benefit here is for every column you'll only need to extend the union statement here, while you need to put that comparisson three times for every column in the case statement version of this script
To find which field has invalid characters:
SELECT * FROM Staging.APARMRE1 FOR XML AUTO, TYPE
You can test it with this query:
SELECT top 1 'char 31: '+char(31)+' (hex 0x1F)' field
from sysobjects
FOR XML AUTO, TYPE
The result will be:
Msg 6841, Level 16, State 1, Line 3 FOR XML could not serialize the
data for node 'field' because it contains a character (0x001F) which
is not allowed in XML. To retrieve this data using FOR XML, convert it
to binary, varbinary or image data type and use the BINARY BASE64
directive.
It is very useful when you write xml files and get error of invalid characters when validate it.
Here is a UDF I built to detectc columns with extended ascii charaters. It is quick and you can extended the character set you want to check. The second parameter allows you to switch between checking anything outside the standard character set or allowing an extended set:
create function [dbo].[udf_ContainsNonASCIIChars]
(
#string nvarchar(4000),
#checkExtendedCharset bit
)
returns bit
as
begin
declare #pos int = 0;
declare #char varchar(1);
declare #return bit = 0;
while #pos < len(#string)
begin
select #char = substring(#string, #pos, 1)
if ascii(#char) < 32 or ascii(#char) > 126
begin
if #checkExtendedCharset = 1
begin
if ascii(#char) not in (9,124,130,138,142,146,150,154,158,160,170,176,180,181,183,184,185,186,192,193,194,195,196,197,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,216,217,218,219,220,221,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,248,249,250,251,252,253,254,255)
begin
select #return = 1;
select #pos = (len(#string) + 1)
end
else
begin
select #pos = #pos + 1
end
end
else
begin
select #return = 1;
select #pos = (len(#string) + 1)
end
end
else
begin
select #pos = #pos + 1
end
end
return #return;
end
USAGE:
select Address1
from PropertyFile_English
where udf_ContainsNonASCIIChars(Address1, 1) = 1

Resources