WHERE clause on VARCHAR column seems to operate as a LIKE - sql-server

I've stumbled across a situation I've never seen before, and I hope someone can explain it.
I ran the following query, hoping to get only the rows whose column value is exactly equal to 1101:
select '--' + MyColumn + '--' SeeSpaces, Len(MyColumn) as LengthOfColumn
from MyTable
where MyColumn = '1101'
However, I also see values where 1101 is followed by (what I believe are) spaces.
So SeeSpaces returns
--1101 --
And LengthOfColumn returns 4
MyColumn is a VARCHAR(8), NOT NULL column. Its values (including the spaces) are inserted through a separate workflow.
Why does this select not return only the exact results?
Thanks in advance

The reason lies in the way SQL Server compares strings with trailing spaces: it follows the ANSI standard, so the strings '1101' and '1101 ' are treated as equal.
See the following for more details:
INF: How SQL Server Compares Strings with Trailing Spaces

I think you have to use the LTRIM() and RTRIM() functions when comparing, like this:
LTRIM(RTRIM(MYCOLUMN))='1101'
Also, the LEN function does not count trailing spaces; it counts only the characters before them. Please refer to: http://msdn.microsoft.com/en-us/library/ms190329%28SQL.90%29.aspx
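A minimal sketch of the behaviour the answers above describe, using varchar literals rather than the asker's data. The DATALENGTH filter in the second query is only one possible way to restrict the result to exact matches; it is an assumption, not something the answers propose. MyTable and MyColumn are the names from the question.

```sql
-- '=' pads the shorter string with spaces before comparing,
-- so a trailing space does not make the two values different.
SELECT CASE WHEN '1101' = '1101 ' THEN 'equal' ELSE 'not equal' END AS EqualsResult, -- equal
       LEN('1101 ')        AS LenResult,        -- 4: LEN excludes trailing blanks
       DATALENGTH('1101 ') AS DataLengthResult; -- 5: bytes, trailing blank included

-- One way to keep only rows that hold exactly '1101' (no trailing spaces):
SELECT MyColumn
FROM MyTable
WHERE MyColumn = '1101'
  AND DATALENGTH(MyColumn) = DATALENGTH('1101');
```

Because DATALENGTH counts bytes, the comparison on the last line only makes sense when both sides have the same character type (both varchar here).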

Related

SQL Server returns wrong result with trailing spaces in Where clause [duplicate]

In SQL Server 2008 I have a table called Zone with a column ZoneReference varchar(50) not null as the primary key.
If I run the following query:
select '"' + ZoneReference + '"' as QuotedZoneReference
from Zone
where ZoneReference = 'WF11XU'
I get the following result:
"WF11XU "
Note the trailing space.
How is this possible? If the trailing space really is there on that row, then I'd expect to return zero results, so I'm assuming it's something else that SQL Server Management Studio is displaying weirdly.
In C# code calling zoneReference.Trim() removes it, suggesting it is some sort of whitespace character.
Can anyone help?
That's the expected result: in SQL Server the = operator ignores trailing spaces when making the comparison.
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.
The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.
Source
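The difference between = and LIKE described in the quoted passage can be sketched with plain varchar literals. This is an illustration only; the column aliases are made up, and the results in the comments apply to non-Unicode (varchar) operands.

```sql
SELECT
    -- '=' pads both sides to the same length, so the trailing space is ignored.
    CASE WHEN 'abc'  =    'abc ' THEN 'match' ELSE 'no match' END AS EqualsPadded,        -- match
    -- Trailing space in the value being tested: ignored for varchar data.
    CASE WHEN 'abc ' LIKE 'abc'  THEN 'match' ELSE 'no match' END AS SpaceInValue,        -- match
    -- Trailing space in the pattern (right side): significant, no padding is done.
    CASE WHEN 'abc'  LIKE 'abc ' THEN 'match' ELSE 'no match' END AS SpaceInPattern;      -- no match
```

So the LIKE exception only bites when the trailing space is on the pattern side of the predicate.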
Trailing spaces are not always ignored.
I experienced this issue today. My table had NCHAR columns and was being joined to VARCHAR data.
Because the data in the table was not as wide as its field, trailing spaces were automatically added by SQL Server.
I had an ITVF (inline table-valued function) that took varchar parameters.
The parameters were used in a JOIN to the table with the NCHAR fields.
The joins failed because the data passed to the function did not have trailing spaces but the data in the table did. Why was that?
I was getting tripped up on DATA TYPE PRECEDENCE. (See http://technet.microsoft.com/en-us/library/ms190309.aspx)
When comparing strings of different types, the lower precedence type is converted to the higher precedence type before the comparison. So my VARCHAR parameters were converted to NCHARs. The NCHARs were compared, and apparently the spaces were significant.
How did I fix this? I changed the function definition to use NVARCHAR parameters, which are of a higher precedence than NCHAR. Now the NCHARs were changed automatically by SQL Server into NVARCHARs and the trailing spaces were ignored.
Why didn't I just perform an RTRIM? Testing revealed that RTRIM killed the performance, preventing the JOIN optimizations that SQL Server would have otherwise used.
Why not change the data type of the table? The tables are already installed on customer sites, and they do not want to run maintenance scripts (time + money to pay DBAs) or give us access to their machines (understandable).
Yeah, Mark is correct. Run the following SQL:
create table #temp (name varchar(15))
insert into #temp values ('james ')
select '"' + name + '"' from #temp where name ='james'
select '"' + name + '"' from #temp where name like 'james'
drop table #temp
But, the assertion about the 'like' statement appears not to work in the above example. Output:
(1 row(s) affected)
-----------------
"james "
(1 row(s) affected)
-----------------
"james "
(1 row(s) affected)
EDIT:
To get it to work, you could put at the end:
and name <> rtrim(ltrim(name))
Ugly though.
EDIT2:
Given the comments above, the following would work:
select '"' + name + '"' from #temp where 'james' like name
Try:
select Replace('"' + ZoneReference + '"', ' ', '') as QuotedZoneReference from Zone where ZoneReference = 'WF11XU'

How can I make LIKE match a number or empty string inside square brackets in T-SQL?

Is it possible to have a LIKE clause with one character number or an empty string?
I have a field in which I will write a LIKE clause (as a string). I will apply it later with an expression in the WHERE clause: ... LIKE tableX.FormatField .... It must contain a number (a single character or an empty string).
Something like [0-9 ]. Where the space bar inside square brackets means an empty string.
I have a table in which I have a configuration for parameters - TblParam with field DataFormat. I have to validate a value from another table, TblValue, with field ValueToCheck. The validation is made by a query. The part for the validation looks like:
... WHERE TblValue.ValueToCheck LIKE TblParam.DataFormat ...
For the configuration value, I need an expression for one numeric character or an empty string. Something like [0-9'']. Because of the automatic nature of the check, I need a single expression (without AND or OR operators) which can fit the query (see the example above). The same check is valid for other types of checks, so I have to fit my check engine.
I am almost sure that I cannot use [0-9''], but is there another suitable solution?
Actually, I am having difficulty validating a version string: 1.0.1.2 or 1.0.2. It can contain 2-3 dots (.) and numbers.
I am pretty sure it is not possible, as '' is not even a character.
select ascii(''); returns null.
'' = ' '; is true
'' is null; is false
If you want exactly 0-9 or '' (and not ' '), then you can do something like this (in a more efficient way than LIKE):
where col in ('1','2','3','4','5','6','7','8','9','0') or (col = '' and DATALENGTH(col) = 0)
That's a tricky one... As far as I can tell, there isn't a way to do it with only one like clause. You need to do like '[0-9]' OR like ''.
You could accomplish this by having a second column in your TableX. That indicates either a second pattern, or whether or not to include blanks.
If I correctly understand your question, you need something that catches an empty string. Try to use the nullif() function:
create table t1 (a nvarchar(1))
insert t1(a) values('')
insert t1(a) values('1')
insert t1(a) values('2')
insert t1(a) values('a')
-- must select first three
select a from t1 where a like '[0-9]' or nullif(a,'') is null
It returns exactly three records: '', '1' and '2'.
A more convenient method with only one range clause is:
select a from t1 where isnull(nullif(a,''),0) like '[0-9]'

SQL Server string comparison: nvarchar vs. varchar [duplicate]

I have nvarchar(50) column in SQL Server table and data like this:
123abc
234abc
456abc
My query:
select *
from table
where col like '%abc'
Expected result : all rows should be returned
Actual result: No rows are returned
Works fine if the column is varchar but returns no rows if the type is nvarchar.
Any ideas?
You probably have spaces at the end of your data. Take a look at this example.
Declare @Temp Table(col nvarchar(50))
Insert Into @Temp(col) Values(N'123abc')
Insert Into @Temp(col) Values(N'456abc ')
Select * From @Temp Where Col Like '%abc'
When you run the code above, you will only get the 123 row because the 456 row has a space on the end of it.
When you run the code shown below, you will get the data you expect.
Declare @Temp Table(col nvarchar(50))
Insert Into @Temp(col) Values(N'123abc')
Insert Into @Temp(col) Values(N'456abc ')
Select * From @Temp Where rtrim(Col) Like '%abc'
According to the documentation on LIKE in Books Online (emphasis mine):
http://msdn.microsoft.com/en-us/library/ms179859.aspx
Pattern Matching by Using LIKE
LIKE supports ASCII pattern matching and Unicode pattern matching. When all arguments (match_expression, pattern, and escape_character, if present) are ASCII character data types, ASCII pattern matching is performed. If any one of the arguments are of Unicode data type, all arguments are converted to Unicode and Unicode pattern matching is performed. When you use Unicode data (nchar or nvarchar data types) with LIKE, trailing blanks are significant; however, for non-Unicode data, trailing blanks are not significant. Unicode LIKE is compatible with the ISO standard. ASCII LIKE is compatible with earlier versions of SQL Server.
For the nvarchar type, you can use a select like this:
select * from Table where ColumnName like N'%abc%'
Are you sure there are no spaces at the end of the value? You can do this to remove the white space:
select *
from yourTable
where rtrim(yourcolumn) like '%abc'
If you don't want to use the RTRIM and the LIKE together you can also use:
Select *
From yourTable
Where charindex('abc', col) > 0
From Microsoft about using LIKE:
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2,
<Comparison Predicate>, General rules #3) on how to compare strings
with spaces. The ANSI standard requires padding for the character
strings used in comparisons so that their lengths match before
comparing them. The padding directly affects the semantics of WHERE
and HAVING clause predicates and other Transact-SQL string
comparisons. For example, Transact-SQL considers the strings 'abc' and
'abc ' to be equivalent for most comparison operations.
The only exception to this rule is the LIKE predicate. When the right
side of a LIKE predicate expression features a value with a trailing
space, SQL Server does not pad the two values to the same length
before the comparison occurs. Because the purpose of the LIKE
predicate, by definition, is to facilitate pattern searches rather
than simple string equality tests, this does not violate the section
of the ANSI SQL-92 specification mentioned earlier.
If you know the starting position of the 'abc' string then you can use SUBSTRING:
Select *
From yourTable
Where substring(Col, 4, 3) = 'abc'
But then you can use charindex and substring together and you do not have to worry about white space:
select *
from yourTable
where substring(col, charindex('abc', col), 3) = 'abc'
Your query should work just fine, but you can also try.
SELECT * FROM TABLE WHERE (COL LIKE '%abc%')
In case there are characters you cannot see after the 'abc' part.
This will work fine. You can try.
SELECT *
FROM "table"
WHERE CAST("col" AS VARCHAR) LIKE '%abc'
Just to document some of the ASCII vs Unicode weirdness:
-- ascii like
if 'Non rational ' like 'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like
if N'Non rational ' like N'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like, trailing space removed
if N'Non rational' like N'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like, different wildcard
if N'Non rational ' like N'Non_rational%' print 'like' else print 'not like'
Produces the following:
like
not like
not like
like

How can I make SQL Server return FALSE for comparing varchars with and without trailing spaces?

If I deliberately store trailing spaces in a VARCHAR column, how can I force SQL Server to see the data as mismatch?
SELECT 'foo' WHERE 'bar' = 'bar '
I have tried:
SELECT 'foo' WHERE LEN('bar') = LEN('bar ')
One method I've seen floated is to append a specific character to the end of every string then strip it back out for my presentation... but this seems pretty silly.
Is there a method I've overlooked?
I've noticed that it does not apply to leading spaces, so perhaps I could run a function that reverses the character order before the comparison... the problem is that this makes the query non-SARGable.
From the docs on LEN (Transact-SQL):
Returns the number of characters of the specified string expression, excluding trailing blanks. To return the number of bytes used to represent an expression, use the DATALENGTH function
Also, from the support page on How SQL Server Compares Strings with Trailing Spaces:
SQL Server follows the ANSI/ISO SQL-92 specification on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them.
Update: I deleted my code using LIKE (which does not pad spaces during comparison) and DATALENGTH() since they are not foolproof for comparing strings
This has also been asked in a lot of other places as well for other solutions:
SQL Server 2008 Empty String vs. Space
Is it good practice to trim whitespace (leading and trailing)
Why would SqlServer select statement select rows which match and rows which match and have trailing spaces
You could try something like this:
declare @a varchar(10), @b varchar(10)
set @a='foo'
set @b='foo '
select @a, @b, DATALENGTH(@a), DATALENGTH(@b)
Sometimes the dumbest solution is the best:
SELECT 'foo' WHERE 'bar' + 'x' = 'bar ' + 'x'
So basically append any character to both strings before making the comparison.
After some searching, the simplest solution I found was on Anthony Bloesch's WebLog.
Just append some text (a single character is enough) to the end of the data:
SELECT 'foo' WHERE 'bar' + 'BOGUS_TXT' = 'bar ' + 'BOGUS_TXT'
Also works for 'WHERE IN'
SELECT <columnA>
FROM <tableA>
WHERE <columnA> + 'BOGUS_TXT' in ( SELECT <columnB> + 'BOGUS_TXT' FROM <tableB> )
The approach I’m planning to use is to use a normal comparison which should be index-keyable (“sargable”) supplemented by a DATALENGTH (because LEN ignores the whitespace). It would look like this:
DECLARE @testValue VARCHAR(MAX) = 'x';
SELECT t.Id, t.Value
FROM dbo.MyTable t
WHERE t.Value = @testValue AND DATALENGTH(t.Value) = DATALENGTH(@testValue)
It is up to the query optimizer to decide the order of filters, but it should choose to use an index for the data lookup if that makes sense for the table being tested, and then filter the remaining rows by length with the more expensive scalar operations. However, as another answer stated, it would be better to avoid these scalar operations altogether by using an indexed computed column. The method presented here might make sense if you have no control over the schema, if you want to avoid creating computed columns, or if creating and maintaining them is considered more costly than the worse query performance.
I've only really got two suggestions. One would be to revisit the design that requires you to store trailing spaces - they're always a pain to deal with in SQL.
The second (given your SARG-able comments) would be to add a computed column to the table that stores the length, and add this column to appropriate indexes. That way, at least, the length comparison should be SARG-able.
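The computed-column suggestion could look roughly like the sketch below. The table, column, and index names are hypothetical, and Value is declared as VARCHAR(50) here (rather than VARCHAR(MAX)) only so that it can be used as an index key; adjust to the real schema.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE dbo.MyTable
(
    Id          INT IDENTITY(1,1) PRIMARY KEY,
    Value       VARCHAR(50) NOT NULL,
    -- Byte length of Value, trailing spaces included; stored with the row.
    ValueLength AS DATALENGTH(Value) PERSISTED
);

-- Index both columns so the equality test and the length test can be resolved together.
CREATE INDEX IX_MyTable_Value_ValueLength ON dbo.MyTable (Value, ValueLength);

DECLARE @testValue VARCHAR(50) = 'x';

SELECT t.Id, t.Value
FROM dbo.MyTable AS t
WHERE t.Value = @testValue                        -- pads trailing spaces, so 'x ' also passes here
  AND t.ValueLength = DATALENGTH(@testValue);     -- rejects rows whose stored length differs
```

Whether the optimizer actually seeks on this index depends on the data and statistics, so it is worth checking the execution plan before relying on it.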

Unicode characters causing issues in SQL Server 2005 string comparison

This query:
select *
from op.tag
where tag = 'fussball'
Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).
While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings which allow you to change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also causes failure on inserts of one value when the other exists due to a constraint violation. How do I turn this off?
Follow-up bonus point question. Explain why this query does not return any rows:
select 1
where 'fußball' = 'fussball'
Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:
select 1
where 'fußball' = cast('fussball' as nvarchar)
But then again, this one does not:
select 1
where cast('fußball' as varchar) = cast('fussball' as varchar)
I'm confused.
I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).
http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here's an insightful excerpt:
Language-sensitive searching and
matching are closely related to
collation. Strings that compare as
equal at some strength level are those
that should be matched when doing
language-sensitive matching. For
example, at a primary strength, "ß"
would match against "ss" according to
the UCA, and "aa" would match "å" in a
Danish tailoring of the UCA.
The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).
It does not with collation Latin1_General_BIN.
You can assign a table column a collation by using the COLLATE < collation > keyword after N/VARCHAR.
You can also compare strings with a specific collation using the syntax
string1 = string2 COLLATE < collation >
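As a sketch of that syntax applied to the strings from the question, the query below runs the same comparison under the two collations named in this answer. The column aliases are made up; according to the answer above, the first comparison is expected to treat the strings as equal and the binary one is not.

```sql
SELECT
    -- Linguistic (Windows) collation: compared using language-aware rules.
    CASE WHEN N'fußball' = N'fussball' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS WindowsCollation,
    -- Binary collation: compared code point by code point.
    CASE WHEN N'fußball' = N'fussball' COLLATE Latin1_General_BIN
         THEN 'equal' ELSE 'not equal' END AS BinaryCollation;
```

The COLLATE clause is attached to one operand, and the explicit collation is then used for the whole comparison.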
This isn't an answer that explains behavior, but may be relevant:
In this question, I learned that using the collation of
Latin1_General_Bin
will avoid most collation quirks.
Some helper answers - not the complete one to your question, but still maybe helpful:
If you try:
SELECT 1 WHERE N'fußball' = N'fussball'
you'll get "1" - when using the "N" to signify Unicode, the two strings are considered the same - why that's the case, I don't know (yet).
To find the default collation for a server, use
SELECT SERVERPROPERTY('Collation')
To find the collation of a given column in a database, use this query:
SELECT
name 'Column Name',
OBJECT_NAME(object_id) 'Table Name',
collation_name
FROM sys.columns
WHERE object_ID = object_ID('your-table-name')
AND name = 'your-column-name'
Bonus question (answer?): @ScottCher
pointed out to me privately that this
is due to the string literal
"fussball" being treated as a varchar.
This query DOES return a result:
select 1 where 'fußball' =
cast('fussball' as nvarchar)
Here you're dealing with the SQL Server data type precedence rules, as stated in Data Type Precedence. Comparisons are done always using the higher precedence type:
When an operator combines two
expressions of different data types,
the rules for data type precedence
specify that the data type with the
lower precedence is converted to the
data type with the higher precedence.
Since nvarchar has a higher precedence than varchar, the comparison in your example will occur using the nvarchar type, so it's really exactly the same as select 1 where N'fußball' = N'fussball' (i.e. using Unicode types). I hope this also makes it clear why your last case doesn't return any row.
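One way to see the precedence rule in action is to ask SQL Server which type an expression ends up with. The sketch below uses SQL_VARIANT_PROPERTY for that; the column aliases are made up for illustration.

```sql
SELECT
    -- A plain varchar value stays varchar.
    SQL_VARIANT_PROPERTY(CAST('fussball' AS varchar(20)), 'BaseType')        AS VarcharOnly, -- varchar
    -- Combining varchar with an nvarchar operand converts the result to nvarchar,
    -- because nvarchar has the higher precedence.
    SQL_VARIANT_PROPERTY(CAST('fussball' AS varchar(20)) + N'', 'BaseType')  AS MixedTypes;  -- nvarchar
```

The same promotion happens to the varchar side of a comparison such as 'fußball' = cast('fussball' as nvarchar), which is why that query behaves like the all-Unicode version.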

Resources