1+1=3? Space characters in nvarchar variables and string lengths - sql-server

I've just stumbled upon this:
Why doesn't the following code:
DECLARE @s nvarchar(10) = N' '
PRINT CONCAT('#', @s, '#')
PRINT CONCAT('#', LEN(@s), '#')
result in either the output
##
#0#
or
# #
#1#
On SQL Server 2017, however, this code produces the output
# #
#0#
Which seems contradictory to me.
Either the string has the length 0 and is '' or the length 1 and is ' '.
The whole thing becomes even stranger if you add the following code:
DECLARE @s nvarchar(10) = N' '
PRINT CONCAT('#', @s, '#')
PRINT CONCAT('#', LEN(@s), '#')
DECLARE @l1 int = LEN(CONCAT('#', @s, '#'))
PRINT LEN(@s)
PRINT LEN('#')
PRINT @l1
Which outputs the following:
# #
#0#
0
1
3
So we have three substrings, one with length 0, two with length 1. The total string then has length 3? I'm confused.
If you fill @s with several spaces, it looks even funnier; for example, 5 spaces result in this output:
#     #
#0#
0
1
7
So here 1×0 + 2×1 even adds up to 7. I wish my bank would calculate my account balance like this.
Can someone explain to me what's going on?
Many thanks for your help!

LEN
Returns the number of characters of the specified string expression,
excluding trailing spaces.
So LEN(' ') = 0 (only spaces), but LEN(' x') = 2 (no trailing spaces).
LEN excludes trailing spaces. If that is a problem, consider using the
DATALENGTH (Transact-SQL) function which does not trim the string. If
processing a unicode string, DATALENGTH will return twice the number
of characters.
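Applied to the question, here is a minimal T-SQL sketch that reuses the five-space @s (the column aliases are illustrative). It shows why the pieces do not add up. LEN drops the trailing spaces of @s when it is measured on its own. Once the closing '#' follows those spaces, they are no longer trailing.

```sql
DECLARE @s nvarchar(10) = N'     ';  -- five spaces, as in the question

SELECT
    LEN(@s)                   AS LenOfS,      -- 0: every space is "trailing"
    DATALENGTH(@s)            AS BytesOfS,    -- 10: 5 characters x 2 bytes (nvarchar)
    DATALENGTH(@s) / 2        AS CharsOfS,    -- 5
    LEN(CONCAT('#', @s, '#')) AS LenOfConcat; -- 7: the closing '#' leaves no trailing spaces
```

Dividing DATALENGTH by 2 gives a character count only for nvarchar data without surrogate pairs. For varchar in a single-byte code page, DATALENGTH alone is the character count.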

Related

UNDOCUMENTED FEATURE when SELECT in VARCHAR with trailing whitespace SQL Server

I hope this is an interesting puzzle for an SQL expert out there.
When I run the following query, I would expect it to return no results.
-- Create a table variable Note: This same behaviour occurs in standard tables.
DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)
-- Add some test data Note: Without space, space prefix and space suffix
INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')
-- SELECT statement that is filtered by a value without a space and also a value with a space suffix
SELECT
t.Foo
, t.About
FROM @TestResults t
WHERE t.Foo like 'Bar '
AND t.Foo like 'Bar'
AND t.Foo = 'Bar '
AND t.Foo = 'Bar'
The results return a single row:
[Foo] [About]
Bar Space Suffix
I need to know more about this behaviour and how I should work around it.
It is also worth noting that LEN(Foo) is odd too, as follows:
DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)
INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')
SELECT
t.Foo
, LEN(Foo) [Length]
, t.About
FROM @TestResults t
Gives the following results:
[Foo] [Length] [About]
Bar   3        No spaces
Bar   3        Space Suffix
 Bar  4        Space prefix
Without any lateral thinking, what do I need to change my WHERE clause to in order to return 0 results as expected?
The answer is to add the following clause:
AND DATALENGTH(t.Foo) = DATALENGTH('Bar')
Running the following query...
DECLARE @Chars TABLE (CharNumber INT NOT NULL)
DECLARE @CharNumber INT = 0
WHILE(@CharNumber <= 255)
BEGIN
INSERT INTO @Chars(CharNumber) VALUES(@CharNumber)
SET @CharNumber = @CharNumber + 1
END
SELECT
CharNumber
, IIF('Test' = 'Test' + CHAR(CharNumber),1,0) ['Test' = 'Test' + CHAR(CharNumber)]
, IIF('Test' LIKE 'Test' + CHAR(CharNumber),1,0) ['Test' LIKE 'Test' + CHAR(CharNumber)]
, IIF(LEN('Test') = LEN('Test' + CHAR(CharNumber)),1,0) [LEN('Test') = LEN('Test' + CHAR(CharNumber))]
, IIF(DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber)),1,0) [DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber))]
FROM @Chars
WHERE ('Test' = 'Test' + CHAR(CharNumber))
OR ('Test' LIKE 'Test' + CHAR(CharNumber))
OR (LEN('Test') = LEN('Test' + CHAR(CharNumber)))
ORDER BY CharNumber
...produces the following results...
CharNumber 'Test' = 'Test' + CHAR(CharNumber) 'Test' LIKE 'Test' + CHAR(CharNumber) LEN('Test') = LEN('Test' + CHAR(CharNumber)) DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber))
0 1 1 0 0
32 1 0 1 0
37 0 1 0 0
Combined with =, DATALENGTH can be used to test the exact equality of two VARCHAR values, so the original query can be corrected as follows:
-- Create a table variable Note: This same behaviour occurs in standard tables.
DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)
-- Add some test data Note: Without space, space prefix and space suffix
INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')
-- SELECT statement that is filtered by a value without a space and also a value with a space suffix
SELECT
t.Foo
, t.About
FROM @TestResults t
WHERE t.Foo like 'Bar '
AND t.Foo like 'Bar'
AND t.Foo = 'Bar '
AND t.Foo = 'Bar'
AND DATALENGTH(t.Foo) = DATALENGTH('Bar') -- Additional clause
I also made a function to be used instead of =
CREATE FUNCTION dbo.fVEQ( @VarCharA VARCHAR(MAX), @VarCharB VARCHAR(MAX) )
RETURNS BIT
WITH SCHEMABINDING
AS
BEGIN
-- Added by WonderWorker on 18th March 2020
DECLARE @Result BIT = IIF(
(@VarCharA = @VarCharB AND DATALENGTH(@VarCharA) = DATALENGTH(@VarCharB))
, 1, 0)
RETURN @Result
END
Here is a test that uses all 256 characters as trailing (and leading) characters to prove that it works:
-- Test fVEQ with all 256 characters (any row returned indicates a failure)
DECLARE @Chars TABLE (CharNumber INT NOT NULL)
DECLARE @CharNumber INT = 0
WHILE(@CharNumber <= 255)
BEGIN
INSERT INTO @Chars(CharNumber) VALUES(@CharNumber)
SET @CharNumber = @CharNumber + 1
END
SELECT
CharNumber
, dbo.fVEQ('Bar','Bar' + CHAR(CharNumber)) [fVEQ Trailing Char Test]
, dbo.fVEQ('Bar','Bar') [fVEQ Same test]
, dbo.fVEQ('Bar',CHAR(CharNumber) + 'Bar') [fVEQ Leading Char Test]
FROM @Chars
WHERE (dbo.fVEQ('Bar','Bar' + CHAR(CharNumber)) = 1)
OR (dbo.fVEQ('Bar','Bar') = 0)
OR (dbo.fVEQ('Bar',CHAR(CharNumber) + 'Bar') = 1)
Trailing whitespace is disregarded in string comparison because of the notion of fixed-length string fields, in which any content shorter than the fixed length is automatically right-padded with spaces. Such fixed-length fields cannot distinguish meaningful trailing spaces from padding.
Fixed-length string fields exist because they improve performance significantly in many cases. Moreover, when SQL was designed, character-based terminals (which usually treated trailing spaces as equivalent to padding), reports printed in monospaced fonts (which used trailing spaces for padding and alignment), and data storage and exchange formats (which used fixed-length fields in place of extensive and costly delimiters and complicated parsing logic) were all commonly oriented around fixed-length fields, so the concept was tightly integrated at every stage of processing.
When comparing two fixed-length fields of the same fixed length, a literal comparison would of course be possible and would produce correct results.
But when comparing a fixed-length field of a given fixed length to a fixed-length field of a different fixed length, the desired behaviour would never be to include the trailing spaces in the comparison, since two such fields could never match literally, simply by virtue of their differing fixed lengths. The shorter field could be cast and padded to the length of the longer (at least conceptually if not physically), but the trailing spaces would still be considered padding rather than meaningful content.
When comparing a fixed-length field to a variable-length field, the desired behaviour is also probably never to include trailing spaces in the comparison. More complicated approaches that attempt to attribute meaning to trailing spaces on the variable-length side of the comparison would only come at the cost of slower comparison logic, additional conceptual complexity, and more potential for error.
As for why variable-length to variable-length comparisons also ignore trailing spaces, even though spaces can in principle be meaningful there, the rationale is probably consistency with the comparison behaviour when fixed-length fields are involved. It also avoids the most common kind of error, since trailing spaces are spurious in databases far more often than they are meaningful.
Nowadays, a database system designed in every respect from scratch would probably forsake fixed-length fields and perform all comparisons literally, leaving the developer to deal explicitly with spurious trailing spaces. In my experience, though, this would result in extra development effort and far more frequent errors than the current arrangement in SQL. Under that arrangement, errors in program logic involving the silent disregard of trailing spaces usually only occur when designing complex string-shredding logic to be used against un-normalised data (which is a kind of data that SQL is specifically not optimised for handling).
So to be clear, this is not an undocumented feature, but a prominent feature that exists by design.
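Here is a small sketch of the padding rule described above, with illustrative variable names. Before =, the shorter operand is treated as if it were padded with spaces to the length of the longer one. As a result, a CHAR(10), a plain VARCHAR and a VARCHAR with trailing spaces all compare equal, and only DATALENGTH shows that they differ.

```sql
DECLARE @fixed    CHAR(10)    = 'abc';     -- stored right-padded: 'abc' + 7 spaces
DECLARE @variable VARCHAR(10) = 'abc';     -- stored as exactly 3 characters
DECLARE @spaced   VARCHAR(10) = 'abc   ';  -- 3 trailing spaces kept in storage

SELECT
    IIF(@fixed = @variable, 1, 0)  AS FixedEqualsVariable,   -- 1
    IIF(@variable = @spaced, 1, 0) AS VariableEqualsSpaced,  -- 1: trailing spaces ignored
    DATALENGTH(@fixed)             AS FixedBytes,            -- 10
    DATALENGTH(@variable)          AS VariableBytes,         -- 3
    DATALENGTH(@spaced)            AS SpacedBytes;           -- 6
```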
If you change the query to
SELECT
Foo
, About
, CASE WHEN Foo LIKE 'Bar ' THEN 'T' ELSE 'F' END As Like_Bar_Space
, CASE WHEN Foo LIKE 'Bar' THEN 'T' ELSE 'F' END As Like_Bar
, CASE WHEN Foo = 'Bar ' THEN 'T' ELSE 'F' END As EQ_Bar_Space
, CASE WHEN Foo = 'Bar' THEN 'T' ELSE 'F' END As EQ_Bar
FROM @TestResults
it gives you a better overview, as you see the result of the different conditions separately:
Foo    About        Like_Bar_Space  Like_Bar  EQ_Bar_Space  EQ_Bar
------ ------------ --------------- --------- ------------- ------
Bar    No spaces    F               T         T             T
Bar    Space Suffix T               T         T             T
 Bar   Space prefix F               F         F             F
It looks like equals = ignores trailing spaces in both searched string and pattern. LIKE, however, does not ignore the trailing space in the pattern but ignores an extra trailing space in the searched string. Leading spaces are never ignored.
I don't know how wrong entries got in there, but you can fix them with
UPDATE @TestResults SET Foo = TRIM(Foo)
You can make a trailing space sensitive test with:
WHERE t.Foo + ';' = pattern + ';'
You can make a trailing space insensitive test with:
WHERE RTRIM(t.Foo) = RTRIM(pattern)
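The comparisons from this answer can be tried on variables. This sketch assumes varchar data. The LIKE documentation notes that trailing blanks are significant for Unicode (nchar/nvarchar) data, so the LIKE results may differ there. The ';' sentinel is an arbitrary choice; any non-space character appended to both sides works.

```sql
DECLARE @withSpace VARCHAR(10) = 'Bar ';  -- value with a trailing space
DECLARE @pattern   VARCHAR(10) = 'Bar';

SELECT
    IIF(@withSpace = @pattern, 1, 0)               AS PlainEquals,         -- 1: trailing space ignored
    IIF(@withSpace LIKE @pattern, 1, 0)            AS LikeExtraSpace,      -- 1: trailing blanks in the searched string ignored
    IIF(@pattern LIKE @withSpace, 1, 0)            AS LikeSpaceInPattern,  -- 0: trailing blank in the pattern is significant
    IIF(@withSpace + ';' = @pattern + ';', 1, 0)   AS SentinelEquals,      -- 0: trailing-space sensitive
    IIF(RTRIM(@withSpace) = RTRIM(@pattern), 1, 0) AS RTrimEquals;         -- 1: explicitly insensitive
```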

Why does LEN( char(32) ) = 0 in T-SQL?

I wanted to write a function to count the number of delimiters or any substring (which could be a space) in a string of text, throwing a hack error if the delimiter was null or empty:
if len(@lookfor)=0 or @lookfor is null return Cast('substring must not be null or empty' as int)
But if the function is called with @lookfor = ' ' that trips the error.
I am aware of DATALENGTH(). Just curious why a single space is treated as "trailing" if there's nothing before it.
It's trailing because it's at the end of the string. It's also leading since it's at the beginning.
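For the guard in the question, one option is to test DATALENGTH instead of LEN, so that only NULL or truly empty input is rejected. The sketch below uses PRINT as a stand-in for the function's real error handling.

```sql
DECLARE @lookfor VARCHAR(10) = ' ';  -- a single space: a legitimate substring to count

-- LEN treats the lone space as trailing and reports 0; DATALENGTH reports 1 byte
SELECT LEN(@lookfor) AS LenResult, DATALENGTH(@lookfor) AS DataLengthResult;

IF @lookfor IS NULL OR DATALENGTH(@lookfor) = 0
    PRINT 'rejected: substring must not be null or empty'
ELSE
    PRINT 'accepted';
```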
But if the function is called with @lookfor = ' ' that trips the error
Something that trips a lot of people up in SQL is that '' = ' '. Note this query:
DECLARE @blank VARCHAR(10) = '', @space VARCHAR(10) = CHAR(32);
SELECT CASE WHEN @blank = @space THEN 'That the...!?!?' END;
You can change @space to CHAR(32)+CHAR(32)+... and @space and @blank will still be equal.
Complicating things a little more, note that the DATALENGTH of a blank/empty value is 0 for a VARCHAR(N) but N for a CHAR(N). In other words,
SELECT DATALENGTH(CAST('' AS CHAR(1))) returns 1 and SELECT DATALENGTH(CAST('' AS CHAR(10))) returns 10.
That means that if your delimiter variable is, say, CHAR(1), that will mess you up. Here's the function for you:
CREATE FUNCTION dbo.CountDelimiters(@string VARCHAR(8000), @delimiter VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT DCount = MAX(DATALENGTH(@string)-DATALENGTH(REPLACE(@string,@delimiter,'')))
WHERE DATALENGTH(@delimiter) > 0;
Note that @delimiter is VARCHAR(1) and NOT a CHAR datatype.
The formula to count delimiters in @string is:
DATALENGTH(@string)-DATALENGTH(REPLACE(@string,@delimiter,''))
or
(DATALENGTH(@string)-DATALENGTH(REPLACE(@string,@delimiter,'')))/DATALENGTH(@delimiter) when dealing with delimiters longer than 1 character.
Both sides of the subtraction use DATALENGTH. Using LEN on the REPLACE result would drop any trailing spaces left behind, so a string such as 'a,b ' would report 2 commas instead of 1.
WHERE DATALENGTH(@delimiter) > 0 will force the function to ignore a NULL or blank value. This is known as a Startup Predicate.
Putting a MAX around DATALENGTH(@string)-DATALENGTH(REPLACE(@string,@delimiter,'')) forces the function to return a NULL value in the event you pass it a blank or NULL value.
This will return 10 for the number of spaces in my string:
SELECT f.DCount FROM dbo.CountDelimiters('one space two spaces three ', CHAR(32)) AS f;
Against a table you would use the function like this (note that I'm counting the number of times the letter "A" appears):
-- Sample Strings
DECLARE @table TABLE (SomeText VARCHAR(36));
INSERT @table VALUES('ABCABC'),('XXX'),('AAA'),(''),(NULL);
SELECT t.SomeText, f.DCount
FROM @table AS t
CROSS APPLY dbo.CountDelimiters(t.SomeText, 'A') AS f;
Which returns:
SomeText DCount
------------------------------------ -----------
ABCABC 2
XXX 0
AAA 3
0
NULL NULL
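This quick check shows why the function declares @delimiter as VARCHAR(1) rather than CHAR(1) (the variable names are illustrative). An empty string assigned to CHAR(1) is padded to a single space. It would therefore pass the DATALENGTH(@delimiter) > 0 startup predicate, and the function would count spaces instead of returning NULL.

```sql
DECLARE @charDelim    CHAR(1)    = '';  -- padded to ' ' on assignment
DECLARE @varcharDelim VARCHAR(1) = '';  -- stays empty

SELECT
    DATALENGTH(@charDelim)    AS CharDelimBytes,     -- 1
    DATALENGTH(@varcharDelim) AS VarcharDelimBytes,  -- 0
    ASCII(@charDelim)         AS CharDelimCode;      -- 32, i.e. a space
```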
If a string has a character at the end, it is considered trailing, even if there are no other characters before it. The same logic applies to leading characters.
So ' ' can be considered an empty string ('') having a trailing space.
When I started using SQL, I also noticed that the LEN function ignores trailing spaces. I think (but I am not sure) that it has to do with the fact that LEN should also behave "correctly" when used with CHAR/NCHAR values. Unlike VARCHAR/NVARCHAR, CHAR/NCHAR values have a fixed width and are padded with trailing spaces automatically. So when you put the value 'abc' in a field/variable of type CHAR(5), the value becomes 'abc  ', but the LEN function still "correctly" returns 3 in that case.
I consider this just to be a strange quirk of SQL.
Remark:
The DATALENGTH function will not ignore trailing spaces in VARCHAR/NVARCHAR values. But note that DATALENGTH will return the size in bytes of the field's value. So if you use unicode data (NCHAR/NVARCHAR), the DATALENGTH function will return 6 for value N'abc', because each unicode character in SQL Server uses 2 bytes!
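The figures from the last two paragraphs can be checked side by side in this minimal sketch (the variable names are illustrative).

```sql
DECLARE @fixed   CHAR(5)     = 'abc';   -- padded to 'abc  '
DECLARE @ascii   VARCHAR(5)  = 'abc';
DECLARE @unicode NVARCHAR(5) = N'abc';

SELECT
    LEN(@fixed)          AS LenFixed,      -- 3: the padding counts as trailing spaces
    DATALENGTH(@fixed)   AS BytesFixed,    -- 5
    LEN(@ascii)          AS LenAscii,      -- 3
    DATALENGTH(@ascii)   AS BytesAscii,    -- 3
    LEN(@unicode)        AS LenUnicode,    -- 3
    DATALENGTH(@unicode) AS BytesUnicode;  -- 6: 2 bytes per character
```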

SQL server string or VARCHAR manipulation containing numerics

In SQL server, I have VARCHAR values.
I need a view that automatically reformats data.
Data that is stored in the following form:
hawthorn104freddy@hawthorn.com
scotland2samantha@gmail.com3
birmingham76roger@outlook.co.uk1905student
Needs to be reformatted into the following:
hawthorn  104freddy@hawthorn.com0000
scotland  002samantha@gmail.com 0003
birmingham076roger@outlook.co.uk1905student
Reformatting
Numeric values within the strings are padded with zeros to the length of the longest number
All other characters are padded with space characters to line up the numbers.
Does anyone know how this is done?
Note: Bear in mind that a string may contain any combination of words and numbers.
You should split your values into 4 columns (to find the maximum length in each column), then add leading/trailing zeros/spaces, then concatenate them.
Here is code to split values, hope you will have no problems with adding zeros and spaces:
declare @v varchar(255) = 'hawthorn104freddy@hawthorn.com50'
select
FirstPart = left(@v, patindex('%[a-z][0-9]%', @v)),
SecondPart = substring(@v, patindex('%[0-9]%', @v), patindex('%[0-9][a-z]%', @v) - patindex('%[a-z][0-9]%', @v)),
ThirdPart = substring(@v, patindex('%[0-9][a-z]%', @v) + 1, len(@v) - patindex('%[0-9][a-z]%', @v) - patindex('%[0-9][a-z]%', reverse(@v))),
Fourthpart = right(@v, patindex('%[0-9][a-z]%', reverse(@v)))
Notes:
patindex('%[a-z][0-9]%', @v) - Last letter in hawthorn (nickname?)
patindex('%[0-9][a-z]%', @v) - Last digit in first number (104)
patindex('%[0-9][a-z]%', reverse(@v)) - Length of the last number
You can also use CLR and RegEx to split values to groups:
https://github.com/zzzprojects/Eval-SQL.NET/wiki/SQL-Server-Regex-%7C-Use-regular-expression-to-search,-replace-and-split-text-in-SQL
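The answer leaves the padding step to the reader. Here is a sketch of that step. It assumes that the four parts have already been split into a hypothetical @Parts table (for example with the PATINDEX expressions above), and that each column's width is the length of its longest value. The trailing word 'student' in the third sample is not handled here.

```sql
-- Hypothetical input: the four parts already split out of each value
DECLARE @Parts TABLE (Place VARCHAR(50), Num1 VARCHAR(10), Email VARCHAR(100), Num2 VARCHAR(10));
INSERT INTO @Parts (Place, Num1, Email, Num2) VALUES
    ('hawthorn',   '104', 'freddy@hawthorn.com', ''),
    ('scotland',   '2',   'samantha@gmail.com',  '3'),
    ('birmingham', '76',  'roger@outlook.co.uk', '1905');

DECLARE @Formatted TABLE (Line VARCHAR(200));

INSERT INTO @Formatted (Line)
SELECT
      LEFT(p.Place + REPLICATE(' ', w.PlaceWidth), w.PlaceWidth)  -- text: pad right with spaces
    + RIGHT(REPLICATE('0', w.Num1Width) + p.Num1, w.Num1Width)    -- number: pad left with zeros
    + LEFT(p.Email + REPLICATE(' ', w.EmailWidth), w.EmailWidth)
    + RIGHT(REPLICATE('0', w.Num2Width) + p.Num2, w.Num2Width)
FROM @Parts AS p
CROSS JOIN (
    -- the widest value in each column decides that column's width
    SELECT MAX(LEN(Place)) AS PlaceWidth, MAX(LEN(Num1)) AS Num1Width,
           MAX(LEN(Email)) AS EmailWidth, MAX(LEN(Num2)) AS Num2Width
    FROM @Parts
) AS w;

SELECT Line FROM @Formatted;
```

With these three rows the widths come out as 10, 3, 19 and 4, which match the layout shown in the question.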
You can use PATINDEX
declare @str varchar(100)='hawthorn104freddy@hawthorn.com'
SELECT SUBSTRING(@str,0,PATINDEX('%[0-9]%',@str)),
SUBSTRING(@str,PATINDEX('%[0-9]%',@str),LEN(@str)-LEN(SUBSTRING(@str,0,PATINDEX('%[0-9]%',@str))))

Issue With REPLICATE and Strings in SQL Server 2008

One of the functions of our app is that it prints out "sales tapes" that help the tellers close each night. These tapes print on a 40-character, fixed-width heat-paper printer. At the moment these use deprecated code to load the data from our sales tables instead of the reporting "cube" tables. I'm rewriting them to use the cubes.
I'm running into an issue formatting the text in SQL Server 2008. I'm using the REPLICATE function to calculate out each side of the columns, per line. For some reason some of the lines just randomly have an extra character and are 41 characters in width. Needless to say, that prevents amounts from properly appearing. The two columns are 27 characters and 13 characters wide.
Here is an example. Below are the pieces. From left to right: left spaces, length of left column text, right spaces, length of right column text. | 40 shows that the total of everything is 40 characters.
20 7 7 6 | 40
18 9 3 10 | 40
7 20 13 0 | 40
In this case, the left text is 7 characters, followed by 20 spaces, followed by 7 spaces, followed by 6 characters, all of which would be 40 total characters. What it should read is this (account masked for safety):
STEWART                           $57.70
AT&T (DP)                     Fee: $1.50
Acct: xxxxxxxxxxxxxx
However, what it actually reads is:
STEWART                            $57.7
AT&T (DP)                     Fee: $1.50
Acct: xxxxxxxxxxxxxx
I can't figure out why it is including too many spaces. If you compare, you can see that $57.70 is 6 characters, as calculated in the first line. Yet it appears as 5 because it is truncated by the 20 + 7. Somehow 20 (left spaces) + 7 (left text) + 7 (right spaces) + 6 (right text) is equaling 41!! Below is my code in the UDF:
DECLARE @ReturnValue NVARCHAR(40) = '';
DECLARE @LeftSpaces INT = @LeftSideWidth;
DECLARE @RightSpaces INT = (@PageWidth - @LeftSideWidth);
--remove header text space
SET @LeftSpaces = @LeftSpaces - LEN(@LeftText);
SET @RightSpaces = @RightSpaces - LEN(@RightText);
SET @ReturnValue = @LeftText; --add our left column
SET @ReturnValue = @ReturnValue + REPLICATE(' ', @LeftSpaces); --add our left spaces
SET @ReturnValue = @ReturnValue + REPLICATE(' ', @RightSpaces); --add our right spaces
SET @ReturnValue = @ReturnValue + @RightText; --finally, add our right text;
RETURN @ReturnValue;
The UDF is pretty simple. First, I set the spaces to equal the full length of both columns. Then I reduce the count of spaces by the length of the text to appear in the column on this line. Then I add the left text, left spaces, right spaces, and finally the right-aligned text together and return it. For most rows it works perfectly. For random rows (so far those with 6, 7, and 15 length on left text), I get what appeared above. The UDF was written to be more succinct originally, but I finally broke it out logically into steps when I couldn't figure out what was wrong.
Anyone have an idea? Where is my math wrong?
Try using the LEFT and RIGHT functions along with REPLICATE, like so:
DECLARE @ReturnValue NVARCHAR(40) = '';
DECLARE @LeftSpaces INT = @LeftSideWidth;
DECLARE @RightSpaces INT = (@PageWidth - @LeftSideWidth);
--remove header text space
SET @LeftSpaces = @LeftSpaces - LEN(@LeftText);
SET @RightSpaces = @RightSpaces - LEN(@RightText);
SET @ReturnValue = LEFT(@LeftText + REPLICATE(' ', @LeftSpaces), @LeftSideWidth); --add our left column
SET @ReturnValue = @ReturnValue + RIGHT(REPLICATE(' ', @RightSpaces) + @RightText, (@PageWidth - @LeftSideWidth)); --finally, add our right text;
RETURN @ReturnValue;
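The question never shows the actual @LeftText values, so what follows is only one plausible explanation, not a confirmed diagnosis. If the left text carries a trailing space, LEN (which ignores trailing spaces) undercounts it by one. REPLICATE then adds one space too many, and the NVARCHAR(40) return type cuts off the 41st character. This sketch uses a hypothetical 'STEWART ' value to reproduce the symptom, and shows RTRIM restoring the 40-character width.

```sql
-- Hypothetical values: the trailing space in 'STEWART ' is an assumption
-- used to reproduce the symptom described in the question.
DECLARE @LeftText      NVARCHAR(27) = N'STEWART ';
DECLARE @RightText     NVARCHAR(13) = N'$57.70';
DECLARE @LeftSideWidth INT = 27;
DECLARE @PageWidth     INT = 40;

-- LEN skips the trailing space (7); the value really holds 8 characters
SELECT LEN(@LeftText) AS LenChars, DATALENGTH(@LeftText) / 2 AS StoredChars;

-- Same arithmetic as the UDF, but built in NVARCHAR(MAX) so nothing is cut off
DECLARE @Line NVARCHAR(MAX) =
      @LeftText + REPLICATE(N' ', @LeftSideWidth - LEN(@LeftText))
    + REPLICATE(N' ', (@PageWidth - @LeftSideWidth) - LEN(@RightText)) + @RightText;

-- Trimming first keeps LEN and the real length in step
DECLARE @Fixed NVARCHAR(MAX) =
      RTRIM(@LeftText) + REPLICATE(N' ', @LeftSideWidth - LEN(@LeftText))
    + REPLICATE(N' ', (@PageWidth - @LeftSideWidth) - LEN(@RightText)) + @RightText;

SELECT LEN(@Line)                            AS LineLength,   -- 41
       RIGHT(CAST(@Line AS NVARCHAR(40)), 5) AS UdfTail,      -- '$57.7', as in the question
       LEN(@Fixed)                           AS FixedLength;  -- 40
```

The LEFT/RIGHT approach in the answer above also avoids the problem, because each column is cut to its exact width regardless of what LEN reports.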

Splitting string with three delimiters in SQL Server

String pattern:
1#5,7;2#;3#4
These are three sets of values separated by semicolon.
Digit before # goes in one column, digits after # (separated by comma) go in another column (so the second set in this case only has one value)
How can I do this?
This is what I found on the net:
DECLARE @S VARCHAR(MAX) = '1,100,12345|2,345,433|3,23423,123|4,33,55'
DECLARE @x xml = '<r><c>' +
REPLACE(REPLACE(@S, ',','</c><c>'),'|','</c></r><r><c>') +
'</c></r>'
SELECT x.value('c[1]','int') AS seq,
x.value('c[2]','int') AS invoice,
x.value('c[3]','int') AS amount
FROM @x.nodes('/r') x(x)
This, however, assumes a fixed number of values after every delimiter, and it only uses 2 delimiters.
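One way to handle the extra delimiter is to extend the same XML trick with a third REPLACE. In this version, ';' separates sets, '#' separates the key from its values, and ',' separates the values. This sketch assumes that every set contains '#' and that the values contain no XML-special characters such as & or <.

```sql
DECLARE @S VARCHAR(MAX) = '1#5,7;2#;3#4';

-- ';' closes one <r> set and opens the next, '#' ends the key <k> and opens the first <v>,
-- ',' opens further <v> values: <r><k>1</k><v>5</v><v>7</v></r><r><k>2</k><v></v></r>...
DECLARE @x xml = '<r><k>' +
    REPLACE(REPLACE(REPLACE(@S, ';', '</v></r><r><k>'), '#', '</k><v>'), ',', '</v><v>') +
    '</v></r>';

DECLARE @Result TABLE (KeyDigit int, Val int);

INSERT INTO @Result (KeyDigit, Val)
SELECT
    Sets.SetNode.value('(k/text())[1]', 'int') AS KeyDigit,
    Vals.ValNode.value('(text())[1]', 'int')   AS Val       -- NULL for an empty set such as '2#'
FROM @x.nodes('/r') AS Sets(SetNode)
CROSS APPLY Sets.SetNode.nodes('v') AS Vals(ValNode);

SELECT KeyDigit, Val FROM @Result ORDER BY KeyDigit, Val;
```

For the sample string this gives one row per value: (1, 5), (1, 7), (2, NULL) and (3, 4).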
