does LEN() have bug? - sql-server

It seems that LEN() ignores whitespaces at the right side of a variable.
declare #a varchar(100)
set #a = 'John '
print len(#a)
The above code prints 4 whereas it should be 7.
Is this a bug?

This is not a bug, this is the intended behavior. To quote the documentation:
LEN excludes trailing spaces. If that is a problem, consider using the DATALENGTH (Transact-SQL) function which does not trim the string

Not a bug, is right there in the documentation:
Returns the number of characters of the specified string expression, excluding trailing spaces.

Please read the official documentation. It is expected behaviour. https://learn.microsoft.com/en-us/sql/t-sql/functions/len-transact-sql?view=sql-server-ver15#remarks

Thank you folks,
I really didn't know that LEN() ignores trailing spaces!
I think DATALENGTH() does not return a correct answer always.
For NVARCHAR() type it returns twice the number of characters in a string. In fact, it returns the total number of bytes that the string consumes in memory.
In my opinion in order to get the correct length of a string value, we should use a formula like below:
CREATE FUNCTION dbo.Length(#x NVARCHAR(MAX)) RETURNS INT
AS
BEGIN
RETURN (CASE WHEN RIGHT(#x, 1) = ' ' THEN LEN(REPLACE(#x, ' ', '$')) ELSE LEN(#x) END)
END
The function is not efficient. I know. But, it returns a correct answer at least.

Related

SQL Server - remove left part of string before a specific character

I have a VARCHAR value that looks like this:
5.95 $ Additional fees
How can I remove everything left from character '$' (including that character) ? So that I get the following result:
Additional fees
The '$' is always present.
STUFF and CHARINDEX would be the simpliest way, in my opinion:
SELECT STUFF(YourColumn,1, CHARINDEX('$',YourColumn),'')
FROM (VALUES('5.95 $ Additional fees'))V(YourColumn);
Note that as $ has a whitespace afterwards, the value returned will have a leading whitespace (' Additional fees'). You could use TRIM (or LTRIM and RTRIM on older versions of SQL Server) to remove this, if it isn't wanted.
I haven't assumed that the portion string to be replaced is CHARINDEX('$',YourColumn)+1, as we have one sample. As far as we know, you could also have values such as '10.99$Base Cost'. If the +1 was used, it would return 'ase Cost' for such a value.
Hello do it like below syntax
declare #temp nvarchar(max)='5.95 $ Additional fees'
select SUBSTRING(#temp,charindex('$',#temp)+1,len(#temp)-1)
You can use SUBSTRING get the particular string and CHARINDEX function to get index of special character, in your case $.
DECLARE #Var VARCHAR(100)
SET #Var = '5.95 $ Additional fees'
SELECT SUBSTRING(#Var, CHARINDEX('$', #Var) + 1, LEN(#Var) - LEN(LEFT(#Var, CHARINDEX('$', #Var))))

Why does LEN( char(32) ) = 0 in T-SQL?

I wanted to write a function to count the number of delimiters or any substring (which could be a space) in a string of text, throwing a hack error if the delimiter was null or empty:
if len(#lookfor)=0 or #lookfor is null return Cast('substring must not be null or empty' as int)
But if the function is called with #lookfor = ' ' that trips the error.
I am aware of DATALENGTH(). Just curious why a single space is treated as "trailing" if there's nothing before it.
I am aware of DATALENGTH(). Just curious why a single space is treated
as "trailing" if there's nothing before it.
It's trailing because it's at the end of the string. It's also leading since it's the at the beginning.
But if the function is called with #lookfor = '' that trips the error
Something that messes a lot of people up with SQL is how '' = ' '; Note this query:
DECLARE #blank VARCHAR(10) = '', #space VARCHAR(10) = CHAR(32);
SELECT CASE WHEN #blank = #space THEN 'That the...!?!?' END;
You can change #space to CHAR(32)+CHAR(32)+.... and #space and #blank will still be equal.
Complicating things a little more note that the DATALENGTH for a blank/empty value is 0 when it's a VARCHAR(N) but the DATALENGTH is N when for CHAR(N) values. In other words,
SELECT DATALENGTH(CAST('' AS CHAR(1))) returns 1 and SELECT DATALENGTH(CAST('' AS CHAR(10))) returns 10.
That means that if your delimiter variable is say, CHAR(1) - that will mess you up. Here's the function for you:
CREATE FUNCTION dbo.CountDelimiters(#string VARCHAR(8000), #delimiter VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT DCount = MAX(DATALENGTH(#string)-LEN(REPLACE(#string,#delimiter,'')))
WHERE DATALENGTH(#delimiter) > 0;
Note that #delimter is VARCHAR(1) and NOT a CHAR datatype.
The formula to count delimiters in #string is:
DATALENGTH(#string)-LEN(REPLACE(#string,#delimiter,''))
or
(DATALENGTH(#string)-LEN(REPLACE(#string,#delimiter,'')))/DATALENGTH(#delimiter) when dealing with delimiters longer than 1`.
WHERE DATALENGTH(#delimiter) > 0 will force the function to ignore a NULL or blank value. This is known as a Startup Predicate.
Putting a MAX around DATALENGTH(#string)-LEN(REPLACE(#string,#delimiter,'')) forces the function to rerturn a NULL value in the event you pass it a blank or NULL value.
This will return 10 for the number of spaces in my string:
SELECT f.DCount FROM dbo.CountDelimiters('one space two spaces three ', CHAR(32)) AS f;
Against a table you would use the function like this (note that I'm counting the number of times the letter "A" appears:
-- Sample Strings
DECLARE #table TABLE (SomeText VARCHAR(36));
INSERT #table VALUES('ABCABC'),('XXX'),('AAA'),(''),(NULL);
SELECT t.SomeText, f.DCount
FROM #table AS t
CROSS APPLY dbo.CountDelimiters(t.SomeText, 'A') AS f;
Which returns:
SomeText DCount
------------------------------------ -----------
ABCABC 2
XXX 0
AAA 3
0
NULL NULL
If a string has a chacacter at the end, it is considered trailing, even if there are no other characters before it. Same for logic regarding leading characters.
So ' ' can be considered an empty string ('') having a trailing space.
When I started using SQL, I also noticed the behavior that the LEN function ignores trailing spaces. And I think (but I am not sure) that is has to do with the fact that LEN should probably also behave "correctly" when used with CHAR/NCHAR values. Unlike VARCHAR/NVARCHAR, the CHAR/NCHAR values have a fixed width and will be filled with trailing spaces automatically. So when you put value 'abc' in a field/variable of type CHAR(5), the value will become 'abc ', but the LEN function will still "correctly" return 3 in that case.
I consider this just to be a strange quirk of SQL.
Remark:
The DATALENGTH function will not ignore trailing spaces in VARCHAR/NVARCHAR values. But note that DATALENGTH will return the size in bytes of the field's value. So if you use unicode data (NCHAR/NVARCHAR), the DATALENGTH function will return 6 for value N'abc', because each unicode character in SQL Server uses 2 bytes!

Select with Left Function Condition SQL Server [duplicate]

I've been using this for some time:
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col), LEN(str_col))
However recently, I've found a problem with columns with all "0" characters like '00000000' because it never finds a non-"0" character to match.
An alternative technique I've seen is to use TRIM:
REPLACE(LTRIM(REPLACE(str_col, '0', ' ')), ' ', '0')
This has a problem if there are embedded spaces, because they will be turned into "0"s when the spaces are turned back into "0"s.
I'm trying to avoid a scalar UDF. I've found a lot of performance problems with UDFs in SQL Server 2005.
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col))
Why don't you just cast the value to INTEGER and then back to VARCHAR?
SELECT CAST(CAST('000000000' AS INTEGER) AS VARCHAR)
--------
0
Other answers here to not take into consideration if you have all-zero's (or even a single zero).
Some always default an empty string to zero, which is wrong when it is supposed to remain blank.
Re-read the original question. This answers what the Questioner wants.
Solution #1:
--This example uses both Leading and Trailing zero's.
--Avoid losing those Trailing zero's and converting embedded spaces into more zeros.
--I added a non-whitespace character ("_") to retain trailing zero's after calling Replace().
--Simply remove the RTrim() function call if you want to preserve trailing spaces.
--If you treat zero's and empty-strings as the same thing for your application,
-- then you may skip the Case-Statement entirely and just use CN.CleanNumber .
DECLARE #WackadooNumber VarChar(50) = ' 0 0123ABC D0 '--'000'--
SELECT WN.WackadooNumber, CN.CleanNumber,
(CASE WHEN WN.WackadooNumber LIKE '%0%' AND CN.CleanNumber = '' THEN '0' ELSE CN.CleanNumber END)[AllowZero]
FROM (SELECT #WackadooNumber[WackadooNumber]) AS WN
OUTER APPLY (SELECT RTRIM(RIGHT(WN.WackadooNumber, LEN(LTRIM(REPLACE(WN.WackadooNumber + '_', '0', ' '))) - 1))[CleanNumber]) AS CN
--Result: "123ABC D0"
Solution #2 (with sample data):
SELECT O.Type, O.Value, Parsed.Value[WrongValue],
(CASE WHEN CHARINDEX('0', T.Value) > 0--If there's at least one zero.
AND LEN(Parsed.Value) = 0--And the trimmed length is zero.
THEN '0' ELSE Parsed.Value END)[FinalValue],
(CASE WHEN CHARINDEX('0', T.Value) > 0--If there's at least one zero.
AND LEN(Parsed.TrimmedValue) = 0--And the trimmed length is zero.
THEN '0' ELSE LTRIM(RTRIM(Parsed.TrimmedValue)) END)[FinalTrimmedValue]
FROM
(
VALUES ('Null', NULL), ('EmptyString', ''),
('Zero', '0'), ('Zero', '0000'), ('Zero', '000.000'),
('Spaces', ' 0 A B C '), ('Number', '000123'),
('AlphaNum', '000ABC123'), ('NoZero', 'NoZerosHere')
) AS O(Type, Value)--O is for Original.
CROSS APPLY
( --This Step is Optional. Use if you also want to remove leading spaces.
SELECT LTRIM(RTRIM(O.Value))[Value]
) AS T--T is for Trimmed.
CROSS APPLY
( --From #CadeRoux's Post.
SELECT SUBSTRING(O.Value, PATINDEX('%[^0]%', O.Value + '.'), LEN(O.Value))[Value],
SUBSTRING(T.Value, PATINDEX('%[^0]%', T.Value + '.'), LEN(T.Value))[TrimmedValue]
) AS Parsed
Results:
Summary:
You could use what I have above for a one-off removal of leading-zero's.
If you plan on reusing it a lot, then place it in an Inline-Table-Valued-Function (ITVF).
Your concerns about performance problems with UDF's is understandable.
However, this problem only applies to All-Scalar-Functions and Multi-Statement-Table-Functions.
Using ITVF's is perfectly fine.
I have the same problem with our 3rd-Party database.
With Alpha-Numeric fields many are entered in without the leading spaces, dang humans!
This makes joins impossible without cleaning up the missing leading-zeros.
Conclusion:
Instead of removing the leading-zeros, you may want to consider just padding your trimmed-values with leading-zeros when you do your joins.
Better yet, clean up your data in the table by adding leading zeros, then rebuilding your indexes.
I think this would be WAY faster and less complex.
SELECT RIGHT('0000000000' + LTRIM(RTRIM(NULLIF(' 0A10 ', ''))), 10)--0000000A10
SELECT RIGHT('0000000000' + LTRIM(RTRIM(NULLIF('', ''))), 10)--NULL --When Blank.
Instead of a space replace the 0's with a 'rare' whitespace character that shouldn't normally be in the column's text. A line feed is probably good enough for a column like this. Then you can LTrim normally and replace the special character with 0's again.
My version of this is an adaptation of Arvo's work, with a little more added on to ensure two other cases.
1) If we have all 0s, we should return the digit 0.
2) If we have a blank, we should still return a blank character.
CASE
WHEN PATINDEX('%[^0]%', str_col + '.') > LEN(str_col) THEN RIGHT(str_col, 1)
ELSE SUBSTRING(str_col, PATINDEX('%[^0]%', str_col + '.'), LEN(str_col))
END
The following will return '0' if the string consists entirely of zeros:
CASE WHEN SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col)) = '' THEN '0' ELSE SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col)) END AS str_col
This makes a nice Function....
DROP FUNCTION [dbo].[FN_StripLeading]
GO
CREATE FUNCTION [dbo].[FN_StripLeading] (#string VarChar(128), #stripChar VarChar(1))
RETURNS VarChar(128)
AS
BEGIN
-- http://stackoverflow.com/questions/662383/better-techniques-for-trimming-leading-zeros-in-sql-server
DECLARE #retVal VarChar(128),
#pattern varChar(10)
SELECT #pattern = '%[^'+#stripChar+']%'
SELECT #retVal = CASE WHEN SUBSTRING(#string, PATINDEX(#pattern, #string+'.'), LEN(#string)) = '' THEN #stripChar ELSE SUBSTRING(#string, PATINDEX(#pattern, #string+'.'), LEN(#string)) END
RETURN (#retVal)
END
GO
GRANT EXECUTE ON [dbo].[FN_StripLeading] TO PUBLIC
cast(value as int) will always work if string is a number
SELECT CAST(CAST('000000000' AS INTEGER) AS VARCHAR)
This has a limit on the length of the string that can be converted to an INT
If you are using Snowflake SQL, might use this:
ltrim(str_col,'0')
The ltrim function removes all instances of the designated set of characters from the left side.
So ltrim(str_col,'0') on '00000008A' would return '8A'
And rtrim(str_col,'0.') on '$125.00' would return '$125'
This might help
SELECT ABS(column_name) FROM [db].[schema].[table]
replace(ltrim(replace(Fieldname.TableName, '0', '')), '', '0')
The suggestion from Thomas G worked for our needs.
The field in our case was already string and only the leading zeros needed to be trimmed. Mostly it's all numeric but sometimes there are letters so the previous INT conversion would crash.
For converting number as varchar to int, you could also use simple
(column + 0)
Very easy way, when you just work with numeric values:
SELECT
TRY_CONVERT(INT, '000053830')
Try this:
replace(ltrim(replace(#str, '0', ' ')), ' ', '0')
If you do not want to convert into int, I prefer this below logic because it can handle nulls
IFNULL(field,LTRIM(field,'0'))
SUBSTRING(str_col, IIF(LEN(str_col) > 0, PATINDEX('%[^0]%', LEFT(str_col, LEN(str_col) - 1) + '.'), 0), LEN(str_col))
Works fine even with '0', '00' and so on.
Starting with SQL Server 2022 (16.x) you can do this
TRIM ( [ LEADING | TRAILING | BOTH ] [characters FROM ] string )
In MySQL you can do this...
Trim(Leading '0' from your_column)

valid record in SQL query

I have a table with few columns and one of the column is DockNumber. I have to display the docknumbers if they confirm to a particular format
First five characters are numbers followed by a - and followed by 5 characters. The last but one character should be a alpha.
12345-678V9
How can I check in SQL if the first 5 characters are numbers and there is a hyphen and next 3 are numbers and last but one is an alpha.
Building on #gbn's answer, this checks to make sure the length is 11 (in case the #val is not a char(11) or varchar(11) and also checks to make sure the second to last char is alpha
DECLARE #val VARCHAR(20)
SET #val = '12345-678V9'
SELECT CASE WHEN LEN(#val) = 11 AND #val LIKE '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][A-Z0-9][0-9]'
THEN 'isMatch'
ELSE 'isNotMatch'
END AS [Valid]
you can use this, you will have to figure it out on how to use this...
SELECT Case when
Cast(ISNUMERIC(LEFT(#Str,5)) as int) + case when substring(#str,6,1)= '-' then 1 else 0 end +case when substring(#str,10,1) like '[a-z]' then 1 else 0 end =3
THEN 'Matched'
Else 'NotMatched'
End
Regular Expressions can be your friend.
LIKE '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][A-Z][0-9]'
Now, this allows lower case a-z too. You'd need to coerce collation if you wanted upper case only
Value COLLATE Latin_General_BIN
LIKE '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][A-Z][0-9]' COLLATE Latin_General_BIN
PATINDEX is probably the ideal solution.
Select ...
From Table
Where PatIndex('[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][A-Z][0-9]', DockNumber) > 0
Where DockNumber Like '[0-9][0-9][0-9][0-9][0-9][-][0-9][0-9][0-9][a-z][0-9]
should work, but i would suggest using Regular expression in code. Much easier if it is possible.
The regex should be '^\d{5}-\d{3}[A-Z]\d$', because without ^ and $ it would find longer strings that contain that sequence (122 12345-678V9 34).
Use rule
CREATE RULE pattern_rule
AS
#value LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][A-Z][0-9]'
Then bind rule to column

Data length in ntext column?

How do you find out the length/size of the data in an ntext column in SQL? - It's longer than 8000 bytes so I can't cast it to a varchar. Thanks.
Use DataLength()
SELECT * FROM YourTable WHERE DataLength(NTextFieldName) > 0
The clue's in the question: use DATALENGTH(). Note it has a different behaviour to LEN():
SELECT LEN(CAST('Hello ' AS NVARCHAR(MAX))),
DATALENGTH(CAST('Hello ' AS NVARCHAR(MAX))),
DATALENGTH(CAST('Hello ' AS NTEXT))
returns 5, 16, 16.
In other words, DATALENGTH() doesn't remove trailing spaces and returns the number of bytes, whereas LEN() trims the trailing spaces and returns the number of characters.
Select Max(DataLength([NTextFieldName])) from YourTable

Resources