Concatenation of two varchar columns in select into - sql-server

I have an INSERT INTO tableA SELECT ... FROM someTables statement, and in the SELECT I concatenate two text columns, e.g. colA + colB. Both are of type varchar(n). Should the column in tableA simply be varchar(2n)? Is it bad for performance if I declare it as, say, varchar(5*n)?
If the two concatenated columns are both varchar(n), is it possible for the result to need more than varchar(2n), or e.g. nvarchar(3n)?

When you concatenate 2 (n)varchar values, the length of the resulting data type is the sum of the 2 length properties, or 8,000 bytes (whichever is lower). If you are concatenating a varchar and an nvarchar, the varchar will be implicitly cast to an nvarchar first.
Unless at least 1 of the values concatenated is of MAX length, the return datatype will not be converted to a MAX and any trailing characters will be truncated.
Take the below examples, which return the data types of their aliases:
SELECT REPLICATE('A',10) + REPLICATE('B',10) AS varchar20,
REPLICATE(N'A',10) + REPLICATE(N'B',10) AS nvarchar20,
REPLICATE(N'A',10) + REPLICATE('B',5) AS nvarchar15,
REPLICATE('A',5000) + REPLICATE('B',5000) AS varchar8000, --Truncation occurs
REPLICATE(N'A',3000) + REPLICATE('B',3000) AS nvarchar4000, --Truncation occurs
REPLICATE(CONVERT(nvarchar(MAX),N'A'),3000) + REPLICATE('B',3000) AS nvarcharMAX;
And this can be validated using sys.dm_exec_describe_first_result_set:
SELECT [name], system_type_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT REPLICATE(''A'',10) + REPLICATE(''B'',10) AS varchar20,
REPLICATE(N''A'',10) + REPLICATE(N''B'',10) AS nvarchar20,
REPLICATE(N''A'',10) + REPLICATE(''B'',5) AS nvarchar15,
REPLICATE(''A'',5000) + REPLICATE(''B'',5000) AS varchar8000, --Truncation occurs
REPLICATE(N''A'',3000) + REPLICATE(''B'',3000) AS nvarchar4000, --Truncation occurs
REPLICATE(CONVERT(nvarchar(MAX),N''A''),3000) + REPLICATE(''B'',3000) AS nvarcharMAX;',NULL, NULL);
Obviously, if you concatenate 3 (n)varchar values, then the resulting length is the sum of the 3 length values, etc.
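To tie this back to the original question, here is a minimal sketch of an INSERT ... SELECT where the target column is sized as the sum of the two source lengths. The table variables @Source and @Target and the column name combined are hypothetical, made up for illustration only.

```sql
-- Hypothetical source and target tables; names are assumptions for illustration.
DECLARE @Source TABLE (colA varchar(10), colB varchar(10));
DECLARE @Target TABLE (combined varchar(20)); -- 10 + 10: sum of the source lengths

-- Fill both source columns to their full declared length.
INSERT INTO @Source (colA, colB)
VALUES (REPLICATE('A', 10), REPLICATE('B', 10));

-- colA + colB is typed varchar(20), so it fits the target exactly.
INSERT INTO @Target (combined)
SELECT colA + colB
FROM @Source;

SELECT combined, LEN(combined) AS char_count -- char_count is 20
FROM @Target;
```

A wider target such as varchar(50) would also accept the rows; the point here is only that varchar(2n) is enough for two varchar(n) inputs.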
Note that I explicitly say 8,000 bytes, not 8,000 or 4,000 characters. Many take the length value for varchar and nvarchar to mean the number of characters it can hold, but this is not actually true: for varchar it's 8,000 single bytes and for nvarchar it is 4,000 byte-pairs. This is far more important now that SQL Server supports UTF-8 collations.
For example, the below returns a value of 2666, as the character I chose at random (◘) uses 3 bytes per character.
SELECT LEN(REPLICATE(CONVERT(varchar(3),N'◘' COLLATE Latin1_General_100_CI_AI_SC_UTF8),8000));

Related

SQL Server: Handle of string concatenation

Given a variable nvarchar(max), the input is 'aaaaa...' with 16000 length.
The value of the variable has no problem with this setup.
If I break down the input into 3 smaller ones let's say (7964,4594,3442) the variable truncates the concatenation of them.
On the other hand, if at least 1 variable is over 8000 size, the concatenation works without an issue.
Is there any documentation regarding the mentioned behavior?
Taken from the docs:
If the result of the concatenation of strings exceeds the limit of
8,000 bytes, the result is truncated. However, if at least one of the
strings concatenated is a large value type, truncation does not occur.
Operations on varchar and nvarchar are limited to 8000 and 4000 characters respectively, unless at least one of the data types involved is MAX. Be very careful with the order of the operations. This is a very good example from the docs:
DECLARE @x varchar(8000) = replicate('x', 8000)
DECLARE @y varchar(max) = replicate('y', 8000)
DECLARE @z varchar(8000) = replicate('z',8000)
SET @y = @x + @z + @y
-- The result of following select is 16000
SELECT len(@y) AS y
The result is 16k and not 24k because the first operation is @x + @z, which is truncated at 8000 because neither operand is MAX. That result is then concatenated with a value whose type is MAX, which lifts the 8000 limit and adds another 8000 characters from @y. In the final result, the characters from variable @z are lost at the first concatenation.
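As a sketch of how the order-of-operations problem can be avoided with the + operator, the leftmost operand can be cast to MAX so that every later concatenation in the chain is already a large value type. The variable names follow the docs example; the cast is the only change.

```sql
DECLARE @x varchar(8000) = REPLICATE('x', 8000);
DECLARE @y varchar(max)  = REPLICATE('y', 8000);
DECLARE @z varchar(8000) = REPLICATE('z', 8000);

-- Casting the first operand makes (@x + @z) a varchar(max) expression,
-- so nothing is truncated before @y is appended.
SET @y = CAST(@x AS varchar(max)) + @z + @y;

SELECT LEN(@y) AS y; -- 24000
```

Putting the MAX-typed variable first (@y + @x + @z) would have the same effect on length, at the cost of changing the order of the characters.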
If you're using the CONCAT function:
If none of the input arguments has a supported large object (LOB)
type, then the return type truncates to 8000 characters in length,
regardless of the return type. This truncation preserves space and
supports plan generation efficiency.
try
CONCAT(CAST('' as VARCHAR(MAX)),@var1,@var2)
or
CAST(@var1 as VARCHAR(MAX)) + @var2
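A small sketch of the CONCAT behaviour described above, using two hypothetical variables @a and @b that are each filled to 8000 characters. It compares the plain call with the variant that passes an empty varchar(max) as the first argument.

```sql
-- @a and @b are hypothetical names used only for this illustration.
DECLARE @a varchar(8000) = REPLICATE('a', 8000);
DECLARE @b varchar(8000) = REPLICATE('b', 8000);

SELECT
    -- No LOB argument: the result is capped at 8000 characters.
    LEN(CONCAT(@a, @b)) AS without_max,                          -- 8000
    -- One varchar(max) argument makes the return type varchar(max).
    LEN(CONCAT(CAST('' AS varchar(max)), @a, @b)) AS with_max;   -- 16000
```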

What exactly is the meaning of nvarchar(n)

The documentation isn't super clear: https://msdn.microsoft.com/en-us/library/ms186939.aspx
What happens if I try to store a 20 character length string in a column defined as nvarchar(10)? Is 10 the max length the field could be or is it the expected length? If I can exceed n characters in the string, what are the performance implications of doing that?
The maximum number of characters you can store in a column or variable typed as nvarchar(n) is n. If you try to assign more to a variable, your string will be truncated; in the case of an insert into a table, the insert fails with a truncation error:
String or binary data would be truncated. The statement has been
terminated.
declare @n nvarchar(10)
set @n = N'more than ten chars'
select @n
Result:
----------
more than
(1 row(s) affected)
From my understanding, nvarchar will only store the provided characters, up to the amount defined. nchar will actually pad the unused characters with spaces.
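The difference between the two types can be checked with DATALENGTH, which reports the bytes actually held. This is a minimal sketch; the variable names @v and @c are made up for the example.

```sql
-- Hypothetical variables holding the same three-character value.
DECLARE @v nvarchar(10) = N'abc';
DECLARE @c nchar(10)    = N'abc';

SELECT
    DATALENGTH(@v) AS nvarchar_bytes, -- 6: three characters, 2 bytes each
    DATALENGTH(@c) AS nchar_bytes;    -- 20: padded with spaces to 10 characters
```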

Format string in SQL Server 2005 from numeric value

How can I format a string with a leading D and leading zeros for numbers with fewer than four digits? E.g.:
D1000 for 1000
D0100 for 100
I have tried to work with casting and stuff function, but it didn't work as I expected.
SELECT STUFF('D0000', LEN(@OperatingEndProc) - 2, 4, CAST((CAST(SUBSTRING(@OperatingEndProc, 2, 4) AS INT) + 1) AS VARCHAR(10)));
Adding 10000 to the value gives the number the extra leading zeros it needs; casting it to varchar and keeping only the last 4 characters then drops the added 10000 again. This requires that all numbers are between 0 and 9999.
declare @value int
set @value = 100 -- separate SET: SQL Server 2005 does not support inline initialization
select 'D' + right(cast(@value + 10000 as varchar(5)), 4)
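An alternative sketch of the same idea prepends a string of zeros instead of adding 10000, and runs against several sample values at once. The derived table v and its sample values are assumptions for illustration; UNION ALL is used rather than a VALUES row constructor so that it should also work on SQL Server 2005.

```sql
-- Hypothetical sample values; each must be between 0 and 9999.
SELECT v.value,
       'D' + RIGHT('0000' + CAST(v.value AS varchar(10)), 4) AS formatted
FROM (
    SELECT 1 AS value
    UNION ALL SELECT 100
    UNION ALL SELECT 1000
) AS v;
-- 1 -> D0001, 100 -> D0100, 1000 -> D1000
```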
This illustration can come in handy when you want to follow proper casting practices.
This shows all explicit and implicit data type conversions that are
allowed for SQL Server system-supplied data types. These include xml,
bigint, and sql_variant. There is no implicit conversion on assignment
from the sql_variant data type, but there is implicit conversion to
sql_variant
You can download it here http://www.microsoft.com/en-us/download/details.aspx?id=35834

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1 addr1
2 customer2 addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.
Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + rtrim(cust_name), 60), -- rtrim: the char(50) columns carry trailing spaces
RIGHT(space(60) + rtrim(cust_addr), 60)
FROM customers
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
FROM customers
The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns the value 'Test' followed by 16 trailing spaces (20 characters in total).
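Because trailing spaces are hard to see in a result grid, the padding can be confirmed by comparing LEN and DATALENGTH on the cast value. This is only a quick check, not part of the original answer.

```sql
SELECT
    LEN(CAST('Test' AS char(20)))        AS len_chars,  -- 4: LEN ignores trailing spaces
    DATALENGTH(CAST('Test' AS char(20))) AS data_bytes; -- 20: the padding is really there
```

Note that casting a string longer than the target length to CHAR(n) cuts it off at n characters, so the length should be chosen with the longest expected value in mind.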
This is based on Jim's answer,
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More Straight Forward
Slightly Cleaner (IMO)
Faster (Maybe?)
Easily Modified to either double pad for displaying in non-fixed width fonts or split padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
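The disadvantage listed above can be sketched as follows: when the text is longer than the pad length, SPACE receives a negative argument and returns NULL, so under default settings the whole concatenation becomes NULL. One possible guard, assuming truncation to the pad length is acceptable, is to pad first and then cut with LEFT. The sample values are made up.

```sql
-- Hypothetical values chosen so that the text is longer than the pad length.
DECLARE @field_text varchar(50) = 'abcdefgh';
DECLARE @pad_length int = 5;

SELECT
    -- SPACE(-3) is NULL, and string + NULL is NULL under default settings.
    @field_text + SPACE(@pad_length - LEN(@field_text)) AS unguarded,
    -- Pad first, then cut to the target width: returns 'abcde'.
    LEFT(@field_text + SPACE(@pad_length), @pad_length) AS guarded;
```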
Based on KMier's answer, addresses the comment that this method poses a problem when the field to be padded is not a field, but the outcome of a (possibly complicated) function; the entire function has to be repeated.
Also, this allows for padding a field to the maximum length of its contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;
declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
result
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo

Data length in ntext column?

How do you find out the length/size of the data in an ntext column in SQL? - It's longer than 8000 bytes so I can't cast it to a varchar. Thanks.
Use DataLength()
SELECT * FROM YourTable WHERE DataLength(NTextFieldName) > 0
The clue's in the question: use DATALENGTH(). Note it has a different behaviour to LEN():
SELECT LEN(CAST('Hello   ' AS NVARCHAR(MAX))),
DATALENGTH(CAST('Hello   ' AS NVARCHAR(MAX))),
DATALENGTH(CAST('Hello   ' AS NTEXT))
returns 5, 16, 16.
In other words, DATALENGTH() doesn't remove trailing spaces and returns the number of bytes, whereas LEN() trims the trailing spaces and returns the number of characters.
Select Max(DataLength([NTextFieldName])) from YourTable
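Since DATALENGTH reports bytes and ntext stores 2 bytes per UTF-16 code unit, a character count can be approximated by dividing by 2. This is a minimal sketch on a literal value; in a real query the cast expression would be replaced by the ntext column.

```sql
SELECT
    DATALENGTH(CAST(N'Hello' AS ntext))     AS byte_count, -- 10
    DATALENGTH(CAST(N'Hello' AS ntext)) / 2 AS char_count; -- 5
```

Unlike LEN, this count includes trailing spaces, and supplementary characters stored as surrogate pairs count as two.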
