What exactly is the meaning of nvarchar(n) - sql-server

The documentation isn't super clear: https://msdn.microsoft.com/en-us/library/ms186939.aspx
What happens if I try to store a 20 character length string in a column defined as nvarchar(10)? Is 10 the max length the field could be or is it the expected length? If I can exceed n characters in the string, what are the performance implications of doing that?

The maximum number of characters you can store in a column or variable typed as nvarchar(n) is n. If you assign a longer string to a variable, it is silently truncated; if you try to insert it into a table column, the insert fails (under the default settings) with a truncation error:
String or binary data would be truncated. The statement has been
terminated.
declare @n nvarchar(10)
set @n = N'more than ten chars'
select @n
Result:
----------
more than
(1 row(s) affected)

From my understanding, nvarchar only stores the characters provided, up to the defined maximum. nchar pads the unused positions with spaces.

Related

Concatenation of two varchar columns in select into

I have an insert into tableA select from someTables, and in my select I have two text columns that I concatenate, e.g. colA + colB. They have type varchar(n). Should the column in TableA simply be varchar(2n)? Is it bad for performance if, say, I have varchar(5*n)?
If the two columns are concatenated from varchar(n) is it possible that the result is more than varchar(2n) or e.g. nvarchar(3n)?
When you concatenate two (n)varchar values, the length of the resulting data type is the two lengths added together, or 8,000 bytes (whichever is lower). If you are concatenating a varchar and an nvarchar, the varchar is implicitly cast to nvarchar first.
Unless at least 1 of the values concatenated is of MAX length, the return datatype will not be converted to a MAX and any trailing characters will be truncated.
Take the below examples, which return the data types of their aliases:
SELECT REPLICATE('A',10) + REPLICATE('B',10) AS varchar20,
REPLICATE(N'A',10) + REPLICATE(N'B',10) AS nvarchar20,
REPLICATE(N'A',10) + REPLICATE('B',5) AS nvarchar15,
REPLICATE('A',5000) + REPLICATE('B',5000) AS varchar8000, --Truncation occurs
REPLICATE(N'A',3000) + REPLICATE('B',3000) AS nvarchar4000, --Truncation occurs
REPLICATE(CONVERT(nvarchar(MAX),N'A'),3000) + REPLICATE('B',3000) AS nvarcharMAX;
And this can be validated using dm_exec_describe_first_result_set:
SELECT [name], system_type_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT REPLICATE(''A'',10) + REPLICATE(''B'',10) AS varchar20,
REPLICATE(N''A'',10) + REPLICATE(N''B'',10) AS nvarchar20,
REPLICATE(N''A'',10) + REPLICATE(''B'',5) AS nvarchar15,
REPLICATE(''A'',5000) + REPLICATE(''B'',5000) AS varchar8000, --Truncation occurs
REPLICATE(N''A'',3000) + REPLICATE(''B'',3000) AS nvarchar4000, --Truncation occurs
REPLICATE(CONVERT(nvarchar(MAX),N''A''),3000) + REPLICATE(''B'',3000) AS nvarcharMAX;',NULL, NULL);
Obviously, if you concatenate 3 (n)varchar values, then the resulting length is the sum of the 3 length values, etc.
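As a rough illustration of the rule described above, here is a small Python model of how the result type could be derived. It is only a sketch of the rule as stated in this answer, not anything SQL Server exposes: the `concat_type` and `type_name` helpers and the tuple representation are made up for the example.

```python
from typing import Optional, Tuple

# Hypothetical representation: (is_unicode, length); length None stands for MAX.
SqlStr = Tuple[bool, Optional[int]]

def concat_type(a: SqlStr, b: SqlStr) -> SqlStr:
    """Model the result type of a + b for (n)varchar operands."""
    # If either side is nvarchar, the other side is cast to nvarchar first.
    is_unicode = a[0] or b[0]
    # The result is MAX only if at least one operand is MAX.
    if a[1] is None or b[1] is None:
        return (is_unicode, None)
    # Otherwise the lengths are added and capped at 8,000 bytes:
    # 8000 for varchar, 4000 byte pairs for nvarchar.
    cap = 4000 if is_unicode else 8000
    return (is_unicode, min(a[1] + b[1], cap))

def type_name(t: SqlStr) -> str:
    base = "nvarchar" if t[0] else "varchar"
    return f"{base}({'max' if t[1] is None else t[1]})"

# Same operand lengths as the REPLICATE examples above.
for left, right in [
    ((False, 10), (False, 10)),
    ((True, 10), (False, 5)),
    ((False, 5000), (False, 5000)),
    ((True, 3000), (False, 3000)),
    ((True, None), (False, 3000)),
]:
    print(type_name(left), "+", type_name(right), "->",
          type_name(concat_type(left, right)))
```

Concatenating three or more values is just a repeated application of `concat_type`.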
Note that I explicitly state 8,000 bytes, not 8,000 or 4,000 characters. Many take the length value for varchar and nvarchar to mean the number of characters it can hold, but that is not actually true: for varchar it is the number of bytes (up to 8,000 single bytes) and for nvarchar it is the number of byte pairs (up to 4,000 double bytes). This is far more important now that SQL Server supports UTF-8 collations.
For example, the below returns a value of 2666, as the character I chose at random (◘) uses 3 bytes per character.
SELECT LEN(REPLICATE(CONVERT(varchar(3),N'◘' COLLATE Latin1_General_100_CI_AI_SC_UTF8),8000));
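The 2666 figure is plain byte arithmetic, which can be checked outside the database. The Python sketch below only reproduces the calculation (it does not talk to SQL Server); the variable names are illustrative.

```python
# The character used in the example above.
ch = "◘"

# Under UTF-8 this character takes 3 bytes.
bytes_per_char = len(ch.encode("utf-8"))

# A non-MAX varchar value is capped at 8,000 bytes, so only whole
# characters that fit within that budget survive.
limit_bytes = 8000
max_chars = limit_bytes // bytes_per_char

print(bytes_per_char, max_chars)
```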

How is nvarchar(n) stored in SQL Server?

Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value to be stored is smaller than N*2 bytes?
I have a huge table with many fields of nvarchar type with a fixed declared length. Some are nvarchar(100), some are nvarchar(400), etc.
The data in a column is never an exact size; it varies from 0 to N. Most of the data is shorter than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
The size of RecipientName alone would be 800*9026424 bytes = 6.72 GB.
But the actual storage size of the entire table is only 2.02 GB. Is there any compression applied, or is some size smaller than N, such as a power of 2, chosen?
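For reference, the worst-case estimate in the question can be reproduced with a few lines of Python. This is only the arithmetic from the question, assuming every row were filled to the full declared length at 2 bytes per character; the variable names are illustrative.

```python
rows = 9_026_424
declared_chars = 400      # nvarchar(400)
bytes_per_char = 2        # assumption: every character takes one byte pair

# Size if every row used the full declared length.
worst_case_bytes = rows * declared_chars * bytes_per_char
worst_case_gib = worst_case_bytes / 1024**3

# Average bytes per row implied by the observed 2.02 GB for the whole table.
observed_bytes = 2.02 * 1024**3
avg_bytes_per_row = observed_bytes / rows

print(f"worst case: {worst_case_gib:.2f} GiB")
print(f"observed average: {avg_bytes_per_row:.0f} bytes per row")
```

The observed average per row is far below the 800 bytes that a fixed-size RecipientName alone would need, which is consistent with the column being stored at its actual length.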
NCHAR data type:
It is a fixed length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name Datalength Length
Sachin 40 6
Even though the declared size is 20, the data length column shows 40 bytes because nchar uses 2 bytes for each character and pads the value to the full declared length.
These 40 bytes are used irrespective of the actual length of the data stored; LEN reports 6 because it ignores trailing spaces.
NVARCHAR data type:
It is a variable length data type.
It Occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name Datalength Length
Sachin 12 6
Even though the declared size is 20, the data length column shows 12 bytes because only the 6 characters actually stored are counted, at 2 bytes each.
These 12 bytes are used irrespective of the length given in the declaration.
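The two DATALENGTH results can be mimicked in Python by encoding the value as UTF-16 little endian, with and without blank padding. This is a sketch under the assumption that all characters are in the Basic Multilingual Plane (2 bytes each); the two helper functions are invented for the illustration and are not SQL Server functions.

```python
def datalength_nchar(value: str, n: int) -> int:
    """Bytes used by a fixed-length nchar(n): the value is blank-padded to n."""
    padded = value[:n].ljust(n)
    return len(padded.encode("utf-16-le"))

def datalength_nvarchar(value: str, n: int) -> int:
    """Bytes used by a variable-length nvarchar(n): only the actual characters."""
    return len(value[:n].encode("utf-16-le"))

name = "Sachin"
print(datalength_nchar(name, 20), datalength_nvarchar(name, 20))
```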
Hope this is helpful :)
Yes,
it may use less storage if the actual value to be stored is smaller
than N*2 bytes
n is just the maximum number of characters that can be stored in this field; the number of characters stored equals the number of characters you actually pass in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
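The byte-based limit for N[VAR]CHAR can be illustrated with Python's UTF-16 encoder. The sketch below assumes the limit is exactly 2 * n bytes, as described above; `fits_nvarchar` is a hypothetical helper, not a SQL Server function.

```python
def fits_nvarchar(value: str, n: int) -> bool:
    """True if value fits in the 2 * n byte budget of an N[VAR]CHAR(n)."""
    return len(value.encode("utf-16-le")) <= 2 * n

two_byte = "a"             # 2 bytes in UTF-16
four_byte = "\U0001F600"   # a supplementary character: 4 bytes in UTF-16

print(fits_nvarchar(two_byte * 100, 100))                    # 200 bytes
print(fits_nvarchar(four_byte * 50, 100))                    # 200 bytes
print(fits_nvarchar(four_byte * 51, 100))                    # 204 bytes
print(fits_nvarchar(two_byte * 50 + four_byte * 25, 100))    # 200 bytes
```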
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).

converting TEXT to VARCHAR

I've noticed that when converting TEXT to VARCHAR the converted value is silently clipped at 30 characters.
CREATE TABLE foo (x TEXT)
-- insert a string that's 50 characters long
INSERT INTO foo(x) VALUES('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
SELECT CHAR_LENGTH(CONVERT(VARCHAR, x)) FROM foo -- returns 30
SELECT CHAR_LENGTH(CONVERT(VARCHAR(3000), x)) FROM foo -- returns 50
My questions are:
where is that limit documented / originate from?
what's an idiomatic way to make the conversion without having to add an arbitrarily high value? (as in the second SELECT statement above)
It is better to always specify the varchar length explicitly. In Sybase ASE 15.7 and 16.0, the maximum length of a varchar is 16384.
If you try to create a longer varchar, you'll get following error:
Length or precision specification 16385 is not within the range of 1 to 16384.
Tim

SQL server Varchar(max) and space taken

If varchar(max) is used as the datatype and the inserted data is less than the full allocation, i.e. only 200 chars, then will SQL Server always take the full space of varchar(max) or just the 200 chars' space?
Further, what are the other data types that will take the max space even if lesser data is inserted?
Are there any documents that specify this?
From MS DOCS on char and varchar (Transact-SQL):
char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
So for varchar, including max - the storage will depend on actual data length, while char is always fixed size even when entire space is not used.
Use CHAR only for strings
whose length you know to be fixed. For example, if you define a domain
whose values are restricted to 'T' and 'F', you should probably make
that CHAR[1]. If you're storing US social security numbers, make the
domain CHAR[9] (or CHAR[11] if you want punctuation).
Use VARCHAR for strings that can vary in length, like names, short
descriptions, etc. Use VARCHAR when you don't want to worry about
stripping trailing blanks. Use VARCHAR unless there's a good reason
not to.
varchar size depends on the length of the data. So in your case, it will just take 200 chars.

What is the maximum characters for the NVARCHAR(MAX)? [duplicate]

I have declared a column of type NVARCHAR(MAX) in SQL Server 2008, what would be its exact maximum characters having the MAX as the length?
The max size for a column of type NVARCHAR(MAX) is 2 GByte of storage.
Since NVARCHAR uses 2 bytes per character, that's approx. 1 billion characters.
Leo Tolstoj's War and Peace is a 1'440 page book, containing about 600'000 words - so that might be 6 million characters - well rounded up. So you could stick about 166 copies of the entire War and Peace book into each NVARCHAR(MAX) column.
Is that enough space for your needs? :-)
By default, nvarchar(MAX) values are stored exactly the same as nvarchar(4000) values would be, unless the actual length exceeds 4000 characters; in that case, the in-row data is replaced by a pointer to one or more separate pages where the data is stored.
If you anticipate data possibly exceeding 4000 characters, nvarchar(MAX) is definitely the recommended choice.
Source: https://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/d5e0c6e5-8e44-4ad5-9591-20dc0ac7a870/
From MSDN Documentation
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB).
The storage size, in bytes, is two times the actual length of data entered + 2 bytes
I think actually nvarchar(MAX) can store approximately 1070000000 chars.
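The approximate figures in these answers follow from the documented 2^31-1 byte limit. Here is a quick Python check of the arithmetic, assuming 2 bytes per character and ignoring any per-value overhead; the 6 million characters per book is the rough estimate used in the answer above.

```python
MAX_BYTES = 2**31 - 1          # documented maximum storage size
BYTES_PER_CHAR = 2             # assumption: no 4-byte supplementary characters

max_chars = MAX_BYTES // BYTES_PER_CHAR

# Rough book size taken from the War and Peace estimate above.
chars_per_book = 6_000_000
copies = max_chars // chars_per_book

print(max_chars, copies)
```

With the unrounded limit the estimate comes out a little higher than the 166 copies obtained from rounding down to 1 billion characters.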
