Nvarchar in SQL Server - sql-server

Why do we need to write N'பட்டப்பகலில்' for Unicode strings in an nvarchar column?
We are inserting both Unicode and non-Unicode strings into a column of type nvarchar. Since the column is already nvarchar, why do we need to put N'' before a Unicode string? We will not know in advance which strings are Unicode and which are not.
Is there any way to insert Unicode and non-Unicode strings into an nvarchar field without the N prefix?

Since type of a column is nvarchar why do we need to add N''
By the time the value is assigned to a column or variable, it has already been processed as a string literal. [1]
It's therefore far too late to consider the type of the column or variable to decide how to process it. Indeed, it may not be assigned to a column or variable at all; it may be part of a larger expression.
That's why you have to separately indicate the type of each literal. But as others have commented, there's no great penalty in just marking all of your literals as Unicode (unless you're working with lots of ~6000 character literals).
[1] This is the same as in many other languages, where the type of an expression has to be determined without any regard to whether it's going to be assigned to a particular variable, and therefore the type of such a variable plays no part in determining the type of the expression.

Putting N at the start of a string literal makes it an nvarchar. The difference between 'abc' and N'abc' is that the first literal is a varchar(3) and the second is an nvarchar(3).
Why is it important? One reason is that an nvarchar takes two bytes per character, double the size of a varchar in a single-byte code page, so using the right type and size for your values matters when SQL Server builds a query plan. Also, an nvarchar can have a maximum declared length of only 4000 before you need to use MAX. A varchar can have up to 8000.
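One way to see the two literal types side by side is to ask SQL Server what it inferred. This is a minimal sketch using the built-in SQL_VARIANT_PROPERTY and DATALENGTH functions; the column aliases are only illustrative.

```sql
-- Inspect the inferred type and the byte length of each literal.
SELECT
    SQL_VARIANT_PROPERTY('abc',  'BaseType') AS plain_type,     -- varchar
    SQL_VARIANT_PROPERTY(N'abc', 'BaseType') AS prefixed_type,  -- nvarchar
    DATALENGTH('abc')  AS plain_bytes,                          -- 3
    DATALENGTH(N'abc') AS prefixed_bytes;                       -- 6
```

The same three characters take twice as many bytes once the literal carries the N prefix.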

If the column is type nvarchar, then everything is stored as unicode. Even if you use characters that would not need unicode to store, they are still stored as unicode. So you can't insert "non-unicode" strings.
You can omit the N if you're not using any characters outside the database's default code page. SQL Server will simply convert the value to Unicode before storing it.
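The effect of leaving out the prefix can be sketched with a table variable. The table variable @demo and its columns are hypothetical names, and the result described in the comments assumes a database whose default collation uses a Latin code page (for example Latin1_General_CI_AI); a UTF-8 collation would behave differently.

```sql
-- Hypothetical table variable with an nvarchar column.
DECLARE @demo TABLE (id int, val nvarchar(50));

-- Without N the literal is a varchar: it is first converted to the
-- database's code page, and only then converted to nvarchar for storage.
INSERT INTO @demo (id, val) VALUES (1, 'பட்டப்பகலில்');

-- With N the literal is an nvarchar from the start.
INSERT INTO @demo (id, val) VALUES (2, N'பட்டப்பகலில்');

-- Under the Latin code page assumption, row 1 typically comes back as
-- question marks, while row 2 keeps the original Tamil text.
SELECT id, val FROM @demo ORDER BY id;
```

Plain ASCII text such as 'abc' survives both forms, which is why omitting N only looks safe until a character outside the code page shows up.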

Related

sort order of unicode characters not covered by the collation

Suppose I have a stored procedure that declares two nvarchar variables, and those variables happen to hold Arabic values. The default collation of the db is Latin1_General_CI_AI, so that is the collation that will be used for the comparison (per https://msdn.microsoft.com/en-us/library/ms179886.aspx). I know that the collation determines the sort order of the character set, but what does it do, in this case, for Unicode code points that fall outside the Latin range? When I compare the values using greater than/less than, will it use the proper Arabic sort order?
Hope the question makes sense.

What datatype should I use to store custom value in SQL Server

What is the best way to store the following value in SQL Server ?
1234-56789 or
4567-12892
The value will always have 4 digits followed by a hyphen and 5 digits
I was thinking of using char(10), or removing the hyphen and storing the value as an int.
If it is a business requirement that "The value will always have 4 digits followed by a hyphen and 5 digits", then use CHAR(10). If you think users should be able to add values even when they aren't in the expected format, then use VARCHAR(10) or VARCHAR(15), whichever suits you better.
You should store this kind of value as an int only if it really represents a number as opposed to a series of digits. A number is something you can do calculations on, compare numerically, etc.
Otherwise store it as char. Make it a length of 10 if the format is set and won't change.
Another option would be to create a CHAR(4) column and a CHAR(5) column. This would be useful (only) if you envision ever having to query against one or the other part independently.
Very easy to concatenate these back together using a view, computed column, or inline - so you don't have to waste storage space on a dash that will always be there, and so that you can keep these two pieces of data separate if, in fact, they are independent.
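A minimal sketch of the two-column idea, assuming a computed column is used to rebuild the display value. The temp table #Parts and its column names are hypothetical.

```sql
-- Store the two parts separately; derive the dashed form on demand.
CREATE TABLE #Parts (
    prefix    char(4) NOT NULL,
    suffix    char(5) NOT NULL,
    full_code AS (prefix + '-' + suffix)   -- computed, not stored by default
);

INSERT INTO #Parts (prefix, suffix) VALUES ('1234', '56789');
INSERT INTO #Parts (prefix, suffix) VALUES ('4567', '12892');

-- Query one part independently, but return the combined value.
SELECT full_code FROM #Parts WHERE prefix = '1234';   -- 1234-56789

DROP TABLE #Parts;
```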
Since you didn't provide much detail about what these "numbers" represent or how they will be used / queried, you're going to get a whole bunch of opinions, some of which might not be very relevant to your data model.
Well, if it's guaranteed to always be like that, a char(10) datatype seems appropriate.
But you should also add a check constraint:
column LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]'
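In context, the constraint could look like the sketch below. The temp table #Codes and the column name code are hypothetical; only the LIKE pattern comes from the answer above.

```sql
-- char(10) column that only accepts the 4 digits, hyphen, 5 digits format.
CREATE TABLE #Codes (
    code char(10) NOT NULL
        CHECK (code LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]')
);

INSERT INTO #Codes (code) VALUES ('1234-56789');   -- accepted

-- Rejected: the hyphen is in the wrong position, so the CHECK fails.
-- INSERT INTO #Codes (code) VALUES ('12345-6789');

DROP TABLE #Codes;
```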
Here is a SO answer that should help you sort out what you need -
nchar and nvarchar can store Unicode characters.
char and varchar cannot store Unicode characters.
char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don't use up all that space.
varchar and nvarchar are variable-length which will only use up spaces for the characters you store. It will not reserve storage like char or nchar.

Extra spaces being added at the tail in the column

When I save data into a table, extra spaces are added at the tail of the value. I observed that, as the column length is 5, if I insert a value 3 characters long, 2 extra spaces are added. Can anyone tell me how to solve this problem?
Is the column type CHAR(5) instead of VARCHAR(5)?
CHAR(x) creates a column that always stores x characters, and pads the data with spaces.
VARCHAR(x) creates a column that varies the lengths of the strings to match the data inserted.
This is a property of CHAR data type. If you want no extra spaces, you need to use VARCHAR although for a small field there is a minimal overhead compared to standard CHAR. Having said that, it is believed that VARCHAR nowadays is as good as CHAR.
CHAR variables will store this extra padding; maybe you need to be using VARCHAR variables instead (VARCHAR2 if you are on Oracle)?
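The padding is easy to observe by comparing byte lengths. This is a small sketch assuming SQL Server; the variable names are illustrative.

```sql
-- Same 3-character value in a fixed-length and a variable-length type.
DECLARE @fixed    char(5)    = 'abc';
DECLARE @variable varchar(5) = 'abc';

SELECT
    DATALENGTH(@fixed)    AS fixed_bytes,      -- 5: padded with two spaces
    DATALENGTH(@variable) AS variable_bytes,   -- 3: stored as entered
    '[' + @fixed + ']'    AS fixed_shown,      -- [abc  ]
    '[' + @variable + ']' AS variable_shown;   -- [abc]
```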

What is the best way to append spaces at the end of a char variable in SQL Server

What is the best way to append spaces at the end of a char variable in SQL Server?
I have found 3 ways. Any ideas which one is better? Here I am trying to pad 2 spaces at the end of FOO
1)
declare @var char(5)
set @var = convert(char(5),'FOO')
2)
declare @var char(5)
set @var = cast('FOO' AS char(5))
3)
declare @var char(5)
set @var = 'FOO'
What is the difference between each of them?
When I have to parse a huge amount of data, which option will be quicker and more efficient, taking less memory?
The spaces are coming from the way the variable is declared: char(5). Being a fixed-length type, the value will automatically be padded with trailing spaces.
You should also look at the SET ANSI_PADDING setting. For the varchar(5) type (variable length), setting ANSI_PADDING to OFF may result in trimming existing spaces from the end of the value:
Trailing blanks in character values inserted into a varchar column are trimmed. Trailing zeros in binary values inserted into a varbinary column are trimmed.
My guess is that they are all identical.
The T-SQL parser probably creates an internal expression tree from each statement, and after abstracting it, each one becomes the same tree.
Cast and convert are practically identical here. The main differences are that CAST is the ANSI-standard syntax, while CONVERT is specific to SQL Server and accepts an optional style argument. Implicit conversion is fine too. The execution is identical.
The only thing to watch for is if your input is longer than your variable size, the end will be trimmed off without warning.
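The three variants from the question can be compared directly, along with the silent truncation mentioned above. A minimal sketch, with illustrative variable names:

```sql
DECLARE @a char(5), @b char(5), @c char(5), @long char(5);

SET @a = CONVERT(char(5), 'FOO');   -- explicit CONVERT
SET @b = CAST('FOO' AS char(5));    -- explicit CAST
SET @c = 'FOO';                     -- implicit conversion on assignment

-- All three end up as 'FOO' plus two trailing spaces (5 bytes each).
SELECT DATALENGTH(@a) AS a_bytes,
       DATALENGTH(@b) AS b_bytes,
       DATALENGTH(@c) AS c_bytes;

-- Input longer than the variable is cut off without any warning.
SET @long = 'FOOBARBAZ';
SELECT @long AS truncated_value;    -- FOOBA
```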

Size of varchar columns

In sql server does it make a difference if I define a varchar column to be of length 32 or 128?
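A varchar is a variable-length character field. This means it can hold text data up to a certain length. A varchar(32) can only hold 32 characters, whereas a varchar(128) can hold 128 characters. If I assigned "12345" to a varchar(3) variable, or cast it to varchar(3), this is the data that would be kept:
"123"
The "45" will be "truncated" (lost). An INSERT of that value directly into a varchar(3) table column behaves differently: with the default ANSI_WARNINGS ON setting it fails with a truncation error instead of storing "123".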
They are very useful in instances where you know that a certain field will only be (or only should be) a certain length at maximum. For example: a zip code or state abbreviation. In fact, they are generally used for almost all types of text data (such as names/addresses/et cetera) - but in these instances you must be careful that the number you supply is a sane maximum for the type of data that will fill that column.
However, you must also be careful to only allow the user to input the maximum number of characters that the field will support. Otherwise it may lead to confusion when the user's input is truncated or rejected.
There should be no noticeable difference as the backend will only store the amount of data you insert into that column. It's not padded out to the full size of the field like it is with a char column.
Edit: For more info, this link should be useful.
It should not. It just defines the maximum length it can accommodate, the actual used length depends on the data inserted.
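A short sketch of that point: the same value occupies the same number of bytes regardless of the declared maximum. The variable names are illustrative.

```sql
DECLARE @small varchar(32)  = 'hello';
DECLARE @large varchar(128) = 'hello';

-- Both report 5 bytes; the declared length is only an upper bound.
SELECT DATALENGTH(@small) AS small_bytes,
       DATALENGTH(@large) AS large_bytes;
```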
