When I describe a table with an nvarchar column, what's the difference between Precision and Length? I see that Length is always double the Precision. For example, when the column is nvarchar(64), the precision is 64 and the length is 128.
CREATE TABLE T(X nvarchar(64))
EXEC sp_columns 'T'
Precision has no meaning for text data.
As for the Length property, I think you confuse it with the Size reported by SQL Server Management Studio, which is the size of a column in bytes. The Length of an nvarchar(64) column is 64 while Size is 128.
The size of Unicode types (nchar, nvarchar) is double the number of characters because SQL Server stores them as UTF-16, which uses two bytes for each character in the Basic Multilingual Plane.
You can get these values with the LEN function (number of characters) and the DATALENGTH function (number of bytes), e.g.:
select len(N'some value'), datalength(N'some value')
which returns 10 20
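The same two numbers can be reproduced outside SQL Server. This is only a sketch in Python rather than T-SQL, and it assumes nvarchar data is stored as UTF-16 little endian; the variable names are illustrative.

```python
# Assumption: nvarchar values are stored as UTF-16 little endian.
value = "some value"

# Number of characters, analogous to LEN(N'some value').
char_count = len(value)

# Number of bytes, analogous to DATALENGTH(N'some value').
byte_count = len(value.encode("utf-16-le"))

print(char_count, byte_count)
```

Note that LEN in T-SQL ignores trailing spaces, which this sketch does not model; the sample value has none.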
EDIT
From your comments I see you use sp_columns to get at the table's schema info. You shouldn't use the catalog stored procedures; use the catalog views instead.
As the documentation states, catalog stored procedures are used to support ODBC applications, hence their results are limited and may need interpretation, as you found out. sp_columns doesn't differentiate between character length and data length, for example.
Schema views like those in the INFORMATION_SCHEMA or sys schemas return detailed and unambiguous information. For example, INFORMATION_SCHEMA.COLUMNS returns the character length in CHARACTER_MAXIMUM_LENGTH and the byte size in CHARACTER_OCTET_LENGTH. It also includes collation and character set information not returned by sp_columns.
The INFORMATION_SCHEMA views are defined by ISO, so they don't include some SQL Server-specific info such as whether a column is a computed column, stored in a filestream, or replicated. You can get this info using the system object catalog views like sys.columns.
NVARCHAR doesn't have precision. Precision is used for decimal types, and length is the character length.
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
From the source:-
Precision is the number of digits in a number.
Length for a numeric data type is the number of bytes that are used to
store the number. Length for a character string or Unicode data type
is the number of characters
NVARCHAR doesn't have precision, and the length would be the character length. Try the following SQL
SELECT LEN('Whats the length of this string?')
You may be confusing this with a Numeric or Decimal; see the chart here
3 years later but still relevant: I had a similar issue.
I encountered a UDDT 'd_style'; using Alt+F1 I see it is 'nvarchar, length=2, prec=1'.
And indeed, I cannot insert e.g. 'EU' in this field, as data would be truncated.
So length 2 does not mean 2 characters here. This is because each character is saved as 2 bytes, thus prec = length/2.
In this case, 'precision' is indeed the maximum amount of characters allowed.
However, when you create a new table with the 'nvarchar' datatype, you can simply enter the desired length (e.g. my_field nvarchar(2)) if you want to be able to insert example value 'EU'.
Mind that this is all quite 'practical'. In theory, precision is only applicable to numeric values, and is the number of digits in a number, so we shouldn't be talking about a 'precision' for nvarchar.
Indeed, this is a confusing topic in SQL Server; even the documentation is inconsistent.
For example, see the syntax definition of CREATE TABLE:
<data type> ::=
[ type_schema_name . ] type_name
[ ( precision [ , scale ] | max |
[ { CONTENT | DOCUMENT } ] xml_schema_collection ) ]
You see "precision", but don't see "length | precision".
And as others pointed out, everywhere else in the documentation, they refer to the same attribute as length, when talking about character datatypes. Except where they don't, like the documentation of sp_help and companion.
Length int Column length in bytes.
Prec char(5) Column precision.
Scale char(5) Column scale.
Here length is the storage length. (Most people would call it size.)
So unfortunately there is no correct or final answer to your question. You must always consider the context and consult the documentation when in doubt.
I am learning how to use Snowflake. As I understand it, TRUNCATECOLUMNS only works for "COPY INTO". Is this true? If so, what can I use to achieve the same result for "INSERT INTO"?
P.S. I just want to truncate a column value if the string is longer than a certain length.
To truncate a column value, LEFT can be used:
CREATE TABLE trg_tab(col_name VARCHAR(20));
INSERT INTO trg_tab(col_name)
SELECT LEFT(col_name, 20)
FROM src_tab;
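For readers who want to see the effect of LEFT on sample values before running the INSERT, here is a minimal Python sketch. The `left` helper and the sample rows are hypothetical and only mimic keeping the first 20 characters of each value.

```python
def left(value: str, max_chars: int) -> str:
    """Keep at most max_chars leading characters, like LEFT(value, max_chars)."""
    return value[:max_chars]


# Hypothetical source rows: one short value, one longer than 20 characters.
rows = ["short", "a string that is clearly longer than twenty characters"]

# Values that already fit are left untouched; longer ones are cut to 20.
truncated = [left(row, 20) for row in rows]
print(truncated)
```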
Moreover, there is no performance difference between declaring VARCHAR(20) and omitting the length (plain VARCHAR). From the documentation:
VARCHAR
If no length is specified, the default is the maximum allowed length (16,777,216).
When choosing the maximum length for a VARCHAR column, consider the following:
Storage: A column consumes storage for only the amount of actual data stored. For example, a 1-character string in a VARCHAR(16777216) column only consumes a single character.
Performance: There is no performance difference between using the full-length VARCHAR declaration VARCHAR(16777216) or a smaller length.
Tools for working with data: Some BI/ETL tools define the maximum size of the VARCHAR data in storage or in memory. If you know the maximum size for a column, you could limit the size when you add the column.
Collation: When you specify a collation for a VARCHAR column, the number of characters that are allowed varies, depending on the number of bytes each character takes and the collation specification of the column.
I have seen prefix N in some insert T-SQL queries. Many people have used N before inserting the value in a table.
I searched, but I was not able to understand what is the purpose of including the N before inserting any strings into the table.
INSERT INTO Personnel.Employees
VALUES(N'29730', N'Philippe', N'Horsford', 20.05, 1);
What purpose does this 'N' prefix serve, and when should it be used?
It's declaring the string as nvarchar data type, rather than varchar.
You may have seen Transact-SQL code that passes strings around using
an N prefix. This denotes that the subsequent string is in Unicode
(the N actually stands for National language character set). Which
means that you are passing an NCHAR, NVARCHAR or NTEXT value, as
opposed to CHAR, VARCHAR or TEXT.
To quote from Microsoft:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
If you want to know the difference between these two data types, see this SO post:
What is the difference between varchar and nvarchar?
Let me tell you an annoying thing that happened with the N' prefix - I wasn't able to fix it for two days.
My database collation is SQL_Latin1_General_CP1_CI_AS.
It has a table with a column called MyCol1, which is nvarchar.
This query fails to match an exact value that exists:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = 'ESKİ'
-- 0 results
Using the N'' prefix fixes it:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = N'ESKİ'
-- 1 result, found!
Why? I suppose it fails because the Latin1 code page doesn't have the capital dotted İ.
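That guess can be checked with a small Python sketch. It assumes code page 1252 stands in for SQL_Latin1_General_CP1_CI_AS, and it uses Python's "?" replacement where SQL Server may instead pick a best-fit letter; either way the literal no longer equals the stored value.

```python
# Assumption: cp1252 models the code page of SQL_Latin1_General_CP1_CI_AS.
stored = "ESK\u0130"  # the nvarchar column value, ending in U+0130 (dotted capital I)

# Without the N prefix the literal passes through the code page first.
# errors="replace" substitutes "?" for anything the code page cannot hold.
literal_without_n = stored.encode("cp1252", errors="replace").decode("cp1252")

# With the N prefix the literal stays Unicode and is compared unchanged.
literal_with_n = "ESK\u0130"

matches_without_n = literal_without_n == stored
matches_with_n = literal_with_n == stored
print(matches_without_n, matches_with_n)
```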
1. Performance:
Assume your where clause is like this:
WHERE NAME='JON'
If the NAME column is of any type other than nvarchar or nchar, then you should not specify the N prefix. However, if the NAME column is of type nvarchar or nchar and you do not specify the N prefix, then 'JON' is treated as non-Unicode. This means the data types of the NAME column and the string 'JON' are different, so SQL Server implicitly converts one operand's type to the other. If SQL Server converts the literal's type to the column's type then there is no issue, but if it goes the other way then performance will suffer because the column's index (if available) won't be used.
2. Character set:
If the column is of type nvarchar or nchar, then always use the N prefix when specifying the character string in the WHERE criteria or in an UPDATE/INSERT statement. If you do not do this and one of the characters in your string is outside the database's code page (international characters such as ā), then the comparison will fail or the data will be corrupted.
The N'' prefix is only needed when the value is of nvarchar type.
I have a table with NVARCHAR(255) columns that contain both Thai and English text data.
In SSMS I can query the table and return all the rows easily enough. But if I then query specifically for one of the Thai results it returns no rows.
SELECT TOP 1000 [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
Returns
Province District SubDistrict Branch
อุตรดิตถ์ ลับแล ศรีพนมมาศ Northern
Bangkok Khlong Toei Khlong Tan SSS1
But this query:
SELECT [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
where [Province] LIKE 'อุตรดิตถ์'
Returns no rows.
What do I need to do to get the expected results?
The collation set is Latin1_General_CI_AS.
The data is displayed and inserted with no errors; I just can't search it.
Two problems:
The string being passed into the LIKE clause is VARCHAR due to not being prefixed with a capital "N". For example:
SELECT 'อุตรดิตถ์' AS [VARCHAR], N'อุตรดิตถ์' AS [NVARCHAR]
-- ?????????    อุตรดิตถ์
What is happening here is that when SQL Server is parsing the query batch, it needs to determine the exact type and value of all literals / constants. So it figures out that 12 is an INT and 12.0 is a NUMERIC, etc. It knows that N'ดิ' is NVARCHAR, which is an all-inclusive character set, so it takes the value as is. BUT, as noted before, 'ดิ' is VARCHAR, which is an 8-bit encoding, which means that the character set is controlled by a Code Page. For string literals and variables / parameters, the Code Page used for VARCHAR data is the Database's default Collation. If there are characters in the string that are not available on the Code Page used by the Database's default Collation, they are either converted to a "best fit" mapping, if such a mapping exists, else they become the default replacement character: ?.
Technically speaking, since the Database's default Collation controls string literals (and variables), and since there is a Code Page for "Thai" (available in Windows Collations), then it would be possible to have a VARCHAR string containing Thai characters (meaning: 'ดิ', without the "N" prefix, would work). But that would require changing the Database's default Collation, and that is A LOT more work than simply prefixing the string literal with "N".
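The code page effect described above can be sketched in Python. This is an illustration under assumptions, not SQL Server's actual conversion routine: cp1252 stands in for the Latin1_General code page, cp874 for the Windows Thai code page, and Python's "?" replacement for the default replacement character.

```python
# The Thai province name from the question, spelled out as code points.
text = "\u0e2d\u0e38\u0e15\u0e23\u0e14\u0e34\u0e15\u0e16\u0e4c"

# Assumption: cp1252 models the code page behind Latin1_General_CI_AS.
# None of the Thai characters exist there, so each one becomes "?".
as_latin1 = text.encode("cp1252", errors="replace")

# Assumption: cp874 models the Windows Thai code page.
# Every character fits in one byte and survives a round trip.
as_thai = text.encode("cp874")
round_trip = as_thai.decode("cp874")

print(as_latin1, len(as_thai), round_trip == text)
```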
For an in-depth look at this behavior, please see my two-part series:
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part A of 2: “Duck”)
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part B of 2: “Rabbit”)
You need to add the wildcard characters to both ends:
N'%อุตรดิตถ์%'
The end result will look like:
WHERE [Province] LIKE N'%อุตรดิตถ์%'
EDIT:
I just edited the question to format the "results" to be more readable. It now appears that the following might also work (since no wildcards are being used in the LIKE predicate in the question):
WHERE [Province] = N'อุตรดิตถ์'
EDIT 2:
A string (i.e. something inside of single-quotes) is VARCHAR if there is no "N" prefixed to the string literal. It doesn't matter what the destination datatype is (e.g. an NVARCHAR(255) column). The issue here is the datatype of the source data, and that source is a string literal. And unlike a string in .NET, SQL Server handles 'string' as an 8-bit encoding (VARCHAR; ASCII values 0 - 127 same across all Code Pages, Extended ASCII values 128 - 255 determined by the Code Page, and potentially 2-byte sequences for Double-Byte Character Sets) and N'string' as UTF-16 Little Endian (NVARCHAR; Unicode character set, 2-byte sequences for BMP characters 0 - 65535, two 2-byte sequences for Code Points above 65535). Using 'string' is the same as passing in a VARCHAR variable. For example:
DECLARE @ASCII VARCHAR(20);
SET @ASCII = N'อุตรดิตถ์';
SELECT @ASCII AS [ImplicitlyConverted]
-- ?????????
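The byte counts mentioned for UTF-16 Little Endian (two bytes for BMP characters, two 2-byte sequences above code point 65535) can be confirmed with a short Python sketch. The helper name is made up for illustration.

```python
def utf16_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-16 little endian."""
    return len(text.encode("utf-16-le"))


ascii_letter = utf16_bytes("A")            # U+0041, inside the BMP
thai_letter = utf16_bytes("\u0e14")        # U+0E14, inside the BMP
supplementary = utf16_bytes("\U0001F600")  # above U+FFFF, needs a surrogate pair

print(ascii_letter, thai_letter, supplementary)
```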
Could be a number of things!
First off, print out the value of the column and your query string in hex:
SELECT CONVERT(varbinary(20), Province) AS stored, CONVERT(varbinary(20), 'อุตรดิตถ์') AS query FROM allDistricsBranches;
This should give you some insight into the problem. I think the most likely cause is the ั, ิ, characters being typed in the wrong sequence. They are displayed as part of the main letter but are stored internally as separate characters.
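To see why typing order matters, here is a small Python sketch with a made-up syllable (not taken from the question's data). The two strings contain the same three code points, but the two marks are entered in a different order, so the hex dumps differ and an equality comparison fails.

```python
# Same base letter (U+0E15) followed by the same two marks, in different order.
typed_one_way = "\u0e15\u0e31\u0e49"    # letter, vowel mark U+0E31, tone mark U+0E49
typed_other_way = "\u0e15\u0e49\u0e31"  # letter, tone mark U+0E49, vowel mark U+0E31

# Hex dump of the UTF-16 little endian bytes, similar to a varbinary view.
hex_one = typed_one_way.encode("utf-16-le").hex()
hex_other = typed_other_way.encode("utf-16-le").hex()

same_characters = sorted(typed_one_way) == sorted(typed_other_way)
equal_strings = typed_one_way == typed_other_way
print(hex_one, hex_other, same_characters, equal_strings)
```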
I want to write a trigger for one of my tables which has an ntext field and, as you know, a trigger can't be written for the ntext datatype.
Now I want to replace ntext with the nvarchar datatype. The ntext maximum length is 2,147,483,647 characters, whereas the maximum for nvarchar is 4,000 characters.
What datatype can I use instead of ntext?
Or is there any way to write a trigger when I have an ntext column?
I should add that my database was designed with SQL Server 2000 and is already full of data.
You're out of luck with SQL Server 2000, but you can possibly chain together a bunch of nvarchar(4000) variables. It's a hack, but it may be the only option you have. I would also do an assessment of your data and see what the largest value is that you actually have in that column. A lot of times, columns are made in anticipation of large data, but in the end they never hold any.
In MSDN I see this:
* Important *
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
Fixed and variable-length data types for storing large non-Unicode and Unicode character and binary data. Unicode data uses the UNICODE UCS-2 character set.
It recommends nvarchar(max). You can see the details below:
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
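As a rough worked example of the storage rule quoted above (two bytes per character entered, plus 2 bytes), here is a Python sketch. The function name is hypothetical, and it assumes the data is stored as UTF-16 without compression.

```python
def nvarchar_storage_bytes(value: str) -> int:
    """Estimate storage for an nvarchar value: UTF-16 data bytes plus 2 bytes."""
    data_bytes = len(value.encode("utf-16-le"))  # two bytes per BMP character
    return data_bytes + 2                        # the extra 2 bytes from the quote


two_chars = nvarchar_storage_bytes("EU")
empty = nvarchar_storage_bytes("")
print(two_chars, empty)
```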