What is the use of writing N'' in a SQL Server query?

I am using SQL Server 2012 and I have this query:
create table t
(
id int not null,
name varchar(10)
);
select OBJECT_NAME(object_id) as table_name, type, name as index_name, type_desc
from sys.indexes
where object_id=OBJECT_ID(N'dbo.t',N'U')
What's the difference between object_id and OBJECT_ID?
And what is the use of writing N''?
The query returns the same result with or without the N.

In SQL Server, the N prefix marks a string literal as nvarchar, which stands for "national character".
From the documentation:
Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
In other words, it marks the literal as a Unicode string.

The N in N'xxx' means "national language", denoting a unicode string.
If you use it to store data in a VARCHAR column, as opposed to an NVARCHAR column, it has little use.
You can read more about it under the "Unicode strings" sub-heading on this page: Constants (Transact-SQL).
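To make the difference concrete, here is a minimal Python model (not SQL Server itself) of the byte sizes DATALENGTH would report: an N'...' literal is stored as UTF-16LE, while a plain literal uses a single-byte code page, assumed here to be cp1252 (the code page of the Latin1_General collations). It also suggests why the query above behaves the same with or without N: dbo.t is pure ASCII, so a trip through the code page loses nothing.

```python
# Minimal Python model of SQL Server string literals, not SQL Server itself.
# Assumption: the database default code page is 1252 (Latin1_General collations).
DB_CODE_PAGE = "cp1252"

def nvarchar_bytes(text: str) -> int:
    """Approximates DATALENGTH(N'text'): N literals are UTF-16LE."""
    return len(text.encode("utf-16-le"))

def varchar_bytes(text: str, code_page: str = DB_CODE_PAGE) -> int:
    """Approximates DATALENGTH('text'): one byte per character in a single-byte code page."""
    return len(text.encode(code_page, errors="replace"))

def survives_code_page(text: str, code_page: str = DB_CODE_PAGE) -> bool:
    """True if the text comes back unchanged after a trip through the code page."""
    return text.encode(code_page, errors="replace").decode(code_page) == text

for literal in ["dbo.t", "U", "Ω"]:
    print(literal, varchar_bytes(literal), nvarchar_bytes(literal), survives_code_page(literal))
```

The Greek letter is included only as a contrast: cp1252 has no slot for it, so the plain literal would not survive.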

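Q1: In this query they are two different things. object_id is the column of sys.indexes that holds the ID of the table each index belongs to, while OBJECT_ID(N'dbo.t', N'U') is a built-in function that looks up an object's ID from its name (N'U' restricts the lookup to user tables). The WHERE clause compares the column with the function's result.
Q2 is already answered here.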

Related

How to validate that UTF-8 columns actually save space?

SQL Server 2019 introduces support for the widely used UTF-8 character encoding.
I have a large table that stores sent emails. So I'd like to give this feature a try.
ALTER TABLE dbo.EmailMessages
ALTER COLUMN Body NVARCHAR(MAX) COLLATE Latin1_General_100_CI_AI_SC_UTF8;
ALTER TABLE dbo.EmailMessages REBUILD;
My concern is that I don't know how to verify size gains. It seems that popular scripts for size estimation do not properly report size in this case.
Basically, the column type must be changed to VARCHAR(MAX); only then is the data stored in the more compact UTF-8 form:
To limit the amount of changes required for the above scenarios, UTF-8 is enabled in the existing data types CHAR and VARCHAR. String data is automatically encoded to UTF-8 when creating or changing an object’s collation to a collation with the “_UTF8” suffix, for example from LATIN1_GENERAL_100_CI_AS_SC to LATIN1_GENERAL_100_CI_AS_SC_UTF8.
The size can be inspected using sp_spaceused:
EXEC sp_spaceused N'dbo.EmailMessages';
If the unused space is high, you may need to reorganize:
ALTER INDEX ALL ON dbo.EmailMessages REORGANIZE WITH (LOB_COMPACTION = ON);
In my case size was reduced by a factor of ~2 (mostly English text).
As others have already mentioned, you should use VARCHAR instead of NVARCHAR to store UTF-8 encoded text.
You can use a query like the following to compare string lengths. It assumes a table named #Data with an NVARCHAR column called String.
SELECT *
FROM #Data
CROSS APPLY (
SELECT
CONVERT(VARCHAR(MAX), String COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8) AS Utf8String
) U
CROSS APPLY (
SELECT
LEN(String) AS Length,
--LEN(Utf8String) AS Utf8Length,
DATALENGTH(String) AS NVarcharBytes,
DATALENGTH(Utf8String) AS Utf8Bytes
) L
CROSS APPLY (
SELECT
CASE WHEN Utf8Bytes < NVarcharBytes THEN 'Yes' ELSE '' END AS IsShorter,
CASE WHEN Utf8Bytes > NVarcharBytes THEN 'Yes' ELSE '' END AS IsLonger
) C
CROSS APPLY (
SELECT
CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), String), 1) AS NVarcharHex,
CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), Utf8String), 1) AS Utf8Hex
) H
You can replace FROM #Data with something like FROM (SELECT Email AS String FROM YourTable) D to query your specific data. Replace SELECT * with SELECT SUM(NVarcharBytes) AS NVarcharBytes, SUM(Utf8Bytes) AS Utf8Bytes to get totals.
See this db<>fiddle.
See also: Storage differences between UTF-8 and UTF-16.
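As a rough cross-check of what the query above measures, this minimal Python sketch compares UTF-16 byte counts (what DATALENGTH reports for NVARCHAR) with UTF-8 byte counts (what it reports for VARCHAR under a _UTF8 collation). The sample strings are illustrative assumptions, not data from the question; they show why mostly-English text roughly halves in size while some scripts grow.

```python
# Python model of DATALENGTH for NVARCHAR (UTF-16) vs VARCHAR with a _UTF8 collation.
# The sample strings below are illustrative assumptions.

def utf16_bytes(text: str) -> int:
    return len(text.encode("utf-16-le"))

def utf8_bytes(text: str) -> int:
    return len(text.encode("utf-8"))

samples = {
    "english": "Hello, world",  # ASCII: 1 byte/char in UTF-8, 2 in UTF-16
    "czech": "Čáslavská",       # mix of 1-byte and 2-byte UTF-8 characters
    "chinese": "中文",           # 3 bytes/char in UTF-8, 2 in UTF-16
}

for name, text in samples.items():
    u16, u8 = utf16_bytes(text), utf8_bytes(text)
    verdict = "shorter" if u8 < u16 else "longer" if u8 > u16 else "the same"
    print(f"{name:8} UTF-16={u16:3} UTF-8={u8:3} -> UTF-8 is {verdict}")
```

Summing these per-row values is the same idea as replacing SELECT * with SUM(NVarcharBytes) and SUM(Utf8Bytes).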

Central european characters in SQL

I have an issue. I have data stored in SQL Server with Central European characters like "č", "ř", "ž", etc. The database has the "Czech_CI_AS" collation, which should accept these characters. But when I try to select, for example, a street name containing these characters like this:
SELECT *
FROM Street where Name = 'Čáslavská'
It returns nothing.
When I remove the "č", it returns what I need:
SELECT *
FROM Street where Name like '%áslavská'
The column is of type nvarchar, but I cannot use the N prefix before my string because external applications read this table and their selects are generated automatically.
Is there any solution? Or have I got something wrong?
Thanks for any help
@YuriyTsarkov really deserves the credit here. To elaborate on his answer:
From MSDN:
Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
Example
-- Storing Čáslavská in two vars, with and without N prefix.
DECLARE @Test_001 NVARCHAR(255) = 'Čáslavská' COLLATE Czech_CI_AS;
DECLARE @Test_002 NVARCHAR(255) = N'Čáslavská' COLLATE Czech_CI_AS;
-- Test output.
SELECT
@Test_001 AS T1,
@Test_002 AS T2
;
Returns
T1 T2
Cáslavská Čáslavská
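You need to update the code of all your external applications to use N-prefixed strings in their selects, or you need to change the collation to the one used by the external applications. Keep in mind that a string without N is converted to the default code page of the database, so it is the database's default collation that decides whether "Č" survives. Changing the collation may cause some data loss.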
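The effect can be modelled in a few lines of Python (a sketch, not SQL Server itself). It assumes the database default collation uses code page 1252, while Czech_CI_AS uses code page 1250. Python substitutes "?" for unmappable characters, whereas SQL Server applies a "best fit" mapping (hence "Cáslavská" above); either way the literal no longer equals the stored value.

```python
# Model of a literal without N: it is first converted to the code page of the
# database's default collation. Assumption: the default is cp1252, while
# Czech_CI_AS corresponds to cp1250.

def as_varchar_literal(text: str, code_page: str) -> str:
    """Round-trip text through a single-byte code page, replacing unmappable characters."""
    return text.encode(code_page, errors="replace").decode(code_page)

stored = "Čáslavská"  # value held in the nvarchar column

latin1_literal = as_varchar_literal("Čáslavská", "cp1252")  # '?áslavská' in Python
czech_literal = as_varchar_literal("Čáslavská", "cp1250")   # 'Čáslavská'

print(latin1_literal == stored)  # False: the Č is lost, so the WHERE finds nothing
print(czech_literal == stored)   # True: code page 1250 contains Č
```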

Why can I store a Ukrainian string in a varchar column?

I was a little surprised that I was able to store a Ukrainian string in a varchar column.
My table is:
create table delete_collation
(
text1 varchar(100) collate SQL_Ukrainian_CP1251_CI_AS
)
and using this query I am able to insert:
insert into delete_collation
values(N'використовується для вирішення квитки')
but when I remove the N, the SELECT shows ??????.
Is it okay or am I missing something in understanding unicode and non-unicode with collate?
From MSDN:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
UPDATE:
Please see these similar questions:
What is the meaning of the prefix N in T-SQL statements?
Cyrillic symbols in SQL code are not correctly after insert
sql server 2012 express do not understand Russian letters
To expand on MegaTron's answer:
Using collate SQL_Ukrainian_CP1251_CI_AS, SQL Server is able to store Ukrainian characters in a varchar column by using code page 1251.
However, when you specify a string without the N prefix, that string is converted to the default code page of the database before it reaches the column, and that is why you see ??????.
So it is completely fine to use varchar and COLLATE as you do, but you must always include the N prefix when sending strings to the database, to avoid the intermediate conversion to the default (non-Ukrainian) code page.
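The two insert paths can be sketched in Python (a model, not SQL Server itself). It assumes the database default collation uses code page 1252; the column's SQL_Ukrainian_CP1251_CI_AS collation uses code page 1251. With N, the text is converted once, straight into the column's code page; without N, it first passes through the default code page, where every Cyrillic letter becomes "?".

```python
# Model of inserting into a varchar column collated SQL_Ukrainian_CP1251_CI_AS.
# Assumption: the database default collation uses code page 1252.
DB_DEFAULT_CP = "cp1252"  # assumed database default code page
COLUMN_CP = "cp1251"      # code page of SQL_Ukrainian_CP1251_CI_AS

def to_code_page(text: str, code_page: str) -> str:
    return text.encode(code_page, errors="replace").decode(code_page)

def insert_with_n(text: str) -> str:
    # N'...' is nvarchar: converted only once, into the column's code page.
    return to_code_page(text, COLUMN_CP)

def insert_without_n(text: str) -> str:
    # '...' is varchar in the database default code page first,
    # and only then converted to the column's code page.
    return to_code_page(to_code_page(text, DB_DEFAULT_CP), COLUMN_CP)

value = "використовується для вирішення квитки"
print(insert_with_n(value))     # unchanged
print(insert_without_n(value))  # only '?' and spaces remain
```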

SQL Server string comparison: nvarchar vs. varchar [duplicate]

I have an nvarchar(50) column in a SQL Server table and data like this:
123abc
234abc
456abc
My query:
select *
from table
where col like '%abc'
Expected result : all rows should be returned
Actual result: No rows are returned
Works fine if the column is varchar but returns no rows if the type is nvarchar.
Any ideas?
You probably have spaces at the end of your data. Take a look at this example.
Declare @Temp Table(col nvarchar(50))
Insert Into @Temp(col) Values(N'123abc')
Insert Into @Temp(col) Values(N'456abc ')
Select * From @Temp Where Col Like '%abc'
When you run the code above, you will only get the 123 row because the 456 row has a space at the end of it.
When you run the code shown below, you will get the data you expect.
Declare @Temp Table(col nvarchar(50))
Insert Into @Temp(col) Values(N'123abc')
Insert Into @Temp(col) Values(N'456abc ')
Select * From @Temp Where rtrim(Col) Like '%abc'
According to the LIKE documentation in Books Online:
http://msdn.microsoft.com/en-us/library/ms179859.aspx
Pattern Matching by Using LIKE
LIKE supports ASCII pattern matching and Unicode pattern matching. When all arguments (match_expression, pattern, and escape_character, if present) are ASCII character data types, ASCII pattern matching is performed. If any one of the arguments are of Unicode data type, all arguments are converted to Unicode and Unicode pattern matching is performed. When you use Unicode data (nchar or nvarchar data types) with LIKE, trailing blanks are significant; however, for non-Unicode data, trailing blanks are not significant. Unicode LIKE is compatible with the ISO standard. ASCII LIKE is compatible with earlier versions of SQL Server.
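The rule quoted above can be modelled in Python. This is a simplified sketch, not SQL Server's implementation: it supports only the % and _ wildcards, ignores collation and case, and models "trailing blanks are not significant" for non-Unicode LIKE by stripping trailing spaces from the match expression.

```python
import re

def like(value: str, pattern: str, unicode: bool) -> bool:
    """Simplified model of T-SQL LIKE with only the % and _ wildcards.

    Unicode LIKE (any nchar/nvarchar argument): trailing blanks are significant.
    Non-Unicode LIKE: trailing blanks are ignored (modelled by stripping them).
    """
    if not unicode:
        value = value.rstrip(" ")
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

rows = ["123abc", "456abc "]
print([r for r in rows if like(r, "%abc", unicode=True)])   # ['123abc']
print([r for r in rows if like(r, "%abc", unicode=False)])  # ['123abc', '456abc ']
```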
For an nvarchar column, you can use a Unicode pattern like this:
select * from Table where ColumnName like N'%abc%'
Are you sure there are no spaces at the end of the value? You can do this to remove the white space:
select *
from yourTable
where rtrim(yourcolumn) like '%abc'
If you don't want to use the RTRIM and the LIKE together you can also use:
Select *
From yourTable
Where charindex('abc', col) > 0
From Microsoft about using LIKE:
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.
The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.
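The padding rule for ordinary comparisons can be sketched in Python as follows (a model of the quoted rule, ignoring collation): pad the shorter string with spaces, then compare. Trailing spaces therefore stop mattering for =, while leading spaces still count. LIKE, per the quote, skips this padding step.

```python
def ansi_equal(a: str, b: str) -> bool:
    """Model of '=' under ANSI padding: pad the shorter string with spaces first."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

print(ansi_equal("abc", "abc "))  # True: padded to the same length
print(ansi_equal("abc", " abc"))  # False: leading spaces still count
```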
If you know the starting position of the 'abc' string then you can use SUBSTRING:
Select *
From yourTable
Where substring(Col, 4, 3) = 'abc'
Alternatively, you can use CHARINDEX and SUBSTRING together, so that you do not have to worry about white space:
select *
from yourTable
where substring(col, charindex('abc', col), 3) = 'abc'
Your query should work just fine, but you can also try the following, in case there are characters you cannot see after the 'abc' part:
SELECT * FROM TABLE WHERE (COL LIKE '%abc%')
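This will also work. Casting to VARCHAR switches to non-Unicode pattern matching, in which trailing blanks are not significant; give the cast an explicit length, because CAST without one defaults to 30 characters:
SELECT *
FROM "table"
WHERE CAST("col" AS VARCHAR(50)) LIKE '%abc'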
Just to document some of the ASCII vs Unicode weirdness:
-- ascii like
if 'Non rational ' like 'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like
if N'Non rational ' like N'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like, trailing space removed
if N'Non rational' like N'Non[ --]rational%' print 'like' else print 'not like'
-- unicode like, different wildcard
if N'Non rational ' like N'Non_rational%' print 'like' else print 'not like'
Produces the following:
like
not like
not like
like

Unicode characters causing issues in SQL Server 2005 string comparison

This query:
select *
from op.tag
where tag = 'fussball'
Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).
While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings which allow you to change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also causes failure on inserts of one value when the other exists due to a constraint violation. How do I turn this off?
Follow-up bonus point question. Explain why this query does not return any rows:
select 1
where 'fußball' = 'fussball'
Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:
select 1
where 'fußball' = cast('fussball' as nvarchar)
But then again, this one does not:
select 1
where cast('fußball' as varchar) = cast('fussball' as varchar)
I'm confused.
I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).
http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here's an insightful excerpt:
Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in a Danish tailoring of the UCA.
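The gap between a binary comparison and a language-sensitive one can be illustrated in Python. This is only an analogue, not the collation algorithm SQL Server uses: str.casefold() applies Unicode full case folding, which happens to map "ß" to "ss", so it stands in for a linguistic match here, while comparing the raw UTF-16 bytes behaves like a _BIN collation.

```python
# Analogy only: casefold() is Unicode case folding, not SQL Server's collation logic.
a, b = "fußball", "fussball"

binary_equal = a.encode("utf-16-le") == b.encode("utf-16-le")  # like a _BIN collation
folded_equal = a.casefold() == b.casefold()                     # 'ß' folds to 'ss'

print(binary_equal)  # False
print(folded_equal)  # True
```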
The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).
It does not with collation Latin1_General_BIN.
You can assign a table column a collation by using the COLLATE < collation > keyword after N/VARCHAR.
You can also compare strings with a specific collation using the syntax
string1 = string2 COLLATE < collation >
This isn't an answer that explains behavior, but may be relevant:
In this question, I learned that using the collation of
Latin1_General_Bin
will avoid most collation quirks.
Some helper answers - not the complete one to your question, but still maybe helpful:
If you try:
SELECT 1 WHERE N'fußball' = N'fussball'
you'll get "1" - when using the "N" to signify Unicode, the two strings are considered the same - why that's the case, I don't know (yet).
To find the default collation for a server, use
SELECT SERVERPROPERTY('Collation')
To find the collation of a given column in a database, use this query:
SELECT
name 'Column Name',
OBJECT_NAME(object_id) 'Table Name',
collation_name
FROM sys.columns
WHERE object_ID = object_ID('your-table-name')
AND name = 'your-column-name'
Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:
select 1 where 'fußball' = cast('fussball' as nvarchar)
Here you're dealing with the SQL Server data type precedence rules, as stated in Data Type Precedence. Comparisons are always done using the higher-precedence type:
When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.
Since nvarchar has a higher precedence than varchar, the comparison in your example occurs using the nvarchar type, so it's really exactly the same as select 1 where N'fußball' = N'fussball' (i.e. using Unicode types). I hope this also makes it clear why your last case doesn't return any rows: both operands are varchar, so no conversion to nvarchar takes place.
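The precedence rule for the character types can be sketched in Python. The list below is an excerpt of SQL Server's data type precedence order (highest first) covering only the four character types; the full list also includes numeric, date/time, and binary types.

```python
# Excerpt of SQL Server's data type precedence, highest first (character types only).
PRECEDENCE = ["nvarchar", "nchar", "varchar", "char"]

def comparison_type(left: str, right: str) -> str:
    """Return the type both operands are converted to before they are compared."""
    return min(left, right, key=PRECEDENCE.index)

print(comparison_type("varchar", "nvarchar"))  # nvarchar
print(comparison_type("varchar", "varchar"))   # varchar
```

So 'fußball' = cast('fussball' as nvarchar) is compared as nvarchar, while the all-varchar version stays varchar.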
