How to store a string along with Syllabication in varchar column - sql-server

Is there any way to store āre exactly in a SQL Server table?
I hardcoded the same value into a varchar column, but it is saved as are. I want to store it along with the special symbols.

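Use NVARCHAR. NVARCHAR stores Unicode data, so if you need to store Unicode or multilingual data, NVARCHAR is the choice. You also need the N prefix on string literals when you insert the data. VARCHAR stores non-Unicode data in the code page of the column's collation, so a character that code page lacks is replaced with a close match (ā becomes a) or with ?.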
Refer to the sample code below:
declare @data table
(field1 nvarchar(10))
insert into @data
values
(N'āre')
select * from @data
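To see why the VARCHAR column turned āre into are, here is a rough Python model of forcing Unicode text into a single-byte code page. It assumes code page 1252 (used by collations such as SQL_Latin1_General_CP1_CI_AS) and approximates SQL Server's Windows "best-fit" mapping by dropping combining accents; the helper names are hypothetical and the real best-fit tables differ in detail.

```python
import unicodedata


def fits_cp1252(text: str) -> bool:
    """True if every character exists in code page 1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def to_varchar_cp1252(text: str) -> str:
    """Rough model of storing `text` in a VARCHAR column with a code page 1252 collation.

    Assumption: SQL Server's best-fit mapping is approximated by decomposing the
    character (NFKD) and dropping combining accents; anything still unmappable
    becomes '?'.
    """
    out = []
    for ch in text:
        if fits_cp1252(ch):
            out.append(ch)
            continue
        # e.g. 'ā' -> 'a' + U+0304 COMBINING MACRON -> keep only 'a'
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        out.append(base if base and fits_cp1252(base) else "?")
    return "".join(out)


print(to_varchar_cp1252("āre"))   # are
print(to_varchar_cp1252("漢字"))   # ??
```

Characters such as é already exist in code page 1252 and pass through unchanged; only characters outside it are degraded, which is why NVARCHAR is needed for āre.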

You need to write your string assignment with the N prefix (the N stands for "National Character"), because you have to state explicitly that you are passing a string containing Unicode characters (the same applies to NCHAR, NTEXT, etc. if you were using those).
NVARCHAR string literals are denoted by N'...', so it would be:
DECLARE @objname nvarchar(255)
set @objname=N'漢字'
select @objname
Now the output will be 漢字, exactly as it was set. Run the code above to see it.
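A literal without the N prefix is a VARCHAR literal, so it passes through the database's code page before it is assigned to the NVARCHAR variable. The Python sketch below models both paths, assuming a database whose default collation uses code page 1252; varchar_literal and nvarchar_literal are hypothetical helpers, not SQL Server APIs.

```python
def varchar_literal(text: str, codepage: str = "cp1252") -> str:
    """Model a string literal written WITHOUT the N prefix.

    The literal is first converted to the database code page (unmappable
    characters become '?'), and only then widened to NVARCHAR.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def nvarchar_literal(text: str) -> str:
    """Model a literal written WITH the N prefix: it stays UTF-16 throughout."""
    return text.encode("utf-16-le").decode("utf-16-le")


objname_without_n = varchar_literal("漢字")   # like SET @objname = '漢字'
objname_with_n = nvarchar_literal("漢字")     # like SET @objname = N'漢字'
print(objname_without_n)  # ??
print(objname_with_n)     # 漢字
```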

Related

TSQL "Illegal XML Character" When Converting Varbinary to XML

I'm trying to create a stored procedure in SQL Server 2016 that converts XML that was previously converted into Varbinary back into XML, but getting an "Illegal XML character" error when converting. I've found a workaround that seems to work, but I can't actually figure out why it works, which makes me uncomfortable.
The stored procedure takes data that was converted to binary in SSIS and inserted into a varbinary(MAX) column in a table and performs a simple
CAST(Column AS XML)
It worked fine for a long time, and I only began seeing an issue when the initial XML started containing an ® (registered trademark) symbol.
Now, when I attempt to convert the binary to XML, I get this error:
Msg 9420, Level 16, State 1, Line 23
XML parsing: line 1, character 7, illegal xml character
However, if I first convert the binary to varchar(MAX) and then convert that to XML, it seems to work fine. I don't understand what that intermediate CAST does differently from casting directly to XML. My main concern is that I don't want to add it to account for this scenario and end up with unintended consequences.
Test code:
DECLARE @foo VARBINARY(MAX)
DECLARE @bar VARCHAR(MAX)
DECLARE @nbar NVARCHAR(MAX)
--SELECT Varbinary
SET @foo = CAST( '<Test>®</Test>' AS VARBINARY(MAX))
SELECT @foo AsBinary
--select binary as varchar
SET @bar = CAST(@foo AS VARCHAR(MAX))
SELECT @bar BinaryAsVarchar -- Correct string output
--select binary as nvarchar
SET @nbar = CAST(@foo AS NVARCHAR(MAX))
SELECT @nbar BinaryAsNvarchar -- Chinese characters
--select binary as XML
SELECT TRY_CAST(@foo AS XML) BinaryAsXML -- ILLEGAL XML character
-- SELECT CONVERT(xml, @foo) BinaryAsXML --ILLEGAL XML Character
--select BinaryAsVarcharAsXML
SELECT TRY_CAST(@bar AS XML) BinaryAsVarcharAsXML -- Correct Output
--select BinaryAsNVarcharAsXML
SELECT TRY_CAST(@nbar AS XML) BinaryAsNvarcharAsXML -- Chinese Characters
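The "Chinese characters" in the NVARCHAR results come from reading single-byte text as UTF-16: every pair of bytes becomes one 2-byte character. A short Python illustration, assuming the literal was stored using code page 1252 (so ® is the single byte 0xAE):

```python
# '<Test>®</Test>' as a code page 1252 VARCHAR: 14 characters, 14 bytes.
raw = "<Test>®</Test>".encode("cp1252")
print(raw.hex(" ").upper())   # 3C 54 65 73 74 3E AE 3C 2F 54 65 73 74 3E

# CAST(@foo AS NVARCHAR(MAX)) reinterprets those bytes as UTF-16 little-endian,
# so each pair of bytes becomes a single character.
as_nvarchar = raw.decode("utf-16-le")
print(len(as_nvarchar))                      # 7
print([hex(ord(c)) for c in as_nvarchar])    # every code point lands in a CJK block
```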
There are several things to know:
SQL Server is rather limited when it comes to character encodings. There is VARCHAR, which is 1-byte-encoded extended ASCII, and NVARCHAR, which is UCS-2 (almost the same as UTF-16).
VARCHAR uses plain ASCII for the first 128 characters (0x00 to 0x7F) and the code page defined by the collation in use for the upper 128 (0x80 to 0xFF).
In SQL Server 2016, VARCHAR is not UTF-8. UTF-8 data happens to work in VARCHAR as long as every character is encoded in 1 byte (plain ASCII), but UTF-8 encodes all other characters in 2 to 4 bytes, and each of those bytes would be read as a separate code page character.
NVARCHAR natively handles every character that UTF-16 encodes in 2 bytes (the Basic Multilingual Plane, which covers almost every character in use). Characters outside it, such as many emoji, take 4 bytes as a UTF-16 surrogate pair; NVARCHAR can store them, but unless a supplementary-character (_SC) collation is used, functions such as LEN count each pair as two characters.
XML is not stored as the XML string you see, but in an internal, hierarchically organized binary format whose text values are stored as UTF-16, like NVARCHAR.
Querying natively stored XML is fast, while any text-based storage needs an expensive parse operation every time it is used.
Storing XML as a string is bad; storing XML as a VARCHAR string is even worse.
Storing VARCHAR-string XML as VARBINARY is an accumulation of things you should not do.
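The byte counts behind these points can be checked quickly in Python. This is only a sketch: Python's utf-8 and utf-16-le codecs stand in for the encodings, with utf-16-le matching NVARCHAR's storage format.

```python
# One sample per category: ASCII, Latin-1, CJK inside the BMP, emoji outside the BMP.
samples = {"A": "ASCII", "®": "Latin-1", "漢": "CJK (BMP)", "😀": "emoji (non-BMP)"}

sizes = {}
for ch, label in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))   # NVARCHAR storage is UTF-16 little-endian
    sizes[ch] = (utf8_len, utf16_len)
    print(f"{label:<16} utf-8: {utf8_len} bytes, utf-16: {utf16_len} bytes")

# The emoji needs two 2-byte UTF-16 code units (a surrogate pair).
print(sizes["😀"][1] // 2, "code units")   # 2 code units
```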
Try this:
DECLARE @text1Byte VARCHAR(100)='<test>blah</test>';
DECLARE @text2Byte NVARCHAR(100)=N'<test>blah</test>';
SELECT CAST(@text1Byte AS VARBINARY(MAX)) AS text1Byte_Binary
,CAST(@text2Byte AS VARBINARY(MAX)) AS text2Byte_Binary
,CAST(@text1Byte AS XML) AS text1Byte_XML
,CAST(@text2Byte AS XML) AS text2Byte_XML
,CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS XML) AS text1Byte_XML_via_Binary
,CAST(CAST(@text2Byte AS VARBINARY(MAX)) AS XML) AS text2Byte_XML_via_Binary
The only difference you'll see is the many zero bytes in 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00. This is due to the 2-byte encoding of NVARCHAR (UTF-16 little-endian): for these plain ASCII characters every second byte is 0x00. With far-east characters the picture would be completely different.
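The two binary columns can be reproduced in Python, assuming the VARCHAR variable uses code page 1252 and modeling NVARCHAR as UTF-16 little-endian:

```python
text = "<test>blah</test>"

varchar_bytes = text.encode("cp1252")      # CAST(@text1Byte AS VARBINARY(MAX))
nvarchar_bytes = text.encode("utf-16-le")  # CAST(@text2Byte AS VARBINARY(MAX))

print("0x" + varchar_bytes.hex().upper())
print("0x" + nvarchar_bytes.hex().upper())

# Every second byte of the NVARCHAR version is 0x00 for these ASCII characters.
print(all(b == 0 for b in nvarchar_bytes[1::2]))   # True
```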
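Why the intermediate cast makes a difference: the cast from the variable to XML is easy, because the engine knows whether the underlying variable is VARCHAR or NVARCHAR and therefore how the text is encoded. The last two casts are different. A VARBINARY value carries no encoding information, so the XML parser has to work it out from the bytes themselves (a byte-order mark, an encoding declaration, or the byte pattern). The UTF-16 bytes of the NVARCHAR sample are recognized as such; otherwise the bytes are read as UTF-8, which only works for the VARCHAR sample because every character in it is plain ASCII.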
Now try adding your registered trademark symbol to the example. Add it first to the second variable, DECLARE @text2Byte NVARCHAR(100)=N'<test>blah®</test>';, and run the code. Then add it to the first variable and run it again.
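One way to see why ® breaks the direct cast, modeled in Python. The assumption (consistent with the "character 7" position in the error from the question) is that the XML parser reads BOM-less, non-UTF-16 binary as UTF-8. In code page 1252, ® is the single byte 0xAE, which can never appear on its own in valid UTF-8. The function name is hypothetical.

```python
def first_invalid_utf8_position(data: bytes):
    """Return the 1-based byte position of the first invalid UTF-8 byte, or None.

    For pure ASCII prefixes the byte position equals the character position,
    which is how SQL Server's error message counts.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start + 1   # err.start is 0-based


as_cp1252 = "<Test>®</Test>".encode("cp1252")   # what CAST('<Test>®</Test>' AS VARBINARY(MAX)) holds
as_utf8 = "<Test>®</Test>".encode("utf-8")      # ® becomes the two bytes C2 AE

print(first_invalid_utf8_position(as_cp1252))   # 7
print(first_invalid_utf8_position(as_utf8))     # None
```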
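What you can try:
Cast your binary to VARCHAR(MAX), then to NVARCHAR(MAX), and finally to XML (as an extra column in the SELECT above):
,CAST(CAST(CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS VARCHAR(MAX)) AS NVARCHAR(MAX)) AS XML) AS text1Byte_XML_via_Binary
This will work, but it won't be fast. It also relies on the VARCHAR step decoding the bytes with the same code page that was used to produce them, and the collation in effect decides which code page that is.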
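The VARCHAR(MAX) step simply reinterprets the bytes in the code page of the current collation, so it restores the original text only when that code page matches the one used to create the binary (in SSIS or elsewhere). A Python illustration, with code page 437 standing in for an arbitrary mismatched code page:

```python
data = "<Test>®</Test>".encode("cp1252")   # binary produced from code page 1252 text

same_codepage = data.decode("cp1252")       # like CAST(... AS VARCHAR(MAX)) under a CP1252 collation
other_codepage = data.decode("cp437")       # the same bytes under a collation using code page 437

print(same_codepage)                        # <Test>®</Test>
print(other_codepage == same_codepage)      # False: byte 0xAE maps to a different character
```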

Detecting Unicode Text in SQL Server

I am storing bodies of text in SQL Server.
Some bodies of text contain Unicode characters that will be lost when storing in a VARCHAR column within SQL Server.
As only a small portion of the stored text bodies will require an NVARCHAR column, I have decided to create 2 columns, one for VARCHAR text and the other for NVARCHAR text. This way I can save space by storing only the Unicode bodies of text in the NVARCHAR column and the rest in the VARCHAR column.
The question is: how do I detect if a body of text contains Unicode characters so that I can determine the best column to store it in?
You could either determine the 256 characters available in your collation's code page and inspect the string for any characters not in that set, or cast the string to VARCHAR and compare it to the NVARCHAR original.
If you are using code page 1252, the first approach could be done with:
DECLARE @String NVARCHAR(MAX) = N'൯'
SELECT CASE
WHEN @String LIKE '%[^' COLLATE Latin1_General_100_BIN + CHAR(0) + '-' + CHAR(255) + ']%'
THEN 'varchar not OK'
ELSE 'varchar OK'
END
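Note that the range in the first approach is the Unicode code points U+0000 to U+00FF, so a few characters that code page 1252 does contain (such as €) are reported as 'varchar not OK'. The second approach: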
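DECLARE @String NVARCHAR(MAX) = N'൯'
SELECT CASE
-- binary comparison, so a best-fit replacement (e.g. ā -> a) never counts as equal under an accent-insensitive collation
WHEN CAST(@String AS VARCHAR(MAX)) = @String COLLATE Latin1_General_100_BIN2
THEN 'varchar OK'
ELSE 'varchar not OK'
END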
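Both checks can be modeled in Python to compare their results, assuming code page 1252. Python's errors="replace" substitutes ? where SQL Server may use a best-fit character instead; either way the round trip no longer equals the input. The function names are hypothetical.

```python
def fits_cp1252_round_trip(text: str) -> bool:
    """Second approach: convert to code page 1252 and compare with the original."""
    return text.encode("cp1252", errors="replace").decode("cp1252") == text


def within_u00ff(text: str) -> bool:
    """First approach: every code point lies in the range U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in text)


for sample in ["plain text", "café", "€5", "൯"]:
    print(repr(sample), fits_cp1252_round_trip(sample), within_u00ff(sample))
```

The € sample is where the two disagree: it exists in code page 1252 (byte 0x80) but its code point is U+20AC, outside the U+0000 to U+00FF range.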
BTW: if you use row compression, you also get Unicode compression, which would largely negate the need for this (although Unicode compression does not apply to NVARCHAR(MAX) columns or to values stored off-row).

Stored procedure Inserts Hebrew characters into an NVARCHAR column, but SELECT shows "?"

When I SELECT from the table, the data that I stored shows up as question marks.
@word is a parameter in my stored procedure, and the value comes from the C# code:
string word = this.Request.Form["word"].ToString();
cmd.Parameters.Add("@word", System.Data.SqlDbType.NVarChar).Value = word;
My stored procedure is like this:
CREATE PROCEDURE ....
(
@word nvarchar(500)
...
)
Insert into rub_translate (language_id,name)
values (8 ,@word COLLATE HEBREW_CI_AS )
My database, and the column, is using the SQL_Latin1_General_CP1_CI_AS collation and I cannot change them.
Can anybody tell me how I can solve this problem just by modifying the column or the table?
In order for this to work you need to do the following:
Declare the input parameter in the app code as NVARCHAR (you have done this)
Declare the input parameter in the stored procedure as NVARCHAR (no code is shown for this)
Insert or Update a column in a table that is defined as NVARCHAR (you claim that this is the case)
When using NVARCHAR it does not matter what the default Collation of the Database is. And actually, when using NVARCHAR, it won't matter what the Collation of the column in the table is, at least not for properly inserting the characters.
Also, specifying the COLLATE keyword in the INSERT statement is unnecessary and wouldn't help anyway. If you have the stored procedure input parameter defined as VARCHAR, then the characters are already converted to ? upon coming into the stored procedure. And if the column is actually defined as VARCHAR (you haven't provided the table's DDL) then if the Collation isn't Hebrew_* then there is nothing you can do (besides change either the datatype to NVARCHAR or the Collation to a Hebrew_ one).
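The ? replacement is what a non-Unicode code page does with characters it lacks. A Python model, assuming code page 1252 for SQL_Latin1_General_CP1_CI_AS and code page 1255 for a Hebrew_* collation (the Hebrew sample text is the one used in the other answer):

```python
word = "זה טקסט."   # Hebrew sample text

# VARCHAR under SQL_Latin1_General_CP1_CI_AS => code page 1252: Hebrew letters are unmappable
print(word.encode("cp1252", errors="replace").decode("cp1252"))   # ?? ????.

# VARCHAR under a Hebrew_* collation => code page 1255: the letters survive the round trip
print(word.encode("cp1255").decode("cp1255") == word)             # True
```

With NVARCHAR end to end, no code page is involved at all, which is why the collation does not matter for storing the characters.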
If those three items listed at the top are definitely in place, then the last thing to check is the input value itself. Since this is a web app, it is possible that the encoding of the page itself is not set correctly. Set a breakpoint on the cmd.Parameters.Add("@word", System.Data.SqlDbType.NVarChar).Value = word; line and confirm that the value held in the word variable contains Hebrew characters instead of ?s.
ALSO: you should never create a string parameter without specifying the max length/size. If you omit it, SqlClient infers the size from each value, so the parameter's declared size changes from call to call and never reflects the NVARCHAR(500) defined in the stored procedure; a longer value is silently truncated ("silent" meaning that it will not cause an error, it will just truncate the value). Instead, always specify the size, matching the stored procedure's parameter. For example:
cmd.Parameters.Add("@word", System.Data.SqlDbType.NVarChar, 500).Value = word;
You could just insert it as-is, since it's Unicode, and then select it with an appropriate collation:
declare @test table([name] nvarchar(500) collate Latin1_General_CI_AS);
declare @word nvarchar(500) = N'זה טקסט.';
insert into @test ( [name] ) values ( @word );
select [t].[name] collate Hebrew_CI_AS from @test as [t]
Or you can change the collation of that one column in the table altogether. But remember that there is a drawback to having columns whose collation differs from the database's: you will need to add the COLLATE clause to queries that compare data across the different collations.

SQL Server 2012 - Dynamic SQL

Can someone please explain why the query below works? I assume the first DECLARE uses a VARCHAR that's long enough to hold the table name. But why does the second DECLARE use a VARCHAR, and why does its corresponding query need to be wrapped in 'quotes'?
USE Northwind
DECLARE @TableName VARCHAR(25)=
(Select top 1 tab.name
From Sys.tables tab
Where name not like 'dtproperties'
and name not like 'sysdiagrams'
order by tab.name asc)
DECLARE @Output VARCHAR(100) =
'SELECT COUNT(*) AS [CountOf_' + @TableName + ']
FROM [' + @TableName + ']'
EXEC(@Output)
The datatype of @TableName being VARCHAR(25) is incorrect (or at least a poor choice). The names of most objects (Tables, Views, Stored Procedures, Functions, etc.) use the sysname datatype, which is an alias for NVARCHAR(128). So no, the first DECLARE uses a datatype that is not only too short, but also unable to hold a wide range of otherwise valid Unicode characters.
The second DECLARE uses a VARCHAR(100) because it is making two possibly bad assumptions:
that there will never be any Unicode characters, and
that the names of the tables will never be more than 31 characters long (62 characters are left after you remove the rest of the characters shown in that query, and the table name is used twice)
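The length arithmetic can be checked with a short Python sketch. It assumes the query text uses a CRLF line break with no indentation (as shown above), and models assigning to VARCHAR(100), which silently cuts off anything longer. TEMPLATE and build_output are hypothetical names.

```python
# The dynamic SQL from the question, assuming CRLF line endings and no indentation.
TEMPLATE = "SELECT COUNT(*) AS [CountOf_{name}]\r\nFROM [{name}]"
LIMIT = 100  # DECLARE @Output VARCHAR(100)

fixed = len(TEMPLATE.format(name=""))   # characters that are not the table name
max_name = (LIMIT - fixed) // 2          # the name appears twice
print(fixed, max_name)                   # 38 31


def build_output(table_name: str) -> str:
    """Mimic assigning the concatenation to VARCHAR(100): the excess is cut off."""
    return TEMPLATE.format(name=table_name)[:LIMIT]


print(build_output("x" * 31).endswith("]"))   # True: exactly 100 characters
print(build_output("x" * 40).endswith("]"))   # False: the closing bracket was truncated
```

In the question's code the VARCHAR(25) variable already truncates longer names before this point, which avoids the overflow but produces a wrong table name instead.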
The query is wrapped in quotes and submitted via EXEC() (i.e. it is Dynamic SQL) because table and column names cannot be supplied as variables in a regular query.

Store such characters in SQL Server 2008 R2

I'm storing encrypted passwords in the database. It worked perfectly so far on MachineA, but now that I have moved to MachineB, the results seem to get corrupted in the table.
For example: ù9qÆæ\2 Ý-³Å¼]ó will change to ?9q??\2 ?-³?¼]? in the table.
That's the query I use:
ALTER PROC [Employees].[pRegister](@UserName NVARCHAR(50),@Password VARCHAR(150))
AS
BEGIN
DECLARE @Id UNIQUEIDENTIFIER
SET @Id = NEWID()
SET @Password = HashBytes('MD5', @Password + CONVERT(VARCHAR(50),@Id))
SELECT @Password
INSERT INTO Employees.Registry (Id,[Name],[Password]) VALUES (@Id, @UserName,@Password)
END
Collation: SQL_Latin1_General_CP1_CI_AS
ProductVersion: 10.50.1600.1
Thanks
You are mixing 2 datatypes:
password needs to be nvarchar to support non-Western European characters
literals need the N prefix
Demo:
DECLARE @pwdgood nvarchar(150), @pwdbad varchar(150)
SET @pwdgood = N'ù9qÆæ\2 Ý-³Å¼]ó'
SET @pwdbad = N'?9q??\2 ?-³?¼]?'
SELECT @pwdgood, @pwdbad
HASHBYTES returns varbinary (up to 8000 bytes; an MD5 hash is 16 bytes), so store the hash in a VARBINARY column in the table rather than in a string column.
Note: I'd also consider salting the stored password with something other than the row's ID column.
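The underlying point is that a hash is raw bytes, not text. A Python sketch of the idea, with hashlib.md5 standing in for HASHBYTES('MD5', ...) and a made-up password "secret"; Python's cp1252 codec illustrates how pushing arbitrary bytes through a code page can lose some of them, though SQL Server's exact handling of such bytes may differ.

```python
import hashlib

# HASHBYTES('MD5', ...) returns raw bytes; hashlib.md5 does the same here.
digest = hashlib.md5("secret".encode("cp1252")).digest()
print(len(digest))      # 16 -> fits a VARBINARY(16) column
print(digest.hex())     # a lossless text form, if one is ever needed

# Pushing arbitrary bytes through a code page is lossy: in Python's cp1252 codec,
# five byte values have no character and come back as '?' after a round trip.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("cp1252", errors="replace").encode("cp1252", errors="replace")
lost = [b for b, r in zip(all_bytes, round_trip) if b != r]
print([hex(b) for b in lost])   # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```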
If you want to store such characters, you need to:
use NVARCHAR as the datatype for your columns and parameters (in your sample, @Password isn't NVARCHAR, and the CONVERT you're using to build the value stored in the table isn't using NVARCHAR either)
use the N'....' syntax for indicating Unicode string literals
With those two in place, you should absolutely be able to store and retrieve any valid Unicode character.
