How to decode characters to string? - sql-server

I have extracted text data from an RTF file. It contains a string marked with language 1049 (Russian):
lang1049\''ea\''e0\''f6\''e0\''ef\''e8
I tried to decode it to a string, but instead of the text I expected I got other characters:
declare @out table (id int not null identity, string varchar(128) collate Cyrillic_General_CI_AS)
insert into @out(string) select (char(0xEA)+char(0xE0)+char(0xF6)+char(0xE0)+char(0xEF)+char(0xE8))
select * from @out
GO
The string should be 'кацапи'.
How do I do this correctly?

You can do this using NCHAR():
insert into @out(string) select NCHAR(0xEA)+NCHAR(0xE0)+NCHAR(0xF6)+NCHAR(0xE0)+NCHAR(0xEF)+NCHAR(0xE8)
Alternatively, for Unicode characters, you can use convert():
insert into @out(string) select convert(varchar(128), 0xEA)+convert(varchar(128), 0xE0)+convert(varchar(128), 0xF6)+convert(varchar(128), 0xE0)+convert(varchar(128), 0xEF)+convert(varchar(128), 0xE8)
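One thing to keep in mind: NCHAR() treats its argument as a Unicode code point, so NCHAR(0xEA) is 'ê' (U+00EA) rather than 'к'. The following is only a sketch, and it assumes the RTF bytes are Windows-1251, which is the usual ANSI code page for Russian text but is not stated in the question. In that code page the bytes 0xC0 to 0xFF are the letters U+0410 to U+044F, so adding 848 (0x350) to the byte value gives the code point. The variable names are made up for illustration.

```sql
-- Assumption: the bytes are Windows-1251.
-- Bytes 0xC0-0xFF map to U+0410-U+044F, i.e. code point = byte + 848 (0x350).
DECLARE @bytes  varbinary(128) = 0xEAE0F6E0EFE8;
DECLARE @result nvarchar(128)  = N'';
DECLARE @i int = 1;
DECLARE @b int;

WHILE @i <= DATALENGTH(@bytes)
BEGIN
    -- Read one byte and turn it into its numeric value (0-255).
    SET @b = CAST(SUBSTRING(@bytes, @i, 1) AS int);

    SET @result = @result +
        CASE WHEN @b BETWEEN 192 AND 255 THEN NCHAR(@b + 848)  -- Cyrillic letter block
             ELSE NCHAR(@b)                                     -- ASCII; bytes 0x80-0xBF are NOT handled here
        END;

    SET @i = @i + 1;
END;

SELECT @result AS decoded;  -- N'кацапи'
```

The result is nvarchar, so it can be inserted into an nvarchar column directly, or into a varchar column with a Cyrillic collation.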

Related

How to update a varbinary column with base64 string to bytes?

I want to update a varbinary column in SQL Server by converting a base64 string (an image) to bytes.
Does anyone know how to do it?
I need to have the image in bytes.
Thanks!
You can use the XML Query function xs:base64Binary() to encode and decode Base64 data for you:
When given varbinary(max) input it returns a varchar(max) result, i.e.: Base64 encoded data.
When given varchar(max) input it returns a varbinary(max) result. If the input isn't valid Base64 encoded data it returns NULL.
e.g.:
create table dbo.Demo (
ID int not null identity(1,1),
ImageData varbinary(max)
);
insert dbo.Demo (ImageData) values (null);
declare @Base64Data varchar(max) = 'LzlqLzRB';
update dbo.Demo
set ImageData = cast('' as xml).value('xs:base64Binary(sql:variable("@Base64Data"))', 'varbinary(max)')
where ID = 1;
select * from dbo.Demo;
=====
Edit: if you have already stored your base64 data into varbinary(max) you will need to cast it to varchar(max) before supplying it to xs:base64Binary(), e.g.:
create table dbo.Demo (
ID int not null identity(1,1),
ImageData varbinary(max)
);
insert dbo.Demo (ImageData) values ( cast('LzlqLzRB' as varbinary(max)) );
select * from dbo.Demo; -- base64 characters as varbinary
update Dmo
set ImageData = cast('' as xml).value('xs:base64Binary(sql:column("Base64Data"))', 'varbinary(max)')
from dbo.Demo Dmo
outer apply ( select Base64Data = cast(ImageData as varchar(max)) ) Cst
where ID = 1;
select * from dbo.Demo; -- decoded data as varbinary
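As a quick check of both directions described above, here is a small round-trip sketch on a short value. The variable names are hypothetical, and the sample bytes are simply the ASCII encoding of 'Hello'.

```sql
-- Encode: varbinary in, Base64 text out.
DECLARE @bytes varbinary(max) = 0x48656C6C6F;  -- ASCII bytes of 'Hello'
DECLARE @b64 varchar(max) =
    CAST('' AS xml).value('xs:base64Binary(sql:variable("@bytes"))', 'varchar(max)');

-- Decode: Base64 text in, varbinary out.
DECLARE @decoded varbinary(max) =
    CAST('' AS xml).value('xs:base64Binary(sql:variable("@b64"))', 'varbinary(max)');

SELECT @b64 AS encoded, @decoded AS decoded;
-- encoded = SGVsbG8=, decoded = 0x48656C6C6F
```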

T-SQL DecryptByKey returns NULL with Column Value, but not with Column Name

I have an encrypted varbinary(MAX) field in my DB called ACCT_FName_encrypt.
I can successfully decrypt this field with:
CONVERT(nvarchar(MAX), DecryptByKey(ACCT_FName_encrypt)) AS 'ACCT_FName_Denc'
But if I try to decrypt the actual value from the column, I get NULL:
CONVERT(nvarchar(MAX), DecryptByKey('0x001D25D87D3D8E49A97863ADC4958E790100000021E26DD2305384AE49EC9329EF2AF8758134F7C946EC9FE024805B8DF21472C4545D461DA9F2B7F96094C2AED09BF4A9')) AS 'ACCT_FName_Denc'
How can I get the decrypted value from the straight varbinary, without calling the field?
The value must be passed as a varbinary literal, not as a string, and the decrypted result then needs to be converted back to its original type. A full, working example:
OPEN SYMMETRIC KEY StackOverflow
DECRYPTION BY PASSWORD = 'pass123_#pass123_#'
GO
DECLARE @ColumnValue NVARCHAR(MAX);
DECLARE @EncrpytionValue VARBINARY(8000);
SET @ColumnValue = REPLICATE (N'A', 12)
SET @EncrpytionValue = ENCRYPTBYKEY( KEY_GUID('StackOverflow'), @ColumnValue )
SELECT @EncrpytionValue
SELECT CONVERT(NVARCHAR(MAX), DECRYPTBYKEY(@EncrpytionValue));
SELECT CONVERT(NVARCHAR(MAX), DECRYPTBYKEY(0x00B08017838E6C48889DD12542E4C52002000000A8C910DA1CBFFE30E446358940177F03F912EE36FACF91FA2044BE5C75C9AA69BC15E6425DE52C2A193BA13AEDA90AE2276C244E56692B75CB2D4FDEC8D596F9));
--DROP SYMMETRIC KEY StackOverflow;
and in your code it will be just:
CONVERT(nvarchar(MAX), DecryptByKey(0x001D25D87D3D8E49A97863ADC4958E790100000021E26DD2305384AE49EC9329EF2AF8758134F7C946EC9FE024805B8DF21472C4545D461DA9F2B7F96094C2AED09BF4A9)) AS 'ACCT_FName_Denc'
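Try converting the value to varbinary. Use style 1 so that the hex digits are parsed (the default style would store the characters of the string instead), and give an explicit length so the value is not truncated to 30 bytes:
select convert(varbinary(max), '0x001D25D87D3D8E49A97863ADC4958E790100000021E26DD2305384AE49EC9329EF2AF8758134F7C946EC9FE024805B8DF21472C4545D461DA9F2B7F96094C2AED09BF4A9', 1)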
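Whether CONVERT parses the hex digits or just stores the characters of the string depends on its style argument. A minimal sketch on a short, made-up value shows the difference:

```sql
SELECT
    CONVERT(varbinary(max), '0x4142', 1) AS parsed_hex,       -- 0x4142 (style 1 parses the hex string)
    CONVERT(varbinary(max), '0x4142')    AS character_bytes;  -- 0x307834313432 (the characters '0','x','4','1','4','2')
```

Only the first form produces bytes that DecryptByKey could recognise as ciphertext.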

Convert utf-8 encoded varbinary(max) data to nvarchar(max) string

Is there a simple way to convert a UTF-8 encoded varbinary(max) column to varchar(max) in T-SQL? Something like CONVERT(varchar(max), [MyDataColumn])? Ideally the solution would not need custom functions.
Currently I convert the data on the client side, but the downside is that filtering and sorting are not as efficient as they would be on the server.
XML trick
The following solution should work for any encoding.
There is a tricky way of doing exactly what the OP asks. Edit: I found the same method discussed on SO (SQL - UTF-8 to varchar/nvarchar Encoding issue)
The process goes like this:
SELECT
CAST(
'<?xml version=''1.0'' encoding=''utf-8''?><![CDATA[' --start CDATA
+ REPLACE(
LB.LongBinary,
']]>', --we need only to escape ]]>, which ends CDATA section
']]]]><![CDATA[>' --we simply split it into two CDATA sections
) + ']]>' AS XML --finish CDATA
).value('.', 'nvarchar(max)')
Why it works: varbinary and varchar are the same string of bits; only the interpretation differs. The resulting XML therefore really is a UTF-8 encoded byte stream, and the XML parser is then able to reconstruct the correct UTF-8 encoded characters.
BEWARE the 'nvarchar(max)' in the value function. If you used varchar, it would destroy multi-byte characters (depending on your collation).
BEWARE 2: XML cannot handle some characters, e.g. 0x2. When your string contains such characters, this trick will fail.
Database trick (SQL Server 2019 and newer)
This is simple. Create another database with a UTF8 collation as its default. In that database, create a function that converts VARBINARY to VARCHAR. The returned VARCHAR will have the UTF8 collation of that database.
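A rough sketch of what that setup might look like. The database name FUNCTIONS_ONLY, the function name dbo.VarBinaryToUTF8 and the collation are taken from the benchmark further down; the function body is an assumption, since it is not shown anywhere in this answer.

```sql
-- Requires SQL Server 2019 or newer (UTF8 collations).
CREATE DATABASE FUNCTIONS_ONLY COLLATE Czech_100_CI_AI_SC_UTF8;
GO
USE FUNCTIONS_ONLY;
GO
-- Assumed body: a plain CAST, whose result picks up the database's default (UTF8) collation.
CREATE FUNCTION dbo.VarBinaryToUTF8 (@bin varbinary(max))
RETURNS varchar(max)
AS
BEGIN
    RETURN CAST(@bin AS varchar(max));
END;
GO
-- Call it from any database using the three-part name.
SELECT FUNCTIONS_ONLY.dbo.VarBinaryToUTF8(0xC5BD6C75C5A5) AS converted;  -- input: UTF-8 bytes of 'Žluť'
```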
Insert trick (SQL Server 2019 and newer)
This is another simple trick. Create a table with one VARCHAR column that uses a ...UTF8 collation. Insert the VARBINARY data into this table. It will be saved correctly as UTF8 VARCHAR. It is sad that memory-optimized tables cannot use UTF8 collations...
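A minimal sketch of the insert trick, assuming SQL Server 2019 or newer. The table and column names are made up for illustration, and the collation is the one used elsewhere in this answer.

```sql
DROP TABLE IF EXISTS #converted;
CREATE TABLE #converted (txt varchar(max) COLLATE Czech_100_CI_AI_SC_UTF8);

-- The varbinary value is implicitly converted to the column's type on insert,
-- so the bytes end up stored in a UTF8-collated varchar column.
INSERT INTO #converted (txt)
SELECT 0xC5BD6C75C5A5;  -- UTF-8 bytes of 'Žluť'

SELECT txt, CAST(txt AS nvarchar(max)) AS as_nvarchar
FROM #converted;
```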
Alter table trick (SQL Server 2019 and newer)
(don't use this, it is unnecessary, see the Insert trick)
I was trying to come up with an approach using SQL Server 2019's UTF8 collations, and so far I have found one possible method that should be faster than the XML trick above.
Create temporary table with varbinary column.
Insert varbinary values into the table
Alter table alter column to varchar with utf8 collation
drop table if exists
#bin,
#utf8;
create table #utf8 (UTF8 VARCHAR(MAX) COLLATE Czech_100_CI_AI_SC_UTF8);
create table #bin (BIN VARBINARY(MAX));
insert into #utf8 (UTF8) values ('Žluťoučký kůň říčně pěl ďábelské ódy za svitu měsíce.');
insert into #bin (BIN) select CAST(UTF8 AS varbinary(max)) from #utf8;
select * from #utf8; --here you can see the utf8 string is stored correctly and that
select BIN, CAST(BIN AS VARCHAR(MAX)) from #bin; --utf8 binary is converted into gibberish
alter table #bin alter column BIN varchar(max) collate Czech_100_CI_AI_SC_UTF8;
select * from #bin; --voilà, correctly converted varchar
alter table #bin alter column BIN nvarchar(max);
select * from #bin; --finally, correctly converted nvarchar
Speed difference
The Database trick together with the Insert trick are the fastest ones.
The XML trick is slower.
The Alter table trick is stupid, don't do it. It loses out when you change lots of short texts at once (the altered table is large).
The test:
first string contains one replace for the XML trick
second string is plain ASCII with no replaces for XML trick
@TextLengthMultiplier determines the length of the converted text
@TextAmount determines how many of them will be converted at once
------------------
--TEST SETUP
--uncomment exactly one of the two @LongText declarations below
--DECLARE @LongText NVARCHAR(MAX) = N'český jazyk, Tiếng Việt, русский язык, 漢語, ]]>';
--DECLARE @LongText NVARCHAR(MAX) = N'JUST ASCII, for LOLZ------------------------------------------------------';
DECLARE
@TextLengthMultiplier INTEGER = 100000,
@TextAmount INTEGER = 10;
---------------------
-- TECHNICALITIES
DECLARE
@StartCDATA DATETIME2(7), @EndCDATA DATETIME2(7),
@StartTable DATETIME2(7), @EndTable DATETIME2(7),
@StartDB DATETIME2(7), @EndDB DATETIME2(7),
@StartInsert DATETIME2(7), @EndInsert DATETIME2(7);
drop table if exists
#longTexts,
#longBinaries,
#CDATAConverts,
#DBConverts,
#INsertConverts;
CREATE TABLE #longTexts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #longBinaries (LongBinary VARBINARY(MAX) NOT NULL);
CREATE TABLE #CDATAConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #DBConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #InsertConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
insert into #longTexts --make the long text longer
(LongText)
select
REPLICATE(@LongText, @TextLengthMultiplier)
from
TESTES.dbo.Numbers --use while if you don't have number table
WHERE
Number BETWEEN 1 AND @TextAmount; --make more of them
insert into #longBinaries (LongBinary) select CAST(LongText AS varbinary(max)) from #longTexts;
--sanity check...
SELECT TOP(1) * FROM #longTexts;
------------------------------
--MEASURE CDATA--
SET @StartCDATA = SYSDATETIME();
INSERT INTO #CDATAConverts
(
LongText
)
SELECT
CAST(
'<?xml version=''1.0'' encoding=''utf-8''?><![CDATA['
+ REPLACE(
LB.LongBinary,
']]>',
']]]]><![CDATA[>'
) + ']]>' AS XML
).value('.', 'Nvarchar(max)')
FROM
#longBinaries AS LB;
SET @EndCDATA = SYSDATETIME();
--------------------------------------------
--MEASURE ALTER TABLE--
SET @StartTable = SYSDATETIME();
DROP TABLE IF EXISTS #AlterConverts;
CREATE TABLE #AlterConverts (UTF8 VARBINARY(MAX));
INSERT INTO #AlterConverts
(
UTF8
)
SELECT
LB.LongBinary
FROM
#longBinaries AS LB;
ALTER TABLE #AlterConverts ALTER COLUMN UTF8 VARCHAR(MAX) COLLATE Czech_100_CI_AI_SC_UTF8;
--ALTER TABLE #AlterConverts ALTER COLUMN UTF8 NVARCHAR(MAX);
SET @EndTable = SYSDATETIME();
--------------------------------------------
--MEASURE DB--
SET @StartDB = SYSDATETIME();
INSERT INTO #DBConverts
(
LongText
)
SELECT
FUNCTIONS_ONLY.dbo.VarBinaryToUTF8(LB.LongBinary)
FROM
#longBinaries AS LB;
SET @EndDB = SYSDATETIME();
--------------------------------------------
--MEASURE Insert--
SET @StartInsert = SYSDATETIME();
INSERT INTO #INsertConverts
(
LongText
)
SELECT
LB.LongBinary
FROM
#longBinaries AS LB;
SET @EndInsert = SYSDATETIME();
--------------------------------------------
-- RESULTS
SELECT
DATEDIFF(MILLISECOND, @StartCDATA, @EndCDATA) AS CDATA_MS,
DATEDIFF(MILLISECOND, @StartTable, @EndTable) AS ALTER_MS,
DATEDIFF(MILLISECOND, @StartDB, @EndDB) AS DB_MS,
DATEDIFF(MILLISECOND, @StartInsert, @EndInsert) AS Insert_MS;
SELECT TOP(1) '#CDATAConverts ', * FROM #CDATAConverts ;
SELECT TOP(1) '#DBConverts ', * FROM #DBConverts ;
SELECT TOP(1) '#INsertConverts', * FROM #INsertConverts;
SELECT TOP(1) '#AlterConverts ', * FROM #AlterConverts ;
SQL Server does not know UTF-8 (at least not in any version you can use productively). There is limited support starting with v2014 SP2 (and some details about the supported versions) when reading a UTF-8 encoded file from disk via BCP (the same applies to writing content to disk).
Important to know:
VARCHAR(x) is not utf-8. It is 1-byte-encoded extended ASCII, using a codepage (living in the collation) as character map.
NVARCHAR(x) is not utf-16 (but very close to it, it's ucs-2). This is a 2-byte-encoded string covering almost any known characters (but exceptions exist).
utf-8 will use 1 byte for plain Latin characters, but 2 or even more bytes to encode characters from other scripts.
A VARBINARY(x) will hold the utf-8 as a meaningless chain of bytes.
A simple CAST or CONVERT will not work: VARCHAR will take each single byte as a character. For sure this is not the result you would expect. NVARCHAR would take each chunk of 2 bytes as one character. Again not the thing you need.
You might try to write this out to a file and read it back with BCP (v2014 SP2 or higher). But the better chance I see for you is a CLR function.
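To make the point about CAST concrete, here is a small sketch using the two-byte UTF-8 sequence for 'Ž' (U+017D). It assumes the database's default collation uses a single-byte code page such as 1252; under a UTF8 or double-byte collation the varchar count would differ.

```sql
-- 0xC5BD is the UTF-8 encoding of the single character 'Ž' (U+017D).
DECLARE @utf8 varbinary(10) = 0xC5BD;

SELECT
    LEN(CAST(@utf8 AS varchar(10)))      AS varchar_chars,        -- 2: each byte becomes its own character
    LEN(CAST(@utf8 AS nvarchar(10)))     AS nvarchar_chars,       -- 1: both bytes are read as one 2-byte unit
    UNICODE(CAST(@utf8 AS nvarchar(10))) AS nvarchar_code_point;  -- 48581 (0xBDC5), not 381 (0x017D)
```

Neither cast yields 'Ž', which is why a plain CAST or CONVERT is not enough here.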
You can use the following to write a string into a varbinary field:
Encoding.Unicode.GetBytes(Item.VALUE)
Then use the following to retrieve the data as a string:
public string ReadCString(byte[] cString)
{
var nullIndex = Array.IndexOf(cString, (byte)0);
nullIndex = (nullIndex == -1) ? cString.Length : nullIndex;
return System.Text.Encoding.Unicode.GetString(cString);
}

What is the best way to concatenate three varchar fields, each of which is either null or contains a comma separated string?

We have a result set that has three fields and each of those fields is either null or contains a comma separated list of strings.
We need to combine all three into one comma separated list and eliminate duplicates.
What is the best way to do that?
I found a nice function that can split a string and return a table:
T-SQL split string
I tried to create a UDF that would take three varchar parameters and call that split string function three times, combine them into one table, and then use a FOR XML from there and return it as one comma separated string.
But SQL is complaining about having a SELECT in a function.
Here's an example using the SplitString function you referenced.
DECLARE
@X varchar(max) = 'A, C, F'
, @Y varchar(max) = null
, @Z varchar(max) = 'A, D, E, A'
;WITH SplitResults as
(
-- Note: the function does not remove leading spaces.
SELECT LTRIM([Name]) [Name] FROM SplitString(@X)
UNION
SELECT LTRIM([Name]) [Name] FROM SplitString(@Y)
UNION
SELECT LTRIM([Name]) [Name] FROM SplitString(@Z)
)
SELECT STUFF((
SELECT ', ' + [Name]
FROM SplitResults
FOR XML PATH(''), TYPE
-- Note: here we're pulling the value out in case any characters were escaped, ie. &
-- and then STUFF is removing the leading ,<space>
).value('.', 'nvarchar(max)'), 1, 2, '')
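On newer versions the same idea can be written with built-in functions instead of a custom splitter. This is a different technique from the one above, sketched under the assumption that SQL Server 2017 or later is available (STRING_SPLIT needs compatibility level 130 or higher, and STRING_AGG needs SQL Server 2017). The column alias item is made up for illustration.

```sql
DECLARE
      @X varchar(max) = 'A, C, F'
    , @Y varchar(max) = NULL
    , @Z varchar(max) = 'A, D, E, A';

-- UNION removes duplicates; STRING_SPLIT returns no rows for a NULL input.
SELECT STRING_AGG(item, ', ') WITHIN GROUP (ORDER BY item) AS Combined
FROM (
    SELECT LTRIM(value) AS item FROM STRING_SPLIT(@X, ',')
    UNION
    SELECT LTRIM(value) FROM STRING_SPLIT(@Y, ',')
    UNION
    SELECT LTRIM(value) FROM STRING_SPLIT(@Z, ',')
) AS s;
-- Combined = 'A, C, D, E, F'
```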
I would not store data as a comma separated string in a single field. Separate the string to a new table and combine it to a string again when you need to.
Finding duplicates and managing the data will also be much easier.
I've used this function before (I didn't write it, and unfortunately cannot remember where I found it) to split a string and add a key (in this case an int) to the data as a separate table, linking back to the original table's PK
CREATE FUNCTION SplitWithID (@id int, @sep VARCHAR(10), @s VARCHAR(MAX))
RETURNS @t TABLE
(
id int,
val VARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
SET @xml = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'
INSERT INTO @t(id,val)
-- VARCHAR(MAX) matches the val column, so long items are not truncated
SELECT @id, r.value('.','VARCHAR(MAX)') as Item
FROM @xml.nodes('//root/r') AS RECORDS(r)
RETURN
END
GO
Once you have the data on separate rows you can use any duplicate removal technique to clean the data before applying a primary key to the table.

Select just first line of chars up to CR/LF from a text column

Is it possible to select or substring just the first line of chars in a SQL Server text column, to then prepend as the first line of chars in another text field in another table?
If you are running SQL Server 2005 or higher:
In the LEFT command, use CHARINDEX on CHAR(13) to find the position of the first carriage return character, as in the following example:
declare @a table(id int identity(1,1) not null, lines text); --Source
declare @b table(id int identity(1,1) not null, lines text); --Target
insert into @a(lines) values ('1111111'+char(13)+char(10)+'222222')
insert into @b(lines) values ('aaaaa');
update b
set lines=LEFT(cast(a.lines as varchar(max)),CHARINDEX(char(13),cast(a.lines as varchar(max)),1)-1)+cast(b.lines as varchar(max))
from @a a
join @b b on a.id=b.id;
select * from @b;
I suggest also updating your TEXT data types to varchar(max), if possible. varchar(max) is much more robust.
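One caveat with the LEFT/CHARINDEX expression: when a value contains no carriage return, CHARINDEX returns 0 and LEFT is called with a length of -1, which raises an error. A possible workaround, sketched here on a made-up table variable with varchar(max) data, is to append a CHAR(13) to the searched string so that a match is always found:

```sql
DECLARE @t table (id int, lines varchar(max));
INSERT INTO @t (id, lines) VALUES
    (1, '1111111' + CHAR(13) + CHAR(10) + '222222'),
    (2, 'single line, no line break');

-- Appending CHAR(13) guarantees CHARINDEX finds a match, so the length is never negative.
SELECT id,
       LEFT(lines, CHARINDEX(CHAR(13), lines + CHAR(13)) - 1) AS first_line
FROM @t;
-- id 1 -> '1111111', id 2 -> 'single line, no line break'
```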
Yes, do a substring or left till the first newline of the text field.
You could easily assign this via subquery for use in insert or update statements.
SELECT ( CASE WHEN CHARINDEX(CHAR(13), action_Item.Description) = 0
THEN action_Item.Description
ELSE SUBSTRING(action_Item.Description, 0,
CHARINDEX(CHAR(13), action_Item.Description))
END ) AS [Description] FROM action_Item
Here I am selecting the first line of the "Description" field from a table called "action_Item".
DECLARE @crlf char(2);
SET @crlf = CHAR(13) + CHAR(10);
UPDATE table1
-- CHARINDEX takes the string to find first, then the string to search
SET fieldToPrepend = LEFT(table2.fieldWithCRLF, CHARINDEX(@crlf, table2.fieldWithCRLF) - 1) + table1.fieldToPrepend
FROM table1
INNER JOIN table2
ON table1.sharedKey = table2.sharedKey
WHERE CHARINDEX(@crlf, table2.fieldWithCRLF) > 0
