base64 encode for chinese chars - sql-server

I'm using both of the following methods to base64-encode a Chinese string. The problem is that I get Pz8= as output, which decodes to ??
What's wrong with this and how can I fix it?
Method 1
CREATE FUNCTION [dbo].[base64Encode] (@input VARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @output NVARCHAR(MAX),
@bits VARBINARY(3),
@pos INT
SET @pos = 1
SET @output = ''
WHILE @pos <= LEN(@input)
BEGIN
SET @bits = CONVERT(VARBINARY(3), SUBSTRING(@input, @pos, 3))
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 1, 1) / 4 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 1, 1) % 4 * 16 + SUBSTRING(@bits, 2, 1) / 16 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 2, 1) % 16 * 4 + SUBSTRING(@bits, 3, 1) / 64 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 3, 1) % 64 + 1, 1)
SET @pos = @pos + 3
END
RETURN (LEFT(@output, LEN(@output) - 3 + LEN(@bits)) + REPLICATE('=', 3 - LEN(@bits)))
END
SELECT [dbo].[base64Encode]('你好')
Method 2
SELECT CAST('你好' as varbinary(max)) FOR XML PATH(''), BINARY BASE64

You are missing the N to mark a string literal as unicode:
SELECT N'你好' AS unicode
,'你好' AS ASCII
Try this to get a base64 out of your Chinese characters and vice versa:
SELECT (SELECT CAST(N'你好' AS VARBINARY(MAX)) FOR XML PATH(''),TYPE).value(N'.','nvarchar(max)');
You get this base64 result: YE99WQ==
This is the way to re-convert the base64 to the original value
SELECT CAST(CAST('<x>' + 'YE99WQ==' + '</x>' AS XML).value('.','varbinary(max)') AS NVARCHAR(MAX));
UPDATE Some words about the re-encoding
base64 does not encode a string value, but the binary pattern a system uses to keep that string in memory (this is valid for any data type actually). The bit pattern of a string differs with UTF-8, UTF-16, ASCII whatever... And even worse there is BE and LE.
The steps to get base64 are:
Get the bit pattern of my value (a string, a date, a picture, any value actually)
compute the base64 for this bit pattern
The steps for the re-encoding are
Compute the original bit pattern which is hidden within the base64
Interpret the bit pattern as the original value
The very last step might bring up confusion... You have to know exactly which binary representation a system uses. You have to use exactly the same data type with exactly the same interpretation to get the values back.
With strings one has to know, that SQL-Server works with a very limited choice natively.
There is NVARCHAR (NCHAR), which is 2-byte encoded unicode in UCS-2 flavour (almost the same as utf-16)
And there is VARCHAR (CHAR), which is 1-byte encoded extended ASCII. All non-latin characters are bound to a code page within the connected collation. But this is not UTF-8!
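The byte-level reasoning above can be cross-checked outside SQL Server. The following is a minimal Python sketch, not part of the original answer; it assumes NVARCHAR data is laid out as UTF-16 little-endian and that the non-Unicode literal had already degraded to two question marks before encoding.

```python
import base64

text = "你好"

# NVARCHAR keeps UTF-16 little-endian code units: 60 4F 7D 59 for this string.
utf16_bytes = text.encode("utf-16-le")
encoded = base64.b64encode(utf16_bytes).decode("ascii")
print(encoded)  # YE99WQ==

# Assumption: without the N prefix the characters were replaced by '?'
# before any encoding happened, so only the bytes 3F 3F get encoded.
lossy_bytes = text.encode("ascii", errors="replace")
lossy_encoded = base64.b64encode(lossy_bytes).decode("ascii")
print(lossy_encoded)  # Pz8=

# Decoding only round-trips when the bytes are interpreted with the
# same encoding that produced them.
decoded = base64.b64decode(encoded).decode("utf-16-le")
print(decoded)  # 你好
```

The same base64 text decoded as UTF-8 or as a single-byte code page would yield different characters, which is the point about knowing the exact binary representation.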

Related

My string comparison doesn't work with hidden ascii characters

I'm trying to compare the concatenation of two strings like this:
SELECT TestChemicalName, ResultChemicalName,
CASE
WHEN LAB_TestChemicalName + LAB_ResultChemicalName = TestChemicalName + ResultChemicalName THEN NULL
WHEN LAB_TestChemicalName + LAB_ResultChemicalName <> TestChemicalName + ResultChemicalName THEN LAB_TestChemicalName + ' ' + LAB_ResultChemicalName
ELSE NULL
END AS FinalElementName
FROM dbo.chemicalTraceTesting
If LAB_TestChemicalName + LAB_ResultChemicalName is the same/equals TestChemicalName + ResultChemicalName, then I want to return NULL.
However, if they are not equal, I want to return it as LAB_TestChemicalName + ' ' + LAB_ResultChemicalName
90% of the time this works, but if there are hidden ascii encoding symbols, like if someone just did a copy and paste from HTML, or Word or Excel, it will sometimes have strange characters. Then my query above won't catch that.
Is there a better way to compare two strings?
Thanks!
With bad data you're never going to get a reliable solution. The best you can do is some heuristic that is good enough most of the time.
What you need to do is compute a hash of the string that has the property that if two strings have the same hash then you consider them to be equal.
Maybe something like.
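CREATE FUNCTION Slap(@source nvarchar(max)) RETURNS varchar(max)
AS
BEGIN
DECLARE @hash varchar(max)
;
WITH cteC AS (
-- SUBSTRING is 1-based, so the walk starts at position 1
SELECT 1 AS I, SUBSTRING(@source, 1, 1) AS C
UNION ALL
SELECT I + 1, SUBSTRING(@source, I + 1, 1) AS C FROM cteC WHERE I < LEN(@source)
)
-- keep only printable ASCII characters, in their original order
-- (UNICODE is used because ASCII maps unrepresentable characters to '?')
SELECT @hash = STRING_AGG(C, '') WITHIN GROUP (ORDER BY I)
FROM cteC
WHERE UNICODE(C) >= 32
AND UNICODE(C) <= 126
RETURN @hash
END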
This is likely to be very slow.
And it'll fail on long strings.
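As an illustration of the same heuristic outside the database, here is a small Python sketch. The function names and the sample strings are hypothetical; it simply drops everything outside the printable ASCII range 32..126 before comparing.

```python
def printable_ascii(text: str) -> str:
    """Keep only characters whose code point is in the printable ASCII range."""
    return "".join(ch for ch in text if 32 <= ord(ch) <= 126)


def same_ignoring_hidden(a: str, b: str) -> bool:
    """Compare two strings after removing non-printable / non-ASCII characters."""
    return printable_ascii(a) == printable_ascii(b)


# A non-breaking space and a tab pasted from a document are ignored.
print(same_ignoring_hidden("Lead\u00a0\tOxide", "LeadOxide"))  # True
# Ordinary spaces (code 32) are kept, so these still differ.
print(same_ignoring_hidden("Lead Oxide", "LeadOxide"))  # False
```

Whether ordinary spaces should also be ignored depends on the data; this sketch keeps them, as the function above does.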

Is there a way in SQL server to interpret the underlying varchar(4) bits as an INT?

I have data harvested from a binary file that has been encoded as a SQL column with type varchar(4). This is not changeable. The 4 bytes used to create this varchar need to be interpreted sometimes as an int value (big endian). It would be nice if we could do this entirely inside SQL.
Printing the values in this varchar(4) column is not helpful as most of the bytes get interpreted as unprintable control characters.
I can't figure out how CAST or CONVERT can help since they seem to be tailored to converting a varchar like "0054" to int 54. Instead, I need the underlying bits to be interpreted as an int (big endian)--not the varchar characters as an int.
For example, one record prints this column as no visible characters, but STRING_ESCAPE(@value,'json')
will display
\u0000\u0000\u0000\u0007
This needs to be interpreted somehow to be the int 7
Here's a few more examples of what STRING_ESCAPE returns and what the int value should be:
\u0000\u0000\u0000\b ==> 8
\u0000\u0000\u0000\t ==> 9
\u0000\u0000\u0000\n ==> 10
\u0000\u0000\u0000\u000b ==> 11
\u0000\u0000\u0000\f ==> 12
\u0000\u0000\u0000\r ==> 13
\u0000\u0000\u0000\u000e ==> 14
\u0000\u0000\u0000\u000f ==> 15
\u0000\u0000\u0000\u0010 ==> 16
Thanks for your brain!
So, here is a table of sample data. The first row represents your main example. But you don't have any examples where any one of the first 3 characters is not character 0. So I threw in another row where this is the case.
declare @values table (value char(4))
insert @values values
(char(0) + char(0) + char(0) + char(7)),
(char(13) + char(9) + char(14) + char(8));
In the query below, I isolate each character using substring. Then I call ascii to retrieve the character code. What is not clear, however, is how you would take those integer values and combine them. I give 3 possibilities. 'Option1' concatenates them. 'Option2' sums them together. 'Option3' concatenates them like option1, but pads them first so that there is a leading '0' if it is only one digit long.
select escapedVal = string_escape(value,'json'),
ap.*,
option1 = convert(int,concat(pos1, pos2, pos3, pos4)),
option2 = pos1 + pos2 + pos3 + pos4,
option3 = convert(int,
right('00' + convert(varchar(2),pos1),2) +
right('00' + convert(varchar(2),pos2),2) +
right('00' + convert(varchar(2),pos3),2) +
right('00' + convert(varchar(2),pos4),2)
)
from @values v
cross apply (select
pos1 = ascii(substring(value,1,1)),
pos2 = ascii(substring(value,2,1)),
pos3 = ascii(substring(value,3,1)),
pos4 = ascii(substring(value,4,1))
) ap;
This produces:
escapedVal | pos1 | pos2 | pos3 | pos4 | option1 | option2 | option3
\u0000\u0000\u0000\u0007 | 0 | 0 | 0 | 7 | 7 | 7 | 7
\r\t\u000e\b | 13 | 9 | 14 | 8 | 139148 | 44 | 13091408
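CAST(CAST(@value as BINARY(4)) as INT)
The part I was missing is specifying the size of binary as 4. Without an explicit size, BINARY in a CAST defaults to 30 bytes, so the value is padded on the right with zeros and the conversion to INT reads only the trailing four bytes, which is why it always casts to 0!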
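For comparison, the big-endian interpretation the question asks for can be sketched in Python. This is only an illustration of what the four bytes mean, assuming a signed 32-bit value as with SQL Server's INT; the sample bytes are taken from the examples above.

```python
def bytes_to_int_be(raw: bytes) -> int:
    """Interpret exactly four bytes as a signed big-endian 32-bit integer."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)


print(bytes_to_int_be(bytes([0, 0, 0, 7])))     # 7
print(bytes_to_int_be(bytes([0, 0, 0, 16])))    # 16
# 13*256**3 + 9*256**2 + 14*256 + 8
print(bytes_to_int_be(bytes([13, 9, 14, 8])))   # 218697224
```

Note that the second sample row gives 218697224 here, which differs from all three concatenation and summing options in the earlier answer: each byte is weighted by a power of 256, not treated as decimal digits.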

How to convert GPS exif to geography?

I have attachments table which has GPSLatitude and GPSLongitude columns for each attachment. It's legacy code which is populating the fields and the values looks like:
GPSLatitude
50/1,5/1,1897/100
GPSLongitude
14/1,25/1,4221/100
Is there any built-in function which I can use to convert them to latitude and longitude decimal values like this:
Location Latitude
41.5803
Location Longitude
-83.9124
I can implement SQL CLR function if this can be done easier with .net also.
What is most difficult for me right now is to understand what these values represent. The legacy code is using some API with no documentation about the returned format and how to read it.
The values above are just for showing how the data is formatted. The following library is used to get the values - chestysoft like this:
IF Image.ExifValueByName("GPSLatitude") <> "" THEN GPSLatitude = Image.ExifValueByName("GPSLatitude") ELSE GPSLatitude = NULL END IF
IF Image.ExifValueByName("GPSLongitude") <> "" THEN GPSLongitude = Image.ExifValueByName("GPSLongitude") ELSE GPSLongitude = NULL END IF
I'm fairly certain you should read it as:
50/1: 50 Degrees
5/1: 5 Minutes
1897/100: 18.97 Seconds
This would put the location you've given in New York (assuming N/W), does that make sense? If you have no way to validate the output it's very difficult to make any other suggestion... See also here
In the link you provided, you can upload a picture to view the exif data. There you can test with known locations. It is also apparent that in the values you mentioned, the GPSLatitudeRef and GPSLongitudeRef are missing. You need these to change the values to a location. Do you have those values in your table? Otherwise you'll have to make (wild) assumptions.
This is by the way the standard EXIF notation for latitude/longitude; I assume there are many, many tools to convert it.
Assuming that @Cool_Br33ze is correct, and the data is in degrees, minutes and seconds, you can calculate the values you need using the following:
declare @v varchar(30) = '50/1,5/1,1897/100'
select @v Original_Value,
convert(decimal(18,4), left(@v, charindex('/', @v) - 1)) [Degrees],
convert(decimal(18,4), substring(
@v,
charindex(',', @v) + 1,
charindex('/', @v, charindex(',', @v)) - (charindex(',', @v) + 1)
) / 60.0
) [Minutes],
convert(decimal(18,4), substring(
@v,
charindex(',', @v, (charindex(',', @v) + 1)) + 1,
charindex('/', @v, charindex(',', @v, (charindex(',', @v) + 1))) - (charindex(',', @v, (charindex(',', @v) + 1)) + 1)
) / 360000.0
) [Seconds]
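It looks a bit of a mess, but it splits out the degrees, minutes and seconds (each converted to DECIMAL(18,4) degrees); all you need to do is add the three values together to get your Lat/Long value in degrees. Note that the divisors 60.0 and 360000.0 assume the denominators in the string are 1, 1 and 100, as in the sample value.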
I'd test it thoroughly before implementing it though.
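If a CLR or application-side conversion is acceptable, the arithmetic is easier to follow outside T-SQL. The sketch below is in Python and is only an illustration: the function name is hypothetical, it assumes the three comma-separated parts are degrees, minutes and seconds as rational numbers, and it takes the hemisphere reference (N/S/E/W) as a separate argument because the sample data does not include it.

```python
from fractions import Fraction


def exif_dms_to_degrees(value: str, ref: str = "N") -> float:
    """Convert an EXIF-style 'd/n,m/n,s/n' string to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(part) for part in value.split(","))
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are conventionally negative.
    if ref.upper() in ("S", "W"):
        decimal = -decimal
    return float(decimal)


print(round(exif_dms_to_degrees("50/1,5/1,1897/100", "N"), 4))   # 50.0886
print(round(exif_dms_to_degrees("14/1,25/1,4221/100", "E"), 4))  # 14.4284
```

Parsing each part as a fraction avoids hard-coding the denominators, so values such as 1897/100 and 18970/1000 give the same result.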

formatting datetime to varchar with padded zeros

I have the following field called "MaterialPrice". It is a data type of -
DECIMAL (18,2)
So a sample value is "10.88"
What I need to change it to is something like below -
0000000000000**1088**0
So the field length is 18, where the last character (on the right) is always 0, and the positions in front of the original value are padded with zeros as well.
Another example would be
501.02
would be
000000000000**50102**0
Any help would be appreciated.
Thanks
If I understand the requirement correctly, you could do the following:
DECLARE @val DECIMAL(18, 2) = 501.02
SELECT REPLICATE(0, 18 - LEN(@val)) + '**' + REPLACE(CAST(@val AS VARCHAR(50)), '.', '') + '**0'
Result: 000000000000**50102**0
I would:
Multiply by 100,
cast to string,
Measure length,
Concatenate: (17-length) "0"s, "**", the string number and "**0"
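The steps above can be sketched as follows in Python. This is an illustration only: the function name is hypothetical, and it assumes the asterisks in the question are emphasis markup rather than part of the output (the sample values are exactly 18 characters long without them) and that prices are non-negative.

```python
from decimal import Decimal


def pad_price(price: Decimal, width: int = 18) -> str:
    """Drop the decimal point, left-pad with zeros, and append a trailing 0."""
    digits = str(int(price * 100))       # 10.88 -> "1088"
    return digits.zfill(width - 1) + "0"  # pad to 17 characters, then add "0"


print(pad_price(Decimal("10.88")))   # 000000000000010880
print(pad_price(Decimal("501.02")))  # 000000000000501020
```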

Convert float into varchar in SQL Server without scientific notation

Convert float into varchar in SQL Server without scientific notation and trimming decimals.
For example:
I have the float value 1000.2324422, and then it would be converted into varchar as same 1000.2324422.
There could be any number of decimal values...the float value comes randomly.
Casting or converting to VARCHAR(MAX) or anything else did not work for me using large integers (in float fields) such as 167382981, which always came out '1.67383e+008'.
What did work was STR().
Neither str() nor cast(float as nvarchar(18)) worked for me.
What did end up working was converting to an int and then converting to an nvarchar like so:
convert(nvarchar(18),convert(bigint,float))
The STR function works nice. I had a float coming back after doing some calculations and needed to change to VARCHAR, but I was getting scientific notation randomly as well. I made this transformation after all the calculations:
ltrim(rtrim(str(someField)))
Try CAST(CAST(@value AS bigint) AS varchar)
This works:
CONVERT(VARCHAR(100), CONVERT(DECIMAL(30, 15), fieldname))
Try this:
SELECT REPLACE(RTRIM(REPLACE(REPLACE(RTRIM(REPLACE(CAST(CAST(YOUR_FLOAT_COLUMN_NAME AS DECIMAL(18,9)) AS VARCHAR(20)),'0',' ')),' ','0'),'.',' ')),' ','.') FROM YOUR_TABLE_NAME
Casting as DECIMAL will put a decimal point on every value, whether it had one before or not.
Casting as VARCHAR allows you to use the REPLACE function
First REPLACE zeros with spaces, then RTRIM to get rid of all trailing spaces (formerly zeros), then REPLACE remaining spaces with zeros.
Then do the same for the period to get rid of it for numbers with no decimal values.
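The idea behind the nested REPLACE/RTRIM calls (format with a fixed number of decimals, then strip trailing zeros and a dangling decimal point) can be shown compactly in Python. This is a sketch of the technique, not of SQL Server's behaviour; the function name is hypothetical, and the nine decimal places mirror the DECIMAL(18,9) used above.

```python
def fixed_without_trailing_zeros(value: float, decimals: int = 9) -> str:
    """Format without scientific notation, then trim trailing zeros and '.'."""
    text = f"{value:.{decimals}f}"          # e.g. "1000.232442200"
    return text.rstrip("0").rstrip(".")     # zeros first, then a leftover '.'


print(fixed_without_trailing_zeros(1000.2324422))  # 1000.2324422
print(fixed_without_trailing_zeros(167382981.0))   # 167382981
print(fixed_without_trailing_zeros(100.0))         # 100
```

Stripping zeros before the decimal point matters: because the fixed format always contains a point, the zero-stripping stops there and whole numbers such as 100 keep their significant zeros.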
This is not relevant to this particular case because of the decimals, but may help people who google the heading.
Integer fields convert fine to varchars, but floats change to scientific notation. A very quick way to change a float quickly if you do not have decimals is therefore to change the field first to an integer and then change it to a varchar.
Below is an example where we can convert float value without any scientific notation.
DECLARE @Floater AS FLOAT = 100000003.141592653
SELECT CAST(ROUND(@Floater, 0) AS VARCHAR(30))
,CONVERT(VARCHAR(100), ROUND(@Floater, 0))
,STR(@Floater)
,LEFT(FORMAT(@Floater, ''), CHARINDEX('.', FORMAT(@Floater, '')) - 1)
SET @Floater = @Floater * 10
SELECT CAST(ROUND(@Floater, 0) AS VARCHAR(30))
,CONVERT(VARCHAR(100), ROUND(@Floater, 0))
,STR(@Floater)
,LEFT(FORMAT(@Floater, ''), CHARINDEX('.', FORMAT(@Floater, '')) - 1)
SET @Floater = @Floater * 100
SELECT CAST(ROUND(@Floater, 0) AS VARCHAR(30))
,CONVERT(VARCHAR(100), ROUND(@Floater, 0))
,STR(@Floater)
,LEFT(FORMAT(@Floater, ''), CHARINDEX('.', FORMAT(@Floater, '')) - 1)
SELECT LEFT(FORMAT(@Floater, ''), CHARINDEX('.', FORMAT(@Floater, '')) - 1)
,FORMAT(@Floater, '')
In the above example, we can see that the format function is useful for us. FORMAT() function returns always nvarchar.
I have another solution, since the STR() function pads the result with leading spaces, so I use the FORMAT() function as in the following example:
SELECT ':' + STR(1000.2324422), ':' + FORMAT(1000.2324422,'##.#######'), ':' + FORMAT(1000.2324422,'##')
The result of above code would be:
: 1000 :1000.2324422 :1000
You can use this code:
STR(<Your Field>, Length, Scale)
Your field = Float field for convert
Length = Total length of your float number with Decimal point
Scale = Number of length after decimal point
For example:
SELECT STR(1234.5678912,8,3)
The result is: 1234.568
Note that the last digit is also rounded up.
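To make the roles of Length and Scale concrete, here is a rough Python imitation of how STR() lays out its result. It is an approximation written for illustration, not SQL Server's implementation: the function name is hypothetical, it only models right-justification, rounding to the requested scale, and the asterisks returned when the integer part does not fit, and it does not model a sign or decimals that only partly fit.

```python
def str_like(value: float, length: int = 10, decimals: int = 0) -> str:
    """Approximate STR(value, length, decimals) for non-negative numbers."""
    text = f"{value:.{decimals}f}"             # fixed notation, rounded
    integer_digits = len(text.split(".")[0])
    if integer_digits > length:
        return "*" * length                    # number does not fit
    return text.rjust(length)                  # pad with leading spaces


print(repr(str_like(1234.5678912, 8, 3)))      # '1234.568'
print(repr(str_like(1000.2324422)))            # '      1000'
print(repr(str_like(100000003141.59265)))      # '**********'
```

The last call shows why a default-length STR() breaks down for large values: twelve integer digits cannot fit into the default length of 10.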
You will have to test your data VERY well. This can get messy. Here is an example of results simply by multiplying the value by 10. Run this to see what happens.
On my SQL Server 2017 box, at the 3rd query I get a bunch of *********. If you CAST as BIGINT it should work every time. But if you don't and don't test enough data you could run into problems later on, so don't get sucked into thinking it will work on all of your data unless you test the maximum expected value.
Declare @Floater AS FLOAT =100000003.141592653
SELECT CAST(ROUND(@Floater,0) AS VARCHAR(30) ),
CONVERT(VARCHAR(100),ROUND(@Floater,0)),
STR(@Floater)
SET @Floater =@Floater *10
SELECT CAST(ROUND(@Floater,0) AS VARCHAR(30) ),
CONVERT(VARCHAR(100),ROUND(@Floater,0)),
STR(@Floater)
SET @Floater =@Floater *100
SELECT CAST(ROUND(@Floater,0) AS VARCHAR(30) ),
CONVERT(VARCHAR(100),ROUND(@Floater,0)),
STR(@Floater)
There are quite a few answers but none of them was complete enough to accommodate the scenario of converting FLOAT into NVARCHAR, so here we are.
This is what we ended up with:
DECLARE @f1 FLOAT = 4000000
DECLARE @f2 FLOAT = 4000000.43
SELECT TRIM('.' FROM TRIM(' 0' FROM STR(@f1, 30, 2))),
TRIM('.' FROM TRIM(' 0' FROM STR(@f2, 30, 2)))
SELECT CAST(@f1 AS NVARCHAR),
CAST(@f2 AS NVARCHAR)
Output:
------------------------------ ------------------------------
4000000 4000000.43
(1 row affected)
------------------------------ ------------------------------
4e+006 4e+006
(1 row affected)
In our scenario the FLOAT was a dollar amount, so 2 decimal places were sufficient, but you can easily increase that to suit your needs.
In addition, we needed to trim ".00" for round numbers.
Try this code
SELECT CONVERT(varchar(max), CAST(1000.2324422 AS decimal(11,2)))
Result:
1000.23
Here decimal(11,2): 11-total digits count (without the decimal point), 2 - for two digits after the decimal point
None of the previous answers worked for me. In the end I simply used this:
INSERT INTO [Destination_Table_Name]([Field_Name])
SELECT CONCAT('#',CAST([Field_Name] AS decimal(38,0))) [Field_Name]
FROM [dbo].[Source_Table_Name] WHERE ISNUMERIC([CIRCUIT_NUMBER]) = 1
INSERT INTO [Destination_Table_Name]([Field_Name])
SELECT [Field_Name]
FROM [dbo].[Source_Table_Name] WHERE ISNUMERIC([CIRCUIT_NUMBER]) <> 1
select format(convert(float,@your_column),'0.0#########')
Advantage: This solution is independent of the source datatype (float, scientific, varchar, date, etc.)
String is limited to 10 digits, and bigInt gets rid of decimal values.
This works:
Suppose
dbo.AsDesignedBites.XN1E1 = 4016519.564
For the following string:
'POLYGON(('+STR(dbo.AsDesignedBites.XN1E1, 11, 3)+'...
