Varbinary bytes in Snowflake - snowflake-cloud-data-platform

If the below query is executed in MSSQL, I get the following output:
Query:
select SubString(0x003800010102000500000000, 1, 2) as A
,SubString(0x003800010102000500000000, 6, 1) as B
,CAST(CAST(SubString(0x003800010102000500000000, 9,
    CAST(SubString(0x003800010102000500000000, 6, 1) As TinyInt)) As VarChar) As Float) as D
Reading Format: 0x 00 38 00 01 01 02 00 05 00 00 00 00
Output:
A B D
0x0038 0x02 0
In MSSQL, the SUBSTRING above indexes the value byte by byte (each byte displayed as two hex characters), excluding the leading "0x".
Now I am trying to achieve the same output using Snowflake. Can someone please help, as I am having difficulty understanding how to handle the split of each byte into two hex characters when creating a function.
CREATE OR REPLACE FUNCTION getFloat1 (p1 varchar) RETURNS Float as $$
Select Case
WHEN concat(substr(p1::varchar,1, 2),substr(p1::varchar,5, 4)) <> '0x3E00'
then 0::float
ELSE 1::float
//Else substr(p1::varchar, 9, substr(p1::varchar, 6, 1)):: float End as test1 $$ ;

Snowflake doesn't have a binary literal, so there is no notation that automatically treats a value as binary the way the 0x notation does in SQL Server. You always have to cast a value to the BINARY data type to treat it as binary.
Also, there are several differences around the BINARY data type handling between SQL Server and Snowflake:
SUBSTRING in Snowflake can handle only a string, so its position and length arguments count characters of the hex string, not bytes of a binary
Snowflake supports a hex string as a representation of a binary, but the hex string must not include the 0x prefix
Snowflake has no way to convert a binary to a number directly, but it can convert a hex string by using the 'X' format string in TO_NUMBER
Based on the above differences, below is an example query achieving the same result as your SQL Server query:
select
substring('003800010102000500000000', 1, 4)::binary A,
substring('003800010102000500000000', 11, 2)::binary B,
to_number(
substring(
'003800010102000500000000',
17,
to_number(substring('003800010102000500000000', 11, 2), 'XX')*2
),
'XXXX'
)::float D
;
It returns the below result that is the same as your query:
/*
A B D
0038 02 0
*/
Explanation:
Since Snowflake doesn't have a binary literal and SUBSTRING only supports a string (VARCHAR), any binary manipulation has to be done with a VARCHAR hex string.
So, in the query, the first SUBSTRING starts from 1 and extracts 4 characters because 1 byte consists of 2 hex characters, then extracting 2 bytes is equivalent to extracting 4 hex characters.
The second SUBSTRING starts from 11 because starting from the 6th byte means ignoring 5 bytes (= 10 hex characters) and starting from the following hex character which is the first hex character of the 6th byte (10 + 1 = 11).
The third SUBSTRING is the same as the second one, starting from the 9th byte means ignoring 8 bytes (= 16 hex characters) and starting from the following hex character (16 + 1 = 17).
Also, to convert a hex string to a numeric data type, use the X character in the second "format" argument of TO_NUMBER to parse the string as a collection of hex characters. A single X character corresponds to a single hex character in the string to be parsed. That's why I used 'XX' to parse a single byte (2 hex characters) and 'XXXX' to parse 2 bytes (4 hex characters).

Related

Can I sum integer input from terminal without saving the input as a variable?

I'm trying to write code for the digital root of an extremely big number and can't save it in a variable. Is it possible to do this without one?
What you're looking to do is to repeatedly add the digits of a number until you're left with a single digit number, i.e. given 123456, you want 1 + 2 + 3 + 4 + 5 + 6 = 21 ==> 2 + 1 = 3
For a number with up to 50 million digits, the sum of those digits will be no more than 500 million which is well within the range of a 32-bit int.
Start by reading the large number as a string. Then iterate over each character in the string. For each character, verify that it's a character digit, i.e. between '0' and '9'. Convert that character to the appropriate number, then add that number to the sum.
Once you've done that, you've got the first-level sum stored in an int. Now you can loop through the digits of that number using x % 10 to get the lowest digit and x / 10 to shift over the remaining digits. Once you've exhausted the digits, repeat the process until you're left with a value less than 10.
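A minimal C sketch of that approach, assuming the number arrives on stdin as a single line of digits:
#include <stdio.h>

int main(void) {
    int c, sum = 0;
    /* read the big number character by character; nothing stores the whole number */
    while ((c = getchar()) != EOF && c != '\n') {
        if (c >= '0' && c <= '9')   /* only accept digit characters */
            sum += c - '0';         /* add this digit to the running sum */
    }
    /* repeat the digit sum on the (small) intermediate result until one digit remains */
    while (sum >= 10) {
        int next = 0;
        while (sum > 0) {
            next += sum % 10;       /* lowest digit */
            sum /= 10;              /* drop it and shift the rest */
        }
        sum = next;
    }
    printf("%d\n", sum);
    return 0;
}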

Why is there a difference in precision range widths for decimal?

As is evident from the MSDN description of decimal, certain precision ranges have the same number of storage bytes assigned to them.
What I don't understand is why the widths of those ranges differ: the range from 1 to 9 (5 storage bytes) has a width of 9, while the range from 10 to 19 (9 storage bytes) has a width of 10. The next range (13 storage bytes) has a width of 9 again, while the one after that has a width of 10 again.
Since the storage bytes increase by 4 every time, I would have expected all of the ranges to be the same width, or maybe the first one to be smaller to reserve space for the sign but equal in width from then on. Instead it goes from 9 to 10 to 9 to 10 again.
What's going on here? And if it existed, would 21 storage bytes have a precision range of 39-47, i.e. is the pattern 9-10-9-10-9-10...?
would 21 storage bytes have a precision range of 39-47
No. 2 ^ 160 = 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 - which has 49 decimal digits. So this hypothetical scenario would cater for a precision range of 39-48 (as a 20 byte integer would not be big enough to hold any 49 digit numbers larger than that)
The first byte is reserved for the sign.
01 is used for positive numbers; 00 for negative.
The remainder stores the value as an integer, i.e. 1.234 would be stored as the integer 1234 (or that multiplied by some power of 10, depending on the declared scale).
The length of the integer is either 4, 8, 12 or 16 bytes depending on the declared precision. Some 10-digit integers can be stored in 4 bytes; however, fitting the whole 10-digit range in would overflow 4 bytes, so it needs to go up to the next step.
And so on.
2^32 = 4,294,967,296 (10 digits)
2^64 = 18,446,744,073,709,551,616 (20 digits)
2^96 = 79,228,162,514,264,337,593,543,950,336 (29 digits)
2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 (39 digits)
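As a quick cross-check of those digit counts (and of the hypothetical 2^160 case above), the number of decimal digits of 2^n is floor(n * log10(2)) + 1; a small C sketch:
#include <stdio.h>
#include <math.h>

int main(void) {
    /* digits of 2^n = floor(n * log10(2)) + 1 */
    int bits[] = { 32, 64, 96, 128, 160 };
    for (int i = 0; i < 5; i++) {
        int digits = (int)floor(bits[i] * log10(2.0)) + 1;
        printf("2^%d has %d decimal digits\n", bits[i], digits);
    }
    return 0;
}
This prints 10, 20, 29, 39 and 49, matching the list above and the 49-digit value of 2^160.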
You need to use DBCC PAGE to see this; casting the column as binary does not give you the storage representation. Or use a utility like SQL Server Internals Viewer.
CREATE TABLE T(
A DECIMAL( 9,0),
B DECIMAL(19,0),
C DECIMAL(28,0) ,
D DECIMAL(38,0)
);
INSERT INTO T VALUES
(999999999, 9999999999999999999, 9999999999999999999999999999, 99999999999999999999999999999999999999),
(-999999999, -9999999999999999999, -9999999999999999999999999999, -99999999999999999999999999999999999999);
DBCC PAGE shows how the first and the second row are stored (the hex page-dump images are not reproduced here).
Note that the values after the sign byte are byte reversed: 0x3B9AC9FF = 999999999
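As a small aside (my own sketch, not from the answer), the byte reversal is just the little-endian layout of the integer part; printing the in-memory bytes of 999999999 on a little-endian machine shows the same reversed order:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t v = 999999999;                         /* 0x3B9AC9FF */
    const unsigned char *p = (const unsigned char *)&v;
    for (size_t i = 0; i < sizeof v; i++)
        printf("%02X ", p[i]);                      /* FF C9 9A 3B on little-endian */
    printf("\n");
    return 0;
}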

EF ADN specification in SIM/USIM

I am building an application to read SIM EF files. From 3G TS 31.102 I am trying to parse the EF ADN file.
According to the spec, the EF ADN record layout is (bytes, description, M/O, length):
1 to X        Alpha Identifier                      O   X bytes
X+1           Length of BCD number/SSC contents     M   1 byte
X+2           TON and NPI                           M   1 byte
X+3 to X+12   Dialling Number/SSC String            M   10 bytes
X+13          Capability/Configuration Identifier   M   1 byte
X+14          Extension1 Record Identifier          M   1 byte
I am not able to work out the coding for "Length of BCD number/SSC contents".
In the spec the coding is said to follow GSM 04.08, but I am not able to find it.
There is a good utility class for BCD operations you can test with. Assuming you are asking how to get the length of the BCD digits of an Abbreviated Dialling Number: ADN numbers can be 3-4 digits, and written as BCD they would be 2 bytes long, because each BCD digit is a 4-bit nibble. After the TON/NPI byte you should read N bytes and convert them to a decimal value:
byte[] bcds = DecToBCDArray(211);
System.out.println("BCD is "+ Hex.toHexString(bcds));
System.out.println("BCD length is "+ bcds.length);
System.out.println("To decimal "+ BCDtoString(bcds));

output of the c format for gdb x command

From the documentation at [1], an example output for the gdb x command in c format is as follows:
(gdb) x/c testArray
0xbfffef7b: 48 '0'
(gdb) x/5c testArray
0xbfffef7b: 48 '0' 49 '1' 50 '2' 51 '3' 52 '4'
What do these numbers, such as 48, 49 and 50, in the output mean?
Are they some kind of relative address?
Thank you very much!
[1] http://visualgdb.com/gdbreference/commands/x
x is displaying the memory contents at a given address using the given format.
In your specific case, x/5c displays the first 5 bytes at the memory location testArray, printing each byte as a char.
The first 5 bytes of testArray are the characters 0, 1, 2, 3, 4 (the values in single quotes); the number before each one is the decimal ASCII value of that char.
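For example, if testArray were declared as below (its contents are only assumed from the output shown), plain C reproduces the same decimal-value/character pairs that gdb prints:
#include <stdio.h>

int main(void) {
    char testArray[] = "01234";   /* assumed contents, matching the gdb output */
    for (int i = 0; i < 5; i++)
        printf("%d '%c'\n", testArray[i], testArray[i]);   /* e.g. 48 '0' */
    return 0;
}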

Decode table construction for base64

I am reading this libb64 source code for encoding and decoding base64 data.
I know the encoding procedure, but I can't figure out how the following decoding table is constructed for fast lookup when decoding base64 characters. This is the table they are using:
static const char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
Can someone explain to me how the values in this table are used for decoding purposes?
It's a shifted and limited ASCII translating table. The keys of the table are ASCII values, the values are base64 decoded values. The table is shifted such that the index 0 actually maps to the ASCII character + and any further indices map the ASCII values after +. The first entry in the table, the ASCII character +, is mapped to the base64 value 62. Then three characters are ignored (ASCII ,-.) and the next character is mapped to the base64 value 63. That next character is ASCII /.
The rest will become obvious if you look at that table and the ASCII table (note that the single -2 entry corresponds to the padding character '=').
Its usage is something like this:
int decode_base64(char ch) {
    if (ch < '+' || ch > 'z') {
        return SOME_INVALID_CH_ERROR;
    }
    /* shift range into decoding table range */
    ch -= '+';
    int base64_val = decoding[ch];
    if (base64_val < 0) {
        return SOME_INVALID_CH_ERROR;
    }
    return base64_val;
}
As you know, each byte has 8 bits, giving 256 possible combinations in base 2.
With 2 symbols you need 8 characters to represent a byte, for example '01010011'.
With base 64 it is possible to represent 64 combinations with a single character...
So, we have a base table:
A = 000000
B = 000001
C = 000010
...
If you have the word 'Man', you have the bytes:
01001101, 01100001, 01101110
and so the stream:
010011010110000101101110
Break it into groups of six bits: 010011 010110 000101 101110
010011 = T
010110 = W
000101 = F
101110 = u
So, 'Man' => base64 coded = 'TWFu'.
As you can see, this works perfectly for streams whose bit length is a multiple of 6.
If you have a stream that isn't a multiple of 6, for example 'Ma', you have the stream:
010011 010110 0001
you need to pad it to complete the groups of 6:
010011 010110 000100
so you have the base 64 coding:
010011 = T
010110 = W
000100 = E
So, 'Ma' => 'TWE'
When you decode the stream, in this case you need to truncate its length down to a multiple of 8, removing the extra bits, to obtain the original stream:
T = 010011
W = 010110
E = 000100
1) 010011 010110 000100
2) 01001101 01100001 00
3) 01001101 01100001 = 'Ma'
In reality, when we add trailing 00s like this, we mark the end of the Base64 string with '=' padding characters ('Ma' ==> Base64 'TWE=').
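A small C sketch of that grouping and '=' padding (the helper name encode_block is my own, and it only handles a single group of 1-3 input bytes):
#include <stdio.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode one group of `len` (1..3) bytes into 4 output characters. */
static void encode_block(const unsigned char *in, int len, char out[5]) {
    unsigned long v = 0;
    for (int i = 0; i < 3; i++)
        v = (v << 8) | (unsigned long)(i < len ? in[i] : 0);   /* pad missing bytes with zero bits */
    out[0] = b64[(v >> 18) & 0x3F];
    out[1] = b64[(v >> 12) & 0x3F];
    out[2] = len > 1 ? b64[(v >> 6) & 0x3F] : '=';   /* '=' marks missing input bytes */
    out[3] = len > 2 ? b64[v & 0x3F] : '=';
    out[4] = '\0';
}

int main(void) {
    char out[5];
    encode_block((const unsigned char *)"Man", 3, out);
    printf("%s\n", out);   /* TWFu */
    encode_block((const unsigned char *)"Ma", 2, out);
    printf("%s\n", out);   /* TWE= */
    return 0;
}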
See also: http://www.base64decode.org/
Representing images in base 64 is a good option in many applications where it is hard to work directly with a real binary stream. A real binary stream is more compact (it is effectively base 256), but it is awkward inside HTML, for example; the trade-off is between less traffic and the ease of working with strings.
Look at the ASCII codes too: the characters of base 64 fall in the range '+' to 'z' of the ASCII table, but some values between '+' and 'z' are not base 64 symbols.
'+' = ASCII DEC 43
...
'z' = ASCII DEC 122
From DEC 43 to DEC 122 there are 80 values, but:
43 OK = '+'
44 is not a base 64 symbol, so its decoding value is -1 (invalid symbol for base64)
45 ....
46 ...
...
122 OK = 'z'
The character to decode is decremented by 43 ('+') so that '+' lands at index 0 of the array, giving quick access by index, so: decoding[80] = {62, -1, -1 ........, 49, 50, 51};
Considering these 2 mapping tables:
static const char decodingTab[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
static unsigned char encodingTab[64]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
decodingTab is the reverse mapping table of encodingTab.
Ideally no entry would need to be -1: only 64 values are expected. However, decodingTab has 80 entries, one for each ASCII code from '+' to 'z'.
So, in decodingTab, the indices that do not correspond to base64 characters are set to -1 (an arbitrary number which is not in [0, 63]).
char c;            /* a valid base64 character, between '+' and 'z' */
unsigned char i;   /* a value in [0, 63] */
...
/* the two tables are inverses of each other: */
encodingTab[decodingTab[c - '+']] == c
decodingTab[encodingTab[i] - '+'] == i
Hope it helps.
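To make the construction concrete, here is a minimal C sketch (my own, not from the answer) that generates the 80-entry decoding table from the encoding alphabet; it reproduces exactly the values shown above, including the -2 marker for '=':
#include <stdio.h>
#include <string.h>

static const char encodingTab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

int main(void) {
    /* one slot per ASCII code from '+' (43) up to 'z' (122): 80 entries */
    signed char decodingTab[80];
    memset(decodingTab, -1, sizeof decodingTab);    /* non-base64 slots stay -1 */
    decodingTab['=' - '+'] = -2;                    /* the padding character gets its own marker */
    for (int i = 0; i < 64; i++)
        decodingTab[encodingTab[i] - '+'] = (signed char)i;

    for (int i = 0; i < 80; i++)
        printf("%d%s", decodingTab[i], i < 79 ? "," : "\n");
    return 0;
}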
