When the query below is executed in MS SQL Server, I get the following output:
Query:
select SubString(0x003800010102000500000000, 1, 2) as A
     , SubString(0x003800010102000500000000, 6, 1) as B
     , CAST(CAST(SubString(0x003800010102000500000000, 9,
         CAST(SubString(0x003800010102000500000000, 6, 1) As TinyInt)) As VarChar) As Float) as D
Reading Format: 0x 00 38 00 01 01 02 00 05 00 00 00 00
Output:
A B D
0x0038 0x02 0
In SQL Server, SUBSTRING addresses the binary value byte by byte, so each index position corresponds to one byte (two hex characters, excluding the leading "0x").
Now I am trying to achieve the same output in Snowflake. Can someone please help? I am having difficulty with the split of each byte into two hex characters when creating a function.
CREATE OR REPLACE FUNCTION getFloat1 (p1 varchar) RETURNS Float as $$
    Select Case
        WHEN concat(substr(p1::varchar, 1, 2), substr(p1::varchar, 5, 4)) <> '0x3E00'
        then 0::float
        ELSE 1::float
        // Else substr(p1::varchar, 9, substr(p1::varchar, 6, 1))::float
    End as test1
$$ ;
Snowflake doesn't have a binary literal, so there is no notation that automatically treats a value as binary the way the 0x notation does in SQL Server. You always have to cast a value to the BINARY data type to treat it as binary.
Also, there are several differences around the BINARY data type handling between SQL Server and Snowflake:
SUBSTRING in Snowflake can handle only a string, so its position and length arguments must be counted in hex-string characters, not in binary bytes
Snowflake supports a hex string as a representation of a binary, but the hex string must not include the 0x prefix
Snowflake has no way to convert a binary directly to a number, but it can convert a hex string by using the 'X' format character in TO_NUMBER
Based on the above differences, below is an example query achieving the same result as your SQL Server query:
select
substring('003800010102000500000000', 1, 4)::binary A,
substring('003800010102000500000000', 11, 2)::binary B,
to_number(
substring(
'003800010102000500000000',
17,
to_number(substring('003800010102000500000000', 11, 2), 'XX')*2
),
'XXXX'
)::float D
;
It returns the result below, which is the same as your SQL Server query's output:
/*
A B D
0038 02 0
*/
Explanation:
Since Snowflake doesn't have a binary literal and SUBSTRING only supports a string (VARCHAR), any binary manipulation has to be done with a VARCHAR hex string.
So, in the query, the first SUBSTRING starts from 1 and extracts 4 characters because 1 byte consists of 2 hex characters, then extracting 2 bytes is equivalent to extracting 4 hex characters.
The second SUBSTRING starts from 11 because starting from the 6th byte means ignoring 5 bytes (= 10 hex characters) and starting from the following hex character which is the first hex character of the 6th byte (10 + 1 = 11).
The third SUBSTRING is the same as the second one, starting from the 9th byte means ignoring 8 bytes (= 16 hex characters) and starting from the following hex character (16 + 1 = 17).
Also, to convert a hex string to a numeric data type, use the X character in the second "format" argument of TO_NUMBER to parse the string as a sequence of hex characters. A single X corresponds to a single hex character in the parsed string, which is why I used 'XX' to parse a single byte (2 hex characters) and 'XXXX' to parse 2 bytes (4 hex characters).
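The byte-versus-character index arithmetic above is easy to check outside Snowflake. Here is a minimal Python sketch (the helper name `sub_bytes` is mine, purely illustrative) that mirrors the query: a 1-based byte position maps to hex-string offset `(pos - 1) * 2`, and `int(s, 16)` plays the role of TO_NUMBER with an 'X' format:

```python
hex_str = "003800010102000500000000"

def sub_bytes(s, byte_pos, byte_len):
    """Extract byte_len bytes from a hex string, starting at the
    1-based byte position byte_pos (2 hex characters per byte)."""
    start = (byte_pos - 1) * 2
    return s[start:start + byte_len * 2]

a = sub_bytes(hex_str, 1, 2)       # bytes 1-2 -> "0038"
b = sub_bytes(hex_str, 6, 1)       # byte 6    -> "02"
length = int(b, 16)                # like TO_NUMBER(b, 'XX') -> 2
d = float(int(sub_bytes(hex_str, 9, length), 16))  # bytes 9-10 -> 0.0
print(a, b, d)
```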
I am reading the libb64 source code for encoding and decoding base64 data.
I understand the encoding procedure, but I can't figure out how the following decoding table is constructed for fast lookup when decoding base64 characters. This is the table they are using:
static const char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
Can someone explain how the values in this table are used for decoding?
It's a shifted and limited ASCII translating table. The keys of the table are ASCII values, the values are base64 decoded values. The table is shifted such that the index 0 actually maps to the ASCII character + and any further indices map the ASCII values after +. The first entry in the table, the ASCII character +, is mapped to the base64 value 62. Then three characters are ignored (ASCII ,-.) and the next character is mapped to the base64 value 63. That next character is ASCII /.
The rest will become obvious if you look at that table and the ASCII table.
Its usage is something like this:
int decode_base64(char ch) {
    if (ch < '+' || ch > 'z') {
        return SOME_INVALID_CH_ERROR;
    }
    /* shift range into decoding table range */
    ch -= '+';
    int base64_val = decoding[ch];
    if (base64_val < 0) {
        return SOME_INVALID_CH_ERROR;
    }
    return base64_val;
}
As you know, each byte has 8 bits, so there are 256 possible combinations using 2 symbols (base 2).
With 2 symbols you need 8 characters to represent a byte, for example '01010011'.
With base 64 it is possible to represent 64 combinations with a single character...
So, we have a base table:
A = 000000
B = 000001
C = 000010
...
If you have the word 'Man', so you have the bytes:
01001101, 01100001, 01101110
and so the stream:
010011010110000101101110
Break in group of six bits: 010011 010110 000101 101110
010011 = T
010110 = W
000101 = F
101110 = u
So, 'Man' => base64 coded = 'TWFu'.
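The walk-through above can be reproduced mechanically. This Python sketch (illustrative only; the function name is mine) concatenates the bits of 'Man' and regroups them into four 6-bit values:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_24bits(s):
    """Encode exactly 3 bytes: 24 bits split into four 6-bit groups."""
    bits = "".join(format(ord(c), "08b") for c in s)   # 24-bit stream
    groups = [bits[i:i + 6] for i in range(0, 24, 6)]  # four 6-bit groups
    return "".join(ALPHABET[int(g, 2)] for g in groups)

print(encode_24bits("Man"))  # TWFu
```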
As you can see, this works perfectly for streams whose length is a multiple of 6.
If you have a stream whose length isn't a multiple of 6, for example 'Ma', you have the stream:
010011 010110 0001
you need to complete to have groups of 6:
010011 010110 000100
so you have the coded base 64:
010011 = T
010110 = W
000100 = E
So, 'Ma' => 'TWE'
When decoding the stream, you need to truncate its length back down to a multiple of 8 and remove the extra bits to obtain the original stream:
T = 010011
W = 010110
E = 000100
1) 010011 010110 000100
2) 01001101 01100001 00
3) 01001101 01100001 = 'Ma'
In fact, when the trailing 00 bits are added, the end of the Base64 string is marked with one '=' for each trailing '00' pair added ('Ma' => Base64 'TWE=').
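The three decoding steps above can be sketched in Python (illustrative only): map each character back to 6 bits, truncate the bit stream down to a multiple of 8, and rebuild the bytes:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode(b64):
    data = b64.rstrip("=")                   # drop the '=' end markers
    bits = "".join(format(ALPHABET.index(c), "06b") for c in data)
    bits = bits[:len(bits) - len(bits) % 8]  # remove the extra padding bits
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

print(decode("TWE="))  # Ma
print(decode("TWFu"))  # Man
```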
See also: http://www.base64decode.org/
Representing images in base 64 is a good option in many applications where it is hard to work directly with a raw binary stream. A raw binary stream is more compact because it is effectively base 256, but it is difficult to embed inside HTML, for example, so the trade-off is between less traffic and easier handling as strings.
See the ASCII codes too: the base 64 characters fall in the range '+' to 'z' of the ASCII table, but some values between '+' and 'z' are not base 64 symbols:
'+' = ASCII DEC 43
...
'z' = ASCII DEC 122
From DEC 43 to DEC 122 there are 80 values, but:
43 OK = '+'
44 isn't a base 64 symbol, so its decoding value is -1 (invalid symbol for base64)
45 ...
46 ...
...
122 OK = 'z'
So the character to decode is decremented by 43 ('+') so that '+' lands on index 0 of the array, giving quick access by index: decoding[80] = {62, -1, -1, ........, 49, 50, 51};
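The table can also be derived from the alphabet instead of being written by hand. This Python sketch (illustrative) builds the same 80-entry array by placing each base64 value at index ord(char) - ord('+') and filling every other slot with -1. Note that the real libb64 table additionally stores -2 at the position of '=' to flag the padding character, which this sketch leaves at -1:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# 80 slots cover the ASCII range '+' (43) through 'z' (122)
decoding = [-1] * (ord("z") - ord("+") + 1)
for value, ch in enumerate(ALPHABET):
    decoding[ord(ch) - ord("+")] = value

print(decoding[0])                    # '+' -> 62
print(decoding[ord("A") - ord("+")])  # 'A' -> 0
```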
Roberto Novakosky
Developer Systems
Considering these 2 mapping tables:
static const char decodingTab[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
static unsigned char encodingTab[64]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
decodingTab is the reverse mapping table of encodingTab.
Ideally decodingTab[i] would never be -1, since only 64 values are meaningful. However, decodingTab has 80 entries (covering the ASCII range '+' through 'z'), so the indices that don't correspond to any base64 character are set to -1 (an arbitrary value outside [0, 63]).
char c;          /* a valid base64 character */
unsigned char i; /* a 6-bit value in [0, 63] */
...
/* round-trip invariants between the two tables: */
encodingTab[decodingTab[c - '+']] == c
decodingTab[encodingTab[i] - '+'] == i
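The round-trip relationship between the two tables can be verified directly. Here is an illustrative Python check using the tables copied from above:

```python
encoding_tab = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
decoding_tab = [62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,
                0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
                -1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
                43,44,45,46,47,48,49,50,51]

# every 6-bit value i survives an encode/decode round trip
for i in range(64):
    assert decoding_tab[ord(encoding_tab[i]) - ord("+")] == i
print("round trip OK")
```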
Hope it helps.
Is there any character or character combination that MATLAB interprets as a comment when importing data from text files, so that when it detects it at the beginning of a line, it knows the whole line is to be ignored?
I have a set of points in a file that look like this:
And as you can see, MATLAB doesn't seem to understand them very well. Is there anything other than // that I could use which MATLAB knows to ignore?
Thanks!
Actually, your data is not consistent, as you must have the same number of columns on each line.
1)
Apart from that, lines starting with '%' as comments will be correctly recognized by importdata:
file.dat
%12 31
12 32
32 22
%abc
13 33
31 33
%ldddd
77 7
66 6
%33 33
12 31
31 23
matlab
data = importdata('file.dat')
2)
Otherwise use textscan to specify arbitrary comment symbols:
file2.dat
//12 31
12 32
32 22
//abc
13 33
31 33
//ldddd
77 7
66 6
//33 33
12 31
31 23
matlab
fid = fopen('file2.dat');
data = textscan(fid, '%f %f', 'CommentStyle','//', 'CollectOutput',true);
data = cell2mat(data);
fclose(fid);
If you use the function textscan, you can set the CommentStyle parameter to // or %. Try something like this:
fid = fopen('myfile.txt');
iRow = 1;
while (~feof(fid))
myData(iRow,:) = textscan(fid,'%f %f\n','CommentStyle','//');
iRow = iRow + 1;
end
fclose(fid);
That will work if there are two numbers per line. I notice in your examples the number of numbers per line varies. There are some lines with only one number. Is this representative of your data? You'll have to handle this differently if there isn't a uniform number of columns in each row.
Have you tried %, the default comment character in MATLAB?
As Amro pointed out, if you use importdata this will work.