Id2sym & symbol.object_id - c

Using ruby-hacking-guide site, I've found that fixnum << 8 | 1 is object_id of any fixnum.
I've tried using similar approach with symbol.
#define ID2SYM(x) ((VALUE)(((long)(x))<<8|SYMBOL_FLAG))
When shifting 8 bits left, x becomes a multiple of 256, that means a
multiple of 4. Then after with a bitwise or (in this case it’s the
same as adding) with 0×0e (14 in decimal)
I have tried it with :a(:a.object_id = 175_976, on my 32-bit system):
ASCII number of a is 97.
97 << 8 = 24832
24832 | 14 = 24_846
So it's not even close to :a's object id.
I've checked source of object_id and found this:
* sizeof(RVALUE) is
* 20 if 32-bit, double is 4-byte aligned
* 24 if 32-bit, double is 8-byte aligned
* 40 if 64-bit
*/
if (SYMBOL_P(obj)) {
return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
I got ~ 500 000, which is bad value.
So what I'm missing? How to calculate object_id of symbol?

The ID value that you calculate from a symbols object_id doesn’t directly represent the string content of that symbol. It is an index into a table that Ruby maintains containing the string. When you use a symbol in Ruby then if that symbol hasn’t been used before in the current process it will be given the ID value of the next free slot in the symbol table.
This means that a given symbol won’t always have the same ID value. The ID value associated a Ruby processes symbols will depend on the order that they are created.
You can see this by starting a new Ruby process, creating a new symbol and looking at its object_id, and then repeating with a different symbol name. The object_id should be the same in both cases, since it will be referring to the next free spot in the symbol table. You need to be careful doing this as Ruby defines a lot of symbols itself, so if you use one of these you’ll get different results.
For example, an irb session:
2.1.0 :001 > Symbol.all_symbols.find {|sym| sym.to_s == 'matt' }
=> nil
2.1.0 :002 > :matt.object_id
=> 542248
And another:
2.1.0 :001 > Symbol.all_symbols.find {|sym| sym.to_s == 'banana' }
=> nil
2.1.0 :002 > :banana.object_id
=> 542248
Here we first check to see if the name we are going to use doesn’t already exist as a symbol, then we create the symbol and look at its object_id. In both cases it is the same 542248, corresponding to an ID of 2118, even though they have different names (these values may differ on different systems or Ruby versions).

Related

Regex string with 2+ different numbers and some optional characters in Snowflake syntax

I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)]
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex looks a little confusing and invalid, and it doesn't look like it quite meets your needs either. I read this expression as a string that:
Must start with one or more digits, at least 3 or more times
The confusing part to me is the '+' is a quantifier, which is not quantifiable with {3,} but somehow doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain 3 characters and at least 2 different numbers, numbers are characters but I'm not sure if you mean 3 letters...
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
Checking for a second digit that is not the same as the first involves the concept of a lookahead with a backreference. Snowflake does not support backreferences.
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
Assuming the digits need to be contiguous, you can use a javascript UDF to find the number in a string with with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with WHERE-clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
Hope that's helpful!

how does SQL Server actually store russian symbols in char?

I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
which integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
but CHAR implies that it contains 8 bit. How does SQL Server store values like '1056,1091,1083,1086,1085' which are UNICODE symbols?
OK, and also ? symbol is actually × (215) (Multiplication Sign)
If SQL Server can represent '1056' why it can't represent '215'?
What the 255 values in a char mean is determined by the database collation. For Russia this is typically Cyrillic_General_CI_AS (where CI means Case Insentitive and AS means Accent Sensitive.)
There's a good chance this matches Windows code page 1251, so л is stored as hex EB or decimal 235. You can verify this with T-SQL:
create database d1 collate Cyrillic_General_CI_AS;
use d1
select ascii('л')
-->
235
In the Cyrillic code page, decimal 215 means Ч, not the multiplication sign. Because SQL Server can't match the multiplication sign to the Cyrillic code page, it replaces it with a question mark:
select ascii('×'), ascii('?')
-->
63 63
In the Cyrillic code page, the char 8-bit representation of the multiplication sign and the question mark are both decimal 63, the question mark.
I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
which integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
Cyted above is wrong.
I make a test within a database with Cyrillic collation and integer representation is different from what you showed us, so or your data type is not char, or your integer representation is wrong, and yes, "but CHAR implies that it contains 8 bit" is correct and here is how you can prove it to youerself:
--create table dbo.t (name char(50));
--insert into dbo.t values ('Рулон комбинированный СТЕРИТ 50мм ? 200 м')
select cast (name as binary(50))
from dbo.t;
select substring(cast (name as binary(50)), n, 1) as bin_substr,
cast(substring(cast (name as binary(50)), n, 1) as int) as int_,
char(substring(cast (name as binary(50)), n, 1)) as cyr_char
from dbo.t cross join nums.dbo.nums;
Here dbo.Nums is an auxiliary table containig integers. I just convert your string from char field into binary, split it byte per byte and convert into int and char.

Is it possible to determine ENCRYPTBYKEY maximum returned value by the clear text type?

I am going to encrypted several fields in existing table. Basically, the following encryption technique is going to be used:
CREATE MASTER KEY ENCRYPTION
BY PASSWORD = 'sm_long_password#'
GO
CREATE CERTIFICATE CERT_01
WITH SUBJECT = 'CERT_01'
GO
CREATE SYMMETRIC KEY SK_01
WITH ALGORITHM = AES_256 ENCRYPTION
BY CERTIFICATE CERT_01
GO
OPEN SYMMETRIC KEY SK_01 DECRYPTION
BY CERTIFICATE CERT_01
SELECT ENCRYPTBYKEY(KEY_GUID('SK_01'), 'test')
CLOSE SYMMETRIC KEY SK_01
DROP SYMMETRIC KEY SK_01
DROP CERTIFICATE CERT_01
DROP MASTER KEY
The ENCRYPTBYKEY returns varbinary with a maximum size of 8,000 bytes. Knowing the table fields going to be encrypted (for example: nvarchar(128), varchar(31), bigint) how can I define the new varbinary types length?
You can see the full specification here
So lets calculate:
16 byte key UID
_4 bytes header
16 byte IV (for AES, a 16 byte block cipher)
Plus then the size of the encrypted message:
_4 byte magic number
_2 bytes integrity bytes length
_0 bytes integrity bytes (warning: may be wrongly placed in the table)
_2 bytes (plaintext) message length
_m bytes (plaintext) message
CBC padding bytes
The CBC padding bytes should be calculated the following way:
16 - ((m + 4 + 2 + 2) % 16)
as padding is always applied. This will result in a number of padding bytes in the range 1..16. A sneaky shortcut is to just add 16 bytes to the total, but this may mean that you're specifying up to 15 bytes that are never used.
We can shorten this to 36 + 8 + m + 16 - ((m + 8) % 16) or 60 + m - ((m + 8) % 16. Or if you use the little trick specified above and you don't care about the wasted bytes: 76 + m where m is the message input.
Notes:
beware that the first byte in the header contains the version number of the scheme; this answer does not and cannot specify how many bytes will be added or removed if a different internal message format or encryption scheme is used;
using integrity bytes is highly recommended in case you want to protect your DB fields against change (keeping the amount of money in an account confidential is less important than making sure the amount cannot be changed).
The example on the page assumes single byte encoding for text characters.
Based upon some tests in SQL Server 2008, the following formula seems to work. Note that #ClearText is VARCHAR():
52 + (16 * ( ((LEN(#ClearText) + 8)/ 16) ) )
This is roughly compatible with the answer by Maarten Bodewes, except that my tests showed the DATALENGTH(myBinary) to always be of the form 52 + (z * 16), where z is an integer.
LEN(myVarCharString) DATALENGTH(encryptedString)
-------------------- -----------------------------------------
0 through 7 usually 52, but occasionally 68 or 84
8 through 23 usually 68, but occasionally 84
24 through 39 usually 84
40 through 50 100
The "myVarCharString" was a table column defined as VARCHAR(50). The table contained 150,000 records. The mention of "occasionally" is an instance of about 1 out of 10,000 records that would get bumped into a higher bucket; very strange. For LEN() of 24 and higher, there were not enough records to get the weird anomaly.
Here is some Perl code that takes a proposed length for "myVarCharString" as input from the terminal and produces an expected size for the EncryptByKey() result. The function "int()" is equivalent to "Math.floor()".
while($len = <>) {
print 52 + ( 16 * int( ($len+8) / 16 ) ),"\n";
}
You might want to use this formula to calculate a size, then add 16 to allow for the anomaly.

Using CAST and bigint

I am trying to understand what does this statement does
SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8192.)/1024 /1024
Also why is there a dot after 8192? Can anybody explain this query bit by bit. Thanks!
FILEPROPERTY() returns an int value. Note that the SpaceUsed property is not in bytes but in "pages" - and in SQL Server a page is 8KiB, so multiplying by 8192 to get the size in KiB is appropriate.
I've never encountered a trailing dot without fractional digits before - the documentation for constants/literals in T-SQL does not give an example of this usage, but reading it implies it's a decimal:
decimal constants are represented by a string of numbers that are not enclosed in quotation marks and contain a decimal point.
Thus multiplying the bigint value by a decimal would yield a decimal value, which may be desirable if you want to preserve fractional digits when dividing by 1024 (and then 1024 again), though it's odd that those numbers are actually int literals, so the operation would just be truncation-division.
I haven't tested it, but you could try just this:
SELECT
SUM( FILEPROPERTY( name, 'SpaceUsed' ) ) * ( 8192.0 / 10485760 ) AS TotalGigabytes
FROM
...
If you're reading through code and you need to do research to understand what it's doing - do a favour for the next person who reads the code by adding an explanatory comment to save them from having to do research, e.g. "gets the total number of 8KiB pages used by all databases, then converts it to gigabytes".
The dot . after an Integer converts it implicitly to decimal value. This is most likely here to force the output to also be decimal (not an integer). In this case you only need one part of the operation to be converted to force the output to be in that type.
This probably has to do with bytes/pages since the numbers 8192 and 1024 (most likely for converting to larger unit). One could also imply this by the value of property which indicates how much space is being used by a file.
A page fits within 8kB which means that multiplying pages value by 8192 does convert the output to bytes being used. Then division two times by 1024 succesfully converts the output to gigabytes.
Explanation on functions used:
FILEPROPERTY returns a value for a file name which is stored within database. If a file is not present, null value is returned
CAST is for casting the value to type bigint
SUM is an aggregate function used in a query to sum values for a specified group

Why character d & f ignored for oracle Number field in where condition?

As I mentioned in question title,character d & f are ignored(?) in Oracle where condition
Below query runs without any error
select employee_id from employees where employee_id > 106f
But if I specify other than d or f after 106 ORA-00933: SQL command not properly ended error will be thrown because employee_id is of datatype Number
Why this Strange behaviour?? That to it happens for only single letter after number,if I specify 106df it throws error(which is correct)
According to Oracle docs, d and f are allowable suffixes for numeric literals, denoting 64-bit (double) and 32-bit (float) binary floating-point types. In your case, the type doesn't make any difference (it probably just gets converted back to integer for the comparison, and with no loss of accuracy because 106 is small enough to be represented exactly as a float), so it looks like nothing is happening. Other letters, and 106df, aren't allowed by the syntax. (e is allowed, but only if followed by a number.)

Resources