Snowflake masking policy: replace int column with '*****'

I'm using a Snowflake masking policy to hide values from unauthorized roles, but I can't find out how to replace an int column with '*****'. I can replace it with null, but that can be confusing (there is no way to know whether the value is really null or just hidden).
The policy I'm using to replace the int column with null:
create or replace masking policy values_mask as (val int) returns int ->
case
when current_role() in ('ADMIN') then val
else null
end;
How I apply the policy:
alter table database.schema.table modify column value set masking policy values_mask;
I want the else branch to return '****'. I have tried returns string and other similar changes, but that causes errors.

The output data type must match the input data type:
CREATE MASKING POLICY:
RETURNS arg_type_to_mask
The return data type must match the input data type of the first column that is specified as an input column.
...
Currently, Snowflake does not support different input and output data types in a masking policy, such as defining the masking policy to target a timestamp and return a string (e.g. MASKED); the input and output data types must match.
The value '****' (a string) is definitely not a valid integer value (the column data type), therefore a masking policy with the requested signature will not work.
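A hedged sketch of two common workarounds, not a definitive answer; the view name, string policy name, and the sentinel value below are assumptions, not from the question:
-- Option 1: keep the int signature and mask with an integer sentinel
-- (-1 is an arbitrary assumption; NULL is the other choice).
create or replace masking policy values_mask as (val int) returns int ->
  case
    when current_role() in ('ADMIN') then val
    else -1
  end;

-- Option 2 (hypothetical view and policy names): expose the column as a string
-- through a view, then attach a string-to-string policy that can return '*****'.
create or replace view database.schema.table_v as
  select to_varchar(value) as value from database.schema.table;

create or replace masking policy values_mask_str as (val string) returns string ->
  case
    when current_role() in ('ADMIN') then val
    else '*****'
  end;

alter view database.schema.table_v modify column value set masking policy values_mask_str;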

Related

ORA-22835: Buffer too small and ORA-25137: Data value out of range

We are using software that has limited Oracle capabilities. I need to filter through a CLOB field by making sure it has a specific value. Normally, outside of this software, I would do something like:
DBMS_LOB.SUBSTR(t.new_value) = 'Y'
However, this isn't supported, so I'm attempting to use CAST instead. I've made many different attempts, and so far this is what I've found:
The software has a built-in query checker/validator and these are the ones it shows as invalid:
DBMS_LOB.SUBSTR(t.new_value)
CAST(t.new_value AS VARCHAR2(10))
CAST(t.new_value AS NVARCHAR2(10))
However, the validator does accept these:
CAST(t.new_value AS VARCHAR(10))
CAST(t.new_value AS NVARCHAR(10))
CAST(t.new_value AS CHAR(10))
Unfortunately, even though the validator lets these ones go through, when running the query to fetch data, I get ORA-22835: Buffer too small when using VARCHAR or NVARCHAR. And I get ORA-25137: Data value out of range when using CHAR.
Are there other ways I could try to check that my CLOB field has a specific value when filtering the data? If not, how do I fix my current issues?
The error you're getting indicates that Oracle is trying to apply the CAST(t.new_value AS VARCHAR(10)) to a row where new_value has more than 10 characters. That makes sense given your description that new_value is a generic audit field that has values from a large number of different tables with a variety of data lengths. Given that, you'd need to structure the query in a way that forces the optimizer to reduce the set of rows you're applying the cast to down to just those where new_value has just a single character before applying the cast.
Not knowing what sort of scope the software you're using provides for structuring your code, I'm not sure what options you have there. Be aware that depending on how robust you need this, the optimizer has quite a bit of flexibility to choose to apply predicates and functions on the projection in an arbitrary order. So even if you find an approach that works once, it may stop working in the future when statistics change or the database is upgraded and Oracle decides to choose a different plan.
Using this as sample data
create table tab1(col clob);
insert into tab1(col) values (rpad('x',3000,'y'));
You need to use dbms_lob.substr(col, 1) to get the first character (from the default offset = 1)
select dbms_lob.substr(col,1) from tab1;
DBMS_LOB.SUBSTR(COL,1)
----------------------
x
Note that the default amount (= length) of the substring is 32767, so using only DBMS_LOB.SUBSTR(COL) will return more than you expect.
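For reference, a minimal sketch of the filter itself against the sample table above, passing the amount and offset explicitly (the literal 'x' just matches the sample data):
select *
from tab1 t
where dbms_lob.substr(t.col, 1, 1) = 'x';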
CAST for CLOB does not cut the string to the casted length, but (as you observed) raises the exception ORA-25137: Data value out of range if the original string is longer than the casted length.
As documented for CAST:
CAST does not directly support any of the LOB data types. When you use CAST to convert a CLOB value into a character data type or a BLOB value into the RAW data type, the database implicitly converts the LOB value to character or raw data and then explicitly casts the resulting value into the target data type. If the resulting value is larger than the target type, then the database returns an error.

too long and would be truncated in CONCAT with Data masking policy

I implemented a data masking policy with sha2(val), based on the current role, on two view columns, First_Name and Last_Name, in the Customer table. For example:
alter view .<SCHEMA_NAME>.<TABLE_NAME> modify
column <COLUMN_NAME> set masking policy public.pii_allowed;
Executing the view definition that concatenates both columns runs fine on its own, but through the view it gives an error:
"String 689z3z73z8z32zz46z24zz916z15zzz6z4z45z26zz887zzz98765432zz2312z5 yy3y9y24y61yy0y910y63y6yy384y277y670y283746y2y2y960y25y6y85yy03 is too long and would be truncated in 'CONCAT'". The resulting value is 129 characters long, including the space.
I tried writing a case statement to avoid printing the value, e.g. case when length(First_name||''||Last_name) > 64 THEN First_Name else length(First_name||''||Last_name) end Name. But it still gives the error above with complex views.
Please suggest how to resolve this error.
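A hedged illustration of where the 129 comes from (the literals are made up): SHA2 with the default 256-bit digest returns a 64-character hex string, so two masked names plus a separator concatenate to 129 characters, which no longer fits a 64-character target.
select length(sha2('John') || ' ' || sha2('Doe'));  -- 64 + 1 + 64 = 129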

Check if SqlParameter value is in range of it's target SQL column

tl;dr:
How should I check the bounds on my SqlParameter values before attempting to put them in the database?
Longer version:
So, I have these dynamically generated SQL statements where I pass in a bunch of SqlParameters.
The way we declare SqlParameters is just
new SqlParameter("fieldName", value)
and we let the runtime figure out what the dbtype is.
That said, occasionally the update / insert statement fails, and we'd like to determine which field is too big (since our server only tells us that a field update failed, not WHICH one) by doing bounds checking. That is, we can't put a two-digit number in a column which only allows one digit (say a decimal(1,0), for example).
We have the schema of the columns in memory (information_schema.columns ftw), so we could just try to do bounds checking on the SqlParameter value, but since the value is an object and not even necessarily a numeric type, how should I check that a value is in the range?
Or am I making the problem too hard and instead should have supplied the precision and scale when constructing the SqlParameters to begin with? Or even better, should we be using types that reflect what the columns in the database are?
Update:
Setting the precision / scale doesn't seem to have any consequence, as seen in this code:
decimal d = 10.0M * (decimal)Math.Pow(10, 6);
SqlParameter p = new SqlParameter("someparam", SqlDbType.Decimal);
p.Precision = (byte)1;
p.Scale = (byte)0;
p.Value = d;
Console.WriteLine(p.SqlValue); // doesn't throw an error; I would think the SqlValue would be at most 10, not 10 million.
It seems that SqlParameter does not validate when the Value property is set. And DataColumn does not allow specifying either Precision or Scale, so it is not terribly useful here. However, there is a way:
Using the collection of schema info that you already have, dynamically create an array of SqlMetaData based on the size of the schema collection and populate it with the column name and size data:
SqlMetaData[] _TempColumns = new SqlMetaData[_SchemaCollection.Count];
for (int _Index = 0; _Index < _SchemaCollection.Count; _Index++)
{
    switch (_SchemaCollection[_Index].DataType)
    {
        case "decimal":
            _TempColumns[_Index] = new SqlMetaData(
                _SchemaCollection[_Index].Name,
                SqlDbType.Decimal,
                (byte)_SchemaCollection[_Index].Precision,
                (byte)_SchemaCollection[_Index].Scale
            );
            break;
        // case "others": handle the remaining SQL Server types the same way
    }
}
Create a new SqlDataRecord using the SqlMetaData[] from step 1:
SqlDataRecord _TempRow = new SqlDataRecord(_TempColumns);
Loop through _TempRow, calling the appropriate Set method for each position, in a try / catch:
string _DataAintRight;
try
{
    _TempRow.SetDecimal(_Index, (decimal)_SchemaCollection[_Index].Value);
}
catch
{
    _DataAintRight = _SchemaCollection[_Index].Name;
    break;
}
NOTES:
This will only do the same validation that passing params to a proc would do. Meaning, it will silently truncate values that are too long, such as too many digits to the right of a decimal point, or a string that exceeds the max size.
Fixed-length numeric types should already be in their equivalent .Net types (i.e. SMALLINT value in an Int16 variable or property) and hence are already pre-verified. If this is indeed the case then there is no additional benefit from testing them. But if they currently reside in a more generic container (a larger Int type or even a string), then testing here is appropriate.
If you need to know whether a string will be truncated, that has to be tested separately, since SqlMetaData won't report it; in the loop and switch, just test the length of the string for that case.
Regardless of any of this testing stuff, it is best not to create parameters by having .NET guess the type via new SqlParameter("fieldName", value) or even _Command.Parameters.AddWithValue(). So regarding the question of whether you "should have supplied the precision and scale when constructing the SqlParameters to begin with": absolutely yes.
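A minimal sketch of what that looks like, assuming a _Command object and a hypothetical @Amount parameter mapped to a decimal(10, 2) column (System.Data and the SqlClient namespace are assumed):
// Declare the SQL type, precision, and scale explicitly instead of letting
// .NET infer them from the runtime type of the value.
SqlParameter _Amount = new SqlParameter("@Amount", SqlDbType.Decimal)
{
    Precision = 10,
    Scale = 2,
    Value = 1234.56m
};
_Command.Parameters.Add(_Amount);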
Another option (which I can elaborate on tomorrow when I will have time to update this) is to validate everything as if there were no built-in containers that are supposed to be reflections of the real database/provider datatypes. So, there are two main considerations that will drive the implementation:
Is the source data currently strongly typed or is it all serialized as strings?
and
Is there a need to know if the value will be truncated (specifically in cases where the value would otherwise be silently truncated, not causing an error, which could lead to unexpected behavior)? The issue here is that inserting data into a field in a table that exceeds the specified max length will cause a string or binary data will be truncated error. But when passing data to a parameter (i.e. not direct to a table), that data will be truncated without causing an error. Sometimes this is ok, but sometimes this can lead to a situation where an input parameter has been specified incorrectly (or was correct but then the field was expanded and the parameter was never updated to match the new length) and might be chopping off the ends of some values, which goes undetected until a customer reports that "something doesn't look quite right on a report, and oh, by the way, this has been happening off and on for maybe four or five months now, but I have been really busy and kept forgetting to mention it, maybe it's been nine months, I can't remember, but yeah, something's wrong". I mean, how often do we test our code by passing in the max value for each parameter to make sure that the system can handle it?
If the source data is in the appropriate .Net types:
There are several that do not need to be checked since fixed-length numeric types are the same between .Net and SQL Server. The ones that are pre-validated simply by existing in their respective .Net types are:
bool -> BIT
byte -> TINYINT
Int16 -> SMALLINT
Int32 -> INT
Int64 -> BIGINT
Double -> FLOAT
Single -> REAL
Guid -> UNIQUEIDENTIFIER
There are some that need to be checked only for truncation (if that is a concern), as their values should always be in the same range as their SQL Server counterparts. Keep in mind that here we are talking about strict truncation when the values are passed to parameters of a smaller scale; they will actually round up (well, at 5) when inserted directly into a column having a smaller scale. For example, sending in a DateTime value that is accurate to 5 decimal places will truncate the 3 right-most digits when passed to a parameter defined as DATETIME2(2).
DateTimeOffset -> DATETIMEOFFSET(0 - 7)
DateTime -> DATETIME2(0 - 7)
DateTime -> DATE : 0001-01-01 through 9999-12-31 (no time)
TimeSpan -> TIME(0 - 7) : 00:00:00.0000000 through 23:59:59.9999999
There are some that need to be checked to make sure that they are not out of the valid range for the SQL Server datatype, as an out-of-range value would cause an exception. They also possibly need to be checked for truncation (if that is a concern). Keep in mind that here we are talking about strict truncation when the values are passed to parameters of a smaller scale; they will actually round up (well, at 5) when inserted directly into a column having a smaller scale. For example, sending in a DateTime value will lose all seconds and fractional seconds when passed to a parameter defined as SMALLDATETIME.
DateTime -> DATETIME : 1753-01-01 through 9999-12-31, 00:00:00.000 through 23:59:59.997
DateTime -> SMALLDATETIME : 1900-01-01 through 2079-06-06, 00:00 through 23:59 (no seconds)
Decimal -> MONEY : -922,337,203,685,477.5808 to 922,337,203,685,477.5807
Decimal -> SMALLMONEY : -214,748.3648 to 214,748.3647
Decimal -> DECIMAL : range = -9[digits = (Precision - Scale)] to 9[digits = (Precision - Scale)], truncation depends on defined Scale
The following string types will silently truncate when passed to a parameter with a max length that is less than the length of their value, but will error with String or binary data would be truncated if directly inserted into a column with a max length that is less than the length of their value:
byte[] -> BINARY
byte[] -> VARBINARY
string -> CHAR
string -> VARCHAR
string -> NCHAR
string -> NVARCHAR
The following is tricky, as true validation requires knowing more about the options it was created with in the database.
string -> XML -- By default an XML field is untyped and is hence very lenient regarding "proper" XML syntax. However, that behavior can be altered by associating an XML Schema Collection (1 or more XSDs) with the field for validation (see also: Compare Typed XML to Untyped XML). So true validation of an XML field would include getting that info, if it exists, and if so, checking against those XSDs. At the very least it should be well-formed XML (i.e. '<b>' will fail, but '<b />' will succeed).
For the above types, the pre-validated types can be ignored. The rest can be tested in a switch(DestinationDataType) structure:
Types that need to be validated for ranges can be done as follows:
case "smalldatetime":
    if ((_Value < range_min) || (_Value > range_max))
    {
        _ThisValueSucks = true;
    }
    break;
Numeric / DateTime truncation, if being tested for, is probably best done by calling ToString() and using IndexOf(".") for most types, or IndexOf(":00.0") for DATE and SMALLDATETIME, to find the number of digits to the right of the decimal point (or starting at the "seconds" for SMALLDATETIME); see the sketch below.
String truncation, if being tested for, is a simple matter of testing the length.
Decimal range can be tested either numerically:
if ((Math.Floor(_Value) < -999) || (Math.Floor(_Value) > 999))
or:
if (Math.Abs(Math.Floor(_Value)).ToString().Length <= DataTypeMaxSize)
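A minimal sketch of the ToString() / IndexOf approach from the truncation bullet above, assuming _Value is a decimal and _TargetScale is the destination parameter's scale (both hypothetical names):
// Count the digits to the right of the decimal point; more digits than the
// destination scale means the value would be silently truncated.
string _Text = _Value.ToString(System.Globalization.CultureInfo.InvariantCulture);
int _Dot = _Text.IndexOf(".");
int _DigitsRight = (_Dot < 0) ? 0 : _Text.Length - _Dot - 1;
bool _WouldTruncate = _DigitsRight > _TargetScale;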
Xml
as XmlDocument is pre-validated outside of potential XSD validation associated with the XML field
as String could first be used to create an XmlDocument, which only leaves any potential XSD validation associated with the XML field
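A hedged sketch of the string-to-XmlDocument check from the bullet above (_RawXmlValue is a hypothetical string; any XSD validation from an associated XML Schema Collection would still be a separate step):
// LoadXml throws XmlException for malformed input such as "<b>",
// and succeeds for well-formed input such as "<b />".
bool _IsWellFormed;
try
{
    new System.Xml.XmlDocument().LoadXml(_RawXmlValue);
    _IsWellFormed = true;
}
catch (System.Xml.XmlException)
{
    _IsWellFormed = false;
}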
If the source data is all strings:
Then they all need to be validated. For these you would first use the TryParse method associated with each type, and then apply the rules noted above for each type; see the sketch below.
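A hedged sketch for one destination type, smalldatetime, using the range from the list above (_RawValue is a hypothetical string):
// Parse first; only range-check if the string is a valid DateTime at all.
bool _ValueIsValid = false;
if (DateTime.TryParse(_RawValue, out DateTime _Parsed))
{
    _ValueIsValid = _Parsed >= new DateTime(1900, 1, 1)
                 && _Parsed <= new DateTime(2079, 6, 6, 23, 59, 0);
}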

Grails nullable column returns value = {"class":"java.lang.Character"} in JSON

I have a legacy database that I connect to, and I'm not sure why I get this result when I parse the JSON object in my client. The column Character customersMname is defined in the domain's static constraints as:
customersMname nullable: true, maxSize: 1
the result I get back from the JSON object when the field is null is:
<jsonname2>customersMname</jsonname2>
<jsonvalue2>{"class":"java.lang.Character"}</jsonvalue2>
There is actual data in the database column and it should be P. This seems to occur with single-character columns in a MySQL db when the datatype is defined as CHAR(1) or VARCHAR(1). Any ideas?
Apparently this is a bug in the system. The workaround is to change the domain type to String and be done with it.

How are NULLs stored in a database?

I'm curious to know how NULLs are stored in a database.
It surely depends on the database server, but I would like to get a general idea of it.
First try:
Suppose that the server puts an undefined value (could be anything) into the field for a NULL value.
Could you be very lucky and retrieve the NULL value with
...WHERE field = 'the undefined value (remember, could be anything...)'
Second try:
Does the server have a flag or some metadata somewhere to indicate this field is NULL?
Then the server must read this metadata to verify the field. If the metadata indicates a NULL value and the query doesn't have "field IS NULL", then the record is ignored.
It seems too easy...
MySQL uses the second method. It stores an array of bits (one per column) with the data for each row to indicate which columns are null, and then leaves the data for that field blank. I'm pretty sure this is true for all other databases as well.
The problem with the first method is, are you sure that whatever value you select for your data won't show up as valid data? For some values (like dates, or floating point numbers) this is true. For others (like integers) this is false.
PostgreSQL uses an optional bitmap with one bit per column (0 means null, 1 means not null). If the bitmap is not present, all columns are not null.
This is completely separate from the storage of the data itself, but it is on the same page as the row (so both the row and the bitmap are read together).
References:
http://www.postgresql.org/docs/8.3/interactive/storage-page-layout.html
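A hedged illustration of that bitmap convention (not any engine's actual code; names are made up):
// One bit per column; following the convention above, a 0 bit means the
// column is NULL and a 1 bit means it is not null.
static bool IsNull(byte[] nullBitmap, int columnIndex)
{
    return (nullBitmap[columnIndex / 8] & (1 << (columnIndex % 8))) == 0;
}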
The server typically uses meta information rather than a magic value, so there's a bit somewhere that specifies whether the field is null.
-Adam
IBM Informix Dynamic Server uses special values to indicate nulls. For example, the valid range of values for a SMALLINT (16-bit, signed) is -32767..+32767. The other value, -32768, is reserved to indicate NULL. Similarly for INTEGER (4-byte, signed) and BIGINT (8-byte, signed). For other types, it uses other special representations (for example, all bits 1 for SQL FLOAT and SMALLFLOAT - aka C double and float, respectively). This means that it doesn't have to use extra space.
IBM DB2 for Linux, Unix, Windows uses extra bytes to store the null indicators; AFAIK, it uses a separate byte for each nullable field, but I could be wrong on that detail.
So, as was pointed out, the mechanisms differ depending on the DBMS.
The problem with special values to indicate NULL is that sooner or later that special value will be inserted. For example, it will be inserted into a table specifying the special NULL indicators for different database servers:
| DBServer     | SpecialValue |
+--------------+--------------+
| 'Oracle'     | 'Glyph'      |
| 'SQL Server' | 'Redmond'    |
;-)
