I implemented a data masking policy on two view columns, First_Name and Last_Name, in the Customer table, masking with SHA2(val) based on the current role. Ex. alter view <SCHEMA_NAME>.<TABLE_NAME> modify
column <COLUMN_NAME> set masking policy public.pii_allowed;
Executing the view definition with both columns concatenated runs fine, but querying the view gives an error:
"String 689z3z73z8z32zz46z24zz916z15zzz6z4z45z26zz887zzz98765432zz2312z5 yy3y9y24y61yy0y910y63y6yy384y277y670y283746y2y2y960y25y6y85yy03 is too long and would be truncated in 'CONCAT'". The resulting value is 129 characters long, including the space.
I tried to write a CASE statement to avoid printing the value. Ex. case when length(First_name||''||Last_name) > 64 THEN First_Name else length(First_name||''||Last_name) end Name. But it is still giving the error with the above message on complex views.
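For context on the 129-character figure: SHA2 with the default 256-bit digest always produces 64 hex characters per value, so two hashed names joined by a single space are always 129 characters, which overflows a 64-character target. A quick Python sketch (using hashlib's sha256 as a stand-in for Snowflake's SHA2, with made-up names) illustrates the arithmetic:

```python
import hashlib

# Hypothetical first and last names; sha256 stands in for SHA2(val)
first_hash = hashlib.sha256(b"John").hexdigest()
last_hash = hashlib.sha256(b"Doe").hexdigest()

print(len(first_hash))                    # 64 hex characters per digest
print(len(first_hash + " " + last_hash))  # 129, matching the error message
```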
Please suggest how to resolve this error.
I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:
CREATE TABLE [dbo].[Test](
[Column1] [varchar](64) NULL,
[Column2] [varchar](64) NULL
)
GO
INSERT INTO [dbo].[Test]
VALUES ('ABCDEFG', 'HIJKLMN')
I've masked the column with the following code:
ALTER TABLE [dbo].[Test]
ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');
It works as expected when I perform the following query using a non-allowed user:
SELECT [Column1], [Column2]
FROM [dbo].[Test]
FOR JSON PATH
-- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'
But it doesn't work when the same non-allowed user saves the result in a variable (the main goal):
DECLARE @var VARCHAR(64)
SET @var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
SELECT @var --it should show a valid JSON...
-- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
-- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON
Main problem: the JSON loses its structure when a masked column appears in the SELECT and the FOR JSON PATH clause is present.
We want to get valid JSON whether the column data is masked or not, and whether the user is sa or not.
I've tested using NVARCHAR and CASTing the masked column, but the only way we get the desired result is by using a #tempTable before the FOR JSON PATH clause.
How can I SELECT a masked column and save it to a VARCHAR variable without losing the JSON structure?
Any help will be appreciated.
NOTE: The sa user is allowed to see unmasked data by default (so the JSON doesn't lose its structure), but we want to execute this as a non-allowed user and still get valid JSON back, not just 'xxxx'.
It does indeed appear to be a bug. Repro is here. Although see the update below; I'm not so sure now.
When using FOR JSON, or for that matter FOR XML, as a top level SELECT construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bare FOR JSON.
What appears to be happening is that in the case of a bare FOR JSON, the data masking happens at the top of the plan, in a Compute Scalar operator just before the JSON SELECT operator. So the masking happens on just the one column.
Whereas when putting it inside a subquery, a UDX function operator is used. The problem is that the Compute Scalar happens after the UDX has created the JSON or XML, whereas it should have been pushed down below the UDX in the plan.
I suggest you file the bug with Microsoft, on the Azure Feedback site.
Having gone over this a little, I actually think now that it's not a bug. What does seem to be a bug is the case without nesting.
From the documentation:
Whenever you project an expression referencing a column for which a data masking function is defined, the expression will also be masked. Regardless of the function (default, email, random, custom string) used to mask the referenced column, the resulting expression will always be masked with the default function.
Therefore, when you select any masked column, even in a normal SELECT, if you use a function on the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read, it is applied when it is finally output to the client.
When using a subquery, the data is fed into a UDX function operator. The compiler now senses that the final resultset is a normal SELECT, just that it needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to if you did UPPER(yourMaskedColumn). See the XML plan in this fiddle for an example of that.
But when using a bare FOR JSON, it appears to the compiler as a normal SELECT, just that the final output is changed to JSON (the top-level SELECT operator is different). So the masking happens before that point. This seems to me to be a bug.
The bug is even more egregious when you use FOR XML, which uses the same mechanisms. If you use a nested FOR XML ..., TYPE then you get just <masked /> irrespective of whether you nest it or not. Again this is because the query plan shows the masking happening after the UDX. Whereas if you don't use , TYPE then it depends if you nest it. See fiddle.
Following is my code :
SELECT sum_date, sum_accname, sum_description,
       CASE WHEN debit NOT LIKE '%[^.0-9]%'
            THEN CAST(debit AS DECIMAL(9,2))
            ELSE NULL
       END AS debit,
       CASE WHEN credit NOT LIKE '%[^.0-9]%'
            THEN CAST(credit AS DECIMAL(9,2))
            ELSE NULL
       END AS credit
FROM sum_balance
While viewing the report it shows an error: "Error converting data type varchar to numeric." And I need the sum of the credit and debit columns in the same query. With the above code, if I include only one column in the conversion it works, but adding the other column to the conversion raises the error. I can't figure out the problem.
The problem is that your debit and credit columns are text and thus can contain anything. You're attempting to limit it to only numeric values with NOT LIKE '%[^.0-9]%' but that's not enough because you could have a value like 12.3.6.7 which cannot convert to a decimal.
There is no way in SQL Server that I'm aware of using LIKE to achieve what you're trying to achieve, because LIKE does not support the full range of regex operations -- in fact, it's quite limited. In my opinion, you're torturing the database design by trying to multi-purpose those fields. If you're looking to report on numeric data, then store them in numeric fields. That assumes, of course, you have control over the schema.
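The '12.3.6.7' failure mode can be checked outside the database. This Python sketch (the function names are made up for illustration) mimics the NOT LIKE '%[^.0-9]%' character-class filter with a regex and shows that a value can pass the filter yet still fail numeric conversion:

```python
import re

def passes_like_filter(value: str) -> bool:
    # Equivalent of NOT LIKE '%[^.0-9]%': no character outside [.0-9]
    return re.search(r"[^.0-9]", value) is None

def converts_to_decimal(value: str) -> bool:
    # Stand-in for CAST(... AS DECIMAL): does the string parse as a number?
    try:
        float(value)
        return True
    except ValueError:
        return False

for v in ("12.50", "12.3.6.7", "1a2"):
    print(v, passes_like_filter(v), converts_to_decimal(v))
# '12.3.6.7' passes the LIKE-style filter but cannot be converted
```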
I'm trying to create a trigger for my database table so that users can only enter a postcode that is 6-8 characters long. However, this doesn't seem to work even though the trigger doesn't show any errors.
Here is the code:
create or replace trigger loc_postcode
before insert or update of postcode
on location
for each row
begin
if ( LENGTH(:new.postcode) > 8) or ( LENGTH(:new.postcode) < 6)
then raise_application_error(0001,
'The postcode must be between 6 and 8 characters long');
end if;
end;
and the error:
ORA-04098: trigger 'C3392387.LOC_ID' is invalid and failed re-validation
As others have mentioned the trouble with the previous version of your trigger was that you were comparing a string to a number when you needed to compare the length of the string to a value. I won't go into any further details on this.
The reason for your current error is that you're not using a valid error code for a user-defined error. Per the documentation the RAISE_APPLICATION_ERROR procedure takes error codes in the range -20000 to -20999. Change the error code to -20001 and the trigger will work.
I'm a little surprised that you were getting the error that you are. I would have expected you to get "ORA-21000: error number argument to raise_application_error of 1 is out of range", as can be demonstrated in this SQL Fiddle. It's possibly because you have a slightly dodgy character after your final semi-colon. It displays as a space in hex when I look at it in a text editor, but judging by how it appears when I copy it into SQL Fiddle, it might not be one. It's also possible it's an artefact of Stack Exchange's rendering engine.
Incidentally, 0001 is not a valid Oracle error code; 00001 is a unique constraint violation and would be declared as -00001 (note the minus sign).
However, this is not how I would go about doing this. Triggers incur additional overhead when used and obfuscate constraints that could be declared in the database. There's also always the danger of having cascading triggers, which can make your data-model extremely complex.
The simpler method of doing this would be to declare your POSTCODE column to be at most 8 characters/bytes (up to you) and to add a check constraint on the column to ensure that the length of the postcode is 6 characters (or bytes) or greater. This embeds the logic you need in the structure of the table (and thus in Oracle's metadata), making it a lot easier to see what's going on.
If you were to declare your table DDL as something like the below (obviously massively simplified):
create table location (
id number
, postcode varchar2(8)
, constraint pk_location primary key (id)
, constraint ck_location_postcode check (length(postcode) between 6 and 8)
)
Then you can achieve the same result (working SQL Fiddle). Note that the maximum length of the column POSTCODE is 8, which takes care of the upper bound and there's a further check constraint limiting it. I've defined the check constraint to take care of both the upper and lower bounds so that you can tell in the future that you intended 8 to be the upper bound. A change to the size of the column will not, therefore, break your constraint. It's a safety feature, nothing more and it could be declared as follows without changing the functionality:
, constraint ck_location_postcode check (length(postcode) >= 6)
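The same behaviour can be tried quickly in any database with check constraints. Here's a Python/SQLite sketch of the idea (note the caveat that SQLite does not enforce the VARCHAR(8) length itself, so the check constraint does all the work here; table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE location (
        id       INTEGER PRIMARY KEY,
        postcode VARCHAR(8),
        CONSTRAINT ck_location_postcode
            CHECK (length(postcode) BETWEEN 6 AND 8)
    )
""")

conn.execute("INSERT INTO location VALUES (1, 'AB1 2CD')")  # 7 chars: accepted
try:
    conn.execute("INSERT INTO location VALUES (2, 'AB1')")  # 3 chars: rejected
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```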
Presumably, you want the length not the value:
create or replace trigger loc_id
before insert or update of postcode
on location
for each row
begin
if (length(:new.postcode) > 8) or (length(:new.postcode) < 6)
then raise_application_error(-20001,
'The postcode must be between 6 and 8 characters long');
end if;
end;
Your code doesn't generate an error because Oracle allows you to attempt to compare strings and numbers. The failure is when the string is not a numeric format.
It looks like POSTCODE is a string. Since you are trying to check its length, you need to use the LENGTH function.
if ( LENGTH(:new.postcode) > 8) or ( LENGTH(:new.postcode) < 6)
or personally I would prefer:
if NOT LENGTH(:new.postcode) BETWEEN 6 AND 8 THEN
In your version, you are trying to compare the actual value of POSTCODE to the number 6 and 8, which results in an error when the string value can't be converted to a number.
I am a newbie to ABAP. I am trying this program with Open SQL, and when I execute it, the first column's data is always missing. I have looked it up and the syntax appears to be correct. I am using the kna1 table, and the query is pretty simple too. If anybody notices the issue, please help me out.
DATA: WA_TAB_KNA1 TYPE KNA1,
IT_TAB_KNA1 TYPE TABLE OF KNA1,
V_KUNNR TYPE KUNNR.
SELECT-OPTIONS: P_KUNNR FOR V_KUNNR.
SELECT name1 kunnr name2
INTO TABLE IT_TAB_KNA1 FROM KNA1
WHERE KUNNR IN P_KUNNR.
LOOP AT IT_TAB_KNA1 INTO WA_TAB_KNA1.
WRITE:/ WA_TAB_KNA1-KUNNR,' ', WA_TAB_KNA1-NAME1.
ENDLOOP.
This is a classic - I suppose every ABAP developer has to experience this at least once.
You're using an internal table of structure KNA1, which means that your target variable has the following structure
ccckkkkkkkkkklllnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...
with ccc being the client, kkkkkkkkkk being the field KUNNR (10 characters), lll the field LAND1 (3 characters), then 35 ns for the field NAME1, 35 Ns for the field NAME2 and so on.
In your SELECT statement, you tell the system to retrieve the columns NAME1, KUNNR and NAME2 - in that order! This will yield a result set that has the following structure, using the nomenclature above:
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnkkkkkkkkkkNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Instead of raising some kind of type error, the system will then try to squeeze the data into the target structure - mainly for historical reasons. Because the first fields are all character fields, it will succeed. The result: the field MANDT of your internal table contains the first three characters of NAME1, the field KUNNR contains the characters 4-13 of the source field NAME1 and so on.
Fortunately the solution is easy: use INTO CORRESPONDING FIELDS OF TABLE instead of INTO TABLE. This will cause the system to use a fieldname-based mapping when filling the target table. As tomdemuyt mentioned, it's also possible to roll your own target structure -- and for large data sets that's a really good idea, because you're wasting a lot of memory otherwise. Still, sometimes this is not an option, so you really have to know this error: recognize it as soon as you see it and know what to do.
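The difference between positional and name-based mapping can be illustrated outside ABAP too. A loose Python analogy (field names borrowed from KNA1, data invented; a dataclass plays the role of the target structure):

```python
from dataclasses import dataclass

@dataclass
class Kna1Row:
    mandt: str = ""   # client
    kunnr: str = ""   # customer number
    name1: str = ""   # name

# The SELECT returned columns in the order (name1, kunnr),
# which is NOT the order of the fields in the target structure.
selected = {"name1": "ACME Corp", "kunnr": "0000012345"}

# Positional filling (like INTO TABLE): values land in the wrong fields
positional = Kna1Row(*selected.values())

# Name-based filling (like INTO CORRESPONDING FIELDS OF TABLE)
by_name = Kna1Row(**selected)

print(positional)  # mandt='ACME Corp', kunnr='0000012345' -- garbage
print(by_name)     # kunnr='0000012345', name1='ACME Corp' -- correct
```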
How are NULL and empty varchar values stored in SQL Server? And in case I have no user entry for a string field on my UI, should I store a NULL or a ''?
There's a nice article here which discusses this point. Key things to take away are that there is no difference in table size; however, some users prefer to use an empty string because it can make queries easier, as there is no NULL check to do: you just check whether the string is empty. Another thing to note is what NULL means in the context of a relational database: it means that the pointer to the character field is set to 0x00 in the row's header, so there is no data to access.
Update
There's a detailed article here which talks about what is actually happening on a per-row basis:
Each row has a null bitmap for columns that allow nulls. If the row's value in
that column is null, then its bit in the bitmap is 1; otherwise it's 0.
For variable-size datatypes the actual size is 0 bytes.
For fixed-size datatypes the actual size is the default datatype size
in bytes, set to the default value (0 for numbers, '' for chars).
The result of DBCC PAGE shows that both NULL and empty strings take up zero bytes.
Be careful with NULLs when checking for inequality in SQL Server.
For example
select * from foo where bla <> 'something'
will NOT return records where bla is null. Even though logically it should.
So the right way to check would be
select * from foo where isnull(bla,'') <> 'something'
Which of course people often forget and then get weird bugs.
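This is easy to reproduce in any SQL engine. A Python/SQLite sketch (COALESCE plays the role of ISNULL here; the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bla TEXT)")
conn.executemany("INSERT INTO foo VALUES (?)",
                 [("something",), ("other",), (None,)])

# The bare inequality silently drops the NULL row
plain = conn.execute(
    "SELECT bla FROM foo WHERE bla <> 'something'").fetchall()
print(plain)    # [('other',)] -- the NULL row is missing

# COALESCE (SQLite's stand-in for ISNULL) brings it back
wrapped = conn.execute(
    "SELECT bla FROM foo WHERE COALESCE(bla, '') <> 'something'").fetchall()
print(wrapped)  # includes both ('other',) and (None,)
```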
The conceptual differences between NULL and "empty-string" are real and very important in database design, but often misunderstood and improperly applied - here's a short description of the two:
NULL - means that we do NOT know what the value is, it may exist, but it may not exist, we just don't know.
Empty-String - means we know what the value is and that it is nothing.
Here's a simple example:
Suppose you have a table with people's names including separate columns for first_name, middle_name, and last_name. In the scenario where first_name = 'John', last_name = 'Doe', and middle_name IS NULL, it means that we do not know what the middle name is, or if it even exists. Change that scenario such that middle_name = '' (i.e. empty-string), and it now means that we know that there is no middle name.
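The distinction shows up directly in queries. A Python/SQLite sketch of the middle-name example above (names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (first_name TEXT, middle_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO people VALUES ('John', NULL, 'Doe')")  # unknown
conn.execute("INSERT INTO people VALUES ('Jane', '', 'Doe')")    # known: none

unknown = conn.execute(
    "SELECT first_name FROM people WHERE middle_name IS NULL").fetchall()
no_middle = conn.execute(
    "SELECT first_name FROM people WHERE middle_name = ''").fetchall()

print(unknown)    # [('John',)] -- we don't know John's middle name
print(no_middle)  # [('Jane',)] -- Jane is known to have no middle name
```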
I once heard a SQL Server instructor promote making every character-type column in a database required, and then assigning a DEFAULT VALUE to each of either '' (empty string) or 'unknown'. In stating this, the instructor demonstrated he did not have a clear understanding of the difference between NULLs and empty strings. Admittedly, the differences can seem confusing, but for me the above example helps to clarify the difference. It is also important to understand the difference when writing SQL code, and to handle NULLs as well as empty strings properly.
An empty string is a string with zero length or no character.
Null is absence of data.
NULL values are stored separately in a special bitmap space for all the columns.
If you do not distinguish between NULL and '' in your application, then I would recommend you to store '' in your tables (unless the string column is a foreign key, in which case it would probably be better to prohibit the column from storing empty strings and allow the NULLs, if that is compatible with the logic of your application).
NULL is a non-value, like undefined. '' is an empty string with 0 characters.
The value of a string in the database depends on the value from your UI, but generally it's an empty string '' if you specify the parameter in your query or stored procedure.
If it's not a foreign key field, not using empty strings could save you some trouble. Only allow NULLs if you'll take NULL to mean something different from an empty string. For example, if you have a password field, a NULL value could indicate that a new user has not created his password yet, while an empty varchar could indicate a blank password. For a field like "address2", allowing NULLs can only make life difficult. Things to watch out for include null references and the unexpected results of the = and <> operators mentioned by Vagif Verdi, and watching out for these things is often unnecessary programmer overhead.
Edit: if performance is an issue, see this related question: Nullable vs. non-null varchar data types - which is faster for queries?
In terms of having something tell you, whether a value in a VARCHAR column has something or nothing, I've written a function which I use to decide for me.
CREATE FUNCTION [dbo].[ISNULLEMPTY](@X VARCHAR(MAX))
RETURNS BIT AS
BEGIN
    DECLARE @result AS BIT
    IF @X IS NOT NULL AND LEN(@X) > 0
        SET @result = 0
    ELSE
        SET @result = 1
    RETURN @result
END
Now there is no doubt.
How are the "NULL" and "empty varchar" values stored in SQL Server?
Why would you want to know that? Or in other words, if you knew the answer, how would you use that information?
And in case I have no user entry for a string field on my UI, should I store a NULL or a ''?
It depends on the nature of your field. Ask yourself whether the empty string is a valid value for your field.
If it is (for example, house name in an address) then that might be what you want to store (depending on whether or not you know that the address has no house name).
If it's not (for example, a person's name), then you should store a null, because people don't have blank names (in any culture, so far as I know).