SQL Server Data Masking bug with "FOR JSON PATH" clause

I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:
CREATE TABLE [dbo].[Test](
[Column1] [VARCHAR](64) NULL,
[Column2] [VARCHAR](64) NULL
)
GO
INSERT INTO [dbo].[Test]
VALUES ('ABCDEFG', 'HIJKLMN')
I've masked the column with the following code:
ALTER TABLE [dbo].[Test]
ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');
It works as expected when I perform the following query using a non-allowed user:
SELECT [Column1], [Column2]
FROM [dbo].[Test]
FOR JSON PATH
-- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'
But it doesn't work when the same non-allowed user saves the result in variable (the main goal):
DECLARE @var VARCHAR(64)
SET @var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
SELECT @var -- it should show valid JSON...
-- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
-- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON
Main problem: the JSON loses its structure when a masked column appears in the SELECT and the "FOR JSON PATH" clause is present.
We want to get valid JSON regardless of whether the column is masked and regardless of whether the user is sa.
I've tested using NVARCHAR and CASTing the masked column, but the only way we get the desired result is to dump the rows into a #tempTable before applying the "FOR JSON PATH" clause (as sketched below).
How can I SELECT a masked column and save the result to a VARCHAR variable without losing the JSON structure?
Any help will be appreciated.
NOTE: The sa user is allowed to see unmasked data by default (so the JSON doesn't lose its structure), but we want to execute this as a non-allowed user and still get valid JSON back, not just 'xxxx'.
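For reference, this is roughly what the #tempTable workaround looks like (a minimal sketch; the temp table name is just illustrative):
-- The masked values are materialized into the temp table first (a non-allowed user
-- gets 'xxxx' written there), so the subsequent FOR JSON PATH sees plain VARCHAR data
-- and the structure survives.
DECLARE @var VARCHAR(64);

SELECT [Column1], [Column2]
INTO #MaskedCopy
FROM [dbo].[Test];

SET @var = (SELECT [Column1], [Column2] FROM #MaskedCopy FOR JSON PATH);
SELECT @var;  -- '[{"Column1":"xxxx","Column2":"HIJKLMN"}]'

DROP TABLE #MaskedCopy;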

It does indeed appear to be a bug; a repro is here. (Although, see below; I'm no longer so sure.)
When using FOR JSON, or for that matter FOR XML, as a top level SELECT construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bare FOR JSON.
What appears to be happening is that in the case of a bare FOR JSON, the data masking happens at the top of the plan, in a Compute Scalar operator just before the JSON SELECT operator. So the masking happens on just the one column.
PasteThePlan
Whereas when it is placed inside a subquery, a UDX function operator is used. The problem is that the Compute Scalar happens after the UDX has created the JSON or XML, when it should have been pushed down below the UDX in the plan.
PasteThePlan
I suggest you file the bug with Microsoft, on the Azure Feedback site.
Having gone over this a little more, I actually now think it's not a bug; what does seem to be a bug is the case without nesting.
From the documentation:
Whenever you project an expression referencing a column for which a data masking function is defined, the expression will also be masked. Regardless of the function (default, email, random, custom string) used to mask the referenced column, the resulting expression will always be masked with the default function.
Therefore, when you select any masked column, even in a normal SELECT, if you apply a function to the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read; it is applied when it is finally output to the client.
When using a subquery, the data is fed into a UDX function operator. The compiler now senses that the final resultset is a normal SELECT, just one that needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to what you get if you do UPPER(yourMaskedColumn). See the XML plan in this fiddle for an example of that.
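As a minimal illustration of that documented rule, using the question's table and run as a user without the UNMASK permission:
SELECT UPPER([Column1]) AS UpperColumn1
FROM [dbo].[Test];
-- Returns 'xxxx': because the expression references a masked column,
-- the whole expression result is masked with the default function.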
But when using a bare FOR JSON, it appears to the compiler as a normal SELECT, just that the final output is changed to JSON (the top-level SELECT operator is different). So the masking happens before that point. This seems to me to be a bug.
The bug is even more egregious when you use FOR XML, which uses the same mechanisms. If you use FOR XML ..., TYPE then you get just <masked /> irrespective of whether you nest it or not. Again, this is because the query plan shows the masking happening after the UDX. Whereas if you don't use , TYPE then it depends on whether you nest it. See fiddle.
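A minimal sketch of the two behaviours side by side, run as a user without the UNMASK permission (results as described above):
-- Bare FOR JSON: masking happens per column, so the JSON structure survives.
SELECT [Column1], [Column2]
FROM [dbo].[Test]
FOR JSON PATH;
-- [{"Column1":"xxxx","Column2":"HIJKLMN"}]

-- Nested FOR JSON (subquery or variable assignment): the UDX builds the JSON first,
-- then the whole expression is masked with the default() function.
SELECT (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH) AS MaskedJson;
-- 'xxxx'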

Related

How to solve the performance difference while querying for records when numeric data is stored in varchar columns?

I am trying to query my MSSQL database with the JPA Query DSL library (package com.querydsl.jpa.impl.JPAQuery) and found a performance problem while running the query. I am using the Java API to execute the Query DSL predicate.
My table has a column called point_id whose type is Varchar(20) and which is used to store numeric values, i.e. numbers stored as strings.
When I try this query (which is what the Query DSL also generates)
select
testperfor0_.serv_code as hm_serv_8_5_,
testperfor0_.version as version9_5_
from
TestPerformanceObject testperfor0_
where
(
testperfor0_.point_id in (1, 2)
);
the performance is very low when compared to the query
select
testperfor0_.serv_code as hm_serv_8_5_,
testperfor0_.version as version9_5_
from
TestPerformanceObject testperfor0_
where
(
testperfor0_.point_id in ('1', '2')
);
The difference between the second query and the one generated by the DSL is that the values are provided in single quotes. This means an implicit conversion happens while performing the query, and this performance problem is also discussed here.
Is there any solution for this?
Edit: The column is of type Varchar(20) because it can also hold
non-numeric values.
Your real problem actually begins with your topic sentence:
My table has a column called point_id whose type is Varchar(20) and which is used to store numeric values, i.e. numbers stored as strings.
If you are trying to store numeric values, then you should be using some kind of number column, not a varchar or other text column.
That being said, the performance difference appears to be due to an implicit conversion which is happening with this version of your query:
where testperfor0_.point_id in (1, 2)
If you must stick with your current data model, then you should be comparing point_id against text values. So, from your JPA code make sure that you are binding Java strings to the IN clause.
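A minimal sketch of the difference in T-SQL terms (the index name is illustrative, not from the original post):
-- Assume an index exists on the varchar column:
CREATE INDEX IX_TestPerformanceObject_point_id ON TestPerformanceObject (point_id);

-- Numeric literals: because int has higher data type precedence than varchar,
-- point_id is implicitly converted to int for every row, so the optimizer typically
-- cannot seek the index, and the query can even fail with a conversion error since
-- the column may also hold non-numeric values.
SELECT serv_code, version
FROM TestPerformanceObject
WHERE point_id IN (1, 2);

-- String literals: a varchar-to-varchar comparison, so the optimizer can seek the index.
SELECT serv_code, version
FROM TestPerformanceObject
WHERE point_id IN ('1', '2');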

How to query multiple JSON document schemas in Snowflake?

Could anyone tell me how to change the stored procedure in the article below to recursively expand all the attributes of a JSON file (multiple JSON document schemas)?
https://support.snowflake.net/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling-part-2
Craig Warman's stored procedure posted in that blog is a great idea. I asked him if it was okay to refactor his code, and he agreed. I've used the refactored version in the field, so I know the SP and how it works well.
It may be possible to modify the SP to work on your JSON. It will depend on whether or not Snowflake types the JSON in your variant column. The way you have it structured, it may not type everything. You can check by running this SQL and seeing if the result set includes all the columns you need:
set VARIANT_TABLE = 'WEATHER';
set VARIANT_COLUMN = 'V';
with MAIN_TABLE as
(
select * from identifier($VARIANT_TABLE) sample (1000 rows)
)
select distinct REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS path_name, -- This generates column-name-friendly path names (levels separated by underscores) and strips any bracket-enclosed array element references (like [0])
typeof(f.value) AS attribute_type, -- This generates column datatypes.
path_name AS alias_name -- This generates column aliases based on the path
from
MAIN_TABLE,
LATERAL FLATTEN(identifier($VARIANT_COLUMN), RECURSIVE=>true) f
where TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[');
Be sure to set the variables to your table and column names. If this picks up the type information for the columns in your JSON, then it's possible to modify this SP to do what you need. If it doesn't, but there's a way to modify the query to get it to pick up the columns, that would work too.
If it doesn't pick up the columns: based on Craig's idea, I also wrote type inference for non-variant data (such as strings from CSV log files without type information). Try the SQL above first and see what results you get.
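For example, once a path and its type are known from the result set above, the attribute can be queried directly (the path city.name here is illustrative, not taken from your data):
select V:city.name::string as city_name
from WEATHER
limit 10;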

SQL INSERT INTO: comma as decimal

My problem is that my customer runs his SQL Server on a Windows box and the country settings are set to "Germany".
This means a decimal separator is NOT a point (.), it's a comma (,)!
Inserting a double value into the database works like this:
INSERT INTO myTable (myPrice) VALUES (16,5)
Works fine, so far.
The problem comes up if there is more than one value with decimal places in the statement like
INSERT INTO myTable (myPrice, myAmount) VALUES (16,5,10)
I get the error
Number of query values and destination fields are not the same.
Can I somehow "delimit" the values? I tried adding brackets around them, but that does not work.
Unfortunately I cannot change the language settings of the OS or the database because I am just writing some add-ons to an existing application.
Thank you!
ev
You must put the data in the format the database expects. Even if you manage to insert the data using a comma, you may lose out on numerical calculations.
If I ran into such a situation, I would check whether the comma is only required for display. If so, I would show the values in comma format while storing them in decimal (point) format.
This way the data can easily be processed as numeric, and it only needs to be converted back and forth for the UI or display.
Based on this, you may describe your situation in more detail if required.
EDIT: To verify my theory, can you check whether this insert statement inserted 165 or 16.5 into the database?
INSERT INTO myTable (myPrice) VALUES (16,5);
select myPrice from myTable where myPrice < 17;
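As a sketch of the store-with-a-point, format-for-display approach (column names as in the question; FORMAT assumes SQL Server 2012 or later):
-- Store the value with a decimal point; the literal is independent of the OS regional settings.
INSERT INTO myTable (myPrice, myAmount) VALUES (16.5, 10);

-- Convert to the German comma format only when displaying the value.
SELECT FORMAT(myPrice, 'N2', 'de-DE') AS myPriceDisplay  -- '16,50'
FROM myTable;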

INSERT Query SQL (Error converting data type nvarchar to (null))

I'm trying to run an INSERT query, but I get an error about converting nvarchar to (null). Here's the code:
INSERT Runtime.dbo.History (DateTime, TagName, vValue)
VALUES ('2015-09-10 09:00:00', 'ErrorComment', 'Error1')
Error message:
Error converting data type nvarchar to (null).
The problem is with the vValue column, which is defined as:
vValue (nvarchar, null)
How it looks in the database:
The values inside vValue are placed by the program I'm using. I'm just trying to manually insert into the database.
Last post was with the wrong column, I apologize.
After contacting Wonderware support, I found out that INSERT is not supported on the vValue column by design. It's a string value, and updates are supposed to be carried out via the StringHistory table.
What is the type of the value column in the database?
If it's float, you should insert a number, not a string.
Casting "Error1" to FLOAT makes no sense.
Float is a number, for example: 1.15, 12.00, 150.15.
When you try to CAST "Error1" to float, it tries to transform the text "Error1" into a number and it can't, which is logical.
You should insert a number into the column.
I think I can help you with your problem since I've got a decent test environment to experiment with.
Runtime.dbo.History is not a table you can interact directly with, it is a View. In our case here the view is defined as:
select * from [INSQL].[Runtime].dbo.History
...which I believe implies the History data you are viewing comes from the Historian flat-file storage itself, a proprietary Wonderware system. You might see some success if you expand, in SQL Server Management Studio,
Server Objects -> Linked Servers -> INSQL
...and play with the data there, but I really wouldn't recommend it.
With that said, for what reason do you need to insert tag history? There might be other workarounds for the purpose you need.

EF generate result objects for SQL OUTPUT clause

What can I do so that EF becomes aware of the output clause in SPs and generates the result object accordingly?
INSERT INTO goodtable
(token,
ip, long_ip
) OUTPUT INSERTED.*
VALUES
(@token,
@ip, @long_ip
);
What I currently do to circumvent this is write a dummy SELECT (sketched below), generate the objects, and then comment out the dummy SELECT, leaving the OUTPUT clause. This is not a good long-term solution.
Please do not suggest changing the SQL.
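For context, the circumvention mentioned above looks roughly like this (procedure name and parameter types are illustrative guesses, not from the real schema):
CREATE PROCEDURE dbo.InsertGoodtable
    @token VARCHAR(64),
    @ip VARCHAR(45),
    @long_ip BIGINT
AS
BEGIN
    -- Dummy SELECT, enabled only while EF imports the procedure so the designer can
    -- infer the result shape; it is commented out again afterwards.
    -- SELECT CAST(NULL AS VARCHAR(64)) AS token, CAST(NULL AS VARCHAR(45)) AS ip,
    --        CAST(NULL AS BIGINT) AS long_ip;

    INSERT INTO goodtable (token, ip, long_ip)
    OUTPUT INSERTED.*
    VALUES (@token, @ip, @long_ip);
END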
Have you tried, when importing the stored procedure into EF using Function Import, setting the "Returns a Collection Of" option?
I've never tried it, but I can't see a reason for it not to work.
