Query for columns that have Data masking applied to them


I created a masking policy for PII data. I then applied it to a table like so:
CREATE TABLE EXAMPLE.EXAMPLE_TABLE
(ID INT,
LAST_NAME STRING,
PHONE_NUMBER INT);
ALTER TABLE EXAMPLE.EXAMPLE_TABLE MODIFY COLUMN LAST_NAME SET MASKING POLICY PUBLIC.PII_MASK_STRING;
ALTER TABLE EXAMPLE.EXAMPLE_TABLE MODIFY COLUMN PHONE_NUMBER SET MASKING POLICY PUBLIC.PII_MASK_NUMERIC;
Now I want to be able to reverse-engineer a DDL script like this, with the ALTER TABLE ... SET MASKING POLICY statements included.
Is there a way to query for the list of columns that have masking policies applied to them (and which policy each one uses)?
EDIT: In this case, the user has ownership of the table but not of the masking policy. What permissions are required to query this information?

The Information Schema table function POLICY_REFERENCES returns the objects and columns that a given policy is attached to. More here: https://docs.snowflake.com/en/sql-reference/functions/policy_references.html
If this is not enough, you can list every policy together with its tables and columns by combining POLICY_REFERENCES with SHOW MASKING POLICIES; and the RESULT_SCAN() function. RESULT_SCAN() lets you query the output of SHOW MASKING POLICIES; as if it were a table (https://docs.snowflake.com/en/sql-reference/functions/result_scan.html)
Result: you get the names of all policies, and for each of them you can call POLICY_REFERENCES().
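To get from those rows back to a DDL script, a minimal sketch in Python could look like the following. It assumes each row is a dict keyed by the POLICY_REFERENCES output column names (POLICY_DB, POLICY_SCHEMA, POLICY_NAME, POLICY_KIND, REF_DATABASE_NAME, REF_SCHEMA_NAME, REF_ENTITY_NAME, REF_COLUMN_NAME) and that every referenced object is a table; the function names are made up for illustration.

```python
def quote_ident(name):
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def masking_policy_ddl(rows):
    """Build one ALTER TABLE ... SET MASKING POLICY statement per masked column.

    `rows` are dicts keyed by the POLICY_REFERENCES column names
    (an assumption of this sketch, check them against your own output).
    """
    statements = []
    for row in rows:
        # Row access policies have no column reference, so skip them.
        if row.get("POLICY_KIND") != "MASKING_POLICY" or not row.get("REF_COLUMN_NAME"):
            continue
        table = ".".join(
            quote_ident(row[key])
            for key in ("REF_DATABASE_NAME", "REF_SCHEMA_NAME", "REF_ENTITY_NAME")
        )
        policy = ".".join(
            quote_ident(row[key])
            for key in ("POLICY_DB", "POLICY_SCHEMA", "POLICY_NAME")
        )
        column = quote_ident(row["REF_COLUMN_NAME"])
        statements.append(
            f"ALTER TABLE {table} MODIFY COLUMN {column} "
            f"SET MASKING POLICY {policy};"
        )
    return statements
```

Feeding it the rows fetched for each policy and printing the returned statements gives the ALTER TABLE part of the script; views would need ALTER VIEW instead.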

A vendor option is to use a Snowflake partner solution to automate masking policies on PII for dynamic data masking using Immuta. There is a demo video here if you find it helpful.
Full disclosure: I am employed by Immuta and my team works on content for data engineers.

Related

Can snowflake masking policies be assigned to tags?

I know that masking policies can be assigned to columns, for example:
alter table if exists user_info modify column email set masking policy email_mask;
But can we assign one masking policy to a tag? So that all the columns with the tag in one table can automatically be assigned the masking policy?
Thanks.

The simple answer is no. Masking policies can only be attached to columns of tables and views.
If you are looking to automatically assign masking policies to tagged columns, you can do the following:
Join the COLUMNS view and the TAG_REFERENCES view to generate ALTER statements that set the masking policy on tagged columns that don't have a masking policy assigned yet
Put it in a Stored Procedure
Schedule a Task to run the Stored Procedure regularly
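The core of the first step is a set difference: tagged columns minus columns that are already masked. A rough sketch of that logic in Python, assuming the two inputs have already been fetched as (table, column) pairs (for example from TAG_REFERENCES and POLICY_REFERENCES); the function name and input shape are hypothetical:

```python
def missing_policy_statements(tagged_columns, masked_columns, policy_name):
    """Return ALTER statements for tagged columns that have no masking policy yet.

    tagged_columns / masked_columns: iterables of (table, column) tuples.
    policy_name: the masking policy to attach to every unmasked tagged column.
    """
    already_masked = set(masked_columns)
    return [
        f"alter table if exists {table} modify column {column} "
        f"set masking policy {policy_name};"
        for table, column in sorted(set(tagged_columns) - already_masked)
    ]


if __name__ == "__main__":
    tagged = [("user_info", "email"), ("user_info", "backup_email")]
    masked = [("user_info", "email")]
    for statement in missing_policy_statements(tagged, masked, "email_mask"):
        print(statement)
```

Inside Snowflake the same difference would be expressed as a join or NOT EXISTS in the stored procedure; the Python version only illustrates which columns end up in the generated statements.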

As of June 2022, tag-based masking policies are now in public preview:
https://docs.snowflake.com/en/release-notes/preview-features.html
https://docs.snowflake.com/en/user-guide/tag-based-masking-policies.html

Retrieve all available masking policies for an account in snowflake

How can I get all the masking policies created in a particular Snowflake account? Is there a view that shows them?
SHOW MASKING POLICIES only returns information about the policies themselves, not where they are applied.
How can I get all policies together with the tables and columns they are applied to?

You can query the information schema table-function POLICY_REFERENCES, see here: https://docs.snowflake.com/en/sql-reference/functions/policy_references.html
Here is also an example from the docs:
use database my_db;
use schema information_schema;
select *
from table(information_schema.policy_references(policy_name => 'ssn_mask'));
Important: You have to execute the USE DATABASE ... and USE SCHEMA INFORMATION_SCHEMA commands first, or use a fully qualified identifier.
If this is not enough, you can list every policy together with its tables and columns by combining the query above with SHOW MASKING POLICIES; and the RESULT_SCAN() function.
RESULT_SCAN() lets you query the output of SHOW MASKING POLICIES; as if it were a table (https://docs.snowflake.com/en/sql-reference/functions/result_scan.html)
Result: you get the names of all policies, and for each of them you can call POLICY_REFERENCES().
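As a sketch of that loop in Python: the function below runs SHOW MASKING POLICIES, reads its output through RESULT_SCAN(LAST_QUERY_ID()), and then calls POLICY_REFERENCES once per policy. `run_query` is a hypothetical callable (SQL string in, list of row dicts out) standing in for whatever client you use, and the lower-case SHOW column names (`database_name`, `schema_name`, `name`) are assumptions to verify against your own output.

```python
def all_policy_references(run_query):
    """Collect POLICY_REFERENCES rows for every masking policy the role can see.

    `run_query` is a hypothetical callable: it takes a SQL string and returns
    a list of dicts (one per row). All calls must share one session so that
    LAST_QUERY_ID() points at the SHOW command.
    """
    run_query("show masking policies in account")
    # SHOW output columns are lower-case, so they need double quotes.
    policies = run_query(
        'select "database_name", "schema_name", "name" '
        "from table(result_scan(last_query_id()))"
    )
    references = []
    for p in policies:
        fq_name = f'{p["database_name"]}.{p["schema_name"]}.{p["name"]}'
        # Qualify INFORMATION_SCHEMA with the policy's database so that
        # no USE DATABASE / USE SCHEMA is required.
        references.extend(run_query(
            f'select * from table({p["database_name"]}.information_schema.'
            f"policy_references(policy_name => '{fq_name}'))"
        ))
    return references
```

The identifiers are interpolated without quoting, so this only suits policy names that are plain unquoted identifiers.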

Database design and large tables?

Are tables with lots of columns indicative of bad design? For example say I have the following table that stores user information and user settings:
[Users table]
userId
name
address
somesetting1
...
somesetting50
As the site requires more settings, the table gets larger. In my mind this table is normalized: all the settings depend on the userId.
I have a thing against tables with lots of columns; it just seems wrong to me. But then I remembered that you can select which data to return from the table, so if the table is large I could still break it into several different objects in code. For example
[User object]
[UserSetting object]
and return only the data to fill those objects.
Is the above common practice, or are there other techniques for dealing with tables with lots of columns that are more suitable?

I think you should use multiple tables like this:
[Users table]
userId
name
address
[Settings table]
settingId
userId
settingKey
settingValue
The tables are related by the userId column, which you can use to retrieve the settings for whichever user you need.
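A small sketch of that layout, using Python's built-in sqlite3 module as a stand-in for whatever database the site runs on. The sample data, the UNIQUE constraint and the `load_settings` helper are additions for illustration:

```python
import sqlite3


def load_settings(conn, user_id):
    """Return one user's settings as a {settingKey: settingValue} dict."""
    rows = conn.execute(
        "SELECT settingKey, settingValue FROM Settings WHERE userId = ?",
        (user_id,),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE Settings (
        settingId    INTEGER PRIMARY KEY,
        userId       INTEGER NOT NULL REFERENCES Users(userId),
        settingKey   TEXT NOT NULL,
        settingValue TEXT,
        UNIQUE (userId, settingKey)   -- one value per setting per user
    );
    INSERT INTO Users VALUES (1, 'Ann', '1 Main St');
    INSERT INTO Settings (userId, settingKey, settingValue)
    VALUES (1, 'theme', 'dark'), (1, 'language', 'en');
""")

print(load_settings(conn, 1))
```

Adding a new setting is then an INSERT rather than an ALTER TABLE, at the cost of every value being stored as text.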

I would say that it is bad table design. If a user doesn't have an entry for 47 of those 50 settings, you will have a large number of NULLs in the table, which isn't good practice and can also hurt performance (NULLs have to be handled in a special way).
Instead, have the following:
USER TABLE
Id,
FirstName
LastName
etc
SETTINGS
Id,
SettingName
USER SETTINGS
Id,
SettingId,
UserId,
SettingValue
You then have a many-to-many join and eliminate the NULLs.

First, don't put spaces in table names! All the [braces] will be a real pain!
If you have 50 columns, how meaningful will all that data be for each user? Will there be lots of NULLs? Most of the data may not even apply to any given user. Think 1-to-1 tables, where you break the "settings" down into logical groups:
Users: --main table where most values will be stored
userId
name
address
somesetting1 ---please note that I'm using "somesetting1", don't
... --- name the columns like this, use meaningful names!!
somesetting5
UserWidgets --all widget settings for the user
userId
somesetting6
....
somesetting12
UserAccounting --all accounting settings for the user
userId
somesetting13
....
somesetting23
--etc..
You only need a Users row for each user, plus a row in each other table where that data applies to the given user. If a user doesn't have any widget settings, then there is no widget row for that user. You can LEFT JOIN each table as necessary to get all the settings. Usually you only need to work on a subset of the settings, depending on which part of the application is running, which means you won't need to join in all of the tables, just the one or two that you need at that time.
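To illustrate the LEFT JOIN behaviour, here is a minimal sketch using Python's sqlite3 module. The `widgetColor` column and the sample rows are invented for the example; only the Users / UserWidgets split comes from the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT);
    -- 1-to-1 with Users: the primary key is also the foreign key.
    CREATE TABLE UserWidgets (
        userId      INTEGER PRIMARY KEY REFERENCES Users(userId),
        widgetColor TEXT
    );
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO UserWidgets VALUES (1, 'blue');   -- Bob has no widget row
""")


def users_with_widgets(conn):
    """LEFT JOIN keeps users without a UserWidgets row; their columns are NULL."""
    return conn.execute("""
        SELECT u.userId, u.name, w.widgetColor
        FROM Users u
        LEFT JOIN UserWidgets w ON w.userId = u.userId
        ORDER BY u.userId
    """).fetchall()


for row in users_with_widgets(conn):
    print(row)
```

Code that only deals with widgets queries UserWidgets directly and never touches the other settings tables.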

You could consider an attributes table. As long as your indexes are good, then you wouldn't have too much of a performance issue:
[AttributeDef]
AttributeDefId int (primary key)
GroupKey varchar(50)
ItemKey varchar(50)
...
[AttributeVal]
AttributeValId int (primary key)
AttributeDefId int (FK -> AttributeDef.AttributeDefId)
UserId int (probably FK to users table?)
Val varchar(255)
...
Basically you're "pivoting" your table with many columns into two tables with fewer columns. You can write views and table functions around this structure to return the data for a group of related items, for just one specific item, and so on. You could also add other things to the attribute definition table to indicate required data elements, restrictions on the data elements, etc.
What's your thought on this type of design?
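For a feel of how a "give me one group of items" lookup works against this structure, here is a sketch using Python's sqlite3 module. The index, the sample rows and the `group_attributes` helper are assumptions added for the example, and a view or table function in your own database would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AttributeDef (
        AttributeDefId INTEGER PRIMARY KEY,
        GroupKey TEXT NOT NULL,
        ItemKey  TEXT NOT NULL
    );
    CREATE TABLE AttributeVal (
        AttributeValId INTEGER PRIMARY KEY,
        AttributeDefId INTEGER NOT NULL REFERENCES AttributeDef(AttributeDefId),
        UserId INTEGER NOT NULL,
        Val    TEXT
    );
    -- Index supporting "all values for one user" lookups.
    CREATE INDEX IX_AttributeVal_User ON AttributeVal (UserId, AttributeDefId);

    INSERT INTO AttributeDef VALUES
        (1, 'display', 'theme'), (2, 'display', 'font'), (3, 'mail', 'digest');
    INSERT INTO AttributeVal (AttributeDefId, UserId, Val) VALUES
        (1, 7, 'dark'), (3, 7, 'weekly');
""")


def group_attributes(conn, user_id, group_key):
    """Return {ItemKey: Val} for one user and one attribute group."""
    rows = conn.execute("""
        SELECT d.ItemKey, v.Val
        FROM AttributeDef d
        JOIN AttributeVal v ON v.AttributeDefId = d.AttributeDefId
        WHERE v.UserId = ? AND d.GroupKey = ?
    """, (user_id, group_key))
    return dict(rows.fetchall())


print(group_attributes(conn, 7, "display"))
```

Items the user never set (here `font`) simply have no row, so they are absent from the result rather than NULL.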

Use several tables with matching indexes to get the best SELECT speed, and relate the information between the tables by JOINing on those indexed key columns.

Audit fields(CreatedBy, UpdatedBy) in tables. Is it good idea?

I worked on a product where almost every table had those columns. As developers we constantly had to join to the Users table to find out who created a record, and it made a mess of the code.
I'm designing a new product and thinking about this again. Does it have to be like this? Obviously, it is good to know who created a record and when. But having 300+ tables reference the same Users table doesn't seem like a good idea.
How do you handle things like this? Should I create a CreatedBy column only on the major entities where it's most likely to be needed in the UI, and then deal with the joins? Or should I put it everywhere? Or maybe have a separate "Audit" table where I store all of this and look it up only on demand (not every time an entity is displayed in the UI)?
I'm just worried about the performance aspect, where every UI query will hit the Users table.
EDIT: This is going to be SQL Server 2008 R2 database

The problem with that approach is that you only know who created the row and who changed the row last. What if the last person to update the row was correcting the previous updater's mistake?
If you're interested in doing full auditing for compliance or accountability reasons, you should probably look into SQL Server Audit. You can dictate which tables you're auditing, change those on the fly without having to touch your schema, and write queries against the audit data specifically instead of mixing the auditing logic into your normal application queries (never mind widening every row of the table itself). It also lets you audit SELECT queries, which the other potential solutions (triggers, CDC, Change Tracking, all of which are either more work or not complete for true auditing purposes) won't let you do.

I know that this is an older post, but one way to avoid the lookup on the user table is to denormalize the audit fields.
So instead of storing a user id in the CreatedBy field, you store the username itself. This allows the table to be reviewed without a user lookup, and it keeps later changes to your user table, such as deleted users, from affecting the audit fields.
I usually add the following to the end of a table:
IsDeleted bit default 0
CreatedBy varchar(20)
CreatedOn datetime2 default getdate()
UpdatedBy varchar(20)
UpdatedOn datetime2 default getdate()
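A rough, runnable sketch of these columns in use, written with Python's sqlite3 module rather than SQL Server, so CURRENT_TIMESTAMP stands in for getdate() and TEXT for varchar/datetime2. The Orders table and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (
        OrderId   INTEGER PRIMARY KEY,
        Total     REAL,
        IsDeleted INTEGER NOT NULL DEFAULT 0,
        CreatedBy TEXT,                            -- username, not a user id
        CreatedOn TEXT DEFAULT CURRENT_TIMESTAMP,
        UpdatedBy TEXT,
        UpdatedOn TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")


def create_order(conn, total, username):
    """Insert a row; the column defaults fill IsDeleted, CreatedOn and UpdatedOn."""
    cur = conn.execute(
        "INSERT INTO Orders (Total, CreatedBy, UpdatedBy) VALUES (?, ?, ?)",
        (total, username, username),
    )
    return cur.lastrowid


def update_order(conn, order_id, total, username):
    """Defaults only apply on INSERT, so UpdatedOn has to be set explicitly."""
    conn.execute(
        "UPDATE Orders SET Total = ?, UpdatedBy = ?, UpdatedOn = CURRENT_TIMESTAMP "
        "WHERE OrderId = ?",
        (total, username, order_id),
    )
```

Reading the audit columns back is then a plain SELECT on Orders with no join to a users table.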

Why can't I add new columns to my Users table?

I am doing some homework. The users in my database need some other attributes, not just the ones that ASP.NET 2.0 automatically created for me when I implemented the login and registration mechanism. But when I try to save the modification, I get an error. Can someone give me a hand?
The error says:
'aspnet_Users' table
- Unable to modify table. ALTER TABLE only allows columns to be added
that can contain nulls, or have a
DEFAULT definition specified, or the
column being added is an identity or
timestamp column, or alternatively if
none of the previous conditions are
satisfied the table must be empty to
allow addition of this column. Column
'kjoptekvoten' cannot be added to
non-empty table 'aspnet_Users' because
it does not satisfy these conditions.
That database was automatically created when I implemented forms-based authentication and registration. The problem now is that the users need some more attributes. How can I give them more attributes? What is the easiest way to do it? I don't mind if it is not theoretically correct (it is just for homework).
I would really appreciate your help.

Apart from the technicalities on the database side, there is a deeper issue here.
You should not alter the aspnet_Users table because you are bypassing the way the membership 'system' in asp.net is working. Instead, have a look into the Profile mechanism: https://web.archive.org/web/20211020111657/https://www.4guysfromrolla.com/articles/101106-1.aspx

You need to make the new attributes nullable or provide a default value. But you also need to consider how to get the values out of the database. The SQL membership provider uses an auto-generated stored procedure to populate the membership user instance it returns, so just adding the attributes to the table will not be enough to get their values into your application. I would use a separate user attribute table instead.

The error message says it all:
You are adding a new column that can't be NULL (checkbox "Allow Nulls" not checked), but as you didn't provide a default value, the existing rows would have to contain NULL in it.
So SQL Server can't create the new column.
You can do two things:
a) Create the new column with NULLs allowed.
THEN put a default value in all existing rows:
update aspnet_Users set kjoptekvoten = 0
...and THEN uncheck "Allow Nulls"
b) Create the new column directly with default values.
I don't know if you can do this in Management Studio, but it's easy in T-SQL:
alter table aspnet_Users
add kjoptekvoten int not null
constraint Name_For_Constraint default(0) with values
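This will add the new non-nullable column, AND create a constraint with a default value, AND fill the default value into all existing rows. (Because the column is NOT NULL, SQL Server fills the existing rows with the default either way; the "with values" clause is only required when the new column allows NULLs.)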
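The same rule can be tried out quickly with Python's sqlite3 module. This is SQLite, not SQL Server, so the syntax differs slightly (ADD COLUMN, no named constraint, no WITH VALUES) and the error text is different, but the behaviour is analogous: a NOT NULL column without a default is rejected, and with a default the existing rows are filled. The sample rows and the `try_add_column` helper are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aspnet_Users (UserId INTEGER PRIMARY KEY, UserName TEXT);
    INSERT INTO aspnet_Users (UserName) VALUES ('ann'), ('bob');
""")


def try_add_column(conn, ddl):
    """Run an ALTER TABLE statement; return None on success or the error text."""
    try:
        conn.execute(ddl)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


# NOT NULL without a default: rejected, the existing rows would hold NULL.
no_default = try_add_column(
    conn, "ALTER TABLE aspnet_Users ADD COLUMN kjoptekvoten INTEGER NOT NULL"
)

# NOT NULL with a default: accepted, existing rows read back the default.
with_default = try_add_column(
    conn, "ALTER TABLE aspnet_Users ADD COLUMN kjoptekvoten INTEGER NOT NULL DEFAULT 0"
)

values = [row[0] for row in conn.execute("SELECT kjoptekvoten FROM aspnet_Users")]
print(no_default, with_default, values)
```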

Normally I just set the column to allow NULLs,
then run an SQL UPDATE to set a value in every existing row (UPDATE table SET column = value),
then change the table definition so the column no longer allows NULLs.
