Currently, I'm using a Supabase database. One of the big roadblocks that I'm facing is column-level security, which seems a lot more complicated than RLS.
Say that I have a column called is_banned that should be viewable but not editable, while the rest of the columns should be both viewable and editable.
The only solution that I can really think of is splitting it into two tables and having RLS on the "sensitive information" table - but creating a private table for every table seems rather unnecessary.
Are there other solutions?
In PostgreSQL, you can specify column-level permissions via GRANT and/or REVOKE statements.
The tricky part here is that these permissions are set against PostgreSQL users/roles, NOT against your app users. So you need to ensure that the permissions are set against all roles that Supabase uses to execute client requests. As far as I know, Supabase uses the anon and authenticated PostgreSQL roles to execute requests; however, there is no official documentation on this, so I am not 100% sure there aren't others.
You can read more about how to utilize this technique here (see the section called Column-level permissions).
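A minimal sketch of the technique, assuming a table public.profiles with columns username, bio and is_banned (all names here are illustrative, not Supabase defaults):

REVOKE UPDATE ON public.profiles FROM anon, authenticated;
GRANT UPDATE (username, bio) ON public.profiles TO anon, authenticated;
-- SELECT privileges are untouched, so is_banned stays viewable;
-- any UPDATE that touches is_banned now fails with "permission denied".

Note that RLS policies still apply on top of these grants: column privileges decide which columns may be written, RLS decides which rows.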
I had to deal with this issue myself. I currently solve it with views, but would rather choose RLS policies, triggers or privileged functions in the future (untested, as of right now). I share the notes from my research into this issue below.
Column-level security (CLS) here means selectively prohibiting column updates based on certain conditions. There are several alternative solutions for this (summary), each with advantages and disadvantages. They are discussed in detail below.
Option 1: RLS policies
(My favourite option so far, but I have not yet used it in practice.)
Here, you would use a row-level security (RLS) policy that manually retrieves the old row and compares whether your field's value would change from the old to the new value. A solution candidate for this has been posted as a Stack Overflow answer, but it still has to be made into a generic function. At first glance, this seems better than a trigger: it shares a trigger's advantages, and in addition, Supabase promotes the use of RLS policies for access control anyway and has much better UI support for RLS than for triggers. So it would improve the consistency and maintainability of the database by reducing complexity.
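A rough, untested sketch of that idea, assuming a table public.profiles(id, user_id, is_banned); the helper function and policy names are made up:

-- SECURITY DEFINER so the lookup bypasses RLS on profiles itself;
-- STABLE so it uses the calling statement's snapshot and sees the pre-update row.
CREATE FUNCTION public.is_banned_unchanged(row_id bigint, new_value boolean)
RETURNS boolean
LANGUAGE sql STABLE SECURITY DEFINER AS $$
  SELECT is_banned = new_value FROM public.profiles WHERE id = row_id;
$$;

CREATE POLICY profiles_update ON public.profiles
  FOR UPDATE TO authenticated
  USING (user_id = auth.uid())                          -- assumed ownership check
  WITH CHECK (public.is_banned_unchanged(id, is_banned));

In the WITH CHECK clause, id and is_banned refer to the new row, so the policy rejects any update that would change is_banned.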
However, the Supabase RLS editor cannot be used for complex RLS policies (issue report), so as a workaround one should wrap all RLS code into a single or nested function call, or at least keep it no longer than one line of code. Even better is to maintain the SQL source code under version control outside of Supabase, and to copy-and-paste it into the Supabase SQL Editor whenever you want to change an RLS policy, table, function and so on.
Option 2: Triggers
See here for instructions; a minimal sketch follows the pros and cons below.
Advantages:
Does not add another table or view, so that the database structure is determined by the data and not by permission system quirks, as it should be.
Does not require changes to the default Supabase permissions or table-to-schema assignments.
Combines the powers of RLS policies and column-level permissions.
Disadvantages:
Triggers are not yet well supported in the Supabase UI: only the trigger status can be changed; the trigger itself cannot be shown or edited in the UI, only in the PostgreSQL console. In practice, this is not much of an issue, as for any real-life project you will have to work with the PostgreSQL database directly anyway.
It requires knowledge of PL/pgSQL or another programming language … and for some, programming is what we want to avoid with Supabase.
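A minimal sketch of such a trigger, assuming the same illustrative public.profiles table with an is_banned column:

CREATE FUNCTION public.protect_is_banned()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF NEW.is_banned IS DISTINCT FROM OLD.is_banned THEN
    RAISE EXCEPTION 'is_banned is read-only for clients';
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER protect_is_banned
  BEFORE UPDATE ON public.profiles
  FOR EACH ROW EXECUTE FUNCTION public.protect_is_banned();

A production version would probably exempt privileged roles (e.g. by checking session_user) so that admin tooling can still set the flag.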
Option 3: Privileged Functions
"You can hide the table behind a FUNCTION with SECURITY DEFINER. The table itself would not provide UPDATE access, instead users can only update the table through the FUNCTION." (source)
And in that function, you can determine column-level access permissions in any way you like. Any such function in schema public is automatically available through the API:
"write PostgreSQL SQL functions […] and call them via supabase.rpc('function_name', {param1: 'value'});." (source).
The issue is, however, that the API then no longer has a unified structure of "everything is available in tables".
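A sketch of such a function, assuming the illustrative public.profiles table again; only bio is made writable, and is_banned is simply out of reach:

CREATE FUNCTION public.update_bio(profile_id bigint, new_bio text)
RETURNS void
LANGUAGE sql SECURITY DEFINER
SET search_path = public
AS $$
  UPDATE profiles
     SET bio = new_bio
   WHERE id = profile_id
     AND user_id = auth.uid();   -- assumed ownership check
$$;

A client would then call supabase.rpc('update_bio', { profile_id: 1, new_bio: '...' }) instead of updating the table directly.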
Option 4: User-specific views
See the instructions. Further instructions:
"You can create a view to only show the columns you want, make sure you secure with a WHERE statement as it ignores RLS (normally), and then use RLS to block the original table." (source)
This solution has been recommended by a Supabase maintainer. In total, RLS policies and triggers seem preferable, though.
To make this solution secure, you have to use the option security_barrier = on (details), which can severely impact view performance. The only way around that is to not use a WHERE clause and instead reuse the RLS policies of the base table via security_invoker = on. That requires moving the base table to a custom database schema that is not exposed by the API (see below).
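A sketch of the WHERE-clause variant, again assuming an illustrative public.profiles(id, user_id, username, bio, is_banned):

CREATE VIEW public.profiles_editable
WITH (security_barrier = true) AS
  SELECT id, user_id, username, bio   -- is_banned deliberately omitted
    FROM public.profiles
   WHERE user_id = auth.uid();        -- emulates the RLS check

-- Simple views are updatable by default; drop what clients should not have
-- (see the "Needs changes to the default view permissions" point below):
REVOKE INSERT, DELETE ON public.profiles_editable FROM anon, authenticated;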
Advantages:
Simple. Views are just like tables, and everyone knows PostgreSQL tables – in contrast to triggers or (complex) RLS policies.
You see what you edit. Users (or their applications) who can see records in the table do not have to worry if they are editable due to RLS policies. Whatever a user can see, they can edit.
❓ Extendable as needed. (Still unsure about this.) Only the columns a certain user is allowed to edit need to be provided in the view. Sometimes more context is needed to find the right column; that is not a problem: join the view with columns from the underlying base table again as needed, at API access time. Only the surrogate primary key column id always needs to be included in the view. This is not an issue: if a user tries to edit it, the edit can only succeed with new values, in which case effectively a new record is created, which the user is probably allowed to do anyway. (To be confirmed that updates with proper access protection are then still possible.)
Disadvantages:
Cluttering the table space. Ideally, the API would expose the data in the form they have in a proper database design. By exposing additional views, the API becomes unnecessarily complex.
Cannot really reuse the RLS policies of the underlying table. This would be done with security_invoker = on when creating the view (details). However, when doing this, the same user that can, say, update a record through the view can then also update that record in the base table, circumventing the column access restrictions for which the view is used. The only way around that would be to move the base table to a custom database schema that is not exposed by the API. That is possible, but adds yet more structural complexity.
Needs changes to the default view permissions. Since these are simple views, they are "updatable" views in PostgreSQL. Together with the default table-level / view-level permissions in the Supabase schema public, this means that all users, even anonymous ones, can delete records from these views, leading to the deletion of records in the underlying tables.
To fix this, one has to remove the INSERT and DELETE privileges from the view. This is a change to the default Supabase permissions that would ideally not be necessary.
There is an alternative solution, but it is not very practical: you can create the views with security_invoker = on to reuse the RLS policies of the underlying table, then use these RLS policies to prevent record deletion. However, they have to allow SELECT and UPDATE; so unless you move the underlying table to a schema not exposed by the API, this would allow users to circumvent the column-level security for which the views were created.
No good way to restrict the use of certain values in a column to certain users. This is because views cannot have their own RLS policies. There are several ways to work around this:
Probably the best way to work around that is to structure tables so that a user with write access to a column is allowed to use every value in that column. For example, instead of columns role (user, admin) and status (applied, approved, disapproved), there would be nullable boolean columns user_application, admin_application, user_status, admin_status.
Another option, for complex cases, is to move the underlying table to a custom schema that is not API accessible (while still granting USAGE and permissions to all Supabase roles; see), to create RLS policies on that underlying table, and to re-use them in the views via security_invoker = on.
Another option, also for complex cases, is to use triggers on the view or the underlying table.
Option 5: Column-level access rights
"You can provide UPDATE access to only a subset of columns of the table: GRANT UPDATE(col1, col2). (details)" (source)
Reportedly, it is a hassle to maintain all these access rights. And it is not applicable in Supabase for differentiating between individual authenticated users, as they all share the same PostgreSQL role (authenticated). (PostgREST could offer different options for this.)
Option 6: Table Splitting
Compared to views, this splits the main table into multiple parts. RLS policies define who can do what with each partial table; and, unlike views, where you can only partially emulate RLS policies in a WHERE clause, an RLS policy can also limit which values a user may use for a column. To use the parts together, they have to be joined in requests. That is quite okay when splitting a table in two. But sometimes the splitting is almost "one table per column", for example for permission management tables with one column per role. This is bad because it "atomizes" the data rather than keeping it in a proper normal form, meaning that the data is not even accessible to admins in a comfortable way.
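A sketch of the two-table split for the is_banned example from the original question (names illustrative):

CREATE TABLE public.profiles_moderation (
  profile_id bigint PRIMARY KEY REFERENCES public.profiles (id),
  is_banned  boolean NOT NULL DEFAULT false
);

ALTER TABLE public.profiles_moderation ENABLE ROW LEVEL SECURITY;

-- Clients may read the flag; with no INSERT/UPDATE/DELETE policies defined,
-- writes through the API are rejected, so the flag is effectively read-only.
CREATE POLICY moderation_read ON public.profiles_moderation
  FOR SELECT TO anon, authenticated USING (true);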
Related
I'm working on an app that has a Slack style workspace architecture where the user can access the same function of the application under multiple "instances" (workspaces).
I'm going to continue with using Slack as an example to explain my issue.
When any action is taken in my application I need to validate that the user has the rights to perform an action on the specified resource and that the resource is within the same workspace as the user.
The first tables I create such as Users have a simple database relationship to the workspace. Using a WorkspaceId field in the Users table for example.
My issue is that as I create more tables which are "further" away, such as UserSettings (which might be a one-to-one relationship to the Users table), I now have to do a join to the Users record to get the workspace to which the UserSettings record belongs.
So now I am thinking: is it worth adding a workspaceId value to all tables, since I will end up doing a lot of JOINs in my database to keep verifying that the user has permissions to that resource?
Looking for advice/architecture patterns which may help with the scenario.
I'm assuming your main concern with multiple JOIN statements is that query performance will suffer. Multiple JOINs don't always mean a query will be slow: performance depends on many factors, such as how large the dataset is, how well it is indexed, what database engine you use, and ultimately what the query plan is. You'll only end up with lots of JOIN statements if you decide to normalize the database that way. Full third normal form is rarely the right choice for a schema because of the potential performance impact; some duplication of data is generally okay, and the trade-off you are making is storage cost vs. query performance. To decide how to normalize the database, there are many questions you should be asking; here are some that come to mind:
What type of queries do you expect to make?
How often will each type of query be made?
How often will the data change and can a cache be used?
Does a different storage technology better suit the use case?
Is some of the data small enough that it can be all in one table?
In my experience, designing a user management system usually ends up with a cache or similar mechanism for fast access to a given user's permissions, with an acceptable expiry window. This means you only query the database for a given user at the expiry window and use the cache the majority of the time. This is why many security and user systems don't immediately apply updated settings. The more granular and flexible the permissions you want to grant users, the more expensive the query is going to be because of the complexity; at that point you can decide to denormalize the data or use a caching mechanism.
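If you do decide to duplicate workspaceId onto child tables, a composite foreign key keeps the copy from ever disagreeing with the parent row. A sketch in SQL (table names borrowed from the question; a workspaces table is assumed):

CREATE TABLE users (
  id           bigint PRIMARY KEY,
  workspace_id bigint NOT NULL REFERENCES workspaces (id),
  UNIQUE (id, workspace_id)            -- target for the composite FK below
);

CREATE TABLE user_settings (
  user_id      bigint PRIMARY KEY,
  workspace_id bigint NOT NULL,
  FOREIGN KEY (user_id, workspace_id) REFERENCES users (id, workspace_id)
);

-- Permission checks can now filter user_settings on workspace_id directly,
-- with no JOIN back to users.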
I want to provide controlled access to data which is stored in multiple tables. The access is decided based on certain run-time attributes associated with the user. I am looking for a solution which is extensible and performant, as well as highly secure.
ILLUSTRATION:
There is a framework-level module which stores authorization/access-related data for multiple other modules. Then there are n modules which manage their own life-cycle objects; e.g. module Test1 has 1000 instances which are created and stored in its base table. As a framework solution I want to protect access to this data, so I created a notion of privileges and stored their mapping to users in my own table. To provide controlled access to the data, my aim is that a user is shown only the objects to which he/she has access.
Approaches I have in mind:
We use an Oracle database and currently rely on VPD (Virtual Private Database): we add a policy on each base table of the above-mentioned modules which first evaluates the access of the currently logged-in user from the privileges given to him, and that predicate is then appended to every query against those base tables (by the database itself, by default).
PROS: a very efficient and highly secure solution.
CONS: Cannot work if the base tables and our current table are in two different schemas. Two different schemas in the same database instance might be manageable, but some of my integrator systems could be in separate databases altogether.
Design at the Java layer:
We connect to our DBs through JPA data sources, so I can write a thin layer, basically a wrapper over EntityManager, and then replicate what VPD does for me: first get the access-related data from my tables, then run a filtered query on the integrator's table, and maybe cache the data in a caching server (optimization).
CONS: I want to use this in a production system, so I want to get it right on the first attempt. I'd like to know of any patterns that are already implemented in the industry.
I do not think your solutions are flexible enough to work well in a complex scenario like yours. If you have very simple queries, then yes, you can design something like an SQL screener at the database or Java level and just pass all your queries through it.
But this is not flexible. As soon as your queries start to grow complex, improving this query screener will become tremendously difficult, since it is not part of the business logic and cannot know the details of your permission system.
I suggest you implement some access checks in your service layer. Service must know for which user it generates or processes the data. Move query generation logic to repositories and have your services call different repository methods depending on user permissions for example. Or just customize repository calls with parameters depending on user permissions.
I have an ASP.NET MVC + SQL Server application with 250 simultaneous users daily which uses AD/NTLM SSO to do all the authorization, using a custom authorization security class that controls access to controllers & actions based on users & groups.
A dilemma recently came up where the 50K+ account records of the database are going to be managed by different groups to varying degrees:
All users will be able to view most records.
Certain records can only be edited by certain users/groups of specific departments.
There will be admin & support groups that will be able to edit any group-owned records.
etc.
This is not a problem of who has access to what features/forms/etc. in the controllers, but instead a dilemma of data ownership restrictions that must be imposed. I am guessing this means I need some additional layer of row-level security.
I am looking for a pragmatic & robust way to tackle data ownership within the current application framework with minimal performance hits since it is likely the same thing will need to be imposed on other much larger tables of data. Initially there will be about 5 ownership groups, but creeping up to 25 to 100 in the near future.
Sadly there are no cut-and-dried business rules that can be implemented here: there is no rhyme or reason to who owns what, except the record's primary key id.
To try to fix it I was thinking of creating a table of owner_roles and map it to the users table then create another table called accounts_ownership that looks something like:
tbl (PK)   row (PK)   owner (PK)   view   create   modify   delete
accounts   1          hr           1      1        1        1
accounts   1          it           1      0        0        0
accounts   2          hr           1      1        1        1
accounts   2          it           1      1        1        1
accounts   3          it           1      0        0        0
But in doing so I would create a table of 250K rows that could easily end up with poor performance. Looking at sites like Facebook and others, this must be a common thing to implement, but I am hesitant to introduce a table like that since it could create serious performance issues.
The other way I thought this might be implemented is by adding an extra, comma-separated column to the accounts table that would contain the owner(s) with a coded set of rights, i.e.:
id owners
1 ,hr,
2 ,hr,
3 ,hr,it,
4 ,it,
And then add a custom class to search using a LIKE statement, provided the logged-in user's role was "it" and commas were reserved and not allowed in owner names:
SELECT * FROM accounts WHERE owners LIKE '%,it,%'
... however this really just feels wrong from a DBA perspective (ugly as hell) and a maintenance nightmare.
Any practical approaches on how I could implement this without destroying my site?
Start with role-based access control. You can possibly skip the roles from the pure definition, but you should be able to implement it like this:
Every user can be in one or more groups like admin, support, it, hr
Every data row has an owner like it, hr
On access, check permissions: an admin can see and edit all rows; support + it sees every row and can edit those owned by it; etc. This way you need only (user-groups + row-access) new rows in your database, not (user-groups * row-access).
In your scenario it should be possible to hardcode the user groups in your application; in a CMS there is generally a table defining what rights to assign to each user group, which complicates the coding but is very flexible.
Roles in the original concept allow a user to select which rights he/she wants to use; there would be an "Unlock with admin rights" option or the like in your interface.
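A sketch of that layout in SQL (all names illustrative):

CREATE TABLE user_groups (
  user_id     int NOT NULL,
  group_name  varchar(20) NOT NULL,   -- 'admin', 'support', 'it', 'hr', ...
  PRIMARY KEY (user_id, group_name)
);

-- Each account names its owning group once, instead of one permission row
-- per (account, group) pair as in the 250K-row accounts_ownership design:
ALTER TABLE accounts ADD owner_group varchar(20);   -- backfill, then make NOT NULL

-- "Which accounts may user 42 edit?" becomes a single join
-- (admin/support can be special-cased before this query even runs):
SELECT a.*
FROM accounts a
JOIN user_groups ug ON ug.group_name = a.owner_group
WHERE ug.user_id = 42;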
Primarily for performance reasons, I went with the less elegant approach listed above. It took some doing, but there are careful application controls that had to be created to enforce things like no commas in the IDs.
I'm certainly no DBA and only a beginner when it comes to software development, so any help is appreciated. What is the most secure structure for storing the data from multiple parties in one database? For instance if three people have access to the same tables, I want to make sure that each person can only see their data. Is it best to create a unique ID for each person and store that along with the data then query based on that ID? Are there other considerations I should take into account as well?
You are on the right track, but mapping the user ID into the table is probably not what you want, because in practice many users have access to a corporation's data. In those cases you would store "CorpID" as a column, or more generically "ContextID". But yes, to limit access to data, each row should be able to convey who the data is for, either directly (the row actually contains a reference to CorpID, UserID, ContextID or the like) or by inference, joining to other tables that reference the qualifier.
In practice, these rules are enforced by a middle tier that queries the database, providing the user context in some way so that only the correct records are selected out of the database and ultimately presented to the user.
...three people have access to the same tables...
If these persons can query the tables directly through some query tool like Toad, then we have a serious problem. If not (that is, if they access the data through some middle tier/service layer), then #wagregg's solution above holds.
Coming to the case when they have direct access rights, one approach is:
Create database-level user accounts for each of the users.
Have another table with row-level grant information: say your_table has a primary key column MY_PK_COL, then the structure of the GRANTS_TABLE would be {USER_ID; MY_PK_COL}, with MY_PK_COL a foreign key to your_table.
Remove all privileges of the concerned users from your_table.
Create a view: CREATE VIEW your_view AS SELECT * FROM your_table WHERE user_id = getCurrentUserID();
Give your users SELECT/INSERT/UPDATE rights on this view.
Most database systems (MySQL, Oracle, SQL Server) provide a way to get the currently logged-in user (the one used in the connection string). They also provide ways to restrict access to certain tables. For your users the view will behave like a normal table; they will never know the difference.
A problem arises when there are too many users: provisioning a database-level user account for every one of them may become difficult. But a DBMS like MS SQL Server can use Windows authentication, thereby reducing the user-creation problem.
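Putting the steps together (a sketch in PostgreSQL syntax; this variant actually consults the GRANTS_TABLE from step 2, and CURRENT_USER stands in for whatever current-user function your DBMS offers; all names are illustrative):

CREATE TABLE grants_table (
  user_id   varchar(64) NOT NULL,
  my_pk_col int NOT NULL REFERENCES your_table (my_pk_col),
  PRIMARY KEY (user_id, my_pk_col)
);

REVOKE ALL ON your_table FROM some_user;

CREATE VIEW your_view AS
  SELECT *
    FROM your_table
   WHERE my_pk_col IN (SELECT my_pk_col
                         FROM grants_table
                        WHERE user_id = CURRENT_USER);

GRANT SELECT, INSERT, UPDATE ON your_view TO some_user;

Because the view runs with its owner's rights by default, some_user can reach exactly the granted rows even though they lost direct access to your_table.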
In most scenarios, filtering at the middle tier is the best approach. But there are times when security is paramount; also, a bug in the middle tier may allow malicious users to bypass the security (SQL injection is one thing to name). Then you have to do what you have to do.
It sounds like you're talking about a multi-tenant architecture, but I can't tell for sure.
This SO answer has a summary of the issues, and links to an online article containing details about the trade-offs.
I'm working on a single database with multiple database schemas, e.g. [Baz].[Table3], [Foo].[Table1], [Foo].[Table2].
I'm wondering why the tables are separated this way besides organisation and permissions.
How common is this, and are there any other benefits?
You have the main benefit in terms of logically groupings objects together and allowing permissions to be set at a schema level.
It does add complexity to programming, in that you must always know which schema you intend to get something from - or rely on the default schema of the user being correct. Equally, you can use this to allow the same object name in different schemas, so that the code only ever refers to one object name, while the user's default schema decides which actual object that is.
I wouldn't say it was that common, anecdotally most people still drop everything in the dbo schema.
I'm not aware of any other possible reasons besides organization and permissions. Are these not good enough? :)
For the record - I always use a single schema - but then I'm creating web applications and there is also just a single user.
Update, 10 years later!
There's one more reason, actually. You can have "copies" of your schema for different purposes. For example, imagine you are creating a blog platform. People can sign up and create their own blogs. Each blog needs a table for posts, tags, images, settings etc. One way to do this is to add a column blog_id to each table and use that to differentiate between blogs. Or... you could create a new schema for each blog and fresh new tables for each of them. This has several benefits:
Programming is easier. You just select the appropriate schema at the beginning and then write all your queries without worrying about forgetting to add where blog_id=#currentBlog somewhere.
You avoid a whole class of potential bugs where a foreign key in one blog points to an object in another blog (accidental data disclosure!)
If you want to wipe a blog, you just drop the schema with all the tables in it. Much faster than seeking and deleting records from dozens of different tables (in the right order, no less!)
Each blog's performance depends only (well, mostly anyway) on how much data there is in that blog.
Exporting data is easier - just dump all the objects in the schema.
There are also drawbacks, of course.
When you update your platform and need to perform schema changes, you need to update each blog separately. (Added yet later: This could actually be a feature! You can do "rolling updates" where, instead of updating ALL the blogs at the same time, you update them in batches, seeing if there are any bugs or complaints before updating the next batch.)
The same goes for fixing corrupted data, if that happens for whatever reason.
Statistics for the platform as a whole are harder to calculate.
All in all, this is a pretty niche use case, but it can be handy!
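A sketch of the idea in PostgreSQL syntax (schema and table names are made up):

CREATE SCHEMA blog_42;
CREATE TABLE blog_42.posts (id serial PRIMARY KEY, title text, body text);
CREATE TABLE blog_42.tags  (id serial PRIMARY KEY, name text);

-- Per request, pick the tenant once instead of filtering every query:
SET search_path TO blog_42;
SELECT * FROM posts;               -- resolves to blog_42.posts

-- Wiping the blog is a single statement:
DROP SCHEMA blog_42 CASCADE;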
To me, they can cause more problems because they break ownership chaining.
Example:
Stored procedure tom.uspFoo uses table tom.bar easily but extra rights would be needed on dick.AnotherTable. This means I have to grant select rights on dick.AnotherTable to the callers of tom.uspFoo... which exposes direct table access.
Unless I'm completely missing something...
Edit, Feb 2012
I asked a question about this: SQL Server: How to permission schemas?
The key is "same owner": so if dbo owns both dick and tom schema, then ownership chaining does apply. My previous answer was wrong.
There can be several reasons why this is beneficial:
Sharing data between several (instances of) an application. This could be the case if you have a group of reference data that is shared between applications, and a group of data that is specific to each instance. Be careful not to have circular references between entities in different schemas; that is, don't have a foreign key from an entity in schema 1 to an entity in schema 2 AND another foreign key from schema 2 back to schema 1 in other entities.
Data partitioning: allows data to be stored on different servers more easily.
As you mentioned, access control at the DB level.