Currently, I'm using a Supabase database. One of the big roadblocks that I'm facing is column-level security, which seems a lot more complicated than RLS.
Say that I have a column called is_banned that is viewable but not editable. The rest of the columns, however, should be both viewable and editable.
The only solution that I can really think of is splitting it into two tables and having RLS on the "sensitive information" table - but creating a private table for every table seems rather unnecessary.
Are there other solutions?
In PostgreSQL, you can specify column-level permissions via GRANT and/or REVOKE statements.
The tricky part here is that these permissions are set against PostgreSQL users/roles, NOT against your app users. So you need to ensure that the permissions are set against all roles that Supabase uses to execute client requests. As far as I know, Supabase uses the anon and authenticated PostgreSQL roles to execute requests; however, there is no official documentation on this, so I am not 100% sure there aren't any others.
You can read more about how to utilize this technique here (see the section called Column-level permissions).
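For the is_banned example from the question, a minimal sketch could look like the following (the profiles table and its other columns are hypothetical; the roles are the anon and authenticated roles mentioned above):

    -- Revoke blanket UPDATE, then re-grant it column by column,
    -- leaving is_banned out so clients can read it but never change it.
    REVOKE UPDATE ON public.profiles FROM anon, authenticated;
    GRANT SELECT ON public.profiles TO anon, authenticated;
    GRANT UPDATE (display_name, bio) ON public.profiles TO anon, authenticated;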
I had to deal with this issue myself. I currently solve it with views, but would rather use RLS policies, triggers, or privileged functions in the future (untested as of right now). My notes from researching this issue are below.
CLS means selectively prohibiting column updates based on certain conditions. There are several alternative solutions for this (summary), each with advantages and disadvantages. They are discussed in detail below.
Option 1: RLS policies
(My favourite option so far, but I have not yet used it in practice.)
Here, you would use a row-level security (RLS) policy, manually retrieving the old row and comparing whether your field's value would change from the old to the new value. A solution candidate for this has been posted as a Stack Overflow answer, but it still has to be made into a generic function. At first glance, this seems better than a trigger: it shares a trigger's advantages, and in addition, Supabase promotes the use of RLS policies for access control anyway and has much better UI support for RLS than for triggers. So it would improve the consistency and maintainability of the database by reducing complexity.
However, the Supabase RLS editor cannot be used for complex RLS policies (issue report), so as a workaround one should wrap all RLS code into a single or nested function call, or at least something no longer than one line of code. Even better is to maintain the SQL source code under version control outside of Supabase and to copy-and-paste it into the Supabase SQL Editor whenever you want to change an RLS policy, table, function, and so on.
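As a sketch of this approach, assuming a hypothetical profiles table with is_banned and user_id columns (the SECURITY DEFINER helper is needed so that reading the old row is not itself blocked by RLS):

    -- Helper: fetch the current (old) value, bypassing RLS.
    create function public.old_is_banned(row_id bigint)
    returns boolean
    language sql security definer
    as $$
      select is_banned from public.profiles where id = row_id;
    $$;

    -- Users may update their own rows, but only if is_banned is unchanged.
    create policy "is_banned is read-only"
      on public.profiles
      for update
      to authenticated
      using (auth.uid() = user_id)
      with check (is_banned = public.old_is_banned(id));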
Option 2: Triggers
See here for instructions.
Advantages:
Does not add another table or view, so that the database structure is determined by the data and not by permission system quirks, as it should be.
Does not require changes to the default Supabase permissions or table-to-schema assignments.
Combines the powers of RLS policies and column-level permissions.
Disadvantages:
Triggers are not yet well supported in the Supabase UI: only the trigger status can be changed; the trigger itself cannot be shown or edited in the UI, only in the PostgreSQL console. In practice, this is not much of an issue, as for any real-life project you will have to work with the PostgreSQL database directly anyway.
It requires knowledge of PL/pgSQL or another procedural language … and for some, programming is what we want to avoid with Supabase.
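As a sketch, such a trigger could look like this (again assuming a hypothetical profiles table with an is_banned column):

    create function public.forbid_is_banned_change()
    returns trigger
    language plpgsql
    as $$
    begin
      if new.is_banned is distinct from old.is_banned then
        raise exception 'column is_banned is read-only';
      end if;
      return new;
    end;
    $$;

    create trigger forbid_is_banned_change
      before update on public.profiles
      for each row
      execute function public.forbid_is_banned_change();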
Option 3: Privileged Functions
"You can hide the table behind a FUNCTION with SECURITY DEFINER. The table itself would not provide UPDATE access, instead users can only update the table through the FUNCTION." (source)
And in that function, you can determine column-level access permissions in any way you like. Any such function in schema public is automatically available through the API:
"write PostgreSQL SQL functions […] and call them via supabase.rpc('function_name', {param1: 'value'});." (source).
The issue is, however, that the API then no longer has a unified structure of "everything is available in tables".
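As a sketch, such a function could look like this (table and column names are hypothetical):

    -- The function, not the caller, decides which columns may change;
    -- is_banned is simply never touched here.
    create function public.update_display_name(profile_id bigint, new_name text)
    returns void
    language sql security definer
    as $$
      update public.profiles
         set display_name = new_name
       where id = profile_id
         and user_id = auth.uid();  -- only the row's owner may update it
    $$;

A client would then call it via supabase.rpc('update_display_name', { profile_id: 1, new_name: 'Alice' }).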
Option 4: User-specific views
See the instructions. More instructions:
"You can create a view to only show the columns you want, make sure you secure with a WHERE statement as it ignores RLS (normally), and then use RLS to block the original table." (source)
This solution has been recommended by a Supabase maintainer. Overall, though, RLS policies and triggers seem preferable.
To make this solution secure, you have to use the option security_barrier = on (details), which can severely impact view performance. The only way around that is to not use a WHERE clause and instead reuse the RLS policies of the base table via security_invoker = on. That requires moving the base table to a custom database schema that is not exposed via the API (see below).
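A sketch of the security_invoker variant (available since PostgreSQL 15), assuming the base table was moved to a hypothetical internal schema that is not exposed via the API:

    -- The view exposes only the editable columns (plus id) and
    -- reuses the RLS policies of the base table.
    create view public.profiles_editable
      with (security_invoker = on)
      as select id, user_id, display_name from internal.profiles;

    -- Updatable views allow INSERT/DELETE by default; see the disadvantages below.
    revoke insert, delete on public.profiles_editable from anon, authenticated;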
Advantages:
Simple. Views are just like tables, and everyone knows PostgreSQL tables – in contrast to triggers or (complex) RLS policies.
You see what you edit. Users (or their applications) who can see records in the table do not have to worry if they are editable due to RLS policies. Whatever a user can see, they can edit.
❓ Extendable as needed. (Still unsure about this.) Only the columns a certain user is allowed to edit can be provided in the view. To find the right column, sometimes more context is needed. Not a problem: join the view with columns from the underlying base table again as needed, at API access time. Only the surrogate primary key column id always needs to be included in the view; this is not an issue: if a user tries to edit it, the edit can only succeed with new values, in which case effectively a new record is created, which the user is probably allowed to do anyway. (To be confirmed that updates with proper access protection are then still possible.)
Disadvantages:
Cluttering the table space. Ideally, the API would expose the data in the form it has in a proper database design. By exposing additional views, the API becomes unnecessarily complex.
Cannot really reuse the RLS policies of the underlying table. This could be done by using security_invoker = on when creating the view (details). However, when doing this, the same user who can, say, update a record through the view can then also update that record in the base table, circumventing the column access restrictions for which the view is used. The only way around that would be to move the base table to a custom database schema that is not exposed via the API. That is possible, but adds yet more structural complexity.
Needs changes to the default view permissions. Since these are simple views, they are "updatable" views in PostgreSQL. Together with the default table-level / view-level permissions in the Supabase schema public, this means that all users, even anonymous ones, can delete records from these views, leading to the deletion of records in the underlying tables.
To fix this, one has to remove the INSERT and DELETE privileges from the view. This is a change to the default Supabase permissions that would ideally not be necessary.
There is an alternative solution, but it is not very practical: you can create the views with security_invoker = on to reuse the RLS policies of the underlying table, and then use these RLS policies to prevent record deletion. However, they have to allow SELECT and UPDATE; so unless you move the underlying table to a schema not exposed via the API, this would allow users to circumvent the column-level security for which the views were created.
No good way to restrict the use of certain values in a column to certain users. This is because views cannot have their own RLS policies. There are several ways to work around this:
Probably the best way to work around that is to structure tables so that a user with write access to a column is allowed to use every value in that column. For example, instead of columns role (user, admin) and status (applied, approved, disapproved), there would be nullable boolean columns user_application, admin_application, user_status, admin_status.
Another option, for complex cases, is to move the underlying table to a custom schema that is not API accessible (while still granting USAGE and permissions to all Supabase roles; see), to create RLS policies on that underlying table, and to re-use them in the views via security_invoker = on.
Another option, also for complex cases, is to use triggers on the view or the underlying table.
Option 5: Column-level access rights
"You can provide UPDATE access to only a subset of columns of the table: GRANT UPDATE(col1, col2). (details)" (source)
Reportedly, it is a hassle to maintain all these access rights. And it cannot be used in Supabase to differentiate between individual authenticated users, as they all share the same PostgreSQL role, authenticated. (PostgREST could offer different options for this.)
Option 6: Table Splitting
Compared to views, this splits the main table into multiple parts. RLS policies define who can do what with each partial table; and, unlike views, where you can only partially emulate RLS policies in a WHERE clause, an RLS policy can also be used to limit which values a user can use for a column. To use the parts together, they have to be joined in requests. That is quite OK when splitting a table in two. But sometimes the splitting approaches "one table per column", for example for permission management tables with one column per role. This is bad because it "atomizes" the data rather than keeping it in a proper normal form, meaning that the data is not even accessible to admins in a comfortable way.
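As a sketch, splitting out the is_banned column from the question could look like this (table names are hypothetical):

    create table public.profiles (
      id           bigint generated always as identity primary key,
      user_id      uuid not null,
      display_name text
    );

    -- The sensitive part lives in its own table with its own RLS.
    create table public.profile_flags (
      profile_id bigint primary key references public.profiles (id),
      is_banned  boolean not null default false
    );

    alter table public.profile_flags enable row level security;

    create policy "flags are readable"
      on public.profile_flags
      for select to authenticated using (true);
    -- No UPDATE policy: is_banned stays read-only for clients.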
I looked around for this task I have on my hands but did not find anything helpful. I am primarily a Java person with sound knowledge of databases from a software development point of view. I have some knowledge of DBA functions and what can and cannot be done, but I am not able to come up with a good solution.
The task I have is to compare the databases created in SQL Server and Oracle by our application installer.
I think I have been able to come up with some queries (of course, by searching online) in SQL Server that will give me things like the number of tables in a schema, each table's columns with data types and indexes, and the different types of constraints, triggers, etc. (with their counts) created for each of those tables. I can provide those SQL queries if somebody is interested.
However, Oracle seems to be trickier. I would appreciate it if somebody could help or point me in the right direction.
I am trying to find out things like the following:
Number of tables created
Number of indexes, constraints (with their types), triggers for each of those tables
Number of stored procedures/functions created
Number of views created
Any help will be greatly appreciated.
Thank you.
First off, if you are already comfortable writing Java code, I'm not sure that I would be writing a bunch of SQL to do this comparison. JDBC already has a DatabaseMetaData class that has methods like getTables to get all the tables. That would give you one API to work with and let you leverage the fact that the folks that wrote the JDBC drivers already wrote all the code to query the data dictionary tables in whatever database you are using. This will also let you focus on differences in how the objects your installer creates will be perceived by the application.
If you are going to write specific SQL, the Oracle data dictionary tables are pretty easy to work with. The ones you'll care about are going to follow the pattern [user|all|dba]_<<type of thing>>. The [user|all|dba] prefix indicates whether you are looking for objects that you own (user), objects that you have access to (all), or all objects in the database (dba). Normal users often don't have access to the dba views because that is a potential security issue-- generally you don't want people to know that an object exists if they don't have access to it. In my examples, I'll use the all versions of the objects but you can change all to user or dba depending on what you're after.
all_tables will show you information about all the tables you have access to. You probably want to add a filter on owner for the schema(s) that your installer touches since you may have access to tables that are not part of your application.
all_indexes, all_constraints, and all_triggers will show you information about indexes, constraints, and triggers. Again, you may want to add a predicate on owner to limit yourself to the schema(s) that you care about.
all_procedures will show you information about procedures and functions both stand-alone and in packages.
all_views will show you information about all views.
If you are really just interested in counts, you may be able to simply go to all_objects and do a count grouping by object_type. I'm guessing that you'll want to see attributes of the various objects so you'll want to go to the various object-specific views.
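For example, a sketch of the counts-only query (the owner name APP_OWNER is a placeholder for whatever schema your installer creates):

    SELECT owner, object_type, COUNT(*) AS object_count
      FROM all_objects
     WHERE owner = 'APP_OWNER'
     GROUP BY owner, object_type
     ORDER BY owner, object_type;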
Background
I am building an online information system which users can access through any computer. I don't want to replicate the DB and code for every university or organization.
I just want a user to hit a domain like www.example.com, sign in, and use it.
A second user will also hit the same domain, www.example.com, sign in, and use it, but the data for them is different.
Scenario
Suppose one university has 200 employees, a second university has 150, and so on.
Question
Do I need to have a separate employee table for each university, or is it OK to have a single table with a column that holds a university ID?
I assume the second is best, but suppose I have 20 universities or organizations and a total of thousands of employees.
What is the best approach?
The same question applies to every table; this is just to give you an example.
Thanks
The approach will depend upon the data, usage, and client requirements/restrictions.
Use an integrated model, as suggested by duffymo. This may be appropriate if each organization is part of a larger whole (i.e. all colleges are part of a state college board) and security concerns about cross-query access are minimal [2]. This approach has a minimal amount of separation between each organization, as the same schema [1] and relations are "openly" shared. It leads to a very simple model initially, but it can become very complicated (with compound FKs and correct usage of such) if you need relations for organization-specific values, because that adds another dimension of data.
Implement multi-tenancy. This can be achieved with implicit filters on the relations (perhaps hidden behind views and stored procedures), different schemas, or other database-specific support. Depending upon the implementation, this may or may not share schemas or relations, even though all data may reside in the same database. With implicit isolation, some complicated keys or relationships can be hidden/eliminated. Multi-tenancy isolation also generally makes it harder/impossible to cross-query.
Silo the databases entirely. Each customer or "organization" has a separate database. This implies separate relations and schema groups. I have found this approach to be relatively simple with automated tooling, but it does require managing multiple databases. Direct cross-querying is impossible, although "linked databases" can be used if there is a need.
Even though it's not "a single DB": in our case we had the following restrictions, 1) we were never allowed to share/expose data between organizations, and 2) each organization wanted its own local database. Thus, our product ended up using the silo approach. Make sure that the approach chosen meets customer requirements.
None of these approaches will have any issue with "thousands", "hundreds of thousands", or even "millions" of records as long as the indices and queries are correctly planned. However, switching from one approach to another can violate many assumed constraints, so the decision should be made early on.
[1] In this response I am using "schema" to refer to the security grouping of database objects (e.g. tables, views) and not the database model itself. The actual database model used can be common/shared, as we do even when using separate databases.
[2] An integrated approach is not necessarily insecure, but it doesn't inherently have some of the built-in isolation of other designs.
I would normalize it to have UNIVERSITY and EMPLOYEE tables, with a one-to-many relationship between them.
You'll have to take care to make sure that only people associated with a given university can see their data. Role based access will be important.
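A minimal sketch of that design (generic SQL; the names are illustrative):

    CREATE TABLE university (
      university_id INT PRIMARY KEY,
      name          VARCHAR(200) NOT NULL
    );

    CREATE TABLE employee (
      employee_id   INT PRIMARY KEY,
      university_id INT NOT NULL REFERENCES university (university_id),
      name          VARCHAR(200) NOT NULL
    );

    -- Every query is then scoped to the signed-in user's university:
    -- SELECT * FROM employee WHERE university_id = ?;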
This is called a multi-tenant architecture. You should read this:
http://msdn.microsoft.com/en-us/library/aa479086.aspx
I would go with Tenant Per Schema, which means copying the structure across different schemas. However, as you should keep all your SQL DDL in source control anyway, this is very easy to script.
It's easy to screw up and "leak" information between tenants if doing it all in the same table.
For a web application database, from a security standpoint only, what are the arguments against an SP-only solution, where the app's DB account has no rights to tables and views and only EXECUTE on SPs?
If someone compromises the app's DB account, the attack surface exposed is much smaller than when tables and views are exposed directly. What security advantages would a non-SP solution offer (or not)? I see many advantages to using a non-SP solution, but exposing all the tables leaves me a little worried.
The question is about major database vendors' products in general, but specifically SQL Server 2008.
From a security point of view only, I can't see any advantages a non-SP approach would have over an SP approach because:
with a non-SP approach, you have to grant permissions directly to the underlying tables, etc.
with a sproc, all the real underlying schema information can be encapsulated/hidden away (SPs can be encrypted, too)
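As a sketch in SQL Server terms (the role and user names are hypothetical):

    -- The app account may only execute procedures; it has no table rights at all.
    CREATE ROLE AppExecutor;
    GRANT EXECUTE ON SCHEMA::dbo TO AppExecutor;
    DENY SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo TO AppExecutor;
    EXEC sp_addrolemember 'AppExecutor', 'AppUser';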
Let's take a system that needs to be really secure, say your company's accounting system. If you use procs and grant access only to the procs, then users cannot do anything other than what the procs do, ever. This is an internal control designed to make sure that the business rules for the system cannot be gotten around by any user of the system. This is what prevents people from making a company purchase and then approving the funds themselves, opening up the door to fraud. It also prevents many people in the organization from deleting all records in the accounts table, because they have no delete rights except those granted through the proc, which will allow only one delete at a time.
Now developers have to have more rights in order to develop, but they should never have more rights on a production machine if you want to consider security. True, a developer could write a malicious SP which does something bad when pushed to prod. That same developer, though, could put the same code into the application version and be just as likely to be caught or not caught as if they maliciously changed a proc. Personally, I think the proc might be easier to catch, because it might get reviewed separately from the code by the DBAs, which means the manager or configuration management person and the DBAs had a chance to look at it, rather than just the manager or configuration management person. We all know the reality is that no one pushing code to prod has the time to review each piece of it personally, so hiring trustworthy developers is critical. Having code review and source control in place can help find a malicious change or roll it back to a previous version, but SPs and application code are equally at risk from developers either way.
The same is true for system admins. They must have full rights to the system in order to do their jobs. They can potentially do a lot of damage without being caught. The best you can do in this case is limit this access to as few people as possible and do the best you can in hiring trustworthy people. At least if you have few people with this access, it is easier to find the source of the problem if it occurs. You can minimize risk by having off-site backups (so at least what the admin breaks, if they turn bad, can be fixed to some extent), but you can never completely get rid of this risk. Again, this is true no matter how you allow the applications to access data.
So the real use of SPs is not to eliminate all possible risk, but to make it so that fewer people can harm the system. The use of application code to affect database information is inherently insecure and, in my opinion, should not be allowed in any system storing financial or personal information.
The biggest security advantage to not using stored procedures is clarity. You know exactly what an account can do, by seeing what access to tables it has. With stored procedures, this isn't necessarily the case. If an account has the ability to execute procedure X, that does limit the account to executing that and not hitting an underlying table, but X can do anything. It could drop tables, alter data, delete data etc.
To know what an account can do with stored procedures you have to look at the stored procedure. Each time a sproc is updated, someone will have to look at what it does to make sure that something didn't get "accidentally" placed in it. The real problem with security in sprocs comes from inside the organization, not from rogue attackers.
Here's an example:
Let's say you are trying to restrict access to the employee table. Without stored procedures, you just deny access to the table. To get access someone pretty much has to blatantly ask you to grant permissions. Sure they could get you to run a script to grant access, but most people at least try to review a script which alters the database schema (assuming the script doesn't update a sproc, which I will talk about below).
There are potentially hundreds of stored procedures for an application. In my experience, they get updated quite frequently: add a field here, delete one there. Reviewing that many procedure-update scripts all the time becomes daunting, and in most organizations the database team starts to only glance at each procedure (or not look at it at all) and move it along. This is where the real problem comes in. Now, in this example, if someone on the IT staff wants to allow access to a table, that person just needs to slip in a line of code granting access or doing something else. In a perfect world this would get caught. Most of us don't work in a perfect world.
The real problem with stored procedures is that they add a level of obfuscation to the system. With obfuscation comes complexity, and with complexity ultimately comes more work to understand and administer the underlying system. Most people in IT are overworked, and things slip through. In this instance you don't try to attack the system to gain access; you use the person in charge of the system to get what you want. Mitnick was right: in security, people are the problem.
The majority of attacks against an organization come from the inside. Any time you introduce complexity into any system, holes appear and things can get overlooked. Don't believe it? Think about where you work. Go through the steps of who you would ask to get access to a system. Pretty soon you realize that you can get people to overlook things at the right moment. The key to successfully penetrating a system with people involved is to do something which seems innocuous but is really subversive.
Remember, if I am trying to attack a system: I am not your friend; I have no interest in your kids or hobbies; I will use you in any way necessary to get what I want; I don't care if I betray you. The idea of "but he was my friend and that's why I trusted him to believe what he was doing was correct," is no comfort after the fact.
This is one of those areas where conventional wisdom is correct: exposing just the stored procedures gives you more control over security. Giving direct access to tables and views is easier, and there are times you need to do it, but it's going to be less secure.
Well, I guess you really captured the core of the problem yourself: if you don't use stored procedures for all CRUD operations, you have to grant the app-specific DB user account at least SELECT rights on all tables.
If you want to allow the DB account to do even more work, that account might also need other permissions, like being able to UPDATE and possibly DELETE on certain tables.
I don't see how a non-stored-proc approach would have any security benefits; it does open up the gate just a bit more. The question really is: can you afford to? Can you secure that app-specific DB account well enough that it won't compromise your system's overall security?
One possible compromise might be to use views or table access to allow SELECT, but handle everything else (UPDATEs, DELETEs, INSERTs) using stored procs: half secure, half convenient...
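In T-SQL, that compromise might look like this (object names are hypothetical):

    -- Reads go through a view; writes go through procs; the base table is closed.
    GRANT SELECT ON dbo.CustomerView TO AppUser;
    GRANT EXECUTE ON dbo.usp_UpdateCustomer TO AppUser;
    DENY SELECT, INSERT, UPDATE, DELETE ON dbo.Customer TO AppUser;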
As it often is, this is a classic trade-off between convenience (the non-SP approach, possibly using an ORM) and security (the all-SP approach; probably more cumbersome, but a bit safer).
Marc
In addition to the traditional security separation achieved with stored procedures (EXECUTE permission on procedures, relying on ownership chaining for data access), stored procedures can be code-signed, which gives very granular and specific access control over any server functionality: linked servers, server-scoped management views, controlled access to stored procedures, and even data in other databases outside the user's ordinary reach.
Ordinary requests made in T-SQL batches, no matter how fancy and no matter how many layers upon layers of code generation and ORM are behind them, simply cannot be signed and thus cannot use one of the most specific and powerful access control mechanisms available.
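A sketch of what signing a procedure looks like in SQL Server (all names and the password are placeholders):

    -- Create a certificate and sign the procedure with it.
    CREATE CERTIFICATE ProcSigningCert
      ENCRYPTION BY PASSWORD = 'Pl@ceholder1'
      WITH SUBJECT = 'Signs trusted procedures';

    ADD SIGNATURE TO dbo.uspDoPrivilegedWork
      BY CERTIFICATE ProcSigningCert
      WITH PASSWORD = 'Pl@ceholder1';

    -- Map the certificate to a user and grant that user the extra rights;
    -- the signed procedure then carries those rights, while ad-hoc batches cannot.
    CREATE USER ProcCertUser FROM CERTIFICATE ProcSigningCert;
    GRANT SELECT ON dbo.SensitiveTable TO ProcCertUser;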
It's an imperfect analogy, but I like to compare the tables in the DB's "dbo" schema to "private" data in OO terminology, and Views and Stored Procs to "public." One can even make a "public" schema separate from the dbo schema to make the distinction explicit. If you follow that idea, you get a security advantage as well as an extensibility advantage.
One account (not the web app's account) has dbo access and owns the database, and the web app connects using another account restricted to the public-facing structures.
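A sketch of that layout (schema and object names are hypothetical):

    -- "Private" objects stay in dbo; the web app only sees the facade schema.
    CREATE SCHEMA api AUTHORIZATION dbo;
    GO
    CREATE VIEW api.Customers AS
      SELECT CustomerId, Name FROM dbo.Customer;  -- expose only public columns
    GO
    GRANT SELECT ON SCHEMA::api TO WebAppUser;
    -- WebAppUser has no permissions in dbo at all.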
The only possible argument against is that I have run into cases where certain statements cannot be effectively parameterized in an SP (and dynamic SQL is required), which opens the possibility of in-SP SQL injection. This is a very narrow consideration, however, and a rare case. At least in PostgreSQL, I have occasionally seen a few cases where this had to get extra review.
On the whole, even in these cases, I think SP-type approaches are a security benefit, because they mean the application can use generic anti-SQL-injection mechanisms where it otherwise might not be possible, and your SPs can be used by many applications. Additionally, if all activity must go through SPs, you can reduce your exposure to SQL injection and centralize the audits for problems.
In general, the less a user can do, the less security exposure there is, and thus the less a SQL injection attack can do.
Stored procedures generally give better and more granular security than you can do without.
Most of the answers here specify the security advantages of using stored procedures. Without disregarding those advantages, there are a few big disadvantages that haven't been mentioned:
The data access patterns are sometimes much more important than a specific procedure that is being done. We want to log/monitor/analyze/raise alerts/block who access the data, when, and how. We can't always get this information when using stored procedures.
Some organizations may have tons of stored procedures. It is impossible to review all of them, and it may make more sense to focus on tables (especially considering that stored procedures may be very complex, have bugs, and introduce other security issues).
Some organizations may require a separation of concerns. Database administrators (or anyone who writes stored procedures) are not always part of the security personnel. It is sometimes necessary for the security personnel to focus only on the data, simply because they are not responsible for the business logic, and the people who do write the business logic are not completely trusted.
I'm working on a single database with multiple database schemas,
e.g.
[Baz].[Table3],
[Foo].[Table1],
[Foo].[Table2]
I'm wondering why the tables are separated this way besides organisation and permissions.
How common is this, and are there any other benefits?
You have the main benefit in terms of logically groupings objects together and allowing permissions to be set at a schema level.
It does provide more complexity in programming, in that you must always know which schema you intend to get something from, or rely on the default schema of the user to be correct. Equally, you can then use this to allow the same object name in different schemas, so that the code only writes against one object name, whilst the schema the user defaults to decides which one that is.
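A sketch of the default-schema trick (names are hypothetical):

    CREATE SCHEMA uk;
    GO
    CREATE SCHEMA us;
    GO
    CREATE TABLE uk.Prices (ProductId INT, Price DECIMAL(10,2));  -- GBP
    CREATE TABLE us.Prices (ProductId INT, Price DECIMAL(10,2));  -- USD

    ALTER USER UkUser WITH DEFAULT_SCHEMA = uk;
    -- For UkUser, "SELECT * FROM Prices" now resolves to uk.Prices.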
I wouldn't say it was that common; anecdotally, most people still drop everything in the dbo schema.
I'm not aware of any other possible reasons besides organization and permissions. Are these not good enough? :)
For the record - I always use a single schema - but then I'm creating web applications and there is also just a single user.
Update, 10 years later!
There's one more reason, actually. You can have "copies" of your schema for different purposes. For example, imagine you are creating a blog platform. People can sign up and create their own blogs. Each blog needs a table for posts, tags, images, settings, etc. One way to do this is to add a blog_id column to each table and use that to differentiate between blogs. Or... you could create a new schema for each blog, with fresh new tables for each of them. This has several benefits:
Programming is easier. You just select the appropriate schema at the beginning and then write all your queries without worrying about forgetting to add where blog_id=#currentBlog somewhere.
You avoid a whole class of potential bugs where a foreign key in one blog points to an object in another blog (accidental data disclosure!)
If you want to wipe a blog, you just drop the schema with all the tables in it. Much faster than seeking out and deleting records from dozens of different tables (in the right order, no less!).
Each blog's performance depends only (well, mostly anyway) on how much data there is in that blog.
Exporting data is easier - just dump all the objects in the schema.
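A sketch of the idea (SQL Server syntax; the names are hypothetical):

    CREATE SCHEMA blog42;
    GO
    CREATE TABLE blog42.Posts (
      PostId INT IDENTITY PRIMARY KEY,
      Title  NVARCHAR(200) NOT NULL,
      Body   NVARCHAR(MAX)
    );
    -- ...and likewise Tags, Images, Settings, each created inside blog42.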
There are also drawbacks, of course.
When you update your platform and need to perform schema changes, you need to update each blog separately. (Added yet later: this could actually be a feature! You can do "rolling updates" where, instead of updating ALL the blogs at the same time, you update them in batches, watching for bugs or complaints before updating the next batch.)
Same about fixing corrupted data if that happens for whatever reason.
Statistics for the platform as a whole are harder to calculate.
All in all, this is a pretty niche use case, but it can be handy!
To me, they can cause more problems because they break ownership chaining.
Example:
Stored procedure tom.uspFoo uses table tom.bar easily, but extra rights would be needed on dick.AnotherTable. This means I would have to grant SELECT rights on dick.AnotherTable to the callers of tom.uspFoo... which exposes direct table access.
Unless I'm completely missing something...
Edit, Feb 2012
I asked a question about this: SQL Server: How to permission schemas?
The key is "same owner": if dbo owns both the dick and tom schemas, then ownership chaining does apply. My previous answer was wrong.
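A sketch of the corrected setup, using the schema names from the example above (AppUser is hypothetical):

    -- Both schemas are owned by dbo, so the ownership chain is unbroken.
    CREATE SCHEMA tom AUTHORIZATION dbo;
    GO
    CREATE SCHEMA dick AUTHORIZATION dbo;
    GO
    -- Callers need only EXECUTE on the proc; no direct grant on dick.AnotherTable.
    GRANT EXECUTE ON OBJECT::tom.uspFoo TO AppUser;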
There can be several reasons why this is beneficial:
share data between several (instances of) an application. This could be the case if you have a group of reference data that is shared between applications, and a group of data that is specific to the instance. Be careful not to have circular references between entities in different schemas; that is, don't have a foreign key from an entity in schema 1 to another entity in schema 2 AND another foreign key from schema 2 to schema 1 in other entities.
data partitioning: allows data to be stored on different servers more easily.
as you mentioned, access control at the DB level