Currently, I'm using a Supabase database. One of the big roadblocks that I'm facing is column-level security, which seems a lot more complicated than RLS.
Say that I have a column called is_banned that is viewable but not editable, while the rest of the columns should be both viewable and editable.
The only solution that I can really think of is splitting it into two tables and having RLS on the "sensitive information" table - but creating a private table for every table seems rather unnecessary.
Are there other solutions?
In PostgreSQL, you can specify column-level permissions via GRANT and/or REVOKE statements.
The tricky part here is that these permissions are set against PostgreSQL users/roles, NOT against your app users. So you need to ensure that the permissions are set for every role that Supabase uses to execute client requests. As far as I know, Supabase uses the anon and authenticated PostgreSQL roles to execute requests; however, there is no official documentation on this, so I am not 100% sure there aren't any others.
You can read more about how to utilize this technique here (see the section called Column-level permissions).
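The statements involved can be sketched with a small Python helper that only builds the SQL text. The table name profiles, the column names, and the role list (anon and authenticated, taken from the answer above, which is itself unsure it is complete) are assumptions. Any table-level UPDATE grant has to be revoked first. In PostgreSQL, a table-level grant already covers every column, and a column-level GRANT cannot narrow it.

```python
import re

# Only plain lower-case identifiers are accepted, so the output is safe to paste into psql.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def column_update_grants(table: str, editable: list[str], roles: list[str]) -> list[str]:
    """Return SQL limiting UPDATE on public.<table> to the editable columns for the given roles."""
    tbl = _check(table)
    cols = ", ".join(_check(c) for c in editable)
    who = ", ".join(_check(r) for r in roles)
    return [
        # A table-level UPDATE privilege covers every column, so remove it first.
        f"REVOKE UPDATE ON TABLE public.{tbl} FROM {who};",
        # Then grant UPDATE only on the columns users may change.
        f"GRANT UPDATE ({cols}) ON TABLE public.{tbl} TO {who};",
    ]


for stmt in column_update_grants("profiles", ["username", "bio"], ["anon", "authenticated"]):
    print(stmt)
```

is_banned is left out of the column list. Because SELECT is not touched, it stays readable but can no longer be updated through these roles.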
I had to deal with this issue myself. I currently solve it with views, but would rather choose RLS policies, triggers or privileged functions in the future (untested, as of right now). I share the notes from my research into this issue below.
Column-level security (CLS) means selectively prohibiting column updates based on certain conditions. There are several alternative solutions for this (summary), each with advantages and disadvantages; they are discussed in detail below.
Option 1: RLS policies
(My favourite option so far, but I have not yet used it in practice.)
Here, you would use a row-level security (RLS) policy that manually retrieves the old row and checks whether your field's value would change from the old to the new value. A solution candidate for this has been posted as a Stack Overflow answer, but it still has to be turned into a generic function. At first, this seems better than a trigger: it shares a trigger's advantages, and in addition, Supabase promotes the use of RLS policies for access control anyway and has much better UI support for RLS than for triggers. So it would improve the consistency and maintainability of the database by reducing complexity.
However, the Supabase RLS editor cannot be used for complex RLS policies (issue report). As a workaround, wrap all RLS code into a single (possibly nested) function call, or at least into something no longer than one line of code. It is even better to keep the SQL source code under version control outside of Supabase, and to copy and paste it into the Supabase SQL Editor whenever you want to change an RLS policy, table, function, and so on.
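The comparison at the heart of option 1 can be sketched in plain Python. This is neither an RLS policy nor Supabase code. It only shows the check that a policy, or a helper function called from its WITH CHECK expression, would perform after looking up the stored row. The function name and the is_banned example are assumptions.

```python
from typing import Any, Mapping


def protected_columns_unchanged(
    old_row: Mapping[str, Any],
    new_row: Mapping[str, Any],
    protected: set[str],
) -> bool:
    """True if no protected column differs between the stored row and the proposed row."""
    return all(old_row.get(col) == new_row.get(col) for col in protected)


old = {"id": 1, "username": "ana", "is_banned": False}
print(protected_columns_unchanged(old, {**old, "username": "ana_b"}, {"is_banned"}))  # True
print(protected_columns_unchanged(old, {**old, "is_banned": True}, {"is_banned"}))   # False
```

In SQL the comparison should use IS NOT DISTINCT FROM rather than =, so that NULL compared with NULL counts as unchanged. That matches what Python's == does here with None.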
Option 2: Triggers
See here for instructions.
Advantages:
Does not add another table or view, so that the database structure is determined by the data and not by permission system quirks, as it should be.
Does not require changes to the default Supabase permissions or table-to-schema assignments.
Combines the power of RLS policies and column-level permissions.
Disadvantages:
Triggers are not yet well supported in the Supabase UI. Only the trigger status can be changed there; the trigger itself cannot be shown or edited in the UI, only in the PostgreSQL console. In practice, this is not much of an issue, as for any real-life project you will have to work with the PostgreSQL database directly anyway.
It requires knowledge of PL/pgSQL or another programming language … and for some, programming is exactly what we want to avoid by using Supabase.
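Supabase runs PostgreSQL, where this would be a PL/pgSQL trigger function. As a self-contained stand-in, the sketch below uses SQLite through Python's sqlite3 module to show the same idea: a BEFORE UPDATE OF trigger that rejects any actual change to is_banned. The table and trigger names are assumptions. Unlike the Supabase case, this version does not distinguish between users; a real trigger would also check who is making the request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profiles (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        is_banned INTEGER NOT NULL DEFAULT 0
    );
    -- Fires only when is_banned is in the SET list and its value actually changes.
    CREATE TRIGGER profiles_is_banned_readonly
    BEFORE UPDATE OF is_banned ON profiles
    WHEN NEW.is_banned IS NOT OLD.is_banned
    BEGIN
        SELECT RAISE(ABORT, 'is_banned is read-only');
    END;
    INSERT INTO profiles (id, username) VALUES (1, 'ana');
    """
)


def try_update(sql: str) -> bool:
    """Run an UPDATE and report whether the trigger let it through."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


print(try_update("UPDATE profiles SET username = 'ana_b' WHERE id = 1"))   # True
print(try_update("UPDATE profiles SET is_banned = 1 WHERE id = 1"))        # False
print(conn.execute("SELECT username, is_banned FROM profiles").fetchone())  # ('ana_b', 0)
```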
Option 3: Privileged Functions
"You can hide the table behind a FUNCTION with SECURITY DEFINER. The table itself would not provide UPDATE access, instead users can only update the table through the FUNCTION." (source)
And in that function, you can determine column-level access permissions in any way you like. Any such function in schema public is automatically available through the API:
"write PostgreSQL SQL functions […] and call them via supabase.rpc('function_name', {param1: 'value'});." (source).
The issue is, however, that the API then no longer has a unified structure of "everything is available in tables".
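Option 3 comes down to a single entry point that decides which columns a caller may write. The Python/SQLite sketch below mimics that. Callers never issue UPDATE themselves; they go through update_profile, a hypothetical name, which whitelists columns before building the statement. In Supabase, the equivalent would be a SECURITY DEFINER function in schema public, called via supabase.rpc(...). That function would also need to check that the caller owns the row (for example with auth.uid()), which this sketch leaves out.

```python
import sqlite3

# Columns a regular user may change; is_banned is deliberately missing.
EDITABLE_COLUMNS = ("username", "bio")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (id INTEGER PRIMARY KEY, username TEXT, bio TEXT,"
    " is_banned INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO profiles (id, username, bio) VALUES (1, 'ana', '')")


def update_profile(profile_id: int, changes: dict) -> None:
    """Single gatekeeper for writes: reject any column outside the whitelist."""
    forbidden = set(changes) - set(EDITABLE_COLUMNS)
    if forbidden:
        raise PermissionError(f"not editable: {sorted(forbidden)}")
    if not changes:
        return
    # Column names come from the whitelist check above; values are bound as parameters.
    assignments = ", ".join(f"{col} = ?" for col in changes)
    with conn:
        conn.execute(
            f"UPDATE profiles SET {assignments} WHERE id = ?",
            (*changes.values(), profile_id),
        )


update_profile(1, {"bio": "hello"})
try:
    update_profile(1, {"is_banned": 1})
except PermissionError as exc:
    print(exc)  # not editable: ['is_banned']
```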
Option 4: User-specific views
See the instructions. More instructions:
"You can create a view to only show the columns you want, make sure you secure with a WHERE statement as it ignores RLS (normally), and then use RLS to block the original table." (source)
This solution has been recommended by a Supabase maintainer. In total, RLS policies and triggers seem preferable, though.
To make this solution secure, you have to use the option security_barrier = on (details), which can severely impact view performance. The only way around that is to not use a WHERE clause and instead reuse the RLS policies of the base table via security_invoker = on. That requires moving the base table to a custom database schema that is not exposed by the API (see below).
Advantages:
Simple. Views are just like tables, and everyone knows PostgreSQL tables – in contrast to triggers or (complex) RLS policies.
You see what you edit. Users (or their applications) who can see records in the table do not have to worry if they are editable due to RLS policies. Whatever a user can see, they can edit.
❓ Extendable as needed. (Still unsure about this.) The view can expose only the columns a certain user is allowed to edit. To find the right column, sometimes more context is needed. Not a problem: at API access time, join the view with columns from the underlying base table as needed. Only the surrogate primary key column id always needs to be included in the view. This is not an issue: if a user tries to edit it, the edit can only succeed with new values, in which case a new record is effectively created, which the user is probably allowed to do anyway. (To be confirmed that updates with proper access protection are then still possible.)
Disadvantages:
Cluttering the table space. Ideally, the API would expose the data in the form it has in a proper database design. By exposing additional views, the API becomes unnecessarily complex.
Cannot really reuse the RLS policies of the underlying table. This could be done by using security_invoker = on when creating the view (details). However, when doing this, the same user who can, say, update a record through the view can then also update that record in the base table, circumventing the column access restrictions the view is meant to enforce. The only way around that would be to move the base table to a custom database schema that is not exposed by the API. That is possible, but adds yet more structural complexity.
Needs changes to the default view permissions. Since these are simple views, they are "updatable" views in PostgreSQL. Together with the default table-level / view-level permissions in Supabase's public schema, this means that all users, even anonymous ones, can delete records from these views, which deletes the records in the underlying tables.
To fix this, one has to remove the INSERT and DELETE privileges from the view. This is a change to the default Supabase permissions that would ideally not be necessary.
There is an alternative solution, but it is not very practical: you can create the views with security_invoker = on to reuse the RLS policies of the underlying table, and then use these RLS policies to prevent record deletion. However, those policies have to allow SELECT and UPDATE. So unless you move the underlying table to a schema not exposed by the API, users could circumvent the column-level security for which the views were created.
No good way to restrict the use of certain values in a column to certain users. This is because views cannot have their own RLS policies. There are several ways to work around this:
Probably the best way to work around that is to structure tables so that a user with write access to a column is allowed to use every value in that column. For example, instead of columns role (user, admin) and status (applied, approved, disapproved), there would be nullable boolean columns user_application, admin_application, user_status, admin_status.
Another option, for complex cases, is to move the underlying table to a custom schema that is not API accessible (while still granting USAGE and permissions to all Supabase roles; see), to create RLS policies on that underlying table, and to re-use them in the views via security_invoker = on.
Another option, also for complex cases, is to use triggers on the view or the underlying table.
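A minimal sketch of the view idea, again with SQLite via Python's sqlite3 and assumed names (profiles, profiles_editable). In PostgreSQL, a simple single-table view like this is automatically updatable. SQLite views are not, so the sketch adds an INSTEAD OF UPDATE trigger to forward writes. The point is the same in both: the view exposes only id and username, so an update through it cannot touch is_banned. The sketch does not model Supabase roles, RLS, or the security_barrier / security_invoker options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profiles (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        is_banned INTEGER NOT NULL DEFAULT 0
    );
    -- The view exposes only the key and the editable column.
    CREATE VIEW profiles_editable AS
        SELECT id, username FROM profiles;
    -- SQLite needs this trigger to make the view writable; PostgreSQL does it automatically.
    CREATE TRIGGER profiles_editable_update
    INSTEAD OF UPDATE ON profiles_editable
    BEGIN
        UPDATE profiles SET username = NEW.username WHERE id = OLD.id;
    END;
    INSERT INTO profiles (id, username, is_banned) VALUES (1, 'ana', 1);
    """
)

conn.execute("UPDATE profiles_editable SET username = 'ana_b' WHERE id = 1")
print(conn.execute("SELECT * FROM profiles_editable").fetchall())  # [(1, 'ana_b')]

# is_banned is not a column of the view, so this fails instead of changing the base table.
try:
    conn.execute("UPDATE profiles_editable SET is_banned = 0 WHERE id = 1")
except sqlite3.OperationalError as exc:
    print(exc)
```

The disadvantages list above raises a permission problem. On the Supabase side, the fix would be a statement along the lines of REVOKE INSERT, DELETE ON public.profiles_editable FROM anon, authenticated; (view and role names assumed).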
Option 5: Column-level access rights
"You can provide UPDATE access to only a subset of columns of the table: GRANT UPDATE(col1, col2). (details)" (source)
Reportedly, it is a hassle to maintain all these access rights. Also, in Supabase it would not let you differentiate between different authenticated users, since they all share the same PostgreSQL role, authenticated. (PostgREST could offer different options for this.)
Option 6: Table Splitting
Compared to views, this splits the main table into multiple parts. RLS policies define who can do what with each partial table. Unlike views, where a WHERE clause can only partially emulate RLS policies, an RLS policy can also limit which values a user can use for a column. To use the parts together, they have to be joined in requests. That is fine when splitting a table in two. But sometimes the splitting is almost "one table per column", for example for permission management tables with one column per role. This is bad because it "atomizes" the data rather than keeping it in a proper normal form, meaning that the data is not even accessible to admins in a comfortable way.
I'm planning to develop a web application in CakePHP that shows information in graphics and cards. I chose CakePHP because the information that we need to show is very structured, so the model approach makes it easier to manage the data. Also, I have some experience with MVC from ASP.NET, and I like how simple it is to use the routing.
So, my problem is that each of the organizations that could use the app would have its own database, with a schema different from the one we need. I can't just set their connection string in the app.php file because their database won't match my model.
And the organization datasource couldn't fit my model for a lot of reasons: the tables don't have the same name, the schema is different, the fields of my entity are in separated tables, maybe they have the info in different databases or also in different DBMS!
I want to know if there's a way to make an interface that achieves this, so that the CakePHP Model/Entity can use data regardless of the source. Do you have any suggestions on how to do that? Does CakePHP have an option to make this possible? Should I use PHP with some kind of markup language like JSON or XML? Maybe MySQL has a utility to transform data from different sources into a view, and I can make CakePHP use the view instead of the table?
If you have an answer, please be as detailed as you can.
These other options are possible if it's impossible to make the interface:
- Use another framework that can handle this more easily and has the features I mentioned above.
- Make the organizations change their database so it matches my model (I don't like this one, and they probably won't do it).
- Transfer the data into the application's own database.
Additional information:
The data shown in the graphics is about university students. Every university has its own database, with its own structure and its own applications using it; that's why it isn't easy to change the structure. I just want to make it as easy as possible for any school to configure its own database.
EDIT:
The version is CakePHP 3.2.
An important point is that it doesn't need all CRUD operations, only "reading". Hope that makes the solution easier.
I don't think your "question" can be answered properly; it doesn't contain enough information or detail. I guess some things will stay the same for all organizations, while their data and business logic will differ. But I'll try.
And the organization datasource couldn't fit my model for a lot of reasons: the tables don't have the same name, the schema is different, the fields of my entity are in separated tables, maybe they have the info in different databases or also in different DBMS!
The model is a whole layer, so if you have completely different table schemas, your business logic, which is part of that layer, will be different as well. Simply changing the database connection won't help you then. The data also needs to be shown in the views, so the views will have to differ as well.
So what you could try, and what your second image shows, is to implement a layer that contains interfaces and base classes. Then create a Cake plugin for each organization that uses these interfaces and base classes. Also write some code that loads the right plugin based on whatever criterion you check (I'd guess domain or sub-domain). You will have to define the intermediate interfaces so that every organization can be accessed the same way at the API level.
And one technical thing: you can define the connection of a table object in the model layer. Every entity knows its origin, but you should not implement business logic inside an entity, nor change the connection through an entity.
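The layered design described above is essentially the adapter pattern. The sketch below is language-neutral pseudocode made runnable in Python, not CakePHP code. It has a shared read-only interface (StudentSource), one adapter per organization that maps that organization's own field names, and a registry keyed by sub-domain. All class names, field names, and host names are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """The application's own model, independent of any organization's schema."""
    student_id: str
    name: str
    average: float


class StudentSource(ABC):
    """Read-only interface every organization adapter must implement."""

    @abstractmethod
    def students(self) -> list[Student]: ...


class UniversityAAdapter(StudentSource):
    def __init__(self, rows: list[dict]):
        self.rows = rows  # stand-in for a query against that university's database

    def students(self) -> list[Student]:
        # University A stores the full name in one column and the average as text.
        return [Student(r["stu_no"], r["full_name"], float(r["average"])) for r in self.rows]


class UniversityBAdapter(StudentSource):
    def __init__(self, rows: list[dict]):
        self.rows = rows

    def students(self) -> list[Student]:
        # University B uses numeric IDs and splits first and last name.
        return [
            Student(str(r["id"]), f'{r["first"]} {r["last"]}', r["grade_avg"])
            for r in self.rows
        ]


# Picked per request, e.g. by sub-domain, as the answer suggests.
SOURCES: dict[str, StudentSource] = {
    "uni-a.example": UniversityAAdapter([{"stu_no": "A1", "full_name": "Ana Ruiz", "average": "9.1"}]),
    "uni-b.example": UniversityBAdapter([{"id": 7, "first": "Bo", "last": "Li", "grade_avg": 3.4}]),
}

for host, source in SOURCES.items():
    print(host, source.students())
```

The rest of the application only ever sees Student objects. Adding an organization then means writing one adapter, not touching the views or charts.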
EDIT: The version is CakePHP 3.2. An important point is that it doesn't need all CRUD operations, only "reading". Hope that makes the solution easier.
If that's true, either use the CRUD plugin (yes, you can use only the R part of it) or write some code yourself. For example, a class that describes the organization and is used to create your table objects and views on the fly.
Overall it's a pretty interesting problem, but IMHO too broad for a simple answer or solution that can be given here. I think this would require some discussion and analysis to find the best solution. If you're interested in consulting, you can contact me; check my profile.
I found a way without coding any interface. In fact, it's using some features already included in the DBMS and CakePHP.
If the schema doesn't fit the model, you can create views that match the table names and column names of the model. Views behave like tables by definition, so CakePHP queries the same table and column names, and the DBMS does the work.
I made a test with views in MySQL and it worked fine. You can also combine the data from different tables.
MySQL views
SQL Server views.
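A minimal sketch of the view-as-adapter idea, using SQLite via Python's sqlite3 so it runs without a MySQL or SQL Server instance. The organization's tables and columns (tbl_person, tbl_enrollment, and so on) are hypothetical. The view is named students because that is the table name CakePHP's conventions would give a StudentsTable class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The organization's own schema (hypothetical): student data split across two tables.
    CREATE TABLE tbl_person (person_no INTEGER PRIMARY KEY, given_name TEXT, family_name TEXT);
    CREATE TABLE tbl_enrollment (person_no INTEGER, program_code TEXT);

    INSERT INTO tbl_person VALUES (1, 'Ana', 'Ruiz'), (2, 'Bo', 'Li');
    INSERT INTO tbl_enrollment VALUES (1, 'CS'), (2, 'MATH');

    -- The view presents exactly the table and column names the app's model expects.
    CREATE VIEW students AS
        SELECT p.person_no AS id,
               p.given_name || ' ' || p.family_name AS name,
               e.program_code AS program
        FROM tbl_person AS p
        JOIN tbl_enrollment AS e ON e.person_no = p.person_no;
    """
)

# The application only ever queries "students", as if it were a table.
print(conn.execute("SELECT id, name, program FROM students ORDER BY id").fetchall())
# [(1, 'Ana Ruiz', 'CS'), (2, 'Bo Li', 'MATH')]
```

String concatenation is the main dialect difference. MySQL's default SQL mode treats || as logical OR, so there the name column would use CONCAT(p.given_name, ' ', p.family_name). SQL Server also supports CONCAT.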
If the organization uses another DBMS, you just change the datasource in app.php and create the views if necessary.
If the data is distributed across different DBMSs, CakePHP lets you set a datasource for each table: you just add it to app.php and use it in the table class where required.
Finally, if you only need reading, create a database user with limited access: SELECT privileges on the views only.
USING:
CakePHP 3.2
SQL SERVER 2016
MySQL 5.7
I'm working on a web application that allows users to create simple websites and publish them on a static web host, and I have problems deciding if and how I should use ancestors in the gray area between necessary and avoidable.
The model is rather simple: currently every User has one or more Website entities. The Website entities store all the basic information about a website, plus its nested navigation menu that refers to Page entities (the navigation tree is stored as a JSON property). The Page entity types are based on a PolyModel, and there are several page types that behave differently (there's a GalleryPage, for example).
There are no entity groups (or rather, no entities with ancestors) as of yet, and I'll only need a couple of transactions. When updating a Page's name, for example, I have to update it in the Page entity itself as well as in the navigation tree on the Website entity.
I think I understand how entity groups work and the basic implications of using them, but I have trouble deciding on the "best" way to structure my data in the absence of strong reasons for either approach. I could:
Go entirely without ancestors on my entities. As far as I understand I can still use cross-group transactions as long as I get the entities by key and don't need more than 5 within the transaction. The downside is that I'd depend on the XG transactions and there might come a point where I can't ninja my way around using ancestor queries anymore (and then it might be too late).
Make the user object the parent of all his Websites, Pages and other data. This would give the user a strongly consistent view of all of his data and allow me to use transactions whenever I add a feature that needs them, but it limits sustained writes to 1-5/sec. But as a user will only ever be updating his own data, this might actually work and behave just the same for 1000 users as it does for 1.
Try to use even smaller entity groups (like separating the navigation from the Website and making it the parent of the Website's Pages). But I'm not quite sure there's much benefit to this, because most of the editing happens on Pages anyway.
So I guess the real question is: how do you decide when to use ancestor relationships on App Engine when there's no obvious reason for or against them? Would you go for the convenience of strongly consistent queries and freely usable transactions when adding features later? Or would you avoid ancestors at all costs until there's a very obvious reason for them, even if that might limit your ability to use transactions later?
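To make the trade-off concrete, here is a small pure-Python model of how ancestor paths determine entity groups. It is not the App Engine API. An entity's group is identified by the root of its key path. Renaming a page touches the Page and its Website. With no ancestors (the first approach), that is two groups and needs an XG transaction. With the User as the common ancestor (the second approach), it is one group. The limit of 5 comes from the question; kinds and IDs are illustrative.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """A datastore-style key path: ((kind, id), (kind, id), ...) from root to leaf."""
    path: tuple[tuple[str, object], ...]

    def entity_group(self) -> tuple[str, object]:
        # Entities that share the same root key belong to the same entity group.
        return self.path[0]


XG_GROUP_LIMIT = 5  # cross-group limit as quoted in the question


def groups_touched(keys: list[Key]) -> int:
    return len({k.entity_group() for k in keys})


# First approach: no ancestors, so Website and Page are each their own root.
flat = [Key((("Website", 1),)), Key((("Page", 42),))]
# Second approach: the User is the ancestor of everything they own.
nested = [
    Key((("User", "ana"), ("Website", 1))),
    Key((("User", "ana"), ("Page", 42))),
]

for label, keys in (("flat", flat), ("nested", nested)):
    n = groups_touched(keys)
    print(f"{label}: {n} group(s), XG needed: {n > 1}, within limit: {n <= XG_GROUP_LIMIT}")
```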
I read the related documentation, read the chapter on transactions in "Programming App Engine", looked at quite a few of the Google I/O videos, but I still find it hard to make that decision.
People seem to avoid building user interfaces that pull their information (names, field types, etc. as well as relationships) from a database; they instead hard-code forms (and tables, etc.) that have pretty much the same data names and types and things.
Am I making sense?
For instance, imagine an ENUM (enumerated) field in MySQL: why not just have the UI construct a drop-down list whenever it encounters an ENUM? Why put the same values in both the database and the code?
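The idea in the question can be illustrated with a short Python sketch. It reads the member list out of a MySQL ENUM column definition and renders an HTML select. The input is the kind of string found in information_schema.COLUMNS.COLUMN_TYPE, e.g. enum('draft','published'). Fetching that string from the database is left out, and the function names are hypothetical. The parser treats a doubled '' inside a member as a literal quote and does not handle backslash escapes.

```python
import html
import re

# One quoted ENUM member; a doubled '' inside stands for a literal quote.
_MEMBER = re.compile(r"'((?:[^']|'')*)'")


def enum_values(column_type: str) -> list[str]:
    """Extract the members of a MySQL column type such as "enum('a','b')"."""
    if not column_type.lower().startswith("enum("):
        raise ValueError(f"not an ENUM column: {column_type!r}")
    return [m.replace("''", "'") for m in _MEMBER.findall(column_type)]


def select_html(name: str, column_type: str) -> str:
    """Render a <select> whose options come straight from the column definition."""
    options = "".join(
        f'<option value="{html.escape(v)}">{html.escape(v)}</option>'
        for v in enum_values(column_type)
    )
    return f'<select name="{html.escape(name)}">{options}</select>'


print(enum_values("enum('draft','published','it''s')"))  # ['draft', 'published', "it's"]
print(select_html("status", "enum('draft','published')"))
```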
Perhaps I'm just missing something; perhaps there are projects out there that do this — sort of super-crud interfaces that can be pointed at any database and from it build a fully-functional relationally-aware user interface. Are there?
I'm possibly not quite conforming to the stackoverflow norms with this question; I shall summarise:
Can you please tell me of a project that constructs its user interface (solely) from analysis of the database schema?
Why is this not a common way to do it — surely it is good to only define data structure in one place (i.e. the database)?
Thank you, and may joyous code-love rain upon your IDE.
I'd like to point out that, last time I checked, .NET and Qt (and probably other environments) make it possible to use "database-aware widgets" (sometimes shortened to just data-aware widgets), which is probably the best pragmatic solution available. What I mean by data-aware widgets is that the widgets themselves know that they're linked directly to database fields, so you would have a combobox that knows that it's backed by an enum and fetches the possible values directly from the database at runtime, just like you suggested.
This is a really neat utility, and used well, it probably won't hurt anything. It still requires that you spend some time laying out widgets manually on a form, but then if you update the database to add a new value to that enum, you don't have to rebuild your app to see it show up in the UI.
But the reason most usability experts will cringe when they hear your question is because programmers tend to think that, well, why not just generate the entire UI, form layout and everything, from the database? And this is where it starts to get really nasty, really fast.
Let's say you have a simple Person table, with first_name, last_name, email_address, street_address, city, state, zip, and phone_number. You want to automatically generate a UI based on these fields. How do you sort the fields? I mean, ideally, first name and last name should be right next to one another. And it would look very silly if you had city and state before street address. So you have to add a new column to the table to specify sort order, if you go with the quickest method, or a new table to specify each field's order index to their field ID.
What if you want to group parts of the information separately? Then you have to add more UI-specific cruft into your database layout (to do this generically, you'll need a new table specifying which UI fields belong to which UI groupboxes). So you've only solved two problems and already your database layout has gotten twice as ugly, plus now instead of a simple O(1) layout operation when you load the UI, you've gotta do several database queries to find out what fields exist and dynamically lay them out while applying the correct widget order... and we haven't even dealt with sizing (should every field be the maximum size to fit its possible contents, or should all text fields be the same width? Wouldn't it be nice if you could say that some text fields should be one width and height, and some should be another combination? etc), or text justification, or formatting, or any other really common elementary usability requirements that will require further sacrifices from the clarity and simplicity of your database schema.
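The extra table the answer describes can be sketched to show how quickly UI metadata piles up. This is SQLite via Python's sqlite3, and the ui_field table and its columns are hypothetical. Even this minimal version only covers field order and grouping, not sizing, justification, or formatting.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- UI-only metadata that has nothing to do with the Person data itself.
    CREATE TABLE ui_field (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        group_name  TEXT NOT NULL,
        group_order INTEGER NOT NULL,
        field_order INTEGER NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    INSERT INTO ui_field VALUES
        ('person', 'first_name',     'Name',    1, 1),
        ('person', 'last_name',      'Name',    1, 2),
        ('person', 'street_address', 'Address', 2, 1),
        ('person', 'city',           'Address', 2, 2),
        ('person', 'state',          'Address', 2, 3),
        ('person', 'zip',            'Address', 2, 4),
        ('person', 'email_address',  'Contact', 3, 1),
        ('person', 'phone_number',   'Contact', 3, 2);
    """
)

# An extra query per form, just to decide which field goes where.
rows = conn.execute(
    "SELECT group_name, column_name FROM ui_field "
    "WHERE table_name = ? ORDER BY group_order, field_order",
    ("person",),
).fetchall()

# Rows arrive sorted by group, so groupby yields each group once, in display order.
layout = {group: [col for _, col in items] for group, items in groupby(rows, key=lambda r: r[0])}
print(layout)
```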
Most visual database editors. phpMyAdmin for instance.
Because the database structure isn't always a very good logical structure for a user to be using, especially in the case of databases that have been denormalised on purpose for efficiency reasons.
Yup, this route has already been traveled.
Simply pointing at a database will create an oversimplified UI, not giving much more than the CRUD of an Access UI. That's why Naked Objects (I'm one of its committers) builds its metamodel from a POJO domain model. This allows the UI to expose any public method as a menu item ... we call this "behaviourally complete".
Per the comment about the UI not being suitable for end-users, I have two points:
distinguish between power users and casual users. Most internal apps are for the former (we use Alan Cooper's term "sovereign application" for this), who understand the domain and don't want fancy UI stuff getting in the way. Most external apps, e.g. public web sites, are for the latter.
for the latter, there's nothing to prevent the autogenerated UI of a tool like Naked Objects from being replaced with a custom or semi-customized viewer. One such viewer is Scimpi; I'm also working on an Eclipse RCP viewer that'll expose extension points. But even here, the auto-generated UI is still very valuable to the development team and business analysts for exploration and prototyping.
Hope some of the above has piqued your interest. If you want more, google around, or check out my book on domain-driven design and Naked Objects (NO) at pragprog.com.
HTH
Dan
List of projects that implement this idea.
.NET
dotObjects
Naked Objects
TrueView
Java
Domain Object Explorer
JMatter
Naked Objects
Sanssouci
Trails
Lablz
C++
Typical Objects
Specifically to your second question: a lot of it really depends on your data model. Some models are very complicated and would lead to unintuitive user interfaces. For simple CRUD-based systems, having your UI be a front end to the database might be preferable, and in that case I think some of these tools would be great. However, for more complicated systems where some database data needs to be hidden from users, it would be better if your UI didn't mirror the database schema.
Microsoft Access has used this model for years - the database and UI development are very closely tied. You can auto-generate a form directly from a table definition with smart defaults and search built in. The model works well for developing applications with few concurrent users such as custom applications for small businesses where the amount of data stored is small.
If you are scaling to larger relational databases, many concurrent users, or large volumes of data, reliability and performance become more important, and separately constructed UIs and databases make more sense. When more users are involved, they often have different requirements, so decoupling the UI from the database schema makes development more efficient.
Just a note on Java "projects that implement this idea": Tynamo is the new version of the Trails framework.
There are many systems that build an interface for you to edit stuff directly from table information. End-user interfaces, however, must be tweaked a little bit. You may not want to reveal to the user every field in your table.
Frameworks that make good use of the MVC design pattern can let you do all kinds of things with your models, which are the preferred way to build new systems (rather than creating database tables directly).
To answer your questions specifically:
Django allows you to construct forms (and a complete admin CMS) out of models.
It is a common thing to do.
Naked Objects is about one step removed from this. They base the UI on an object model, and then persist the object model.
I think you are forgetting to consider the user in your design process if you are thinking like that. Bad mistake. Users don't like it when the interface changes, they would especially not like it if it changed frequently as they then wouldn't know what to do. Further, if you generate your UI on the fly based on the database structure, then what order would the objects be in? UIs need to have objects in an order that makes sense to the users not the database designers.
Further in a well-designed database there are fields that are not meant for the users to see. Things like numeric keys, insert date, last updated etc. You don't want to automatically expose these to the users and you certainly don't want them to have the ability to mess with the data in such fields.
Finally, if you don't think about the functionality of the page, then you aren't doing your job. A UI needs to be more than just a list of fields that can be edited. You need constraints on who can see what, checks of the data before inserting it into the database, and business rules that need to be applied. You can't just autogenerate a lot of this (and you shouldn't even if you could!). Design needs thought and care.
Now, as to drop-down lists: of course you can generate them from the database rather than the code; in fact, it is the better choice. Just make the query the source for your particular object, not a list generated in code.
You can do it with the help of a cool tool called COBALT, from a developer in the Philippines. You can download it here.
I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.
Any opinions or links on why it's a risk, or why it isn't?
EDIT: of course the access is scoped, e.g. if you can't see resource foo?id=123 you'll get an error page. Otherwise the URL itself should be secret.
EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
EDIT (months later): my current preferred practice for this is to use UUIDs for IDs and expose them. If I'm using sequential numbers as IDs (usually for performance on some databases), I like to generate a UUID token for each entry as an alternate key and expose that instead.
There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.
The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.
Here are some good rules to follow:
Use role-based security to control access to an operation. How this is done depends on the platform and framework you've chosen, but many support a declarative security model that will automatically redirect browsers to an authentication step when an action requires some authority.
Use programmatic security to control access to an object. This is harder to do at a framework level. More often, it is something you have to write into your code and is therefore more error prone. This check goes beyond role-based checking by ensuring not only that the user has authority for the operation, but also has necessary rights on the specific object being modified. In a role-based system, it's easy to check that only managers can give raises, but beyond that, you need to make sure that the employee belongs to the particular manager's department.
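A minimal sketch of the two checks, using the manager/raise example from the rules above. The Python names (User, Employee, give_raise) are hypothetical. The role check alone would let any manager raise any employee's salary by editing the ID in the URL; the department check closes that gap.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    role: str
    department: str


@dataclass
class Employee:
    id: int
    department: str
    salary: int


EMPLOYEES = {1: Employee(1, "sales", 50_000), 2: Employee(2, "engineering", 60_000)}


def give_raise(user: User, employee_id: int, amount: int) -> Employee:
    # Role-based check: may this kind of user perform the operation at all?
    if user.role != "manager":
        raise PermissionError("only managers can give raises")
    employee = EMPLOYEES.get(employee_id)
    # Object-level check: does this manager have rights on this particular record?
    # The same error for "missing" and "not yours" avoids confirming which IDs exist.
    if employee is None or employee.department != user.department:
        raise PermissionError("no access to this employee")
    employee.salary += amount
    return employee


sales_manager = User("maria", "manager", "sales")
give_raise(sales_manager, 1, 1_000)      # allowed: same department
try:
    give_raise(sales_manager, 2, 1_000)  # guessed ID from another department
except PermissionError as exc:
    print(exc)
```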
There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.
Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:
Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier. This is the only one that should ever be exposed to the client. Using random UUIDs for these is a practical solution for assigning these surrogate keys, even though they aren't cryptographically secure.
One place where cryptographically unpredictable identifiers are a necessity, however, is session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
While not a data security risk, this is absolutely a business intelligence security risk, as it exposes both data size and velocity. I've seen businesses harmed by this and have written about this anti-pattern in depth. Unless you're just building an experiment and not a business, I'd highly suggest keeping your private IDs out of the public eye. https://medium.com/lightrail/prevent-business-intelligence-leaks-by-using-uuids-instead-of-database-ids-on-urls-and-in-apis-17f15669fd2e
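The two guidelines above can be sketched together in Python, using SQLite via the standard sqlite3 module and a hypothetical invoice table. An internal integer key is used for joins, and a random uuid4 public ID is the only identifier the web layer sees. Session tokens come from secrets.token_urlsafe, which draws on the operating system's cryptographic RNG. This removes guessable sequences; it does not replace the authorization checks described earlier.

```python
import secrets
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice (
        id        INTEGER PRIMARY KEY,   -- sequence number, used only inside the database
        public_id TEXT NOT NULL UNIQUE,  -- random surrogate, the only ID sent to clients
        amount    INTEGER NOT NULL
    )
    """
)


def create_invoice(amount: int) -> str:
    public_id = str(uuid.uuid4())  # random (version 4) UUID
    with conn:
        conn.execute(
            "INSERT INTO invoice (public_id, amount) VALUES (?, ?)", (public_id, amount)
        )
    return public_id


def find_invoice(public_id: str):
    # Records are looked up by the public ID; the internal id never leaves this layer.
    return conn.execute(
        "SELECT public_id, amount FROM invoice WHERE public_id = ?", (public_id,)
    ).fetchone()


def new_session_token() -> str:
    # Session tokens authenticate requests, so they come from the secrets module.
    return secrets.token_urlsafe(32)


pid = create_invoice(250)
print(f"/invoices/{pid}", find_invoice(pid))
print(len(new_session_token()))  # 43 characters for 32 random bytes
```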
It depends on what the IDs stand for.
Consider a site that, for competitive reasons, doesn't want to make public how many members it has, but reveals it anyway by using sequential IDs in the URL: http://some.domain.name/user?id=3933
On the other hand, if they used the login name of the user instead: http://some.domain.name/user?id=some they haven't disclosed anything the user didn't already know.
The general thought goes along these lines: "Disclose as little information about the inner workings of your app to anyone."
Exposing the database ID counts as disclosing some information.
The reason for this is that hackers can use any information about your app's inner workings to attack you, or a user can change the URL to get into a database he/she isn't supposed to see.
We use GUIDs for database ids. Leaking them is a lot less dangerous.
If you are using integer IDs in your database, you may make it easy for users to see data they shouldn't by changing query string variables.
E.g. a user could easily change the id parameter in this query string and see/modify data they shouldn't: http://someurl?id=1
When you send database IDs to your client, you are forced to check security in both cases. If you keep the IDs in your web session, you can choose whether you want or need to do it, which potentially means less processing.
You are constantly trying to delegate things to your access control ;) This may be the case in your application, but I have never seen such a consistent back-end system in my entire career. Most of them have security models that were designed for non-web usage. Some have had additional roles added after the fact, and some of these roles were bolted on outside the core security model (because the role was added in a different operational context, say before the web).
So we use synthetic, session-local IDs because they hide as much as we can get away with.
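Here is a minimal Python sketch of synthetic, session-local IDs; the class and method names are hypothetical. Each session keeps its own mapping from opaque handles to real database IDs. A handle copied from another user's session, or a guessed one, therefore does not resolve.

```python
import secrets


class SessionIdMap:
    """Per-session mapping between opaque handles and real database IDs."""

    def __init__(self) -> None:
        self._to_real: dict[str, int] = {}
        self._to_handle: dict[int, str] = {}

    def handle_for(self, real_id: int) -> str:
        # Reuse the handle if this record was already sent to the client in this session.
        if real_id not in self._to_handle:
            handle = secrets.token_urlsafe(8)
            self._to_handle[real_id] = handle
            self._to_real[handle] = real_id
        return self._to_handle[real_id]

    def resolve(self, handle: str) -> int:
        # Only handles issued to this session are accepted.
        try:
            return self._to_real[handle]
        except KeyError:
            raise PermissionError("unknown handle for this session") from None


alice, bob = SessionIdMap(), SessionIdMap()
h = alice.handle_for(3933)
print(alice.resolve(h))  # 3933
try:
    bob.resolve(h)
except PermissionError as exc:
    print(exc)
```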
There is also the issue of non-integer key fields, which may be the case for enumerated values and similar. You can try to sanitize that data, but chances are you'll end up like little bobby drop tables.
My suggestion is to implement two stages of security.
"Security through obscurity": Your tables can have an integer Id as the primary key and a GUID Gid as a surrogate key. The integer Id column is used for relations and other back-end and internal database purposes. It can even serve as the select-list key in web apps, to avoid unnecessary mapping between Gid and Id while loading and saving. The Gid is used in REST URLs, i.e. for GET, POST, PUT, DELETE, etc., so that no one can guess other records' IDs. This gives a first level of protection against guess-based attacks (i.e. guessing number series).
Access-based control on the server side: This is the most important part, and you have various ways to validate a request based on the roles and rights defined in the application. It's up to you to decide.
From the perspective of code design, a database ID should be considered a private implementation detail of the persistence technology to keep track of a row. If possible, you should be designing your application with absolutely no reference to this ID in any way. Instead, you should be thinking about how entities are identified in general. Is a person identified with their social security number? Is a person identified with their email? If so, your account model should only ever have a reference to those attributes. If there is no real way to identify a user with such a field, then you should be generating a UUID before hitting the DB.
Doing so has a lot of advantages, as it allows you to divorce your domain models from persistence technologies. That means you can substitute database technologies without worrying about primary key compatibility. Leaking your primary key into your data model is not necessarily a security issue if you write the appropriate authorization code, but it's indicative of less-than-optimal code design.