How to design a user permission handling database? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
We have a little problem in one of our projects: two of the investors are architects and, as it usually is in life, they don't really get along on some of the ideas. Both have different experiences from previous projects, and each seems to look down on the other's ideas. Yep, I'm one of them.
We have an argument over how to define user permission handling in one of our projects.
One idea is to have a table with permissions, roles that gather sets of permissions, and then users who have a role assigned:
User
user_id
role_id
Role
role_id
permission_id
Permission
permission_id
The other side proposes a table whose columns define the permissions:
User
user_id
role_id
Role
role_id
can_do_something
can_do_something_else
can_do_something_even_different
My take on the first option is that it's far cheaper to maintain:
adding a single permission means just one INSERT plus handling of the permission in the code.
With the other option (as I see it), you have to alter the database, alter the code handling the database, and on top of that add code to handle the permission.
But maybe I'm just wrong, and I don't see some possible benefits of the other solution.
I always thought the former was the standard way to handle this, but I'm told that it's subjective and that changing the database is just a matter of running a script (whereas for me it means the script has to be added to the deployment, run on every database, and "remembered" in case of migration, etc.).
I know the question could be opinion-based, but I'm kind of hoping this really is a matter of standards and good practice rather than subjective opinion.

I posted some other questions as comments to your original question.
Even if you had a completely flat role setup, I cannot think of a reason to go for the second proposal. As you argue, changing anything would require modifying both the code and the data structure.
What your colleague is proposing is a sort of denormalization, which is only defensible when you need to optimize for speed while handling large quantities of data. That is not usually the case when dealing with roles.
(As an example, LDAP or other general-purpose single-sign-on models adopt something closer to your first solution, because even in a large organization the number of USERS is always larger than the number of ROLES by at least one order of magnitude).
Even if you were designing a Facebook replacement (where you may have billions of users), it is really improbable that you would need more than a handful of roles, so this would be a case of premature optimization (and, most probably, made worse by optimizing the wrong part).
In a more general sense, I strongly suggest reading the RBAC Wikipedia article for what is considered the standard approach to this kind of problem.
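For reference, the first proposal can be sketched in a few lines (using SQLite here; the role_permission join table is an assumption needed so that one role can hold many permissions):

```python
import sqlite3

# Minimal sketch of the normalized RBAC schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permission (permission_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE role       (role_id       INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE role_permission (
    role_id       INTEGER REFERENCES role(role_id),
    permission_id INTEGER REFERENCES permission(permission_id),
    PRIMARY KEY (role_id, permission_id)
);
CREATE TABLE user (user_id INTEGER PRIMARY KEY,
                   role_id INTEGER REFERENCES role(role_id));
""")

conn.execute("INSERT INTO permission VALUES (1, 'can_edit_articles')")
conn.execute("INSERT INTO role VALUES (1, 'editor')")
conn.execute("INSERT INTO role_permission VALUES (1, 1)")
conn.execute("INSERT INTO user VALUES (42, 1)")

# Adding a new permission later is a plain INSERT -- no ALTER TABLE needed.
def user_can(user_id, permission_name):
    row = conn.execute("""
        SELECT 1 FROM user u
        JOIN role_permission rp ON rp.role_id = u.role_id
        JOIN permission p       ON p.permission_id = rp.permission_id
        WHERE u.user_id = ? AND p.name = ?""",
        (user_id, permission_name)).fetchone()
    return row is not None

print(user_can(42, "can_edit_articles"))  # True
print(user_can(42, "can_delete_users"))   # False
```

Note that the permission check stays the same query no matter how many permissions exist, which is exactly the maintenance advantage the question argues for.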


How do you handle collection and storage of new data in an existing system? [closed]

Closed 3 years ago.
I am new to system design and have been asked to solve a problem.
Given a car rental service website, I need to work on a new feature.
The company has come up with some more data that they would like to capture and analyze along with the data that they already have.
This new data can be something like time and cost to assemble a car.
I need to understand the following:
1: How should I approach the problem from an API design perspective?
2: Is changing the schema of your tables going to do any good, if that is an option?
3: Which databases can be used?
The values, once stored, can be changed. For example, the time to assemble can decrease or increase, so users should be able to update the values.
To answer your question, let's divide it into two parts: ideal architecture and Q&A.
Architecture:
A typical system consists of many technologies working together to solve a practical problem. Problems can be solved in many ways and may have more than one solution. We are not talking about the efficiency and effectiveness of any particular architecture here, as that is a whole subject of its own, but it's always wise to choose what's best for your use case.
Since you already have existing software, it's always helpful to follow its existing design patterns. That will help you understand the existing code in detail and let you create logical blocks that fit in nicely and actually help integrate the functionality instead of working against it.
With the pre-planning phase out of the way, let's discuss how this affects what solution is, in my opinion, ideal for your use case.
Q&A
1. How should I approach the problem, from API design perspective?
There will be lots of assumptions here, but almost any system exposing an API should have basic authentication and authorization wherever needed. Apart from that, try to stick to the full REST specification, which lets API consumers follow standard paths and keeps the integration impact minimal when deciding what endpoints should look like and what they expect from the consumer.
That said, not all systems are ideal for such a use case, so it's up to the system designer how much of the system is compatible with standard practices.
Naming conventions matter here: a newer API version should get api/v2 paths while the old one keeps api/v1, which is good practice for routing new functionality and allows the system to expand seamlessly.
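The v1/v2 path convention can be sketched without any web framework; the endpoint names and payloads below are invented for illustration:

```python
# Minimal path-based version routing: old clients keep api/v1,
# new functionality lands under api/v2 without breaking them.
ROUTES = {
    "api/v1/cars": lambda: {"cars": ["sedan"]},                  # old contract
    "api/v2/cars": lambda: {"cars": ["sedan"], "assembly": []},  # adds new fields
}

def dispatch(path):
    """Look up the handler for a path; unknown paths get a 404."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": "not found"}, 404
    return handler(), 200

body, status = dispatch("api/v2/cars")
```

A real deployment would do the same lookup inside the router of whatever framework is already in use; the point is only that the version lives in the path.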
2: Is changing the schema of your tables going to do any good, if that is an option?
In the short term, when you do not have much data, it's relatively easy to migrate. When the data becomes huge, migration is much more painful and resource-intensive.
Good practices help you avoid scenarios where you would need to migrate data at all.
Database normalization becomes crucial in cases where the data structure could grow rapidly; it requires attention up front.
Regardless of whether you use a SQL or NoSQL solution, a good data structure will always help with both data management and the programming implementation.
In my opinion, getting the data structures near perfect is always a good idea, because it reduces the future cost of migration and the frustration it brings. Still, some use cases require additional columns, and it's okay to add them as long as they do not have much impact on existing code. Otherwise, the additional fields can always be decoupled into a separate table.
3: Which databases can be used?
Typically any RDBMS is enough for this kind of task. You might be surprised by case studies of large data producers still running MySQL in clusters.
So the answer is: as long as you have a normal scenario, go ahead and pick any database of your choice, until you hit its single-instance scalability limits. And those limits are pretty high for small to mid-scale apps.
How should I approach the problem, from API design perspective?
Design a good data model which is appropriate for the data it needs to store. The API design will follow from the data model.
Is changing the schema of your tables going to do any good, if that is an option?
Does the new data belong in the existing tables? Then maybe you should store it there. Except: can you add new columns without breaking any existing applications? Maybe you can but the regression testing you'll need to undertake to prove it may be ruinous for your timelines. Separate tables are probably the safer option.
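As a minimal sketch of the "separate tables" option (using SQLite; the table and column names for the assembly data are invented for illustration):

```python
import sqlite3

# Instead of ALTERing the existing car table, the new analytics fields
# live in a side table keyed to it, so existing queries are untouched.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE car_assembly_stats (
    car_id        INTEGER PRIMARY KEY REFERENCES car(car_id),
    assembly_mins INTEGER,
    assembly_cost REAL
);
""")
conn.execute("INSERT INTO car VALUES (1, 'hatchback')")
conn.execute("INSERT INTO car_assembly_stats VALUES (1, 95, 1200.0)")

# The question requires the values to be updatable later:
conn.execute("UPDATE car_assembly_stats SET assembly_mins = 90 WHERE car_id = 1")
mins = conn.execute(
    "SELECT assembly_mins FROM car_assembly_stats WHERE car_id = 1").fetchone()[0]
```

Reports that need both old and new data join the two tables on car_id; applications that never heard of the new columns keep working unchanged.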
Which databases can be used?
You're rather vague about the nature of the data you're working with, but it seems structured (numbers?). That suggests a SQL database with strong datatypes would be the best fit. Beyond that, use whatever data platform is currently in use; any perceived benefits from a different product will be swept away by the complexity and hassle of deploying it.
Last word. Talk this over with your boss (or whoever set you this task). Don't rely on the opinions of some random stranger on the interwebs.

Database schema for Partners [closed]

Closed 6 years ago.
We have an application to manage companies, teams, branches, employees, etc., with different tables for each. Now we have a requirement to give our technology partners access to the same system so that they can do the same things we do, but at the same time we need to supervise these partners in our system.
So in terms of the DB schema, what would be the best way to manage them?
1) Duplicate the entire schema for the partners. For that we would have to duplicate around 50-60 tables, and many more in the future as the system grows.
2) Create a flag in each table that tells whether a row belongs to an internal or external entity.
Please suggest if anyone has any experience.
Consider the following points before finalizing any of the approaches.
Do you want a holistic view of the data?
By this I mean: do you want to view the data your partner creates and the data you create in a single report or form? If the answer is yes, then it makes sense to store the data in the same set of tables and differentiate the rows based on some set of columns.
Is your application functionality going to vary significantly?
If the answer is no, then it makes sense to keep the data in the same set of tables. That way, any changes you make to your system automatically reach all users, and you won't have to replicate your code across schemas/databases.
Are you and your partner going to use the same master/reference data?
If the answer is yes, then again it makes sense to use the same set of tables, since you do away with unnecessary redundant data.
Implementation
Rather than creating a flag, I would recommend creating a master table, say user_master. The key of this table should be present in every transaction table. That way, if you want to onboard a second partner down the line, you can add a new entry to user_master and make the necessary modifications to your application code. Your application code should manage security; needless to say, you should also implement as much security as possible at the database level.
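A minimal sketch of the user_master approach, using SQLite; column names such as entity_id and is_partner are assumptions for illustration:

```python
import sqlite3

# Every transaction table carries the owning entity's key, so internal
# and partner rows share one schema and one codebase.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_master (
    entity_id  INTEGER PRIMARY KEY,
    name       TEXT,
    is_partner INTEGER  -- 0 = internal, 1 = partner
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    entity_id   INTEGER REFERENCES user_master(entity_id),
    name        TEXT
);
""")
conn.execute("INSERT INTO user_master VALUES (1, 'internal', 0), (2, 'partner_a', 1)")
conn.execute("INSERT INTO employee VALUES (10, 1, 'alice'), (11, 2, 'bob')")

# Application code filters every query by the caller's entity_id,
# so a partner only ever sees its own rows.
def employees_for(entity_id):
    return [r[0] for r in conn.execute(
        "SELECT name FROM employee WHERE entity_id = ?", (entity_id,))]
```

Onboarding a new partner is then one INSERT into user_master rather than a schema duplication.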
Other Suggestions
To physically separate the data of these entities you can implement either partitioning or sharding, depending on the DB you are using.
Perform thorough regression testing and check that your data is not visible in partner reports or forms. Also check that the partner is not able to update or insert your data.
Since the data in your system will increase significantly, it would make sense to performance-test your reports, forms and programs.
If you are using indexes, you will need to revisit those, since your WHERE conditions will change.
Also, revisit your keys and relationships.
Neither of your suggestions is advisable. You need to follow guidelines like the following to secure your whole system and audit your technology partner as well.
[1] Create a module on the admin side which shows the existing tables as well as tables added in the future.
[2] Create a user for your technology partner and grant permissions on those objects.
[3] Keep an audit-trail table and insert an entry with the user name, IP, etc. into it, so you have complete tracking of the activity carried out by your technology partner.

Database constraints vs Application level validation [closed]

Closed 8 years ago.
While researching the topic, I came across this post: Should you enforce constraints at the database level as well as the application level?
The person who answered the question claimed that we should enforce Database constraint because it is "easier, integrity, flexible".
The reason I brought up this question is my recent maintenance work on a very robust system. Due to a change in a business rule, a data column that used to be CHAR(5) must now accept 8 characters. The table has many dependencies, and the change also affects many other tables, not only in the database but in a few other systems as well, so increasing the size to CHAR(8) is practically impossible.
So my question goes back to database design: wouldn't it be so much easier if you reduced or even eliminated the need for database constraints? If the scenario above happened, all you would have to do is change the front-end or application-level validation to make sure the user enters 8 characters for that field.
In my opinion, we should minimize database constraints to anticipate future changes in the data structure. What are your thoughts?
It's easier to maintain 100 tables than 100,000 lines of code. In general, constraints that are enforced in the application but not in the database have to be replicated across many applications. Sometimes those applications are even written and maintained by different teams.
Keeping all those changes in sync when the requirements change is a nightmare. The ripple effect is even worse than the cases you outline for changing a five character field into an 8 character field. This is how things were done before databases were invented.
Having said that, there are situations where it's better to enforce the constraints in applications than in the database. There are even cases where it's better to enforce a constraint in both places. (Example: non null constraint).
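A small sketch of the "enforce in both places" case, using SQLite; the 8-character code rule mirrors the CHAR(5) to CHAR(8) example from the question, and the table and function names are invented:

```python
import sqlite3

# The database CHECK is the backstop; the application validates first
# so users get a friendlier error before the insert is attempted.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    code TEXT NOT NULL CHECK (length(code) <= 8)
)""")

def save_product(code):
    if code is None or not 1 <= len(code) <= 8:   # application-level check
        raise ValueError("code must be 1-8 characters")
    conn.execute("INSERT INTO product VALUES (?)", (code,))

save_product("ABC12345")  # passes both checks

# Anything that slips past the application is still caught by the database:
try:
    conn.execute("INSERT INTO product VALUES (?)", ("TOOLONGCODE",))
except sqlite3.IntegrityError:
    pass  # CHECK constraint rejected the row
```

Widening the rule later means changing it in both places, which is exactly the maintenance trade-off this thread is debating.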
And very large organizations sometimes maintain a data dictionary, where every data item is cataloged, defined, and described in terms of features, including constraints. In this kind of environment, databases actually acquire their data definitions from the dictionary. And application programs do the same thing, generally at precompile time.
Future proofing such an arrangement is still a challenge.
I agree with you that constraints like field length should be used sparingly; you never know how your business will change, and hardware nowadays is cheap, so it's really not necessary to pin a field to CHAR(8) just to save storage.
But constraints like not-null constraints, duplicate checks, and foreign-key constraints for a header-details table are better kept. They are the goalkeeper of your data integrity.
Database systems provide a number of benefits, one of the most important being (physical) data independence. Data independence can be defined as the immunity of application programs to changes in the way the data is physically stored and accessed; this concept is tightly related to data-model design and normalization rules, where data constraints are fundamental.
Database sharing is a common application integration pattern between independent applications. The trade-off is between spreading data-integrity code across all the applications or keeping it centralized inside the database.
Minimizing database constraints means minimizing your use of a wide range of well-known, proven technologies developed over many years by a wide variety of very smart people.
As a footnote:
"This table has many dependencies and will also affect many other tables not only in the database but also a few other systems"
Besides smelling of redundancy, this at least shows the side effects of the change. Think about having to find those side effects with a code review!
Applications come, applications go, but data remains.

Approaches to finding / controlling illegal data [closed]

Closed 10 years ago.
Search and destroy / capturing illegal data...
The Environment:
I manage a few very "open" databases. The type of access is usually full select/insert/update/delete. The mechanism for accessing the data is usually through linked tables (to SQL-Server) in custom-build MS Access databases.
The Rules
No social security numbers, etc. (e.g., think FERPA/HIPAA).
The Problem
Users enter or hide the illegal data in creative ways (e.g., an SSN in the middle-name field); administrative/disciplinary control is weak and ineffective. The general attitude (even from most of the bosses) is that security is a hassle, and if you find a way around it, good for you. I need a (better) way to find the data after it has been entered.
What I've Tried
Initially, I modified the various custom-built user interfaces folks had (that I was aware of), all the way down to the table structures they were linking to on our database server. The SSNs, for example, no longer had a field of their own, etc. And yet... I continue to find them buried in other data fields.
After a secret audit some folks at my institution ran, where they found this buried data, I wrote some SQL that (literally) checks every character in every field of every table, looking for anything matching an SSN pattern. It takes a long time to run, and the users are finding ways around my pattern definitions.
My Question
Of course, a real solution would entail policy enforcement. That has to be addressed (way) above my head; it is beyond the scope and authority of my position.
Are you aware of, or do you use, any (free or commercial) tools targeted at auditing for FERPA and HIPAA data? (Or, if not those policies specifically, then just data patterns in general?)
I'd like to find something that I can run on a schedule and that stays updated with new pattern definitions.
I would monitor the users, in two ways.
The same users are likely to be entering the same data, so track who is getting around the roadblocks and identify them. Ensure they are documented as fouling the system so they can be disciplined appropriately. Their efforts create risk (monetary and legal, which becomes monetary) for the entire organization.
Look at the queries users issue. If they can successfully search for the information, then it is somehow stored in the repository.
If you are unable to track users, begin instituting passwords.
In the long-run, though, your organization needs to upgrade its users.
In the end you are fighting an impossible battle unless you have support from management. If it's illegal to store an SSN in your DB, then this rule must have explicit support from the top. #Iterator is right, record who is entering this data and document their actions: implement an audit trail.
Search across the audit trail, not the database itself. This should be quicker: you only have one day (or one hour, or ...) of data to search. Record and publish each violation.
You could tighten up some validation. No numeric field, I would guess, needs to be as long as an SSN. No name field needs numbers in it. No address field needs more than 5 or 6 digits in it (how many houses are there on Route 66?). Hmmm, could a phone-number field be used to hold an SSN? The trouble is, you can't stop someone entering acaaabdf etc. (encoding 131126 etc.); there's always a way to defeat your checks.
You'll never achieve perfection, but you can at least catch the accidental offender.
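The kind of sweep described in the question can be sketched as a pattern scan over dumped text fields; the regex here is a deliberately loose assumption (nine digits with optional separators) and will also flag phone-number-like strings, so expect false positives:

```python
import re

# Loose SSN-shaped pattern: 3 digits, 2 digits, 4 digits, with optional
# dash or space separators. Catches accidental offenders, not determined ones.
SSN_LIKE = re.compile(r"\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b")

def find_ssn_like(rows):
    """rows: iterable of (table, column, pk, text) tuples from a data dump.
    Returns the locations whose text matches the SSN-shaped pattern."""
    hits = []
    for table, column, pk, text in rows:
        if text and SSN_LIKE.search(str(text)):
            hits.append((table, column, pk))
    return hits

sample = [
    ("person", "middle_name", 7, "x 123-45-6789 x"),  # SSN hidden in a name field
    ("person", "middle_name", 8, "Marie"),
]
hits = find_ssn_like(sample)
```

Run on a schedule against the audit trail (as suggested above) rather than the full database, this only has to scan the rows changed since the last sweep.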
One other suggestion: you can post a new question asking about machine learning plugins (essentially statistical pattern recognition) for your database of choice (MS Access). By flagging some of the database updates as good/bad, you may be able to leverage an automated tool to find the bad stuff and bring it to your attention.
This is akin to spam filters that find the bad stuff and remove it from your attention. However, to get good answers on this, you may need to provide a bit more details in the question, such as the # of samples you have (if it's not many, then a ML plugin would not be useful), your programming skills (for what's known as feature extraction), and so on.
Despite this suggestion, I believe it's better to target the user behavior than to build a smarter mousetrap.

Should I give a client a SQL Server login with the 'db_owner' role? [closed]

Closed 8 years ago.
One of our clients has requested that we include the 'db_owner' role on the database login that their website uses, so that they can upload a script (an ASP page) to run some database changes. Normally the logins for the databases hosted on our server only include 'db_reader' and 'db_writer'. Is this OK, or should I request that they forward us the SQL script to run on their behalf?
Or am I being too protective? Thanks
I would suggest that you act as a filter between them and anything they might want to do to the database such as uploading and running those scripts. If they get db_owner and hose it all up, it will still probably be your head on the chopping block for letting them have it to begin with.
I think that I would want to have a service level agreement that is acceptable to everyone before I would give out that much control over the database. For example, you could specify that if the client damages their databases in a way that they can't fix, your response would be limited to restoring it to a backup point of their choosing within a certain timeframe. You might also require them to maintain a specific technical contact for database issues who will be the first contact for their developers, etc. The SLA should spell out the various risks, including loss of data, inherit in having this level of capability.
In general, I'm in favor of giving more control, rather than less, if the client is willing to accept the responsibility. As a person who uses such services, I know that it can definitely improve productivity if I'm allowed to make the changes that need to be made without having to jump through hoops. I'm also willing to accept the risks involved, but I clearly know what the implications are.
What kind of scripts are they running?
Rather than providing them direct access, you could provide some kind of interface, as TheTXI suggested. I would be very concerned about giving db_owner access unnecessarily.
That interface might be you, or a team member, or, depending on the type of scripts, some kind of web interface (thus allowing you to at least wrap some validation around the scripts).
But if they run something directly on the system that you don't want, it will most likely be on you (whether that means just managing a restore or something more serious).
You can get more granular with your permissions to let them do only what you want. It depends on how often they want to make changes and how responsible you are for their data. I would not want to grant dbo to someone unless there was a really good reason.
Check whether they are the actual owner of the database, not just in the dbo role. If cross-database ownership chaining is enabled in another database with the same owner, they could turn chaining on in their own database and gain dbo permissions in that other database.
Or, make them sign an agreement that says "any damage to the data or the database schema caused by you or anyone logged in under said DB account is not our fault, and no blame can be put on us", etc. At least if they balls something up, that way you're covered and the client stays happy. Though you might want to give them a separate login for this, so that they can't blame incorrect changes on the website code.
There's a word for DBAs who are overprotective: "Employed"
The rest, not so much.
