Is there existing terminology / pattern to cover this scenario? - database

In our SaaS system we're dividing users into separate "pools" according to the customer that originally "owns" the user. We're using "email addresses plus ID of owning organisation" to identify users, rather than just email addresses - so duplicate email addresses can exist between customers (don't ask). Users arrive at the site on various subdomains, and we use these subdomains to identify the "user pool" we're authenticating the user against.
My question: is there any established name for this pattern or something similar?
Cheers!

In database terminology, when uniquely identifying a row using more than one column, this is called a composite primary key (aka compound key).
The scenario you describe is used commonly when a single database is used for multiple customers - one form of multitenancy.
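As a minimal sketch of the composite key idea, the following uses Python's built-in sqlite3 module. The table and column names (users, org_id, email) are assumptions for illustration, not taken from the question. The same email can exist under two organisations, but not twice within one.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        org_id INTEGER NOT NULL,
        email  TEXT    NOT NULL,
        name   TEXT,
        PRIMARY KEY (org_id, email)  -- composite (compound) primary key
    )
    """
)

# Same email, different owning organisation: both rows are accepted.
conn.execute("INSERT INTO users VALUES (1, 'sam@example.com', 'Sam at org 1')")
conn.execute("INSERT INTO users VALUES (2, 'sam@example.com', 'Sam at org 2')")


def try_duplicate() -> bool:
    """Try to insert a second row with an existing (org_id, email) pair."""
    try:
        conn.execute("INSERT INTO users VALUES (1, 'sam@example.com', 'dup')")
    except sqlite3.IntegrityError:
        return False  # rejected by the composite key
    return True


duplicate_accepted = try_duplicate()
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```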

"home-realm-discovery" is a common term for identifying what tenant a user belongs to in a multi-tenant SaaS application. It's most often talked about in the context of Federated Identity but applies in your case too. Using a sub-domain like you're doing is a common practice.
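A rough sketch of subdomain-based tenant discovery, assuming a hypothetical base domain example.com and exactly one subdomain label per tenant. The function names are made up for illustration; a real deployment would usually look the label up in a tenant registry instead of trusting it directly.

```python
from typing import Optional, Tuple

BASE_DOMAIN = "example.com"  # hypothetical base domain


def tenant_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the tenant label from a Host header value, or None if absent."""
    # Drop an optional port, normalise case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Accept exactly one label in front of the base domain.
    if not label or "." in label:
        return None
    return label


def user_key(host: str, email: str) -> Optional[Tuple[str, str]]:
    """Build the (tenant, email) pair used to look the user up."""
    tenant = tenant_from_host(host)
    if tenant is None:
        return None
    return (tenant, email.strip().lower())
```

With this, the same email address arriving on acme.example.com and on another tenant's subdomain resolves to two different lookup keys.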

I am not aware of any specific name for this scenario, but in general, this would fall under the phrase "multi-tenant" / "multi-tenancy". Many SaaS implementations do customer (or rather tenant) based branding already on the login screen, which would mean that they'd have to identify the user based on the URL / subdomain, or at least in some way other than the email address used.
Routing to different servers based on the subdomain is also a common way to achieve tiered service levels for SaaS implementations.
I'm not sure I've answered the question, but I hope the general info helps!

Related

User identity claims where the combination of two identifiers is unique

We are using IdentityServer4 for managing user identities and logins. In the business domain there are companies that are divided into organizations. Companies are distinguished by unique identifiers, while organization identifiers might clash between different companies, which makes only the combination of company ID and organization ID unique.
A normal user typically has access only to some organizations in one company, but there is a need for superusers that have access to organizations in multiple companies. The claims in IdentityServer were originally designed only for one company, so little thought has gone into how multiple companies should fit into this.
For a normal user it would be totally fine to have one company claim and one or more organization claims. But this does not really work for superusers that need to access many organizations in many companies.
How should this kind of combination identifier be modelled? The only way we can come up with is combining the company ID and organization ID into one claim, but that means that we must split the claim everywhere we need to have the identifiers separated, which feels rather cumbersome and error prone. So is there a better or more proper way, or is this perhaps a problem that should be fixed outside of Identity server?
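To illustrate the combined-claim option described above, here is a small sketch in Python (not IdentityServer or C# code). The claim format "company:organization", the separator, and all names are assumptions. The point is only that the splitting can live in one place, so the rest of the code works with typed pairs rather than raw strings.

```python
from typing import Iterable, NamedTuple, Set

SEPARATOR = ":"  # hypothetical separator; must not occur inside either identifier


class OrgAccess(NamedTuple):
    company_id: str
    organization_id: str


def encode_claim(access: OrgAccess) -> str:
    """Combine both identifiers into a single claim value."""
    if SEPARATOR in access.company_id or SEPARATOR in access.organization_id:
        raise ValueError("identifier contains the separator")
    return f"{access.company_id}{SEPARATOR}{access.organization_id}"


def parse_claim(value: str) -> OrgAccess:
    """Split one claim value back into its two identifiers."""
    company_id, sep, organization_id = value.partition(SEPARATOR)
    if not sep or not company_id or not organization_id or SEPARATOR in organization_id:
        raise ValueError(f"malformed claim value: {value!r}")
    return OrgAccess(company_id, organization_id)


def parse_claims(values: Iterable[str]) -> Set[OrgAccess]:
    """Parse all combined claims of a user once, at the edge of the application."""
    return {parse_claim(v) for v in values}


# A superuser with access to organization "o7" in two different companies.
superuser_access = parse_claims(["c1:o7", "c2:o7"])
```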

In general with Active Directory, what do most companies use as unique identifier for people?

I am trying to build a database that stores Active-Directory entries for users/employees.
Is it safe to assume to query on: (objectClass=person)
What attribute should I store as a unique identifier that isn't the DN? e.g. should I use mail or uid
Also when an employee gets de-activated is there a new attribute that gets added or are they simply removed entirely from AD?
Your question is somewhat opinion-based, but I'll address it in terms of the general options available in AD and the usual practices followed.
Is it safe to assume to query on: (objectClass=person)?
All users you create fall under (objectClass=person). But if you create a generic user for file-share access on a system (through ADUC (dsa.msc), PowerShell, C#, etc.) that is not an employee, it would still match your search filter because it is also of the person class. I can think of many other scenarios where creating generic users (which again have the person objectClass) is unavoidable, at least in mid-sized companies and above.
Hence, in such cases it is better to follow a naming convention in your environment to avoid confusion. For example, set the UPN/sAMAccountName of non-employee users to start with genXXXX, and you can then easily separate them from employee users in your searches.
What attribute should I store as a unique identifier that isn't the DN? e.g. should I use mail or uid?
There are unique identifiers already available in AD like objectGUID and objectSid. In a domain, the sAMAccountName/UPN values are also unique. But, you cannot rely on that for forest-level search.
objectSid for a user can change when the user is migrated to another domain, but objectGUID never changes. You can read more about SIDs versus GUIDs here.
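If you store objectGUID as your key, note that LDAP clients typically return it as 16 raw bytes in the Windows (little-endian) GUID layout. A small sketch of converting it to the familiar string form with Python's standard uuid module follows; the sample value is made up, and how you obtain the raw bytes depends on your LDAP library.

```python
import uuid


def object_guid_to_str(raw: bytes) -> str:
    """Convert a 16-byte objectGUID value into its canonical string form."""
    if len(raw) != 16:
        raise ValueError("objectGUID must be exactly 16 bytes")
    # bytes_le interprets the first three fields as little-endian,
    # which matches the Windows GUID byte layout.
    return str(uuid.UUID(bytes_le=raw))


# Made-up sample value, round-tripped through the little-endian byte layout.
sample = uuid.UUID("3f2504e0-4f89-11d3-9a0c-0305e82c3301")
raw_guid = sample.bytes_le
guid_text = object_guid_to_str(raw_guid)
```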
Also when an employee gets de-activated is there a new attribute that gets added or are they simply removed entirely from AD?
AD has no automatic trigger for this. The lastLogonTimestamp attribute helps you track when a user or computer account last logged on to the domain (it is not a live value, but a recent one, provided it keeps updating properly).
Someone has to manually disable or delete the account when an employee leaves the organisation. Companies usually set up processes for this, where access management solutions are linked with AD modules, handle the entry and exit of users, and perform the relevant actions in AD.
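When an account is disabled rather than deleted, no new attribute appears; instead the ACCOUNTDISABLE bit (0x0002) is set in the existing userAccountControl attribute. The sketch below shows how a stored value could be checked. The helper name is made up, and the filter string is only an example of the bitwise-AND matching rule.

```python
ACCOUNTDISABLE = 0x0002  # userAccountControl flag for a disabled account

# Example LDAP filter for disabled user accounts, using the
# bitwise-AND matching rule (OID 1.2.840.113556.1.4.803).
DISABLED_USERS_FILTER = (
    "(&(objectCategory=person)(objectClass=user)"
    "(userAccountControl:1.2.840.113556.1.4.803:=2))"
)


def is_disabled(user_account_control: int) -> bool:
    """Return True if the ACCOUNTDISABLE bit is set in userAccountControl."""
    return bool(user_account_control & ACCOUNTDISABLE)
```

For example, 512 is a normal enabled account and 514 is the same account once disabled.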
Hope it gives a rough idea of management for the queries raised by you.

Multiple microservices and database associations

I have a question concerning microservices and databases. I am developing an application: a user sees a list of countries and can click through it so he can see a list of attractions of that country. I created a country-service, auth-service (contains users for oAuth2) and an attraction-service. Each service has its own database. I mapped the association between an attraction and its country by the iso code (for example: BE = belgium): /api/attraction/be.
The approach above seems to work but I am a bit stuck with the following: a user must be able to add an attraction to his/her list of favorites, but I do not see how that's possible since I have so many different databases.
Do I create a favorite-service, do I pass id's (I don't think I should do this), what kind of business key can I create, how do I associate the data in a correct way...?
Thanks in advance!
From the information you have provided, using a standalone favourite service sounds like the right option.
A second, simpler and quicker option might be to handle this in your user service, which looks after the persistence of your users' data, since favourites are exclusive to a user entity.
As for IDs, I haven't seen many reasons why passing them would be a bad idea. Your individual services are going to need to store some identifying value for related data, and the main issue here, I feel, is just keeping this ID field consistent across your different services. What you choose just needs to be reliable and predictable to keep things easy and simple as your system grows.
If you are using RESTful HTTP, you already have a persistent, bookmarkable identification of resources, URLs (URIs, IRIs if you want to be pedantic). Those are the IDs that you can use to refer to some entity in another microservice.
There is no need to introduce another layer of IDs, be it country codes, or database ids. Those things are internal to your microservice anyway and should be transparent for all clients, including other microservices.
To be clear, I'm saying, you can store the URI to the country in the attractions service. That URI should not change anyway (although you might want to prepare to change it if you receive permanent redirects), and you have to recall that URI anyway, to be able to include it in the attraction representation.
You don't really need any "business key" for favorites either, other than the URI of the attraction. You can bookmark that URI, just as you would in a browser.
I would imagine if there is an auth-service, there are URIs also for identifying individual users. So in a "favorites" service, you could simply link the User URI with Attraction URIs.
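As a rough sketch of that idea, a favorites service could store nothing but pairs of URIs. The class below keeps them in memory, and the URLs in the usage lines are hypothetical; a real service would persist the links and expose them over HTTP.

```python
from collections import defaultdict
from typing import Dict, List


class FavoritesStore:
    """Links a user URI to attraction URIs; no database ids are involved."""

    def __init__(self) -> None:
        self._by_user: Dict[str, List[str]] = defaultdict(list)

    def add(self, user_uri: str, attraction_uri: str) -> None:
        favorites = self._by_user[user_uri]
        if attraction_uri not in favorites:  # keep the list free of duplicates
            favorites.append(attraction_uri)

    def remove(self, user_uri: str, attraction_uri: str) -> None:
        favorites = self._by_user.get(user_uri, [])
        if attraction_uri in favorites:
            favorites.remove(attraction_uri)

    def favorites_of(self, user_uri: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_user.get(user_uri, []))


# Hypothetical URIs, for illustration only.
USER = "https://auth.example.com/api/users/42"
ATOMIUM = "https://attractions.example.com/api/attraction/be/atomium"

store = FavoritesStore()
store.add(USER, ATOMIUM)
store.add(USER, ATOMIUM)  # adding the same favorite twice has no effect
```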

Multi Tenant Database with some Shared Data

I have a full multi-tenant database with TenantID's on all the tenanted databases. This all works well, except now we have a requirement to allow the tenanted databases to "link to" shared data. So, for example, the users can create their own "Bank" records and link accounts to them, but they could ALSO link accounts to "global" Bank records that are shared across all tenants.
I need an elegant solution which keeps referential integrity.
The ways I have come up with so far:
Copy: all shared data is copied to each tenant, perhaps with a "System" flag. Changes to shared data involve huge updates across all tenants. Probably the simplest solution, but I don't like the data duplication
Special ID's: all links to shared data use special ID's (e.g. negative ID numbers). These indicate that the TenantID is not to be used in the relation. You can't use an FK to enforce this properly, and certainly cannot reuse ID's within tenants if you have ANY FK. Only triggers could be used for integrity.
Separate ID's: all tables which can link to shared data have TWO FK's; one uses the TenantID and links to local data, the other does not use TenantID and links to shared data. A constraint indicates that one or the other is to be used, not both. This is probably the most "pure" approach, but it just seems...ugly, but maybe not as ugly as the others.
So, my question is in two parts:
Are there any options I haven't considered?
Has anyone had experience with these options and has any feedback on advantages/disadvantages?
A colleague gave me an insight that worked well. Instead of thinking about access as per-tenant, think about it as group access. A tenant can belong to multiple groups, including its own specific group. Data then belongs to a group, possibly the tenant's specific group, or maybe a more general one.
So, "My Bank" would belong to the Tenant's group, "Local Bank" would belong to a regional grouping which the tenant has access to, and "Global Bank" would belong to the "Everyone" group.
This keeps integrity, FK's and also adds in the possibility of having hierarchies of tenants, not something I need at all in my scenario, but a nice little possibility.
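A minimal sketch of the group approach, using sqlite3 with made-up table names (access_group, tenant_membership, bank, account). Every bank row belongs to exactly one group, so a plain foreign key from account to bank works for both private and shared banks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE access_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE tenant_membership (
        tenant_id INTEGER NOT NULL,
        group_id  INTEGER NOT NULL REFERENCES access_group(id),
        PRIMARY KEY (tenant_id, group_id)
    );
    CREATE TABLE bank (
        id       INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL REFERENCES access_group(id),
        name     TEXT NOT NULL
    );
    CREATE TABLE account (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        bank_id   INTEGER NOT NULL REFERENCES bank(id),
        name      TEXT NOT NULL
    );
    INSERT INTO access_group VALUES (1, 'Everyone'), (2, 'Tenant 10'), (3, 'Tenant 20');
    INSERT INTO tenant_membership VALUES (10, 1), (10, 2), (20, 1), (20, 3);
    INSERT INTO bank VALUES (100, 1, 'Global Bank'), (101, 2, 'My Bank'), (102, 3, 'Other Bank');
    INSERT INTO account VALUES (1, 10, 100, 'Savings'), (2, 10, 101, 'Checking');
    """
)


def visible_banks(tenant_id: int) -> list:
    """Names of all banks in any group the tenant belongs to."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM bank AS b
        JOIN tenant_membership AS m ON m.group_id = b.group_id
        WHERE m.tenant_id = ?
        ORDER BY b.name
        """,
        (tenant_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Note that the foreign key here only guarantees that the bank exists. Checking that the tenant is actually a member of the bank's group is still done in queries like visible_banks, or would need an additional constraint.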
At Citus, we're building a multi-tenant database using PostgreSQL. For shared information, we keep it in what we call "reference" tables, which are indeed copied across all the nodes. However, we keep this in-sync and consistent using 2PC, and can also create FK relationships between reference and non-reference data.
You can find more information here.

Exposing database IDs - security risk?

I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.
Any opinions or links on why it's a risk, or why it isn't?
EDIT: of course the access is scoped, e.g. if you can't see resource foo?id=123 you'll get an error page. Otherwise the URL itself should be secret.
EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
EDIT (months later): my current preferred practice for this is to use UUIDs for IDs and expose them. If I'm using sequential numbers (usually for performance on some DBs) as IDs, I like generating a UUID token for each entry as an alternate key and exposing that.
There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.
The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.
Here are some good rules to follow:
Use role-based security to control access to an operation. How this is done depends on the platform and framework you've chosen, but many support a declarative security model that will automatically redirect browsers to an authentication step when an action requires some authority.
Use programmatic security to control access to an object. This is harder to do at a framework level. More often, it is something you have to write into your code and is therefore more error prone. This check goes beyond role-based checking by ensuring not only that the user has authority for the operation, but also has necessary rights on the specific object being modified. In a role-based system, it's easy to check that only managers can give raises, but beyond that, you need to make sure that the employee belongs to the particular manager's department.
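The manager example above can be sketched as two separate checks: one on the role, one on the specific object. All class and function names here are hypothetical and not tied to any framework.

```python
from dataclasses import dataclass
from typing import FrozenSet


class Forbidden(Exception):
    """Raised when the actor may not perform the operation."""


@dataclass(frozen=True)
class User:
    id: str
    roles: FrozenSet[str]
    department: str


@dataclass(frozen=True)
class Employee:
    id: str
    department: str


def authorize_raise(actor: User, employee: Employee) -> None:
    # Role-based check: is the operation allowed for this kind of user at all?
    if "manager" not in actor.roles:
        raise Forbidden("only managers can give raises")
    # Object-level check: does the actor have rights on this particular employee?
    if actor.department != employee.department:
        raise Forbidden("employee is not in the manager's department")


def can_give_raise(actor: User, employee: Employee) -> bool:
    try:
        authorize_raise(actor, employee)
    except Forbidden:
        return False
    return True
```

Knowing or guessing an employee id gains an attacker nothing here, because the second check is evaluated against the object itself.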
There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.
Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:
Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier. This is the only one that should ever be exposed to the client. Using random UUIDs for these is a practical solution for assigning these surrogate keys, even though they aren't cryptographically secure.
One place where cryptographically unpredictable identifiers is a necessity, however, is in session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
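In Python, the distinction between the two kinds of identifier might look like the sketch below: a random UUID for a publicly exposed surrogate key, and the secrets module (which draws from the operating system's cryptographic RNG) for anything that authenticates a request. The function names are made up.

```python
import secrets
import uuid


def new_public_id() -> str:
    """Unpredictable surrogate key to expose in URLs; not a secret in itself."""
    return str(uuid.uuid4())


def new_session_token() -> str:
    """Session or authentication token generated from a cryptographic RNG."""
    return secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
```

Access to the object behind a public id still has to be checked; only the session token is meant to act as a credential.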
While not a data security risk this is absolutely a business intelligence security risk as it exposes both data size and velocity. I've seen businesses get harmed by this and have written about this anti-pattern in depth. Unless you're just building an experiment and not a business I'd highly suggest keeping your private ids out of public eye. https://medium.com/lightrail/prevent-business-intelligence-leaks-by-using-uuids-instead-of-database-ids-on-urls-and-in-apis-17f15669fd2e
It depends on what the IDs stand for.
Consider a site that, for competitive reasons, doesn't want to make public how many members it has, but by using sequential IDs reveals it anyway in the URL: http://some.domain.name/user?id=3933
On the other hand, if they used the login name of the user instead: http://some.domain.name/user?id=some they haven't disclosed anything the user didn't already know.
The general thought goes along these lines: "Disclose as little information about the inner workings of your app to anyone."
Exposing the database ID counts as disclosing some information.
The reason is that attackers can use any information about your app's inner workings to attack you, or a user can change the URL to get into a database he/she isn't supposed to see.
We use GUIDs for database ids. Leaking them is a lot less dangerous.
If you are using integer IDs in your db, you may make it easy for users to see data they shouldn't by changing query string variables.
E.g. a user could easily change the id parameter in this query string and see/modify data they shouldn't: http://someurl?id=1
When you send database id's to your client you are forced to check security in both cases. If you keep the id's in your web session you can choose if you want/need to do it, meaning potentially less processing.
You are constantly trying to delegate things to your access control ;) This may be the case in your application, but I have never seen such a consistent back-end system in my entire career. Most of them have security models that were designed for non-web usage, some have had additional roles added after the fact, and some of these have been bolted on outside of the core security model (because the role was added in a different operational context, say before the web).
So we use synthetic session local id's because it hides as much as we can get away with.
There is also the issue of non-integer key fields, which may be the case for enumerated values and similar. You can try to sanitize that data, but chances are you'll end up like Little Bobby Tables.
My suggestion is to implement two stages of security.
"Security through obscurity": You can have an integer Id as the primary key and a GUID column, Gid, as a surrogate key in your tables. The integer Id column is used for relations and other back-end and internal purposes (and even for select-list keys in web apps, to avoid unnecessary mapping between Gid and Id while loading and saving), while Gid is used in REST URLs, i.e. for GET, POST, PUT, DELETE, etc., so that one cannot guess another record's id. This gives a first level of protection against guess-based attacks (i.e. number-series guessing).
Access control at the server side: This is the most important part, and you have various ways to validate the request based on the roles and rights defined in the application. It's up to you to decide.
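The first stage could look roughly like this with sqlite3. The invoice table and its columns are made up for illustration: the integer id never leaves the database layer, and lookups coming from the web layer go through gid only. The second stage, access control, still has to run after the lookup and is not shown here.

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal, used for joins
        gid         TEXT    NOT NULL UNIQUE,            -- exposed in URLs
        total_cents INTEGER NOT NULL
    )
    """
)


def create_invoice(total_cents: int) -> str:
    """Insert a row and return only its public Gid."""
    gid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO invoice (gid, total_cents) VALUES (?, ?)", (gid, total_cents)
    )
    return gid


def get_invoice_by_gid(gid: str) -> Optional[dict]:
    """Look a row up by Gid; the integer id is never part of the result."""
    row = conn.execute(
        "SELECT gid, total_cents FROM invoice WHERE gid = ?", (gid,)
    ).fetchone()
    if row is None:
        return None
    return {"gid": row[0], "total_cents": row[1]}
```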
From the perspective of code design, a database ID should be considered a private implementation detail of the persistence technology to keep track of a row. If possible, you should be designing your application with absolutely no reference to this ID in any way. Instead, you should be thinking about how entities are identified in general. Is a person identified with their social security number? Is a person identified with their email? If so, your account model should only ever have a reference to those attributes. If there is no real way to identify a user with such a field, then you should be generating a UUID before hitting the DB.
Doing so has a lot of advantages, as it allows you to divorce your domain models from persistence technologies. That means you can substitute database technologies without worrying about primary key compatibility. Leaking your primary key into your data model is not necessarily a security issue if you write the appropriate authorization code, but it's indicative of less-than-optimal code design.

Resources