In SOA, I have been confused about how a service that works with data from different databases, or even different services, can maintain referential integrity while keeping the amount of data duplicated across databases or services to a minimum.
For example, suppose you have a user table in some kind of authentication database and you want to reuse this user information in another database, while still enforcing that the user's record exists in the authentication database. Let's say you want to associate a user's account in the authentication database with a news article in another database. How is that done? How would you do that using something like LDAP?
If the authentication information was contained in the same database, just a different table, then I could see how you could just use foreign keys to create an association between a news article and the user account.
I have been trying to search for answers about this concern but I must be using the wrong phrases because I am not coming up with anything useful.
Some platforms allow foreign key constraints between databases, and some don't. If you need referential integrity between databases, you need to pick a platform that supports it. Normalization--a different issue--never says "move these columns to a different database."
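For example, MySQL treats databases as schemas on a single server, so a foreign key there can reference a table in another database. This is only a hedged sketch; the database, table and column names are assumptions:

CREATE TABLE news.articles (
    id        INT PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    author_id INT NOT NULL,
    FOREIGN KEY (author_id) REFERENCES auth.users (id)  -- cross-database reference, same server
) ENGINE=InnoDB;

SQL Server, by contrast, won't enforce a foreign key constraint across databases, which is why the platform choice matters here.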
In a multi-tenant database, you'd usually choose a different architecture. (You wouldn't normally put authentication for all users in one database, and the stuff they're authenticated to use in another.)
Browse questions tagged "multi-tenant"; that might be a term that will help you. MSDN has a fairly good bird's-eye view.
Related
If you have another application that uses data of an existing database and needs some more, and you don't want to change the schema of the existing database, how do you do that?
Background of my question: We use an IBM product (Connections) to store user profiles. But we have lots of custom requirements (lots of custom fields and logic), so we currently create a few extra tables, views and functions in the backend database of Connections to store the custom data. However, since it is IBM's internal database and we are not supposed to touch it, all our custom tables, views and functions are gone whenever we upgrade Connections.
So we have decided to move our custom things out. But the problem is that we still need to join with the data from Connections. (Or not necessarily a database join, just some other way to integrate the data before presenting it to the users.)
If we create a federated table in our own database, we can create tables and views like we used to. But would it have performance issues? And we would still be heavily dependent on IBM's schema and have to assume they don't change it. Is it a good approach?
What are the other options we could consider?
If we create a federated table in our own database, we can create tables and views like we used to. But would it have performance issues?
Probably. Your application code would have to do joins between the IBM database tables and your database tables.
I'm assuming that Connections uses DB2. If you bring up your own DB2 database, I think you can do SQL joins between two separate DB2 databases.
Either way, this code should reside in a separate data access package made up of data access objects. The rest of your applications would use the data access package.
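If you do try the federated route, the rough shape (a sketch only; federation has to be enabled and a wrapper registered first, and every server, schema and column name below is a placeholder rather than the real Connections schema) is something like:

-- Register the remote Connections database and map credentials
CREATE SERVER conn_server TYPE DB2/UDB VERSION '11.5' WRAPPER DRDA OPTIONS (DBNAME 'PEOPLEDB');
CREATE USER MAPPING FOR USER SERVER conn_server OPTIONS (REMOTE_AUTHID 'feduser', REMOTE_PASSWORD 'changeit');

-- A nickname makes the remote table look local
CREATE NICKNAME custom.conn_profiles FOR conn_server.CONNSCHEMA.PROFILES;

-- The data access layer can then join it against your own tables
SELECT p.user_id, x.extra_field
FROM custom.conn_profiles p
JOIN custom.profile_extensions x ON x.user_id = p.user_id;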
And we would still be heavily dependent on IBM's schema and have to assume they don't change it.
IBM will change their schema, and you have to plan on making corresponding changes to your database and / or application.
What are the other options we could consider?
You could copy the IBM data from their database to your database. You still have to make changes to the copy process when the IBM schema table definitions change.
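A hedged sketch of that copy option, reusing the nickname idea above (or whatever export/import job you prefer); the table and column names are assumptions:

-- Refresh a local copy of the profile data on a schedule
DELETE FROM custom.conn_profiles_copy;
INSERT INTO custom.conn_profiles_copy (user_id, display_name)
SELECT user_id, display_name FROM custom.conn_profiles;

The copy job is then the single place you have to touch when IBM's table definitions change.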
I'm working on a system in which every "company" has its own "users" and its own "bills". Which scenario is better for performance and management: handling all companies in the same database and linking everything to an idempresa, or a separate database for each client?
This is called multi-tenancy architecture, and each customer is a tenant. There are various strategies for dealing with it, and each one brings its own potential problems.
Having a separate database for each tenant is an option that provides data separation and does not require you to add a column to identify each tenant in your tables and queries, but it also has the downside that you have to keep multiple databases up to date.
Having a column in each table of a single database to identify your tenants is also a good strategy, but it brings its own problems, for example when scaling or when managing different features for different customers.
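A minimal sketch of the single-database option, keyed on the idempresa column from the question (the other table and column names are assumptions):

CREATE TABLE empresas (
    idempresa INT PRIMARY KEY,
    nombre    VARCHAR(100) NOT NULL
);

CREATE TABLE bills (
    id        INT PRIMARY KEY,
    idempresa INT NOT NULL REFERENCES empresas (idempresa),  -- every row is tagged with its tenant
    amount    DECIMAL(12,2) NOT NULL
);

-- every query then has to filter on the tenant
SELECT * FROM bills WHERE idempresa = 42;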
You need to study all the available strategies and decide which one is best based on your requirements and pain points.
Putting each tenant's data in a separate database is a straightforward and less painful option at first, but in the long run, when your product gets wildly successful, maintaining all those databases will become a nightmare.
On the other hand, keeping all the tenants' data in a single database can also make your application less scalable and less performant. The better approach is a combination of both; the choice between the two depends entirely on the type, usage and size of your customers.
In certain cases you may need to provision a separate database for a particular module or feature of your application, perhaps for security or to isolate that specific data. I have written an article along these lines; kindly have a look at http://blog.techcello.com/2012/07/database-sharding-scaling-data-in-a-multi-tenant-environment/
I think the scaling problem of multi-tenancy in a single database can be overcome with proper planning up front. Plan to make it easy to migrate a tenant and their data to another database any time they become big enough to justify it.
If you can automate this migration, based on the tenant ID in each table, it should be easy and safe. I'd just make sure I tested it often as development of new features goes on.
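As a hedged sketch, assuming the new database is reachable from the shared one and reusing idempresa from the question as the tenant ID, the per-table migration boils down to:

-- Copy one tenant's rows into their new dedicated database, then remove them from the shared one
INSERT INTO tenant42_db.bills SELECT * FROM shared_db.bills WHERE idempresa = 42;
DELETE FROM shared_db.bills WHERE idempresa = 42;
-- repeat for each table, inside one scripted, well-tested job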
You can mitigate the risks of multi-tenant on one database. You can't really do much when there are multiple databases. You can only be diligent and disciplined to make sure all the databases stay in sync.
Good luck!!!
This is an old thread, but it's worth mentioning this for others with this question who may come across this post in the future.
I've had great success on projects in the past by using PostgreSQL and putting the global tables in the "public" schema (like users, groups, etc.) and the same set of tables for each tenant in their own separate schemas.
For example:
For every tenant that's added to the system, a new schema is created with a standard set of tables for the application:
CREATE SCHEMA tenant1;
CREATE TABLE tenant1.products (...);
CREATE TABLE tenant1.orders (...);
etc.
Each tenant's schema would have its own isolated section within the database with the same set of tables that every other tenant has but filled with their own data.
In the default "public" schema you'd have global "users" and "tenants" tables (along with tables for things like groups and access control lists). Every user belongs only to a single tenant. Upon login, the tenant for that user is looked up and from that point forward any time you connect to the database you set it to use that tenant's schema:
SET search_path TO tenant1, public;
Once the schema search_path is set, all your SQL queries can be written as if you're working with a single database with tables named "products", "orders", and so forth (along with the tables in the "public" schema). So you can just use something like "SELECT * FROM products" and it would get the products belonging to this user's tenant.
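To make that flow concrete, here is a hedged sketch of the global tables and the per-login switch; only the table names come from the description above, the column names are assumptions:

-- Global tables in the default "public" schema
CREATE TABLE public.tenants (
    id          serial PRIMARY KEY,
    schema_name text   NOT NULL           -- e.g. 'tenant1'
);
CREATE TABLE public.users (
    id        serial PRIMARY KEY,
    tenant_id int    NOT NULL REFERENCES public.tenants (id),
    login     text   NOT NULL UNIQUE
);

-- On login, look up the user's tenant...
SELECT t.schema_name
FROM public.users u
JOIN public.tenants t ON t.id = u.tenant_id
WHERE u.login = 'alice';

-- ...then point every subsequent connection at that tenant's schema
SET search_path TO tenant1, public;
SELECT * FROM products;   -- resolves to tenant1.products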
Unfortunately, the term "schema" has come to take on different definitions for different databases. We're using SQL Server 2008 R2, and with that in mind, I have a better understanding thanks to some other questions here with people asking similar questions. However, before I begin making the database, I want to be sure I have this right for my specific scenario.
Basically it's a database for various departments of the company. For example, Administration will manage employees with a bunch of tables related to employee management. Marketing will have a lot of marketing related tables. And tech support will have a lot of tech support related tables. These "groups" will probably never interact with one another, but they're all part of the same project, so I'm putting them all in one database, rather than three separate databases.
Am I correct in understanding that this means I would want three different schemas? So that for Administration, for example, the tables would be named:
Administration.Employees
Administration.VacationDays
Administration.EmployeeAddresses
etc.
and then for tech support, for example:
Techsupport.Clients
Techsupport.OpenIssues
Techsupport.ClosedIssues
etc.
And then am I correct in understanding that the PURPOSE of this, instead of just having every table in the dbo schema, is A) organization, and B) permissions (users with Techsupport schema access shouldn't be able to access the Administration schema, for instance)? The idea I've come to is that, in the SQL Server sense, a schema is just like a virtual folder that groups related tables together.
I think this is right, after all the similar questions that I've read, but I just really want to be sure I'm on the right path before I get too far in and realize I'm doing it completely wrong.
Is throwing everything into the dbo schema and calling it a day discouraged / not intended? Should you use schemas even for small databases that don't necessarily need more than one?
Thanks.
Schemas support two primary purposes:
security container. Permissions can be granted on schemas and such permissions apply to all objects in the schema. Eg. GRANT SELECT ON SCHEMA::Administration TO [foo\bar]; grants the SELECT permission to any table in the schema, including future added tables.
namespace. You can deploy your application in the schema [CptSupermarkt] and know that your app has a very low probability of a name conflict with other applications.
The prevalent use is the first one because most apps are not concerned with side-by-side deployment with other applications and usually assume ownership of an entire database (if not an entire instance). However there are types of applications (eg. audit tools and monitoring apps) that use the namespace aspect of schemas (or, at least, most should use it...).
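A hedged sketch of both points for the Administration / Techsupport scenario above (the role names and column definitions are assumptions):

CREATE SCHEMA Administration;
GO
CREATE SCHEMA Techsupport;
GO
CREATE TABLE Administration.Employees (EmployeeId INT PRIMARY KEY, FullName NVARCHAR(100) NOT NULL);
CREATE TABLE Techsupport.Clients (ClientId INT PRIMARY KEY, ClientName NVARCHAR(100) NOT NULL);

-- security container: grant once per schema and future tables are covered too
CREATE ROLE AdministrationUsers;
CREATE ROLE TechsupportUsers;
GRANT SELECT, INSERT, UPDATE ON SCHEMA::Administration TO AdministrationUsers;
GRANT SELECT, INSERT, UPDATE ON SCHEMA::Techsupport TO TechsupportUsers;
DENY SELECT ON SCHEMA::Administration TO TechsupportUsers;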
I use schemas in my databases, but other than the security benefits and the fact that my OCD is happy, I don't really know why it is considered good practice to use them. Besides the more granular security, are there other reasons for using schemas when building a database?
The primary purpose of schemas is indeed security. A secondary benefit is that they act like namespaces for your application's tables and objects, allowing conflict-free side-by-side deployment with other applications that may use the same names for their objects.
Schemas arose because the original SQL Server didn't have them, which meant that every single object in the database had to be owned by someone. If Jill from accounting left the company, you had to manually reassign all her objects to someone else, and so on. Now schemas own objects and users belong to schemas, which makes all the DB admins very happy people :).
Basically, when users leave you can remove their privileges by removing them from schemas and deleting the user. Adding privileges to a user is now as simple as adding the user to the schema.
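For instance (a sketch; the schema name is an assumption), instead of reassigning every object Jill owned, you reassign the one schema and drop the user:

-- Move ownership of the schema away from the departing user, then remove the user
ALTER AUTHORIZATION ON SCHEMA::Accounting TO dbo;
DROP USER jill;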
Can anyone tell me if there are RDBMSs that allow me to create a separate database for every user so that there is full separation of users' data?
Are there any?
I know I can add a UID column to every table, but that solution has its own problems (for example, per-user database schema changes are impossible).
Don't MySQL, PostgreSQL, Oracle and so on allow you to do that? There are GRANT statements to control ACLs.
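In MySQL, for example, it's one database per user plus plain GRANT statements (a sketch; the names and host are assumptions):

CREATE DATABASE alice_db;
CREATE USER 'alice'@'%' IDENTIFIED BY 'changeit';
GRANT ALL PRIVILEGES ON alice_db.* TO 'alice'@'%';   -- alice sees only her own database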
I would imagine most (all?) databases allow you to create a user to which you could then grant database-level access. SQL Server certainly does.
Another simple solution, if you don't need the databases to be massive or scalable (say, for teaching SQL to students or letting many testers work against their own database to isolate problems), is SQLite. That way the whole database is a single file per user, and each user cannot possibly screw up or interfere with other users.
They can even mail you the databases, or install them anywhere, say at home and at work, with no internet connection required.
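A minimal sketch of the per-user-file idea (the file and table names are assumptions):

-- $ sqlite3 alice.db     -- each student gets their own file
CREATE TABLE exercises (id INTEGER PRIMARY KEY, answer TEXT);
INSERT INTO exercises (answer) VALUES ('isolated, single-file, per-user data');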
MS SQL Server 2005 is one that can be used for multiple users. An instance can be created for each; if you go that route, set up the privileges and use one user per instance.
Oracle lets you create a separate schema (a set of tables, indexes, functions, etc.) for individual users. This is good if they need to have different tables. Creating a new user can be a very expensive operation, though, since you would be making new tables, and updating is a nightmare as well, as you need to update the model for each user.
If you want everyone to have the same set of tables but only be able to view their own records, you could use the Fine-Grained Access Control or Virtual Private Database features to do this.
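As a hedged sketch of both options (all the names here are assumptions):

-- Option 1: one schema (= user) per end user
CREATE USER tenant1 IDENTIFIED BY changeit QUOTA UNLIMITED ON users;
GRANT CREATE SESSION, CREATE TABLE TO tenant1;

-- Option 2: shared tables plus a Virtual Private Database policy; a policy function
-- registered through DBMS_RLS.ADD_POLICY appends a predicate such as
--   owner_id = SYS_CONTEXT('USERENV', 'SESSION_USER')
-- to every query, so each user only sees their own rows.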