I intend to build a system that stores information in a relational database (PostgreSQL) and in a directory (OpenLDAP).
The directory holds the system's users (workers, customers, brokers); each of them has a UUID that uniquely identifies them.
The database holds things that change often (e.g. order details, transactions currently in progress).
Some tables in the database will have a UUID attribute pointing to entities in OpenLDAP.
I chose a directory because I want to leverage its ability to model entities with variable sets of attributes, or entities that inherit or combine attributes from other classes of entities. That flexibility is needed to support a wider range of business cases.
In other words, the directory provides some "object-orientedness", which would have to be reinvented from scratch, had I chosen to use an RDBMS exclusively.
There's a catch that I am not sure how to deal with yet: if a table refers to a UUID that was removed from the directory, the database row effectively points to nothing, and the data becomes incomplete or inconsistent.
My solution is to never remove entities from OpenLDAP, and instead just mark them as "inactive".
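A minimal sketch of how the application side could treat such soft-deleted entries, assuming plain JNDI against OpenLDAP's operational entryUUID attribute; the accountStatus attribute name, the base DN, and the server URL are placeholders of my own:

```java
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import java.util.Hashtable;

public class DirectoryLink {

    // Resolve a UUID stored in a PostgreSQL column to its OpenLDAP entry and
    // report whether that entry is still usable. A missing entry means a
    // dangling reference; an "inactive" entry is soft-deleted.
    public static boolean isActive(String uuid) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:389"); // placeholder URL
        DirContext ctx = new InitialDirContext(env);
        try {
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
            sc.setReturningAttributes(new String[] {"accountStatus"}); // hypothetical attribute
            // entryUUID is OpenLDAP's operational UUID attribute.
            NamingEnumeration<SearchResult> results = ctx.search(
                    "dc=example,dc=com", "(entryUUID={0})", new Object[] {uuid}, sc);
            if (!results.hasMore()) {
                return false; // UUID no longer resolves: dangling reference
            }
            Attribute status = results.next().getAttributes().get("accountStatus");
            return status == null || !"inactive".equals(status.get());
        } finally {
            ctx.close();
        }
    }
}
```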
My questions:
Is it common practice to utilize a directory and a relational database in such a way?
Are there other approaches that can solve the same problem using a single component?
I want to provide controlled access to data that is stored in multiple tables. The access is decided based on certain run-time attributes associated with the user. I am looking for a solution that is extensible, performant, and highly secure.
ILLUSTRATION:
There is a framework-level module that stores authorization/access data for multiple other modules. Then there are any number of modules that manage their own lifecycle objects; e.g. module Test1 has 1000 instances that are created and stored in its base table. As the framework solution I want to protect access to this data, so I created a notion of privileges and stored their mapping to users in my own table. Now, to provide controlled access, my aim is that a user is shown only the objects to which he/she has access.
Approaches in my mind:
We use an Oracle database and currently rely on VPD (Virtual Private Database): we add a policy on each base table of the modules mentioned above, which first evaluates the access of the currently logged-in user from the privileges granted to him, and then that restriction is appended to every query against those base tables (done by the database itself, by default).
PROS: a very efficient and highly secure solution.
CONS: it cannot work if the base tables and our privilege table are in two different schemas. Two different schemas in the same database instance might be workable, but some of my integrator systems might be in separate databases altogether.
Design at the Java layer:
We connect to our DBs through JPA data sources, so I could write a thin layer, essentially a wrapper over EntityManager, that replicates what VPD does for me: first get the access data from my tables, then run a filtered query on the integrator's table, and perhaps cache the data in a caching server as an optimization (see the sketch below).
CONS: I want to use this in a production system, so I want to get it right in the first shot. I'd like to know of any patterns that are already established in the industry.
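A rough sketch of that wrapper idea, assuming JPA entities named Test1Instance and Privilege that stand in for the module base table and the privilege mapping table (both names are mine, not from any real framework):

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.TypedQuery;
import java.util.List;

// Hypothetical entities: a module's base table and the framework's
// user-to-privilege mapping table.
@Entity class Test1Instance { @Id Long id; String payload; }
@Entity class Privilege { @Id Long id; Long userId; Long objectId; }

// Thin wrapper over EntityManager that appends the privilege predicate to
// every read, imitating at the Java layer what VPD does inside the database.
class GuardedQueries {
    private final EntityManager em;

    GuardedQueries(EntityManager em) { this.em = em; }

    List<Test1Instance> findVisible(long userId) {
        TypedQuery<Test1Instance> q = em.createQuery(
            "SELECT t FROM Test1Instance t WHERE t.id IN " +
            "(SELECT p.objectId FROM Privilege p WHERE p.userId = :uid)",
            Test1Instance.class);
        return q.setParameter("uid", userId).getResultList();
    }
}
```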
I do not think your solutions are flexible enough to work well in a complex scenario like yours. If your queries are very simple, then yes, you can design something like an SQL screener at the database or Java level and just pass all your queries through it.
But this is not flexible. As soon as your queries start to grow complex, improving this query screener will become tremendously difficult, since it is not a part of the business logic and cannot know the details of your permission system.
I suggest you implement the access checks in your service layer. The service must know which user it generates or processes data for. Move the query-generation logic into repositories and, for example, have your services call different repository methods depending on the user's permissions, or just customize the repository calls with parameters derived from those permissions.
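To make that concrete, here is a minimal sketch of the service-layer approach; every type and method name below is illustrative rather than taken from any existing framework:

```java
import java.util.List;

// The repository owns the query text; the service owns the permission decision.
interface OrderRepository {
    List<String> findAll();                    // unrestricted query
    List<String> findByDepartment(long dept);  // scoped query
}

interface PermissionService {
    boolean hasGlobalRead(long userId);
}

class OrderService {
    private final OrderRepository repo;
    private final PermissionService perms;

    OrderService(OrderRepository repo, PermissionService perms) {
        this.repo = repo;
        this.perms = perms;
    }

    // The service knows which user it is acting for and picks the repository
    // method (and therefore the generated query) that matches their access.
    List<String> listOrders(long userId, long departmentId) {
        return perms.hasGlobalRead(userId)
                ? repo.findAll()
                : repo.findByDepartment(departmentId);
    }
}
```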
I'm building a web app in GAE that needs to make use of some simple relationships between the datastore entities. Additionally, I want to do what I can from the outset to make import and export easier, and to reduce the development time needed to migrate the application to another platform.
I can see two possible ways of handling relationships between entities in the datastore:
Including the key (or ID) of the related entity as a field in the entity
OR
Creating a unique identifier as an application-defined field of an entity to allow other entities to refer to it
The latter is less integrated with GAE and requires some kind of mechanism to ensure the identifier is in fact unique (which in turn will rely on ancestor queries).
However, the latter may make data portability easier. For example, if entities are created on a local machine they can be uploaded without problems (provided each identifier is unique). By contrast, relying on the GAE-assigned ID will not work, as the ID will not be consistent between the development and deployed environments.
There may be data exportability considerations too that mean an application-defined unique identifier is preferable.
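Here is a sketch of both options using GAE's low-level Java datastore API, just to illustrate the difference; the Author/Book kinds and property names are made up:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import java.util.UUID;

public class RelationOptions {
    public static void main(String[] args) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        Entity author = new Entity("Author");
        ds.put(author); // put() assigns the datastore-generated key

        // Option 1: store the related entity's Key directly.
        // Simple, but the numeric ID differs between dev and production.
        Entity book = new Entity("Book");
        book.setProperty("authorKey", author.getKey());

        // Option 2: store an application-defined identifier instead.
        // Portable across environments, but uniqueness is now my problem.
        String authorUuid = UUID.randomUUID().toString();
        author.setProperty("uuid", authorUuid);
        ds.put(author);
        book.setProperty("authorUuid", authorUuid);
        ds.put(book);
    }
}
```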
What is the best way of doing this?
GAE's datastore just doesn't export well to SQL. There are often situations where data needs to be modeled very differently on GAE to support certain queries, e.g. many-to-many relationships. Denormalizing is also the right way to support some queries on GAE's datastore, and ancestor relationships are something that doesn't exist in the SQL world.
To import and export data, you'll need to write scripts specific to your data models.
If you're planning for compatibility with SQL, use CloudSQL instead of the datastore.
In terms of moving data between dev/production, you've already identified the ways to do it. There's no real "easy" way.
Background
I'm building an online information system that users can access through any computer. I don't want to replicate the DB and code for every university or organization.
I just want a user to hit a domain like www.example.com, sign in, and use the system.
A second user will hit the same domain, www.example.com, sign in, and use it too, but the data each of them sees is different.
Scenario
Suppose one university has 200 employees, a second university has 150, and so on.
Question
Do I need a separate employee table for each university, or is it OK to have a single table with a column that holds a university ID?
I assume the second option is best, but suppose I have 20 universities or organizations and thousands of employees in total.
What is the best approach?
The same question applies to every table; the employee table is just an example.
Thanks
The approach will depend upon the data, usage, and client requirements/restrictions.
Use an integrated model, as suggested by duffymo. This may be appropriate if each organization is part of a larger whole (e.g. all colleges are part of a state college board) and security concerns about cross-query access are minimal². This approach has a minimal amount of separation between organizations, as the same schema¹ and relations are "openly" shared. It leads to a very simple model initially, but it can become very complicated (with compound FKs and the correct usage of them) if you need relations for organization-specific values, because that adds another dimension of data.
Implement multi-tenancy. This can be achieved with implicit filters on the relations (perhaps hidden behind views and stored procedures), different schemas, or other database-specific support; see the sketch after this list. Depending on the implementation this may or may not share schemas or relations, even though all data may reside in the same database. With implicit isolation, some complicated keys or relationships can be hidden or eliminated. Multi-tenancy isolation also generally makes cross-querying harder or impossible.
Silo the databases entirely. Each customer or "organization" has a separate database. This implies separate relations and schema groups. I have found this approach to be relatively simple with automated tooling, but it does require managing multiple databases. Direct cross-querying is impossible, although "linked databases" can be used if there is a need.
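As an illustration of the implicit-filter flavour of multi-tenancy, here is a rough sketch using Hibernate's filter annotations (Hibernate 5-style; in Hibernate 6 the ParamDef type is given as a class). The entity and column names are placeholders:

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

// Every query against this entity is implicitly restricted to one tenant
// once the filter is enabled on the session.
@Entity
@FilterDef(name = "tenantFilter",
           parameters = @ParamDef(name = "tenantId", type = "long"))
@Filter(name = "tenantFilter", condition = "tenant_id = :tenantId")
class Employee {
    @Id Long id;
    @Column(name = "tenant_id")
    Long tenantId; // the university/organization the row belongs to
    String name;
}

class TenantScope {
    // Call once per session, e.g. right after resolving the signed-in user.
    static void enterTenant(Session session, long tenantId) {
        session.enableFilter("tenantFilter").setParameter("tenantId", tenantId);
    }
}
```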
Even though it's not "a single DB": in our case we had the following restrictions, 1) we are never allowed to share/expose data between organizations, and 2) each organization wanted its own local database. Thus our product ended up using a silo approach. Make sure that the approach chosen meets the customer's requirements.
None of these approaches will have any issue with "thousands", "hundreds of thousands", or even "millions" of records as long as the indices and queries are correctly planned. However, switching from one approach to another can violate many assumed constraints, so the decision should be made early on.
¹ In this response I am using "schema" to refer to the security grouping of database objects (e.g. tables, views) and not the database model itself. The actual database model used can be common/shared, as we do even when using separate databases.
² An integrated approach is not necessarily insecure, but it doesn't inherently have some of the built-in isolation of the other designs.
I would normalize it to have UNIVERSITY and EMPLOYEE tables, with a one-to-many relationship between them.
You'll have to take care to make sure that only people associated with a given university can see their data. Role based access will be important.
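In JPA terms, that normalization is just two entities with a foreign key on the employee side; a minimal sketch (names are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import java.util.List;

@Entity
class University {
    @Id @GeneratedValue Long id;
    String name;
    @OneToMany(mappedBy = "university")
    List<Employee> employees; // one university, many employees
}

@Entity
class Employee {
    @Id @GeneratedValue Long id;
    String name;
    @ManyToOne
    University university; // maps to the "university ID" column
}
```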
This is called a multi-tenant architecture. You should read this:
http://msdn.microsoft.com/en-us/library/aa479086.aspx
I would go with Tenant Per Schema, which means copying the structure across different schemas. However, since you should keep all your SQL DDL in source control anyway, this is very easy to script.
It's easy to screw up and "leak" information between tenants if you do it all in the same table.
I have a full multi-tenant database with TenantIDs on all the tenanted tables. This all works well, except now we have a requirement to allow the tenanted databases to "link to" shared data. So, for example, the users can create their own "Bank" records and link accounts to them, but they could ALSO link accounts to "global" Bank records that are shared across all tenants.
I need an elegant solution that keeps referential integrity.
The ways I have come up with so far:
Copy: all shared data is copied to each tenant, perhaps with a "System" flag. Changes to shared data involve huge updates across all tenants. Probably the simplest solution, but I don't like the data duplication.
Special IDs: all links to shared data use special IDs (e.g. negative ID numbers), which indicate that the TenantID is not to be used in the relation. You can't use an FK to enforce this properly, and you certainly cannot reuse IDs within tenants if you have ANY FKs. Only triggers could be used for integrity.
Separate IDs: all tables that can link to shared data have TWO FKs; one uses the TenantID and links to local data, the other does not use the TenantID and links to shared data. A constraint indicates that one or the other is to be used, never both (sketched below). This is probably the most "pure" approach, but it just seems... ugly, though maybe not as ugly as the others.
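A sketch of what the "Separate IDs" option could look like, simplified to single-column keys (the real tenant-side FK would be composite on TenantID plus the local ID) and using a PostgreSQL-style XOR check via Hibernate's @Check annotation; all names are illustrative:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import org.hibernate.annotations.Check;

@Entity class LocalBank  { @Id Long id; Long tenantId; String name; }
@Entity class SharedBank { @Id Long id; String name; }

// Exactly one of the two bank references must be set; the CHECK constraint
// enforces the either/or rule at the database level.
@Entity
@Check(constraints = "(local_bank_id IS NULL) <> (shared_bank_id IS NULL)")
class Account {
    @Id @GeneratedValue Long id;
    Long tenantId;

    @ManyToOne @JoinColumn(name = "local_bank_id")
    LocalBank localBank;   // tenant-scoped FK

    @ManyToOne @JoinColumn(name = "shared_bank_id")
    SharedBank sharedBank; // global FK, no TenantID involved
}
```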
So, my question is in two parts:
Are there any options I haven't considered?
Has anyone had experience with these options and has any feedback on advantages/disadvantages?
A colleague gave me an insight that worked well: instead of thinking about the access as per-tenant, think about it as group access. A tenant can belong to multiple groups, including its own dedicated group. Data then belongs to a group, possibly the tenant's specific group, or maybe a more general one.
So, "My Bank" would belong to the Tenant's group, "Local Bank" would belong to a regional grouping which the tenant has access to, and "Global Bank" would belong to the "Everyone" group.
This keeps integrity and FKs, and it also adds the possibility of having hierarchies of tenants; not something I need at all in my scenario, but a nice little possibility.
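A minimal sketch of that group-based model as JPA entities (all names invented for illustration); note that Bank carries an ordinary FK to its group, so referential integrity is preserved:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import javax.persistence.ManyToOne;
import java.util.Set;

@Entity
class BankGroup {
    @Id @GeneratedValue Long id;
    String name; // e.g. "Everyone", "Region North", "Tenant 42 private"
}

@Entity
class Tenant {
    @Id @GeneratedValue Long id;
    @ManyToMany
    Set<BankGroup> groups; // membership decides which banks are visible
}

@Entity
class Bank {
    @Id @GeneratedValue Long id;
    String name;
    @ManyToOne
    BankGroup group; // "My Bank" -> tenant group, "Global Bank" -> Everyone
}
```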
At Citus, we're building a multi-tenant database using PostgreSQL. For shared information, we keep it in what we call "reference" tables, which are indeed copied across all the nodes. However, we keep these copies in sync and consistent using 2PC, and we can also create FK relationships between reference and non-reference data.
You can find more information here.
We use Active Directory as the user store for our web application. All of our user information, such as first name, last name, email, phone, company, etc, is stored on the user record there.
Now we need to store a couple more pieces of information, but this time there are no pre-existing fields in the schema that we can use. The fields we need are a security question and its answer.
I feel that we should extend the Active Directory schema to include these fields, thus keeping all of our user information in a single data store. However, our IT department holds that Active Directory should never be extended, because doing so is too dangerous and Active Directory isn't intended to be used like this.
Who is right, and what is the philosophy for determining what types of attributes are ok to add to the schema?
The AD schema is meant to be extended. Casual AD admins have always been afraid of extending the schema, especially because the word "permanent" usually followed. But the fact is that "permanent" in LDAP is really meaningless. If the new schema attributes or objects are never utilized, there is no adverse performance effect on the directory, unless you can't bear the thought of looking at unused schema. The only risk of permanent schema is that it conflicts with existing or future schema, and that is rare, especially if you use unique naming such as "JohnsCompanySecurityAttribute1". I worked at a hospital for 9 years, and extending the schema was commonplace there; it is part of the value of AD or ADAM. Your IT guys can always temporarily take a couple of DCs offline during the schema extension if they're still unconvinced. Here is some shameless self-promotion related to heavy AD/ADAM usage in a sensitive clinical environment.
Active Directory initially had really poor schema support: you could not delete anything, and you could not change the schema much.
With the later releases (2008 R2), you get the ability to do much more with the schema. People using other directory services will not have this irrational fear.
Do consider encrypting the data as you store it.
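If you do store the answers in the new attribute, here is a sketch of the encryption step using the standard javax.crypto API (AES-GCM); key management is out of scope, and the key here is generated in place only for illustration:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AttributeCrypto {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey(); // in practice, load from a key store

        byte[] iv = new byte[12]; // 96-bit nonce, recommended size for GCM
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(
                "security answer".getBytes(StandardCharsets.UTF_8));

        // Store the IV alongside the ciphertext, e.g. base64 in the attribute.
        String stored = Base64.getEncoder().encodeToString(iv) + ":"
                      + Base64.getEncoder().encodeToString(ciphertext);
        System.out.println(stored);
    }
}
```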