Using an LDAP server as a storage base, how practical is it?

I want to learn how practical it is to use an LDAP server (say, AD) as a storage base. To be more clear: how much sense does it make to use an LDAP server instead of an RDBMS to store data?
I can guess that most of you might just say "it doesn't", but there might be some reasons that make it meaningful (especially business-wise).
A few points first:
Each table becomes a container entry and each row becomes a new entry as its child. Row entries contain attributes for the columns, and you represent your data this way. (This seems like the most sensible representation to me; suggestions are welcome.)
So storing data the way a DB server does is possible, but the lack of FK support (I'm not sure about PK) is an issue. On the other hand, LDAP does support indexing on attributes (which correspond to columns), though I'm not sure how efficient it is. So the consistency of the data becomes the responsibility of the application layer.
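To make the mapping concrete, here is a rough sketch using the Python ldap3 library (the container layout and the customerRecord object class are my own inventions for illustration):

    from ldap3 import Server, Connection

    # Hypothetical server and credentials.
    conn = Connection(Server("ldap.example.com"),
                      user="cn=admin,dc=example,dc=com",
                      password="secret", auto_bind=True)

    # "CREATE TABLE customers" becomes an organizationalUnit container.
    conn.add("ou=customers,dc=example,dc=com", "organizationalUnit")

    # "INSERT INTO customers ..." becomes a child entry whose attributes
    # play the role of columns. customerRecord would have to be defined
    # as a custom object class in the directory schema first.
    conn.add("cn=cust-1001,ou=customers,dc=example,dc=com", "customerRecord",
             {"customerId": "1001", "displayName": "ACME Corp"})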
Why would somebody ever do this?
The data the application uses/stores closely matches the data already in AD (users, machines, department info, etc.). (Some customization of the existing entity schema is still required, though, and new schema definitions are needed for data that isn't closely related.)
(I think the strongest reason is this business-related one.) Most mid-sized companies have very well configured AD servers (replicated, backed up, etc.), but they don't have a comparable DB setup (comment on this as much as you want). When you sell software that requires a DB setup to these companies, they have to manage that DB themselves; but if you can say "you don't need DB setup and management; you can just use your existing AD", it sounds appealing.
Obviously there are many disadvantages to giving up a DB; feel free to mention them, but let's assume they are acceptable. (I can mention more if the question is not clear enough.)

LDAP is a terrible tool for maintaining most business data.
Think about a typical one-to-many relationship - say, customer and orders. One customer has many orders.
There is no good way to represent this data in an LDAP directory.
You could try having a mock "foreign key" by giving every entry of a given object class a "foreign key" attribute, but your referential integrity just went out the window. Cascading deletes are impossible.
You could try having a "customer" object that has "order" children. However, you've just introduced a specific hierarchy, and you're now tied to it.
And that's the simplest use case. Once you start getting into more complex relationships, you're basically re-inventing an RDBMS in a system explicitly designed for a different purpose. The clue's in the name: directory.
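To make that concrete, here is roughly what the mock "foreign key" lookup would look like with the Python ldap3 library (the attribute names are invented); note that nothing stops an order from referencing a customer that was deleted yesterday:

    from ldap3 import Server, Connection

    conn = Connection(Server("ldap.example.com"), auto_bind=True)

    # Find all "order" entries pointing at customer 1001 via an invented
    # customerId attribute. The server enforces nothing here: deleting
    # the customer entry leaves these orders silently orphaned.
    conn.search("ou=orders,dc=example,dc=com", "(customerId=1001)",
                attributes=["orderId", "orderTotal"])
    for entry in conn.entries:
        print(entry.orderId, entry.orderTotal)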
If you're storing a phonebook, then sure, use LDAP. For anything else, use a real database.

For relatively small, flexible data sets I think an LDAP solution is workable. However, an RDBMS provides a number of significant advantages:
Backup and recovery: just about any database will provide ACID properties, and RDBMS backups are generally easy to script and offer several options (e.g. full vs. differential). I just don't know with LDAP, but I imagine these qualities are not as widespread.
Reporting: AFAIK LDAP doesn't offer an easy way to JOIN values, much less do things like calculate summations. So you would put a lot of effort into application code to reproduce those behaviors when you do need reporting. And what application doesn't, ultimately?
Indexing: LDAP solutions do appear to have indexing, but again it seems hit or miss, whereas seemingly every database out there has put some real effort into getting this right.
I think any serious business system's storage should be backed up in the same fashion you believe LDAP is in most environments. If what you're really after is flexibility in representing hierarchy and the ability to define dynamic schemas, I'd suggest looking into NoSQL solutions or the Java Content Repository.

LDAP is very useful for storing that kind of information, and if you want to, you can use it. An RDBMS is just more comfortable to work with through ORM systems; your persistence logic with LDAP will be far more complex.
It's also worth mentioning that this is not a standard approach, so the people who will support the project will spend more time on analysis.
I've used this approach for fun - I generate a phonebook from Active Directory - but I don't think it's a good idea to use LDAP as a store for business applications.

In short: Use the right tool for the right job.
When people see LDAP, you have already set an expectation for your system. Don't forget that the L stands for Lightweight: LDAP was designed for accessing directories over a network.
With a "directory database" you can build a certain type of application: if you can map your data to a tree-like data structure, it will work. I certainly would not want to stream videos from LDAP! You could probably hack something together, but I would prefer a streaming server.
There might be some hidden gotchas down the line when you use a tool for something it wasn't designed to do. So the downside is that you'll have to test things that would otherwise have been a given.
It's also not just a technical concern. Your operational support team might "frown" on your application, as they will have certain expectations/preconceptions based on its architectural nature. Imagine their surprise if you hand them a CRM system (website + files + POP email, etc.) with an LDAP server as its database to maintain.

If I were in your position, I would steer towards one of the NoSQL DB solutions rather than trying to use LDAP. LDAP is fine for things like storing user and employee information, but it is terrible to interact with when you need to make changes. A NoSQL DB will let you store your data the way you want, without the RDBMS overhead you would like to avoid.

The answer is actually easy. Think of CRUD (Create, Read, Update, Delete). If your system will mostly perform reads, you can consider LDAP, because LDAP is quick at read operations and was designed for them. If the other operations dominate, an RDBMS would be the better option.

Related

Multi-Tenancy Data Architecture - Shared Schema - Security

A new project we are starting requires multi-tenancy. At the storage level this can be done in several ways (separate databases / separate schemas / shared schema).
To keep operational costs down, we believe that "shared schema, shared tables" is the best way to continue, so all the tenants will share the same tables in the same database/schema.
However, a constraint is to provide good tenant isolation and security. For this we can use encryption: if we are able to provide each tenant with its own keypair, then we provide good security and good isolation. Each tenant can only read its own data, and we don't have to add a discriminator field to each table either.
How can we implement this technically? If you query a table, you will get a lot of data you are not able to decrypt (data from other tenants). Joins will also carry a higher load because of the other tenants' records in the database.
I've already read a couple of articles on MSDN and watched some presentations, but they keep it very high level and abstract. Any thoughts on this?
Is something like the above possible? I thought you could do something on Amazon RDS? Is it possible to provide an example, e.g. on GitHub?
Based on what you've shared, and with some reading between the lines, I am wary of this approach. By itself, shared schema is a very reasonable design for multi-tenancy; where I see problems is with the suggested use of encryption.
While PostgreSQL does support encryption, it's done via functions in the pgcrypto module. RDS, as a managed service for PostgreSQL, adds the ability to easily provision encrypted volumes as well, but to a database user/developer, it's going to look pretty much the same.
The docs suggest using pgcrypto if you only need to encrypt small subsets of your data that you don't need to filter or join on - but it's not clear how much of the data you are looking to encrypt. If it's only a handful of columns and you don't need to filter on them, this may work. Otherwise, reconsider: extensive use of the pgcrypto functions will render almost all standard database operations impossibly inefficient. A WHERE clause will require decrypting the column, in turn requiring scanning/decrypting the full table; there would be zero use of indexes. Your performance will slow to a crawl very quickly.
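To illustrate the filtering problem (the table, column, and key handling here are hypothetical), a tenant-scoped query ends up looking something like this, and the decryption in the WHERE clause forces a scan of the whole table:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    tenant_key = "per-tenant-secret"  # hypothetical; would come from a key store

    # pgp_sym_decrypt() must run on every row before the comparison can
    # be evaluated: a full sequential scan, with no possible index use.
    cur.execute(
        """
        SELECT id, amount
        FROM invoices
        WHERE pgp_sym_decrypt(customer_ref, %s) = %s
        """,
        (tenant_key, "ACME-42"))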
A major consideration you haven't provided is how you are providing access - for example, a web application, where you completely mediate access with a single, trusted account? Or do you allow customers to connect directly to the database? In the former case, your code would be managing all access anyway and would always need access to all the keys, so why incur the overhead? In the latter case, you'd probably render the database unusable to the customer, because all of the standard query tools would be difficult to use.
More broadly, in my experience, a schema-per-tenant approach can offer a good balance between isolation, efficiency, and development overhead. And with judicious use of roles in PostgreSQL, you can enforce reasonable access controls for direct access (you can do the same with rows, though in my view that would require more overhead to administer correctly).
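As a rough sketch of the schema-per-tenant idea (all names invented, and note that DDL identifiers can't be bound as query parameters, so the tenant name must be validated before interpolation):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True
    cur = conn.cursor()

    tenant = "acme"  # hypothetical tenant identifier, assumed validated

    # One schema and one login role per tenant; the role only sees its
    # own schema, which gives reasonable isolation for direct access.
    cur.execute(f"CREATE SCHEMA {tenant}")
    cur.execute(f"CREATE ROLE {tenant}_user LOGIN PASSWORD 'changeme'")
    cur.execute(f"GRANT USAGE, CREATE ON SCHEMA {tenant} TO {tenant}_user")
    cur.execute(f"ALTER ROLE {tenant}_user SET search_path = {tenant}")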
Take a look at some of the commonly used application frameworks to learn more: Rails offers the Apartment gem (https://github.com/influitive/apartment); Django has the django-tenants library (http://django-tenants.readthedocs.io/en/latest/); Hibernate has a pluggable tenant framework (e.g., https://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html)
Hope this helps.

Transact SQL, prefix with schema or not?

I'm developing a tool where I've prefixed tables etc. with "dbo", and now I get requests for custom schema names. I'm thinking of skipping them and instead letting the user control this via the login associated with the DB. I know there's talk about "performance", since the server needs to search the user's schemas and then fall back on dbo etc., but is that really an issue? Opinions?
First, I would look at this question as a feature request from your customers (users?). So the immediate decision to make is, should you even consider looking into this now, or do you have other feature requests that are obviously more important and deliver more benefit to the customer?
For example, for now you could simply tell customers that your application requires its own database that should not be shared with other applications or manipulated in any way by the customer. Then you don't have to worry about schemas or the same object name in two schemas because your application 'owns' the database. Perhaps this is already the case, but if so then I don't understand why your customers care which schema your objects are in.
Second, assuming that you do decide to work on it, you should gather some information about why people are asking for this, to make sure that you clearly understand what they expect you to deliver and what the benefit is for them. If customers are really saying "your application runs slowly" then the choice of schema is highly unlikely to be the reason, it's much more probable that indexing, schema design or your application code are the areas to look at.
Finally, if you still want to go ahead you need to find a technical solution. This is partly a deployment issue and partly a coding issue. It's a deployment issue because you have to deploy your database objects in a specific schema that is specified at installation time, and all your patches and later releases need to be aware of that too. The coding issue is that you need your database code to be "schema-aware", in case you end up in a situation where you have dbo.TableName, MyTool.TableName and OtherSchema.TableName all in the same database. The solution is obviously to reference the schema name in all code, which is considered an important best practice anyway. But exactly how you do this depends on how you have structured your application, if you use an ORM etc.
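For example, with an ORM like SQLAlchemy the schema can become a deployment-time setting instead of a hard-coded "dbo" (the names here are hypothetical):

    from sqlalchemy import MetaData, Table, Column, Integer, String
    from sqlalchemy.dialects import mssql
    from sqlalchemy.schema import CreateTable

    # Hypothetical: read the schema name from the installer's config
    # instead of hard-coding "dbo".
    deploy_schema = "MyTool"

    metadata = MetaData(schema=deploy_schema)
    orders = Table("Orders", metadata,
                   Column("Id", Integer, primary_key=True),
                   Column("Ref", String(50)))

    # Every generated statement now schema-qualifies the table name:
    # CREATE TABLE [MyTool].[Orders] (...)
    print(CreateTable(orders).compile(dialect=mssql.dialect()))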

NoSql/Raven DB implementation best practices

I'm investigating a new project which will be a social-networking-style site. I'm reading up on RavenDB and I like the look of a lot of its features. I've not read up on NoSQL all that much, but I'm wondering if there's a niche it fits best in, and whether old-school SQL is still the best choice for other things.
I'm thinking that the permissions plugin would be ideal for a social-networking-style site - but will it really perform in an environment where the database is getting hammered, or is it optimised for a more reporting-style system, where you can keep throwing new data structures at the database and report on those structures?
I'm eager to use the right tool for the job. I'll be using MVC3 and Windsor, plus either NHibernate + SQL Server or RavenDB.
Should I stick with old-school SQL or go with the new kid on the block, RavenDB?
This question gets very close to being subjective (even though it's really not). You're talking about NoSQL as if it were just one thing, and that is not the case.
You have
graph databases (Neo4j etc.),
map/reduce-style document databases (Couch, Raven),
document databases which attempt to feel like ordinary databases (Mongo),
key/value stores (Cassandra etc.),
and more besides.
Each of them attempts to solve a different problem via different means, and whether you'd use one of them over a traditional relational store is
A matter of suitability
A matter of personal preference
At the end of the day, for the primary data-storage for a single system, a document database or relational store is probably what you want, although for different parts of your system you may well end up utilising a graph database (For calculating neighbours etc), or a key/value store (like Facebook does/did for inbox messages).
The main benefit of choosing a document store as your primary store over that of a relational one, is that you haven't got to worry about trying to map your objects into a collection of tables, and there is less configuration overhead involved in doing so.
The other downside/upside would be that you have to learn something new and make mistakes along the way.
So my answer if I am going to be direct?
RavenDB would be suitable
SQL would be suitable
Which do you prefer to use? These days I'd probably just go for Raven, knowing I can dump data into a relational store for reporting purposes and probably do likewise for other parts of my system. Getting free-text search and fast-ish writes/fast reads without going through the effort of defining separate read/write stores is an overall win.
But that's me, and I am biased.

How to model this [networks, details in post] in a database for efficiency and ease of use?

At LinkedIn, when you visit someone's profile, you can see how you are connected to them. I believe that LinkedIn shows up to 3rd-level connections, if not more; something like
shabda -> Foo user, bar user, baz user -> Joel's connection -> Joel
How can I represent this in the database?
If I model it as,
User
    Id    PK
    Name  Char
Connection
    User1 FK
    User2 FK
Then to find the network three levels deep, I need to get all my connections, their connections, and their connections, and then see if the current user is there. This would obviously be very inefficient with a DB of any size, and probably clunky to work with as well.
Since on LinkedIn I can see this network on any profile I visit, I don't think this is precalculated either.
The other thing that comes to mind is that this is probably best not stored in a relational DB - but then what would be the best way to store and retrieve it?
My recommendation would be to use a graph database. There seems to be only one implementation currently available, and that's Neo4j. It's written in Java, but has bindings to Ruby and Scala (Python in progress).
If you don't know Java, you probably won't be able to find anything similar on any other platform (yet), unfortunately. However, if you do know Java (or are at least willing to learn), it's way worth it. (Technically you don't even need to learn Java because of the Ruby/Python bindings.) Neo4j was built for exactly what you're trying to do. You'd go through a ton of trouble trying to implement this in a relational database, when you'd be able to do the exact same thing in only a few lines of Java code, and also much more efficiently.
If that's not an option, I'd still recommend looking at other database types such as object databases. Relational databases weren't built for this kind of thing, and you'd go through more pain by trying to do it in an RDBMS than by switching to a different kind of database and learning it.
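For a sense of what that buys you: in a graph database the "how am I connected to this person" question is a single traversal query. A sketch using Cypher from Neo4j's Python driver (the connection details and the KNOWS relationship type are assumptions):

    from neo4j import GraphDatabase

    # Hypothetical connection details.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "secret"))

    # Find the shortest chain of KNOWS edges, at most 3 hops long.
    QUERY = """
    MATCH p = shortestPath((me:User {id: $me})-[:KNOWS*..3]-(them:User {id: $them}))
    RETURN [n IN nodes(p) | n.name] AS chain
    """

    with driver.session() as session:
        record = session.run(QUERY, me=1, them=42).single()
        print(record["chain"] if record else "no connection within 3 hops")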
I don't see why there's anything wrong with using a relational database for this. The tables defined in the question are an excellent start. With proper optimization you'll be able to keep your performance well in hand. I personally think you would need something serious to justify shifting away from such a versatile mainstream product. You'll probably need an RBDMS in the project anyway and there are an unmatchable amount of legitimate choices in many price ranges (even free). You'll get quality documentation, support will be available, and you'll have a large supply of highly trained developers available in the job pool.
Regarding this model of self-relationships (users joined to other users), I recommend looking into recursive queries. That will keep you from performing a cascade of individual queries to find three levels of relationships. Consider the following SQL Server method for performing recursive queries with CTEs:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
It allows you to specify how deep you want to go with the MAXRECURSION hint.
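As a sketch against the User/Connection tables from the question (untested, and assuming each connection is stored as one directed row), the whole three-level search collapses into a single statement:

    # T-SQL for SQL Server; run via e.g. pyodbc: cursor.execute(FIND_NETWORK, me)
    FIND_NETWORK = """
    WITH Network (UserId, Depth) AS (
        -- level 1: my direct connections
        SELECT User2, 1 FROM Connection WHERE User1 = ?
        UNION ALL
        -- levels 2 and 3: connections of people already found
        SELECT c.User2, n.Depth + 1
        FROM Connection c
        JOIN Network n ON c.User1 = n.UserId
        WHERE n.Depth < 3
    )
    SELECT UserId, MIN(Depth) AS Distance
    FROM Network
    GROUP BY UserId
    OPTION (MAXRECURSION 10);  -- safety cap on recursion depth
    """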
Next, you need to start thinking of ways to optimize. That starts with standard best-practices for setting up your tables with proper indexes and maintenance, etc. It inevitably ends with denormalization. That's one of those things you only do once you've already tried everything else, but if you know what you're doing and use good practices then your performance gain will be significant. There are many resources on the internet to help you learn about denormalization, just look it up.

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then update the column types in your program manually, or does it make more sense to define the tables in your application and then use the ORM framework's table generation functions to make the schema and then build the tables on the database side for you?
Bear in mind that applications and databases tend to live in a M:M relationship in any but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to model your domain data. It is also the biggest source of slowdown in applications, and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because the database is really just a place to persist information from the application; it should never lead the design. It's not the only place to persist information, either: we have users who want to work from flat files or the database, for instance, and they could also use XML files. So starting with your domain objects and then generating tables (or a flat-file or XML schema or whatever) from there will lead to a much better design in the end.
While this may depend on your using an object-oriented language, an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you, up to and including generating the database creation scripts.
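As a sketch of that generating direction in SQLAlchemy (the ORM the original question mentions):

    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    # The domain object is the source of truth for the schema.
    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(100), nullable=False)

    # create_all() emits CREATE TABLE statements derived from the classes.
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)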
As for performance: it should be one of the last things you look at in an application, and it should never drive the design. After you get a good schema up and running based on your domain, you can always make tweaks to improve its performance.
A lot depends on your skill level with the specific database product you're going to use. Think of it as the difference between a "manual" and an "automatic" transmission. ORMs provide you with that "automatic" transmission: just start designing your classes and let the ORM worry about getting them stored in the database somehow.
Sounds good. The problem with most ORMs is that, in their quest to be PI ("persistence ignorant"), they often don't take advantage of specific database features that can provide elegant solutions to a given task. Notice I didn't say ALL ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction: up towards the application space, or down towards the physical database. But remember, only YOU know whether it's more advantageous to use a view instead of a table, whether to normalize or de-normalize a table, which non-clustered index(es) make sense for a table, whether a natural or surrogate key is more appropriate, etc. Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
One more thing: you really need to separate the application design from the database design; they are almost never the same. How important is the data? Could another application be designed to use it? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way, since it's a perfect example of the DRY principle.
Having said that, getting away with it will always be hard to pull off, since you're practically choosing to give up most database-specific optimizations (more so with querying, but it still applies to schemas - indexes, etc.).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 cents:
Design each based on its own requirements as much as possible. Trying to keep them too rigidly in sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).
