MongoDB for multitenant site configurations?

On several occasions I've come across people using databases to store tenant configurations for multi-tenant platforms. I can hardly think of any advantage this has over storing configurations in static files, with the relevant ones loaded during start-up.
With relational databases, one could possibly claim that tables enforce some structure for the configuration, but with document-oriented databases, structure is not enforced.
Arguably, a similar (single-writer, potentially multiple-readers) scenario is storing logs, but in that situation there's the use case of searching through huge amounts of data and thus benefiting from non-linear search techniques. Configurations would hardly grow large enough to run into such search performance issues.
In particular, could anyone suggest any solid reason why a configuration would be stored in MongoDB, rather than a simple plain-text file?

I'm still not convinced that this is a good solution, but in a way it is an "easy" solution for someone well acquainted with MongoDB. Over time I've experienced some of the advantages. Here is what I can see being perceived as such:
Comes "automatically distributed" in the sense that it being over the network the properties are accessible in a secure way to multiple web services.
Depending on the client library and with lots of caveats, this commonly comes as "automatically reloading changes".
Comes with "automatic" syntax validation (JSON in the case of MongoDB) - data cannot be saved if it is not syntactically valid.

Related

Ideal database for a minimalist blog engine

So I'm designing this blog engine, and I'm trying to keep just my blog data, without considering comments, a membership system, or any other kind of multi-user data.
The blog itself centers around two types of data. The first is the actual blog post entry, which consists of: title, post body, and metadata (mostly dates and statistics), so it's really simple and can be represented by a simple JSON object. The second type is the blog admin configuration and personal information. Comments and the like will be implemented using Disqus.
My main concern here is the ability of such an engine to scale with spikes in visits (I know you might argue this, but let's take it for granted). Since I started this project I've been moving along well with the rest of my stack, except for the data layer. I've been stuck on this dilemma of choosing the database: I've considered MongoDB, but some reviews, articles and benchmarks suggested slow reads once collections reach a certain size. Next I looked at Redis and its persistence features, RDB and AOF; while Redis is good at both fast reading and writing, I'm wary of using it because I'm not familiar with it. And this whole search keeps leading to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents", etc.
So is there any way I can settle this issue for good, considering that I only need to represent my data in a key/value structure, only require fast reading (not writing), and need fault tolerance?
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of the blog posts about the downsides of NoSQL solutions concern large data sets, or people trying to do relational things with a database designed for denormalized data.
My list of databases to consider:
Mongo. It has huge community support and, based on recent funding, it's going to be around for a while. It runs very well on a single instance and a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and hit all your requirements, yet be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks, and you won't find them until you actually try one. Some will have quirks that are passable; others will have quirks that are show-stoppers.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
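To make the abstraction point concrete, here's a minimal sketch in Python (class and key names are mine, not from any of the products above); both back ends persist the same JSON-shaped dict, so swapping stores touches one class, not the application:

```python
import json

class MongoPostStore:
    def __init__(self, collection):  # a pymongo collection
        self.collection = collection

    def save(self, post_id, post):
        self.collection.replace_one({"_id": post_id}, post, upsert=True)

    def load(self, post_id):
        return self.collection.find_one({"_id": post_id})

class RedisPostStore:
    def __init__(self, client):  # a redis-py client
        self.client = client

    def save(self, post_id, post):
        self.client.set("post:" + post_id, json.dumps(post))

    def load(self, post_id):
        raw = self.client.get("post:" + post_id)
        return json.loads(raw) if raw else None
```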
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.
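As a rough sketch of that approach (hypothetical file names, Python): the CMS renders every post to a static HTML file at publish time, so serving traffic never touches a database:

```python
import json
from pathlib import Path

def render(post):
    # real templating (e.g. Jinja2) would go here
    return "<article><h1>%s</h1>%s</article>" % (post["title"], post["body"])

posts = json.loads(Path("posts.json").read_text())  # hypothetical source file
out = Path("public")
out.mkdir(exist_ok=True)
for post in posts:
    (out / (post["slug"] + ".html")).write_text(render(post))
```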

Redis for a CakePHP app

I want to start a big CakePHP project where performance will be an issue. I will have a users table using the Tree behavior and a lot of financial data related to the users. This application will generate a lot of dynamic reports, aggregating data for different tree nodes, etc.
Since there is an easy-to-use library on GitHub which sets a model's data source to Redis, I was wondering whether it's a good idea to use it for the entire app. Does anyone have experience with it, and what could the potential problems be if I decide to depend on Redis as the main/only data storage?
EDIT: I have installed Redis and tried to use RedisModel for two models with a simple HasMany/BelongsTo relation. When I tried to simply use those models like standard AppModels, it just wouldn't work (Redis Error: Missing key). Apparently you can't use Model->find, Model->save, etc. in the standard way; you have to use Redis methods instead (setKeyValue etc.). This means that pagination and other CakePHP features will also not work. So maybe it is not the best idea to use RedisModel for all my models...
I cannot speak for CakePHP specifically, but I'll talk about Redis in general and the points of your question in particular; it should be applicable to your framework of choice in the end. Let's see:
You mention you want to start an application where performance will be an issue. I just wanted to mention that you should be careful with the assumption that you will need a NoSQL solution, because this is hard to assess beforehand. Redis is hella fast, but MySQL, for instance, has been proven capable of handling millions of records and operations just fine, provided it's properly configured and used, and it's much simpler if you need lots of relational structures.
Concerning Redis as the main and only data store:
Redis is perfectly stable for the job. Instagram reportedly stored 300 million key-value pairs, pseudo-sharded using hashes, to great effect, and while it's not the only data storage system they use, it goes to show Redis is pretty reliable. This very site (Stack Overflow) also uses Redis extensively for caching purposes.
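As an aside, the hash trick works roughly like this (a sketch with redis-py, not Instagram's actual code): numeric IDs are grouped into buckets so each Redis hash holds many fields, which is far cheaper in memory than one top-level key per pair:

```python
import redis

r = redis.Redis(decode_responses=True)
BUCKET = 1000  # fields per hash; keep under Redis's compact-hash threshold

def set_media(media_id, value):
    # media 1234567 lands in hash "media:1234", field "567"
    bucket, field = divmod(media_id, BUCKET)
    r.hset("media:%d" % bucket, str(field), value)

def get_media(media_id):
    bucket, field = divmod(media_id, BUCKET)
    return r.hget("media:%d" % bucket, str(field))
```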
Redis is also reported to have excellent continuous uptime on average (which shouldn't be surprising, considering the point above).
Options exist to mitigate downtime issues: replication is supported to some extent, and Redis Cluster is coming soon to support proper distributed approaches.
The main problem you could face is not properly understanding how its persistence works. You should absolutely read this and this article before you get started, because this point is important. In a nutshell, Redis does not write changes to disk immediately, which means that, depending on your configuration, a crash can cause data loss ranging from a few seconds to several minutes since the last disk write. This might or might not be a problem depending on your use case; if the data is extremely sensitive (i.e., financial records), you might want to think twice before jumping to Redis, or build a system where Redis is not used exclusively but combined with another storage system.
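For example, with the redis-py client you can inspect and adjust those persistence settings at runtime; a sketch, assuming a local instance with default settings:

```python
import redis

r = redis.Redis()

# RDB snapshots: "save" holds pairs like "900 1" (snapshot after 900s if 1 change)
print(r.config_get("save"))
print(r.lastsave())  # time of the last successful snapshot, i.e. your loss window

# AOF: log every write; fsync once per second caps the loss window at ~1 second
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")
```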
Relational structures in a non-relational data store like Redis mean doing more work and often duplicating/denormalizing data. It can be done, but it's something to consider. In your question you mention you'll need to aggregate data to generate dynamic reports; are you sure you want to use Redis for this? It sounds like a relational database would give you far more flexibility at a very small performance cost. If you know in advance you'll need to run complex queries over your data, it could be a good idea not to reinvent the wheel unless you absolutely need to.
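To make the denormalization point concrete, here's a sketch (redis-py, hypothetical key names) of a one-to-many relationship plus an aggregate computed by hand in application code, work a SQL SUM(...) would do for you:

```python
import redis

r = redis.Redis(decode_responses=True)

# one user, many transactions: the "relation" is just a list of keys
r.hset("user:42", mapping={"name": "alice"})
r.rpush("user:42:txns", "txn:1001", "txn:1002")
r.hset("txn:1001", mapping={"amount": "25.00"})
r.hset("txn:1002", mapping={"amount": "75.50"})

# the aggregation step, done manually instead of by the database
total = sum(float(r.hget(t, "amount")) for t in r.lrange("user:42:txns", 0, -1))
print(total)  # 100.5
```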
My advice here would be to first get a better feel for what Redis is and how it works, potentially building your own models instead of relying on others, to better understand what can and cannot be done; from there, assess where you want to take it. Redis is reliable enough to be used standalone, but at the end of the day what's smart is to use the right tool for the right job, and you might find some parts of your app work well with Redis while others are better left to a more traditional storage system.

Using LDAP server as a storage base, how practical is it?

I want to learn how practical it is to use an LDAP server (say, AD) as a storage base. To be more clear: how much sense does it make to use an LDAP server instead of an RDBMS to store data?
I can guess that most of you might just say "it doesn't", but there might be some reasons that make it meaningful (especially business-wise);
A few points first;
Each table becomes a container entity and each row becomes a new entity as a child. Row entities contain attributes corresponding to the columns. So you represent your data this way. (This should be the most meaningful representation, I think; suggestions are welcome.)
So storing data the way a DB server does is possible, but the lack of FK and PK support (not sure about PK) is an issue. On the other hand, it supports attribute indexing (an attribute relates to a column), though I'm not sure how efficient that is. So consistency of the data becomes the responsibility of the application layer.
Why would somebody do this ever?
Data that application uses/stores closely matches with the existing data in AD. (Users, Machines, Department Info etc.) (But still some customization is required to existing entity schema, and new schema definitions are needed for not very much related data.)
(I think the strongest reason would be this: it's business-related.) Most mid-sized companies have very well-configured AD servers (replicated, backed up, etc.), but they don't have a comparable DB setup (you can comment on this as much as you want). Say you sell software that requires a DB setup to these companies: they must then manage that DB setup; but if you can say "you don't need DB setup and management; you can just use your existing AD", it sounds appealing.
Obviously there are many disadvantages of giving up using DB, feel free to mention them but let's assume they are acceptable. (I can mention more if question is not clear enough.)
LDAP is a terrible tool for maintaining most business data.
Think about a typical one-to-many relationship - say, customer and orders. One customer has many orders.
There is no good way to represent this data in an LDAP directory.
You could try having a mock "foreign key" by making every entry of that given object class have a "foreign key" attribute, but your referential integrity just went out the window. Cascade deletes are impossible.
You could try having a "customer" object that has "order" children. However, you've just introduced a specific hierachy - you're now tied to it.
And that's the simplest use case. Once you start getting into more complex relationships, you're basically re-inventing an RDBMS in a system explicity designed for a different purpose. The clue's in the name - directory.
If you're storing a phonebook, then sure, use LDAP. For anything else, use a real database.
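To see why, here's a sketch of the mock "foreign key" idea using the Python ldap3 library (the DNs and the customerId attribute are hypothetical); the directory happily deletes the parent and leaves the child dangling:

```python
from ldap3 import Server, Connection

# hypothetical directory and credentials
conn = Connection(Server("ldap.example.com"),
                  "cn=admin,dc=example,dc=com", "secret", auto_bind=True)

# each order stores its customer's DN as a mock foreign key attribute
conn.add("cn=order-1001,ou=orders,dc=example,dc=com",
         ["top", "extensibleObject"],
         {"customerId": "cn=cust-42,ou=customers,dc=example,dc=com"})

# no integrity check, no cascade: order-1001 now points at nothing
conn.delete("cn=cust-42,ou=customers,dc=example,dc=com")
```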
For relatively small, flexible data sets I think an LDAP solution is workable. However, an RDBMS provides a number of significant advantages:
Backup and recovery: just about any database provides ACID properties, and RDBMS backups are generally easy to script and come with several options (e.g. full vs. differential). I don't know for sure with LDAP, but I imagine these qualities are not as widespread.
Reporting: AFAIK LDAP doesn't offer a way to JOIN values easily, much less do things like calculate summations. So you would put a lot of effort into application code to reproduce those behaviors when you do need reporting. And what application doesn't, ultimately?
Indexing: it looks like LDAP solutions have indexing, but again, it seems hit or miss, whereas seemingly every database out there has put some real effort into getting this right.
I think any serious business system's storage should be backed up in the same fashion you believe LDAP is in most environments. If what you're really after is its flexibility in terms of representing hierarchy and ability to define dynamic schemas I'd suggest looking into NoSQL solutions or the Java Content Repository.
LDAP is very useful for storing that kind of information, and if you want to, you may use it. An RDBMS is just more comfortable to work with through ORM systems; your persistence logic with LDAP will be much more complex.
It's also worth mentioning that this is not a standard approach, so the people who end up supporting the project will spend more time on analysis.
I've used this approach for fun: I generate a phonebook from Active Directory. But I don't think it's a good idea to use LDAP as a store for business applications.
In short: Use the right tool for the right job.
When people see LDAP, you have already set an expectation about your system. Don't forget what the L stands for: Lightweight. LDAP was designed for accessing directories over a network.
With a "directory database" you can build a certain type of application. If you can map your data to a tree-like data structure, it will work. I surely would not want to stream videos from LDAP! You could probably hack something together, but I would prefer a streaming server.
There might be some hidden gotchas down the line if you use a tool for something it wasn't designed to do. So the downside is that you'll have to test things that would have been a given otherwise.
It's not just a technical concern. Your operational support team might "frown" on your application, as they will have certain expectations/preconceptions based on your application's architectural nature. Imagine their surprise if you hand them a CRM system (website + files + POP email, etc.) with an LDAP server as the database to maintain.
If I was in your position, I would steer towards one of the NoSQL db solutions rather than trying to use LDAP. LDAP is fine for things like storing user and employee information, but is terrible to interact with when you need to make changes. A NoSQL db will allow you to store your data how you want without the RDBMS overhead you would like to avoid.
The answer is actually easy. Think of CRUD (Create, Read, Update, Delete). If reads will dominate in your system, you can consider LDAP, because LDAP is quick at read operations and was designed for them. If the other operations will be more frequent, an RDBMS would be the better option.

Distributed database management system - alternatives?

I am working to develop an application that needs data distributed across countries. Content will be supplied "per region", but needs to be easy to copy to another region. On top of this, I have general information that needs to be shared and synchronized across the databases.
The organisation I work for is considering implementing this system themselves, but it feels like there should be some good solutions out there already (I am open to cloud solutions; the less my company needs to manage, the better).
This might be a vague question, but I think it is possible to answer it well.
What are my options when developing this kind of distributed data system?
Update:
I should have elaborated (but I'm not sure how much I can say, given the NDA). Suffice to say, I have "content" which I need stored in some space (files). I need metadata about the content distributed over several nodes (which might be hosted by us or by someone else) to allow fast-paced communication and regionalized differences in the data. I need to control HOW data is replicated between nodes, preferably in a standards-compliant way. (Preferably not written by us.)
You can try CouchDB. Its offline replication model sounds like a good fit for a geo-distributed system.
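For what it's worth, kicking off that replication is a single HTTP call; a sketch with Python's requests library against hypothetical CouchDB hosts:

```python
import requests

# continuously replicate the "content" database from the EU node to the US node
requests.post("http://eu.example.com:5984/_replicate", json={
    "source": "content",
    "target": "http://us.example.com:5984/content",
    "continuous": True,
})
```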
Interesting question - but it would really help to get more context.
You talk about "data", which usually means something with a fairly well-defined structure, often implemented in a relational database.
You also talk about "content", which usually means something with a (much) less well-defined structure, often implemented as a document of some type. Many solutions exist for structuring "documents", e.g. file systems or web sites.
Assuming we are talking about structured data, the simplest thing to do is to have a single repository, accessible everywhere. Have a look at "cloud" offerings - Amazon's a good bet. Creating your own global data repository is a significant undertaking, but if you're dealing with highly confidential data, or have specific performance requirements, it may be the way to go.
If neither of those options work, you're in the world of "enterprise service bus". Google it, but be careful - it's a complex field, and you really want to find someone who knows what they're doing.
Having said that, using an off the shelf ESB is many times less painful than building your own distributed data structure.
I know it's years after the asking, but I was looking up the answer to the same question, and it looks like Cassandra may fit the bill. Once set up, it looks and acts much like other database solutions (tables, materialized views, a SQL-like query language called CQL, lightweight transactions, etc.), but it can also be entirely decentralized. Each instance acts as a node in a cluster of other Cassandra nodes. They synchronize behind the scenes, and if one goes down, the others pick up the slack. This makes Cassandra both highly scalable and highly fault tolerant.
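As a sketch of that regional control with the Python cassandra-driver (contact points and datacenter names are hypothetical): replication is declared per keyspace, so you choose how many copies each region keeps:

```python
from cassandra.cluster import Cluster

# one contact point per region (hypothetical hosts)
cluster = Cluster(["eu-node.example.com", "us-node.example.com"])
session = cluster.connect()

# NetworkTopologyStrategy: 3 replicas in the "eu" datacenter, 2 in "us"
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS content
    WITH replication = {'class': 'NetworkTopologyStrategy', 'eu': 3, 'us': 2}
""")
```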

In-Memory Database

I'm using SQL Server to drive a WPF application. I'm currently using NHibernate and pre-reading all the data so it's cached, for performance reasons. That works for a single-client app, but I was wondering if there's an in-memory database I could use to share the information across multiple apps on the same machine. Ideally it would sit below my NHibernate stack, so my code wouldn't have to change. Effectively I'm looking to move my DB from its traditional format on the server to an in-memory DB on the client.
Note I only need select functionality.
I would be incredibly surprised if you even need to load all your information into memory. I say this because, just as one example, I'm working on a web app at the moment that (for various reasons) loads thousands of records on many pages. It's PHP + MySQL, and even so it renders a page in well under 100 ms.
Before you go down this route, make sure that you have to. First make your database as performant as possible. Obviously this includes things like having appropriate indexes and tuning your database, but even that is putting the cart before the horse.
First and foremost you need to make sure you have a good relational data model: one that lends itself to performant queries. This is as much art as it is science.
Also, you may like NHibernate, but ORMs are not always the best choice. There are some corner cases, for example, in which hand-coded SQL will be vastly superior.
Now, assuming you have a good data model, you've optimized your indexes and database parameters, and you've properly configured NHibernate, then and only then should you consider storing data in memory, and only if performance is still an issue.
To put this in perspective, the only times I've needed to do this are on systems that need to perform millions of transactions per day.
One reason to avoid in-memory caching is because it adds a lot of complexity. You have to deal with issues like cache expiry, independent updates to the underlying data store, whether you use synchronous or asynchronous updates, how you give the client a consistent (if not up-to-date) view of your data, how you deal with failover and replication and so on. There is a huge complexity cost to be paid.
Assuming you've done all the above and you still need it, it sounds to me like what you need is a cache or grid solution. Here is an overview of Java grid/cluster solutions, but many of them (e.g. Coherence, memcached) apply to .NET as well. Another choice for .NET is Velocity.
It needs to be pointed out and stressed that something like NHibernate's caching is only consistent so long as nothing external updates the database and there is exactly one NHibernate-enabled process (barring clustered solutions). If two desktop apps on two different PCs are both updating the same database with NHibernate, the caching simply won't work, because each persistence unit won't be aware of the changes the other is making.
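To illustrate the cache-aside pattern such products implement (shown here in Python with pymemcache; load_from_database is a hypothetical helper hitting SQL Server), note the expiry, which is what bounds the staleness problem described above:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # shared by every app on the machine

def get_report(report_id):
    key = "report:%d" % report_id
    value = cache.get(key)                     # fast path: served from memory
    if value is None:
        value = load_from_database(report_id)  # hypothetical SQL Server query
        cache.set(key, value, expire=60)       # staleness bounded to 60 seconds
    return value
```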
http://www.db4o.com/ can be your friend!
Velocity is an out-of-process object caching server designed by Microsoft to do pretty much what you want, although it's only in CTP form at the moment.
I believe there are also wrappers for memcached, which can likewise be used to cache objects.
You can use HANA, express edition. You can download it for free; it's in-memory and columnar, and offers further analytics capabilities such as text analytics, geospatial, and predictive. You can also access it via ODBC, JDBC, the node.js hdb library, and REST APIs, among others.
