What are some of the database optimizations for multi-tenant applications - sql-server

Salesforce's secret sauce: it queries its databases with "The Multi-Tenant Optimizer". So what exactly could this practice consist of?

A whole lot of marketing.

Denormalizing the data so that every row has the "tenant id" in it, which reduces the number of joins needed to find the owner of the data.
Just a guess.
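To make that guess concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (accounts, orders, order_lines, tenant_id) are hypothetical and say nothing about what Salesforce actually does; the point is only that a tenant_id repeated on the child row lets a tenant-scoped query skip the joins.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory schema (hypothetical names) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
        -- Denormalized: tenant_id is repeated on the child row itself.
        CREATE TABLE order_lines (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL,
            tenant_id INTEGER NOT NULL,
            sku TEXT NOT NULL
        );
        CREATE INDEX idx_order_lines_tenant ON order_lines (tenant_id);
    """)
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 10), (2, 20)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 2)])
    conn.executemany(
        "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
        [(1, 1, 10, "A"), (2, 1, 10, "B"), (3, 2, 20, "C")],
    )
    return conn


def skus_via_joins(conn, tenant_id):
    """Find a tenant's rows by walking order_lines -> orders -> accounts."""
    sql = """
        SELECT ol.sku
        FROM order_lines AS ol
        JOIN orders AS o ON o.id = ol.order_id
        JOIN accounts AS a ON a.id = o.account_id
        WHERE a.tenant_id = ?
        ORDER BY ol.sku
    """
    return [row[0] for row in conn.execute(sql, (tenant_id,))]


def skus_direct(conn, tenant_id):
    """Find the same rows using only the denormalized tenant_id column."""
    sql = "SELECT sku FROM order_lines WHERE tenant_id = ? ORDER BY sku"
    return [row[0] for row in conn.execute(sql, (tenant_id,))]
```

Both functions return the same rows; the second one touches a single table and can be served from the tenant_id index.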

Patent application is here
Broadly, separate stats for each tenant/user.

Here's a link to one of their webinars, where their chief architect talked about their database architecture.

Another option is to use "Sharding". Here is a link which has a fairly good description of this technique :
http://www.codefutures.com/database-sharding/
If you're using Hibernate for object-relational persistence, they have an additional library which adds support for sharding (and insulates the application from many of the details):
http://www.hibernate.org/subprojects/shards.html
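As a rough illustration of the core idea behind sharding, here is a minimal hash-based routing sketch in Python. It is not taken from either link; the shard names and the choice of SHA-256 are assumptions, and a real deployment would also have to handle resharding and cross-shard queries.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would map to connection settings.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard for a key (for example a tenant id) with a stable hash.

    hashlib is used instead of the built-in hash() because str hashing in
    Python is randomized per process, which would break routing across restarts.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Every read or write for a given key is then sent to shard_for(key), so all of one tenant's rows live together on one server.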

Related

Cassandra data modeling for social network with follower and following actions

Can anybody describe how Cassandra data modeling should be done for a social network that lets its users follow each other and has a timeline and other common features of social networks like Twitter?
I found twissandra on GitHub, but it was confusing to me.
Please describe how the following and follower tables should look in Cassandra, or provide links to tutorials.
Unlike relational schema design, where the queries to be performed matter mainly for optimization, schema design for Cassandra is query-oriented: you first need to figure out what kind of information you will ask for in order to design an effective Cassandra schema. A wrong schema design can kill Cassandra's performance.
Therefore, regarding your question, you should first have a complete picture of your context, and then go back to the design phase.
I have personally found the material provided by DataStax Academy really useful. It is free; you only need to register. If you are not familiar with the Cassandra system architecture, I would suggest looking at that first, so that you fully understand the schema design choices, and then looking at the main design principles.
Regarding the methodology to be used, I don't think there's an established one right now. I would suggest using the Chebotko Diagrams which are well explained in this article.
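To show what "query-oriented" means for the follower case, here is a small Python sketch in which plain dictionaries stand in for Cassandra tables. The table names (followers_by_user, following_by_user) are hypothetical, and this is only one plausible layout, not a recommendation from the material above: each query gets its own table keyed by what the query looks up, and a single follow action writes to both.

```python
from collections import defaultdict


class FollowGraph:
    """In-memory stand-in for two query-specific tables (hypothetical names)."""

    def __init__(self):
        # Serves "who follows user X?"; the lookup key is the followed user.
        self.followers_by_user = defaultdict(set)
        # Serves "whom does user X follow?"; the lookup key is the follower.
        self.following_by_user = defaultdict(set)

    def follow(self, follower, followed):
        """One logical action, written twice so each query reads one key."""
        self.followers_by_user[followed].add(follower)
        self.following_by_user[follower].add(followed)

    def followers_of(self, user):
        return sorted(self.followers_by_user.get(user, ()))

    def following_of(self, user):
        return sorted(self.following_by_user.get(user, ()))
```

The duplicated write is the price of never having to scan or join at read time, which is the trade-off the answer above is pointing at.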

MapDB vs regular database

When should one use MapDB rather than a regular database through an ORM, other than for its direct mapping to java.util.Map, which can be implemented with an ORM as well?
Jan's answer is highly biased, since he is the author of MapDB.
MapDB is superb for "internal storage" and when there is a single entity with 'values' associated with it. Its interface is very straightforward, and you can either serialize in your own format (recommended) or rely on the highly compact internal serialization format in MapDB.
ORMs are most valuable when the stored data is under some type of "external control". This could be that there are storage policies in the company, pre-defined RDBMS schemas, or perhaps that the data must be queryable by some reporting engine that is made for SQL.
Then there is a multitude of situations where opinion and personal preference make all the difference. Personally, I am in Jan's corner and think that ORMs quickly become incredibly hard to deal with, and if you take 'data migration' into account, I think MapDB (and many other NoSQL alternatives) wins out more often than not. For the case of external query engines, I would send data modification events from the primary application to a secondary system that interprets those and updates the "view" needed by such SQL-only systems.
I would use MapDB if you need extra performance and flexibility. Otherwise, use a regular ORM with a database.

Design Patterns for Interfacing with Relational Databases?

Does anyone know of any design patterns for interfacing with relational databases? For instance, is it better to have SQL inline in your methods, or instantiate a SQL object where you pass data in and it builds the SQL statements? Do you have a static method to return the connection string and each method just gets that string and connects to the DB, performs its action, then disconnects as needed or do you have other structures that are in charge of connecting, executing, disconnecting, etc?
In other words, assuming the database already exists, what is the best way for OO applications to interact with it?
Thanks for any help.
I recommend the book Patterns of Enterprise Application Architecture by Martin Fowler for a thorough review of the most common answers to these questions.
There are a few patterns you can use:
Repository Pattern
Active Record Pattern
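As a minimal sketch of the first of these, here is a repository written in Python over the built-in sqlite3 module. The User class, the users table and the method names are hypothetical; the idea is only that callers deal in domain objects while the SQL and connection handling stay inside one class.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: Optional[int]
    name: str


class UserRepository:
    """Hides SQL and connection details behind a collection-like interface."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, user):
        """Insert the user and return a copy carrying the generated id."""
        cur = self._conn.execute(
            "INSERT INTO users (name) VALUES (?)", (user.name,)
        )
        self._conn.commit()
        return User(id=cur.lastrowid, name=user.name)

    def get(self, user_id):
        """Return the matching User, or None if no row has that id."""
        row = self._conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None
```

With Active Record, by contrast, the save and find methods would live on the User class itself rather than on a separate repository object.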
I personally would hate to work with a database without an ORM. NHibernate is preferable but iBatis is also an option for existing databases (not to say that NH can't handle existing databases).
In general, the best way for OO applications to interface with a relational database is through an ORM; while this isn't a design pattern per se, it's a type of tool that has a specific usage pattern, so it's similar enough. Object Relational Mapping (ORM) tools provide a mapping between a database and a set of objects in memory; usually, these tools provide means for managing things such as sessions, connections, and transactions. A good example of an ORM that works fantastically well would be Hibernate (NHibernate on .NET).
In my experience it is best to have no SQL statements at all (most ORMs will allow that), and it is best not to have any knowledge of connection details (connection string, etc). Even better if you can have the exact same piece of code working with any major db vendor.
POEAA has a wealth of knowledge on the issue if you intend to roll your own.
Everything you will identify by googling for DAL describes a design pattern. Seems like standing in the middle of the forest and asking to see a tree. There are dozens if not thousands.
Here's a quote from a book I'm reading, as a starting point for finding resources.
... it is impossible to discuss ORM without talking about patterns and best practices for building persistence layers. Then again, it is also impossible to discuss ORM patterns without calling out the gurus in the industry, namely Martin Fowler, Eric Evans, Jimmy Nilsson, Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, the last four of whom are known in the industry as the Gang of Four (GoF). The purposes of this chapter are to explain and expand some of the patterns created by these gurus and to provide concrete examples using language that every developer can understand.

Django database scalability

We have a new Django-powered project that may see heavy traffic (meaning heavy database interaction), so we need to consider database scalability in advance. After some research, the following questions are still unclear to us:
Coarse-grained: how do we assign one database table (a Django model) to a specific database (possibly on another server)?
Fine-grained: how do we assign a group of table rows to a specific database (so-called sharding, possibly on another database server as well)?
How do we direct writes and reads to different databases (which would help with future MySQL master/slave replication)?
We are looking for a solution that:
is transparent to the application program (meaning we don't need additional code in views.py)
works at the ORM level (meaning it only needs to be specified in models.py)
is compatible with the current (or future) Django release (to minimize changes when upgrading Django later)
I'm still doing the research and will share anything useful I find in this thread later.
I hope someone with experience can answer. Thanks.
Don't forget about caching either. Using memcached to relieve your DB of load is key to building a high performance site.
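A minimal cache-aside sketch in Python shows the shape of this. A plain dict stands in for memcached here, and the class and method names are hypothetical; with a real memcached client you would also set expiry times.

```python
class CacheAside:
    """Check the cache first; fall back to the database loader on a miss."""

    def __init__(self, load_from_db, cache=None):
        self._load = load_from_db  # callable that performs the real query
        self._cache = {} if cache is None else cache  # dict standing in for memcached
        self.db_hits = 0  # counts how often the loader was actually called

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)
```

Repeated reads of the same key then cost one database query instead of one per request, as long as writes invalidate the affected keys.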
As alex said, django-core doesn't support your specific requests for those features, though they are definitely on the todo list.
If you don't do this in the application layer, you're basically asking for performance trouble. There aren't any really good open source automation layers for this sort of task, since it tends to break SQL axioms. If you're really concerned about it, you should be coding the entire application for it, not simply hoping that your ORM will take care of it.
There is a GSoC project by Alex Gaynor that will eventually allow multiple databases to be used in one Django project. For now, though, there is no working cross-RDBMS solution.
There is no solution for this right now either.
And again, there is no cross-RDBMS solution. But if you are using MySQL, you can try an excellent third-party Django application called mysql_replicated. It makes it easy to set up a master-slave replication scenario.
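For readers on a later Django release: multi-database support did land, and routing is handled by plain router classes listed in the DATABASE_ROUTERS setting. The sketch below follows that router interface (db_for_read, db_for_write, allow_relation, allow_migrate); the database aliases "primary", "replica1" and "replica2" are assumptions and would have to match the keys of your own DATABASES setting.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    # Hypothetical aliases; they must exist as keys in settings.DATABASES.
    primary = "primary"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Any replica can serve a read.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the single primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so any relation is fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only run migrations against the primary; replicas copy its schema.
        return db == self.primary
```

This keeps views.py untouched, which was one of the stated requirements, though replication lag means a read issued right after a write may not see it yet.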
Here, for some reason, we are using Django with SQLAlchemy. Maybe a combination of Django and SQLAlchemy would also work for your needs.

What are the pros and cons of object databases?

There is a lot of information out there on object-relational mappers and how to best avoid impedance mismatch, all of which seem to be moot points if one were to use an object database. My question is why isn't this used more frequently? Is it because of performance reasons or because object databases cause your data to become proprietary to your application or is it due to something else?
Familiarity. The administrators of databases know relational concepts; object ones, not so much.
Performance. Relational databases have been proven to scale far better.
Maturity. SQL is a powerful, long-developed language.
Vendor support. You can pick between many more first-party (SQL servers) and third-party (administrative interfaces, mappings and other kinds of integration) tools than is the case with OODBMSs.
Naturally, the object-oriented model is more familiar to the developer and, as you point out, would spare one the ORM. But thus far, the relational model has proven to be the more workable option.
See also the recent question, Object Orientated vs Relational Databases.
I've been using db4o which is an OODB and it solves most of the cons listed:
Familiarity - Programmers know their language better than SQL (see Native queries)
Performance - this one is highly subjective but you can take a look at PolePosition
Vendor support and maturity - can change over time
Cannot be used by programs that don't also use the same framework - There are OODB standards and you can use different frameworks
Versioning is probably a bit of a bitch - Versioning is actually easier!
The pros I'm interested in are:
Native queries - db4o lets you write queries in your statically typed language, so you don't have to worry about mistyping a string and finding data missing at runtime.
Ease of use - Defining business logic in the domain layer, the persistence layer (mapping) and finally the SQL database is certainly a violation of DRY. With an OODB you define your domain where it belongs.
I agree - OODBs have a long way to go, but they are getting there. And there are domain problems out there that are better solved by an OODB.
One objection to object databases is that it creates a tight coupling between the data and your code. For certain apps this may be OK, but not for others. One nice thing that a relational database gives you is the possibility to put many views on your data.
Ted Neward explains this and a lot more about OODBMSs a lot better than this.
It has nothing to do with performance. That is to say, basically all applications would perform better with an OODB. But that would also put lots of DBAs out of work or force them to learn a new technology. Even more people would be out of work correcting errors in the data. That's unlikely to make OODBs popular with established companies. Gavin seems to be totally clueless; a better link would be Kirk
Cons:
Cannot be used by programs that don't also use the same framework for accessing the data store, making it more difficult to use across the enterprise.
Fewer resources available online for non-SQL-based databases.
No compatibility across database types (can't swap to a different db provider without changing all the code).
Versioning is probably a bit of a bitch. I'd guess adding a new property to an object isn't quite as easy as adding a new column to a table.
Sören, all of the reasons you stated are valid, but I see the problem with OODBMS is the logical data model. The object-model (or rather the network model of the 70s) is not as simple as the relational one, and is therefore inferior.
jodonnel, I don't see how the use of object databases couples application code to the data. You can still abstract your application from the OODB by using a Repository pattern, and replace it with an ORM-backed SQL database if you design things properly.
For an OO application, an OO database will provide a more natural fit for persisting objects.
What's probably true is that you tie your data to your domain model, but then that's the crux!
Wouldn't it be good to have a single way of looking at data, business rules and processes using a domain-centric view?
So, a big pro is that an OODB matches how most modern, enterprise-level object-oriented software applications are designed: there is no extra effort to design a data layer using a different (relational) design. It is cheaper to build and maintain, and in many cases generally higher-performing.
Cons: just a general lack of maturity and adoption, I reckon.
