why are multiple DBs actually needed? - database

I was looking at godaddy.com which says they offer up to 10 MySQL DBs, but I don't know why you would need more than 1 ever since a DB can have mutliple tables. Can't multiple DBs be integrated into a single DB? Is there an example case where its better or unfeasible to not have multiple ones? And how do you differentiate between them when you want to call them, from their directory or from a name?
Best,

I guess separation of concerns would be the most obvious answer. In the same way you can have all of your functionality in one humongous class in object oriented programming, it's a good idea to keep non-related information separate. It's easier to wrap your head around smaller chunks of data, and future developers mights start to think tables are related, and aggregate data in a way they were never meant to.

Imagine that you're doing two different projects with two different teams. Maybe you won't one team to access the other team tables.
There can also be a space limit in each database, and It each one can be configured with specific params to optimize the performance.
In other hand, two final users can be assigned to make the backups of each entire database, and you wan`t one user to make the backup of the other DB because he could be able to restore the database in other place and access the first database data.

I'm sure there are some pretty good DBAs on the forum who can answer this in detail.
Storing tables in different databases makes because you are able to backup them up individually. Furthermore, you will be able to control access to each database under different NT groups (e.g. Admin vs. users). Although this can be done at the indvidual table level, sometimes it makes sense to grant or deny access to an entire database to a particular group.
When you need to call them in SQL Server you need to append the database name to the query like this SELECT * FROM [MyDatabase].[dbo].[MyTable].

One other reason to use separate databases relates to whether you need full transactional recovery or not. For instance, if I havea bunch of tables that are populated on a schedule through import processes and never by the users, putting them in a separate database allows me to set the recovery mode to simple which reduces the logging (a good thing when you are loading millions of records at once). I can also not do transactional log backup every fifteen minutes like I do for the data in the database with the user inserted data. It could also make recovery a faster process when needed as the databases would be smaller and thus individally take less time to recover. Won't help much when the whole server crashes but it could help a lot if onely one datbase gets corrupted for some reason. If the data relates to different applications, it simplifies the security as well to have the data in separte databases. And of course sometimes we have commercial databases and we can;t add tables to those and so may need a separate database to handles some things we want to add to that data (we do this for instance with our Project Management software, we have a spearate database where we extract and summarize data from the PM system for reporting and then write all our custome reports off that.)

Related

Does the location of a stored procedure affect its performance?

I have a large application which contains "modules" such as Finance, HR, Sales, Customer Service.
To make the application manageable and to distribute the load, I have decided to give each module its own database on a single server. There is also going to be a Master database for holding master information such as information about users, some global lookup tables, and some security stuff.
I am now trying to decide whether to place module-specific stored procedure in their corresponding database, or whether to keep them all in the Master database. For example there is a stored procedure named dbo.sales_customer_orders that selects data from only the Sales database tables. And of course this SP is going to be executed a lot of times by users. Therefore should it be in the Sales database or will it be okay to keep it in the Master database in terms of performance/scalability/reliability/security.
Does it matter that a stored procedure resides in a different database to the one its selecting from?
In my experience you would not experience an immediate performance penalty by sharding the data across multiple databases and this is actually a common practice in large n-tier applications. You would obviously experience some minor penalty upon moving the databases to different servers.
You could see this blog post as well as several others on the site which talk about the correct way to shard data as well as the importance of using multiple connection string's for reads and writes to facilitate scaling and possibly caching layer later on.
How do you actually plan to develop all these databases? If you want to use SSDT, you will be drowned in all those cross-database dependencies. Besides, your procedure in question being resided in the head database makes no sense if, for example, some particular customer decided not to buy the Sales module (and there is no Sales database anywhere around). In this case, calling it will lead to some very unpleasant and unexpected consequences, such as batch being aborted and (possibly) transaction left open.
Keep similar things together; otherwise, there will be no modularity in your approach.
Performance-wise, usually there is no difference for cross-database calls within the same SQL Server instance. If your shards are located on different instances, however, the result might be anywhere between "slightly noticeable" to "detrimental" - it depends on many factors, and not all of them can be mitigated by a DBA.

SQL Server - robust protection of client data (multi-tenancy)

We are considering using a single SQL Server database to store data for multiple clients. We feel having all the data in one database could make things more manageable than a "separate db per client" setup.
The biggest concern we have is accidental access to the wrong client. It would be very, very bad if we were to ever accidentally show one client's data to another client. We perform lots of queries, and are afraid of a scenario where someone says "write me a query of this and this to go show the client for the meeting in 15 minutes." If someone is careless and omits the WHERE clause that filters for the correct client then we would be in serious trouble. Is there a robust setup or design pattern for SQL Server such that it makes it impossible (or at least very difficult) to accidently pull the wrong client's data from a single "global" database?
To be clear, this is NOT a database that the clients use directly or via apps (yet). We are talking about a database accessed by several of our programmers and we are afraid of screwing up ourselves.
At the very minimum, you should put the client data in separate schemas. In SQL Server, schemas are the unit of authorization. Only people authorized for a given client should be able to see that client's data. In addition to other protections, you should be using the built-in authorization capabilities of the database.
Right now, it sounds like you are in a situation where a very small group of people are the ones accessing all the data for everyone. Well, if you are successful, then you will probably need more people in the future. In fact, you might be giving some clients direct access to the data. If it is their data, they will want apps running on it.
My best advice, if you are planning on growing, is to place each client's data in a separate database. I would architect the system so this database can be on a remote server. If it needs to synchronize with common data, then develop a replication strategy for moving that data around.
You may think it is bad to have one client see another client's data. From the business perspective, this is deadly -- like "company goes out of business, no job" deadly. Your clients are probably more concerned about such confidentiality than you are. And, an architecture that ensures protection will make them more comfortable.
Multi-Tenant Data Architecture
http://msdn.microsoft.com/en-us/library/aa479086.aspx
here's what we do (mysql unfortunately):
"tenant" column in each table
tables are in one schema [1]
views are in another schema (for easier security and naming). view must not include tenant column. view does a WHERE on the tenant based on current user
tenant value is set by trigger on insert, based on the user
Assuming that all your DDL is in .sql files under source control (which it should be), then having many databases or schemas is not so tough.
[1] a schema in mysql is called a 'database'
You could set up one inline table valued function for each table that takes a required parameter #customerID and filters that particular table to the data of this customer. If the entire app were to use only these TVP's the app would be safe by construction.
There might be some performance implications. The exact numbers depend on the schema and queries. They can be zero, however, as inline TVP's are inlined and optimized together with the rest of the query.
You can limit access to data only via storedprocedures with obligatory customerid parameter.
If you allow you IT build views sooner or later someone forget this where clause as you said.
But a schema per client with already prefiltered views will enable selfservice and extra Brings value i guess.

Use one large database or use single databases per customer

Currently I'm working on a on-line webapplication for construction materials. Companies can log in on our website and then they can use the webapp.
From the beginnen the idea was to create a database per customer. But now it's becomming larger and larger (100+) so we have now 100 databases to manage.
We have to run approx. twice a year an update script for db maintanance.
The advantage that I see, is that when a customer wants to quit, we delete their database and than it's finished.
When I want to add new customer, I have to fill the database with approx. 1.000.000 unique records for that specific customer, because every customer have different prices /materials.
For backups I use a MySQL Dump script, that creates a *.sql file per database that I download every day.
What is your opnion and what do you think?
One large db or per customer a database?
I'm using MySQL with ASP.NET/C#...
I don't want to make a suggestion because there are far too many variables.
I do want to note, however, that my employer has 1000s of deployed databases -- we use one database per customer with replication (2+ databases).
So, the idea is workable. My job isn't related to DB management but I do recall that we do a lot in the way of automation and online tools. Backups and DB management is handled by a team.
Ultimately, you can make the 100+ deployments work but you are going to want to start investing in the development of utility and tools to help automate the backup and/or management of the DBs.
Ideally, nothing (DB Management) should be done by hand. Furthermore, the connection strings should be abstracted away from a given web app deployment.
But now it's becomming larger and larger (100+) so we have now 100 databases to manage
I think you have your answer right there.
Have to agree with #Hogan - the overhead of managing that many databases is probably far from ideal - especially if you ever need to make schema changes, etc. in the future.
That said, if you use a single database are you ever likely to need to separate out a given customer's data into a standalone database/site? If this is likely, how long would it take to carry out this separation?
In essence, if it's likely to take less effort to write a set of tools to handle the above case, then I'd be tempted to go for the single database approach. However, you'll also need to factor in the likely timescales for creating a unified version of the database schemas that handle datasets for each customer, etc.
Also, are the schemas precisely the same for all of the existing 100+ databases? If not, there's potentially a world of pain if you decide to migrate the existing data into a single database.
Update - Incidentally, all of the above is a bit generalised, but it's hard to be specific without knowing more about the amount of data, and traffic, etc. in use. (e.g.: If you ever had a high demand site for a customer it would be trivial to put it onto its own DB server if you were using a per-customer database.)
i agree with #Hogan and #middaparke... if the schemas are the same, you shuol dput it in one instance.
unfortuantely it is impossible to tell from here if your schemas would benefit from reusing most of those million rows or not, if normalized well, the ncertinly it would be beneficial.
it is also impossible to tell how difficult any changes to the applications would be based on this change.
unfortunately, it sounds like you have a large customer base with working applications, and therefore momentum to keep going in that direction - which thros you into the realm of sucking it up and dealing with it by automating the management of so many db's... not the way you would do it from scratch - but maybe cheapest since you are where you are.

What should I keep in mind if I wish to merge many DBs into one DB?

I am working with a half dozen DBs. The DBs all have the same schemas, the same SPs, etc. Speaking to the person who originally designed the DBs, a big part of the motivation for using many DBs was efficiency; the alternative would be to add a column to pretty much every table and sp in the database indicating which set of data was being worked in, resulting in one giant (and thus slower) DB instead of several samll DBs. In place of having a column to indicate which set of data is being queried, the connection string is used to select which database is being hit.
The only reason I really dislike this organization is that it involves a lot of code duplication and thus hurts maintenance. For example, every time I wish to change a stored procedure, I need to run the alter statement on every database.
One solution I have considered is to combine all of the data into one big database, adding an extra column all over the place to indicate which database the data would be in if I had not combined it. Then, I could partition all of the tables by this column's value. In theory, the result of all of this is that the underlying representation of all of the data itself will be morally the same as it is now, but without the redundancies in the indexes, schemas, SPs, etc.
My questions are this:
Is this a good idea? Is there a better way to accomplish this?
Are there any gotchas in doing this?
Will this have any impact on performance?
Everyone will deal with this at some point. My own personal opinion is that multiple databases are a pain in the backside and are not faster. They are a pain because of the maintenance headaches. Adding an extra column in each table as necessary will not slow your process done that much, if indexing is set properly. And your maintenance will be much easier. Plus, doing transactions across multiple DB's can be a hassle and involve MTC.
BTW, using a single database is often called a multi-tenant database. You might want to research this a bit. But I would avoid multiple DB's like this if possible.
I'm of a different mind than Randy.
The multi-tenant model has its advantages.
For one, maintenance is not really much different whether you have 5 databases or 500. At some point you stop looking at maintenance of individual databases and look at the set. Yes you must serialize backups and you can't be performing index reorg/rebuild across all databases at once.
But for code changes across multiple more-or-less identical databases, there are easy ways to script a lot of things to be done to multiple databases without really lifting an extra finger. I use a tool called SQLFarms Combine (now sold by JNetDirect), but there are other offerings such as RedGate MultiScript that I haven't played with.
What I like most about the multi-tenant model is that when you grow and scale and suddenly need a new database server, it is very easy to move one of the tenants (say, the busiest or fastest growing) to the new server. If everybody is jammed into the same database, this extraction of only their data becomes quite difficult, especially if there is to be minimized downtime. In the multi-tenant model, you can set up mirroring for just their database, and then switch the primary when you're ready.
I'd be in favor of combining these databases. There are other facilities built into SQL Server to account for the potential performance downfalls of a very large database, like additional indexing on a second physical disk, partitioning, clustering, etc. The headache and overhead involved in deploying schema updates to that many different databases can be time consuming when it's easily handled in a single database. I think SQL Server scales really well in cases like this - let the database server do what it's designed to do and provide responsive access to your data. You can focus on application design and leave the storage model to SQL Server.
Also, though this isn't mentioned above, I'd suspect that there's some level of dynamic SQL involved in the applications that use this "many database" model because you've got to switch between databases based on something you know, so it can't be hard coded into the application or in a configuration file, meaning that either connection strings or actual SQL statements have to be generated on the fly, and that can be a really big security risk (read about "SQL Injection" if you're unfamiliar with the potential risks of dynamic SQL).

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a clients locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of actual data required to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possibly options, none of them yet stand out as being the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will vary completely on how often records are updated. But the basic idea is that if a record is changed then the user can publish it to the live database. Alternatively we'll record the changes and send them across in a batch on a configurable frequency.
The amount of records we're talking are around 4000 rows per table for the core tables (product catalog) at the moment, but this is completely variable dependent on the client we deploy this to as each would have their own product catalog, ranging from 100's to 1000's of products. To clarify, each client is on a separate local/hosted database combination, they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.
Well, I'm biased. I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases It will also do it from the command line once you have got a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.
How much data are you talking about? how many 'client' dbs are there? and how often does it need to happen? The answers to those questions will make a big difference on the path you should take.
There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.
In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.

Resources