Is it acceptable to cross between databases? - database

I'm not sure what this practice is actually called, so perhaps someone can edit the title to more accurately reflect my question.
Let's say we have a site that stores objects of different types. Each type of object has its own database (a database of books and assorted information with its tables, a database of CDs and information with its tables, and so on). However, all of the objects have keywords and the keywords should be uniform across all objects, regardless of type. A new database with a few tables is made to store keywords, however each object database is responsible for mapping the object ID to a keyword.
Is that a good practice?

Is there a reason to have separate databases for each type of object? You would be better off using multiple tables, and joining them. For example, you may have a table GENERIC_OBJECT which holds things that are common across all types, and then a table called BOOK_OBJECT where BOOK_OBJECT.ID = GENERIC_OBJECT.ID for a given book. Another table would be CD_OBJECT where CD_OBJECT.ID = GENERIC_OBJECT.ID for a given CD. Then things like keywords that are common across all objects would be stored in the GENERIC_OBJECT table, and things that are specific to the item would go in the item's corresponding table.

By separating them into different databases, you lose:
the ability to do ACID transactions (assuming you aren't using a two-phase commit solution).
the ability to have referential integrity.
JOINs across tables.

Thomas, what you're missing in your comment responses to our concerns about referential integrity is that you can't do a foriegn key across two databases. If the two tables are in one database, then you can use foriegn key constraints to ensure that when you delete an object, anything that relies upon its object id is also deleted, and other similar things.

While it is possible to do joins across databases, I wouldn't generally split the data across databases just because they are of slightly different categories. Others have also mentioned the inability to use referential integrity across databases.
On the other hand, if each type of product has radically different front-end applications, or if you expect each database to become massively large, those might be reasons to consider leaving them in separate databases. (Although scaling isn't a problem for most modern databases).
Syntax example for cross-database joins:
SELECT *
FROM books b
INNER JOIN KeywordDB.dbo.Keywords k
ON b.keywordID = k.keywordID
In this example, you are performing the query from the local database that contains the books table, and you are joining to the other database. (This is a MS SQL syntax example)

No, it's a bad idea. By separating them into different databases, you significantly impair your ability to do JOIN queries.

It does seem a little bit too seperated but with some well designed views it could work especially if the views are simply lookups.
Why such seperation in the first place?

As everyone has mentioned this is, in general, not a good idea. However, to play devils advocate, I've seen other developers do this. I'm sure that there are some reasons that one might want accomplish this however if absolutely needed (not sure if your asking for solutions but) you might want to use some sort of synchronization to keep the data synchronized. Have all (or what is needed) of the data in both databases.
This also isn't an ideal solution, but if you must uses two different database types, this might be a better way to go about such a thing.
It could at least solve the issues that everyone has been outlining – though keep in mind that it does present a new problem… Is everything in sync?
Good Luck,Frank

If the decision about whether to use one database or two is yours, I recommend going with just one database. The data in the two tables appears closely related, judging from your question. The size and complexity doesn't seem to merit splitting into two databases.
What's your DBMS? If it's Oracle, DB2, SQL Server, or even MS Access, you shouldn't have any trouble administering a single database with keyword data and object data in logically related tables.

Related

Database table creation for large data

I am making a client management application in which I am storing the data of employee , admin and company. In the future the database will have hundreds of companies registered. I am thinking to go for the best approach to database design.
I can think of 2 approaches:
Making all tables of app separately for each company
Storing all data in app database
Can you suggest the best way to do that?
Please note that all 3 tables are linked on the basis of ids and there will be hundreds of companies and each company will have many admin and each admin will have hundreds of employee . What would be the best approach to do with security and query performance
With the partial information you provided, it look like 3 normalized tables is what you need, plus the auxiliar data like lookups and other stuff.
But when you design a database you would need to consider many more point like, security, visibility, client access methods, etc
For example if you want to ensure isolation, and don't allow users to have any visibility to other's data, you could create dynamically a schema per company, create user and access rights for each schema dynamically. Then you'll need support these stuff in the DAL, which in fact will be quite fat.
Another approach for the DAl could be exposing views that always return subsets for one company.
A big reason reason that I would suggest going for the normalized approach is that maintenance will be much easier this way.
From a SQL point of view I don't see any performance advantage having many tables or just 3, efficiency of the indexes, and smart DAL will make the difference.
The performance of the query doesn't much depends on the size of table but it depends more on the indexes you have on that table. so you need to put clustered and non clustered indexes as per your requirement and i can guarantee that up to 10 GB of data you will not face any problem
This is a classic problem shared my most web business services: for discussions of the factors involved, Google "multi-tenant architecture."
You almost certainly want to put all companies into a common set of tables: each data table should reference the company key, and all queries should join on that key, among their other criteria. This allows the best overall performance, and saves you the potential maintenance nightmare of duplicating views, stored procedures and so on hundreds of times, or of having to apply the same structural changes to hundreds of tables should you wish to add a field or a table.
To help assure that you don't inadvertently intermingle data from different customers, it might be useful to do all data access through a validated set of stored procedures (all of which take the company ID as a parameter).
Hundreds of parallel databases will not scale very well: the DB server will constantly be pushing tables and indexes out of memory to accommodate the next query, resulting in disk thrashing and poor performance, as well. There is only pain down that path.
depending on the use-cases of your application there is no "best" way.
Please explain the operations your application will provide so we can get further insight into your problem.
The data to be stored seemed to be structured so a relational database at a first glance would work out well, but stick to the point i marked above.
You have not said how this data links at all or if there are even any links between them. However, at a guess, you need 3 tables.
EmployeeTable
AdminTable
CompanyTable
Each with the required properties in there, without additional information I'm not able to provide any more guidance.

How can I organize tables in SQL*Plus?

I am practicing SQL, and suddenly I have so many tables. What is a good way to organize them? Is there a way to put them into different directories?
Or is the only option to create a tablespace as explained here?
It depends what you mean by organise - tablespaces are really focused on organising storage.
For organising tables, grouping them into different SCHEMAS may be more useful.
This is more like the concept of a 'namespace' - i.e. schema1.people is not the same as schema2.people.
It often pays off to separate Operational and Configuration data into different schemas.
If you are talking about organising tables within a schema - and in a real world application, having hundreds of tables in one schema is not unknown - then all you can really do is come up with good naming conventions.
Some places group tables with prefixes at the start of the table name. Personally, I think this leads to duplication - EMP_ADDRESSES and CUST_ADDRESSES rather than a properly linked Addresses.
It depends why you want to organise them and why (and when) you're creating them. If the number is just overwhelming when you look in, say, user_tables, then splitting into tablespaces wont help much as you'd need to specify which one you wanted to query each time. And there isn't really a 'directory' equivalent.
If you're creating practice tables just to experiment with mini projects, then one option might be to create a new Oracle user for each project and create all the related tables under that user schema. Then you'd only see relevant tables when logged in as that user, while working on that project. This has the advantage of allowing you to reuse table names, which can simplify things a bit of you're doing lots of similar projects.
You should also probably be thinking about tidying up a bit, dropping tables when you're sure you've finished that bit of experimentation.
They are allready organised because they are in a database and you have a repository.

Database alternatives?

I was wondering the trade-offs for using databases and what the other options were? Also, what problems are not well suited for databases?
I'm concerned with Relational Databases.
The concept of database is very broad. I will make some simplifications in what I present here.
For some tasks, the most common database is the relational database. It's a database based on the relational model. The relational model assumes that you describe your data in rows, belonging to tables where each table has a given and fixed number of columns. You submit data on a "per row" basis, meaning that you have to provide a row in a single shot containing the data relative to all columns of your table. Every submitted row normally gets an identifier which is unique at the table level, sometimes at the database level. You can create relationships between entities in the relational database, for example by saying that a given cell in your table must refer to another table's row, so to preserve the so called "referential integrity".
This model works fine, but it's not the only one out there. In some cases, data are better organized as a tree. The filesystem is a hierarchical database. starts at a root, and everything goes under this root, in a tree like structure. Another model is the key/value pair. Sleepycat BDB is basically a store of key/value entities.
LDAP is another database which has two advantages: stores rather generic data, it's distributed by design, and it's optimized for reading.
Graph databases and triplestores allow you to store a graph and perform isomorphism search. This is typically needed if you have a very generic dataset that can encompass a broad level of description of your entities, so broad that is basically unknown. This is in clear opposition to the relational model, where you create your tables with a very precise set of columns, and you know what each column is going to contain.
Some relational column-based databases exist as well. Instead of submitting data by row, you submit them by whole column.
So, to answer your question: a database is a method to store data. Technically, even a text file is a database, although not a particularly nice one. The choice of the model behind your database is mostly relative to what is the typical needs of your application.
Setting the answer as CW as I am probably saying something strictly not correct. Feel free to edit.
This is a rather broad question, but databases are well suited for managing relational data. Alternatives would almost always imply to design your own data storage and retrieval engine, which for most standard/small applications is not worth the effort.
A typical scenario that is not well suited for a database is the storage of large amounts of data which are organized as a relatively small amount of logical files, in this case a simple filesystem-like system can be enough.
Don't forget to take a look at NOSQL databases. It's pretty new technology and well suited for stuff that doesn't fit/scale in a relational database.
Use a database if you have data to store and query.
Technically, most things are suited for databases. Computers are made to process data and databases are made to store them.
The only thing to consider is cost. Cost of deployment, cost of maintenance, time investment, but it will usually be worth it.
If you only need to store very simple data, flat files would be an alternative (text files).
Note: you used the generic term 'database', but there are many many different types and implementations of these.
For search applications, full-text search engines (some of which are integrated to traditional DBMSes, but some of which are not), can be a good alternative, allowing both more features (various linguistic awareness, ability to have semi-structured data, ranking...) as well as better performance.
Also, I've seen applications where configuration data is stored in the database, and while this makes sense in some cases, using plain text files (or YAML, XML and such) and loading the underlying objects during initialization, may be preferable, due to the self-contained nature of such alternative, and to the ease of modifying and replicating such files.
A flat log file, can be a good alternative to logging to DBMS, depending on usage of course.
This said, in the last 10 years or so, the DBMS Systems, in general, have added many features, to help them handle different forms of data and different search capabilities (ex: FullText search a fore mentioned, XML, Smart storage/handling of BLOBs, powerful user-defined functions, etc.) which render them more versatile, and hence a fairly ubiquitous service. Their strength remain mainly with relational data however.

Prefixing database table names

I have noticed a lot of companies use a prefix for their database tables. E.g. tables would be named MS_Order, MS_User, etc. Is there a good reason for doing this?
The only reason I can think of is to avoid name collision. But does it really happen? Do people run multiple apps in one database? Is there any other reason?
Personally, I don't see any value in it. In fact, it's a bummer for intellisense-like features because everything begins with MS_. :) The Master agrees with me too.
Huge schemas often have many tables with similar, but distinct, purposes. Thus, various "segmented" naming conventions.
Darn, didn't get first post :-)
In SQL Server 2005 and above the schema feature eliminates the need for any kind of prefix. A good example of their usage can be found by reading about the Schemas in AdventureWorks.
In some older versions of SQL server, having a prefix to create a pseudo namespace might of been useful with DBs with lots of tables.
Other than that I can't really see the point.
Even when the database only contains one application, the prefixes can be useful in grouping like parts of the application together. So tables that containtain cutomer information might be prefixed with cust_, those that contain warehouse information might be prefixed with inv_ (for inventory), those that contain finacial information might be prefixed with fin_, etc.
I've worked on systems where there is an existing database for an application which was created and maintained by a different company and we've needed to add another app that uses large amounts of the same data with just a few extra tables of our own, so in that case having an app specific prefix can help with the separation.
Slightly tangentially to the original question, I've seen databases use prefixes to indicate the type of data that a table is holding. There'd be a prefix for lookup tables which are obviously pretty static in both size and content and a different prefix for the tables that contain variable data. This in turn may be broken into having one prefix for tables that are added to but not really changed like logging, processed orders, customer transactions etc, and and another for more variable data like customer balance or whatever. Link tables could also have their own prefix to separate them out too.
I have never seen a naming collision, as it usually doesn't make sense to put tables from different applications into the same database namespace. If you had some sort of reusable library that could be integrated into different applications, perhaps that might be a reason, but I haven't seen anything like that.
Though, now that I think about it, there are some cheap web hosting providers that only allow users to create a very small number of databases, so it would be possible to run a number of different applications using a single database, so long as the names didn't collide (and such a prefixing convention would certainly help).
Multiple applications using a particular table, correct. Prefixes prevent name collision. Also, it makes it rather simple to backup tables and keep them in the same database, just change the prefix and your backup will be fully functional, etc. Aside from that, it's just good practice.
Prefixes are a good way to sort out which sql objects are associated with which app when multiple apps dip into the same database.
I have also prefixed sql objects differently within the same app to facilitate easier management of security. i.e. all the objects with admin_ need this security applied and the rest need something else.
Prefixes can be handy for humans, search tools and scripts. If the situation is a simple one, however, there is probably no use for them at all.
It's most often use if several applications are sharing one database. For example, if you install Wordpress, it prefixes all tables with "wp_". This is good if you want your applications to share data very easily(sessions across all applications in your company, for example.)
There are better ways to accomplish this however, and I never prefix my table names, as each application has it's own self-contained database.

Why use database schemas?

I'm working on a single database with multiple database schemas,
e.g
[Baz].[Table3],
[Foo].[Table1],
[Foo].[Table2]
I'm wondering why the tables are separated this way besides organisation and permissions.
How common is this, and are there any other benefits?
You have the main benefit in terms of logically groupings objects together and allowing permissions to be set at a schema level.
It does provide more complexity in programming, in that you must always know which schema you intend to get something from - or rely on the default schema of the user to be correct. Equally, you can then use this to allow the same object name in different schemas, so that the code only writes against one object, whilst the schema the user is defaulted to decides which one that is.
I wouldn't say it was that common, anecdotally most people still drop everything in the dbo schema.
I'm not aware of any other possible reasons besides organization and permissions. Are these not good enough? :)
For the record - I always use a single schema - but then I'm creating web applications and there is also just a single user.
Update, 10 years later!
There's one more reason, actually. You can have "copies" of your schema for different purposes. For example, imagine you are creating a blog platform. People can sign up and create their own blogs. Each blog needs a table for posts, tags, images, settings etc. One way to do this is to add a column
blog_id to each table and use that to differentiate between blogs. Or... you could create a new schema for each blog and fresh new tables for each of them. This has several benefits:
Programming is easier. You just select the approppriate schema at the beginning and then write all your queries without worrying about forgetting to add where blog_id=#currentBlog somewhere.
You avoid a whole class of potential bugs where a foreign key in one blog points to an object in another blog (accidental data disclosure!)
If you want to wipe a blog, you just drop the schema with all the tables in it. Much faster than seeking and deleting records from dozens of different tables (in the right order, none the less!)
Each blog's performance depends only (well, mostly anyway) on how much data there is in that blog.
Exporting data is easier - just dump all the objects in the schema.
There are also drawbacks, of course.
When you update your platform and need to perform schema changes, you need to update each blog separately. (Added yet later: This could actually be a feature! You can do "rolling udpates" where instead of updating ALL the blogs at the same time, you update them in batches, seeing if there are any bugs or complaints before updating the next batch)
Same about fixing corrupted data if that happens for whatever reason.
Statistics for all the platform together are harder to calculate
All in all, this is a pretty niche use case, but it can be handy!
To me, they can cause more problems because they break ownership chaining.
Example:
Stored procedure tom.uspFoo uses table tom.bar easily but extra rights would be needed on dick.AnotherTable. This means I have to grant select rights on dick.AnotherTable to the callers of tom.uspFoo... which exposes direct table access.
Unless I'm completely missing something...
Edit, Feb 2012
I asked a question about this: SQL Server: How to permission schemas?
The key is "same owner": so if dbo owns both dick and tom schema, then ownership chaining does apply. My previous answer was wrong.
There can be several reasons why this is beneficial:
share data between several (instances
of) an application. This could be the
case if you have group of reference
data that is shared between
applications, and a group of data
that is specific for the instance. Be careful not to have circular references between entities in in different schema's. Meaning don't have a foreign key from an entity in schema 1 to another entity in schema 2 AND have another foreign key from schema 2 to schema 1 in other entities.
data partitioning: allows for data to be stored on different servers
more easily.
as you mentioned, access control on DB level

Resources