DB - Is a table with just one column the right way? - database

I am trying to build a DB structure for a multi-language admin panel, and one of the entities is Meal_Plans, which will also be referenced by other tables in the design. At the moment I can't see any useful attributes besides id that won't need to be translated (even "active" won't be needed, because all Meal Plans will be active by default), so the right way of doing things should be:
TABLE Meal_Plans
  id
TABLE MealPlan_Translations
  mealplan_id
  language_code
  name
  description
  PRIMARY KEY (mealplan_id, language_code)
Is having a table with just one column legit? The alternative, referencing mealplan_id inside MealPlan_Translations from other tables, won't be correct, because mealplan_id isn't a unique value in that table.
Thanks for your help

Such a structure makes sense. It captures the concept of a MealPlan being an entity; you also keep the door open for possible future additions to the model.
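A minimal sketch of the structure from the question, assuming SQLite through Python's built-in sqlite3 module as a stand-in for the real DBMS. Meal_Orders is a hypothetical example of the "other tables" that reference a meal plan, and the translation rows are sample data. Other tables point at the one-column Meal_Plans table, whose id is unique, while the composite key allows one translation per language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE Meal_Plans (
    id INTEGER PRIMARY KEY              -- the only column: the identity of the meal plan
);
CREATE TABLE MealPlan_Translations (
    mealplan_id   INTEGER NOT NULL REFERENCES Meal_Plans(id),
    language_code TEXT    NOT NULL,
    name          TEXT    NOT NULL,
    description   TEXT,
    PRIMARY KEY (mealplan_id, language_code)
);
-- Hypothetical table standing in for the "other tables" that reference a meal plan.
CREATE TABLE Meal_Orders (
    id          INTEGER PRIMARY KEY,
    mealplan_id INTEGER NOT NULL REFERENCES Meal_Plans(id)
);
""")

plan_id = conn.execute("INSERT INTO Meal_Plans DEFAULT VALUES").lastrowid
conn.executemany(
    "INSERT INTO MealPlan_Translations VALUES (?, ?, ?, ?)",
    [(plan_id, "en", "Keto", "Low-carb plan"),
     (plan_id, "it", "Chetogenica", "Piano a basso contenuto di carboidrati")],
)
conn.execute("INSERT INTO Meal_Orders (mealplan_id) VALUES (?)", (plan_id,))

# A translation for a meal plan that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO MealPlan_Translations VALUES (999, 'en', 'Ghost', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```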
Another option would be to use a sequence only for generating MealPlan ids and to store them only in the MealPlan_Translations table. The specifics depend on the DB you're using (see, e.g., the MSSQL docs).
This option is also viable, but it doesn't allow a situation where a MealPlan doesn't have a translation (which may or may not be OK, depending on the domain you're modelling).
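With the one-column parent table, a plan can exist before it has any translation, which the sequence-only option cannot represent, and missing translations can be found with a LEFT JOIN. A sketch, again assuming SQLite via Python's sqlite3 with sample rows; untranslated_plans is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Meal_Plans (id INTEGER PRIMARY KEY);
CREATE TABLE MealPlan_Translations (
    mealplan_id   INTEGER NOT NULL REFERENCES Meal_Plans(id),
    language_code TEXT    NOT NULL,
    name          TEXT    NOT NULL,
    PRIMARY KEY (mealplan_id, language_code)
);
INSERT INTO Meal_Plans (id) VALUES (1), (2);
-- Plan 2 exists but has not been translated into any language yet.
INSERT INTO MealPlan_Translations VALUES (1, 'en', 'Keto');
""")


def untranslated_plans(conn, language_code):
    """Return ids of meal plans that have no translation in the given language."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM Meal_Plans AS p
        LEFT JOIN MealPlan_Translations AS t
               ON t.mealplan_id = p.id AND t.language_code = ?
        WHERE t.mealplan_id IS NULL
        ORDER BY p.id
        """,
        (language_code,),
    )
    return [row[0] for row in rows]
```

Putting the language filter in the ON clause rather than in WHERE is what keeps plans without a matching translation in the result.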

Related

Is there a pattern to avoid ever-multiplying link tables in database design?

We are currently scoping out a new system. Like many systems, it will need to store documents and link them to other kinds of items. In this instance a Document object can belong to a Job, or it can belong to an Item (which in turn belongs to a Job).
We could do this by having a JobId and an ItemId against a Document and leaving one or the other blank if necessary, but that's going to mean annoying conditional logic in the handling code. So, two link tables seems a better idea.
However, it is likely that we will need to link Documents to other items in the system at some point in the future. There are Company and User objects, for example, and we might want to record Documents against those. There may be more.
That would entail a proliferation of link tables which, while effective, is messy and hard to follow.
This solution is in SQL Server and will be handled in code via Entity Framework.
Are there any design principles that can allow us to hook up Document objects with a variety of other system objects as required in a neater and more flexible way?
You could store two values: the id, and the type of object to which the document is attached. It doesn't allow the use of foreign keys, but is compatible with many application development frameworks.
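A sketch of the "id plus type" approach, assuming SQLite via Python's sqlite3; the owner_type/owner_id column names, the CHECK on owner_type, and the documents_for helper are hypothetical. It also shows the foreign-key limitation: nothing verifies that owner_id exists in the table owner_type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Jobs  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Items (id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL REFERENCES Jobs(id));
CREATE TABLE Documents (
    id         INTEGER PRIMARY KEY,
    file_name  TEXT    NOT NULL,
    owner_type TEXT    NOT NULL CHECK (owner_type IN ('Job', 'Item')),
    owner_id   INTEGER NOT NULL  -- no FK possible: the target table depends on owner_type
);
INSERT INTO Jobs  VALUES (1, 'Kitchen refit');
INSERT INTO Items VALUES (10, 1);
INSERT INTO Documents VALUES (100, 'quote.pdf', 'Job',  1);
INSERT INTO Documents VALUES (101, 'photo.jpg', 'Item', 10);
""")

# Nothing stops a document from pointing at a Job that does not exist.
conn.execute("INSERT INTO Documents VALUES (102, 'lost.txt', 'Job', 999)")


def documents_for(conn, owner_type, owner_id):
    """Return the file names attached to one object, identified by (type, id)."""
    rows = conn.execute(
        "SELECT file_name FROM Documents WHERE owner_type = ? AND owner_id = ? ORDER BY id",
        (owner_type, owner_id),
    )
    return [row[0] for row in rows]
```

Supporting Company or User documents later only means allowing another owner_type value, not adding a table; the price is that owner_id is never checked.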
If you have the partitioning option then you could dedicate different partitions to different object types.
You could also have multiple tables, one for job documents and one for item documents, and get an overview of all of them with a view that combines them using UNION ALL. If you need uniqueness in that result set, you could use UUIDs for the primary keys, or add an extra column to the view that records which table each row was read from.
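A sketch of the multiple-tables-plus-view idea, assuming SQLite via Python's sqlite3; the table and view names (JobDocuments, ItemDocuments, AllDocuments) are hypothetical. Each table keeps a real foreign key, and the extra source_table column in the view tells otherwise identical ids apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Jobs  (id INTEGER PRIMARY KEY);
CREATE TABLE Items (id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL REFERENCES Jobs(id));

-- One document table per owner type, each with a real foreign key.
CREATE TABLE JobDocuments (
    id        INTEGER PRIMARY KEY,
    job_id    INTEGER NOT NULL REFERENCES Jobs(id),
    file_name TEXT    NOT NULL
);
CREATE TABLE ItemDocuments (
    id        INTEGER PRIMARY KEY,
    item_id   INTEGER NOT NULL REFERENCES Items(id),
    file_name TEXT    NOT NULL
);

-- Overview of all documents; (source_table, id) is unique even though
-- both tables number their rows independently.
CREATE VIEW AllDocuments AS
    SELECT 'JobDocuments'  AS source_table, id, file_name FROM JobDocuments
    UNION ALL
    SELECT 'ItemDocuments' AS source_table, id, file_name FROM ItemDocuments;

INSERT INTO Jobs  VALUES (1);
INSERT INTO Items VALUES (10, 1);
INSERT INTO JobDocuments  VALUES (1, 1,  'quote.pdf');
INSERT INTO ItemDocuments VALUES (1, 10, 'photo.jpg');
""")

all_docs = conn.execute(
    "SELECT source_table, id, file_name FROM AllDocuments ORDER BY source_table, id"
).fetchall()

# Unlike the "id plus type" approach, an orphan document is rejected.
try:
    conn.execute("INSERT INTO ItemDocuments VALUES (2, 999, 'orphan.txt')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

Supporting Company or User documents later would mean adding one table and one more UNION ALL branch to the view.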

Type tables and software hardcoded values

I have this DB model:
(TEXT is actually VARCHAR)
entity_group_type is not modifiable at runtime, but the development team will modify it several times in the near future to add more entries.
Now I need to retrieve all entries from entity that are of a given entity_group_type. How should the software handle this kind of query? Should I hardcode the entity_group_type _id/name in the software? If so, why do I even need this table? And which is better to hardcode, _id or name?
Or is this the wrong way to structure my data?
Thanks in advance!
Taking your questions one at a time:
How should you refer to the entity group in the software? Hard-code the id, or the name?
Refer to the entity groups in a way that makes your code the most readable. So, perhaps you use the name, or perhaps a constant that looks like the name but whose value is the id. Using a constant can avoid one join when you are looking up entities by group type, but usually this is not much of a concern.
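A sketch of both options, assuming SQLite via Python's sqlite3 and the _id/name/id_type columns used in the join query later in this thread; the EntityGroupType constants and the fruit/vegetable rows are hypothetical. The id-based query skips the join; the name-based one needs it.

```python
import sqlite3


class EntityGroupType:
    """Hypothetical constants mirroring entity_group_type rows; each value is the row's _id."""
    FRUIT = 1
    VEGETABLE = 2


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_group_type (_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE entity_group (
    _id     INTEGER PRIMARY KEY,
    name    TEXT    NOT NULL,
    id_type INTEGER NOT NULL REFERENCES entity_group_type(_id)
);
INSERT INTO entity_group_type VALUES (1, 'fruit'), (2, 'vegetable');
INSERT INTO entity_group VALUES (1, 'citrus', 1), (2, 'berries', 1), (3, 'roots', 2);
""")


def groups_by_type_id(conn, type_id):
    """The constant already holds the id, so no join is needed."""
    rows = conn.execute(
        "SELECT name FROM entity_group WHERE id_type = ? ORDER BY _id", (type_id,))
    return [row[0] for row in rows]


def groups_by_type_name(conn, type_name):
    """Resolve the readable name through entity_group_type with one join."""
    rows = conn.execute(
        """
        SELECT g.name
        FROM entity_group AS g
        JOIN entity_group_type AS t ON g.id_type = t._id
        WHERE t.name = ?
        ORDER BY g._id
        """,
        (type_name,),
    )
    return [row[0] for row in rows]
```

Since entity_group_type only changes when the development team ships new entries, keeping such constants in sync with the table is a deployment step rather than a runtime concern.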
Why do I even need that table in the DB then? Is this the wrong way to structure my data?
This is a perfectly acceptable way to structure your data. The most correct way depends on what you are doing with the data, but for most applications your structure would be correct. However, you certainly don't need that table in the database; you could instead have a "group_type" field on the entity_group table. Here are the pros and cons:
Advantages of the current structure:
Easy to add fields that describe the entity_group_type. For example, you might want to make some group types viewable only by admin users, or disabled, or whatever. If this is a possibility for the future, it pretty much requires this database structure.
Ability to have your database software enforce referential integrity, meaning the data is kept consistent between the entity_group and entity_group_type tables.
Advantages of adding a group_type field on the entity_group table:
Possibly a simpler representation in your code. For example, if you use an MVC architecture, having the extra table might require another model object in your code. That's usually not a problem, and may have advantages, but sometimes simpler is better.
When you are looking up entities by entity group type, your SQL statements will be slightly simpler, since there will be one less table/join involved.
I think in most cases your current structure comes out ahead, although it does depend on how you use the data. Unless you have a good reason to structure the data differently, I would stick with your current structure.
I think you should definitely store entity_group_type in the database, in its own table, as per your design diagram above. Storing that information in the DB makes it possible to query against that information, which adds flexibility to your design.
Once you have this information in the DB, your question seems to be whether it should be broken out into its own table or just stored as a column in the entity_group table. I think you should break it out into its own table, with a foreign key constraint from entity_group to entity_group_type.
Having entity_group_type in its own table allows for group types to be preserved even if all of the entity groups of that type are removed from the DB.
You can leverage the foreign key to ensure that the entity_group_type name is spelled consistently. Inserted entity_groups must satisfy the foreign key constraint, and so would have to either reference an existing entity_group_type having a properly spelled name, or insert an entity_group_type which will set the proper spelling for that new entity_group_type.
Use a synthetic key for your entity_group_type table, so that the entity_group_type name isn't repeated in every row that refers to it. This makes the data model more DRY, and updating an entity_group_type's name becomes a simple update to one table.
As for storing the entity_group_type in your application logic, I would suggest storing the name and looking up the id for the entity_group_type when it is needed. I think that makes the application codebase somewhat more readable, and I think I've made a compelling argument for why this information should have a representation in the database.
Given that you know the name of your entity_group_type at runtime, the correct way to find all entity_groups with that entity_group_type.name is to use a join:
SELECT entity_group._id, entity_group.name, entity_group_type.name AS groupTypeName
FROM entity_group
JOIN entity_group_type ON entity_group.id_type = entity_group_type._id
WHERE entity_group_type.name = 'someGroupName';
This would result in all entity_group information for the given entity_group_type name.
And yes, this is a good or at least possible way to store that kind of information. Just imagine that eventually the entity_group_type gets a new attribute, e.g. disabled. Then it is still easily possible to find all entity_groups which are (not) in disabled types.
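A sketch of that scenario, assuming SQLite via Python's sqlite3 with placeholder type and group names: the disabled attribute is added to entity_group_type later with ALTER TABLE, only the type table changes, and the query still needs just the one join. groups_in_disabled_types is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_group_type (_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE entity_group (
    _id     INTEGER PRIMARY KEY,
    name    TEXT    NOT NULL,
    id_type INTEGER NOT NULL REFERENCES entity_group_type(_id)
);
INSERT INTO entity_group_type VALUES (1, 'typeA'), (2, 'typeB');
INSERT INTO entity_group VALUES (1, 'group1', 1), (2, 'group2', 2), (3, 'group3', 1);

-- Later: the type table gains a new attribute; entity_group is untouched.
ALTER TABLE entity_group_type ADD COLUMN disabled INTEGER NOT NULL DEFAULT 0;
UPDATE entity_group_type SET disabled = 1 WHERE name = 'typeB';
""")


def groups_in_disabled_types(conn, disabled):
    """Names of entity groups whose type is (disabled=1) or is not (disabled=0) disabled."""
    rows = conn.execute(
        """
        SELECT g.name
        FROM entity_group AS g
        JOIN entity_group_type AS t ON g.id_type = t._id
        WHERE t.disabled = ?
        ORDER BY g._id
        """,
        (disabled,),
    )
    return [row[0] for row in rows]
```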

Database Is-a relationship

My problem relates to DB schema development and is as follows.
I am developing a purchasing module, which I want to use for purchasing both products and services.
Following is my EER diagram (note that Service has very few specialized attributes, at most 2).
My problem is whether to keep products and services in two tables or in just one table.
One-table option:
Reduces complexity, as I will only need to specify an item id, which refers to an item table with an "item_type" field identifying whether it's a product or a service.
Two-table option:
Won't I have to refer to products and services separately everywhere I use them, and keep an "item_type" field in every table that refers to either one?
Currently planning to use option 1, but want to know expert opinion on this matter. Highly appreciate your time and advice. Thanks.
I'd certainly go for the "two tables" option. You see, you have to distinguish products and services, so you may either use switch(item_type) { ... } in your program or have entirely distinct code paths for Product and for Service. And if a need to update the DB schema arises, the switch is harder to maintain.
The second reason is NULLs. I'd advise avoiding them as much as you can; they create more problems than they solve. With two tables you can declare all fields NOT NULL and forget about NULL processing. With the one-table option, you have to write rules yourself ensuring that if item_type = product, the product-specific fields are not NULL and the service-specific ones are, and that if item_type = service, the service-specific fields are not NULL and the product-specific ones are. That's not pleasant work, and the DBMS won't derive it for you: there is no "NOT NULL IF another_field = value" column constraint in SQL, so you end up hand-writing table-level CHECK constraints (or triggers, or application code) for every combination.
Go with two tables. It's easier to support. I once saw a DB where everything, every single piece of data, went into just two tables; there were pages and pages of code to make sure that necessary fields were not NULL.
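There is no per-column conditional NOT NULL, but a table-level CHECK constraint can express these rules if you stay with one table. A sketch, assuming SQLite via Python's sqlite3, with hypothetical product-only and service-only columns (unit_weight, hourly_rate) and a hypothetical try_insert helper. Every subtype-specific column added later means editing this constraint again, which is the maintenance burden described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One-table design: every subtype-specific column must be nullable, so the
# "NOT NULL if item_type = ..." rules have to be spelled out by hand.
conn.execute("""
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    item_type   TEXT NOT NULL CHECK (item_type IN ('product', 'service')),
    name        TEXT NOT NULL,
    unit_weight REAL,   -- hypothetical product-only attribute
    hourly_rate REAL,   -- hypothetical service-only attribute
    CHECK (
        (item_type = 'product' AND unit_weight IS NOT NULL AND hourly_rate IS NULL)
     OR (item_type = 'service' AND hourly_rate IS NOT NULL AND unit_weight IS NULL)
    )
)
""")


def try_insert(row):
    """Insert (item_type, name, unit_weight, hourly_rate); return False if a CHECK fails."""
    try:
        conn.execute(
            "INSERT INTO items (item_type, name, unit_weight, hourly_rate) VALUES (?, ?, ?, ?)",
            row,
        )
        return True
    except sqlite3.IntegrityError:
        return False
```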
If I were implementing this, I would go for the two-table option. It's kind of like the first rule of schema normalization: remove multi-valued attributes. Using item_type is not recommended; once you create separate tables you don't need item_type, because you can just use the foreign key relationship.
Consider reading this article:
http://en.wikipedia.org/wiki/Database_normalization
It should help.

What would you do to avoid conflicting data in this database schema?

I'm working on a multi-user internet database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem which can lead to an inconsistent database state if exploited correctly and I am wondering how to deal with the problem.
The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.
This results in the following database schema:
**Companies:**
ID
CompanyName
**Clients:**
ID
CompanyID (not nullable)
FirstName
LastName
**Projects:**
ID
CompanyID (not nullable)
ClientID (nullable)
ProjectName
This leads to the following relationships:
Companies-Clients (1:n)
Companies-Projects (1:n)
Clients-Projects (1:n)
Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.
The problem occurs in a similar fashion all over my database schema, so I'd like to solve it in a generic way if at all possible. I had the following two ideas:
Check in the DAL for database writes that might lead to inconsistencies. This would be generic, but it requires some additional database queries before each update and create query is performed, so it will cost some performance.
Create an additional table for the clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far less than in the first case. On the other hand it is not generic, so it is easier to miss something in the long run, especially when adding more tables / dependencies to the database.
What would you do? Is there any better solution I missed?
Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.
I'd go with the latter, having one or more tables that define the allowable relationships between entities.
Note, there's no circularity in the references you have, so the title is misleading.
What you have is the possibility of conflicting data, that's different.
Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.
Remove that column and you've removed your problem.
Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name, differing from the name of the company?
Or is "client" the person at that company?
Edit: OK, with the clarification about projects without clients, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references from being made.
A simple constraint for your existing tables would be that not both the CompanyID and ClientID fields of the project row could be non-null at the same time.
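The suggested constraint, sketched in SQLite via Python's sqlite3 (add_project is a hypothetical helper). Projects.CompanyID becomes nullable so that a project references either its company directly or a client, never both; a client project's company is then reached through Clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Clients (
    ID        INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies(ID),
    FirstName TEXT,
    LastName  TEXT
);
CREATE TABLE Projects (
    ID          INTEGER PRIMARY KEY,
    CompanyID   INTEGER REFERENCES Companies(ID),   -- now nullable
    ClientID    INTEGER REFERENCES Clients(ID),
    ProjectName TEXT NOT NULL,
    -- Not both references at once, so the two paths to Companies cannot disagree.
    CHECK (CompanyID IS NULL OR ClientID IS NULL)
);
INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Clients   VALUES (10, 2, 'Jane', 'Doe');
""")


def add_project(company_id, client_id, name):
    """Try to insert a project; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Projects (CompanyID, ClientID, ProjectName) VALUES (?, ?, ?)",
            (company_id, client_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

To list every project with its company, LEFT JOIN Projects to Clients and take COALESCE(p.CompanyID, c.CompanyID). As written, the constraint still allows a project with neither reference; requiring exactly one non-NULL value would close that gap.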
If you want to keep using the table like this and avoid all the new queries, just put triggers on the table: when a user tries to insert a row with wrong data, the trigger will stop him.
Best Regards,
Iordan
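A sketch of the trigger approach from the answer above, assuming SQLite (via Python's sqlite3) and its RAISE(ABORT, ...) trigger syntax; SQL Server triggers are written differently, and the trigger names and the succeeds helper are hypothetical. The original schema is kept, and two triggers reject any insert or update where the client belongs to a different company than the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Clients (
    ID        INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies(ID)
);
CREATE TABLE Projects (
    ID          INTEGER PRIMARY KEY,
    CompanyID   INTEGER NOT NULL REFERENCES Companies(ID),
    ClientID    INTEGER REFERENCES Clients(ID),
    ProjectName TEXT NOT NULL
);

-- Reject a project whose client belongs to another (or no known) company.
CREATE TRIGGER projects_client_company_insert
BEFORE INSERT ON Projects
WHEN NEW.ClientID IS NOT NULL
 AND NEW.CompanyID IS NOT (SELECT CompanyID FROM Clients WHERE ID = NEW.ClientID)
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company');
END;

CREATE TRIGGER projects_client_company_update
BEFORE UPDATE OF CompanyID, ClientID ON Projects
WHEN NEW.ClientID IS NOT NULL
 AND NEW.CompanyID IS NOT (SELECT CompanyID FROM Clients WHERE ID = NEW.ClientID)
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company');
END;

INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Clients   VALUES (10, 2);
""")


def succeeds(sql, params=()):
    """Run one statement; return False if a trigger or constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False
```

A complete version would also need a check on Clients, in case a client's CompanyID can change after projects reference it.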
My first thought would be to create a special client record for each company with the name "No client". Then eliminate the CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If processing of such no-client records is special, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No Client" or something like that; it's too fuzzy.)
Then there would be no way to store inconsistent data so the problem would go away.
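A sketch of the "No client" approach, assuming SQLite via Python's sqlite3; the IsNoClient flag column, the sample rows, and the projects_of_company helper are hypothetical names for the flag and query described above. Projects has no CompanyID, so a project's company can only be reached through its client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Clients (
    ID         INTEGER PRIMARY KEY,
    CompanyID  INTEGER NOT NULL REFERENCES Companies(ID),
    FirstName  TEXT,
    LastName   TEXT,
    IsNoClient INTEGER NOT NULL DEFAULT 0   -- explicit flag instead of a magic name
);
-- No CompanyID here: the company is always reached through the client.
CREATE TABLE Projects (
    ID          INTEGER PRIMARY KEY,
    ClientID    INTEGER NOT NULL REFERENCES Clients(ID),
    ProjectName TEXT    NOT NULL
);
INSERT INTO Companies VALUES (1, 'Acme');
INSERT INTO Clients (ID, CompanyID, FirstName, LastName, IsNoClient)
    VALUES (1, 1, NULL, NULL, 1),          -- Acme's "No client" record
           (2, 1, 'Jane', 'Doe', 0);
INSERT INTO Projects VALUES (100, 1, 'Internal tooling'),   -- clientless project
                            (101, 2, 'Website for Jane');
""")


def projects_of_company(conn, company_id):
    """Return (ProjectName, IsNoClient) for one company's projects."""
    return conn.execute(
        """
        SELECT p.ProjectName, c.IsNoClient
        FROM Projects AS p
        JOIN Clients  AS c ON c.ID = p.ClientID
        WHERE c.CompanyID = ?
        ORDER BY p.ID
        """,
        (company_id,),
    ).fetchall()
```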
In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.
First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).
I am enforcing this by deriving all of my Linq entities from one of the following types:
SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.
MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.
Both of these types implement a "CheckReferences" method, and I call it whenever the Linq entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method, which is automatically generated for all Linq entities.
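The author's code isn't shown, so the following is only a language-neutral sketch of the MultiReferenceEntityBase idea, in Python rather than LINQ to SQL. Every class and method name in it (CompanyScoped, MultiReferenceEntity, check_references, is_consistent) is hypothetical, and dataclasses.fields stands in for the reflection the author describes. It checks in-memory objects only; in the author's setup the equivalent check runs from OnValidate before each write.

```python
from dataclasses import dataclass, fields
from typing import Optional


class InconsistentReferenceError(Exception):
    """Raised when an entity's references lead to more than one company."""


class CompanyScoped:
    """Hypothetical interface: an entity that can report the company it belongs to."""

    def owning_company_id(self) -> Optional[int]:
        raise NotImplementedError


class MultiReferenceEntity:
    """Ask every directly referenced entity which company it belongs to,
    and fail if the answers disagree."""

    def referenced_company_ids(self) -> set:
        ids = set()
        for f in fields(self):  # dataclass introspection stands in for reflection
            value = getattr(self, f.name)
            if isinstance(value, CompanyScoped):
                company_id = value.owning_company_id()
                if company_id is not None:
                    ids.add(company_id)
        return ids

    def check_references(self) -> None:
        if len(self.referenced_company_ids()) > 1:
            raise InconsistentReferenceError(
                f"{type(self).__name__} references more than one company")


@dataclass
class Company(CompanyScoped):
    id: int

    def owning_company_id(self) -> Optional[int]:
        return self.id


@dataclass
class Client(CompanyScoped):
    id: int
    company: Company

    def owning_company_id(self) -> Optional[int]:
        return self.company.id


@dataclass
class Project(MultiReferenceEntity):
    id: int
    company: Optional[Company] = None
    client: Optional[Client] = None


def is_consistent(entity: MultiReferenceEntity) -> bool:
    """Run the check and report the outcome instead of raising."""
    try:
        entity.check_references()
        return True
    except InconsistentReferenceError:
        return False
```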

Database design - do I need one of two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track of whether an application uses a database or a bug tracking system, so right now I have these fields in the Applications table:
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of applications that use bug tracking" (although I guess either approach could generate this data).
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get those statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
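Because COUNT(column) ignores NULLs, the same query can also produce the percentage the question asks for. A sketch in SQLite via Python's sqlite3, with made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Databases   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE BugTracking (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Applications (
    id             INTEGER PRIMARY KEY,
    Database_ID    INTEGER REFERENCES Databases(id),    -- NULL = no database
    BugTracking_ID INTEGER REFERENCES BugTracking(id)   -- NULL = no bug tracking
);
INSERT INTO Databases   VALUES (1, 'Orders DB');
INSERT INTO BugTracking VALUES (1, 'Tracker');
INSERT INTO Applications VALUES (1, 1, 1), (2, NULL, 1), (3, NULL, NULL), (4, 1, NULL);
""")

# COUNT(column) skips NULLs, so it counts only applications that use the feature.
total, uses_db, uses_bt, pct_bt = conn.execute("""
    SELECT COUNT(*),
           COUNT(Database_ID),
           COUNT(BugTracking_ID),
           100.0 * COUNT(BugTracking_ID) / COUNT(*)
    FROM Applications
""").fetchone()
```

Multiplying by 100.0 rather than 100 keeps the division from being integer division in databases such as SQLite and SQL Server, which divide integers as integers.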
Why not get rid of the two "Uses" fields and simply let a NULL value in the _ID fields indicate that the record does not use that feature (bug tracking or database)?
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to make the field nullable and put NULL in it for entries that do not have DBs etc., but then you may have to think about foreign key constraints (although in standard SQL, a NULL foreign key value is simply not checked against the referenced table).
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 databases (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
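A sketch of this join-table design, assuming SQLite via Python's sqlite3 and reusing the Applications, Databases and BugTracking names from the question; the sample rows are made up. COUNT(DISTINCT ...) keeps an application with several bug trackers from being counted twice in the percentage, and a LEFT JOIN shows applications with zero databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Applications (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Databases    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE BugTracking  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Join tables: zero, one, or many rows per application.
CREATE TABLE ApplicationDatabases (
    application_id INTEGER NOT NULL REFERENCES Applications(id),
    database_id    INTEGER NOT NULL REFERENCES Databases(id),
    PRIMARY KEY (application_id, database_id)
);
CREATE TABLE ApplicationBugTracking (
    application_id INTEGER NOT NULL REFERENCES Applications(id),
    bugtracking_id INTEGER NOT NULL REFERENCES BugTracking(id),
    PRIMARY KEY (application_id, bugtracking_id)
);

INSERT INTO Applications VALUES (1, 'Payroll'), (2, 'Intranet'), (3, 'Reports');
INSERT INTO Databases    VALUES (1, 'HR DB'), (2, 'Audit DB');
INSERT INTO BugTracking  VALUES (1, 'Tracker');
INSERT INTO ApplicationDatabases   VALUES (1, 1), (1, 2), (3, 1);  -- Payroll uses two
INSERT INTO ApplicationBugTracking VALUES (2, 1);
""")

# Percent of applications with at least one bug tracker.
pct_bug_tracking = conn.execute("""
    SELECT 100.0 * COUNT(DISTINCT abt.application_id)
           / (SELECT COUNT(*) FROM Applications)
    FROM ApplicationBugTracking AS abt
""").fetchone()[0]

# Number of databases per application, including applications with none.
databases_per_app = dict(conn.execute("""
    SELECT a.name, COUNT(ad.database_id)
    FROM Applications AS a
    LEFT JOIN ApplicationDatabases AS ad ON ad.application_id = a.id
    GROUP BY a.id, a.name
""").fetchall())
```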
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though database people might consider it evil) is to default them to 0 and add a row with ID 0 and the name "None" to both the BugTracking and Databases tables. When you do the reports, you'll have to do some more work, unless you present the "None" values as they are, with their own neat percentage.
To answer the edited question:
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).
