How can I restrict a column to certain values from another table without using a foreign key? - sql-server

This is for SQL Server 2008.
I have a master lookup table that looks like this:
Mstr_lookup_ID *Lookup_ID* *Lookup_Category* Lookup_Value
1 1 States CA
2 2 States NY
3 1 Airlines SWA
4 2 Airlines United
5 3 Airlines American
I have my primary key on Lookup_ID and Lookup_Category. I have a table that where I would like to restrict values in certain columns based upon the combination of Lookup_ID and Lookup_Category.
Create Table Main_Table
(State_id INT --ONLY VALUES ALLOWED SHOULD BE 1-2
Airline_id INT) --ONLY VALUES ALLOWED SHOULD BE 1-3
Is there a way to accomplish this neatly? I'd like to create a foreign key for it, but my primary key is on two columns. I'd appreciate any thoughts or suggestions.

See Five Simple Database Design Errors You Should Avoid.
What you're trying to do is their point #1.
It sounds like a great idea - at first. Having just one table instead of many ...
But it's really not, precisely because it defeats the main purpose of a lookup tables - being able to enforce referential integrity.
Don't do this - use a separate lookup table for each category and use proper referential integrity!
The only way you could make this work would be to have a INT IDENTITY on this "global" lookup table - but that ID would be "global" across all categories. Then you could reference this global lookup table based just on that one ID. The downside is that your categories won't all have nice consecutive numbers - it'll be all over the place.

So, you have a "one big lookup table"?
To create a FK (and there is no other declarative way to restrict values based on lookup table, otherwise you can use triggers) you have to add another column to the main_table ie the lookup_category column, and draw the reference using both columns to the primary key in the lookup table.

Related

Database Mapping - Multiple Foreign Keys

I want to make sure this is the best way to handle a certain scenario.
Let's say I have three main tables I will keep them generic. They all have primary keys and they all are independent tables referencing nothing.
Table 1
PK
VarChar Data
Table 2
PK
VarChar Data
Table 3
PK
VarChar Data
Here is the scenario, I want a user to be able to comment on specific rows on each of the above tables. But I don't want to create a bunch of comment tables. So as of right now I handled it like so..
There is a comment table that has three foreign key columns each one references the main tables above. There is a constraint that only one of these columns can be valued.
CommentTable
PK
FK to Table1
FK to Table2
FK to Table3
VarChar Comment
FK to Users
My question: is this the best way to handle the situation? Does a generic foreign key exist? Or should I have a separate comments table for each main table.. even though the data structure would be exactly the same? Or would a mapping table for each one be a better solution?
My question: is this the best way to handle the situation?
Multiple FKs with a CHECK that allows only one of them to be non-NULL is a reasonable approach, especially for relatively few tables like in this case.
The alternate approach would be to "inherit" the Table 1, 2 and 3 from a common "parent" table, then connect the comments to the parent.
Look here and here for more info.
Does a generic foreign key exist?
If you mean a FK that can "jump" from table to table, then no.
Assuming all 3 FKs are of the same type1, you could theoretically implement something similar by keeping both foreign key value and referenced table name2 and then enforcing it through a trigger, but declarative constraints should be preferred over that, even at a price of slightly more storage space.
If your DBMS fully supports "virtual" or "calculated" columns, then you could do something similar to above, but instead of having a trigger, generate 3 calculated columns based on FK value and table name. Only one of these calculated columns would be non-NULL at any given time and you could use "normal" FKs for them as you would for the physical columns.
But, all that would make sense when there are many "connectable" tables and your DBMS is not thrifty in storing NULLs. There is very little to gain when there are just 3 of them or even when there are many more than that but your DBMS spends only one bit on each NULL field.
Or should I have a separate comments table for each main table, even though the data structure would be exactly the same?
The "data structure" is not the only thing that matters. If you happen to have different constraints (e.g. a FK that applies to one of them but not the other), that would warrant separate tables even though the columns are the same.
But, I'm guessing this is not the case here.
Or would a mapping table for each one be a better solution?
I'm not exactly sure what you mean by "mapping table", but you could do something like this:
Unfortunately, that would allow a single comment to be connected to more than one table (or no table at all), and is in itself a complication over what you already have.
All said and done, your original solution is probably fine.
1 Or you are willing to store it as string and live with conversions, which you should be reluctant to do.
2 In practice, this would not really be a name (as in string) - it would be an integer (or enum if DBMS supports it) with one of the well-known predefined values identifying the table.
Thanks for all the help folks, i was able to formulate a solution with the help of a colleague of mine. Instead of multiple mapping tables i decided to just use one.
This mapping table holds a group of comments, so it has no primary key. And each group row links back to a comment. So you can have multiple of the same group id. one-many-one would be the relationship.

SQL Server 2008 - Database Design Query

I have to load the data shown in the below image into my database.
For a particular row, either field PartID would be NULL OR field GroupID will be NULL, and the other available columns refers to the NON-NULL entity. I have following three options:
To use one database table, which will have one unified column say ID, which will have PartID and GroupID data. But, in this case I won't be able to apply foreign key constraint, as this column will be containing both entities' data.
To use one database table, which will have columns for both PartID and GroupID, which will contain the respective data. For each row, one of them will be NULL, But in this case I will be able to apply foreign key constraint.
To use two database tables, which will have similar structure, the only difference will be the column PartID and GroupID. In this case I will be able to apply foreign key constraint.
One thing to note here is that, the table(s) will be used in import processes to import about 30000 rows in one go and will also be heavily used in data retrieve operations. Also, the other columns will be used as pivot columns.
Can someone please suggest what should be best approach to achieve this?
I would use option 2 and add a constraint that only one can be non-null and the other must be null (just to be safe). I would not use option 1 because of the lack of a FK and the possibility of linking to the wrong table when not obeying the type identifier in the join.
There is a 4th option, which is to normalize them as "items" with another (surrogate) key and two link tables which link items to either parts or groups. This eliminates NULLs. There are further problems with that approach (items might be in both again or neither without any simple constraint), so unless that is necessary for other reasons, I wouldn't generally go down that path.
Option 3 could be fine - it really depends if these rows are a relation - i.e. data associated with a primary key. That's one huge problem I see with the data presented, the lack of a candidate key - I think you need to address that first.
IMO option 2 is the best - it's not perfectly normalized but will be the easiest to work with. 30K rows is not a lot of rows to import.
I would modify the table so it has one ID column and then add an IDType that is either "G" for Group or "P" for Part.

SQL Index on two columns

Here's my simple scenario:
I've a Users table and a Locations table. ONE User can be related to MANY Locations so I've a UserLocation table which is as follows:
ID (int-Auto Increment) PK
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
Now, as ID is the PK it is Indexed by default in SQL-Server. I was a bit confused about the other two columns:
OPT 1: Shud I define an Index on both the columns like:
IX_UserLocation_UserID_LocID
OR
OPT 2: Shud I define two separate Indexes like : IX_UserLocation_UserID
& IX_UserLocation_LocID
Pardon me if both do the same - in that case pls explain. If not - which one is better and why?
You need
2 columns
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
One PK on both (UserID, LocID)
Another index on the reverse (LocID, UserID)
You may not need both indexes but it's rare
Edit, some links to other SO answers
SQL: Do you need an auto-incremental primary key for Many-Many tables?
SQL - many-to-many table primary key
Difference between 2 indexes with columns defined in reverse order
There are several things we hire the database for. One is fast information retrieval and another is declarative referential integrity (DRI).
If you requirement is that a user may be related to a given location only once then you want a unique index on UserID & LocatonID.
If your question is how to retrieve the data fast the answer is -- it depends. How are you accessing the data? If you always get the entire set of locations for a user then I would probably use a clustered non-unique index on UserID. If your access is "who is in locatin x?" then you probably want a clustered non-unique index on LocationID.
If you ask both questions you'll probably want both indexes (although you only get 1 clustered, so the 2nd index may want to use an INCLUDE to grab the other column).
Either way, you probalby don't want ID as your clustered index (the default when marking a column as PK in SSMS table designer).
HTH,
-eric
In addition to the "gbn" answer. It will depend on the Where clause. Whether you are using user or location or both
You should probably create two separate indexes. One thing that is often forgotten with foreign keys is the fact that deleting a user might cascade-delete the user-location relation in your table. If there is no index on userID, this might lead to a table-lock of your user-location relation. The same applies to deleting a location.
The best way to setup all the indexed you think you need on dev and check look at the query plans of the queries your app runs and see what indexes get read.

Database design - defining a basic many-to-one relationship

This is a basic database design question. I want a table (or multiple tables) defining relationships between customers. I want it so PrimaryCustomer can be linked to multiple SecondaryCustomers, and can have many SecondaryCustomers with the same relationship.
PrimaryCustomerID RelationshipID SecondaryCustomerID
1) If the primary key is {PrimaryCustomerID} then I can only have one linked customer of any kind.
2) If the primary key is {PrimaryCustomerID, RelationshipID}, then I can only have one linked customer for each relationship type.
3) If the primary key is {PrimaryCustomerID, RelationshipID, SecondaryCustomerID}, then I can have whatever I like, but having all columns as the primary key seems completely wrong.
What's the right way to set things up?
A third alternative might be for the key to be (PrimaryCustomerId, SecondaryCustomerId), which would make sense if only one type of relationship is permitted per pair of customers. What keys to implement should be defined by what dependencies you need to represent in the table so that the table accurately represents the reality you are modelling. There's nothing wrong in principle with compound keys or all-key tables.
Number 3 is the right way to go for this data model. Linking tables often have all the columns in a join as all they do is link to other tables.
If a customer can only be linked to one primary customer then you can use a simple recursive relationship in the customer table itself.
CustomerID as PK
PrimaryCustomerID as FK to CustomerID
Nothing wrong with No 3.
If you need to prevent reverse-relationship duplicates, you can use
ALTER TABLE CustomerRelationship
ADD CONSTRAINT chk_id CHECK (PrimaryCustomerId < SecondaryCustomerId);

Entity relationship

I have doubt in this design(er_lp). My doubt is in how to create many - to- many relationship with entities with composite keys, Secondly, in using date type as pk. Here each machines work daily for three shifts for different userDepts on one or more fields. So to keep record of working and down hours of machineries I have used shift,taskDay and machinePlate as pks. As you will see from the ER diagram, I ended up with too many pks in the link table in many places. I hesitate not to get in to trouble in coding phase
Is there a better way to do this?
Thank you !!
Dejene
See also extra information posted as a second question Entity Relationship. The material, reformatted, is:
Elaboration: Yes, 'Field ' is referring to areas of land. We have several cane growing fields at different location. It [each field?] is named and has budget.
User is not referring to individual who are working on the machine. They are departments. I used 'isDone' table to link userDept with machine. A machine can be used by several departments and many machines can work for a userDept.
A particular machine can be used for multiple tasks on a given shift. It can work for say 2 hours and can start another task on another field. We have three shifts per day, each of 8 hrs!
If I use Auto increment PK, do you think that other key are important? I don't prefer to use it!
Usually, I use auto increment key alone in a table. How can we create relationship that involves auto increment keys?
Thank you for thoughtful comment!!
You always create many-to-many relationships between two tables using a third table, the rows of which contain the columns for the primary key of each table, and the combination of all columns is the primary key of the third table. The rule doesn't change for tables with composite primary keys.
CREATE TABLE Table1(Col11 ..., Col12 ..., Col1N ...,
PRIMARY KEY(Col11, Col12));
CREATE TABLE Table2(Col21 ..., Col22 ..., Col2N ...,
PRIMARY KEY(Col21, Col22));
CREATE TABLE RelationTable
(
Col11 ...,
Col12 ...,
FOREIGN KEY (Col11, Col12) REFERENCES Table1,
Col21 ...,
Col22 ...,
FOREIGN KEY (Col21, Col22) REFERENCES Table2,
PRIMARY KEY (Col11, Col12, Col21, Col22)
);
This works fine. It does suggest that you should try and keep keys simple whenever possible, but there is absolutely no need to bend over backwards adding auto-increment columns to the referenced tables if they have a natural composite key that is convenient to use. OTOH, the joins involving the relation table are harder to write if you use a composite keys - I'd think several times about what I'm about if either composite key involved more than two columns, not least because it might indicate problems in the design of the referenced tables.
Looking at the actual ER diagram - the 'er_lp' URL in the question - the 'tbl' prefix seems a trifle unnecessary; the things storing data in a database are always tables, so telling me that with the prefix is ... unnecessary. The table called 'Machine' seems to be misnamed; it does not so much describe a machine as the duty allocated to a machine on a particular shift. I'm guessing that the 'Field' table is referring to areas of land, rather than parts of a database. You have the 'IsDone' table (again, not particularly well named) that identifies the user who worked on a machine for a particular shift and hence for a particular task. That involves a link between the Machine table (which has a 3-part primary key) and the User table. It isn't clear whether a particular machine can be used for multiple tasks on a given shift. It isn't clear whether shift numbers cycle around the day or whether each shift number is unique across days, but the presumption must be that there are, say, three shifts per day, and the shift number and date is needed to identify when something occurred. Presumably, the Shift table would identify times and other such information.
The three-part primary key on Machine is fine - but it might be better to have two unique identifiers. One would be the current primary key combination; the other would be an automatically assigned number - auto-increment, serial, sequence or whatever...
Addressing the extended information.
It is not clear to me any more what you are seeking to track. If the 'Machine' table is supposed to track what a given machine was being used for, then you probably need to do some more structuring of the data. Given that a machine can be used for different tasks on different fields during a single shift, you should think, perhaps, in terms of a MachineTasks table which would identify the (date and) time when the operation started and finished and the type of operation. For repair operations, you'd store the information in a table describing repairs; for routine operations in a field, you might not need much extra information. Or maybe that is overkill.
I'm not clear whether particular tasks are performed on behalf of multiple departments, or whether you are simply trying to note that during a single shift a machine might be used by multiple departments, but one department at a time for each task. If each task is for a separate department, then simply include the department info in the main MachineTasks table as a foreign key field.
If you decide on an auto-increment key, you still need to maintain the uniqueness of the composite key. This is the biggest mistake I see people making with auto-increment fields. It isn't quite as simple as "a table with an auto-increment key must also have a second unique constraint on it", but it isn't too far off the mark.
When you use an auto-increment key, you need to retrieve the value assigned when you insert a record into the table; you then use that value in the foreign key columns when you insert other records into the other tables.
You need to read up on database design - I'm not sure what the current good books are as I did most of my learning a decade and more ago, and my books are consequently less likely to be available still.
One good way of not getting into trouble with primary keys is to have a single field for primary key. Usually a numeric (auto incremental) column is just fine. You can still have unique keys with multiple columns.
tblWorksOn
tblMachine
tblIsDone
...seem to be the problem tables.
Its looks like you could use taskDate for the tblMachine table as the primary key. The rest can be foriegn keys.
With the changes to the tblMachine table you can then use the taskDate with the fieldNo for the tblWorksOn table and the taskDate with the userID for the tblIsDone. Use these two fields to create Composite Keys (CK)
e.g.
tblMachine
taskDate (PK)
tblWorksOn
fieldNo (CK)
taskDate (CK)
tblIsDone
userID (CK)
taskDate (CK)

Resources