SQL Server 2008 - Database Design Query - sql-server

I have to load the data shown in the below image into my database.
For a particular row, either field PartID would be NULL OR field GroupID will be NULL, and the other available columns refers to the NON-NULL entity. I have following three options:
To use one database table, which will have one unified column say ID, which will have PartID and GroupID data. But, in this case I won't be able to apply foreign key constraint, as this column will be containing both entities' data.
To use one database table, which will have columns for both PartID and GroupID, which will contain the respective data. For each row, one of them will be NULL, But in this case I will be able to apply foreign key constraint.
To use two database tables, which will have similar structure, the only difference will be the column PartID and GroupID. In this case I will be able to apply foreign key constraint.
One thing to note here is that, the table(s) will be used in import processes to import about 30000 rows in one go and will also be heavily used in data retrieve operations. Also, the other columns will be used as pivot columns.
Can someone please suggest what should be best approach to achieve this?

I would use option 2 and add a constraint that only one can be non-null and the other must be null (just to be safe). I would not use option 1 because of the lack of a FK and the possibility of linking to the wrong table when not obeying the type identifier in the join.
There is a 4th option, which is to normalize them as "items" with another (surrogate) key and two link tables which link items to either parts or groups. This eliminates NULLs. There are further problems with that approach (items might be in both again or neither without any simple constraint), so unless that is necessary for other reasons, I wouldn't generally go down that path.
Option 3 could be fine - it really depends if these rows are a relation - i.e. data associated with a primary key. That's one huge problem I see with the data presented, the lack of a candidate key - I think you need to address that first.

IMO option 2 is the best - it's not perfectly normalized but will be the easiest to work with. 30K rows is not a lot of rows to import.

I would modify the table so it has one ID column and then add an IDType that is either "G" for Group or "P" for Part.

Related

Which is better distinct or unique constraint for table in SQL Server Database?

Which is better distinct or unique constraint for table in SQL Server Database ?
Should I use
distinct
for getting records from the large table or put
unique constraint
to the field so no duplicate entry happened ?
My ultimate goal is only that , get unique data, and i know both will give me this, But if i use unique constraint to field , then It will give me sql error at a time i insert duplicate data. Is it ok ? Is it affect to server or Databases ? I am using SQL Server for this process.
They're totally different use cases.
A unique constraint is what you use if the column itself (or set of columns) must be unique according to the schema details (the data). In other words, if the data is required to be unique on that column (or column set), use a unique constraint.
For example, if you're maintaining a membership table, the member ID should be unique.
The database must protect itself from dodgy data, this is not something that should be left to well-behaved applications, since the first non-well-behaved application that comes along is going to destroy your universe.
If the data is not required to be unique (such as the town each member lives in), then you can decide to "uniquify" it in a select statement, depending on your needs:
-- Get all towns.
select distinct town from members
So, here's your solution matrix, in decreasing priority:
Does the actual data need to be unique on that column? If so, a unique constraint must be used. Otherwise, a unique constraint should not be used.
If the data does not need to be uniques, do you need to only get one row for each possible value for that data? If so, use select distinct. If not, use select on its own.
Depends.
With distinct you pay at query time, but it's simpler for the user.
With unique constraint, you pay at insert time, and the app now has to handle exceptions on duplicates, but the query is faster.
Without more info, I would go with distinct, because life is simpler and you don't lock in behaviour (next week you may need the duplicates).
UNIQUE: always take part in data INSERTION (in brief)
DISTINCT: always concern on data retrieval (in brief)
Maybe this will help you.

Database Mapping - Multiple Foreign Keys

I want to make sure this is the best way to handle a certain scenario.
Let's say I have three main tables I will keep them generic. They all have primary keys and they all are independent tables referencing nothing.
Table 1
PK
VarChar Data
Table 2
PK
VarChar Data
Table 3
PK
VarChar Data
Here is the scenario, I want a user to be able to comment on specific rows on each of the above tables. But I don't want to create a bunch of comment tables. So as of right now I handled it like so..
There is a comment table that has three foreign key columns each one references the main tables above. There is a constraint that only one of these columns can be valued.
CommentTable
PK
FK to Table1
FK to Table2
FK to Table3
VarChar Comment
FK to Users
My question: is this the best way to handle the situation? Does a generic foreign key exist? Or should I have a separate comments table for each main table.. even though the data structure would be exactly the same? Or would a mapping table for each one be a better solution?
My question: is this the best way to handle the situation?
Multiple FKs with a CHECK that allows only one of them to be non-NULL is a reasonable approach, especially for relatively few tables like in this case.
The alternate approach would be to "inherit" the Table 1, 2 and 3 from a common "parent" table, then connect the comments to the parent.
Look here and here for more info.
Does a generic foreign key exist?
If you mean a FK that can "jump" from table to table, then no.
Assuming all 3 FKs are of the same type1, you could theoretically implement something similar by keeping both foreign key value and referenced table name2 and then enforcing it through a trigger, but declarative constraints should be preferred over that, even at a price of slightly more storage space.
If your DBMS fully supports "virtual" or "calculated" columns, then you could do something similar to above, but instead of having a trigger, generate 3 calculated columns based on FK value and table name. Only one of these calculated columns would be non-NULL at any given time and you could use "normal" FKs for them as you would for the physical columns.
But, all that would make sense when there are many "connectable" tables and your DBMS is not thrifty in storing NULLs. There is very little to gain when there are just 3 of them or even when there are many more than that but your DBMS spends only one bit on each NULL field.
Or should I have a separate comments table for each main table, even though the data structure would be exactly the same?
The "data structure" is not the only thing that matters. If you happen to have different constraints (e.g. a FK that applies to one of them but not the other), that would warrant separate tables even though the columns are the same.
But, I'm guessing this is not the case here.
Or would a mapping table for each one be a better solution?
I'm not exactly sure what you mean by "mapping table", but you could do something like this:
Unfortunately, that would allow a single comment to be connected to more than one table (or no table at all), and is in itself a complication over what you already have.
All said and done, your original solution is probably fine.
1 Or you are willing to store it as string and live with conversions, which you should be reluctant to do.
2 In practice, this would not really be a name (as in string) - it would be an integer (or enum if DBMS supports it) with one of the well-known predefined values identifying the table.
Thanks for all the help folks, i was able to formulate a solution with the help of a colleague of mine. Instead of multiple mapping tables i decided to just use one.
This mapping table holds a group of comments, so it has no primary key. And each group row links back to a comment. So you can have multiple of the same group id. one-many-one would be the relationship.

SQL Index on two columns

Here's my simple scenario:
I've a Users table and a Locations table. ONE User can be related to MANY Locations so I've a UserLocation table which is as follows:
ID (int-Auto Increment) PK
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
Now, as ID is the PK it is Indexed by default in SQL-Server. I was a bit confused about the other two columns:
OPT 1: Shud I define an Index on both the columns like:
IX_UserLocation_UserID_LocID
OR
OPT 2: Shud I define two separate Indexes like : IX_UserLocation_UserID
& IX_UserLocation_LocID
Pardon me if both do the same - in that case pls explain. If not - which one is better and why?
You need
2 columns
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
One PK on both (UserID, LocID)
Another index on the reverse (LocID, UserID)
You may not need both indexes but it's rare
Edit, some links to other SO answers
SQL: Do you need an auto-incremental primary key for Many-Many tables?
SQL - many-to-many table primary key
Difference between 2 indexes with columns defined in reverse order
There are several things we hire the database for. One is fast information retrieval and another is declarative referential integrity (DRI).
If you requirement is that a user may be related to a given location only once then you want a unique index on UserID & LocatonID.
If your question is how to retrieve the data fast the answer is -- it depends. How are you accessing the data? If you always get the entire set of locations for a user then I would probably use a clustered non-unique index on UserID. If your access is "who is in locatin x?" then you probably want a clustered non-unique index on LocationID.
If you ask both questions you'll probably want both indexes (although you only get 1 clustered, so the 2nd index may want to use an INCLUDE to grab the other column).
Either way, you probalby don't want ID as your clustered index (the default when marking a column as PK in SSMS table designer).
HTH,
-eric
In addition to the "gbn" answer. It will depend on the Where clause. Whether you are using user or location or both
You should probably create two separate indexes. One thing that is often forgotten with foreign keys is the fact that deleting a user might cascade-delete the user-location relation in your table. If there is no index on userID, this might lead to a table-lock of your user-location relation. The same applies to deleting a location.
The best way to setup all the indexed you think you need on dev and check look at the query plans of the queries your app runs and see what indexes get read.

how can i have a unique column in many tables

I have ten or more(i don't know) tables that have a column named foo with same datatype.
how can i tell sql that values in all the tables should be unique.
I mean If(i have value "1" in table1) I should NOT be able to have value "1" in table2
Have a common ID's table, which these ten tables reference. That will work well in that it will ensure unique ID's, but doesn't mean you couldn't duplicate the ID's in the table if someone really wants to.
What I mean is a common ID's table ensures that you don't have duplicates for insert (by also inserting an ID into this common table), but the thing is the way to guarantee that it never happens is by building the business rules into the system or placing check constraints to cross reference the other tables (which would ensure uniqueness, but degrade performance).
The question is phrased vaguely; if you need to generate a column that's unique among several tables, use row GUIDs or a common ID generator table; if you need to enforce uniqueness (and the field values are already there), use triggers.
Generally, if you generate the values, you don't need to enforce anything. The generation logic, if done right, will take care of that. If you are inserting, say, user input, then you can and should enforce uniqueness during insertion. As a validation rule or something.
You can define the field as a GUID (or a UNIQUEIDENTIFIER in SQL server). Then it will always be unique no matter what.
How about setting a check constraint on each table, such that ID % 10 = N (where N is the table number, from 0-9). And use IDENTITY(N,10) each time.
I would suggest that possibly your design is flawed. Why are these separate tables? It ouwld be better to put them in one table with one id field and another filed to identify whatever is making these spearate tables (cusotmer id for instance). Then you can read about partioning tables if you want them to be split by customer for performance reasons.

Foreign key referencing composite table

I've got a table structure I'm not really certain of how to create the best way.
Basically I have two tables, tblSystemItems and tblClientItems. I have a third table that has a column that references an 'Item'. The problem is, this column needs to reference either a system item or a client item - it does not matter which. System items have keys in the 1..2^31 range while client items have keys in the range -1..-2^31, thus there will never be any collisions.
Whenever I query the items, I'm doing it through a view that does a UNION ALL between the contents of the two tables.
Thus, optimally, I'd like to make a foreign key reference the result of the view, since the view will always be the union of the two tables - while still keeping IDs unique. But I can't do this as I can't reference a view.
Now, I can just drop the foreign key, and all is well. However, I'd really like to have some referential checking and cascading delete/set null functionality. Is there any way to do this, besides triggers?
sorry for the late answer, I've been struck with a serious case of weekenditis.
As for utilizing a third table to include PKs from both client and system tables - I don't like that as that just overly complicates synchronization and still requires my app to know of the third table.
Another issue that has arisen is that I have a third table that needs to reference an item - either system or client, it doesn't matter. Having the tables separated basically means I need to have two columns, a ClientItemID and a SystemItemID, each having a constraint for each of their tables with nullability - rather ugly.
I ended up choosing a different solution. The whole issue was with easily synchronizing new system items into the tables without messing with client items, avoiding collisions and so forth.
I ended up creating just a single table, Items. Items has a bit column named "SystemItem" that defines, well, the obvious. In my development / system database, I've got the PK as an int identity(1,1). After the table has been created in the client database, the identity key is changed to (-1,-1). That means client items go in the negative while system items go in the positive.
For synchronizations I basically ignore anything with (SystemItem = 1) while synchronizing the rest using IDENTITY INSERT ON. Thus I'm able to synchronize while completely ignoring client items and avoiding collisions. I'm also able to reference just one "Items" table which covers both client and system items. The only thing to keep in mind is to fix the standard clustered key so it's descending to avoid all kinds of page restructuring when the client inserts new items (client updates vs system updates is like 99%/1%).
You can create a unique id (db generated - sequence, autoinc, etc) for the table that references items, and create two additional columns (tblSystemItemsFK and tblClientItemsFk) where you reference the system items and client items respectively - some databases allows you to have a foreign key that is nullable.
If you're using an ORM you can even easily distinguish client items and system items (this way you don't need to negative identifiers to prevent ID overlap) based on column information only.
With a little more bakcground/context it is probably easier to determine an optimal solution.
You probably need a table say tblItems that simply store all the primary keys of the two tables. Inserting items would require two steps to ensure that when an item is entered into the tblSystemItems table that the PK is entered into the tblItems table.
The third table then has a FK to tblItems. In a way tblItems is a parent of the other two items tables. To query for an Item it would be necessary to create a JOIN between tblItems, tblSystemItems and tblClientItems.
[EDIT-for comment below] If the tblSystemItems and tblClientItems control their own PK then you can still let them. You would probably insert into tblSystemItems first then insert into tblItems. When you implement an inheritance structure using a tool like Hibernate you end up with something like this.
Add a table called Items with a PK ItemiD, And a single column called ItemType = "System" or "Client" then have ClientItems table PK (named ClientItemId) and SystemItems PK (named SystemItemId) both also be FKs to Items.ItemId, (These relationships are zero to one relationships (0-1)
Then in your third table that references an item, just have it's FK constraint reference the itemId in this extra (Items) table...
If you are using stored procedures to implement inserts, just have the stored proc that inserts items insert a new record into the Items table first, and then, using the auto-generated PK value in that table insert the actual data record into either SystemItems or ClientItems (depending on which it is) as part of the same stored proc call, using the auto-generated (identity) value that the system inserted into the Items table ItemId column.
This is called "SubClassing"
I've been puzzling over your table design. I'm not certain that it is right. I realise that the third table may just be providing detail information, but I can't help thinking that the primary key is actually the one in your ITEM table and the FOREIGN keys are the ones in your system and client item tables. You'd then just need to do right outer joins from Item to the system and client item tables, and all constraints would work fine.
I have a similar situation in a database I'm using. I have a "candidate key" on each table that I call EntityID. Then, if there's a table that needs to refer to items in more than one of the other tables, I use EntityID to refer to that row. I do have an Entity table to cross reference everything (so that EntityID is the primary key of the Entity table, and all other EntityID's are FKs), but I don't find myself using the Entity table very often.

Resources