I am creating a database where we want to combine data from several sites into one database. I now have an issue with the unique constraint for the samplepoint table. For each site, the samplepointname must be unique. In the old system I enforced this with a unique constraint. The problem in the new system is that the siteIDs are not stored in the table with samplepoints, because they are inherited from the parent of samplepoints (projects).
Can I create a unique constraint that includes the siteID stored in its parent, or should I create a siteID field in the table itself?
I'm a bit confused by some of the phrasing of the question, so I'm going to lay out some clarifying assumptions based on what I think is my best read of it. Hopefully these assumptions actually match your situation.
In the original configuration, you had:
a single site
represented by a single pair of tables named "projects" and "samplepoints"
a unique constraint over a field named "samplepointname"
a field named "siteID" in in a table named "project"
it had previously been unnecessary to add "siteID" to "samplepoints" because there was only one row in "project" and that one row's single "siteID" was always implied throughout the table "samplepoints"
And in the new configuration you have the following changes:
multiple sites
one row for each site in the table "projects"
a unique value in the field "siteID" for each row in "projects"
You've stated that the field "samplepointname" must be unique within each site, but not globally. So I'm going to work with that.
Given these assumptions, you almost certainly will not merely want but need to add "siteID" to your table "samplepoints". Without it there is no way to tell which site a row in "samplepoints" belongs to, short of joining to "projects" or filtering with a WHERE clause, and the per-site uniqueness rule cannot even be expressed.
In fact, if your table "samplepoints" has already been populated without "siteID", you may well need to obtain the original tables from all of the different sites, empty the consolidated table, and repopulate it so that "siteID" correctly represents each independent site.
After you've added the new field "siteID", remove the existing UNIQUE constraint on "samplepointname" - you're going to replace it with a composite unique index, and if you don't remove it, all names will still have to be unique across all sites rather than just within each site.
If you're simply executing commands directly, this will create that index:
CREATE UNIQUE INDEX unique_samplepointnames ON samplepoints (siteID, samplepointname);
The index name "unique_samplepointnames" is just an identifier; it can be whatever you wish, but that's my suggestion, as it's clear and describes the purpose.
Rather than "UNIQUE" being a constraint on the column, "UNIQUE" is here a constraint on the index. Any more options to how the index is created is just optimization.
There are a lot of questions asking this, but I can't seem to find one with the specific solution.
I have a third-party database where every field is set to allow NULL. One of the columns ("Code") is a unique string ID with no duplicates.
Using Entity Framework, I'd like to add this table, telling EF to treat the column "Code" as a primary key.
I've created a view but I am not sure where to go from here.
I've seen some solutions that involve adding an extra row number to use as the primary key but I would prefer to use "Code" if possible.
Any ideas?
After some playing around, I found a read-only solution.
In the view, I modify the column to be:
SELECT ISNULL(Code, -1) AS Code
Specifying ISNULL allows EF to infer a primary key. It's not ideal, as I would like the view to be writable as well; you get this message:
Error 6002: The table/view 'KittyCat.dbo.View_GetCatDetails' does not
have a primary key defined. The key has been inferred and the
definition was created as a read-only table/view.
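For context, here is roughly what such a view could look like. The view name is taken from the error message; the underlying table and the extra columns are assumptions:

-- Hypothetical underlying table dbo.CatDetails; wrapping Code in ISNULL
-- makes the expression non-nullable, so EF infers it as the key.
CREATE VIEW dbo.View_GetCatDetails
AS
SELECT ISNULL(Code, -1) AS Code,
       Name,
       Description
FROM dbo.CatDetails;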
I have a MySQL database which is accessed by web2py, but one of its tables (which has an auto-increment column labelled 'id') is also regularly altered by another script. This script frequently deletes and inserts rows, so that although the integers in the 'id' column are unique and ascending, many intermediate numbers are missing. Will this cause web2py problems in the future?
Note that I only access this table through a different column, which contains a different set of unique identifiers, so I don't really need the 'id' column at all: it's only there because the docs state that web2py requires it.
Having missing values in the id field would not affect web2py in any way, but deleting or changing the ID of a record while you are editing that record in web2py would result in an error. So just be careful that your web2py users are not editing records while the script is changing or deleting IDs.
The accepted answer is correct, but also note that depending on your use case, you might not need the auto-incrementing integer id field in your table at all, as the DAL can handle other types of primary keys via the primarykey argument to db.define_table(). In particular, if you are working with a read-only table and any references are to/from other keyed tables, you do not need the id field. For more details, see http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#Legacy-databases-and-keyed-tables.
My company requires that each table have a uniqueidentifier column. However, most of the lookup tables have a numerical code for each entry. For example, there is a list of privileges: 10-None, 20-View, 30-Edit and so on. The code is essential and must be unique. Also, the text of the code ("None", "Edit", etc.) needs to be unique, so I already have two separate fields that each need to be unique. Now I have to add a guid column as well - three separate fields in the table that each need to be unique. This example uses a very simple list, but the code value is essential, and some of these tables equate to an enum; I need both the code and the text. Having three separate fields, each with a unique index, seems contrary to normal table design.
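For concreteness, a sketch of such a lookup table (names and types are assumptions):

-- Hypothetical privileges lookup with three independently unique fields.
CREATE TABLE Privileges (
    PrivilegeGUID uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Code          int         NOT NULL UNIQUE, -- 10, 20, 30, ...
    Description   varchar(50) NOT NULL UNIQUE  -- 'None', 'View', 'Edit', ...
);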
Is there a more common practice to avoid this?
Thank you.
I do not know why you would want to change that setup even though you already have several unique values. We had a similar practice at my last employer: even though we had uniquely identifying fields, everything also had a unique identifier field.
This was a good thing, because there were infrequent bugs in the program that caused duplicates to be entered erroneously in the supposedly unique fields, but never in the identifier field. This usually kept our program from crashing or returning incorrect data, as we always joined on the identifier rather than the 'unique' fields that were not supposed to have duplicates.
Indexes are placed on fields that are typically used for sorting or linking tables. For example, let's say you have a "normalized" environment of product orders. You would have one table for customers, with a customer ID and other info, and one table for orders, which would contain the customer ID. You would index the customer ID in both tables, because you're going to join those tables frequently.
As for unique identifiers, you don't need to index them for them to be unique.
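A minimal sketch of that customers/orders example (all names and types are assumptions):

-- Hypothetical schema illustrating where the indexes go.
CREATE TABLE Customers (
    CustomerID int PRIMARY KEY, -- the PK is indexed automatically
    Name       varchar(100) NOT NULL
);

CREATE TABLE Orders (
    OrderID    int PRIMARY KEY,
    CustomerID int NOT NULL REFERENCES Customers (CustomerID),
    OrderDate  date NOT NULL
);

-- Index the joining column on the many side to support the frequent join.
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);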
I have to load the data shown in the image below into my database.
For any particular row, either PartID or GroupID will be NULL, and the other available columns refer to the non-NULL entity. I have the following three options:
To use one database table, which will have one unified column, say ID, holding both PartID and GroupID data. In this case I won't be able to apply a foreign key constraint, as this column will contain both entities' data.
To use one database table with columns for both PartID and GroupID, each containing the respective data. For each row, one of them will be NULL, but in this case I will be able to apply foreign key constraints.
To use two database tables with a similar structure, the only difference being the column PartID versus GroupID. In this case I will be able to apply foreign key constraints.
One thing to note here is that the table(s) will be used in import processes that load about 30,000 rows in one go, and will also be heavily used in data retrieval operations. Also, the other columns will be used as pivot columns.
Can someone please suggest the best approach to achieve this?
I would use option 2 and add a constraint that exactly one of the two columns is non-null (just to be safe); see the sketch below. I would not use option 1, because of the lack of a FK and the possibility of linking to the wrong table when not obeying the type identifier in the join.
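A minimal sketch of that constraint, assuming a hypothetical table named ItemData holding the two columns from the question:

-- Enforces that exactly one of PartID and GroupID is set on every row.
ALTER TABLE ItemData
ADD CONSTRAINT CK_ItemData_PartXorGroup CHECK (
    (PartID IS NOT NULL AND GroupID IS NULL) OR
    (PartID IS NULL AND GroupID IS NOT NULL)
);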
There is a fourth option, which is to normalize them as "items" with another (surrogate) key and two link tables linking items to either parts or groups. This eliminates the NULLs. There are further problems with that approach (an item might again be linked to both, or to neither, without any simple constraint), so unless it is necessary for other reasons, I wouldn't generally go down that path.
Option 3 could be fine - it really depends on whether these rows are a relation, i.e. data associated with a primary key. That's one huge problem I see with the data presented: the lack of a candidate key. I think you need to address that first.
IMO option 2 is the best - it's not perfectly normalized but will be the easiest to work with. 30K rows is not a lot of rows to import.
I would modify the table so it has one ID column, and then add an IDType column that is either "G" for Group or "P" for Part, as sketched below.
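A sketch of that variant (the table name and key are assumptions; note that, as with option 1, a plain FK to either source table isn't possible here):

-- Hypothetical single-ID variant with a type discriminator.
CREATE TABLE ItemDataTyped (
    ID     int     NOT NULL,
    IDType char(1) NOT NULL CHECK (IDType IN ('G', 'P')), -- 'G' = Group, 'P' = Part
    PRIMARY KEY (ID, IDType)
);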
I've got a table structure I'm not really certain how best to create.
Basically I have two tables, tblSystemItems and tblClientItems. I have a third table that has a column that references an 'Item'. The problem is, this column needs to reference either a system item or a client item - it does not matter which. System items have keys in the 1..2^31 range while client items have keys in the range -1..-2^31, thus there will never be any collisions.
Whenever I query the items, I'm doing it through a view that does a UNION ALL between the contents of the two tables.
Thus, optimally, I'd like to make a foreign key reference the result of the view, since the view will always be the union of the two tables - while still keeping IDs unique. But I can't do this as I can't reference a view.
Now, I can just drop the foreign key, and all is well. However, I'd really like to have some referential checking and cascading delete/set null functionality. Is there any way to do this, besides triggers?
Sorry for the late answer; I've been struck with a serious case of weekenditis.
As for utilizing a third table to include PKs from both the client and system tables - I don't like that, as it just overly complicates synchronization and still requires my app to know about the third table.
Another issue that has arisen is that I have a third table that needs to reference an item - either system or client, it doesn't matter which. Having the tables separated basically means I need two columns, a ClientItemID and a SystemItemID, each with a nullable constraint against its own table - rather ugly.
I ended up choosing a different solution. The whole issue was with easily synchronizing new system items into the tables without messing with client items, avoiding collisions and so forth.
I ended up creating just a single table, Items. Items has a bit column named "SystemItem" that defines, well, the obvious. In my development / system database, I've got the PK as an int identity(1,1). After the table has been created in the client database, the identity key is changed to (-1,-1). That means client items go in the negative while system items go in the positive.
For synchronization I basically ignore anything with (SystemItem = 1) while synchronizing the rest using SET IDENTITY_INSERT ON. Thus I'm able to synchronize while completely ignoring client items and avoiding collisions. I'm also able to reference just a single "Items" table, which covers both client and system items. The only thing to keep in mind is to change the standard clustered key to descending, to avoid all kinds of page restructuring when the client inserts new items (client updates vs system updates run about 99%/1%).
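A sketch of that table as it would look in the client database (any columns beyond ItemID and SystemItem are assumptions):

-- Hypothetical merged table; in the client database the identity runs
-- negative, while the system/development database uses IDENTITY(1,1).
CREATE TABLE Items (
    ItemID     int IDENTITY(-1, -1) NOT NULL,
    SystemItem bit NOT NULL DEFAULT 0,
    Name       nvarchar(100) NOT NULL,
    -- Descending clustered key so the client's ever-decreasing IDs append
    -- instead of causing page restructuring.
    CONSTRAINT PK_Items PRIMARY KEY CLUSTERED (ItemID DESC)
);

-- During synchronization, system rows keep their original positive IDs:
SET IDENTITY_INSERT Items ON;
INSERT INTO Items (ItemID, SystemItem, Name)
VALUES (1, 1, N'Example system item');
SET IDENTITY_INSERT Items OFF;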
You can create a unique id (db-generated - sequence, autoinc, etc.) for the table that references items, and create two additional columns (tblSystemItemsFK and tblClientItemsFK) where you reference the system items and client items respectively - some databases allow you to have a foreign key that is nullable.
If you're using an ORM, you can even easily distinguish client items from system items based on column information alone (this way you don't need the negative identifiers to prevent ID overlap).
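A minimal sketch of that referencing table (the name ItemReferences, and the assumption that both item tables have an int PK named ItemID, are mine):

-- Hypothetical referencing table with two nullable FKs, one per source table.
CREATE TABLE ItemReferences (
    RefID            int IDENTITY(1, 1) PRIMARY KEY,
    tblSystemItemsFK int NULL REFERENCES tblSystemItems (ItemID),
    tblClientItemsFK int NULL REFERENCES tblClientItems (ItemID)
);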
With a little more background/context it is probably easier to determine an optimal solution.
You probably need a table, say tblItems, that simply stores all the primary keys of the two tables. Inserting items would require two steps, to ensure that when an item is entered into the tblSystemItems table its PK is also entered into the tblItems table.
The third table then has a FK to tblItems. In a way, tblItems is a parent of the other two item tables. To query for an item it would be necessary to JOIN tblItems, tblSystemItems and tblClientItems.
[EDIT - for the comment below] If tblSystemItems and tblClientItems control their own PKs, you can still let them do so. You would probably insert into tblSystemItems first, then insert into tblItems. When you implement an inheritance structure using a tool like Hibernate, you end up with something like this.
Add a table called Items with a PK ItemID and a single column called ItemType ("System" or "Client"), then have the ClientItems table PK (named ClientItemId) and the SystemItems PK (named SystemItemId) both also be FKs to Items.ItemId. (These relationships are zero-to-one (0..1) relationships.)
Then, in your third table that references an item, just have its FK constraint reference the ItemId in this extra (Items) table...
If you are using stored procedures to implement inserts, just have the stored proc that inserts items create a new record in the Items table first, and then, as part of the same stored proc call, use the auto-generated (identity) value from the Items table's ItemId column to insert the actual data record into either SystemItems or ClientItems (depending on which it is).
This is called "subclassing".
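A minimal sketch of that structure (column lists and types are assumptions; the key names follow the description above):

-- Items is the supertype; SystemItems and ClientItems share its key
-- in a zero-to-one relationship.
CREATE TABLE Items (
    ItemId   int IDENTITY(1, 1) PRIMARY KEY,
    ItemType varchar(6) NOT NULL CHECK (ItemType IN ('System', 'Client'))
);

CREATE TABLE SystemItems (
    SystemItemId int PRIMARY KEY REFERENCES Items (ItemId),
    SystemName   nvarchar(100) NOT NULL
);

CREATE TABLE ClientItems (
    ClientItemId int PRIMARY KEY REFERENCES Items (ItemId),
    ClientName   nvarchar(100) NOT NULL
);

-- The third table can now reference any item, system or client alike.
CREATE TABLE ItemDetails (
    DetailId int IDENTITY(1, 1) PRIMARY KEY,
    ItemId   int NOT NULL REFERENCES Items (ItemId)
);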
I've been puzzling over your table design. I'm not certain that it is right. I realise that the third table may just be providing detail information, but I can't help thinking that the primary key is actually the one in your ITEM table and the FOREIGN keys are the ones in your system and client item tables. You'd then just need to do right outer joins from Item to the system and client item tables, and all constraints would work fine.
I have a similar situation in a database I'm using. I have a "candidate key" on each table that I call EntityID. Then, if there's a table that needs to refer to items in more than one of the other tables, I use EntityID to refer to that row. I do have an Entity table to cross reference everything (so that EntityID is the primary key of the Entity table, and all other EntityID's are FKs), but I don't find myself using the Entity table very often.