I have a MySQL database which is accessed by web2py, but one of its tables (which has an auto-increment column labelled 'id') is also regularly altered by another script. This script frequently deletes and inserts rows, so although the integers in the 'id' column are unique and ascending, many intermediate numbers are missing. Will this cause web2py problems in the future?
Note that I only access this table through a different column, which contains a different set of unique identifiers, so I don't really need the 'id' column at all: it's only there because the docs state that web2py requires it.
Having missing values in the id field will not affect web2py at all. However, deleting or changing the id of a record while a web2py user is editing that record will result in an error, so just be careful that your web2py users are not editing records while the script is changing or deleting ids.
The accepted answer is correct, but also note that depending on your use case, you might not need the auto-incrementing integer id field in your table at all, as the DAL can handle other types of primary keys via the primarykey argument to db.define_table(). In particular, if you are working with a read-only table and any references are to/from other keyed tables, you do not need the id field. For more details, see http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#Legacy-databases-and-keyed-tables.
I am creating a database where we want to combine data from several sites into one database. I now have an issue with the unique constraint for the samplepoint table: for each site, the samplepointname must be unique. In the old system I enforced this with a unique constraint. The problem in the new system is that the siteIDs are not stored in the table with samplepoints, because these are inherited from the parent of samplepoints (projects).
Can I create a unique constraint that includes the siteID stored in its parent, or should I create a siteID field in the table itself?
I'm a bit confused by some of the phrasing of the question, so I'm going to lay out some clarifying assumptions based on what I think is my best read of it. Hopefully these assumptions actually match your situation.
In the original configuration, you had:
a single site
represented by a single pair of tables named "project" and "samplepoints"
a unique constraint over a field named "samplepointname"
a field named "siteID" in a table named "project"
it had previously been unnecessary to add "siteID" to "samplepoints" because there was only one row in "project" and that one row's single "siteID" was always implied throughout the table "samplepoints"
And in the new configuration you have the following changes:
multiple sites
one row for each site in the table "projects"
a unique value in the field "siteID" for each row in "projects"
You've stated that the field "samplepointname" must be unique within each site, but not globally. So I'm going to work with that.
Given these assumptions, you almost certainly will not merely want but need to add "siteID" to your table "samplepoints". This is because you can no longer simply read from "projects" and "samplepoints" at the same time without either joining them or adding a WHERE clause to filter down to the relevant site.
In fact, if your table "samplepoints" has already been populated without "siteID", you may well need to obtain the original tables from all of the different sites, empty the consolidated table, and repopulate it so that "siteID" correctly represents each independent site.
After you've added the new field "siteID", you'll remove the existing UNIQUE constraint on "samplepointname" and replace it with a composite unique index. If you don't remove it, all names will need to be unique across all sites rather than just within each site.
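In rough terms those steps look like this (the exact syntax, the column type, and the name of the existing constraint all depend on your database, so treat these as placeholders):

ALTER TABLE samplepoints ADD COLUMN siteID integer NOT NULL;

-- optional, but keeps the link to the parent table enforced
ALTER TABLE samplepoints ADD CONSTRAINT fk_samplepoints_site
    FOREIGN KEY (siteID) REFERENCES projects (siteID);

-- drop the old single-column constraint; "uq_samplepointname" is a placeholder name
ALTER TABLE samplepoints DROP CONSTRAINT uq_samplepointname;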
If you're simply executing commands directly, this will create that index:
CREATE UNIQUE INDEX unique_samplepointnames ON samplepoints (siteID, samplepointname);
The index name "unique_samplepointnames" is just an identifier; it can be whatever you wish, but that's my suggestion for it, as it's clear and describes the purpose.
Rather than being a constraint on the column, "UNIQUE" is here a constraint on the index. Any further options for how the index is created are just optimization.
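Equivalently, if you prefer a declared constraint over an explicitly created index, most databases accept the same rule in this form; either way an index is built behind the scenes to enforce it:

ALTER TABLE samplepoints
    ADD CONSTRAINT unique_samplepointnames UNIQUE (siteID, samplepointname);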
I have a general question. I would like to keep versions of table records. A typical case is an order that needs to remember a product and its features (especially its price). Commonly the price is saved within the order item, but that is not a good solution, and there can be many more reasons to remember the state of a record. What is the best solution for keeping versions of records? I have two ideas:
Each table tableName has a companion table tableName_log, and whenever a record in tableName changes, the original is saved to tableName_log. The problem is the foreign keys in tableName. I solved it by creating a data column in tableName_log that holds the objects referenced by the foreign keys, encoded as JSON. It is very difficult to manage.
The primary key of a record is a composite of the ID and a version number (two columns). The current record is the latest version. Old versions have to be marked with a flag (e.g. active = false) so that it is still possible to fetch just the latest version of every product from the table (sketched below). This has some other problems.
Which way is better, or is there some other way to solve this more effectively?
Edit
That solution is somewhere between my first and second example. But when the history table contains some ID (a foreign key), you store just that ID, not the whole record it refers to. Is that really not a better solution?
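For illustration, the second idea might be sketched like this (generic SQL; table and column names are only examples):

CREATE TABLE product (
    product_id integer       NOT NULL,
    version    integer       NOT NULL,
    name       varchar(200)  NOT NULL,
    price      decimal(10,2) NOT NULL,
    active     boolean       NOT NULL DEFAULT TRUE,  -- true only for the latest version
    PRIMARY KEY (product_id, version)
);

-- an order item then points at the exact version it was sold under
CREATE TABLE order_item (
    order_id   integer NOT NULL,
    product_id integer NOT NULL,
    version    integer NOT NULL,
    quantity   integer NOT NULL,
    FOREIGN KEY (product_id, version) REFERENCES product (product_id, version)
);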
I'm working on a database design, and I face a situation where notifications will be sent according to logs in three tables, each of which contains different data. The NOTIFICATIONS table should then reference these three tables, and I thought of three possible designs, each of which seems to have flaws:
Each log table will have its own unique incremented id, and the NOTIFICATIONS table will have three different columns as FKs. The main flaw in this design is that I can't create real FKs, since two of the three fields will be NULL for each row, and the query will have to "figure out" what kind of data is actually logged in this row.
The log tables will share one incrementing id sequence between them. Then I can make three OUTER JOINs with these tables when I query NOTIFICATIONS, and each row will have exactly one match. This seems at first like a better design, but I will have gaps in each log table and the flaws of option 1 still exist.
Option 1/2 + creating three notifications tables instead of one. This option will require the app to query notifications using UNION ALL.
Which option makes a better practice? Is there another way I didn't think of? Any advice will be appreciated.
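For reference, option 3 would mean the app reads notifications with something along these lines (table and column names are purely illustrative):

SELECT NotificationId, CreatedAt, 'A' AS Source FROM NotificationsA
UNION ALL
SELECT NotificationId, CreatedAt, 'B' AS Source FROM NotificationsB
UNION ALL
SELECT NotificationId, CreatedAt, 'C' AS Source FROM NotificationsC
ORDER BY CreatedAt DESC;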
I have one solution that sacrifices the referential integrity to help you achieve what you want.
You can use a GUID data type as the primary key in all three log tables. In the Notification table you then add a single foreign key column which doesn't point to any particular table. So only you know it is a foreign key; SQL Server doesn't, and it doesn't enforce referential integrity. In this column you store the GUID of the relevant log entry. That entry can be in any of the three logs, but since the primary key of all three logs is a GUID, you can always store the key in your Notification table.
You also add another column in the Notification table telling which of the three logs the GUID belongs to. Now you know exactly which row of which log table to go to in order to find this notification's info.
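A rough sketch of that layout (SQL Server syntax, with placeholder names; note that LogGuid is deliberately not declared as a real foreign key):

CREATE TABLE LogA (
    LogId   uniqueidentifier NOT NULL DEFAULT NEWID(),
    Details nvarchar(max)    NULL,    -- whatever this particular log actually stores
    PRIMARY KEY (LogId)
);

-- LogB and LogC are built the same way, each with a uniqueidentifier primary key

CREATE TABLE Notifications (
    NotificationId int              IDENTITY(1,1) PRIMARY KEY,
    LogGuid        uniqueidentifier NOT NULL,   -- key of a row in LogA, LogB or LogC; not enforced
    LogType        char(1)          NOT NULL    -- 'A', 'B' or 'C': which log table LogGuid lives in
);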
The problem is that you have three separate log tables. Instead you should have had only one log table with an extra column specifying what kind of logging it is. That way you'd have only one table, referential integrity would have stayed, and the design would have been simple.
Use one table holding notification ids. Each of the three original tables holds a subtype of notification id, with a FK on its own id referencing that table. Search for subtyping/subtables in databases; this is a standard design pattern/idiom.
(There are entities. We group them conceptually. We call the groups kinds or types. We say of a particular entity that it is a whatever kind or type of entity, or even that it "is a" whatever. We can have groups that contain all the entities of another group. Since the larger is a superset of the smaller we say that the larger type is a supertype of the smaller type, and the smaller is a subtype of the larger.)
There are idioms you can use to help constrain your tables declaratively. The main one is to have a subtype tag in the supertype table, and optionally also in the subtype tables (where each table has only one tag value).
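A sketch of the pattern (names invented for illustration): the supertype table owns the ids and the tag, each subtype table's primary key doubles as a foreign key back to it, and repeating the tag in the subtype tables pins each row to exactly one subtype:

CREATE TABLE notification (
    notification_id integer NOT NULL PRIMARY KEY,
    log_type        char(1) NOT NULL CHECK (log_type IN ('A', 'B', 'C')),
    UNIQUE (notification_id, log_type)   -- target for the composite FKs below
);

CREATE TABLE log_a (
    notification_id integer NOT NULL PRIMARY KEY,
    log_type        char(1) NOT NULL CHECK (log_type = 'A'),
    -- ... columns specific to this kind of log ...
    FOREIGN KEY (notification_id, log_type)
        REFERENCES notification (notification_id, log_type)
);

-- log_b and log_c follow the same shape with tags 'B' and 'C'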
I eventually faced two main options:
Following the last suggestion in this answer.
Choosing a less normalized structure for the database, AKA fake/no FK's. To be precise, in my case it would be my second option above with fake FK's.
I chose option #2, as a DBA I consulted pointed out that normalization decisions should be weighed against the risk of the structure actually breaking. In my case, although notifications are created based on logs, these FKs are not necessary for querying the notifications or the logs, and the app does not have to enforce this relationship in order to function properly. Thus, following option #1 may be "over-normalization".
Thanks all for your answers and comments.
My company requires that each table have a uniqueidentifier. However, most of the lookup tables have a numerical code for each entry. For example, there is a list of privileges: 10-None, 20-View, 30-Edit and so on. The code is essential and must be unique. Also, the text of the code ("None", "Edit", etc.) needs to be unique, so I already have two separate fields that each need to be unique. Now I have to add a guid column as well, making three separate fields in the table that each need to be unique. This example uses a very simple list; the code value is essential, and some of these tables equate to an enum, so I need both the code and the text. Having three separate fields, each with a unique index, seems contrary to normal table design.
Is there a more common practice to avoid this?
Thank you.
I do not know why you would want to change that setup even though you have several unique values already. We had a similar practice at my last employer where even though we had uniquely identifying fields everything had a unique identifier field.
This was a good thing: there would occasionally be errors in the program that caused duplicates to be entered erroneously in the supposedly unique fields, but never in the identifier field. This usually kept our program from crashing or returning incorrect data, as we always joined on the identifier rather than the 'unique' fields that were not supposed to have duplicates.
Indexes are placed on fields that are typically used when sorting or linking tables. For example, let's say you have a "normalized" environment of Product Orders. You would have one table for Customers, with a Customer ID and other info, and you would have one table for Orders, which would contain the Customer ID. You would index Customer ID in both tables, because you're going to frequently join those tables.
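To illustrate with that Customers/Orders example (names are made up):

-- if CustomerID is already the primary key of Customers, the first index is redundant
CREATE INDEX ix_customers_customerid ON Customers (CustomerID);
CREATE INDEX ix_orders_customerid    ON Orders    (CustomerID);

-- the join these indexes support
SELECT c.CustomerID, c.CustomerName, o.OrderDate
FROM Customers AS c
JOIN Orders    AS o ON o.CustomerID = c.CustomerID;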
As far as the unique identifier goes, you don't need to add an index to it just to keep it unique.
I have to load the data shown in the below image into my database.
For a particular row, either PartID will be NULL or GroupID will be NULL, and the other available columns refer to the non-NULL entity. I have the following three options:
To use one database table with a single unified column, say ID, which will hold both PartID and GroupID data. But in this case I won't be able to apply a foreign key constraint, as this column will contain both entities' data.
To use one database table with columns for both PartID and GroupID, each containing the respective data. For each row, one of them will be NULL, but in this case I will be able to apply foreign key constraints.
To use two database tables with a similar structure, the only difference being the PartID versus GroupID column. In this case I will also be able to apply foreign key constraints.
One thing to note here is that the table(s) will be used in import processes, importing about 30,000 rows in one go, and will also be heavily used in data retrieval operations. Also, the other columns will be used as pivot columns.
Can someone please suggest the best approach to achieve this?
I would use option 2 and add a constraint that only one can be non-null and the other must be null (just to be safe). I would not use option 1 because of the lack of a FK and the possibility of linking to the wrong table when not obeying the type identifier in the join.
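A sketch of option 2 with that extra constraint (all names and types here are assumptions, since the real columns come from your import):

CREATE TABLE ImportData (
    RowID   integer NOT NULL PRIMARY KEY,
    PartID  integer NULL,
    GroupID integer NULL,
    -- ... the other imported / pivot columns ...
    FOREIGN KEY (PartID)  REFERENCES Parts (PartID),
    FOREIGN KEY (GroupID) REFERENCES ItemGroups (GroupID),
    -- exactly one of PartID / GroupID must be present
    CHECK (   (PartID IS NULL     AND GroupID IS NOT NULL)
           OR (PartID IS NOT NULL AND GroupID IS NULL) )
);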
There is a 4th option, which is to normalize them as "items" with another (surrogate) key and two link tables which link items to either parts or groups. This eliminates NULLs. There are further problems with that approach (items might be in both again or neither without any simple constraint), so unless that is necessary for other reasons, I wouldn't generally go down that path.
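For completeness, that fourth option would look roughly like this (again with placeholder names); it is also where the both-or-neither weakness shows up:

CREATE TABLE Items (
    ItemID integer NOT NULL PRIMARY KEY   -- surrogate key the imported rows point at
);

CREATE TABLE ItemParts (
    ItemID integer NOT NULL PRIMARY KEY REFERENCES Items (ItemID),
    PartID integer NOT NULL REFERENCES Parts (PartID)
);

CREATE TABLE ItemGroupLinks (
    ItemID  integer NOT NULL PRIMARY KEY REFERENCES Items (ItemID),
    GroupID integer NOT NULL REFERENCES ItemGroups (GroupID)
);

-- nothing here stops an ItemID from appearing in both link tables, or in neither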
Option 3 could be fine - it really depends on whether these rows form a relation, i.e. data associated with a primary key. That's one huge problem I see with the data presented, the lack of a candidate key - I think you need to address that first.
IMO option 2 is the best - it's not perfectly normalized but will be the easiest to work with. 30K rows is not a lot of rows to import.
I would modify the table so it has one ID column and then add an IDType that is either "G" for Group or "P" for Part.
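Sketched out (again with placeholder names), that looks like this; as with option 1, the ID column can't be a declared foreign key, so the IDType tag is what tells you where to look:

CREATE TABLE ImportData (
    RowID  integer NOT NULL PRIMARY KEY,
    ID     integer NOT NULL,   -- a PartID or a GroupID, depending on IDType
    IDType char(1) NOT NULL CHECK (IDType IN ('P', 'G'))
    -- ... the other imported / pivot columns ...
);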