race condition in logic for database - database

I have a database server.
The application logic is that it will query to see if a particular row exist, if not, it will insert a new row. The query is done in Java and container managed transactions.
So with 2 application server running the same code, is it possible for both servers to check the row don't exist, and both insert the row. (the insert will be successful due to another unique auto-number primary key column)
how do we ensure there is only one and only one unique row for that data ?
sknaht

the insert will be successful due to another unique auto-number primary key column
Most DBMSes offer a way to create a "unique key" or "unique index" that enforces uniqueness of a given column (or set of columns) even if it's not the primary key. The second insert would then fail, just as if it had violated a primary-key constraint.
You haven't indicated what DBMS you're using, but most (all?) of the common ones have this feature; for example, PostgreSQL, MySQL, SQL Server, and Oracle all do.

Related

SQL Server re-uses the same IDENTITY Id twice

I hope the question is not too generic.
I have a table Person that has a PK Identity column Id.
Via C#, I insert new entries for Person and the Id get set to 1,2,3 for the 3 persons added.
Also via C#, I perform all deletions of the persons with Id=1,2,3 so that there's no Person in the Table anymore.
Afterwards, I run some change scripts (I can't post them as they are too long) also on Table Person.
I don't do any RESEED.
Now the fun:
If I call SELECT IDENT_CURRENT('Person') it shows 3 instead of 4.
If I do an insert of Person again, I get a Person with the Id 3 added instead of Id 4.
Any idea why and how this can happen?
EDIT
I think I found the explanation of my question:
While performing DB Changes via SQL Server Management Studio, The Designer creates
a temp table Tmp_Person and moves the data from Person inside there. Afterwards he performs a rename of Tmp_Person to Person. Since this is a new table the Index starts again from the beginning.
An IDENTITY property doesn't guarentee uniqueness. That's what a PRIMARY KEY or UNIQUE INDEX is for. This is covered in the documentation in the remarks section, along with other intended behaviour. CREATE TABLE (Transact-SQL) IDENTITY (Property) - Remarks:
The identity property on a column does not guarantee the following:
Uniqueness of the value - Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction - A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows
because other concurrent inserts might occur on the table. If values
must be consecutive then the transaction should use an exclusive lock
on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures -SQL Server might cache identity values for performance reasons and some of
the assigned values can be lost during a database failure or server
restart. This can result in gaps in the identity value upon insert. If
gaps are not acceptable then the application should use its own
mechanism to generate key values. Using a sequence generator with the
NOCACHE option can limit the gaps to transactions that are never
committed.
Reuse of values - For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a
particular insert statement fails or if the insert statement is rolled
back then the consumed identity values are lost and will not be
generated again. This can result in gaps when the subsequent identity
values are generated.
These restrictions are part of the design in order to improve
performance, and because they are acceptable in many common
situations. If you cannot use identity values because of these
restrictions, create a separate table holding a current value and
manage access to the table and number assignment with your
application.
Emphasis mine for this question.

Minimum number of candidate keys for a relation?

My question is, is it necessary for a relation/table in database to have a candidate key and hence a primary key? Is it possible to have a relation where a row cannot be uniquely identified by any combination of attributes?
If no, why? And if yes, then how does a DBMS make operations like search, delete etc, efficient?
Relations always have distinct tuples which means that in a Relational DBMS a table always has at least one candidate key.
SQL is a different case. SQL tables are "tuple bags", not relations. SQL tables can have duplicate rows, which is one of SQL's biggest flaws. Despite the fact that SQL supports duplicate rows the language is ill-suited to cope with them. In the presence of duplicate rows the SQL standard UPDATE and DELETE for instance have no guaranteed way to reference individual rows without resorting to some complex cursor-based operations.
Consequent problems of duplicate rows are certain inefficiencies and complexities of SQL DBMSs and a lack of orthogonality in their features. SQL DBMS engines have to use internal structures and support special features as a prerequisite in order to deal with duplicate rows. Some DBMS vendors try to get around the difficulties by disabling certain features for tables that don't have keys.
A database does not require a primary key. A table is just an unordered set of rows. Without any indexes, the only mechanism for accessing rows in a table is a full table scan (or a full partition scan, if the table is partitioned). Such operations are only efficient for very small numbers of rows.
Tables are more useful when you can refer to particular rows. Often, the best primary keys are auto incremented/identity primary keys. These are maintained by the database. In practice, all tables in a well-designed database are going to have primary keys. Here are three reasons:
Rows can be referred to by other tables.
Individual rows can be updated and deleted.
Individual rows can be selected efficiently and unambiguously.
Note: you can have indexes on a table without primary keys. And combinations of one or more columns can be made unique, even if the combination is not a primary key. The primary key itself is an index, so the inverse is not true. And all rows in a table have "row addresses" which are unique. Whether or not these are available for queries depends on the database engine.
Yes, this is possible.
Just note, that some identifier does exists behind the scenes (Example from SQL Server):
When a table is stored as a heap, individual rows are identified by
reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page
How operations will be performed?
A table scan will be needed for almost any operation:
If a table is a heap and does not have any nonclustered indexes, then
the entire table must be examined (a table scan) to find any row

Can't work around this foreign key constraint rule using TRIGGER in SQLite?

First, I want to talk a little about the Foreign key constraint rule and how helpful it is. Suppose I have two tables, a primary table with the primary column called ID, the other table is the foreign one which also has a primary column called ID. This column in the foreign table refers to the ID column in the primary table. If we don't establish any Foreign key relation/constraint between those tables, we may fall foul of many problems related to integrity.
If we create the foreign key relation for them, any changes to the ID column in primary table will 'auto' reflect to the ID column in the foreign table, changes here can be made by DELETE, UPDATE queries. Moreover, any changes to the ID in the foreign table should be constrained by the ID column in the primary table, for example there shouldn't any new value inserted or updated in the ID column of the foreign table unless it does exist in the ID column of the primary table.
I know that SQLite doesn't support foreign key constraint (with full functions as detailed above) and I have to use TRIGGER to work around this problem. I have used TRIGGER to work around successfully in one way (Any changes to the ID column in the primary table will refect to the ID column in the foreign table) but the reverse way (should throw/raise any error if there is a confict occurs, for example, there are only values 1,2,3 in the ID column of the primary table, but the value 2 in the ID column of the foreign table is updated to 4 -> not exist in the primary table -> should throw error) is not easy. The difficult is SQLite doesn't also support IF statement and RAISERROR function. If these features were supported, I could work around easily.
I wonder how you can use SQLite if it doesn't support some important features? Even working around by using TRIGGER is not easy and I think it's impossible, except that you don't care about the reverse way. (In fact, the reverse way is not really necessary if you set up your SQL queries carefully, but who can make sure? Raising error is a mechanism reminding us to fix and correct and making it work exactly without corrupting data and the bugs can't be invisible.
If you still don't know what I want, I would like to have some last words, my purpose is to achieve the full functionality of the Foreign key constraint which is not supported in SQLite (even you can create such a relationship but it's fake, not real as you can benefit from it in SQL Server, SQL Server Ce, MS Access or MySQL).
Your help would be highly appreciated.
PS: I really like SQLite because it is file-based, easy to deploy, supports large file size (an advantage over SQL Server Ce) but some missing features have made me re-think many times, I'm afraid if going for it, my application may be unreliable and corrupt unpredictably.
To answer the question that you have skillfully hidden in your rant:
SQLite allows the RAISE function inside triggers; because of the lack of control flow statements, this must be used with a SELECT:
CREATE TRIGGER check_that_id_exists_in_parent
BEFORE UPDATE OF id ON child_table
FOR EACH ROW
BEGIN
SELECT RAISE(ABORT, 'parent ID does not exist')
WHERE NOT EXISTS (SELECT 1
FROM parent_table
WHERE id = NEW.id);
END;

When having an identity column is not a good idea?

In tables where you need only 1 column as the key, and values in that column can be integers, when you shouldn't use an identity field?
To the contrary, in the same table and column, when would you generate manually its values and you wouldn't use an autogenerated value for each record?
I guess that it would be the case when there are lots of inserts and deletes to the table. Am I right? What other situations could be?
If you already settled on the surrogate side of the Great Primary Key Debacle then I can't find a single reason not use use identity keys. The usual alternatives are guids (they have many disadvatages, primarily from size and randomness) and application layer generated keys. But creating a surrogate key in the application layer is a little bit harder than it seems and also does not cover non-application related data access (ie. batch loads, imports, other apps etc). The one special case is distributed applications when guids and even sequential guids may offer a better alternative to site id + identity keys..
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
One case of not wanting an identity field would be in a one to one relationship. The secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would seem to be to satisfy an ORM.
You cannot (normally) specify values when inserting into identity columns, so for example if the column "id" was specified as an identify the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to have IDENTITY_INSERT on for that table - this is not intended to be on normally and can only be on for a maximum of 1 tables in the database at any point in time.
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception, which is snapshot configuration tables which I am tracking with an audit trigger. In this case, there is usually a logical "primary key" (usually date of the snapshot and natural key of the row - like a cost center or gl account number for which the row is a configuration record), but instead of using the natural "primary key" as the primary key, I add an IDENTITY and make that the primary key and make a unique index or constraint on the date and natural key. Although theoretically the date and natural key shouldn't change, in these tables, if a user does that instead of adding a new row and deleting the old row, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row - not the disappearance of a key and the appearance of a new one.
I recently implemented a Suffix Trie in C# that could index novels, and then allow searches to be done extremely fast, linear to the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL, and needed a structure to represent a Node in a table.
I ended up with the following structure : NodeID Character ParentID, etc, where the NodeID was a primary key.
I didn't want this to be done as an autoincrementing identity for two main reasons.
How do I get the value of a NodeID after I add it to the database/data table?
I wanted more control when it came to generating my own IDs.

MS-SQL Filling in Identity column values

I'm building an ASP.Net/MVC application using SQL 2008 Developer edition and a DB in Sql2005 compatibility mode. Using Entity Framework as DAL.
My problem is that I have a table where I'm using the integer identity column in a like an Invoice Number, that is, it always has to be unique and never reused. So using the GUID column type won't work without a substantial effort.
What I'm seeing is that the DB is filling in the gaps in the identity column. This will cause me long term problems. Is there a setting to disable this "filling in"
That sounds like something outside SQL server; SQL server does not "go back" and re-use gaps in identities unless the table's been reseeded, but even then it will blindly increment one-by-one and probably return a lot of duplicate key errors as it hits rows with existing values.
Are you sure the column is an identity? Is there anything else that might be re-assigning keys and/or turning on identity insert when creating rows?
SQL Server does not fill in the gaps of an identity field by default it will just keep going up in numbers as you insert rows.
It is possible to reset the identity back to 1 and therefore you may then see what you are describing.
Can I suggest you post some code / db structure that shows your problem and search for any code you may have that my perform an identity reseed.
Unless I am not understanding your issue correctly. If you create a primary key on your identity column, or a unique constraint, you can avoid the issue of duplicate values.
For example:
create table TableName
(
InvoiceID int identity(1,1) not null primary key
)

Resources