I have a table with a column of type nchar(16) that is automatically filled with random characters generated by setting the default value of the column to dbo.randomtext((16)) (a scalar function). There will be about 1M records in the table.
I know that the likelihood of getting non-unique values is low, but is there some way to ensure that this does not happen and the column is really unique?
What will happen if I define the column as UNIQUE and the random text generated is not unique?
I am using SQL Server 2016 Standard edition.
Seems like what you should really be using is a SEQUENCE, IDENTITY, or uniqueidentifier column; this smells like you are "reinventing" the wheel.
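For example, a minimal sketch of the uniqueidentifier route (table and column names are illustrative, not from your schema):

CREATE TABLE dbo.Example
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    -- NEWID() produces a value that is unique for practical purposes,
    -- so no hand-rolled random-text generator is needed.
    RandomKey UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
);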
As for the questions you ask:
I know that the likelihood of getting non-unique values is low, but is there some way to ensure that this does not happen and the column is really unique?
Yes, create a UNIQUE INDEX or UNIQUE CONSTRAINT on the column. Then every row must have a unique value in that column.
What will happen if I define the column as UNIQUE and the random text generated is not unique?
If there is already at least one duplicate in the column, then creating the INDEX/CONSTRAINT will fail; you'll need to DELETE/UPDATE any duplicates before you can create the INDEX/CONSTRAINT. If an INSERT/UPDATE would introduce a duplicate, it will fail, and the entire DML statement will not take effect (the new row(s) won't be inserted, or the row(s) won't be updated).
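A minimal sketch, assuming your table and column are named dbo.MyTable and RandomCol (the names are illustrative):

-- Fails immediately if duplicates already exist in the column.
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT UQ_MyTable_RandomCol UNIQUE (RandomCol);

-- Any later INSERT or UPDATE that produces a duplicate raises a
-- duplicate key error, and the whole statement is rolled back.
INSERT INTO dbo.MyTable (RandomCol) VALUES (N'ABCDEFGHIJKLMNOP');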
Related
I have a 'change history' table in my SQL Server DB called tblReportDataQueue that records changes to rows in other source tables.
There are triggers on the source tables in the DB which fire after INSERT, UPDATE or DELETE. The triggers all call a stored procedure that just inserts data into the change history table that has an identity column:
INSERT INTO tblReportDataQueue
(
    [SourceObjectTypeID],
    [ActionID],
    [ObjectXML],
    [DateAdded],
    [RowsInXML]
)
VALUES
(
    @SourceObjectTypeID,
    @ActionID,
    @ObjectXML,
    GETDATE(),
    @RowsInXML
)
When a row in a source table is updated multiple times in quick succession the triggers fire in the correct order and put the changed data in the change history table in the order that it was changed. The problem is that I had assumed that the DateAdded field would always be in the same order as the identity field but somehow it is not.
So my table is in the order that things actually happened when sorted by the identity field but not when sorted by the 'DateAdded' field.
How can this happen?
[screenshot of example problem]
In example image 'DateAdded' of last row shown is earlier than first row shown.
You are using a surrogate key. One very important characteristic of a surrogate key is that it cannot be used to determine anything about the tuple it represents, not even the order of creation. All systems which have auto-generated values like this, including Oracle's sequences, make no guarantee as to order, only that the next value generated will be unique from previously generated values. That is all that is required, really.
We all do it, of course. We look at a row with ID of 2 and assume it was inserted after the row with ID of 1 and before the row with ID of 3. That is a bad habit we should all work to break because the assumption could well be wrong.
You have the DateAdded field to provide the information you want. Order by that field and you will get the rows in order of insertion (if that field is not updateable, that is). The auto generated values will tend to follow that ordering, but absolutely do not rely on that!
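If you need a deterministic ordering, one option (a sketch assuming the identity column is named ID) is to use the identity only as a tie-breaker:

-- DateAdded gives the intended insertion order; the identity column
-- only breaks ties between rows sharing the same timestamp.
SELECT *
FROM tblReportDataQueue
ORDER BY DateAdded, ID;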
Try using a SEQUENCE...
"Using the identity attribute for a column, you can easily generate auto-
incrementing numbers (which as often used as a primary key). With Sequence, it
will be a different object which you can attach to a table column while
inserting. Unlike identity, the next number for the column value will be
retrieved from memory rather than from the disk – this makes Sequence
significantly faster than Identity.
Unlike identity column values, which are generated when rows are inserted, an
application can obtain the next sequence number before inserting the row by
calling the NEXT VALUE FOR function. The sequence number is allocated when NEXT
VALUE FOR is called even if the number is never inserted into a table. The NEXT
VALUE FOR function can be used as the default value for a column in a table
definition. Use sp_sequence_get_range to get a range of multiple sequence
numbers at once."
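A short sketch of that approach (sequence and table names are illustrative):

-- Create the sequence once.
CREATE SEQUENCE dbo.OrderNumbers
    START WITH 1
    INCREMENT BY 1;

-- Use it as a column default...
CREATE TABLE dbo.Orders
(
    OrderID INT NOT NULL DEFAULT (NEXT VALUE FOR dbo.OrderNumbers),
    CustomerName NVARCHAR(100) NOT NULL
);

-- ...or fetch the next number before the row is inserted.
DECLARE @NextID INT = NEXT VALUE FOR dbo.OrderNumbers;
INSERT INTO dbo.Orders (OrderID, CustomerName) VALUES (@NextID, N'Contoso');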
How can I get the last autoincrement value of a specific table right after I open the database? It's not last_insert_rowid() because there is no insertion transaction. In other words, I want to know in advance which number autoincrement will choose when inserting a new row into a particular table.
It depends on how the autoincremented column has been defined.
If the column definition is INTEGER PRIMARY KEY AUTOINCREMENT, then SQLite will keep the largest ID in an internal table called sqlite_sequence.
If the column definition does NOT contain the keyword AUTOINCREMENT, SQLite will use its ‘regular’ routine to determine the new ID. From the documentation:
The usual algorithm is to give the newly created row a ROWID that is one larger than the largest ROWID in the table prior to the insert. If the table is initially empty, then a ROWID of 1 is used. If the largest ROWID is equal to the largest possible integer (9223372036854775807) then the database engine starts picking positive candidate ROWIDs at random until it finds one that is not previously used. If no unused ROWID can be found after a reasonable number of attempts, the insert operation fails with an SQLITE_FULL error. If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero.
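If you just want to peek at the counters anyway (with the caveats below), a sketch for both cases, assuming a table named mytable:

-- AUTOINCREMENT column: the largest ID handed out so far is tracked
-- here, so the next one will normally be seq + 1.
SELECT seq FROM sqlite_sequence WHERE name = 'mytable';

-- Plain INTEGER PRIMARY KEY: the usual algorithm picks max(rowid) + 1.
SELECT IFNULL(MAX(rowid), 0) + 1 AS next_rowid FROM mytable;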
I remember reading that, for columns without AUTOINCREMENT, the only surefire way to determine the next ID is to VACUUM the database first; that will reset all ID counters to the largest existing ID for that table + 1. But I can’t find that quote anymore, so this may no longer be true.
That said, I agree with slash_rick_dot that fetching auto-incremented IDs beforehand is a bad idea, especially if there’s even a remote chance that another process might write to the database at the same time.
Different databases implement auto-increment differently. But as far as I know, none of them will answer the question you are asking.
The auto increment feature is intended to create a unique ID for a newly added table row. If a row hasn't been inserted yet, then the feature hasn't produced the id.
And it makes sense... If you did get the next auto increment number, what would you do with it? Likely the intent is to assign it as the primary key of the not-yet-inserted new row. But between the time you got the id, and the time you used it to insert the row, the database could have used that id to insert a row for another process.
Your choices are these: manage the creation of ids yourself, or wait until rows are inserted before using their auto-created identifiers.
I'm fairly new to SQL Server, so if anything I say doesn't make sense, there's a good chance I'm just confused by something. Anyway...
I have a simple mapping table. It has two columns, Before and After. All I want is a constraint that the Before column is unique. Originally it was set to be a primary key, but this created errors when the value was too large. I tried adding an ID column as a primary key and then adding UNIQUE to the Before column, but I have the same problem with the max length exceeding 900 bytes (I guess the constraint creates an index).
The only option I can think of is to change the id column to a checksum column and make that the primary key, but I dislike this option. Is there a different way to do this? I just need two simple columns.
The only way I can think of to guarantee uniqueness inside the database is to use an INSTEAD OF trigger. The link I provided to MSDN has an example for checking uniqueness. This solution will most likely be quite slow indeed, since you won't be able to index on the column being checked.
You could speed it up somewhat by using a computed column to create a hash of the Before column, perhaps using the HASHBYTES function. You could then create a non-unique index on that hash column, and inside your trigger check for the negative case, that is, check whether a row with the same hash doesn't exist. If so, exit the trigger. If there is another row with the same hash, you could then do the more expensive check for an exact duplicate, and raise an error if the user enters a duplicate value. You might also be able to simplify the check by comparing both the hash value and the Before value in one EXISTS() clause, but I haven't played around with the performance of that solution.
(Note that the HASHBYTES function I referred to can itself hash only up to 8000 bytes. If you want to go bigger than that, you'll have to roll your own hash function or live with the collisions caused by the CHECKSUM() function.)
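A rough sketch of that idea (table, column, and trigger names are illustrative; treat it as a starting point, not a drop-in solution):

CREATE TABLE dbo.Mapping
(
    Before NVARCHAR(MAX) NOT NULL,
    After  NVARCHAR(MAX) NOT NULL,
    -- The hash is small enough to index; collisions are possible,
    -- so the trigger still compares the full values.
    BeforeHash AS CAST(HASHBYTES('SHA1', Before) AS BINARY(20)) PERSISTED
);

CREATE NONCLUSTERED INDEX IX_Mapping_BeforeHash ON dbo.Mapping (BeforeHash);
GO

CREATE TRIGGER dbo.trg_Mapping_NoDupes ON dbo.Mapping
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Cheap hash probe first, expensive full comparison second.
    IF EXISTS (SELECT 1
               FROM inserted AS i
               JOIN dbo.Mapping AS m
                 ON m.BeforeHash = CAST(HASHBYTES('SHA1', i.Before) AS BINARY(20))
                AND m.Before = i.Before)
    BEGIN
        RAISERROR('Duplicate Before value.', 16, 1);
        RETURN;
    END;

    INSERT INTO dbo.Mapping (Before, After)
    SELECT Before, After FROM inserted;
END;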
I'm inserting new rows into a SQLite table, but I don't want to insert duplicate rows.
I also don't want to specify every column in the database if possible.
I don't even know if this is possible.
I should be able to take my values and create a new row with them, but if they duplicate another row they should either overwrite the existing row or do nothing.
This is one of the very first steps in database design and normalization. You have to be able to explicitly define what you mean by a duplicate row, and then place a primary key constraint, (or a unique constraint), on the columns in your table that represent that definition.
Before you can define what duplicate means, you have to define (or decide) exactly what the table is to contain, i.e., what real-world business domain entity or abstraction each row in the table represents, or will hold data for...
Once you have done this, the PK or unique constraint will stop you from inserting duplicate rows... The same PK will help you find the duplicate row when it does exist, and update it with the values of the non-duplicate-defining (non-PK) columns that are different from the values in the existing duplicate row. Only after all this has been done can an INSERT OR REPLACE (as defined by SQLite) process help. This command checks whether a duplicate row (as defined by your PK constraint) exists, and if it does, instead of inserting a new row, it updates the non-PK columns in that row with the values supplied by your REPLACE query.
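A minimal SQLite sketch (table and columns are illustrative):

CREATE TABLE contact (
    email TEXT PRIMARY KEY,  -- the duplicate-defining column
    name  TEXT,
    phone TEXT
);

-- Overwrite the existing row on a duplicate email...
INSERT OR REPLACE INTO contact (email, name, phone)
VALUES ('a@example.com', 'Alice', '555-0100');

-- ...or silently do nothing instead.
INSERT OR IGNORE INTO contact (email, name, phone)
VALUES ('a@example.com', 'Alice', '555-0100');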
Your desires appear mutually contradictory. While Andrey's INSERT OR REPLACE answer will get you close to what you say you want, you should probably clarify for yourself what you really want.
If you don't want to specify every column, and you want a (presumably) partial row to update rather than insert, you should probably look at the UNIQUE constraint and know that the ambiguity in your requirements was also faced by the SQL92 Committee.
http://www.sqlite.org/lang_insert.html
INSERT OR REPLACE might interest you.
I am using SQL Server 2008, had a table with an id (numeric) column as the primary key. Also had a unique index on three varchar columns. I was able to add a row with the exact same set of the three columns. I verified it with a simple query on the values and 2 rows were returned.
I edited the index and added the id column. When I tried to edit it again and remove the id column, it complained that there were duplicate rows; it deleted the index but couldn't recreate it.
I then cleaned the database of the duplicates, recreated the index with the same 3 varchars as unique and nonclustered, and now it works properly, not allowing duplicates.
Does anyone know why the uniqueness of this index was ignored?
The index could have been disabled (see Disabling Indexes), your 'duplicate' values may have been different (trailing spaces, for example), or your test may be incorrect.
For sure you did not insert a duplicate into an enforced unique index.
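For illustration, a sketch of the disabled-index scenario (index and table names are made up):

-- Disabling a unique index stops it from being maintained or enforced.
ALTER INDEX IX_Source_ThreeCols ON dbo.SourceTable DISABLE;

-- These duplicate inserts now succeed.
INSERT INTO dbo.SourceTable (ColA, ColB, ColC) VALUES ('x', 'y', 'z');
INSERT INTO dbo.SourceTable (ColA, ColB, ColC) VALUES ('x', 'y', 'z');

-- Rebuilding re-enables enforcement and fails here, because the
-- duplicates already exist.
ALTER INDEX IX_Source_ThreeCols ON dbo.SourceTable REBUILD;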
I'm not a pro on this subject, but the 'is unique' setting of the index probably refers to the way the index is built/stored, not that it is a constraint. This probably also means the performance of the index is sub-optimal.
When creating the index, the DBMS might check this.