To prevent duplicate entries in a database table I use a primary key. I just insert the data, and if it is a duplicate then the primary key will be violated and the row will not be added to the table.
Should I also do a SQL query (before trying to add to the database) to see if the entry exists? Or is this redundant since I already have the primary key set?
It is redundant to check for the presence of a value if you already have a constraint to prevent duplicates.
But it would also be ineffective to check before you insert, because some other concurrent client might insert that value in the moment between your check and your insert. So even if you check first, you'd still need to handle duplicate key errors.
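If you'd rather not deal with the error at all, many engines let you absorb the conflict in the statement itself. A minimal sketch, assuming MySQL and a hypothetical users table with a UNIQUE or PRIMARY KEY on username:
-- The duplicate-key error becomes a warning and the row is silently skipped;
-- check the affected-row count afterwards to see whether the insert happened.
INSERT IGNORE INTO users (username, email)
VALUES ('alice', 'alice@example.com');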
Defining "unique constraint" on table with desired column will fix everything. If there's a dupplicate you will get error.
With most database platforms, when you create the primary key, the operation will fail if there are duplicate entries, so there should be no need to test for this beforehand.
Usually you'd get an exception or an error code back from the SQL engine. Whether you need to handle it depends on your application logic. For example, if it is a new username and it already exists in the database, then the exception is part of your application logic, and you would show the new user a message explaining why registration failed.
So, my issue is that I am trying to make something that will easily load Excel spreadsheets into a SQL database, but first I have to get the identity specification to cooperate with me.
The issue begins when I assign the primary key with identity specification being true, as then I get the error message "Cannot insert explicit value for identity column in table 'Priskod' when IDENTITY_INSERT is set to OFF."
However when I set that identity specification is false, then I get the error message "Violation of PRIMARY KEY constraint 'PK_dbo.Priskod'. Cannot insert duplicate key in object 'dbo.Priskod'. The duplicate key value is (0). The statement has been terminated."
Does anyone have any suggestions about how I can fix this?
It sounds like you have an Excel spreadsheet that holds data you want to import into a SQL Server table.
The issue is that you are trying to load it directly into the target table and, to enable this, you are disabling the IDENTITY column. This should ring alarm bells, because SQL Server is right to prevent you from inserting duplicate keys.
There are 2 options here:
The key values in Excel are true identity values that are unique, so you will only INSERT records that don't exist in the target table. This would probably be best achieved by importing into a staging/temp table first and then inserting where the ID doesn't exist. You may also want to perform an UPDATE on rows where the ID does exist.
The key values in Excel are NOT true identity values.
Either way, I think you should add a new column to your target table, such as ExternalId, which can be duplicated if required or checked against to prevent duplicates. With both approaches, you should leave the IDENTITY insert as it is (a rough sketch follows below).
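A rough T-SQL sketch of the staging approach, where the staging table and the column names (ExternalId, Beskrivning) are assumptions, not taken from your schema:
-- 1. Bulk-load the spreadsheet into a staging table first.
CREATE TABLE dbo.Priskod_Staging (
    ExternalId   INT           NOT NULL,
    Beskrivning  NVARCHAR(100) NULL
);

-- 2. Insert only rows whose ExternalId is not already in the target,
--    letting the IDENTITY column on dbo.Priskod generate its own keys.
INSERT INTO dbo.Priskod (ExternalId, Beskrivning)
SELECT s.ExternalId, s.Beskrivning
FROM dbo.Priskod_Staging AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Priskod AS t
    WHERE t.ExternalId = s.ExternalId
);

-- 3. Optionally, update the rows that do already exist.
UPDATE t
SET t.Beskrivning = s.Beskrivning
FROM dbo.Priskod AS t
JOIN dbo.Priskod_Staging AS s
    ON s.ExternalId = t.ExternalId;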
Got the system to work now.
I had to enter the information straight into the database instead of actually trying to upload the information into the database itself. So thanks everyone for your help and support. You are all super.
I'm attempting to use cx_OracleTools' CopyData.py script to copy data between two tables in separate Oracle schemas/instances:
http://cx-oracletools.sourceforge.net/cx_OracleTools.html
When I run it against my tables, I get the error:
No primary or unique constraint found on table.
I don't know much about Oracle, to be honest, but from what I can tell the tables don't seem to have any PK constraint or anything like that defined.
The merits of this aside, I think it has simply been set up that way for expediency, and it's unlikely to change anytime near term.
Is there any way to get copyData.py to run in this scenario without a PK constraint?
Cheers,
Victor
The issue is that CopyData checks to see if the row exists in the destination table, and it can't do that without a unique key.
If it is acceptable to insert all rows and not update changed ones, use the --no-check-exists option. According to the code this will bypass the primary key check.
Otherwise, use the --key-columns=COLS option to manually specify the columns to be used as the unique key. This will also bypass the primary key check.
I have a development database that has fees in it. It has a feeid, a unique key that serves as the identifier. The problem I run into is that the feeid/fee amount may not match when updating the table on a production server. This obviously could lead to some bad things happening, like overcharging or undercharging for something. Is there a way to reset identities in SQL Server, or match them up, or is this an example of when you would not want to use them?
Don't make your primary keys "mean something" other than identifying a unique record. If you need to hard-code an ID somewhere, create another column for it.
So-called "natural keys" are more trouble than they're worth.
If, for some reason, you decide that either you will not or cannot follow the first rule, don't use any automatically generated key values.
That is the behaviour of an identity column; it is also what makes it so fast, because it doesn't lock the table.
To reset an identity, use either DBCC CHECKIDENT or TRUNCATE TABLE.
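For example (a sketch only; dbo.Fees is a hypothetical table name):
-- Reset the seed so the next row inserted gets identity value 1.
DBCC CHECKIDENT ('dbo.Fees', RESEED, 0);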
To insert IDs from one table into another and keep the same values, you need to do:
SET IDENTITY_INSERT dbo.YourTable ON  -- replace dbo.YourTable with your table name
--update/insert rows with explicit ID values here
SET IDENTITY_INSERT dbo.YourTable OFF
Keep in mind that between the two SET IDENTITY_INSERT statements your regular inserts will FAIL!
You can SET IDENTITY_INSERT ON, update the IDs (make sure there are no conflicts), and then turn it back off.
I want to learn the answer for different DB engines, but in our case:
we have some records that are not unique for a column, and now we want to make that column unique, which forces us to remove the duplicate values.
We use Oracle 10g. Is this reasonable? Or is this something like a goto statement :) ? Should we really delete? What if we had millions of records?
To answer the question as posted: No, it can't be done on any RDBMS that I'm aware of.
However, like most things, you can work around it by doing the following.
Create a composite key, with a new column and the existing column
You can make it unique without deleting anything by adding a new column; call it PartialKey.
For existing rows, set PartialKey to a value that is unique within each set of rows sharing the same value in the existing column (starting at zero), so every existing value keeps one row with PartialKey zero.
Create a unique constraint on the existing column and PartialKey (you can do this because each of these combinations is now unique).
For new rows, only use the default value of zero for PartialKey (because zero has already been used for each existing value); this will force new rows to have unique values in the existing column (a sketch of these steps follows below).
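A minimal Oracle sketch of those steps, using hypothetical table and column names (Orders, CustomerCode):
-- Add the helper column; new rows default to zero.
ALTER TABLE Orders ADD (PartialKey NUMBER DEFAULT 0 NOT NULL);

-- Number the existing rows within each group of duplicates (0, 1, 2, ...),
-- so every distinct CustomerCode value keeps a row with PartialKey = 0.
UPDATE Orders o
SET PartialKey = (
    SELECT COUNT(*)
    FROM Orders o2
    WHERE o2.CustomerCode = o.CustomerCode
      AND o2.ROWID < o.ROWID
);

-- Each (CustomerCode, PartialKey) pair is now unique.
ALTER TABLE Orders
    ADD CONSTRAINT uq_orders_code UNIQUE (CustomerCode, PartialKey);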
IMPORTANT EDIT
This is weak: if you delete a row with PartialKey 0, another row can then be added with a value that already exists in the existing column, because it is the 0 in PartialKey that guarantees uniqueness.
You would need to ensure that either you never delete the row with PartialKey 0, or you always have a dummy row with PartialKey 0 that you never delete (or that you immediately reinsert automatically).
Edit: Bite the bullet and clean the data
If, as you said, you've just realised that the column should be unique, then you should (if possible) clean up the data. The above approach is a hack, and you'll find yourself writing more hacks when accessing the table (you may find you've got two sets of logic for dealing with queries against that table: one for where the column IS unique, and one for where it's NOT). I'd clean this now or it'll come back and bite you in the arse a thousand times over.
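If you do decide to delete, the classic Oracle pattern keeps one arbitrary row per value and removes the rest. A sketch with hypothetical names (Customers, Email):
-- Keep the row with the lowest ROWID in each group, delete the others.
DELETE FROM Customers c
WHERE c.ROWID NOT IN (
    SELECT MIN(c2.ROWID)
    FROM Customers c2
    GROUP BY c2.Email
);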
This can be done in SQL Server.
When you create a check constraint, you can set an option to apply it either to new data only or to existing data as well. The option of applying the constraint to new data only is useful when you know that the existing data already meets the new check constraint, or when a business rule requires the constraint to be enforced only from this point forward.
For example:
ALTER TABLE myTable
WITH NOCHECK
ADD CONSTRAINT myConstraint CHECK (SomeColumn > 100)
You can do this using the ENABLE NOVALIDATE constraint state, but deleting the duplicates is the much preferred way.
You should really set your records straight before adding the constraint.
In Oracle you can put a constraint in a enable novalidate state. When a constraint is in the enable novalidate state, all subsequent statements are checked for conformity to the constraint. However, any existing data in the table is not checked. A table with enable novalidated constraints can contain invalid data, but it is not possible to add new invalid data to it. Enabling constraints in the novalidated state is most useful in data warehouse configurations that are uploading valid OLTP data.
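A rough sketch of what that might look like, with hypothetical names (Customers, Email); the non-unique index is needed because a unique index cannot be built while the duplicates are still present:
-- Back the constraint with a non-unique index so it can be enabled over duplicates.
CREATE INDEX idx_customers_email ON Customers (Email);

ALTER TABLE Customers
    ADD CONSTRAINT uq_customers_email UNIQUE (Email)
    USING INDEX idx_customers_email
    ENABLE NOVALIDATE;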
I'm going to be running thousands of queries into SQL and I need to prevent duplication of the field 'domain'. I've never had to do this before and any help would be appreciated.
You probably want to create a UNIQUE constraint on the field "domain" - this constraint will raise an error if you create two rows that have the same domain in the database. For an explanation, see this W3Schools tutorial:
http://www.w3schools.com/sql/sql_unique.asp
If this doesn't solve your problem, please clarify which database you have chosen to use (MySQL?).
NOTE: This constraint is completely separate from your choice of PHP as a programming language; it is a SQL database definition thing. A huge advantage of expressing this constraint in SQL is that you can trust the database to preserve it even when people import or export data, your application has bugs, or another application shares the database.
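A minimal sketch, assuming MySQL and a hypothetical table called mydata:
-- Fails with a duplicate-key error if two rows share the same domain.
ALTER TABLE mydata ADD CONSTRAINT uq_domain UNIQUE (domain);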
If this is an absolute database integrity requirement (It's not likely to change, nor does existing data have this problem), I would enforce it at the database with a unique constraint.
As far as detecting it before or after the attempt in order to notify the user, there are a number of techniques which could be used.
Where is the data coming from? Is this something you only want to run once, or a couple of times, or often? If the domain-value already exists, do you just want to skip the insert or do something else (ie increment a counter)?
Depending on your answers, there are many possible solutions:
Pre-sort your data, eliminate duplicates, then insert (assumes relatively static data and an empty table to begin with)
Use an associative array in PHP as a local domain-value cache (if the table already contains data, start by reading the existing content; not thread-safe, but works if it only runs once at a time)
Make domain a UNIQUE column and write wrapper code to handle return errors
Make domain a UNIQUE or PRIMARY KEY column and use an ON DUPLICATE KEY clause:
INSERT INTO mydata ( domain, count ) VALUES
( 'firstdomain', 1 ),
( 'seconddomain', 1 ),
( 'thirddomain', 1 )
ON DUPLICATE KEY
UPDATE count = count+1
Insert all data into the table, then remove duplicates
Note that batching inserts (ie using multiple value clauses per statement) can be significantly faster.
I'm not really sure I understood your question, but perhaps you are looking for SQL's "UNIQUE" constraint. If the query tries to insert a pre-existing value to a field, you (PHP) will be notified about this constraint breach.
There are a bunch of ways to approach this. You could set a unique constraint (like a primary key) on that column. This will cause the insert to fail if that domain has already been inserted. You could also insert all of the duplicate domains and just delete them later on. This will work well if not that many of the domains are duplicated. There are a few questions posted already on finding duplicate rows.
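If you go the insert-then-clean-up route, one common MySQL pattern (a sketch, assuming a hypothetical mydata table with an AUTO_INCREMENT id column) is a self-join delete that keeps the lowest id per domain:
-- Delete every row for which an older row with the same domain exists.
DELETE d1
FROM mydata d1
JOIN mydata d2
  ON d2.domain = d1.domain
 AND d2.id < d1.id;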
This can be done with SQL rather than with PHP.
I am assuming that you are using MySQL, but the same principles will work with different databases.
Make the domain column the primary key (makes sense, as it has to be unique).
Rather than a plain INSERT, use MySQL's REPLACE (or INSERT ... ON DUPLICATE KEY UPDATE, as shown above).
If the primary key you are trying to put into the table already exists, the statement will overwrite the existing row rather than creating a new one,
so you will overwrite existing data if it is different, and if it is identical the row is effectively unchanged.
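A minimal sketch of the REPLACE variant, assuming a hypothetical mydata table with domain as the primary key:
-- If 'example.com' already exists, the old row is replaced; otherwise it is inserted.
REPLACE INTO mydata (domain, count) VALUES ('example.com', 1);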