I have an employee table with 10 columns and have to create a unique key constraint on id, name, address, mobile.
In the above case address may be null and mobile may be null. However, when they are present, uniqueness should be maintained.
First, I created a unique constraint combining all of the above columns, and observed the following.
Actual behaviour in MySQL.
001-Thiagu-NULL-900000 - Accepted
001-Thiagu-NULL-900000 - Accepted
001-Thiagu-0001-900000 - Accepted
001-Thiagu-0001-900000 - Rejected - Duplicate Record
Expected behaviour in all the databases
001-Thiagu-NULL-900000 - Accepted
001-Thiagu-NULL-900000 - Rejected - Duplicate Record
001-Thiagu-0001-900000 - Accepted
001-Thiagu-0001-900000 - Rejected - Duplicate Record
Basically, identical rows should be treated as duplicates regardless of whether a value is NULL or not.
To overcome this problem I dropped the idea of a unique constraint over the combined columns and came up with a new column of string type that carries its own unique constraint.
On each insert I manually construct the value for this column (from the key columns) so that uniqueness is maintained.
Is that the right approach, or is there some way to fix the first approach that I am not aware of?
The created constraint should work for MySQL, SQL Server, Oracle and Postgres.
In SQL, null never equals null. That's not a bug, that's a feature. NULL IS NOT DISTINCT FROM NULL is true, but key declarations employ '=' [in the equivalent longhands], not IS NOT DISTINCT FROM. The 'key' constraint that you want should employ IS NOT DISTINCT FROM, therefore you cannot get there by declaring keys.
The next option would be a CHECK constraint, but products are unlikely to support CHECK constraints accessing other rows than the one being inserted.
The next option would be to create an ASSERTION, but no product supports that [reliably], essentially for the same reason as why they don't support cross-row CHECK constraints.
The next option is to enforce this in a stored procedure, but then you're likely to bump into [some of] the products only talking their proprietary dialect of SQL/PSM language.
The next option is application code.
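As a rough illustration of the behaviour described above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it is not one of the four target engines), and the index name ux_employee_nullsafe and the empty-string sentinel are assumptions made for the example. It first shows that a plain UNIQUE constraint accepts two identical rows when one key column is NULL, then tries one possible engine-specific workaround: a unique index over COALESCE expressions, which is the same idea as the hand-built string column from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id      TEXT NOT NULL,
        name    TEXT NOT NULL,
        address TEXT,
        mobile  TEXT,
        UNIQUE (id, name, address, mobile)
    )
""")

insert = "INSERT INTO employee (id, name, address, mobile) VALUES (?, ?, ?, ?)"
row = ("001", "Thiagu", None, "900000")

# NULL never equals NULL, so the plain UNIQUE constraint accepts both rows.
conn.execute(insert, row)
conn.execute(insert, row)
plain_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Assumed workaround: map NULL to a sentinel ('') inside a unique index.
# Caveat: a genuine empty string would collide with NULL under this scheme.
conn.execute("DELETE FROM employee")
conn.execute("""
    CREATE UNIQUE INDEX ux_employee_nullsafe
    ON employee (id, name, COALESCE(address, ''), COALESCE(mobile, ''))
""")

conn.execute(insert, row)
try:
    conn.execute(insert, row)
    second_rejected = False
except sqlite3.IntegrityError:
    second_rejected = True

print(plain_count, second_rejected)
```

Indexes on expressions are not spelled the same way in every product, so this should be read as a sketch of the idea rather than as portable DDL.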
The idea
So, the title may seem somewhat vague but this is what I would like to have (in Microsoft SQL Server, recent version), to be used from an ASP.NET C# application:
A table with an ordinary primary key, defined as an "official" identity column
some other columns
An additional "logical identity" column
The additional "logical identity" column should have the following properties:
be of type integer
not strictly unique (multiple rows can have the same "logical identity")
mandatory
immutable (once set, it may never change). However DELETE of the row must be allowed.
When not provided at INSERT, set to a not yet used value
The last point is probably the hardest to achieve, so that's the question:
The question
How can I enforce (preferably at the database level) that a mandatory value is always set to a not-yet-used value when it is not provided by the INSERT script?
The thoughts
What I have considered so far:
Having a normal "identity" on that column is not possible because it's not unique among the existing values
Having a random value is not possible, because it must be unique for new values
Extending the SaveChanges method would be problematic, because it would have to query the database
Maybe a database triggered function, but I would hope that there are easier solutions
The context
On some occasions, especially when an additional row with the same "logical identity" is inserted, the application already defines the "logical identity", and that value should be used.
Currently, when the application sets a value as "logical ID" it will be among the existing values. Thus, I could force the database to accept only INSERTed values that at least exist once. This would help it when required to provide new, unique values.
However, if this is some sort of new item, the system should provide a new "logical identity" on the fly, while inserting. It must be guaranteed that no existing value is reused for this.
I will use Entity Framework (Version 6) as my ORM.
If the above requirements are not met, an exception should be thrown on the "Add"
If such a value would be changed, an exception should be thrown on the "Update"
One option is with a SEQUENCE value assigned with a DEFAULT constraint. The immutable requirement is the biggest challenge because SQL Server doesn't provide a declarative way to specify a read-only column so one needs a trigger implementation.
Below is example DDL. I don't know if this technique will pose challenges with EF.
CREATE SEQUENCE SQ_Example_Sequence
    AS int START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.Example(
     IdentityColumn int NOT NULL IDENTITY
        CONSTRAINT PK_Example PRIMARY KEY CLUSTERED
    ,SomeDataColumn int
    ,SequenceColumn int NOT NULL
        CONSTRAINT DF_Example_SequenceColumn DEFAULT NEXT VALUE FOR SQ_Example_Sequence
);
GO
CREATE TRIGGER TR_Example_Update
ON dbo.Example FOR UPDATE
AS
IF EXISTS(SELECT 1
          FROM inserted
          JOIN deleted ON inserted.IdentityColumn = deleted.IdentityColumn
          WHERE inserted.SequenceColumn <> deleted.SequenceColumn
         )
BEGIN
    THROW 50000, 'SequenceColumn value cannot be changed', 16;
END;
GO
Which is better, distinct or a unique constraint, for a table in a SQL Server database?
Should I use distinct when getting records from the large table, or put a unique constraint on the field so that no duplicate entries can be inserted?
My ultimate goal is only to get unique data, and I know both will give me this. But if I put a unique constraint on the field, it will give me an SQL error whenever I insert duplicate data. Is that OK? Does it affect the server or the database? I am using SQL Server for this process.
They're totally different use cases.
A unique constraint is what you use if the column itself (or set of columns) must be unique according to the schema details (the data). In other words, if the data is required to be unique on that column (or column set), use a unique constraint.
For example, if you're maintaining a membership table, the member ID should be unique.
The database must protect itself from dodgy data, this is not something that should be left to well-behaved applications, since the first non-well-behaved application that comes along is going to destroy your universe.
If the data is not required to be unique (such as the town each member lives in), then you can decide to "uniquify" it in a select statement, depending on your needs:
-- Get all towns.
select distinct town from members
So, here's your solution matrix, in decreasing priority:
Does the actual data need to be unique on that column? If so, a unique constraint must be used. Otherwise, a unique constraint should not be used.
If the data does not need to be unique, do you need to get only one row for each possible value of that data? If so, use select distinct. If not, use select on its own.
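To make the two use cases concrete, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption made purely so the example is self-contained). The members table, its columns and the sample towns are invented for illustration. The unique constraint rejects a duplicate member ID at insert time, while select distinct merely collapses repeated towns at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE members (
        member_id INTEGER NOT NULL UNIQUE,  -- must be unique by definition
        town      TEXT NOT NULL             -- duplicates are perfectly valid
    )
""")
conn.executemany(
    "INSERT INTO members (member_id, town) VALUES (?, ?)",
    [(1, "Perth"), (2, "Perth"), (3, "Hobart")],
)

# The constraint protects the data: a second member 1 is refused.
try:
    conn.execute("INSERT INTO members (member_id, town) VALUES (?, ?)", (1, "Darwin"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# DISTINCT only shapes the result set; the stored rows are untouched.
towns = [r[0] for r in conn.execute("SELECT DISTINCT town FROM members ORDER BY town")]
row_count = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]

print(duplicate_rejected, towns, row_count)
```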
Depends.
With distinct you pay at query time, but it's simpler for the user.
With unique constraint, you pay at insert time, and the app now has to handle exceptions on duplicates, but the query is faster.
Without more info, I would go with distinct, because life is simpler and you don't lock in behaviour (next week you may need the duplicates).
UNIQUE: always applies when data is inserted (in brief)
DISTINCT: always concerns data retrieval (in brief)
Maybe this will help you.
First, I want to talk a little about the Foreign key constraint rule and how helpful it is. Suppose I have two tables, a primary table with the primary column called ID, the other table is the foreign one which also has a primary column called ID. This column in the foreign table refers to the ID column in the primary table. If we don't establish any Foreign key relation/constraint between those tables, we may fall foul of many problems related to integrity.
If we create the foreign key relation for them, any changes to the ID column in the primary table will automatically be reflected in the ID column of the foreign table; such changes can be made by DELETE or UPDATE queries. Moreover, any changes to the ID in the foreign table should be constrained by the ID column in the primary table; for example, no new value should be inserted or updated in the ID column of the foreign table unless it exists in the ID column of the primary table.
I know that SQLite doesn't support foreign key constraints (with the full functionality detailed above) and I have to use a TRIGGER to work around this problem. I have used a TRIGGER successfully in one direction (any changes to the ID column in the primary table are reflected in the ID column of the foreign table), but the reverse direction is not easy: an error should be raised if a conflict occurs. For example, if there are only the values 1, 2, 3 in the ID column of the primary table, and the value 2 in the ID column of the foreign table is updated to 4, which does not exist in the primary table, an error should be thrown. The difficulty is that SQLite also doesn't support the IF statement and the RAISERROR function. If these features were supported, I could work around it easily.
I wonder how you can use SQLite if it doesn't support some important features? Even working around this with a TRIGGER is not easy, and I think it's impossible, unless you don't care about the reverse direction. (In fact, the reverse direction is not really necessary if you set up your SQL queries carefully, but who can be sure? Raising an error is a mechanism that reminds us to fix and correct things, so that data is not corrupted and bugs cannot stay invisible.)
If you still don't know what I want, I would like to have some last words, my purpose is to achieve the full functionality of the Foreign key constraint which is not supported in SQLite (even you can create such a relationship but it's fake, not real as you can benefit from it in SQL Server, SQL Server Ce, MS Access or MySQL).
Your help would be highly appreciated.
PS: I really like SQLite because it is file-based, easy to deploy, supports large file size (an advantage over SQL Server Ce) but some missing features have made me re-think many times, I'm afraid if going for it, my application may be unreliable and corrupt unpredictably.
To answer the question that you have skillfully hidden in your rant:
SQLite allows the RAISE function inside triggers; because of the lack of control flow statements, this must be used with a SELECT:
CREATE TRIGGER check_that_id_exists_in_parent
BEFORE UPDATE OF id ON child_table
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'parent ID does not exist')
    WHERE NOT EXISTS (SELECT 1
                      FROM parent_table
                      WHERE id = NEW.id);
END;
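The trigger above can be tried out with a short script. The following is a minimal sketch using Python's built-in sqlite3 module; the table layouts and sample values (1, 2, 3 in the parent, 2 in the child) are assumptions taken from the scenario in the question. Updating the child ID to 4 should be aborted by the trigger, while updating it to 3 should go through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_table (id INTEGER PRIMARY KEY);
    CREATE TABLE child_table  (id INTEGER);

    CREATE TRIGGER check_that_id_exists_in_parent
    BEFORE UPDATE OF id ON child_table
    FOR EACH ROW
    BEGIN
        SELECT RAISE(ABORT, 'parent ID does not exist')
        WHERE NOT EXISTS (SELECT 1
                          FROM parent_table
                          WHERE id = NEW.id);
    END;

    INSERT INTO parent_table (id) VALUES (1), (2), (3);
    INSERT INTO child_table  (id) VALUES (2);
""")

# 4 is not present in parent_table, so the trigger aborts the statement.
try:
    conn.execute("UPDATE child_table SET id = 4 WHERE id = 2")
    error_message = None
except sqlite3.DatabaseError as exc:
    error_message = str(exc)

# 3 exists in parent_table, so this update is allowed.
conn.execute("UPDATE child_table SET id = 3 WHERE id = 2")
child_ids = [r[0] for r in conn.execute("SELECT id FROM child_table")]

print(error_message, child_ids)
```

Note that this trigger only guards UPDATE; a similar BEFORE INSERT trigger would be needed to cover new child rows. Also, SQLite 3.6.19 and later can enforce declared FOREIGN KEY constraints directly once PRAGMA foreign_keys = ON has been issued on the connection, which may make such triggers unnecessary.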
Further to my question "Why to use ´not null primary key´ in TSQL?"...
As I understood from other discussions, some RDBMS (for example SQLite, MySQL) permit "unique" NULL in the primary key.
Why is this allowed and how might it be useful?
Background: I believe it is beneficial for communication with colleagues and database professionals to know the differences in fundamental concepts, approaches and their implementations in different DBMS.
Notes
MySQL is rehabilitated and returned to the "NOT NULL PK" list.
SQLite has been added (thanks to Paul Hadfield) to "NULL PK" list:
For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs.
If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite.
Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem.
So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.
— SQL As Understood By SQLite: CREATE TABLE
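The documented SQLite behaviour quoted above can be observed directly. This is a minimal sketch using Python's built-in sqlite3 module; the table names t and u and their columns are made up for the example. It inserts two NULL keys into a TEXT PRIMARY KEY column, checks that ordinary duplicate keys are still rejected, and shows the INTEGER PRIMARY KEY exception, where a NULL is replaced by a generated row ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-INTEGER primary key: NULLs are accepted and count as distinct.
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO t (k, v) VALUES (NULL, 'first')")
conn.execute("INSERT INTO t (k, v) VALUES (NULL, 'second')")
null_key_rows = conn.execute("SELECT COUNT(*) FROM t WHERE k IS NULL").fetchone()[0]

# Non-NULL duplicates are still a constraint violation.
conn.execute("INSERT INTO t (k, v) VALUES ('a', 'third')")
try:
    conn.execute("INSERT INTO t (k, v) VALUES ('a', 'fourth')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INTEGER PRIMARY KEY: a NULL is replaced by an automatically chosen integer.
conn.execute("CREATE TABLE u (k INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO u (k, v) VALUES (NULL, 'auto')")
generated_key = conn.execute("SELECT k FROM u").fetchone()[0]

print(null_key_rows, duplicate_rejected, generated_key)
```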
Suppose you have a primary key containing a nullable column Kn.
If you want to have a second row rejected on the ground that in that second row, Kn is null and the table already contains a row with Kn null, then you are actually requiring that the system would treat the comparison "row1.Kn = row2.Kn" as giving TRUE (because you somehow want the system to detect that the key values in those rows are indeed equal). However, this comparison boils down to the comparison "null = null", and the standard already explicitly specifies that null doesn't compare equal to anything, including itself.
To allow for what you want, would thus amount to SQL deviating from its own principles regarding the treatment of null. There are innumerable inconsistencies in SQL, but this particular one never got past the committee.
I don't know whether older versions of MySQL differ on this, but as of modern versions a primary key must be on columns that are not null. See the manual page on CREATE TABLE: "A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently)."
As far as relational database theory is concerned:
The primary key of a table is used to uniquely identify each and every row in the table
A NULL value in a column indicates that you don't know what the value is
Therefore, you should never use the value of "I don't know" to uniquely identify a row in a table.
Depending upon the data you are modelling, a "made up" value can be used instead of NULL. I've used 0, "N/A", 'Jan 1, 1980', and similar values to represent dummy "known to be missing" data.
Most, if not all, DB engines do allow for a UNIQUE constraint or index, which does allow for NULL column values, though (ideally) only one row may be assigned the value null (otherwise it wouldn't be a unique value). This can be used to support the irritatingly pragmatic (but occasionally necessary) situations that don't fit neatly into relational theory.
Well, it could allow you to implement the Null Object Pattern natively within the database. So if you were using something similar in code, which interacted very closely with the DB, you could just look up the object corresponding to the key without having to special-case a null check.
Now whether this is worthwhile functionality I'm not sure, but it's really a question of whether the pros of disallowing null pkeys in absolutely all cases outweigh the cons of obstructing someone who (for better or worse) actually wants to use null keys. This would only be worth it if you could demonstrate some non-trivial improvement (such as faster key lookup) from being able to guarantee that keys are non-null. Some DB engines would show this, others might not. And if there aren't any real pros from forcing this, why artificially restrict your clients?
As discussed in other answers, NULL was intended to mean "the information that should go in this column is unknown". However, it is also frequently used to indicate an alternative meaning of "this attribute does not exist". This is a particularly useful interpretation when looking at timestamp fields that are interpreted as the time some particular event occurred, in which case NULL is often used to indicate that the event has not yet occurred.
It is a problem that SQL doesn't support this interpretation very well -- for this to work properly, it really needs to have a separate value (something like "never") that doesn't behave as null does ("never" should be equal to "never" and should compare as higher than all other values). But as SQL lacks this notion, and there is no convenient way to add it, using null for this purpose is often the best choice.
This leaves the problem that when a timestamp of an event that may have not occurred should be part of the primary key of a table (a common requirement perhaps being the use of a natural key along with a deletion timestamp when using soft deletion with a requirement for the ability to recreate the item after deletion) you really want the primary key to have a nullable column. Alas, this is not allowed in most databases, and instead you have to resort to an artificial primary key (e.g. a row sequence number) and a UNIQUE constraint for what should otherwise have been your actual primary key.
An example scenario, in order to clarify this: I have a users table. As I require each user to have a distinct username, I decide to use username as the primary key. I want to support user deletion, but as I need to track the existence of users historically for auditing purposes I use soft deletion (in the first version of the schema, I add a 'deleted' flag to the user, and ensure that the deleted flag is checked in all queries where only active users are expected).
An additional requirement, however, is that if a username is deleted, it should be available for new users to register. An attractive way to achieve this would be to have the deleted flag change to a nullable timestamp (where nulls indicate that the user has not been deleted) and put this in the primary key. Were primary keys to allow nullable columns, this would have the following effect:
Creating a new user with an existing username when that user's deleted column is null would be denied as a duplicate key entry
Deleting a user changes its key (which requires changes to cascade to foreign keys that reference the user, which is suboptimal but acceptable if deletions are rare) so that the deleted column holds the timestamp of when the deletion occurred
Now a new user (which would have a null deleted timestamp) can be successfully created.
However, this cannot actually be achieved with standard SQL, so instead one must use a different primary key (probably a generated numeric user id in this case) and use a UNIQUE constraint to enforce the uniqueness of (username,deleted).
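One caveat worth sketching: in engines that treat NULLs as distinct inside a UNIQUE constraint, a plain UNIQUE (username, deleted) would not by itself reject a second active row with the same username, because both rows carry a NULL in deleted. A possible alternative, shown here with Python's built-in sqlite3 module, is a partial unique index restricted to active rows. The users table layout, the index name ux_users_active_username and the sample timestamp are all assumptions for illustration, and partial indexes are not available in every product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,  -- generated surrogate key
        username TEXT NOT NULL,
        deleted  TEXT                  -- NULL = active, else deletion timestamp
    );

    -- Uniqueness is enforced only among rows that are still active.
    CREATE UNIQUE INDEX ux_users_active_username
    ON users (username)
    WHERE deleted IS NULL;
""")

conn.execute("INSERT INTO users (username) VALUES ('alice')")

# A second active 'alice' is refused.
try:
    conn.execute("INSERT INTO users (username) VALUES ('alice')")
    duplicate_active_rejected = False
except sqlite3.IntegrityError:
    duplicate_active_rejected = True

# Soft-delete the first 'alice'; the username becomes available again.
conn.execute(
    "UPDATE users SET deleted = '2024-01-01T00:00:00' "
    "WHERE username = 'alice' AND deleted IS NULL"
)
conn.execute("INSERT INTO users (username) VALUES ('alice')")

total_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
active_rows = conn.execute(
    "SELECT COUNT(*) FROM users WHERE deleted IS NULL"
).fetchone()[0]

print(duplicate_active_rejected, total_rows, active_rows)
```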
Having a null primary key can be beneficial in some scenarios. In one of my projects I used this feature while synchronising databases: one on the server and many on different users' devices. Since not all users have Internet access all the time, I decided that only the main database would be allowed to assign ids to my entities. SQLite has its own mechanism for numbering rows. Had I used an additional id field, I would have used more bandwidth. Having null as the id not only tells me that an entity was created on the client's device while it had no Internet access, but also reduces code complexity. The only drawback is that on the client's device I can't get an entity by its id unless it was previously synchronised with the main database. However, that's not an issue, since my users care about entities for their parameters, not their unique ids.
I'm going to be running thousands of queries against a SQL database and I need to prevent duplication of the field 'domain'. I've never had to do this before, and any help would be appreciated.
You probably want to create a "UNIQUE" constraint on the field "Domain"; this constraint will raise an error if you create two rows that have the same domain in the database. For an explanation, see this W3Schools tutorial:
http://www.w3schools.com/sql/sql_unique.asp
If this doesn't solve your problem, please clarify the database you have chosen to use (MySql?).
NOTE: This constraint is completely separate from your choice of PHP as a programming language, it is a SQL database definition thing. A huge advantage of expressing this constraint in SQL is that you can trust the database to preserve the constraint even when people import / export data from the database, your application is buggy or another application shares the database.
If this is an absolute database integrity requirement (It's not likely to change, nor does existing data have this problem), I would enforce it at the database with a unique constraint.
As far as detecting it before or after the attempt in order to notify the user, there are a number of techniques which could be used.
Where is the data coming from? Is this something you only want to run once, or a couple of times, or often? If the domain-value already exists, do you just want to skip the insert or do something else (ie increment a counter)?
Depending on your answers, there are many possible solutions:
Pre-sort your data, eliminate duplicates, then insert
(assumes relatively static data, empty table to begin with)
Use an associative array in PHP as a local domain-value cache
(if table already contains data, start by reading existing content;
not thread-safe, but works if it only runs once at a time)
Make domain a UNIQUE column and write wrapper code to handle return errors
Make domain a UNIQUE or PRIMARY KEY column and use an ON DUPLICATE KEY clause:
    INSERT INTO mydata ( domain, count ) VALUES
        ( 'firstdomain', 1 ),
        ( 'seconddomain', 1 ),
        ( 'thirddomain', 1 )
    ON DUPLICATE KEY
        UPDATE count = count+1
Insert all data into the table, then remove duplicates
Note that batching inserts (ie using multiple value clauses per statement) can be significantly faster.
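The "UNIQUE column plus wrapper code" option from the list above might look roughly like the following. This is a sketch in Python with the built-in sqlite3 module rather than PHP and MySQL (an assumption made to keep the example self-contained), and record_domain is a hypothetical helper name. The wrapper tries the insert and, if the unique constraint rejects it, increments the counter instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mydata (
        domain TEXT NOT NULL UNIQUE,
        count  INTEGER NOT NULL
    )
""")


def record_domain(conn, domain):
    """Insert a new domain, or bump its counter if it already exists."""
    try:
        conn.execute("INSERT INTO mydata (domain, count) VALUES (?, 1)", (domain,))
    except sqlite3.IntegrityError:
        # The UNIQUE constraint fired: the domain is already present.
        conn.execute("UPDATE mydata SET count = count + 1 WHERE domain = ?", (domain,))


for name in ["firstdomain", "seconddomain", "firstdomain", "thirddomain", "firstdomain"]:
    record_domain(conn, name)

counts = dict(conn.execute("SELECT domain, count FROM mydata"))
print(counts)
```

The database remains the single source of truth about duplicates here; the wrapper only decides what to do when the constraint fires.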
I'm not really sure I understood your question, but perhaps you are looking for SQL's "UNIQUE" constraint. If the query tries to insert a pre-existing value to a field, you (PHP) will be notified about this constraint breach.
There are a bunch of ways to approach this. You could set a unique constraint (like a primary key) on that column. This will cause the insert to fail if that domain has already been inserted. You could also insert all of the duplicate domains and just delete them later on. This will work well if not that many of the domains are duplicated. There are a few questions posted already on finding duplicate rows.
This can be done with SQL, rather than with PHP.
I am assuming that you are using MySQL, but the same principles will work with different databases.
Make the Domain column the primary key. (This makes sense, as it has to be unique.)
Rather than using a plain INSERT, use REPLACE (or INSERT ... ON DUPLICATE KEY UPDATE); note that a plain UPDATE only changes rows that already exist and never creates new ones.
If the primary key you are trying to put into the table already exists, the existing tuple is replaced, rather than a second tuple being created.
So you will overwrite existing data if it is different, and if it is identical the stored data ends up unchanged.