Related
I am learning to write unit tests for work. I was advised to use TSQLT FakeTable to test some aspects of a table created by a stored procedure.
In other unit tests, we create a temp table for the stored procedure and then test the temp table. I'm not sure how to work the FakeTable into the test.
EXEC tSQLt.NewTestClass 'TestThing';
GO
CREATE OR ALTER PROCEDURE TestThing.[test API_StoredProc to make sure parameters work]
AS
BEGIN
DROP TABLE IF EXISTS #Actual;
CREATE TABLE #Actual ----Do I need to create the temp table and the Fake table? I thought I might need to because I'm testing a table created by a stored procedure.
(
ISO_3166_Alpha2 NVARCHAR(5),
ISO_3166_Alpha3 NVARCHAR(5),
CountryName NVARCHAR(100),
OfficialStateName NVARCHAR(300),
sovereigny NVARCHAR(75),
icon NVARCHAR(100)
);
INSERT #Actual
(
ISO_3166_Alpha2,
ISO_3166_Alpha3,
CountryName,
OfficialStateName,
sovereigny,
icon
)
EXEC Marketing.API_StoredProc #Username = 'AnyValue', -- varchar(100)
#FundId = 0, -- int
#IncludeSalesForceInvestorCountry = NULL, -- bit
#IncludeRegisteredSalesJurisdictions = NULL, -- bit
#IncludeALLCountryForSSRS = NULL, -- bit
#WHATIF = NULL, -- bit
#OUTPUT_DEBUG = NULL -- bit
EXEC tsqlt.FakeTable #TableName = N'#Actual', -- nvarchar(max) -- How do I differentiate between the faketable and the temp table now?
#SchemaName = N'', -- nvarchar(max)
#Identity = NULL, -- bit
#ComputedColumns = NULL, -- bit
#Defaults = NULL -- bit
INSERT INTO #Actual
(
ISO_3166_Alpha2,
ISO_3166_Alpha3,
CountryName,
OfficialStateName,
sovereigny,
icon
)
VALUES
('AF', 'AFG', 'Afghanistan', 'The Islamic Republic of Afghanistan', 'UN MEMBER STATE', 'test')
SELECT * FROM #actual
END;
GO
EXEC tSQLt.Run 'TestThing';
What I'm trying to do with the code above is basically just to get FakeTable working. I get an error: "FakeTable couold not resolve the object name #Actual"
What I ultimately want to test is the paramaters in the stored procedure. Only certain entries should be returned if, say, IncludeSalesForceInvestorCountry is set to 1. What should be returned may change over time, so that's why I was advised to use FakeTable.
In your scenario, you don’t need to fake any temp tables, just fake the table that is referenced by Marketing.API_StoredProc and populate it with values that you expect to be returned, and some you don’t. Add what you expect to see in an #expected table, call Marketing.API_StoredProc dumping the results into an #actual table and compare the results with tSQLt.AssertEqualsTable.
A good starting point might be to review how tSQLT.FakeTable works and a real world use case.
As you know, each unit test runs within its own transaction started and rolled back by the tSQLT framework. When you call tSQLt.FakeTable within a unit test, it temporarily renames the specified table then creates an exactly named facsimile of that table. The temporary copy allows NULL in every column, has no primary or foreign keys, identity column, check, default or unique constraints (although some of those can be included in the facsimile table depending on parameters passed to tSQLt.FakeTable). For the duration of the test transaction, any object that references the name table will use the fake rather than the real table. At the end of the test, tSQLt rolls back the transaction, the fake table is dropped and the original table returned to its former state (this all happens automatically). You might ask, what is the point of that?
Imagine you have an [OrderDetail] table which has columns including OrderId and ProductId as the primary key, an OrderStatusId column plus a bunch of other NOT NULL columns. The DDL for this table might look something like this.
CREATE TABLE [dbo].[OrderDetail]
(
OrderDetailId int IDENTITY(1,1) NOT NULL
, OrderId int NOT NULL
, ProductId int NOT NULL
, OrderStatusId int NOT NULL
, Quantity int NOT NULL
, CostPrice decimal(18,4) NOT NULL
, Discount decimal(6,4) NOT NULL
, DeliveryPreferenceId int NOT NULL
, PromisedDeliveryDate datetime NOT NULL
, DespatchDate datetime NULL
, ActualDeliveryDate datetime NULL
, DeliveryDelayReason varchar(500) NOT NULL
/* ... other NULL and NOT NULL columns */
, CONSTRAINT PK_OrderDetail PRIMARY KEY CLUSTERED (OrderId, ProductId)
, CONSTRAINT AK_OrderDetail_AutoIncrementingId UNIQUE NONCLUSTERED (OrderDetailId)
, CONSTRAINT FK_OrderDetail_Order FOREIGN KEY (OrderId) REFERENCES [dbo].[Orders] (OrderId)
, CONSTRAINT FK_OrderDetail_Product FOREIGN KEY (ProductId) REFERENCES [dbo].[Product] (ProductId)
, CONSTRAINT FK_OrderDetail_OrderStatus FOREIGN KEY (OrderStatusId) REFERENCES [dbo].[OrderStatus] (OrderStatusId)
, CONSTRAINT FK_OrderDetail_DeliveryPreference FOREIGN KEY (DeliveryPreferenceId) REFERENCES [dbo].[DeliveryPreference] (DeliveryPreferenceId)
);
As you can see, this table has foreign key dependencies on the Orders, Product, DeliveryPreference and OrderStatus table. Product may in turn have foreign keys that reference ProductType, BrandCategory, Supplier among others. The Orders table has foreign key references to Customer, Address and SalesPerson among others. All of the tables in this chain have numerous columns defined as NOT NULL and/or are constrained by CHECK and other constraints. Some of these tables themselves have more foreign keys.
Now imagine you want to write a stored procedure (OrderDetailStatusUpdate) whose job it is to update the order status for a single row on the OrderDetail table. It has three input parameters #OrderId, #ProductId and #OrderStatusId. Think about what you would need to do to set up a test for this procedure. You would need to add at least two rows to the OrderDetail table including all the NOT NULL columns. You would also need to add parent records to all the FK-referenced tables, and also to any tables above that in the hierarchy, ensuring that all your inserts comply with all the nullability and other constraints on those tables too. By my count that is at least 11 tables that need to be populated, all for one simple test. And even if you bite the bullet and do all that set-up, at some point in the future someone may (probably will) come along and add a new NOT NULL column to one of those tables or change a constraint that will cause your test to fail - and that failure actually has nothing to do with your test or the stored procedure you are testing. One of the basic tenets of test-driven development is that a test should have only on reason to fail, I count dozens.
tSQLT.FakeTable to the rescue.
What is the minimum you actually need to do to in order to set up a test for that procedure? You need two rows to the OrderDetail table (one that gets updated, one that doesn’t) and the only columns you actually “need” to consider are OrderId and ProductId (the identifying key) plus OrderStatusId - the column being updated. The rest of the columns whilst important in the overall design, have no relevance to the object under test. In your test for OrderDetailStatusUpdate, you would follow these steps:
Call tSQLt.FakeTable ‘dbo.OrderDetail’
Create an #expected table (with OrderId, ProductId and OrderStatusId
columns) and populate it with the two rows you expect to end up with
(one will have the expected OrderStatusId the other can be NULL)
Add two rows to the now mocked OrderDetail table (OrderId and
ProductId only)
Call the procedure under test OrderDetailStatusUpdate passing the
OrderID and ProductID for one of the rows inserted plus the
OrderStatusId you are changing to.
Use tSQLt.AssertEqualsTable to compare the #expected table with the
OrderDetail table. This assertion will only compare the columns on
the #expected table, the other columns on OrderDetail will be ignored
Creating this test is really quick and the only reason it is ever likely to fail is because something pertinent to the code under test has changed in the underlying schema. Changes to any other columns on the OrderDetail table or any of the parent/grand-parent tables will not cause this test to break.
So the reason for using tSQLt.FakeTable (or any other kind of mock object) is to provide really robust test isolation and simply test data preparation.
I want to store a single row in a configuration table for my application. I would like to enforce that this table can contain only one row.
What is the simplest way to enforce the single row constraint ?
You make sure one of the columns can only contain one value, and then make that the primary key (or apply a uniqueness constraint).
CREATE TABLE T1(
Lock char(1) not null,
/* Other columns */,
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
I have a number of these tables in various databases, mostly for storing config. It's a lot nicer knowing that, if the config item should be an int, you'll only ever read an int from the DB.
I usually use Damien's approach, which has always worked great for me, but I also add one thing:
CREATE TABLE T1(
Lock char(1) not null DEFAULT 'X',
/* Other columns */,
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
Adding the "DEFAULT 'X'", you will never have to deal with the Lock column, and won't have to remember which was the lock value when loading the table for the first time.
You may want to rethink this strategy. In similar situations, I've often found it invaluable to leave the old configuration rows lying around for historical information.
To do that, you actually have an extra column creation_date_time (date/time of insertion or update) and an insert or insert/update trigger which will populate it correctly with the current date/time.
Then, in order to get your current configuration, you use something like:
select * from config_table order by creation_date_time desc fetch first row only
(depending on your DBMS flavour).
That way, you still get to maintain the history for recovery purposes (you can institute cleanup procedures if the table gets too big but this is unlikely) and you still get to work with the latest configuration.
You can implement an INSTEAD OF Trigger to enforce this type of business logic within the database.
The trigger can contain logic to check if a record already exists in the table and if so, ROLLBACK the Insert.
Now, taking a step back to look at the bigger picture, I wonder if perhaps there is an alternative and more suitable way for you to store this information, perhaps in a configuration file or environment variable for example?
I know this is very old but instead of thinking BIG sometimes better think small use an identity integer like this:
Create Table TableWhatever
(
keycol int primary key not null identity(1,1)
check(keycol =1),
Col2 varchar(7)
)
This way each time you try to insert another row the check constraint will raise preventing you from inserting any row since the identity p key won't accept any value but 1
Here's a solution I came up with for a lock-type table which can contain only one row, holding a Y or N (an application lock state, for example).
Create the table with one column. I put a check constraint on the one column so that only a Y or N can be put in it. (Or 1 or 0, or whatever)
Insert one row in the table, with the "normal" state (e.g. N means not locked)
Then create an INSERT trigger on the table that only has a SIGNAL (DB2) or RAISERROR (SQL Server) or RAISE_APPLICATION_ERROR (Oracle). This makes it so application code can update the table, but any INSERT fails.
DB2 example:
create table PRICE_LIST_LOCK
(
LOCKED_YN char(1) not null
constraint PRICE_LIST_LOCK_YN_CK check (LOCKED_YN in ('Y', 'N') )
);
--- do this insert when creating the table
insert into PRICE_LIST_LOCK
values ('N');
--- once there is one row in the table, create this trigger
CREATE TRIGGER ONLY_ONE_ROW_IN_PRICE_LIST_LOCK
NO CASCADE
BEFORE INSERT ON PRICE_LIST_LOCK
FOR EACH ROW
SIGNAL SQLSTATE '81000' -- arbitrary user-defined value
SET MESSAGE_TEXT='Only one row is allowed in this table';
Works for me.
I use a bit field for primary key with name IsActive.
So there can be 2 rows at most and and the sql to get the valid row is:
select * from Settings where IsActive = 1
if the table is named Settings.
The easiest way is to define the ID field as a computed column by value 1 (or any number ,....), then consider a unique index for the ID.
CREATE TABLE [dbo].[SingleRowTable](
[ID] AS ((1)),
[Title] [varchar](50) NOT NULL,
CONSTRAINT [IX_SingleRowTable] UNIQUE NONCLUSTERED
(
[ID] ASC
)
) ON [PRIMARY]
You can write a trigger on the insert action on the table. Whenever someone tries to insert a new row in the table, fire away the logic of removing the latest row in the insert trigger code.
Old question but how about using IDENTITY(MAX,1) of a small column type?
CREATE TABLE [dbo].[Config](
[ID] [tinyint] IDENTITY(255,1) NOT NULL,
[Config1] [nvarchar](max) NOT NULL,
[Config2] [nvarchar](max) NOT NULL
IF NOT EXISTS ( select * from table )
BEGIN
///Your insert statement
END
Here we can also make an invisible value which will be the same after first entry in the database.Example:
Student Table:
Id:int
firstname:char
Here in the entry box,we have to specify the same value for id column which will restrict as after first entry other than writing lock bla bla due to primary key constraint thus having only one row forever.
Hope this helps!
UPDATE: This issue is note related to the XML, I duplicated the table using an nvarchar(MAX) instead and still same issue. I will repost a new topic.
I have a table with about a million records, the table has an XML field. The query is running extremely slow, even when selecting just an ID. Is there anything I can do to increase the speed of this, I have tried setting text in row on, but SQL server will not allow me to, I receive the error "Cannot switch to in row text in table".
I would appreciate any help in a fix or knowledge that I seem to be missing.
Thanks
TABLE
/****** Object: Table [dbo].[Audit] Script Date: 08/14/2009 09:49:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Audit](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ParoleeID] [int] NOT NULL,
[Page] [int] NOT NULL,
[ObjectID] [int] NOT NULL,
[Data] [xml] NOT NULL,
[Created] [datetime] NULL,
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
QUERY
DECLARE #ID int
SET #ID = NULL
DECLARE #ParoleeID int
SET #ParoleeID = 158
DECLARE #Page int
SET #Page = 2
DECLARE #ObjectID int
SET #ObjectID = 93
DECLARE #Created datetime
SET #Created = NULL
SET NOCOUNT ON;
Select TOP 1 [Audit].* from [Audit]
where
(#ID IS NULL OR Audit.ID = #ID) AND
(#ParoleeID IS NULL OR Audit.ParoleeID = #ParoleeID) AND
(#Page IS NULL OR Audit.Page = #Page) AND
(#ObjectID IS NULL OR Audit.ObjectID = #ObjectID) AND
(#Created is null or(Audit.Created > #Created and Audit.Created < DATEADD (d, 1, #Created )) )
You need to create a primary XML index on the column. Above anything else having this will assist ALL your queries.
Once you have this, you can create indexing into the XML columns on the xml data.
From experience though, if you can store some information in the relation tables, SQL is much better at searching and indexing that than XML. Ie any key columns and commonly searched data should be stored relationally where possible.
Sql Server 2005 – Twelve Tips For Optimizing Query Performance by Tony Wright
Turn on the execution plan, and statistics
Use Clustered Indexes
Use Indexed Views
Use Covering Indexes
Keep your clustered index small.
Avoid cursors
Archive old data
Partition your data correctly
Remove user-defined inline scalar functions
Use APPLY
Use computed columns
Use the correct transaction isolation level
http://tonesdotnetblog.wordpress.com/2008/05/26/twelve-tips-for-optimising-sql-server-2005-queries/
I had the very same scenario - and the solution in our case is computed columns.
For those bits of information that you need frequently from your XML, we created a computed column on the "hosting" table, which basically reaches into the XML and pulls out the necessary value from the XML using XPath. In most cases, we're even able to persist this computed column, so that it becomes part of the table and can be queried and even indexed and query speed is absolutely no problem anymore (on those columns).
We also tried XML indices in the beginning, but their disadvantage is the fact that they're absolutely HUGE on disk - this may or may not be a problem. Since we needed to ship back and forth the whole database frequently (as a SQL backup), we eventually gave up on them.
OK, to setup a computed column to retrieve from information bits from your XML, you first need to create a stored function, which will take the XML as a parameter, extract whatever information you need, and then pass that back - something like this:
CREATE FUNCTION dbo.GetShopOrderID(#ShopOrder XML)
RETURNS VARCHAR(100)
AS BEGIN
DECLARE #ShopOrderID VARCHAR(100)
SELECT
#ShopOrderID = #ShopOrder.value('(ActivateOrderRequest/ActivateOrder/OrderHead/OrderNumber)[1]', 'varchar(100)')
RETURN #ShopOrderID
END
Then, you'll need to add a computed column to your table and connect it to this stored function:
ALTER TABLE dbo.YourTable
ADD ShopOrderID AS dbo.GetShipOrderID(ShopOrderXML) PERSISTED
Now, you can easily select data from your table using this new column, as if it were a normal column:
SELECT (fields) FROM dbo.YourTable
WHERE ShopOrderID LIKE 'OSA%'
Best of all - whenever you update your XML, all the computed columns are updated as well - they're always in sync, no triggers or other black magic needed!
Marc
Some information like the query you run, the table structure, the XML content etc would definitely help. A lot...
Without any info, I will guess. The query is running slow when selecting just an ID because you don't have in index on ID.
Updated
There are at least a few serious problems with your query.
Unless an ID is provided, the table can only be scanned end-to-end because there are no indexes
Even if an ID is provided, the condition (#ID is NULL OR ID = #ID) is no guaranteed to be SARGable so it may still result in a table scan.
And most importantly: the query will generate a plan 'optimized' for the first set of parameters it sees. It will reuse this plan on any combination of parameters, no matter which are NULL or not. That would make a difference if there would be some variations on the access path to choose from (ie. indexes) but as it is now, the query can only choose between using a scan or a seek, if #id is present. Due to the ways is constructed, it will pretty much always choose a scan because of the OR.
With this table design your query will run slow today, slower tomorrow, and impossibly slow next week as the size increases. You must look back at your requirements, decide which fields are impoortant to query on, index them and provide separate queryes for them. OR-ing together all possible filters like this is not going to work.
The XML you're trying to retrieve has absolutely nothing to do with the performance problem. You are simply brute forcing a table scan and expect SQL to magically find the records you want.
So if you want to retrieve a specific ParoleeID, Page and ObjectID, you index the fields you search on and run a run a query for those and only those:
CREATE INDEX idx_Audit_ParoleeID ON Audit(ParoleeID);
CREATE INDEX idx_Audit_Page ON Audit(Page);
CREATE INDEX idx_Audit_ObjectID ON Audit(ObjectID);
GO
DECLARE #ParoleeID int
SET #ParoleeID = 158
DECLARE #Page int
SET #Page = 2
DECLARE #ObjectID int
SET #ObjectID = 93
SET NOCOUNT ON;
Select TOP 1 [Audit].* from [Audit]
where Audit.ParoleeID = #ParoleeID
AND Audit.Page = #Page
AND Audit.ObjectID = #ObjectID;
I’m running into an odd problem, and I need some help trying to figure it out.
I have a database which has an ID column (defined as int not null, Identity, starts at 1, increments by 1) in addition to all the application data columns. The primary key for the table is the ID column, no other components.
There is no set of data I can use as a "natural primary key" since the application has to allow for multiple submissions of the same data.
I have a stored procedure, which is the only way to add new records into the table (other than logging into the server directly as the db owner)
While QA was testing the application this morning, they to enter a new record into the database (using the application as it was intended, and as they have been doing for the last two weeks) and encountered a primary key violation on this table.
This is the same way I've been doing Primary Keys for about 10 years now, and have never run across this.
Any ideas on how to fix this? Or is this one of those cosmic ray glitches that shows up once in a long while.
Thanks for any advice you can give.
Nigel
Edited at 1:15PM EDT June 12th, to give more information
A simplified version of the schema...
CREATE TABLE [dbo].[tbl_Queries](
[QueryID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) NOT NULL,
[LastName] [varchar](50) NOT NULL,
[Address] [varchar](150) NOT NULL,
[Apt#] [varchar](10) NOT NULL
... <12 other columns deleted for brevity>
[VersionCode] [timestamp] NOT NULL,
CONSTRAINT [PK_tbl_Queries] PRIMARY KEY CLUSTERED
(
[QueryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
(also removed the default value statements)
The stored procedure is as follows
insert into dbo.tbl_Queries
( FirstName,
LastName,
[Address],
[Apt#]...) values
( #firstName,
#lastName,
#address,
isnull(#apt, ''), ... )
It doesn't even look at the identity column, doesn't use IDENTITY, ##scope_identity or anything similar, it's just a file and forget.
I am as confident as I can be that the identity value wasn't reset, and that no-one else is using direct database access to enter values. The only time in this project that identity insert is used is in the initial database deployment to setup specific values in lookup tables.
The QA team tried again right after getting the error, and was able to submit a query successfully, and they have been trying since then to reproduce it, and haven't succeeded so far.
I really do appreciate the ideas folks.
Sounds like the identity seed got corrupted or reset somehow. Easiest solution will be to reset the seed to the current max value of the identity column:
DECLARE #nextid INT;
SET #nextid = (SELECT MAX([columnname]) FROM [tablename]);
DBCC CHECKIDENT ([tablename], RESEED, #nextid);
While I don't have an explanation as to a potential cause, it is certinaly possible to change the seed value of an identity column. If the seed were lowered to where the next value would already exist in the table, then that could certainly cause what you're seeing. Try running DBCC CHECKIDENT (table_name) and see what it gives you.
For more information, check out this page
Random thought based on experience
Have you synched data with, say, Red Gate Data Compare. This has an option to reseed identity columns. It's caused issues for use. And another project last month.
You may also have explicitly loaded/synched IDs too.
Maybe someone insert some records logging into the server directly using a new ID explicity, then when the identity auto increment field reach this number a primary key violation happened.
But The cosmic ray is algo a good explanation ;)
Just to make very, very sure...you aren't using an IDENTITY_INSERT in your stored procedure are you? Some logic like this:
declare #id int;
Set #id=Select Max(IDColumn) From Sometable;
SET IDENTITY_INSERT dbo.SomeTable ON
Insert (IDColumn, ...others...) Values (#id+1, ...others...);
SET IDENTITY_INSERT dbo.SomeTable OFF
.
.
.
I feel sticky just typing it. But every once in awhile you run across folks that just never quite understood what an Identity column is all about and I want to make sure that this is ruled out. By the way: if this is the answer, I won't hold it against you if just delete the question and never admit that this was your problem!
Can you tell that I hire interns every summer?
Are you using functions like ##identity or scope_identity() in any of your procedures? if your table has triggers or multiple inserts you could be getting back the wrong identity value for the table you want
Hopefully that is not the case, but there is a known bug in SQL 2005 with SCOPE_IDENTITY():
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=328811
The Primary Key violation is not necessarily coming from that table.
Does the application touch any other tables or call any other stored procedures for that function? Are there any triggers on the table? Or does the stored procedure itself use any other tables or stored procedures?
In particular, an Auditing table or trigger could cause this.
I want to have a unique constraint on a column which I am going to populate with GUIDs. However, my data contains null values for this columns. How do I create the constraint that allows multiple null values?
Here's an example scenario. Consider this schema:
CREATE TABLE People (
Id INT CONSTRAINT PK_MyTable PRIMARY KEY IDENTITY,
Name NVARCHAR(250) NOT NULL,
LibraryCardId UNIQUEIDENTIFIER NULL,
CONSTRAINT UQ_People_LibraryCardId UNIQUE (LibraryCardId)
)
Then see this code for what I'm trying to achieve:
-- This works fine:
INSERT INTO People (Name, LibraryCardId)
VALUES ('John Doe', 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA');
-- This also works fine, obviously:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Marie Doe', 'BBBBBBBB-BBBB-BBBB-BBBB-BBBBBBBBBBBB');
-- This would *correctly* fail:
--INSERT INTO People (Name, LibraryCardId)
--VALUES ('John Doe the Second', 'AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA');
-- This works fine this one first time:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Richard Roe', NULL);
-- THE PROBLEM: This fails even though I'd like to be able to do this:
INSERT INTO People (Name, LibraryCardId)
VALUES ('Marcus Roe', NULL);
The final statement fails with a message:
Violation of UNIQUE KEY constraint 'UQ_People_LibraryCardId'. Cannot insert duplicate key in object 'dbo.People'.
How can I change my schema and/or uniqueness constraint so that it allows multiple NULL values, while still checking for uniqueness on actual data?
What you're looking for is indeed part of the ANSI standards SQL:92, SQL:1999 and SQL:2003, ie a UNIQUE constraint must disallow duplicate non-NULL values but accept multiple NULL values.
In the Microsoft world of SQL Server however, a single NULL is allowed but multiple NULLs are not...
In SQL Server 2008, you can define a unique filtered index based on a predicate that excludes NULLs:
CREATE UNIQUE NONCLUSTERED INDEX idx_yourcolumn_notnull
ON YourTable(yourcolumn)
WHERE yourcolumn IS NOT NULL;
In earlier versions, you can resort to VIEWS with a NOT NULL predicate to enforce the constraint.
SQL Server 2008 +
You can create a unique index that accept multiple NULLs with a WHERE clause. See the answer below.
Prior to SQL Server 2008
You cannot create a UNIQUE constraint and allow NULLs. You need set a default value of NEWID().
Update the existing values to NEWID() where NULL before creating the UNIQUE constraint.
SQL Server 2008 And Up
Just filter a unique index:
CREATE UNIQUE NONCLUSTERED INDEX UQ_Party_SamAccountName
ON dbo.Party(SamAccountName)
WHERE SamAccountName IS NOT NULL;
In Lower Versions, A Materialized View Is Still Not Required
For SQL Server 2005 and earlier, you can do it without a view. I just added a unique constraint like you're asking for to one of my tables. Given that I want uniqueness in column SamAccountName, but I want to allow multiple NULLs, I used a materialized column rather than a materialized view:
ALTER TABLE dbo.Party ADD SamAccountNameUnique
AS (Coalesce(SamAccountName, Convert(varchar(11), PartyID)))
ALTER TABLE dbo.Party ADD CONSTRAINT UQ_Party_SamAccountName
UNIQUE (SamAccountNameUnique)
You simply have to put something in the computed column that will be guaranteed unique across the whole table when the actual desired unique column is NULL. In this case, PartyID is an identity column and being numeric will never match any SamAccountName, so it worked for me. You can try your own method—be sure you understand the domain of your data so that there is no possibility of intersection with real data. That could be as simple as prepending a differentiator character like this:
Coalesce('n' + SamAccountName, 'p' + Convert(varchar(11), PartyID))
Even if PartyID became non-numeric someday and could coincide with a SamAccountName, now it won't matter.
Note that the presence of an index including the computed column implicitly causes each expression result to be saved to disk with the other data in the table, which DOES take additional disk space.
Note that if you don't want an index, you can still save CPU by making the expression be precalculated to disk by adding the keyword PERSISTED to the end of the column expression definition.
In SQL Server 2008 and up, definitely use the filtered solution instead if you possibly can!
Controversy
Please note that some database professionals will see this as a case of "surrogate NULLs", which definitely have problems (mostly due to issues around trying to determine when something is a real value or a surrogate value for missing data; there can also be issues with the number of non-NULL surrogate values multiplying like crazy).
However, I believe this case is different. The computed column I'm adding will never be used to determine anything. It has no meaning of itself, and encodes no information that isn't already found separately in other, properly defined columns. It should never be selected or used.
So, my story is that this is not a surrogate NULL, and I'm sticking to it! Since we don't actually want the non-NULL value for any purpose other than to trick the UNIQUE index to ignore NULLs, our use case has none of the problems that arise with normal surrogate NULL creation.
All that said, I have no problem with using an indexed view instead—but it brings some issues with it such as the requirement of using SCHEMABINDING. Have fun adding a new column to your base table (you'll at minimum have to drop the index, and then drop the view or alter the view to not be schema bound). See the full (long) list of requirements for creating an indexed view in SQL Server (2005) (also later versions), (2000).
Update
If your column is numeric, there may be the challenge of ensuring that the unique constraint using Coalesce does not result in collisions. In that case, there are some options. One might be to use a negative number, to put the "surrogate NULLs" only in the negative range, and the "real values" only in the positive range. Alternately, the following pattern could be used. In table Issue (where IssueID is the PRIMARY KEY), there may or may not be a TicketID, but if there is one, it must be unique.
ALTER TABLE dbo.Issue ADD TicketUnique
AS (CASE WHEN TicketID IS NULL THEN IssueID END);
ALTER TABLE dbo.Issue ADD CONSTRAINT UQ_Issue_Ticket_AllowNull
UNIQUE (TicketID, TicketUnique);
If IssueID 1 has ticket 123, the UNIQUE constraint will be on values (123, NULL). If IssueID 2 has no ticket, it will be on (NULL, 2). Some thought will show that this constraint cannot be duplicated for any row in the table, and still allows multiple NULLs.
For people who are using Microsoft SQL Server Manager and want to create a Unique but Nullable index you can create your unique index as you normally would then in your Index Properties for your new index, select "Filter" from the left hand panel, then enter your filter (which is your where clause). It should read something like this:
([YourColumnName] IS NOT NULL)
This works with MSSQL 2012
When I applied the unique index below:
CREATE UNIQUE NONCLUSTERED INDEX idx_badgeid_notnull
ON employee(badgeid)
WHERE badgeid IS NOT NULL;
every non null update and insert failed with the error below:
UPDATE failed because the following SET options have incorrect settings: 'ARITHABORT'.
I found this on MSDN
SET ARITHABORT must be ON when you are creating or changing indexes on computed columns or indexed views. If SET ARITHABORT is OFF, CREATE, UPDATE, INSERT, and DELETE statements on tables with indexes on computed columns or indexed views will fail.
So to get this to work correctly I did this
Right click [Database]-->Properties-->Options-->Other
Options-->Misscellaneous-->Arithmetic Abort Enabled -->true
I believe it is possible to set this option in code using
ALTER DATABASE "DBNAME" SET ARITHABORT ON
but i have not tested this
It can be done in the designer as well
Right click on the Index > Properties to get this window
Create a view that selects only non-NULL columns and create the UNIQUE INDEX on the view:
CREATE VIEW myview
AS
SELECT *
FROM mytable
WHERE mycolumn IS NOT NULL
CREATE UNIQUE INDEX ux_myview_mycolumn ON myview (mycolumn)
Note that you'll need to perform INSERT's and UPDATE's on the view instead of table.
You may do it with an INSTEAD OF trigger:
CREATE TRIGGER trg_mytable_insert ON mytable
INSTEAD OF INSERT
AS
BEGIN
INSERT
INTO myview
SELECT *
FROM inserted
END
It is possible to create a unique constraint on a Clustered Indexed View
You can create the View like this:
CREATE VIEW dbo.VIEW_OfYourTable WITH SCHEMABINDING AS
SELECT YourUniqueColumnWithNullValues FROM dbo.YourTable
WHERE YourUniqueColumnWithNullValues IS NOT NULL;
and the unique constraint like this:
CREATE UNIQUE CLUSTERED INDEX UIX_VIEW_OFYOURTABLE
ON dbo.VIEW_OfYourTable(YourUniqueColumnWithNullValues)
In my experience - if you're thinking a column needs to allow NULLs but also needs to be UNIQUE for values where they exist, you may be modelling the data incorrectly. This often suggests you're creating a separate sub-entity within the same table as a different entity. It probably makes more sense to have this entity in a second table.
In the provided example, I would put LibraryCardId in a separate LibraryCards table with a unique not-null foreign key to the People table:
CREATE TABLE People (
Id INT CONSTRAINT PK_MyTable PRIMARY KEY IDENTITY,
Name NVARCHAR(250) NOT NULL,
)
CREATE TABLE LibraryCards (
LibraryCardId UNIQUEIDENTIFIER CONSTRAINT PK_LibraryCards PRIMARY KEY,
PersonId INT NOT NULL
CONSTRAINT UQ_LibraryCardId_PersonId UNIQUE (PersonId),
FOREIGN KEY (PersonId) REFERENCES People(id)
)
This way you don't need to bother with a column being both unique and nullable. If a person doesn't have a library card, they just won't have a record in the library cards table. Also, if there are additional attributes about the library card (perhaps Expiration Date or something), you now have a logical place to put those fields.
Maybe consider an "INSTEAD OF" trigger and do the check yourself? With a non-clustered (non-unique) index on the column to enable the lookup.
As stated before, SQL Server doesn't implement the ANSI standard when it comes to UNIQUE CONSTRAINT. There is a ticket on Microsoft Connect for this since 2007. As suggested there and here the best options as of today are to use a filtered index as stated in another answer or a computed column, e.g.:
CREATE TABLE [Orders] (
[OrderId] INT IDENTITY(1,1) NOT NULL,
[TrackingId] varchar(11) NULL,
...
[ComputedUniqueTrackingId] AS (
CASE WHEN [TrackingId] IS NULL
THEN '#' + cast([OrderId] as varchar(12))
ELSE [TrackingId_Unique] END
),
CONSTRAINT [UQ_TrackingId] UNIQUE ([ComputedUniqueTrackingId])
)
You can create an INSTEAD OF trigger to check for specific conditions and error if they are met. Creating an index can be costly on larger tables.
Here's an example:
CREATE TRIGGER PONY.trg_pony_unique_name ON PONY.tbl_pony
INSTEAD OF INSERT, UPDATE
AS
BEGIN
IF EXISTS(
SELECT TOP (1) 1
FROM inserted i
GROUP BY i.pony_name
HAVING COUNT(1) > 1
)
OR EXISTS(
SELECT TOP (1) 1
FROM PONY.tbl_pony t
INNER JOIN inserted i
ON i.pony_name = t.pony_name
)
THROW 911911, 'A pony must have a name as unique as s/he is. --PAS', 16;
ELSE
INSERT INTO PONY.tbl_pony (pony_name, stable_id, pet_human_id)
SELECT pony_name, stable_id, pet_human_id
FROM inserted
END
You can't do this with a UNIQUE constraint, but you can do this in a trigger.
CREATE TRIGGER [dbo].[OnInsertMyTableTrigger]
ON [dbo].[MyTable]
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
DECLARE #Column1 INT;
DECLARE #Column2 INT; -- allow nulls on this column
SELECT #Column1=Column1, #Column2=Column2 FROM inserted;
-- Check if an existing record already exists, if not allow the insert.
IF NOT EXISTS(SELECT * FROM dbo.MyTable WHERE Column1=#Column1 AND Column2=#Column2 #Column2 IS NOT NULL)
BEGIN
INSERT INTO dbo.MyTable (Column1, Column2)
SELECT #Column2, #Column2;
END
ELSE
BEGIN
RAISERROR('The unique constraint applies on Column1 %d, AND Column2 %d, unless Column2 is NULL.', 16, 1, #Column1, #Column2);
ROLLBACK TRANSACTION;
END
END
CREATE UNIQUE NONCLUSTERED INDEX [UIX_COLUMN_NAME]
ON [dbo].[Employee]([Username] ASC) WHERE ([Username] IS NOT NULL)
WITH (ALLOW_PAGE_LOCKS = ON, ALLOW_ROW_LOCKS = ON, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF, ONLINE = OFF,
MAXDOP = 0) ON [PRIMARY];
this code if u make a register form with textBox and use insert and ur textBox is empty and u click on submit button .
CREATE UNIQUE NONCLUSTERED INDEX [IX_tableName_Column]
ON [dbo].[tableName]([columnName] ASC) WHERE [columnName] !=`''`;