How do I create a multiple column unique constraint in SQL Server - sql-server

I have a table that contains, for example, two fields that I want to make unique within the database. For example:
create table Subscriber (
ID int not null,
DataSetId int not null,
Email nvarchar(100) not null,
...
)
The ID column is the primary key and both DataSetId and Email are indexed.
What I want to be able to do is prevent the same Email and DataSetId combination appearing in the table or, to put it another way, the Email value must be unique for a given DataSetId.
I tried creating a unique index on the columns
CREATE UNIQUE NONCLUSTERED INDEX IX_Subscriber_Email
ON Subscriber (DataSetId, Email)
but I found that this had quite a significant impact on search times (when searching for an email address for example - there are 1.5 million rows in the table).
Is there a more efficient way of achieving this type of constraint?

but I found that this had quite a significant impact on search times
(when searching for an email address for example
The index you defined on (DataSetId, Email) cannot be used for searches based on email. If you would create an index with the Email field at the leftmost position, it could be used:
CREATE UNIQUE NONCLUSTERED INDEX IX_Subscriber_Email
ON Subscriber (Email, DataSetId);
This index would server both as a unique constraint enforcement and as a means to quickly search for an email. This index though cannot be used to quickly search for a specific DataSetId.
The gist of it if is that whenever you define a multikey index, it can be used only for searches in the order of the keys. An index on (A, B, C) can be used to seek values on column A, for searching values on both A and B or to search values on all three columns A, B and C. However it cannot be used to search values on B or on C alone.

I assume that only way to enter data into that table is through SPs, If that's the case you can implement some logic in your insert and update SPs to find if the values you are going to insert / update is already exists in that table or not.
Something like this
create proc spInsert
(
#DataSetId int,
#Email nvarchar(100)
)
as
begin
if exists (select * from tabaleName where DataSetId = #DataSetId and Email = #Email)
select -1 -- Duplicacy flag
else
begin
-- insert logic here
select 1 -- success flag
end
end
GO
create proc spUpdate
(
#ID int,
#DataSetId int,
#Email nvarchar(100)
)
as
begin
if exists
(select * from tabaleName where DataSetId = #DataSetId and Email = #Email and ID <> #ID)
select -1 -- Duplicacy flag
else
begin
-- insert logic here
select 1 -- success flag
end
end
GO

Related

What does 'Unnecessary table scan' mean exactly

I have a table of names with Ids
Create table Names (
Id int,
Name nvarchar(500)
)
I'm trying to create a procedure that would select 1 name that matches the provided Id if Id is provided or select all names if no Id is provided
Create Procedure SelectNames
#Id int = null
AS
BEGIN
Select * From Names
Where IsNull(#Id, 0) = 0
Or Id = #Id
END
GO
But I get an error: 'Error: SR0015 : Microsoft.Rules.Data : Deterministic function call (ISNULL) might cause an unnecessary table scan.'
What does the 'unnecessary table scan' refer to in this instance?
And is there a better way to write the procedure?
The simplest way to remove the table scan is to create an index (probably unique) on your Id column. In general, one wouldn't expect a nullable Id value. With that index in place, finding a name by Id will not require scanning (or iterating through every row in) the table.
Regarding "better way to write the procedure" - once the nullability is removed, a simple SELECT without the WHERE should be fine.

SQL Server check constraints - only one particular value per group [duplicate]

How could I set a constraint on a table so that only one of the records has its isDefault bit field set to 1?
The constraint is not table scope, but one default per set of rows, specified by a FormID.
Use a unique filtered index
On SQL Server 2008 or higher you can simply use a unique filtered index
CREATE UNIQUE INDEX IX_TableName_FormID_isDefault
ON TableName(FormID)
WHERE isDefault = 1
Where the table is
CREATE TABLE TableName(
FormID INT NOT NULL,
isDefault BIT NOT NULL
)
For example if you try to insert many rows with the same FormID and isDefault set to 1 you will have this error:
Cannot insert duplicate key row in object 'dbo.TableName' with unique
index 'IX_TableName_FormID_isDefault'. The duplicate key value is (1).
Source: http://technet.microsoft.com/en-us/library/cc280372.aspx
Here's a modification of Damien_The_Unbeliever's solution that allows one default per FormID.
CREATE VIEW form_defaults
AS
SELECT FormID
FROM whatever
WHERE isDefault = 1
GO
CREATE UNIQUE CLUSTERED INDEX ix_form_defaults on form_defaults (FormID)
GO
But the serious relational folks will tell you this information should just be in another table.
CREATE TABLE form
FormID int NOT NULL PRIMARY KEY
DefaultWhateverID int FOREIGN KEY REFERENCES Whatever(ID)
From a normalization perspective, this would be an inefficient way of storing a single fact.
I would opt to hold this information at a higher level, by storing (in a different table) a foreign key to the identifier of the row which is considered to be the default.
CREATE TABLE [dbo].[Foo](
[Id] [int] NOT NULL,
CONSTRAINT [PK_Foo] PRIMARY KEY CLUSTERED
(
[Id] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DefaultSettings](
[DefaultFoo] [int] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DefaultSettings] WITH CHECK ADD CONSTRAINT [FK_DefaultSettings_Foo] FOREIGN KEY([DefaultFoo])
REFERENCES [dbo].[Foo] ([Id])
GO
ALTER TABLE [dbo].[DefaultSettings] CHECK CONSTRAINT [FK_DefaultSettings_Foo]
GO
You could use an insert/update trigger.
Within the trigger after an insert or update, if the count of rows with isDefault = 1 is more than 1, then rollback the transaction.
CREATE VIEW vOnlyOneDefault
AS
SELECT 1 as Lock
FROM <underlying table>
WHERE Default = 1
GO
CREATE UNIQUE CLUSTERED INDEX IX_vOnlyOneDefault on vOnlyOneDefault (Lock)
GO
You'll need to have the right ANSI settings turned on for this.
I don't know about SQLServer.But if it supports Function-Based Indexes like in Oracle, I hope this can be translated, if not, sorry.
You can do an index like this on suposed that default value is 1234, the column is DEFAULT_COLUMN and ID_COLUMN is the primary key:
CREATE
UNIQUE
INDEX only_one_default
ON my_table
( DECODE(DEFAULT_COLUMN, 1234, -1, ID_COLUMN) )
This DDL creates an unique index indexing -1 if the value of DEFAULT_COLUMN is 1234 and ID_COLUMN in any other case. Then, if two columns have DEFAULT_COLUMN value, it raises an exception.
The question implies to me that you have a primary table that has some child records and one of those child records will be the default record. Using address and a separate default table here is an example of how to make that happen using third normal form. Of course I don't know if it's valuable to answer something that is so old but it struck my fancy.
--drop table dev.defaultAddress;
--drop table dev.addresses;
--drop table dev.people;
CREATE TABLE [dev].[people](
[Id] [int] identity primary key,
name char(20)
)
GO
CREATE TABLE [dev].[Addresses](
id int identity primary key,
peopleId int foreign key references dev.people(id),
address varchar(100)
) ON [PRIMARY]
GO
CREATE TABLE [dev].[defaultAddress](
id int identity primary key,
peopleId int foreign key references dev.people(id),
addressesId int foreign key references dev.addresses(id))
go
create unique index defaultAddress on dev.defaultAddress (peopleId)
go
create unique index idx_addr_id_person on dev.addresses(peopleid,id);
go
ALTER TABLE dev.defaultAddress
ADD CONSTRAINT FK_Def_People_Address
FOREIGN KEY(peopleID, addressesID)
REFERENCES dev.Addresses(peopleId, id)
go
insert into dev.people (name)
select 'Bill' union
select 'John' union
select 'Harry'
insert into dev.Addresses (peopleid, address)
select 1, '123 someplace' union
select 1,'work place' union
select 2,'home address' union
select 3,'some address'
insert into dev.defaultaddress (peopleId, addressesid)
select 1,1 union
select 2,3
-- so two home addresses are default now
-- try adding another default address to Bill and you get an error
select * from dev.people
join dev.addresses on people.id = addresses.peopleid
left join dev.defaultAddress on defaultAddress.peopleid = people.id and defaultaddress.addressesid = addresses.id
insert into dev.defaultaddress (peopleId, addressesId)
select 1,2
GO
You could do it through an instead of trigger, or if you want it as a constraint create a constraint that references a function that checks for a row that has the default set to 1
EDIT oops, needs to be <=
Create table mytable(id1 int, defaultX bit not null default(0))
go
create Function dbo.fx_DefaultExists()
returns int as
Begin
Declare #Ret int
Set #ret = 0
Select #ret = count(1) from mytable
Where defaultX = 1
Return #ret
End
GO
Alter table mytable add
CONSTRAINT [CHK_DEFAULT_SET] CHECK
(([dbo].fx_DefaultExists()<=(1)))
GO
Insert into mytable (id1, defaultX) values (1,1)
Insert into mytable (id1, defaultX) values (2,1)
This is a fairly complex process that cannot be handled through a simple constraint.
We do this through a trigger. However before you write the trigger you need to be able to answer several things:
do we want to fail the insert if a default exists, change it to 0 instead of 1 or change the existing default to 0 and leave this one as 1?
what do we want to do if the default record is deleted and other non default records are still there? Do we make one the default, if so how do we determine which one?
You will also need to be very, very careful to make the trigger handle multiple row processing. For instance a client might decide that all of the records of a particular type should be the default. You wouldn't change a million records one at a time, so this trigger needs to be able to handle that. It also needs to handle that without looping or the use of a cursor (you really don't want the type of transaction discussed above to take hours locking up the table the whole time).
You also need a very extensive tesing scenario for this trigger before it goes live. You need to test:
adding a record with no default and it is the first record for that customer
adding a record with a default and it is the first record for that customer
adding a record with no default and it is the not the first record for that customer
adding a record with a default and it is the not the first record for that customer
Updating a record to have the default when no other record has it (assuming you don't require one record to always be set as the deafault)
Updating a record to remove the default
Deleting the record with the deafult
Deleting a record without the default
Performing a mass insert with multiple situations in the data including two records which both have isdefault set to 1 and all of the situations tested when running individual record inserts
Performing a mass update with multiple situations in the data including two records which both have isdefault set to 1 and all of the situations tested when running individual record updates
Performing a mass delete with multiple situations in the data including two records which both have isdefault set to 1 and all of the situations tested when running individual record deletes
#Andy Jones gave an answer above closest to mine, but bearing in mind the Rule of Three, I placed the logic directly in the stored proc that updates this table. This was my simple solution. If I need to update the table from elsewhere, I will move the logic to a trigger. The one default rule applies to each set of records specified by a FormID and a ConfigID:
ALTER proc [dbo].[cpForm_UpdateLinkedReport]
#reportLinkId int,
#defaultYN bit,
#linkName nvarchar(150)
as
if #defaultYN = 1
begin
declare #formId int, #configId int
select #formId = FormID, #configId = ConfigID from csReportLink where ReportLinkID = #reportLinkId
update csReportLink set DefaultYN = 0 where isnull(ConfigID, #configId) = #configId and FormID = #formId
end
update
csReportLink
set
DefaultYN = #defaultYN,
LinkName = #linkName
where
ReportLinkID = #reportLinkId

Faking record id result in poor db performance?

I have modeled some data into a table, but privacy is a very important issue. Whenever I create a new record I look for an unused random 9 digit id. (This is to avoid anybody being able to infer the order in which records were created in a worst case scenario.) By faking the id field do I risk losing database performance because it is used for addressing data in anyway? For SQLite3? This is a RubyonRails3 app and am still in a dev environment so not sure if SQLite3 will go to prod.
Larger ID values do not make index lookups any slower.
Smaller values use fewer bytes when stored in the database file, but the difference is unlikely to be noticeable.
For optimal performance, you should declare your ID column as INTEGER PRIMARY KEY so that ID lookups do not need a separate index but can use the table structure itself as index.
CREATE TABLE Bargains
(
RowID INT IDENTITY PRIMARY KEY,
Code AS ABS(CHECKSUM(NEWID())),
CustomerID INT
)
CREATE TABLE Bargains
(
RowID INT IDENTITY PRIMARY KEY,
TheOtherBit VARCHAR(4) NOT NULL DEFAULT(SUBSTRING(CONVERT(varchar(50), NEWID()),
CustomerID INT
)
We use NEWID() to generate a "random" value, take a few digits from that, put that in a SEPARATE field, and incorporate it in the "pretty value" shown to the user (and required when the user retrieves the data, but not required internally).
So we have
MyID INT IDENTITY NOT NULL PRIMARY KEY ...
TheOtherBit VARCHAR(4) NOT NULL DEFAULT(SUBSTRING(CONVERT(varchar(50), NEWID())
but internally for us it would be ordered on RowID and of course u wont have to generate a number randomly either and the user does not get to see ur RowID...
Here is some working code to explain how u can create Unique ids within the database
USE TEST
GO
CREATE TABLE NEWID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
CREATE TABLE NEWSEQUENTIALID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
-- INSERT 1000 ROWS INTO EACH TEST TABLE
DECLARE #COUNTER INT
SET #COUNTER = 1
WHILE (#COUNTER <= 50)
BEGIN
INSERT INTO NEWID_TEST DEFAULT VALUES
INSERT INTO NEWSEQUENTIALID_TEST DEFAULT VALUES
SET #COUNTER = #COUNTER + 1
END
GO
SELECT TOP 5 ID FROM NEWID_TEST
SELECT TOP 5 ID FROM NEWSEQUENTIALID_TEST
GO

Handling max(ID) in a concurrent environment

I am new to web application programming and handling concurrency using an RDBMS like SQL Server. I am using SQL Server 2005 Express Edition.
I am generating employee code in which the last four digits come from this query:
SELECT max(ID) FROM employees WHERE district = "XYZ";
I am not following how to handle issues that might arise due to concurrent connections. Many users can pick same max(ID) and while one user clicks "Save Record", the ID might have already been occupied by another user.
How to handle this issue?
Here are two ways of doing what you want. The fact that you might end up with unique constraint violation on EmpCode I will leave you to worry about :).
1. Use scope_identity() to get the last inserted ID and use that to calculate EmpCode.
Table definition:
create table Employees
(
ID int identity primary key,
Created datetime not null default getdate(),
DistrictCode char(2) not null,
EmpCode char(10) not null default left(newid(), 10) unique
)
Add one row to Employees. Should be done in a transaction to be sure that you will not be left with the default random value from left(newid(), 10) in EmpCode:
declare #ID int
insert into Employees (DistrictCode) values ('AB')
set #ID = scope_identity()
update Employees
set EmpCode = cast(year(Created) as char(4))+DistrictCode+right(10000+#ID, 4)
where ID = #ID
2. Make EmpCode a computed column.
Table definition:
create table Employees
(
ID int identity primary key,
Created datetime not null default getdate(),
DistrictCode char(2) not null,
EmpCode as cast(year(Created) as char(4))+DistrictCode+right(10000+ID, 4) unique
)
Add one row to Employees:
insert into Employees (DistrictCode) values ('AB')
It is a bad idea to use MAX, because with a proper locking mechanism, you will not be able to insert rows in multiple threads for the same district.
If it is OK for you that you can only create one user at a time, and if your tests show that the MAX scales up even with a lot of users per district, it may be ok to use it.
Long story short, dealing with identies, as much as possible, you should rely on IDENTITY. Really.
But if it is not possible, one solution is to handle IDs in a separate table.
Create Table DistrictID (
DistrictCode char(2),
LastID Int,
Constraint PK_DistrictCode Primary Key Clustered (DistrictCode)
);
Then you increment the LastID counter. It is important that incrementing IDs is a transaction separated to the user creation transaction if you want to create many users in parallel threads. You can limit to have only the ID generation in sequence.
The code can look like this:
Create Procedure usp_GetNewId(#DistrictCode char(2), #NewId Int Output)
As
Set NoCount On;
Set Transaction Isolation Level Repeatable Read;
Begin Tran;
Select #NewId = LastID From DistrictID With (XLock) Where DistrictCode = #DistrictCode;
Update DistrictID Set LastID = LastID + 1 Where DistrictCode = #DistrictCode;
Commit Tran;
The Repeatable Read and XLOCK keywords are the minimum that you need to avoid two threads to get the same ID.
If the table does not have all districts, you will need to change the Repeatable Read into a Serializable, and fork the Update with a Insert.
This can be done through Transaction Isolation Levels. For instance, if you specify SERIALIZABLE as the level then other transactions will be blocked so that you aren't running into this problem.
If I did not understand your question correctly, please let me know.

Most efficient design to search for this data in my database?

I have the following database tables and a view which represents that data. The tables are heirachial (if that is how u describe it) :-
EDIT: I've replace my 3 tables with
FAKE table names/data (for this post)
because I'm under NDA to not post
anything about out projects, etc. So
yeah.. I don't really save people
names like this :)
FirstNames
FirstNameId INT PK NOT NULL IDENTITY
Name VARCHAR(100)
MiddleNames
MiddleNameId INT PK NOT NULL IDENTITY
Name VARCHAR(100) NOT NULL
FirstNameId INT FK NOT NULL
Surnames
SurnameId INT PK NOT NULL IDENTITY
Name VARCHAR(100) NOT NULL
FirstNameId INT FK NOT NULL
So, the firstname is the parent table with the other two tables being children.
The view looks like...
PersonNames
FirstNameId
FirstName
MiddleNameId
MiddleName
SurnameId
Surname
Here's some sample data.
FNID FN MNID MN SNID SN
-----------------------------------
1 Joe 1 BlahBlah 1 Blogs
2 Jane - - 1 Blogs
3 Jon - - 2 Skeet
Now here's the problem. How can i efficiently search for names on the view? I was going to have a Full Text Search/Catalogue, but I can't put that on a view (or at least I can't get it working using the GUI against a View).
EDIT #2: Here are some sample search queries :-
exec uspSearchForPeople 'joe blogs' (1 result)
exec uspSearchForPeople 'joe' (1 result)
exec uspSearchForPeople 'blogs' (2 results)
exec uspSearchForPeople 'jon skeet' (1 result)
exec uspSearchForPeople 'skeet' (1 result)
Should i generate a new table with the full names? how would that look?
please help!
This doesn't seem like the most logical design decision. Why did you design it like this?
What's your indexing structure currently? A index on Name on each of the 3 tables should speed up the query?
Alternatively, normalizing further and creating a Name table and having NameID in each of the three, then indexing the Name table should also increase performance, but I think indexing the name field on the 3 tables would be easier and work as well.
What's the stats on updates vs selects, as adding these indexes might incur a performance hit.
crazy design, possibly the fake table names makes it stranger than it is.
create indexes based on select usage.
if you are searching on actual first names like "Joe" you need an index on FirstNames.Name
if you are searching on first name ids like 123, you have an index: FirstNames.FirstNameId
if you want to search on FirstNames.name and/or MiddleNames.name and/or Surnames.name you need to have indexes on the combinations that you will use, and the more you make, the harder for the query to pick the best one.
ditch the view and write a dedicated query for the purpose:
go after first/middle
select
FirstNames.name
,MiddleNames.name
,Surnames.name
FROM FirstNames
INNER JOIN MiddleNames ON FirstNames.FirstNameId=MiddleNames.FirstNameId
INNER JOIN Surnames ON FirstNames.FirstNameId=Surnames.FirstNameId
WHERE FirstNames.Name='John'
AND MiddleNames.Name='Q'
go after last
select
FirstNames.name
,MiddleNames.name
,Surnames.name
FROM Surnames
INNER JOIN FirstNames ON Surnames.FirstNameId =FirstNames.FirstNameId
INNER JOIN MiddleNames ON FirstNames.FirstNameId=MiddleNames.FirstNameId
WHERE Surnames.Name='Public'
just make sure you have indexes to cover your main table in the "where" clause
use SET SHOWPLAN_ALL ON to make sure you are using an index ("scans" are bad "seeks" are good")
EDIT
if possible break the names apart before searching for them:
exec uspSearchForPeople 'joe',null,'blogs' (1 result)
exec uspSearchForPeople 'joe',null,null (1 result)
exec uspSearchForPeople null,null,'blogs' (2 results)
exec uspSearchForPeople 'jon',null,'skeet' (1 result)
exec uspSearchForPeople null,null,'skeet' (1 result)
within the stored procedure, have three queries:
if #GivenFirstName is not null
--search from FirstNames where FirstNames.name=#value & join in other tables
else if #GivenMiddleName is not null
--search from MiddleNames where MiddleNames.name=#value & join in other tables
else if #GivenLastName is not null
--search from Surnames where Surnames.name=#value & join in other tables
else --error no names given
have an index on all three tables for Names.
if you can not split the names apart, I think you are out of luck and you will have to table scan every row in each table.
Just think of a phone book if you don't use the index and you are looking for a name, you will need to read the entire book
I would have just one table with a name type column (first, middle, last) and an FK onto itself with the clustered index on the name column.
CREATE TABLE [Name] (
NameID INT NOT NULL IDENTITY,
[Name] varchar(100) not null,
NameType varchar(1) not null,
FirstNameID int null,
)
ALTER TABLE [Name] ADD CONSTRAINT PK_Name PRIMARY KEY NONCLUSTERED (NameID)
ALTER TABLE [Name] ADD CONSTRAINT FK_Name_FirstNameID FOREIGN KEY (FirstNameID) REFERENCES [Name](NameID)
CREATE CLUSTERED INDEX IC_Name ON [Name] ([Name], NameType)
DECLARE #fid int
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Joe', 'F', NULL)
SELECT #fid = scope_identity()
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('BlahBlah', 'M', #fid)
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Blogs', 'L', #fid)
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Jane', 'F', NULL)
SELECT #fid = scope_identity()
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Blogs', 'L', #fid)
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Jon', 'F', NULL)
SELECT #fid = scope_identity()
INSERT [Name] ([Name], NameType, FirstNameID) VALUES ('Skeet', 'L', #fid)
You could then build a dynamic but paramterized WHERE clause based on the number of values to search (or hard-code them for that matter assuming there are only at most 3) using sp_executsql in a stored proc, linq to sql, or even ugly string manipulation in code.
I think what you are wanting is an Index table. It doesn't matter how many tables and columns you have in those tables as stuff is inserted into the database it gets indexed. ex.
I would recommend one table for your names.
NameTable
----------
Id
FirstName
MiddleName
LastName
You can have as many normal tables as you want...
IndexTable
----------
Id
Text
You could use the text as the primary key but I always have a separate id column for the primary key (just habit).
IndexItemTable
----------
Id
IndexId // Has a foreign key reference to IndexTable Id
ReferenceId // The record Id of where the text occures
ReferenceTable // The table where the text occures
Then as you insert a name "Jim Barbarovich Fleming" you would also scan you index and find that its empty and create 3 new records for Jim, Barbarovic, and Fleming that would all have the same referenceId and the ReferenceTable would be "NameTable" then you insert another record like "Jim Bradley Fleming" you would scan the index table and see that you already have values for "Jim" and "Fleming" so you would just create IndexItem with referenceId of 2 and ReferenceTable of "NameTable".
By building and index you can search via a single textbox and find all records/fields in your database that have those values.
Note: you going to want to change everything when you insert it to the index to uppercase or lower case and then use equals(value, OrdinalIgnoreCase).
Edit:
I can't just upload the image. I have to host it somewhere I guess but It not any different than the table diagrams I put above. The only relationship IndexTable has is to IndexItemTable. I would do the rest in code. ex.
During Insert or Update of new record in Name table you would have to:
Scan IndexTable and see if each of the fields in the NameTable exist.
If they don't you would add a new record to the Index table with the text that wasn't found. If they do the go on to step 3.
Add a record in the IndexItemTable with the referenceId (the id of the record in the NameTable) and ReferenceTable (NameTable) and then the IndexId of the text found in the IndexTable.
Then when they do a search via your single text box you search for each word in the index table and return the Names from the NameTable that are referenced in the IndexTable.

Resources