I'm trying to create an indexed view with a unique clustered index on it. My problem is how to generate a primary key within the SELECT clause. E.g.
Create view ssrs.vMyView
with schemabinding
as
select firstname, lastname, other columns --example columns
from mytable
How can I generate a primary key for each row on the fly?
Update
The problem is that the view does NOT have a unique column or combination of columns, so I need to generate a unique id on the fly. Firstname and lastname are just examples. The base table does have a primary key.
Thanks in advance!
Once you've created this view, if you obeyed all the rules and requirements for an indexed view, you should be able to just create the clustered index like this:
CREATE UNIQUE CLUSTERED INDEX cix_vMyView ON ssrs.vMyView(....)
You need to choose a good, valid clustering key - preferably according to the NUSE principle:
narrow
unique
static
ever-increasing
An INT IDENTITY would be perfect - or something like a BIGINT or a combination of INT and DATETIME.
Update: seeing that your base table doesn't even have a primary key (THAT's a much bigger problem you'll need to fix ASAP!! If it doesn't have a primary key, it's not a table), you could use something like ROW_NUMBER() in your view definition:
CREATE VIEW ssrs.vMyView
WITH SCHEMABINDING
AS
SELECT firstname, lastname,
ROW_NUMBER() OVER(ORDER BY Lastname, FirstName) AS 'ID'
FROM dbo.mytable
to give you an "artificial" unique, ever-increasing primary key.
(Update 2014-Apr-25: unfortunately, contrary to my belief at the time of posting this, this won't work, since you cannot create a clustered index on a view that contains a ranking function like ROW_NUMBER. Thanks to #jspaey for pointing that out. This makes it even more important to have a primary key on the base tables and to include it in your view definition!)
But again: if your base table doesn't have a primary key - fix that first !!
Update #2: OK, so your base table(s) do have a primary key after all - then why isn't that part of your view definition? I would always include all the primary keys from all base tables in my views - only those PKs enable you to clearly identify rows from the base tables, and they allow you to make your views updateable.
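A minimal sketch of that suggestion, assuming the base table's primary key is a single column called Id (a hypothetical name, since the question never shows it) and that the ssrs schema already exists:

```sql
-- Hypothetical: dbo.mytable has a primary key column named Id.
CREATE VIEW ssrs.vMyView
WITH SCHEMABINDING
AS
SELECT Id, firstname, lastname
FROM dbo.mytable;
GO

-- The base table's key is unique in a single-table view,
-- so it can serve as the unique clustering key of the view.
CREATE UNIQUE CLUSTERED INDEX cix_vMyView
ON ssrs.vMyView (Id);
GO
```

If the view joins several tables, the clustering key would instead need to be whatever combination of base-table keys is unique per row of the view.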
Pingpong, Marc is right that you need something that is unique to add a primary key. Remember that this does not need to be a single column, so if you have two columns that are unique together that would work perfectly well.
If no combination of columns is unique, you probably wish to rethink your view or even add columns so that there is something unique.
As a related note, remember that Enterprise Edition will take advantage of indexed views automatically. Outside of Enterprise Edition, you may need to explicitly tell the optimizer to use the index through the NOEXPAND hint. I wrote about that previously in On Indexes and Views.
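For illustration, a query using that hint might look like the sketch below. It assumes ssrs.vMyView already has its unique clustered index, since the hint is only valid on an indexed view:

```sql
-- NOEXPAND asks the optimizer to read the view's own index
-- instead of expanding the view into its base tables.
SELECT firstname, lastname
FROM ssrs.vMyView WITH (NOEXPAND);
```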
Model: merchants have one to many customers and customers have one to many accounts. Accounts have names. All three tables have unique IDs for each row.
Constraint: For a given merchant, the account names must be unique.
How do we enforce this constraint in a SQL Server database schema?
Here are some ideas we've considered:
We could add MerchantId to the Account table and create a unique constraint, but it's a redundant column to maintain given that CustomerId is already there. We'd need to make sure the combination of MerchantId and CustomerId are themselves consistent, so we'd make the foreign key between Account and Customer include both columns, even though CustomerId is already a unique identifier.
We could add a check constraint to the Account table and use a UDF to check the constraint rule. But then a Customer could conceivably be assigned to a different Merchant, and the check constraint on Account wouldn't be re-evaluated. So we'd have to add another constraint on the Customer table, which starts to seem like we're doing it wrong, especially as the real model gets more complex than described here.
We could enforce the constraint via triggers, but this doesn't seem to improve upon the shortcomings with using check constraints.
Probably the best solution is to create a view joining the 3 tables and create a unique index on that view. That would be an indexed view. When you index a view it gets persisted just like a regular table, but it's updated automagically by the database engine as part of regular DML commands.
There are lots of requirements and restrictions on what you can and cannot do with an indexed view. Those are in the docs I linked to above, but I think you can get away with it.
The code would go like this:
CREATE VIEW dbo.MerchantAccounts
WITH SCHEMABINDING
AS
SELECT m.MerchantKey, a.AccountKey, a.Name
FROM dbo.Accounts a
INNER JOIN dbo.Customers c
ON a.CustomerKey = c.CustomerKey
INNER JOIN dbo.Merchants m
ON c.MerchantKey = m.MerchantKey;
GO
CREATE UNIQUE CLUSTERED INDEX IX_MerchantKey_AccountName
ON dbo.MerchantAccounts (MerchantKey, Name);
GO
I included 3 columns in the view but only 2 are part of the unique clustered index. So the existing data must not contain duplicate (MerchantKey, Name) pairs when you create the index, and after that the database engine will enforce uniqueness for you.
You don't need to change your table and your relationships as long as you don't violate the requirements.
You can include more columns than just the key columns in your indexed view, and that can help performance for some queries. That's up to you. Just be aware that the result set of the view (the equivalent of SELECT * FROM dbo.MerchantAccounts) will be persisted in your database and will take up space. So the more columns you add, the bigger the view gets and the more expensive it is to keep up to date.
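To see the constraint in action, here is a rough sketch. It assumes the view and index above exist, and that the three tables have only the key columns used in the view, with keys supplied by the caller rather than generated by IDENTITY. The literal values are made up for the example:

```sql
-- Hypothetical sample data: one merchant with two customers.
INSERT INTO dbo.Merchants (MerchantKey) VALUES (1);
INSERT INTO dbo.Customers (CustomerKey, MerchantKey) VALUES (10, 1), (11, 1);

-- First account name for merchant 1: accepted.
INSERT INTO dbo.Accounts (AccountKey, CustomerKey, Name)
VALUES (100, 10, N'Savings');

-- Same name under a different customer of the same merchant:
-- expected to be rejected by the unique index on the view.
BEGIN TRY
    INSERT INTO dbo.Accounts (AccountKey, CustomerKey, Name)
    VALUES (101, 11, N'Savings');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

Moving a customer to another merchant that already has an account with the same name should be rejected the same way, because the view is maintained on updates to dbo.Customers as well.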
We are trying to enforce a unique table constraint on certain data tables in SQL Server. I have it working, but I am running into a few issues. I want the table to be ordered by primary key, but if I include the primary key in the index keys, the constraint no longer enforces uniqueness, because every row obviously has a unique ID since it's a primary key.
If I remove the ID from the indexed keys, it works as it is supposed to but it no longer sorts by Primary Key anymore, which is what I want. It sorts by another one of the columns.
How do I include the primary key in the constraint so I can use it for sorting, but have it be ignored when checking the table constraint for uniqueness (i.e., it should still not allow a new record to be written if all other info is the same other than ID)?
UPDATE: How do I handle a situation where a table has more columns than can be put into an index? Can I not enforce no duplicate entries in these?
A relational database is built on set theory and predicate logic. According to set theory, there is no difference between the sets A = {1,2,3} and B = {2,3,1}.
This is why no RDBMS guarantees that results come back in any particular order.
You will only get them in your order when you explicitly provide an ORDER BY in the SELECT statement.
So it is better to sort in the front end or to add an ORDER BY clause to your query.
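Applied to the question, that means the ID stays out of the unique constraint and the ordering is requested at query time. A minimal sketch, with a hypothetical table and column names:

```sql
-- Hypothetical table: Id is the primary key, while uniqueness
-- is enforced separately on the remaining columns.
CREATE TABLE dbo.Widget
(
    Id   INT IDENTITY(1,1) NOT NULL
         CONSTRAINT PK_Widget PRIMARY KEY CLUSTERED,
    ColA NVARCHAR(50) NOT NULL,
    ColB INT NOT NULL,
    CONSTRAINT UQ_Widget_ColA_ColB UNIQUE (ColA, ColB)
);

-- Ordering is part of the query, not of the constraint.
SELECT Id, ColA, ColB
FROM dbo.Widget
ORDER BY Id;
```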
In SQL Server I have the following lookup table that holds degree levels:
create table dbo.DegreeLevel
(
Id int identity not null
constraint PK_DegreeLevel_Id primary key clustered (Id),
Name nvarchar (80) not null
constraint UQ_DegreeLevel_Name unique (Name)
)
Should I use identity on the ID?
When should I use identity or a simple int in a lookup table?
After dealing with multiple environments where we move changes from one environment to the next, I'd say not to use identity columns on lookup tables.
Here's why: if you need to reference an ID as a "magic #", you need consistency. Ideally, you wouldn't ever reference a magic #, but in reality, that is not what is commonly done. And it's a pain to correct when the IDs are out of sync. And it's really not much more effort to insert the table's data with an ID.
In a lookup table, having a "normal" Id INT might be better, because it gives you the ability to pick and choose the Id values. You get to define which values you have, and what they mean.
Identity is very useful for actual data tables, where you just need to know that you have a good, unique ID value - but you don't really care about what that value is.
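A sketch of the lookup table from the question without IDENTITY, so the same Id values can be scripted into every environment. 'Bachelor' is the example value used elsewhere in this thread; the other rows are made up:

```sql
-- Alternative to the question's definition: a plain INT key,
-- with the Id values chosen by whoever maintains the lookup data.
CREATE TABLE dbo.DegreeLevel
(
    Id   INT NOT NULL
         CONSTRAINT PK_DegreeLevel_Id PRIMARY KEY CLUSTERED,
    Name NVARCHAR(80) NOT NULL
         CONSTRAINT UQ_DegreeLevel_Name UNIQUE
);

-- The same script produces the same Ids in dev, test and production.
INSERT INTO dbo.DegreeLevel (Id, Name)
VALUES (1, N'Bachelor'),
       (2, N'Master'),      -- hypothetical row
       (3, N'Doctorate');   -- hypothetical row
```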
I guess it comes down to whether or not you have a natural candidate to use in the clustered index...
If you already have a property that can uniquely identify the row, then it's definitely worth considering whether adding an identity column is the right move.
If you don't have a natural candidate, then you'd need to invent a value and in this case using an identity column or sequence is probably easier than hand-rolling something.
As an example of having a natural key, imagine a 'DegreeModule' table where each module had a 4-character reference code that was printed on course materials (e.g. U212)
In this case, I would definitely skip creating an internal identifier and use the natural identifier as primary key...
create table dbo.DegreeModule
(
Reference char(4) not null primary key clustered,
Name nvarchar(80) not null
constraint UQ_DegreeModule_Name unique (Name)
/* .. plus FK's for stuff like parent degree, prerequisites,etc .. */
)
When you specify the IDENTITY property on an integer column, the column becomes an auto-incrementing integer column. If you want your lookup table to create the id value automatically when you insert a row, use identity. If you want to supply it yourself, just define the column as int.
A table can only have one identity column
You cannot explicitly insert values into an identity column unless you SET IDENTITY_INSERT ON for that table, and existing identity values cannot be updated at all
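For example, inserting an explicit Id into the identity-based dbo.DegreeLevel table from the question could look like this sketch:

```sql
-- Allow explicit values for the identity column of this one table.
SET IDENTITY_INSERT dbo.DegreeLevel ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.DegreeLevel (Id, Name)
VALUES (1, N'Bachelor');

-- Switch it back off so later inserts get generated values again.
SET IDENTITY_INSERT dbo.DegreeLevel OFF;
```

Only one table per session can have IDENTITY_INSERT switched on at a time.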
If you are going to use some object relational mapping (ORM) tool, refer to its documentation. In that case, you most probably would like to allow ORM to handle the primary key and you should not use identity.
If you have no specific requirements for primary key generation, then using identity here is fine. Specific requirements may be: primary keys follow special format, primary keys should be globally unique, primary keys are imported from other database, e.g. by insert into DegreeLevel values (1, 'Bachelor') etc.
If I have a table like the following
CustomerAddress(CustomerId, AddressId)
Would I still need an additional primary key, e.g., int auto increment? Or would setting both the columns as primary keys be sufficient?
ASSUMPTION: When deleting, I will only delete by customerId, never by both customerId and AddressId
I suggest you keep a separate primary key. Although it is not useful now, it might be in the future. For example, the customerId and addressId combination could later get a new field like current_address_flag. And it's just one more field, which is almost entirely maintained by the DB system.
It seems this is a join table. In this case, I'd have a cascading delete between the dependent objects, e.g. when a customer is deleted, all customerAddresses belonging to said customer are also deleted.
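A sketch of such a cascading delete, using the two-column key from the question rather than an extra auto-increment column. The parent tables dbo.Customer and dbo.Address and their key column names are assumptions:

```sql
-- Hypothetical parents: dbo.Customer (CustomerId) and dbo.Address (AddressId).
CREATE TABLE dbo.CustomerAddress
(
    CustomerId INT NOT NULL,
    AddressId  INT NOT NULL,
    CONSTRAINT PK_CustomerAddress
        PRIMARY KEY (CustomerId, AddressId),
    CONSTRAINT FK_CustomerAddress_Customer
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId)
        ON DELETE CASCADE,
    CONSTRAINT FK_CustomerAddress_Address
        FOREIGN KEY (AddressId) REFERENCES dbo.Address (AddressId)
        ON DELETE CASCADE
);

-- Deleting a customer now also removes that customer's rows here.
DELETE FROM dbo.Customer WHERE CustomerId = 42;
```

Because CustomerId is the leading column of the primary key, a delete by CustomerId alone can still use that index.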
Here's my simple scenario:
I've a Users table and a Locations table. ONE User can be related to MANY Locations so I've a UserLocation table which is as follows:
ID (int-Auto Increment) PK
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
Now, as ID is the PK it is Indexed by default in SQL-Server. I was a bit confused about the other two columns:
OPT 1: Should I define one index on both columns, like:
IX_UserLocation_UserID_LocID
OR
OPT 2: Should I define two separate indexes, like: IX_UserLocation_UserID
& IX_UserLocation_LocID
Pardon me if both do the same - in that case please explain. If not, which one is better and why?
You need
2 columns
UserID (Int FK to the Users table)
LocID (Int FK to the Locations table)
One PK on both (UserID, LocID)
Another index on the reverse (LocID, UserID)
You may not need both indexes but it's rare
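Put together, the list above could be sketched as follows. The key column names of dbo.Users and dbo.Locations are assumptions, as are the constraint and index names:

```sql
-- Hypothetical parents: dbo.Users (UserID) and dbo.Locations (LocID).
CREATE TABLE dbo.UserLocation
(
    UserID INT NOT NULL,
    LocID  INT NOT NULL,
    CONSTRAINT PK_UserLocation
        PRIMARY KEY CLUSTERED (UserID, LocID),
    CONSTRAINT FK_UserLocation_Users
        FOREIGN KEY (UserID) REFERENCES dbo.Users (UserID),
    CONSTRAINT FK_UserLocation_Locations
        FOREIGN KEY (LocID) REFERENCES dbo.Locations (LocID)
);

-- The reverse index serves lookups that start from a location.
CREATE UNIQUE NONCLUSTERED INDEX IX_UserLocation_LocID_UserID
ON dbo.UserLocation (LocID, UserID);
```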
Edit, some links to other SO answers
SQL: Do you need an auto-incremental primary key for Many-Many tables?
SQL - many-to-many table primary key
Difference between 2 indexes with columns defined in reverse order
There are several things we hire the database for. One is fast information retrieval and another is declarative referential integrity (DRI).
If your requirement is that a user may be related to a given location only once, then you want a unique index on UserID & LocationID.
If your question is how to retrieve the data fast, the answer is: it depends. How are you accessing the data? If you always get the entire set of locations for a user, then I would probably use a clustered non-unique index on UserID. If your access is "who is in location x?" then you probably want a clustered non-unique index on LocationID.
If you ask both questions you'll probably want both indexes (although you only get 1 clustered index, so the 2nd index may want to use an INCLUDE to grab the other column).
Either way, you probably don't want ID as your clustered index (the default when marking a column as PK in the SSMS table designer).
HTH,
-eric
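One way the layout described in the answer above might look, keeping the ID column from the question and using its LocID column name. The index names are made up:

```sql
-- ID stays the primary key but is deliberately NONCLUSTERED.
CREATE TABLE dbo.UserLocation
(
    ID     INT IDENTITY(1,1) NOT NULL
           CONSTRAINT PK_UserLocation PRIMARY KEY NONCLUSTERED,
    UserID INT NOT NULL,
    LocID  INT NOT NULL
);

-- Non-unique clustered index for "all locations of a user".
CREATE CLUSTERED INDEX CIX_UserLocation_UserID
ON dbo.UserLocation (UserID);

-- Second index for "who is in location x?", carrying UserID along.
CREATE NONCLUSTERED INDEX IX_UserLocation_LocID
ON dbo.UserLocation (LocID) INCLUDE (UserID);
```

Since nonclustered indexes already carry the clustering key, the INCLUDE here is redundant with this particular clustered index. It matters if the table is clustered on something else.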
In addition to the "gbn" answer: it will depend on the WHERE clause, i.e. whether you are filtering on user, location, or both.
You should probably create two separate indexes. One thing that is often forgotten with foreign keys is the fact that deleting a user might cascade-delete the user-location relation in your table. If there is no index on userID, this might lead to a table-lock of your user-location relation. The same applies to deleting a location.
The best way is to set up all the indexes you think you need on dev, then look at the query plans of the queries your app runs and see which indexes get read.
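As a complement to reading query plans (a different technique, not what the answer above describes), the index usage DMV gives a rough picture of which indexes are being read. A sketch, assuming the table is dbo.UserLocation:

```sql
-- Usage counters since the last SQL Server restart, per index.
-- A NULL row means the index has not been touched at all.
SELECT i.name AS IndexName,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.UserLocation');
```

An index with many user_updates but no seeks, scans or lookups after a representative workload is a candidate for removal.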