Updating a table after adding Index - database

I am designing a database using SQLExpress.
I have a table which has three columns. The table looks as below.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_Hash] [binary](20) NOT NULL,
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
I already have some data in this table. Whenever I want to add a new row, I first compute a hash of someLongString and query the table to see if a row with this hash already exists. As the table grows, this query takes longer, so I plan to index the table on the someLongText_Hash column.
Can someone please suggest how to do this in SQL Server Management Studio? Also, after adding this index, how do I index the existing rows in this table?

Why can't you just set the 'someLongString' field to be unique? That way you don't need to keep a hash and an extra primary key?
You could try using a CHECKSUM.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_CheckSum] [int] NOT NULL, -- CHECKSUM returns an int
CONSTRAINT [UC_someLongText_CheckSum] UNIQUE (someLongText_CheckSum),
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
See here for further explanation
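To address the original question directly: an index on the hash column can be created with one statement run from a query window in Management Studio. SQL Server builds the index over the rows already in the table, so no separate step is needed for existing data. The following is a minimal sketch, not a definitive setup; the index name IX_dummy_someLongText_Hash and the placeholder value of @hash are assumptions made for illustration.

```sql
-- Hypothetical index name; the table and column come from the question.
-- Creating the index also indexes every row that already exists.
CREATE NONCLUSTERED INDEX [IX_dummy_someLongText_Hash]
ON [dbo].[dummy] ([someLongText_Hash]);
GO

-- Existence check that can seek on the new index.
-- @hash stands in for the 20-byte hash computed by the application.
DECLARE @hash binary(20);
SET @hash = 0x0000000000000000000000000000000000000000;

SELECT [id]
FROM [dbo].[dummy]
WHERE [someLongText_Hash] = @hash;
```

A non-unique index is used here because two different strings can, in principle, produce the same hash; whether a unique index is appropriate depends on how collisions should be handled.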

Related

How can I add values of a column but only where the artist is the same, in SQL Server?

I have 2 tables, first table:
CREATE TABLE [dbo].[songs]
(
[ID_Song] [INT] IDENTITY(1,1) NOT NULL,
[SongTitle] [NVARCHAR](100) NOT NULL,
[ListenedCount] [INT] NOT NULL,
[Artist] [INT] NOT NULL,
CONSTRAINT [PK_songs]
PRIMARY KEY CLUSTERED ([ID_Song] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[songs] WITH CHECK
ADD CONSTRAINT [FK_songs_artists]
FOREIGN KEY([Artist]) REFERENCES [dbo].[artists] ([ID_Artist])
GO
And second table:
CREATE TABLE [dbo].[artists]
(
[ID_Artist] [INT] IDENTITY(1,1) NOT NULL,
[Name] [NVARCHAR](100) NOT NULL,
CONSTRAINT [PK_artists]
PRIMARY KEY CLUSTERED ([ID_Artist] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
As you can see, column Artist in table Songs references column ID_Artist of table Artists.
I want to get all artists whose total ListenedCount, summed over all their songs, is greater than a given value.
I have trouble writing the query.
There are many ways to achieve it.
One is to compute the sum in a subquery and then filter on that sum in the outer query.
select art.[Name], gba.[ListenedSum]
from [dbo].[artists] art
join
(
select sg.[Artist], sum(sg.[ListenedCount]) as [ListenedSum]
from [dbo].[songs] sg
group by sg.[Artist]
) as gba on gba.[Artist] = art.[ID_Artist]
where gba.[ListenedSum] > 1000000
A more direct way is to use HAVING:
select art.[Name], sum(sg.[ListenedCount]) as [ListenedSum]
from [dbo].[artists] art
join [dbo].[songs] sg on sg.[Artist] = art.[ID_Artist]
group by art.[ID_Artist], art.[Name] -- include the key so artists sharing a name stay separate
having sum(sg.[ListenedCount]) > 1000000
Note that the engine may execute these two queries with different plans (this is not guaranteed), so their performance can differ.
Another option is to use a CTE, but I think it's a bit more complicated.
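For completeness, the CTE variant mentioned above could look roughly like this. It is only a sketch: the CTE name ArtistTotals is made up for illustration, and the threshold is the same 1000000 used in the other two queries.

```sql
-- ArtistTotals is a hypothetical name; it holds one row per artist with the summed count.
with ArtistTotals as
(
    select sg.[Artist], sum(sg.[ListenedCount]) as [ListenedSum]
    from [dbo].[songs] sg
    group by sg.[Artist]
)
select art.[Name], tot.[ListenedSum]
from [dbo].[artists] art
join ArtistTotals tot on tot.[Artist] = art.[ID_Artist]
where tot.[ListenedSum] > 1000000;
```

Logically this is the subquery version with the derived table moved to the top and given a name, which some people find easier to read.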

Table property 'name' in sys.filegroups is listed as 'Primary' when it is not needed

I am creating a script to create a table. When the table has a large object column such as varchar(max) the table needs the text TEXTIMAGE_ON [PRIMARY].
To determine the name of the file group, I use the system table, sys.filegroups and the column name. This returns the name of the file group. In the above example it is PRIMARY.
However, one of the tables that I am scripting has this value of PRIMARY when there are no columns in the table that are text, ntext, varchar(max), etc. So the script is failing when it gets to this table because TEXTIMAGE_ON is not allowed unless you have a large object column.
Why is the file group name returned through sys.filegroups not NULL for this table, given that TEXTIMAGE_ON should not be set?
This is the query I use to get the attribute of TextImageOnFileGroup:
SELECT [Table].name AS "TableName", fg.name AS "TextImageOnFileGroup"
FROM sys.tables as [Table] left outer join sys.filegroups as fg on [Table].lob_data_space_id = fg.data_space_id
WHERE [Table].name in ('ProviderSpecialty', 'ProviderState', 'ProviderExclusion')
This returns
TableName          TextImageOnFileGroup
ProviderSpecialty  PRIMARY
ProviderState      NULL
ProviderAttribute  PRIMARY
The table ProviderState does not have any large object columns, so TextImage_On is NULL. This is expected.
CREATE TABLE [dbo].[ProviderState](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProviderId] [numeric](19, 0) NOT NULL,
[State] [nvarchar](2) NOT NULL,
CONSTRAINT [PK_dbo.ProviderState] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The table ProviderAttribute has an nvarchar(max) column, so the TextImage_on property should be set:
CREATE TABLE [dbo].[ProductAttribute](
[ProductAttributeId] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_dbo.ProductAttribute] PRIMARY KEY CLUSTERED
(
[ProductAttributeId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
The table ProviderSpecialty is the problem. It has the TextImage_on property set yet there is no column to warrant this.
CREATE TABLE [dbo].[ProviderSpecialty](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProviderId] [numeric](19, 0) NOT NULL,
[SpecialtyId] [int] NOT NULL,
CONSTRAINT [PK_dbo.ProviderSpecialty] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I am using the system tables to create a script that creates these tables. If the name column in sys.filegroups is PRIMARY for a table, I add 'TEXTIMAGE_ON [PRIMARY]' to that table's script.
When I run the script for this table, it errors because the TEXTIMAGE_ON property is not allowed unless there is a column of text, varchar(max), etc.
Why is the property set in the system table if it is not needed?
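Whatever the reason lob_data_space_id is populated for such a table, a generation script can sidestep the error by checking whether the table currently has any large object column before emitting TEXTIMAGE_ON. The query below is a hedged sketch built on the query above. The set of types it treats as large objects (anything with max_length = -1 in sys.columns, plus text, ntext and image) is an assumption and may need adjusting.

```sql
SELECT t.name AS TableName,
       -- Report the file group only when the table has a LOB column right now.
       CASE WHEN EXISTS (
                SELECT 1
                FROM sys.columns AS c
                WHERE c.object_id = t.object_id
                  AND (c.max_length = -1   -- varchar(max), nvarchar(max), varbinary(max), xml
                       OR TYPE_NAME(c.system_type_id) IN ('text', 'ntext', 'image'))
            )
            THEN fg.name
       END AS TextImageOnFileGroup
FROM sys.tables AS t
LEFT OUTER JOIN sys.filegroups AS fg
    ON t.lob_data_space_id = fg.data_space_id
WHERE t.name IN ('ProviderSpecialty', 'ProviderState', 'ProviderExclusion');
```

With this guard, a table without any such column yields NULL and the script can skip the TEXTIMAGE_ON clause for it.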

Optimization for Date Correlation doesn’t change plan

I have a reporting requirement from the following tables. I created a new database with these tables and imported data from the live database for reporting purpose.
The report parameter is a date range. I read the following and found that DATE_CORRELATION_OPTIMIZATION can be used to make the query work faster by utilizing seek instead of scan. I made the required settings – still the query is using same old plan and same execution time. What additional changes need to be made to make the query utilize the date correlation?
Note: I am using SQL Server 2005
REFERENCES
Optimizing Queries That Access Correlated datetime Columns
The Query Optimizer: Date Correlation Optimisation
SQL
--Database change made for date correlation
ALTER DATABASE BISourcingTest
SET DATE_CORRELATION_OPTIMIZATION ON;
GO
--Settings made
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
--Test Setting
IF ( (sessionproperty('ANSI_NULLS') = 1) AND
(sessionproperty('ANSI_PADDING') = 1) AND
(sessionproperty('ANSI_WARNINGS') = 1) AND
(sessionproperty('ARITHABORT') = 1) AND
(sessionproperty('CONCAT_NULL_YIELDS_NULL') = 1) AND
(sessionproperty('QUOTED_IDENTIFIER') = 1) AND
(sessionproperty('NUMERIC_ROUNDABORT') = 0)
)
PRINT 'Everything is set'
ELSE
PRINT 'Different Setting'
--Query
SELECT C.ContainerID, C.CreatedOnDate,OLIC.OrderID
FROM ContainersTest C
INNER JOIN OrderLineItemContainers OLIC
ON OLIC.ContainerID = C.ContainerID
WHERE C.CreatedOnDate > '1/1/2015'
AND C.CreatedOnDate < '2/01/2015'
TABLES
CREATE TABLE [dbo].[ContainersTest](
[ContainerID] [varchar](20) NOT NULL,
[Weight] [decimal](9, 2) NOT NULL DEFAULT ((0)),
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [XPKContainersTest] PRIMARY KEY CLUSTERED
(
[CreatedOnDate] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[OrderLineItemContainers](
[OrderID] [int] NOT NULL,
[LineItemID] [int] NOT NULL,
[ContainerID] [varchar](20) NOT NULL,
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [PK_POLineItemContainers] PRIMARY KEY CLUSTERED
(
[OrderID] ASC,
[LineItemID] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_OrderLineItemContainers] UNIQUE NONCLUSTERED
(
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OrderLineItemContainers] WITH CHECK ADD CONSTRAINT [FK_POLineItemContainers_Containers] FOREIGN KEY([ContainerID])
REFERENCES [dbo].[Containers] ([ContainerID])
GO
ALTER TABLE [dbo].[OrderLineItemContainers] CHECK CONSTRAINT [FK_POLineItemContainers_Containers]
Plan
--
According to the docs:
https://technet.microsoft.com/en-us/library/ms177416(v=sql.105).aspx
If any one of the datetime columns for which correlation statistics are maintained is not the first or only key of a clustered index, consider creating a clustered index on it. Doing this generally leads to better performance on the types of queries covered by correlation statistics. If a clustered index already exists on the primary key columns, you can modify a table so that the clustered index and primary key use different column sets.
Since your OrderLineItemContainers table has no suitable index for filtering on the date, the optimizer really can't do anything with it. Try adding a nonclustered index on OrderLineItemContainers.CreatedOnDate to see whether the plan then changes.
A clustered index would be better, but there are other considerations. Note that you could make the primary key nonclustered and use the clustered index for the date column, if this is the dominant query and the gain makes it worth it.
So this layout should work best:
CREATE TABLE [dbo].[OrderLineItemContainers](
[OrderID] [int] NOT NULL,
[LineItemID] [int] NOT NULL,
[ContainerID] [varchar](20) NOT NULL,
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [PK_POLineItemContainers] PRIMARY KEY NONCLUSTERED -- NONCLUSTERED PRIMARY KEY!!
(
[OrderID] ASC,
[LineItemID] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_OrderLineItemContainers] UNIQUE NONCLUSTERED
(
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
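GO
-- CREATE INDEX requires an index name; this one is illustrative
CREATE CLUSTERED INDEX [CIX_OrderLineItemContainers_CreatedOnDate] ON [dbo].[OrderLineItemContainers] ([CreatedOnDate])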
OR you could just try a new NONCLUSTERED index:
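-- CREATE INDEX requires an index name; this one is illustrative
CREATE NONCLUSTERED INDEX [IX_OrderLineItemContainers_CreatedOnDate] ON [dbo].[OrderLineItemContainers] ([CreatedOnDate])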
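Before comparing plans, it may also be worth confirming that the database option actually took effect. A small check, assuming the database name BISourcingTest from the question, is to read the is_date_correlation_on flag in sys.databases:

```sql
-- Returns 1 in is_date_correlation_on when DATE_CORRELATION_OPTIMIZATION is ON.
SELECT [name], [is_date_correlation_on]
FROM sys.databases
WHERE [name] = N'BISourcingTest';
```

If this returns 0, the ALTER DATABASE statement did not complete and no index change will enable the optimization.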

primary key name is required field?

Is there any difference between the below 2 CREATE TABLE statements in SQL Server 200x/2012? I generated this script from two different tables, one had a Key name defined (PK_Table1) whereas the other had some kind of randomly generated number associated to it (PK_Table1_1084F446).
CREATE TABLE [dbo].[Table1](
[ID] [uniqueidentifier] NOT NULL,
<<Other Column declaration here>>
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Few more non-clustered indexes declaration here
CREATE TABLE [dbo].[Table1](
[ID] [uniqueidentifier] NOT NULL,
<<Other Column declaration here>>
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Few more non-clustered indexes declaration here
Both work the same way, but meaningful names are more convenient:
1) when altering a constraint, you can easily refer to it (if you gave it a sensible name);
2) when a query fails because of a constraint, the constraint's name is shown, so you can easily tell what caused the error (if you gave it a sensible name).
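To illustrate the first point, here is a rough sketch using the Table1 definitions from the question. When the name is system-generated, it has to be looked up before the constraint can be referenced; when it is explicit, the name can be written straight into the script.

```sql
-- System-generated name: look it up first in sys.key_constraints.
SELECT kc.name
FROM sys.key_constraints AS kc
WHERE kc.parent_object_id = OBJECT_ID(N'dbo.Table1')
  AND kc.type = 'PK';

-- Explicit name: the constraint can be referenced directly.
ALTER TABLE [dbo].[Table1] DROP CONSTRAINT [PK_Table1];
```

A generated name also differs between databases, so a script that hard-codes one is unlikely to work on another server.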

Database Design - Preventing duplications for "Room" table

Hey everyone, I'm trying to create a database for a friend of mine, and given my inexperience with database design I'm having difficulty with one part of it. Essentially, my issue is with my "rooms" table, which is associated with another table called "location". The location table holds everything you would expect (buildingId, streetAddress, etc.), and Room has a foreign key containing the buildingId. I want my "rooms" table to have unique room numbers within each buildingId.
To give you a clearer idea, I'll just copy and paste the script I'm using to create those tables.
CREATE TABLE [dbo].[Location](
[buildingId] [int] IDENTITY(1,1) NOT NULL,
[streetAddress] [varchar](50) NOT NULL,
[postalCode] [varchar](7) NOT NULL,
[province] [varchar](30) NOT NULL,
[city] [varchar](30) NOT NULL,
CONSTRAINT [PK_Location] PRIMARY KEY CLUSTERED
(
[buildingId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UN_postalCode] UNIQUE NONCLUSTERED
(
[postalCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UN_streetAddress] UNIQUE NONCLUSTERED
(
[streetAddress] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Room](
[rmId] [int] IDENTITY(1,1) NOT NULL,
[roomNum] [varchar](10) NOT NULL,
[floor] [int] NOT NULL,
[capacity] [int] NOT NULL,
[permission] [bit] NOT NULL,
[buildingId] [int] NOT NULL,
CONSTRAINT [PK_Room_1] PRIMARY KEY CLUSTERED
(
[rmId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Room] WITH CHECK ADD CONSTRAINT [FK_Room_Location] FOREIGN KEY([buildingId])
REFERENCES [dbo].[Location] ([buildingId])
GO
ALTER TABLE [dbo].[Room] CHECK CONSTRAINT [FK_Room_Location]
GO
Any help would greatly be appreciated.
Thanks.
A table level unique constraint?
ALTER TABLE dbo.Room WITH CHECK ADD
CONSTRAINT UQ_Room_RoomBuildingLocation UNIQUE (roomNum, buildingId)
This could also be a unique index, which would allow INCLUDE columns.
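As a sketch of the unique index alternative: the index name and the choice of included columns below are assumptions made for illustration, and only the key columns (roomNum, buildingId) take part in the uniqueness check.

```sql
-- Hypothetical name and INCLUDE list; uniqueness is enforced on (roomNum, buildingId) only.
CREATE UNIQUE NONCLUSTERED INDEX [UX_Room_RoomBuildingLocation]
ON [dbo].[Room] ([roomNum], [buildingId])
INCLUDE ([floor], [capacity]);
```

Use either this index or the constraint above, not both, since each enforces the same rule.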
