High index fragmentation - sql-server

We have a SQL Server database that for some reason suffers massive index fragmentation (up to 85% and higher), and we find ourselves rebuilding indexes almost daily to keep it in check.
We are at a loss, however, as to why this is happening. The tables use newsequentialid() to generate the new GUIDs (primary keys), so we expected new rows to always be added at the end, but that does not seem to be the case, which may explain the high fragmentation rate.
Does anyone have any ideas, or things we can try, to alleviate this problem or diagnose it further?
An example table would be:
CREATE TABLE [dbo].[EmailMessages](
[Id] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](max) NULL,
CONSTRAINT [PK_dbo.EmailMessages] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[EmailMessages] ADD CONSTRAINT [DF__EmailMessage__Id__7993056A] DEFAULT (newsequentialid()) FOR [Id]
GO
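If it helps to narrow things down, here is a minimal diagnostic sketch using the standard sys.dm_db_index_physical_stats DMV against the example table above (adjust the names as needed); it reports fragmentation and page counts per index:
SELECT i.name AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.EmailMessages'), NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;
Running this before and after a day's inserts shows whether it is really the clustered primary key that is splitting or some other index.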

Related

How do I insert a new record into a table containing just an IDENTITY column?

I have a single-column table where the column is a primary key and clustered index. It is used by other tables to relate records together. A plain INSERT statement doesn't seem to be the way to go, since there are no other columns to populate, and it's a bit cumbersome to SET IDENTITY_INSERT on and off, etc.
I just need to "increment" the primary key of the table to the next integer value.
I believe it's an easy problem to solve, but I'm at that stage of mental exhaustion where the wheel is still spinning but the hamster is dead.
Here is a script to recreate the table I'm working with.
CREATE TABLE [dbo].[PKOnly]
(
[Id] [BIGINT] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_PKOnly]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
You can use DEFAULT VALUES:
INSERT dbo.PKOnly DEFAULT VALUES;
Example db<>fiddle
Note this will also work if you have other columns with defaults.
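If you also need the generated key back, here is a small sketch using the standard OUTPUT clause (it works together with DEFAULT VALUES):
INSERT dbo.PKOnly
OUTPUT inserted.Id
DEFAULT VALUES;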

AspNetUserLogins table and maximum size of index keys in SQL Server

The schema of the identity model in VS2017/aspnetcore defines a table called AspNetUserLogins to store external logins (CREATE statement below). It defines the primary key as a composite of [LoginProvider] [nvarchar](450) and [ProviderKey] [nvarchar](450). The SQL Server limit for the maximum size of an index key is specified as 900 bytes here. A note on that page specifically says
"If a table column is a Unicode data type such as nchar or nvarchar,
the column length displayed is the storage length of the column. This
is two times the number of characters specified in the CREATE TABLE
statement. In the previous example, City is defined as an nvarchar(30)
data type; therefore, the storage length of the column is 60."
So is this key not twice the allowed size? Two nvarchar(450) columns add up to 900 characters, i.e. 1800 bytes of storage, which is double the 900-byte limit.
SQL Server Management Studio seems to think so...
Warning! The maximum key length for a clustered index is 900 bytes.
The index 'PK_AspNetUserLogins' has maximum length of 1800 bytes. For
some combination of large values, the insert/update operation will
fail.
CREATE TABLE [dbo].[AspNetUserLogins](
[LoginProvider] [nvarchar](450) NOT NULL,
[ProviderKey] [nvarchar](450) NOT NULL,
[ProviderDisplayName] [nvarchar](max) NULL,
[UserId] [nvarchar](450) NOT NULL,
CONSTRAINT [PK_AspNetUserLogins] PRIMARY KEY CLUSTERED
(
[LoginProvider] ASC,
[ProviderKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Looks like they know...issue1451
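Whether the warning actually bites depends on the data; a quick check of how close existing rows come to the 900-byte limit (table and column names taken from the CREATE statement above) might look like this:
SELECT MAX(DATALENGTH([LoginProvider]) + DATALENGTH([ProviderKey])) AS max_key_bytes
FROM [dbo].[AspNetUserLogins];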
It looks as though this will cause subsequent issues. I originally created my database on my desktop prior to deploying it to Azure, and there is a significant difference between the two databases. In SSMS, using "Script Table as > CREATE table", the table designs are:
Azure database:
CREATE TABLE [dbo].[AspNetUserLogins](
[LoginProvider] [nvarchar](225) NOT NULL,
[ProviderKey] [nvarchar](225) NOT NULL,
[ProviderDisplayName] [nvarchar](max) NULL,
[UserId] [nvarchar](450) NOT NULL,
CONSTRAINT [PK_AspNetUserLogins] PRIMARY KEY CLUSTERED
(
[LoginProvider] ASC,
[ProviderKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
Desktop database:
CREATE TABLE [dbo].[AspNetUserLogins](
[LoginProvider] [nvarchar](450) NOT NULL,
[ProviderKey] [nvarchar](450) NOT NULL,
[ProviderDisplayName] [nvarchar](max) NULL,
[UserId] [nvarchar](450) NOT NULL,
CONSTRAINT [PK_AspNetUserLogins] PRIMARY KEY CLUSTERED
(
[LoginProvider] ASC,
[ProviderKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Note the [PRIMARY] filegroup references; I cannot get these into Azure. This results in the following error from an MVC .NET Core 2 website using Microsoft.AspNetCore.Identity:
MVC Net Core 2.0 error resulting from the inability to add primary clustered keys

violation of primary key constraint in insert query not touching the PK column

I have a query that inserts records into a table. The primary key column of that table is an identity field that auto-increments. The SELECT part of the query will produce duplicates, but I have a unique constraint with IGNORE_DUP_KEY = ON on the fields (city_nm, prov_en_nm) that should skip them on insert. This used to work fine, but for some reason it now gives me the message below. This is the first time I have run it since the database was moved from SQL Server 2012 to 2014, if that can have an impact.
Violation of PRIMARY KEY constraint 'Dim_city_province_country_pk'. Cannot insert duplicate key in object 'HD_DtlClm.dim_city_province_country_t'. The duplicate key value is (###). (where ### is an ID, a different one every time I run it)
Here is the query.
INSERT INTO HD_DtlClm.[dim_city_province_country_t] (
city_nm, prov_en_nm, prov_fr_nm, contry_fr_nm, contry_en_nm
)
SELECT gr_mbr_city_nm, PROV_ENG_NM, PROV_FR_NM, CONTRY_ENG_NM, CONTRY_FR_NM
FROM isu.gr_dentl_clm_v
LEFT JOIN HD_DtlClm.province_information_t
ON gr_dentl_clm_v.gr_mbr_prov_cd = HD_DtlClm.province_information_t.PROV_CLM_CD
UNION
SELECT gr_prvdr_city_nm, PROV_ENG_NM, PROV_FR_NM, CONTRY_ENG_NM, CONTRY_FR_NM
FROM isu.gr_dentl_clm_v
LEFT JOIN HD_DtlClm.province_information_t
ON gr_dentl_clm_v.gr_prvdr_prov_cd IN (HD_DtlClm.province_information_t.PROV_ENG_CD, HD_DtlClm.province_information_t.PROV_CLM_CD)
Any idea why I get this error that I didn't get in the past?
EDIT to add primary key creation script:
ALTER TABLE [HD_DtlClm].[dim_city_province_country_t] ADD CONSTRAINT [Dim_city_province_country_pk] PRIMARY KEY CLUSTERED
( [cpc_key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
EDIT2 to add table creation script
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [HD_DtlClm].[dim_city_province_country_t](
[cpc_key] [int] IDENTITY(1,1) NOT NULL,
[city_nm] [char](50) NOT NULL,
[prov_en_nm] [char](50) NULL,
[prov_fr_nm] [char](50) NULL,
[contry_en_nm] [char](75) NULL,
[contry_fr_nm] [char](75) NULL,
[create_ts] [datetime] NOT NULL,
[update_ts] [datetime] NOT NULL,
CONSTRAINT [Dim_city_province_country_pk] PRIMARY KEY CLUSTERED
(
[cpc_key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [dim_city_province_country_ak1] UNIQUE NONCLUSTERED
(
[city_nm] ASC,
[prov_en_nm] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [HD_DtlClm].[dim_city_province_country_t] ADD DEFAULT (getdate()) FOR [create_ts]
GO
ALTER TABLE [HD_DtlClm].[dim_city_province_country_t] ADD DEFAULT (getdate()) FOR [update_ts]
GO
Try running: DBCC CHECKIDENT ('HD_DtlClm.[dim_city_province_country_t]'); look at the results returned in the Messages tab and make sure the current identity value is equal to or higher than the highest value in the column. Note that running this may even fix the problem by itself.
To expand: it looks like something reseeded your identity column, so the insert was generating key values that already exist. I don't think there is any way to check historically what changed it; the most likely candidates are a DBCC CHECKIDENT command with the RESEED option, or a TRUNCATE TABLE operation (which reseeds to the original value).
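As a sketch of that check, with an illustrative reseed value at the end (replace it with the actual MAX(cpc_key) from your table):
-- Report the current identity value without changing it
DBCC CHECKIDENT ('HD_DtlClm.dim_city_province_country_t', NORESEED);
-- Compare against the highest key actually stored
SELECT MAX(cpc_key) AS max_cpc_key
FROM HD_DtlClm.dim_city_province_country_t;
-- If the identity value is lower than MAX(cpc_key), reseed it
-- (the value 100000 below is purely illustrative)
DBCC CHECKIDENT ('HD_DtlClm.dim_city_province_country_t', RESEED, 100000);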

Updating a table after adding Index

I am designing a database using SQL Server Express.
I have a table with three columns, shown below.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_Hash] [binary](20) NOT NULL,
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
I already have some data in this table. Whenever I want to add a new row, I first compute a hash of someLongString and query the table to see if a row with this hash already exists. As the table grows, this query takes longer, so I plan to index the table by the someLongText_Hash column.
Can someone please suggest how to do this in SQL Server Management Studio? Also, after adding this index, how do I index the existing rows in this table?
Why can't you just set the 'someLongString' field to be unique? That way you don't need to keep a hash and an extra primary key?
You could try using a CHECKSUM.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_CheckSum] [int] NOT NULL,
CONSTRAINT [UC_someLongText_CheckSum] UNIQUE (someLongText_CheckSum),
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
See here for further explanation
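To address the original question of indexing the existing hash column, a minimal sketch (the index name is illustrative); creating the index builds it over all existing rows automatically, so nothing extra is needed for them:
CREATE NONCLUSTERED INDEX [IX_dummy_someLongText_Hash]
ON [dbo].[dummy] ([someLongText_Hash]);
Lookups such as WHERE [someLongText_Hash] = @hash can then seek on this index instead of scanning the table.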

Which approach is better for this scenario?

We have the following table:
CREATE TABLE [dbo].[CampaignCustomer](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CampaignID] [int] NOT NULL,
[CustomerID] [int] NULL,
[CouponCode] [nvarchar](20) NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[ModifiedDate] [datetime] NULL,
[Active] [bit] NOT NULL,
CONSTRAINT [PK_CampaignCustomer] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
and the following Unique Index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode] ON [dbo].[CampaignCustomer]
(
[CouponCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 20) ON [PRIMARY]
GO
We query constantly by CouponCode and by other foreign keys (not shown above for simplicity). The CampaignCustomer table has almost 4 million records and is growing. We also run campaigns that don't require coupon codes, and therefore we don't insert those records. Now we need to start tracking those campaigns as well, for another purpose. So we have two options:
1. Change the CouponCode column to allow NULLs, create a unique filtered index that excludes NULLs, and let the table grow even bigger and faster.
2. Create a separate table for tracking all campaigns for this specific purpose.
Keep in mind that the CampaignCustomer table is used very often for redeeming coupons and inserting new ones. The bottom line is that we don't want a customer redeeming a coupon to be left waiting until they give up, or other processes to fail. So, from an efficiency perspective, which option do you think is best, and why?
I'd go for the filtered index... you're storing the same data so keep it in the same table.
Splitting the table is refactoring when you probably don't need it and adds complexity.
Do you have problems with 4 million rows? That's not that much, especially for such a narrow table.
I'm against a duplicate table for the sake of a single column.
Allowing CouponCode to be NULL means that someone could accidentally create a record where the value is NULL when it should be a valid coupon code.
I would create a coupon code value that marks a record as a non-coupon campaign rather than resorting to indicator columns such as "isCoupon" or "isNonCouponCampaign", and use a filtered index to ignore the "nocoupon" value.
Which leads to my next point: I don't see a foreign key reference, but it would be key to knowing which coupons exist and which ones were actually used. Some of the columns in the existing table could be moved up to the parent coupon code table...
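A minimal sketch of the filtered-index option, assuming CouponCode is made nullable (the index name is illustrative):
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode_Filtered]
ON [dbo].[CampaignCustomer] ([CouponCode])
WHERE [CouponCode] IS NOT NULL;
Rows for campaigns without coupon codes then carry NULL and are simply excluded from the index, so they don't affect uniqueness or slow down the coupon-redemption lookups.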
