Identity column without index or unique constraint - sql-server

I just experienced a database breakdown due to sudden extradordinary data loading from disk.
I found the issue would arise when I attempted inserting into a log table with approx. 3.5 million rows. The table features an ID column set to IDENTITY, but with no indexes or unique constraints.
CREATE TABLE [dbo].[IntegrationTestLog](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ident] [varchar](50) NULL,
[Date] [datetime] NOT NULL,
[Thread] [varchar](255) NOT NULL,
[Level] [varchar](50) NOT NULL,
[Logger] [varchar](255) NOT NULL,
[Message] [varchar](max) NOT NULL,
[Exception] [varchar](max) NULL
)
Issue triggered by this line:
INSERT INTO IntegrationTestLog ([Ident],[Date],[Thread],[Level],[Logger],[Message],[Exception]) VALUES (#Ident, #log_date, #thread, #log_level, #logger, #message, #exception)
There are possibly many other queries that will trigger it, but this one I know for sure.
Bear with me, cuz' Im only guessing now, but does the identity seeding process somehow slow down if an index is missing? Could it by any slight chance fall back to doing a MAX(ID) query to get the latest entry? (Probably not). I haven't succeeded in finding any deep technical information about the subject yet. Please share if you know some litterature or links to such.
To solve the issue, we ended up truncating the table, which itself took VERY long. I also promoted ID to be primary key.
Then I read this article: Identity columns and found that truncate actually does touch the identity seed.
A truncate table (but not delete) will update the current seed to the
original seed value.
...which again only led me to be more suspecious of the identity seed.
Again I'm searching in the dark - please enlighten me on this issue if you have the insight.

Related

Deciding clustered index in Microsoft SQL Server

We are creating this “clients” table that will have around 50 million records.
I am having a hard time deciding the ‘clustered index’.
Theory says that it should be: Unique,Narrow,Static, Ever-increasing pattern… But in practice it should be the key you use to refer to your records most often.
The table has 50 columns…
Per the first approach the CI should be:
[Client_id] [bigint] IDENTITY(1,1) NOT NULL,
But I feel tempted to use:
[SF_id] [varchar](18) NOT NULL,
or
[UpdateDate] [datetime] NOT NULL,
or
[SystemModStamp] [datetime] NOT NULL,
Reality is that I do not exactly how the end users will query the table: but, I know they will use SF_id quite often and I know they will rarely use Client_id… And I also know, me myself I will use UpdateDate or SystemModStamp (not sure yet), I will use it as the key for ‘delta’ daily merges that I will set up in a Job/SP.

Slow on Retrieving data from 38GB SQL Table

I am looking for some advise. I have a SQL Server table called AuditLog and this table records any action/changes that happens to our DB from our web application.
I am trying to build some reports and anytime I try to pull data from this table it makes my query run from seconds to 10mins+. Just doing a
select * from dbo.auditlog
takes about 2hours+.
The table has 77 million rows and is growing. Anyhow, only thoughts at this moment is to do an index but that would slow down inserts. Not sure how much that would affect performance but have held back on it. Other thoughts were to partition the table or do an index view but we are running SQL Server 2014 Standard Edition and those options are not supported.
Here is the table create statement:
CREATE TABLE [dbo].[AuditLog]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[UserId] [uniqueidentifier] NULL,
[EventDateUtc] [datetime] NOT NULL,
[EventType] [char](1) NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
[ColumnName] [nvarchar](100) NOT NULL,
[OriginalValue] [nvarchar](max) NULL,
[NewValue] [nvarchar](max) NULL,
[Rams1RecordID] [uniqueidentifier] NULL,
[Rams1AuditHistoryID] [uniqueidentifier] NULL,
[Rams1UserID] [uniqueidentifier] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL DEFAULT (getdate()),
[OriginalValueNiceName] [nvarchar](100) NULL,
[NewValueNiceName] [nvarchar](100) NULL,
CONSTRAINT [PK_AuditLog]
PRIMARY KEY CLUSTERED ([TableName] ASC, [RecordId] ASC, [AuditLogId] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_User]
FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_User]
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_UserCreatedBy]
FOREIGN KEY([CreatedBy]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_UserCreatedBy]
GO
With something that big there are a couple of things you might try.
The first thing you need to do is define how you accessing the table MOST of the time and index accordingly.
I would hope you are not do a select * from AuditLog without any filtering for a reporting solution - it shouldn't even be an option.
Finally, instead of indexed views or partitioning, you might consider a partitioned view.
A partitioned view is basically breaking your table up, physically into smaller meaningful tables - based on date or type or object or however you are MOST often accessing it. Each table is then indexed separately giving you much better stats and if you in 2012 or higher you can take advantage of ColumnStore, assuming you use something like a DATE to group the data.
Create a view that spans all of the tables and then report based on the view. Since you already grouped your data by how you MOST often will access it, your filter will act similarly to partition exclusion and get you to your data faster.
Of course this will result in a little more maintenance and some code change, but be well worth the effort if you are storing that much data and more in a single table.

Strategy to keep updated summary tables standing by (SQL Server)

I've got a client portal project (the first one I've developed so a basic best practice is what I'm looking for here, nothing fancy) nearing first release.
A simplification of the main record types used in reporting is the following:
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) primary key NOT NULL,
[click_id] [int] NULL,
[conversion_date] [datetime] NOT NULL,
[last_updated] [datetime] NULL,
[click_date] [datetime] NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[conversion_type] [nvarchar](max) NULL)
CREATE TABLE [dbo].[clicks](
[click_id] [int] primary key NOT NULL,
[click_date] [datetime] NOT NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[campaign_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[ip_address] [nvarchar](max) NULL,
[user_agent] [nvarchar](max) NULL,
[referrer_url] [nvarchar](max) NULL,
[region_region_code] [nvarchar](max) NULL,
[total_clicks] [int] NOT NULL)
My specific question is: given millions of rows in each table, what mechanism is used to serve up summary reports quickly on demand given you know all the possible reports that can be requested?
The starting point, performance wise, doing raw queries against a 18 months worth of data for the busiest client is yielding a 3 to 5 second latency on my dashboard and the worst case is upwards of 10 seconds for a summary report with a custom date range spanning all the rows.
I know I can cache them after the first hit, but I want snappy performance on the first hit.
My feeling is this is a fundamental aspect of an application of this nature and that there are tons of applications like this out there, so is there an already well-thought-out method to pre-calculating tables that already did the grouping and aggregation? Then how do you keep them up to date? Do you use SQL agent and custom console apps that brute force the calculations before hand?
Any general pointers would be very appreciated..
Both tables are time series. They seem to be clustered by an ID column which has little value for how time series are queried. Time series are almost always queried by date range, so your clustered organization should service this type of queries first and foremost: cluster by date, move the ID primary key constraint into a non-clustered.
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) NOT NULL,
[conversion_date] [datetime] NOT NULL,
...
constraint pk_conversions nonclustered primary key ([conversion_id]))
go
create clustered index [cdx_conversions] on [dbo].[conversions]([conversion_date]);
go
CREATE TABLE [dbo].[clicks](
[click_id] [int] NOT NULL,
[click_date] [datetime] NOT NULL,
...
constraint [pk_clicks] nonclustered [click_id]);
go
create clustered index [cdx_clicks] on [dbo].[clicks]([click_date]);
This model will serve the typical queries that filter by a range on [click_date] and on [conversion_date]. For any other query the answer will be very specific to your query.
There are limits on how useful a relational row organized model can be for an OLAP/DW workload like yours. Specialized tools do a better job at it. Columnstore indexes can deliver amazingly fast responses, but they are difficult to update. Creating a MOLAP cube can also deliver blazing results but that is a serious project undertaking. There are even specialized time series databases out there.

DB advice needed for performance of a 'SessionVisit' table

I have a 'SessionVisit' table which collects data about user visits.
The script for this table is below. There may be 25,000 rows added a day.
The table CREATE statement is below. My database knowledge is definitely not up to scratch as far as understanding the implications of such a schema.
Can anyone give me their 2c of advice on some of these issues :
Do I need to worry about ROWSIZE for this schema for SQL Server 2008. I'm not even sure how the 8kb rowsize works in 2008. I don't even know if I'm wasting a lot of space if I'm not using all 8kb?
How should I purge old records I don't want. Will new rows fill in the empty spaces from dropped rows?
Any advice on indexes
I know this is quite general in nature. Any 'obvious' or non obvious info would be appreciated.
Here's the table :
USE [MyDatabase]
GO
/****** Object: Table [dbo].[SessionVisit] Script Date: 06/06/2009 16:55:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[SessionVisit](
[SessionGUID] [uniqueidentifier] NOT NULL,
[SessionVisitId] [int] IDENTITY(1,1) NOT NULL,
[timestamp] [timestamp] NOT NULL,
[SessionDate] [datetime] NOT NULL CONSTRAINT [DF_SessionVisit_SessionDate] DEFAULT (getdate()),
[UserGUID] [uniqueidentifier] NOT NULL,
[CumulativeVisitCount] [int] NOT NULL CONSTRAINT [DF_SessionVisit_CumulativeVisitCount] DEFAULT ((0)),
[SiteUserId] [int] NULL,
[FullEntryURL] [varchar](255) NULL,
[SiteCanonicalURL] [varchar](100) NULL,
[StoreCanonicalURL] [varchar](100) NULL,
[CampaignId] [int] NULL,
[CampaignKey] [varchar](50) NULL,
[AdKeyword] [varchar](50) NULL,
[PartnerABVersion] [varchar](10) NULL,
[ABVersion] [varchar](10) NULL,
[UserAgent] [varchar](255) NULL,
[Referer] [varchar](255) NULL,
[KnownRefererId] [int] NULL,
[HostAddress] [varchar](20) NULL,
[HostName] [varchar](100) NULL,
[Language] [varchar](50) NULL,
[SessionLog] [xml] NULL,
[OrderDate] [datetime] NULL,
[OrderId] [varchar](50) NULL,
[utmcc] [varchar](1024) NULL,
[TestSession] [bit] NOT NULL CONSTRAINT [DF_SessionVisit_TestSession] DEFAULT ((0)),
[Bot] [bit] NULL,
CONSTRAINT [PK_SessionVisit] PRIMARY KEY CLUSTERED
(
[SessionGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[SessionVisit] WITH CHECK ADD CONSTRAINT [FK_SessionVisit_KnownReferer] FOREIGN KEY([KnownRefererId])
REFERENCES [dbo].[KnownReferer] ([KnownRefererId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_KnownReferer]
GO
ALTER TABLE [dbo].[SessionVisit] WITH CHECK ADD CONSTRAINT [FK_SessionVisit_SiteUser] FOREIGN KEY([SiteUserId])
REFERENCES [dbo].[SiteUser] ([SiteUserId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_SiteUser]
I see SessionGUID and SessionVisitId, why have both a uniqueidentifier and an Identity(1,1) on the same table? Seems redundant to me.
I see referer and knownrefererid, think about getting the referer from the knownrefererid if possible. This will help reduce excess writes.
I see campaignkey and campaignid, again if possible get from the campaigns table if possible.
I see orderid and orderdate. I'm sure you can get the order date from the orders table, correct?
I see hostaddress and hostname, do you really need the name? Usually the hostname doesn't serve much purpose and can be easily misleading.
I see multiple dates and timestamps, is any of this duplicate?
How about that SessionLog column? I see that it's XML. Is it a lot of data, is it data you may already have in other columns? If so get rid of the XML or the duplicated columns. Using SQL 2008 you can parse data out of that XML column when reporting and possibly eliminate a few extra columns (thus writes). Are you going to be in trouble in the future when developers add more to that XML? XML to me just screams 'a lot of excessive writing'.
Mitch says to remove the primary key. Personally I would leave the index on the table. Since it is clustered that will help speed up write times as the DB will always write new rows at the end of the table on the disk.
Strip out some of this duplicate information and you'll probably do just fine writing a row each visit.
Well, I'd recommend NOT inserting a few k of data with EVERY page!
First thing I'd do would be to see how much of this information I could get from a 3rd party analytics tool, perhaps combined with log analysis. That should allow you to drop a lot of the fields.
25k inserts a days isn't much, but the catch here is that busier your site gets, the more load this is going to put on the db. Perhaps you could build a queuing system that batches the writes, but really, most of this information is already in the logs.
Agre with Chris that you would probably be better off using log analysis (check out Microsoft's free Log Parser)
Failing that, I would remove the Foreign Key constraints from your SessionVisit table.
You mentioned rowsize; the varchar's in your table do not pre-allocate to their maximum length (more 4 + 4 bytes for an empty field (approx.)). But saying that, a general rule is to keep rows as 'lean' as possible.
Also, I would remove the primary key from the SessionGUID (GUID) column. It won't help you much.
That's also an awful lot of nulls in that table. I think you should group together the columns that must be non-null at the same time. In fact, you should do a better analysis of the data you're writing, rather than lumping it all together in a single table.

SQL Server index advice performance

I'm looking for some advice to how to get the indexes running better on this query...
SQL Server 2005/8 some customers have 5 some 8...
SELECT sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate),
MAX(sales.BonDate),
SUM(sales.SumPrice)
FROM [BACK_CDM_CLEAN_BOLTEN].[dbo].[CashBoxSales] sales
WHERE sales.BonType in ('B','P','W')
AND Del = 0
AND sales.BonDate >= #minDate
GROUP BY sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate)
Table looks like the following
CREATE TABLE [dbo].[CashBoxSales](
[SalesRowId] [int] IDENTITY(1,1) NOT NULL,
[ChainStoreId] [int] NOT NULL,
[CashBoxId] [int] NOT NULL,
[BonType] [char](1) NOT NULL,
[BonDate] [datetime] NOT NULL,
[BonNr] [nvarchar](20) NULL,
[SumPrice] [money] NOT NULL,
[Discount] [money] NOT NULL,
[EmployeeId] [int] NULL,
[DayOfValidity] [datetime] NOT NULL,
[ProcStatus] [int] NOT NULL,
[Del] [int] NOT NULL,
[InsertedDate] [datetime] NOT NULL,
[LastUpdate] [datetime] NOT NULL,
What would be the correct ordering of the index columns, covered or composite etc.
The table has up to 10 mil rows. There are other similar selects but I'm hoping from the advice getting this one up to speed (Its the most important) I can tweak a few others.
Many thanks!
When you have your query in SQL Server Management Studio, just select "Analyze Query in Database Tuning Advisor" from the context menu, and off you go!
Mind you: this only tweaks this one single query in isolation! Adding indices here to speed this one query up might adversely affect other parts of your application. An index always comes with overhead - inserts and deletes tend to be slower.
Also, don't blindly implement all the recommendations of the DTA - use your own judgment as to whether an index makes sense or not.
And lastly: measure, measure, measure! Measure your performance before any changes as a baseline, then measure again and again after you've made the changes and compare.
My best advice is to run this query through the SQL Profiler. It will recommend some indexes for you to try.
Also, you might try setting up a partitioned table and use one of your GROUP BY columns as the partitioning key.
Off the top of my head I would start with
INDEX (BonType, Del, BonDate)
Or even just
INDEX (BonType, BonDate)
I would recomend using an Index Analyzer, Proflier and Benchmarking various combinations.

Resources