Strategy to keep updated summary tables standing by (SQL Server) - sql-server

I've got a client portal project (the first one I've developed, so basic best practice is what I'm looking for here, nothing fancy) nearing its first release.
A simplification of the main record types used in reporting is the following:
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) primary key NOT NULL,
[click_id] [int] NULL,
[conversion_date] [datetime] NOT NULL,
[last_updated] [datetime] NULL,
[click_date] [datetime] NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[conversion_type] [nvarchar](max) NULL)
CREATE TABLE [dbo].[clicks](
[click_id] [int] primary key NOT NULL,
[click_date] [datetime] NOT NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[campaign_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[ip_address] [nvarchar](max) NULL,
[user_agent] [nvarchar](max) NULL,
[referrer_url] [nvarchar](max) NULL,
[region_region_code] [nvarchar](max) NULL,
[total_clicks] [int] NOT NULL)
My specific question is: given millions of rows in each table, what mechanism is used to serve up summary reports quickly on demand, given that you know all the possible reports that can be requested?
As a starting point, performance-wise, raw queries against 18 months' worth of data for the busiest client yield 3 to 5 seconds of latency on my dashboard, and the worst case is upwards of 10 seconds for a summary report with a custom date range spanning all the rows.
I know I can cache them after the first hit, but I want snappy performance on the first hit.
My feeling is that this is a fundamental aspect of an application of this nature, and that there are tons of applications like this out there. So is there an already well-thought-out method for pre-calculating tables that have already done the grouping and aggregation? And how do you keep them up to date? Do you use SQL Agent and custom console apps that brute-force the calculations beforehand?
Any general pointers would be much appreciated.

Both tables are time series. They seem to be clustered by an ID column, which has little value for how time series are queried. Time series are almost always queried by date range, so your clustered organization should serve this type of query first and foremost: cluster by date, and move the ID primary key constraint to a nonclustered one.
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) NOT NULL,
[conversion_date] [datetime] NOT NULL,
...
constraint pk_conversions primary key nonclustered ([conversion_id]))
go
create clustered index [cdx_conversions] on [dbo].[conversions]([conversion_date]);
go
CREATE TABLE [dbo].[clicks](
[click_id] [int] NOT NULL,
[click_date] [datetime] NOT NULL,
...
constraint [pk_clicks] primary key nonclustered ([click_id]));
go
create clustered index [cdx_clicks] on [dbo].[clicks]([click_date]);
This model will serve the typical queries that filter by a range on [click_date] and on [conversion_date]. For any other query, the answer will be very specific to that query.
There are limits on how useful a relational row organized model can be for an OLAP/DW workload like yours. Specialized tools do a better job at it. Columnstore indexes can deliver amazingly fast responses, but they are difficult to update. Creating a MOLAP cube can also deliver blazing results but that is a serious project undertaking. There are even specialized time series databases out there.
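As a rough illustration of the pre-aggregation idea raised in the question (not something this answer prescribes), a daily roll-up table refreshed by a scheduled SQL Agent job could look like the sketch below. The summary table name, the daily grain, the chosen grouping columns and the two-day refresh window are all assumptions:
-- Hypothetical daily roll-up of the [conversions] table from the question.
CREATE TABLE [dbo].[conversion_daily_summary](
[summary_date] [date] NOT NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[conversion_count] [int] NOT NULL,
CONSTRAINT [pk_conversion_daily_summary] PRIMARY KEY CLUSTERED
([summary_date], [affiliate_affiliate_id], [offer_offer_id]))
GO
-- Run on a schedule (e.g. a SQL Agent job); only the most recent days are re-aggregated.
MERGE [dbo].[conversion_daily_summary] AS tgt
USING (
SELECT CAST([conversion_date] AS date) AS [summary_date],
[affiliate_affiliate_id],
[offer_offer_id],
COUNT(*) AS [conversion_count]
FROM [dbo].[conversions]
WHERE [conversion_date] >= DATEADD(day, -2, CAST(GETDATE() AS date))
GROUP BY CAST([conversion_date] AS date),
[affiliate_affiliate_id],
[offer_offer_id]
) AS src
ON tgt.[summary_date] = src.[summary_date]
AND tgt.[affiliate_affiliate_id] = src.[affiliate_affiliate_id]
AND tgt.[offer_offer_id] = src.[offer_offer_id]
WHEN MATCHED THEN
UPDATE SET tgt.[conversion_count] = src.[conversion_count]
WHEN NOT MATCHED THEN
INSERT ([summary_date], [affiliate_affiliate_id], [offer_offer_id], [conversion_count])
VALUES (src.[summary_date], src.[affiliate_affiliate_id], src.[offer_offer_id], src.[conversion_count]);
GO
The dashboard then reads the small summary table on the first hit, while the detail tables keep their date-clustered organization for ad hoc ranges.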

Related

Best performance design with time-series in sql-server

(TL;DR)
The problem to solve with the design:
fast retrieval of related time series with different frequencies.
The tool:
A SQL Server table and index design.
The longer version:
I wish to calculate different functions at one or more specific times or intervals, with input data from time series of different resolutions. My intuition tells me that I need to think extra carefully about the table/index design, given that the objective is a fast join of the rows.
The design advice I have seen so far is mostly concerned with retrieving a single time series, versus the problem at hand here: retrieving values from different time series at the same point in time. Table design for multiple time series data
My proposed overall design is the following:
CREATE TABLE [dbo].[time_series_definition](
[ID] [int] IDENTITY(1,1) NOT NULL,
[data_type_description] [nvarchar](100) NULL,
[duration_sec] [int] NOT NULL,
CONSTRAINT [PK_time_series_definition] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
CREATE TABLE [dbo].[time_series](
[ID] [int] IDENTITY(1,1) NOT NULL,
[start_date] [date] NOT NULL,
[end_date] [date] NOT NULL,
[time_series_definition_ID] [int] NOT NULL,
[source] [nchar](30) NULL,
[description] [nvarchar](100) NULL,
[update_time] [datetime2](0) NOT NULL,
CONSTRAINT [PK_time_series] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[time_series] WITH CHECK ADD CONSTRAINT [FK_time_series_time_series_definition] FOREIGN KEY([time_series_definition_ID])
REFERENCES [dbo].[time_series_definition] ([ID])
CREATE TABLE [dbo].[data_values](
[ID] [int] IDENTITY(1,1) NOT NULL,
[date_time] [datetime2](0) NOT NULL,
[time_series_ID] [int] NOT NULL,
[value] [decimal](19, 8) NULL,
CONSTRAINT [PK_data_values] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[data_values] WITH CHECK ADD CONSTRAINT [FK_data_values_time_series] FOREIGN KEY([time_series_ID])
REFERENCES [dbo].[time_series] ([ID])
The values [start_date] and [end_date] are redundant, but I believe they might improve query speed when the start/end of the series is known prior to the lookup in the [data_values] table.
The [duration_sec] column is there to save space in the [data_values] table, since the values are evenly spaced within a specific series.
So, given this design, what is the best index/partition strategy to enable fast lookup of different series at a given time or time interval?
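One possible strategy (a sketch only, with a made-up index name, not a definitive answer) is to give [data_values] a covering nonclustered index so that a lookup by series and time range can be resolved entirely from the index:
CREATE NONCLUSTERED INDEX [IX_data_values_series_time]
ON [dbo].[data_values]([time_series_ID] ASC, [date_time] ASC)
INCLUDE ([value]);
An alternative along the same lines would be to make ([time_series_ID], [date_time]) the clustered key itself and demote the IDENTITY column to a nonclustered primary key, mirroring the advice in the first answer above.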

Deciding clustered index in Microsoft SQL Server

We are creating this “clients” table that will have around 50 million records.
I am having a hard time deciding the ‘clustered index’.
Theory says that it should be: unique, narrow, static, with an ever-increasing pattern… But in practice it should be the key you use to refer to your records most often.
The table has 50 columns…
Per the first approach the CI should be:
[Client_id] [bigint] IDENTITY(1,1) NOT NULL,
But I feel tempted to use:
[SF_id] [varchar](18) NOT NULL,
or
[UpdateDate] [datetime] NOT NULL,
or
[SystemModStamp] [datetime] NOT NULL,
The reality is that I do not know exactly how the end users will query the table: but I know they will use SF_id quite often, and I know they will rarely use Client_id… And I also know that I myself will use UpdateDate or SystemModStamp (not sure which yet) as the key for the 'delta' daily merges that I will set up in a Job/SP.
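One common compromise (a sketch under the assumption that the IDENTITY column stays; it is not a recommendation tailored to your workload) is to keep the narrow, ever-increasing Client_id as the clustered primary key and give the other candidates their own nonclustered indexes:
CREATE TABLE [dbo].[clients](
[Client_id] [bigint] IDENTITY(1,1) NOT NULL,
[SF_id] [varchar](18) NOT NULL,
[UpdateDate] [datetime] NOT NULL,
[SystemModStamp] [datetime] NOT NULL,
-- ... the remaining ~46 columns ...
CONSTRAINT [PK_clients] PRIMARY KEY CLUSTERED ([Client_id] ASC))
GO
CREATE UNIQUE NONCLUSTERED INDEX [UX_clients_SF_id] ON [dbo].[clients]([SF_id]);  -- assuming SF_id is unique
GO
CREATE NONCLUSTERED INDEX [IX_clients_SystemModStamp] ON [dbo].[clients]([SystemModStamp]);
GO
The nonclustered index on SystemModStamp (or UpdateDate) then serves the daily delta merges without making the clustered key something that changes on every update.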

Slow on Retrieving data from 38GB SQL Table

I am looking for some advice. I have a SQL Server table called AuditLog, and this table records any actions/changes that happen to our DB from our web application.
I am trying to build some reports, and any time I try to pull data from this table my queries run from seconds to 10+ minutes. Just doing a
select * from dbo.auditlog
takes about 2+ hours.
The table has 77 million rows and is growing. Anyhow, my only thought at the moment is to add an index, but that would slow down inserts. I'm not sure how much it would affect performance, so I have held back on it. Other thoughts were to partition the table or create an indexed view, but we are running SQL Server 2014 Standard Edition and those options are not supported.
Here is the table create statement:
CREATE TABLE [dbo].[AuditLog]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[UserId] [uniqueidentifier] NULL,
[EventDateUtc] [datetime] NOT NULL,
[EventType] [char](1) NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
[ColumnName] [nvarchar](100) NOT NULL,
[OriginalValue] [nvarchar](max) NULL,
[NewValue] [nvarchar](max) NULL,
[Rams1RecordID] [uniqueidentifier] NULL,
[Rams1AuditHistoryID] [uniqueidentifier] NULL,
[Rams1UserID] [uniqueidentifier] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL DEFAULT (getdate()),
[OriginalValueNiceName] [nvarchar](100) NULL,
[NewValueNiceName] [nvarchar](100) NULL,
CONSTRAINT [PK_AuditLog]
PRIMARY KEY CLUSTERED ([TableName] ASC, [RecordId] ASC, [AuditLogId] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_User]
FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_User]
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_UserCreatedBy]
FOREIGN KEY([CreatedBy]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_UserCreatedBy]
GO
With something that big there are a couple of things you might try.
The first thing you need to do is define how you access the table MOST of the time and index accordingly.
I would hope you are not doing a select * from AuditLog without any filtering for a reporting solution - it shouldn't even be an option.
Finally, instead of indexed views or partitioning, you might consider a partitioned view.
A partitioned view basically breaks your table up physically into smaller, meaningful tables - based on date or type or object or however you MOST often access it. Each table is then indexed separately, giving you much better stats, and if you are on 2012 or higher you can take advantage of ColumnStore, assuming you use something like a DATE to group the data.
Create a view that spans all of the tables and then report based on the view. Since you have already grouped your data by how you will MOST often access it, your filter will act similarly to partition exclusion and get you to your data faster.
Of course this will result in a little more maintenance and some code changes, but it will be well worth the effort if you are storing that much data and more in a single table.
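To make that concrete, a minimal partitioned-view sketch might look like the following (the yearly split, the table names and the CHECK ranges are assumptions; the remaining AuditLog columns are elided for brevity):
CREATE TABLE [dbo].[AuditLog_2016](
[AuditLogId] [uniqueidentifier] NOT NULL,
[EventDateUtc] [datetime] NOT NULL,
-- ... remaining AuditLog columns ...
CONSTRAINT [CK_AuditLog_2016] CHECK ([EventDateUtc] >= '20160101' AND [EventDateUtc] < '20170101'),
CONSTRAINT [PK_AuditLog_2016] PRIMARY KEY CLUSTERED ([EventDateUtc], [AuditLogId]))
GO
CREATE TABLE [dbo].[AuditLog_2017](
[AuditLogId] [uniqueidentifier] NOT NULL,
[EventDateUtc] [datetime] NOT NULL,
-- ... remaining AuditLog columns ...
CONSTRAINT [CK_AuditLog_2017] CHECK ([EventDateUtc] >= '20170101' AND [EventDateUtc] < '20180101'),
CONSTRAINT [PK_AuditLog_2017] PRIMARY KEY CLUSTERED ([EventDateUtc], [AuditLogId]))
GO
-- Reports hit the view; a filter on EventDateUtc only touches the tables whose CHECK constraint matches.
CREATE VIEW [dbo].[vAuditLog]
AS
SELECT * FROM [dbo].[AuditLog_2016]
UNION ALL
SELECT * FROM [dbo].[AuditLog_2017];
GO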

best way to archive records

I have a table named "RawNews" with the following fields:
[NewsID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](200) NULL,
[Description] [nvarchar](500) NULL,
[Text] [ntext] NULL,
[RegDate] [nvarchar](50) NULL,
[RegTime] [time](0) NULL,
[Status] [nvarchar](300) NULL,
[Tags] [nvarchar](50) NULL,
[SecurityLevelID] [smallint] NULL,
[IsDeleted] [bit] NULL,
[DelDate] [nchar](10) NULL,
[UserName] [nvarchar](50) NULL,
and another table named "UsedNews" which has the same fields plus some other fields.
There are also some other tables related to these, like uploads, images, newsGroups, NewsRooms, Users, etc.
In RawNews I get about 100 records each day, and those 100 records then go to UsedNews.
This information should be kept more or less for eternity.
I wanted to ask for your advice on a good way to archive records so that search and filtering performance remains good.
And another question: I have a log table which logs every event in the system; should I keep it in a separate database or not?
Thanks a lot.
I'd suggest table partitioning to solve this. The RegDate can be used to determine whether the records should be moved to the archive group or not.
You could choose to place the archive table on a separate disk so that, when searches on it do take place, they have the least possible effect on the rest of the "live" database.
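A sketch of that partitioning idea follows (the boundary date and filegroup names are assumptions, and RegDate, currently an nvarchar(50), would need to become a real date/datetime column first):
CREATE PARTITION FUNCTION [pf_news_archive] (datetime)
AS RANGE RIGHT FOR VALUES ('2015-01-01');  -- rows before the boundary land in the archive partition
GO
CREATE PARTITION SCHEME [ps_news_archive]
AS PARTITION [pf_news_archive] TO ([FG_Archive], [FG_Live]);  -- the archive filegroup can sit on a separate disk
GO
-- RawNews (or rather its clustered index) is then created or rebuilt ON [ps_news_archive](RegDate).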
There are many ways to do this:
You can use the "Replication" functionality of SQL Server; in your case the Snapshot Replication type seems appropriate (see the documentation for more details).
You can create a Windows service to move data from RawNews to UsedNews.
You can create an exe to move the data and, using Windows Task Scheduler, call that exe at a specific time when there is less load on the server.

SQL Server index advice performance

I'm looking for some advice on how to get the indexes running better on this query...
SQL Server 2005/8 (some customers have 2005, some 2008)...
SELECT sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate),
MAX(sales.BonDate),
SUM(sales.SumPrice)
FROM [BACK_CDM_CLEAN_BOLTEN].[dbo].[CashBoxSales] sales
WHERE sales.BonType in ('B','P','W')
AND Del = 0
AND sales.BonDate >= @minDate
GROUP BY sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate)
The table looks like the following:
CREATE TABLE [dbo].[CashBoxSales](
[SalesRowId] [int] IDENTITY(1,1) NOT NULL,
[ChainStoreId] [int] NOT NULL,
[CashBoxId] [int] NOT NULL,
[BonType] [char](1) NOT NULL,
[BonDate] [datetime] NOT NULL,
[BonNr] [nvarchar](20) NULL,
[SumPrice] [money] NOT NULL,
[Discount] [money] NOT NULL,
[EmployeeId] [int] NULL,
[DayOfValidity] [datetime] NOT NULL,
[ProcStatus] [int] NOT NULL,
[Del] [int] NOT NULL,
[InsertedDate] [datetime] NOT NULL,
[LastUpdate] [datetime] NOT NULL)
What would be the correct ordering of the index columns - covering or composite, etc.?
The table has up to 10 million rows. There are other similar selects, but I'm hoping that with the advice from getting this one (the most important) up to speed, I can tweak a few others.
Many thanks!
When you have your query in SQL Server Management Studio, just select "Analyze Query in Database Tuning Advisor" from the context menu, and off you go!
Mind you: this only tweaks this one single query in isolation! Adding indices here to speed this one query up might adversely affect other parts of your application. An index always comes with overhead - inserts and deletes tend to be slower.
Also, don't blindly implement all the recommendations of the DTA - use your own judgment as to whether an index makes sense or not.
And lastly: measure, measure, measure! Measure your performance before any changes as a baseline, then measure again and again after you've made the changes and compare.
My best advice is to run this query through the SQL Profiler. It will recommend some indexes for you to try.
Also, you might try setting up a partitioned table and use one of your GROUP BY columns as the partitioning key.
Off the top of my head I would start with
INDEX (BonType, Del, BonDate)
Or even just
INDEX (BonType, BonDate)
I would recommend using an index analyzer, Profiler, and benchmarking various combinations.
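Spelled out as T-SQL, the first suggestion might look like this (a sketch only; the INCLUDE list is an assumption aimed at covering the query above):
CREATE NONCLUSTERED INDEX [IX_CashBoxSales_BonType_Del_BonDate]
ON [dbo].[CashBoxSales] ([BonType], [Del], [BonDate])
INCLUDE ([ChainStoreId], [CashBoxId], [SumPrice]);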
