Best performance design for time series in SQL Server

(TL;DR)
The problem to solve with the design:
fast retrieval of related time series with different frequencies.
The tool:
a SQL Server table and index design.
The longer version:
I wish to calculate different functions at one or more specific times or intervals, using input data from time series with different resolutions. My intuition tells me that I need to think extra carefully about the table/index design, given that the objective is a fast join of the rows.
The design advice I have seen so far (for example in the question "Table design for multiple time series data") is mostly concerned with retrieving a single time series, whereas the problem at hand is retrieving values from different time series at the same point in time.
My proposed overall design is the following:
CREATE TABLE [dbo].[time_series_definition](
[ID] [int] IDENTITY(1,1) NOT NULL,
[data_type_description] [nvarchar](100) NULL,
[duration_sec] [int] NOT NULL,
CONSTRAINT [PK_time_series_definition] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
CREATE TABLE [dbo].[time_series](
[ID] [int] IDENTITY(1,1) NOT NULL,
[start_date] [date] NOT NULL,
[end_date] [date] NOT NULL,
[time_series_definition_ID] [int] NOT NULL,
[source] [nchar](30) NULL,
[description] [nvarchar](100) NULL,
[update_time] [datetime2](0) NOT NULL,
CONSTRAINT [PK_time_series] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[time_series] WITH CHECK ADD CONSTRAINT [FK_time_series_time_series_definition] FOREIGN KEY([time_series_definition_ID])
REFERENCES [dbo].[time_series_definition] ([ID])
CREATE TABLE [dbo].[data_values](
[ID] [int] IDENTITY(1,1) NOT NULL,
[date_time] [datetime2](0) NOT NULL,
[time_series_ID] [int] NOT NULL,
[value] [decimal](19, 8) NULL,
CONSTRAINT [PK_data_values] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[data_values] WITH CHECK ADD CONSTRAINT [FK_data_values_time_series] FOREIGN KEY([time_series_ID])
REFERENCES [dbo].[time_series] ([ID])
The columns [start_date] and [end_date] are redundant, but I believe they might improve query speed when the start/end of a series is known before the lookup in the [data_values] table.
The [duration_sec] column is meant to save space in the [data_values] table, since the values within a given series are evenly spaced.
So, given this design, what is the best index/partition strategy to enable fast lookup of different series at a given time or time interval?
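A minimal sketch of one common approach, not something proposed in the original post: cluster [data_values] on ([time_series_ID], [date_time]) so that each series is stored contiguously in time order, then use CROSS APPLY with TOP (1) to fetch, for each requested series, the latest value at or before a given time. That "as-of" lookup lines up series sampled at different frequencies. It assumes no series has two rows with the same timestamp; the index name [CIX_data_values_series_time], the timestamp, and the series IDs 1, 2, 3 are hypothetical.

```sql
-- Sketch only: move the clustered index from [ID] to (series, time).
-- Assumes each series has at most one row per timestamp.
ALTER TABLE [dbo].[data_values] DROP CONSTRAINT [PK_data_values];
CREATE UNIQUE CLUSTERED INDEX [CIX_data_values_series_time] -- hypothetical name
    ON [dbo].[data_values] ([time_series_ID] ASC, [date_time] ASC);
-- Keep [ID] unique, now as a nonclustered primary key.
ALTER TABLE [dbo].[data_values]
    ADD CONSTRAINT [PK_data_values] PRIMARY KEY NONCLUSTERED ([ID] ASC);
GO

-- For each requested series, take the latest value at or before @t,
-- which lines up series that are sampled at different frequencies.
DECLARE @t datetime2(0) = '2020-01-01T12:00:00'; -- hypothetical point in time
SELECT ts.[ID] AS [time_series_ID], v.[date_time], v.[value]
FROM [dbo].[time_series] AS ts
CROSS APPLY (
    SELECT TOP (1) dv.[date_time], dv.[value]
    FROM [dbo].[data_values] AS dv
    WHERE dv.[time_series_ID] = ts.[ID]
      AND dv.[date_time] <= @t
    ORDER BY dv.[date_time] DESC
) AS v
WHERE ts.[ID] IN (1, 2, 3); -- hypothetical series IDs
```

For a time interval, the same index serves a range predicate such as dv.[date_time] >= @from AND dv.[date_time] < @to in place of the TOP (1) lookup. Whether partitioning on top of this helps depends on data volume and maintenance needs, which the question does not state.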

Related

Index for IDENTITY in SQL Server: automatic or not?

I have a huge table for logging. The definition is:
CREATE TABLE [dbo].[TRACELOG]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[TYPE] [varchar](15) NOT NULL,
[DATEHEURE] [datetime] NOT NULL,
[PROGRAMME] [varchar](25) NOT NULL,
[APPLICATION] [varchar](50) NOT NULL,
[DESCRIPTION] [text] NULL,
[UTILISATEUR] [varchar](10) NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Indexes are like this:
The table now has about 18 million rows. When I run a query using ID = 123456, it takes a very long time.
SELECT *
FROM TRACELOG
WHERE ID = 123456
I'm very surprised... My question is: in a table with an IDENTITY column, is an index created implicitly on that column (one that is not visible in the index list?), or do I have to create it manually?
No: having an IDENTITY column does not automatically create an index.
What does create an automatic (and, by default, clustered) index is the PRIMARY KEY constraint, which is often used on IDENTITY columns.
But not every IDENTITY column has to be the primary key of its table; you have to specify that if you want it.
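As a sketch for the TRACELOG case, the first query lists the indexes that actually exist (a table without a clustered index shows up as a single HEAP row with no name), and the ALTER TABLE makes [ID] the primary key. The constraint name [PK_TRACELOG] is hypothetical, and the statement assumes the existing ID values are unique, which IDENTITY alone does not guarantee (for example after a reseed or IDENTITY_INSERT).

```sql
-- List the indexes that currently exist on the table.
SELECT i.[name], i.[type_desc], i.[is_primary_key]
FROM sys.indexes AS i
WHERE i.[object_id] = OBJECT_ID(N'dbo.TRACELOG');

-- Make ID the primary key, backed by a clustered index.
-- If the table already has a clustered index, use NONCLUSTERED instead.
ALTER TABLE [dbo].[TRACELOG]
    ADD CONSTRAINT [PK_TRACELOG] PRIMARY KEY CLUSTERED ([ID] ASC);
```

After this, WHERE ID = 123456 can be answered with an index seek instead of a scan of all 18 million rows.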

Can SQL Server table have a foreign key to a table that resolves to many records?

Consider the following table...
CREATE TABLE [dbo].[Alerts]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[I18NMessageKey] [uniqueidentifier] NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
GO
and the following table...
CREATE TABLE [dbo].[I18NMessages]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Key] [uniqueidentifier] NOT NULL,
[Culture] [nvarchar](200) NOT NULL,
[Message] [nvarchar](max) NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
GO
I would like to add a foreign key constraint to table [Alerts] on the column [I18NMessageKey] to refer to many records in table [I18NMessages].
Is this possible without a third table?
The [I18NMessages] table holds the same message for the [Key] but in different languages depending on [Culture]. The relationship between [Alerts] and [I18NMessages] doesn't care about the culture. The resolution of [Culture] depends on the user at runtime.
In SQL Server, the uniqueness of the referenced key column(s) must be enforced by a primary key, unique constraint, or unique index. You need a third table with a unique I18NMessageKey column key to enforce referential integrity.
Alternatively, you can create a trigger and implement the check as custom business logic.
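A minimal sketch of the third-table approach described in the answer, with hypothetical table and constraint names: [I18NMessageKeys] holds each key exactly once, and both [I18NMessages] and [Alerts] reference it. It assumes every key used by [Alerts] already has at least one message; otherwise the last constraint fails against the existing data.

```sql
-- Hypothetical key table: one row per message key, so the key is unique.
CREATE TABLE [dbo].[I18NMessageKeys]
(
    [Key] [uniqueidentifier] NOT NULL,
    CONSTRAINT [PK_I18NMessageKeys] PRIMARY KEY CLUSTERED ([Key] ASC)
);
GO
-- Seed it with every key already used by the messages.
INSERT INTO [dbo].[I18NMessageKeys] ([Key])
SELECT DISTINCT [Key] FROM [dbo].[I18NMessages];
GO
-- Each culture-specific message row points at its key.
ALTER TABLE [dbo].[I18NMessages] WITH CHECK
    ADD CONSTRAINT [FK_I18NMessages_I18NMessageKeys]
    FOREIGN KEY ([Key]) REFERENCES [dbo].[I18NMessageKeys] ([Key]);
-- Each alert points at the key, not at a single message row.
ALTER TABLE [dbo].[Alerts] WITH CHECK
    ADD CONSTRAINT [FK_Alerts_I18NMessageKeys]
    FOREIGN KEY ([I18NMessageKey]) REFERENCES [dbo].[I18NMessageKeys] ([Key]);
```

This guarantees that an alert's key exists, not that a message exists for every culture; resolving the culture stays a runtime concern, as the question already notes.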

SQL Server database design for high volume stock market price data

I am writing an application to store and retrieve stock market price data, which is inserted on a daily basis. I am storing the data for each asset (stock) and for most of the markets in the world. This is my current table design:
Country table:
CREATE TABLE [dbo].[List_Country]
(
[CountryId] [char](2) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
[CurrenyCode] [nvarchar](5) NULL,
[CurrencyName] [nvarchar](50) NULL,
CONSTRAINT [PK_dbo.List_Country]
PRIMARY KEY CLUSTERED ([CountryId] ASC)
)
Asset table:
CREATE TABLE [dbo].[List_Asset]
(
[AssetId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
[CountryId] [char](2) NOT NULL,
CONSTRAINT [PK_dbo.List_Asset]
PRIMARY KEY CLUSTERED ([AssetId] ASC)
)
Foreign key constraint on Country:
ALTER TABLE [dbo].[List_Asset] WITH CHECK
ADD CONSTRAINT [FK_dbo.List_Asset_dbo.List_Country_CountryId]
FOREIGN KEY([CountryId])
REFERENCES [dbo].[List_Country] ([CountryId])
ON DELETE CASCADE
GO
Stock_Price table:
CREATE TABLE [dbo].[Stock_Price_Data]
(
[StockPriceDataId] [int] IDENTITY(1,1) NOT NULL,
[AssetId] [int] NOT NULL,
[PriceDate] [datetime] NOT NULL,
[Open] [int] NOT NULL,
[High] [int] NOT NULL,
[Low] [int] NOT NULL,
[Close] [int] NOT NULL,
[Volume] [int] NOT NULL,
CONSTRAINT [PK_dbo.Stock_Price_Data]
PRIMARY KEY CLUSTERED ([StockPriceDataId] ASC)
)
Foreign key constraint on Asset:
ALTER TABLE [dbo].[Stock_Price_Data] WITH CHECK
ADD CONSTRAINT [FK_dbo.Stock_Price_Data_dbo.List_Asset_AssetId]
FOREIGN KEY([AssetId])
REFERENCES [dbo].[List_Asset] ([AssetId])
ON DELETE CASCADE
My concern at the moment is that the Stock_Price_Data table will hold a very large number of rows: for a specific market in a country there can easily be 20,000 assets, so in a year (260 trading days) I could potentially have 5.2 million rows (20,000 × 260) for each country.
The application does not restrict a user from accessing data for countries other than their default country (which is set up during login).
Is it a good idea to have a separate table (e.g. Stock_Price_Data_AU) for each country? Or is there a better way to design the database for the above scenario?
-Alan-
First of all, I'd drop the _Data suffix from the table name; it's overkill.
If you are reasonably certain that users will always filter the data by country, i.e. look at only one country at a time, then I'd consider partitioning the table by CountryId (which means storing CountryId in the price table itself). SQL Server can then use partition elimination to read only the relevant data. You keep the ease of maintenance of a single table but get performance similar to a separate table per country. (I'm assuming you have Enterprise Edition, which table partitioning required before SQL Server 2016 SP1.) If your loads also work on a per-country basis, you can even switch a partition out to a staging table, load it with its indexes dropped, and switch it back in for even faster loads.
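A sketch of the partitioning idea, assuming SQL Server 2016 SP1 or later (or Enterprise Edition on earlier versions); it also follows the suggestion to drop the _Data suffix. The names [pf_CountryId], [ps_CountryId] and [PK_Stock_Price], the boundary values, the natural clustered key (CountryId, AssetId, PriceDate) replacing the IDENTITY column, and the copy of CountryId into the price table are all assumptions.

```sql
-- Hypothetical boundary values. With RANGE RIGHT each listed code starts a
-- new partition; codes that sort between two boundaries share a partition.
CREATE PARTITION FUNCTION [pf_CountryId] (char(2))
    AS RANGE RIGHT FOR VALUES ('AU', 'GB', 'JP', 'US');
GO
CREATE PARTITION SCHEME [ps_CountryId]
    AS PARTITION [pf_CountryId] ALL TO ([PRIMARY]);
GO
CREATE TABLE [dbo].[Stock_Price]
(
    [AssetId] [int] NOT NULL,
    [CountryId] [char](2) NOT NULL, -- copied from List_Asset to partition on
    [PriceDate] [datetime] NOT NULL,
    [Open] [int] NOT NULL,
    [High] [int] NOT NULL,
    [Low] [int] NOT NULL,
    [Close] [int] NOT NULL,
    [Volume] [int] NOT NULL,
    -- For a unique clustered index, the partitioning column must be in the key.
    CONSTRAINT [PK_Stock_Price]
        PRIMARY KEY CLUSTERED ([CountryId] ASC, [AssetId] ASC, [PriceDate] ASC)
) ON [ps_CountryId] ([CountryId]);
GO
-- Filtering on the partitioning column lets SQL Server skip other partitions.
SELECT [AssetId], [PriceDate], [Close]
FROM [dbo].[Stock_Price]
WHERE [CountryId] = 'AU'
  AND [PriceDate] >= '20160101' AND [PriceDate] < '20170101';
```

Nothing here keeps [CountryId] consistent with [List_Asset]; a composite foreign key on ([AssetId], [CountryId]) referencing a matching unique constraint on [List_Asset] would.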

Slow on Retrieving data from 38GB SQL Table

I am looking for some advice. I have a SQL Server table called AuditLog that records any actions/changes made to our DB from our web application.
I am trying to build some reports, and any time I pull data from this table my query goes from running in seconds to 10+ minutes. Just doing a
select * from dbo.auditlog
takes 2+ hours.
The table has 77 million rows and is growing. My only thought at the moment is to add an index, but that would slow down inserts; I'm not sure how much it would affect performance, so I have held back on it. Other thoughts were to partition the table or create an indexed view, but we are running SQL Server 2014 Standard Edition and those options are not supported.
Here is the table create statement:
CREATE TABLE [dbo].[AuditLog]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[UserId] [uniqueidentifier] NULL,
[EventDateUtc] [datetime] NOT NULL,
[EventType] [char](1) NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
[ColumnName] [nvarchar](100) NOT NULL,
[OriginalValue] [nvarchar](max) NULL,
[NewValue] [nvarchar](max) NULL,
[Rams1RecordID] [uniqueidentifier] NULL,
[Rams1AuditHistoryID] [uniqueidentifier] NULL,
[Rams1UserID] [uniqueidentifier] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL DEFAULT (getdate()),
[OriginalValueNiceName] [nvarchar](100) NULL,
[NewValueNiceName] [nvarchar](100) NULL,
CONSTRAINT [PK_AuditLog]
PRIMARY KEY CLUSTERED ([TableName] ASC, [RecordId] ASC, [AuditLogId] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_User]
FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_User]
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_UserCreatedBy]
FOREIGN KEY([CreatedBy]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_UserCreatedBy]
GO
With something that big there are a couple of things you might try.
The first thing you need to do is define how you access the table MOST of the time, and index accordingly.
I would hope you are not doing a select * from AuditLog without any filtering for a reporting solution; it shouldn't even be an option.
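As an illustration of indexing for the main access path, suppose (hypothetically) that most reports filter on [EventDateUtc]. The index name and the date range are placeholders. Every extra nonclustered index adds some cost to each insert, which is the trade-off the question worries about, so this is worth measuring on the real workload.

```sql
-- Hypothetical: reports mostly filter on EventDateUtc.
-- The clustered key (TableName, RecordId, AuditLogId) is carried in every
-- nonclustered index automatically, so it is not listed in INCLUDE.
CREATE NONCLUSTERED INDEX [IX_AuditLog_EventDateUtc]
    ON [dbo].[AuditLog] ([EventDateUtc] ASC)
    INCLUDE ([UserId], [EventType], [ColumnName]);
GO
-- Filter the rows and name only the columns the report needs.
DECLARE @from datetime = '20160101', @to datetime = '20160201';
SELECT [EventDateUtc], [UserId], [EventType], [TableName], [RecordId], [ColumnName]
FROM [dbo].[AuditLog]
WHERE [EventDateUtc] >= @from AND [EventDateUtc] < @to;
```

Because every selected column is in the index (directly or via the clustered key), this query does not need to touch the large nvarchar(max) columns at all.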
Finally, instead of indexed views or partitioning, you might consider a partitioned view.
A partitioned view basically breaks your table up physically into smaller, meaningful tables, based on date, type, object, or however you MOST often access it. Each table is then indexed separately, giving you much better statistics, and if you are on SQL Server 2012 or higher you can take advantage of columnstore indexes, assuming you use something like a DATE to group the data.
Create a view that spans all of the tables with UNION ALL and then report from the view. Since you have already grouped your data by how you MOST often access it, and provided each member table has a CHECK constraint on the grouping column, the optimizer can skip tables that cannot match your filter, much like partition elimination, and get you to your data faster.
Of course this means a little more maintenance and some code changes, but it will be well worth the effort if you are storing that much data (and more) in a single table.
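A minimal sketch of a partitioned view, assuming the data is split by year of [EventDateUtc]. The table, constraint and view names are hypothetical, and the columns are trimmed to keep it short. The CHECK constraints must be trusted (created or re-enabled WITH CHECK) for the optimizer to use them.

```sql
-- Hypothetical member tables, one per year of EventDateUtc (columns trimmed).
-- The CHECK constraint on each table is what lets the optimizer skip it.
CREATE TABLE [dbo].[AuditLog_2015]
(
    [AuditLogId] [uniqueidentifier] NOT NULL,
    [EventDateUtc] [datetime] NOT NULL
        CONSTRAINT [CK_AuditLog_2015_EventDateUtc]
        CHECK ([EventDateUtc] >= '20150101' AND [EventDateUtc] < '20160101'),
    [TableName] [nvarchar](100) NOT NULL,
    [RecordId] [nvarchar](100) NOT NULL,
    CONSTRAINT [PK_AuditLog_2015]
        PRIMARY KEY CLUSTERED ([EventDateUtc] ASC, [AuditLogId] ASC)
);
CREATE TABLE [dbo].[AuditLog_2016]
(
    [AuditLogId] [uniqueidentifier] NOT NULL,
    [EventDateUtc] [datetime] NOT NULL
        CONSTRAINT [CK_AuditLog_2016_EventDateUtc]
        CHECK ([EventDateUtc] >= '20160101' AND [EventDateUtc] < '20170101'),
    [TableName] [nvarchar](100) NOT NULL,
    [RecordId] [nvarchar](100) NOT NULL,
    CONSTRAINT [PK_AuditLog_2016]
        PRIMARY KEY CLUSTERED ([EventDateUtc] ASC, [AuditLogId] ASC)
);
GO
-- The view that reports query instead of the individual tables.
CREATE VIEW [dbo].[AuditLog_All]
AS
SELECT [AuditLogId], [EventDateUtc], [TableName], [RecordId]
FROM [dbo].[AuditLog_2015]
UNION ALL
SELECT [AuditLogId], [EventDateUtc], [TableName], [RecordId]
FROM [dbo].[AuditLog_2016];
GO
-- With these literal bounds, only AuditLog_2016 needs to be read.
SELECT [AuditLogId], [EventDateUtc], [TableName], [RecordId]
FROM [dbo].[AuditLog_All]
WHERE [EventDateUtc] >= '20160301' AND [EventDateUtc] < '20160401';
```

Adding a year means creating one more member table with its own CHECK constraint and one more UNION ALL branch in the view; that is the extra maintenance the answer mentions.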

Should I normalize this database design further?

I have the following database design:
CREATE TABLE [Document] (
[DocumentId] [int] NOT NULL PRIMARY KEY,
[Status] [bit] NULL,
[Text] [nvarchar](max) NULL,
[FolderPath] [nvarchar](max) NULL
)
CREATE TABLE [Metadata] (
[MetadataId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[DocumentId] [int] NOT NULL UNIQUE REFERENCES [Document] ([DocumentId]), -- 1:1 relationship
[Title] [nvarchar](250) NOT NULL,
[Author] [nvarchar](250) NOT NULL
)
CREATE TABLE [Page] (
[PageId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[DocumentId] [int] NOT NULL REFERENCES [Document] ([DocumentId]), -- 1:N relationship
[Number] [int] NOT NULL,
[ImagePath] [nvarchar](max) NULL,
[PageText] [nvarchar](max) NOT NULL
)
CREATE TABLE [Word] (
[WordId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[PageId] [int] NOT NULL REFERENCES [Page] ([PageId]), -- 1:N relationship
[Text] [nvarchar](50) NOT NULL
)
CREATE TABLE [Keyword] (
[KeywordId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Word] [nvarchar](50) NOT NULL
)
CREATE TABLE [DocumentKeyword] ( -- N:N link between Document and Keyword
[Document_DocumentId] [int] NOT NULL REFERENCES [Document] ([DocumentId]),
[Keyword_KeywordId] [int] NOT NULL REFERENCES [Keyword] ([KeywordId]),
PRIMARY KEY ([Document_DocumentId], [Keyword_KeywordId])
)
I'm using Entity Framework Code First to create the database.
Should I be normalizing my database design further, i.e. creating link tables between Document and Page, Document and Metadata, etc.? If so, is there a way to get Entity Framework to create the relationship tables for me, so that I don't have to include them in my models? I'm trying to learn to do this the right and most efficient way possible.
Thank you.
Well, I can't immediately answer your question, but I have some thoughts that might improve your design:
1. A document (in real life, at least) can be written by more than one author. This means that your 1:1 relationship from Document to Metadata should be a 1:n relationship (unless you can prove that there will never be more than one author).
2. The title of a document is (in my view) more a property of the document than a piece of metadata (especially with point 1 in mind).
3. What does this Word table do?
4. The column Keyword_KeywordId should simply be called KeywordId if you want to be consistent in your naming. The same applies to Document_DocumentId.
Otherwise, it looks pretty normalized.
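One possible reading of points 1 and 2 as DDL, offered as a sketch rather than the answerer's own design: the title moves onto [Document], and a hypothetical [DocumentAuthor] table stores one row per document and author. Existing titles and authors would still need to be copied over from [Metadata] before those columns are retired.

```sql
-- Point 2: store the title on the document itself
-- (NULL so that existing rows stay valid until the data is copied over).
ALTER TABLE [Document] ADD [Title] [nvarchar](250) NULL;
GO
-- Point 1: one row per (document, author), so a document can have several authors.
CREATE TABLE [DocumentAuthor] (
    [DocumentId] [int] NOT NULL REFERENCES [Document] ([DocumentId]),
    [Author] [nvarchar](250) NOT NULL,
    CONSTRAINT [PK_DocumentAuthor] PRIMARY KEY ([DocumentId], [Author])
);
```

If authors need attributes of their own, [Author] could instead become a separate table referenced by an AuthorId column; the composite primary key keeps each author from being listed twice on the same document either way.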
