Slow retrieval of data from a 38GB SQL table - sql-server

I am looking for some advice. I have a SQL Server table called AuditLog, and this table records any actions/changes that happen to our DB from our web application.
I am trying to build some reports, and any time I try to pull data from this table my queries go from seconds to 10+ minutes. Just doing a
select * from dbo.auditlog
takes 2+ hours.
The table has 77 million rows and is growing. My only thought at this moment is to add an index, but that would slow down inserts. I'm not sure how much that would affect performance, so I have held back on it. Other thoughts were to partition the table or use an indexed view, but we are running SQL Server 2014 Standard Edition and those options are not supported.
Here is the table create statement:
CREATE TABLE [dbo].[AuditLog]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[UserId] [uniqueidentifier] NULL,
[EventDateUtc] [datetime] NOT NULL,
[EventType] [char](1) NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
[ColumnName] [nvarchar](100) NOT NULL,
[OriginalValue] [nvarchar](max) NULL,
[NewValue] [nvarchar](max) NULL,
[Rams1RecordID] [uniqueidentifier] NULL,
[Rams1AuditHistoryID] [uniqueidentifier] NULL,
[Rams1UserID] [uniqueidentifier] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL DEFAULT (getdate()),
[OriginalValueNiceName] [nvarchar](100) NULL,
[NewValueNiceName] [nvarchar](100) NULL,
CONSTRAINT [PK_AuditLog]
PRIMARY KEY CLUSTERED ([TableName] ASC, [RecordId] ASC, [AuditLogId] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_User]
FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_User]
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_UserCreatedBy]
FOREIGN KEY([CreatedBy]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_UserCreatedBy]
GO

With something that big there are a couple of things you might try.
The first thing you need to do is define how you are accessing the table MOST of the time, and index accordingly.
I would hope you are not doing a select * from AuditLog without any filtering for a reporting solution - it shouldn't even be an option.
Finally, instead of indexed views or partitioning, you might consider a partitioned view.
A partitioned view basically breaks your table up physically into smaller, meaningful tables - based on date or type or object or however you MOST often access the data. Each table is then indexed separately, giving you much better stats, and if you are on 2012 or higher you can take advantage of ColumnStore, assuming you use something like a DATE to group the data.
Create a view that spans all of the tables and then report based on the view. Since you have already grouped your data by how you will MOST often access it, your filter will act similarly to partition elimination and get you to your data faster.
Of course this will result in a little more maintenance and some code changes, but it will be well worth the effort if you are storing that much data, and more, in a single table.
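As a rough sketch of the mechanics (the yearly split, the table names, and the trimmed column list below are all assumptions for illustration):
-- one physical table per year; the CHECK constraint is what lets the
-- optimizer skip tables whose range cannot match the query's filter
CREATE TABLE [dbo].[AuditLog_2016]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[EventDateUtc] [datetime] NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
-- ...remaining AuditLog columns...
CONSTRAINT [PK_AuditLog_2016]
PRIMARY KEY CLUSTERED ([EventDateUtc] ASC, [AuditLogId] ASC),
CONSTRAINT [CK_AuditLog_2016]
CHECK ([EventDateUtc] >= '20160101' AND [EventDateUtc] < '20170101')
)
GO
-- a view then spans the yearly tables; reports that filter on
-- EventDateUtc touch only the relevant table(s)
CREATE VIEW [dbo].[AuditLogAll]
AS
SELECT [AuditLogId], [EventDateUtc], [TableName], [RecordId] FROM [dbo].[AuditLog_2015]
UNION ALL
SELECT [AuditLogId], [EventDateUtc], [TableName], [RecordId] FROM [dbo].[AuditLog_2016]
GO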

Related

Best performance design with time-series in sql-server

(TL;DR)
The problem to solve with the design:
fast retrieval of related time-series with different frequencies.
The tool:
A SQL Server table and index design.
The longer version:
I wish to calculate different functions at one or more specific times or intervals, with input data from time-series with different resolutions. And my intuition tells me that I need to think extra hard about the table/index design, given that the objective is a fast join of the rows.
The design advice I have seen so far is mostly concerned with retrieving a single time-series, vs. the problem at hand here: retrieving values from different time-series at the same point in time. Table design for multiple time series data
My proposed overall design is the following:
CREATE TABLE [dbo].[time_series_definition](
[ID] [int] IDENTITY(1,1) NOT NULL,
[data_type_description] [nvarchar](100) NULL,
[duration_sec] [int] NOT NULL,
CONSTRAINT [PK_time_series_definition] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
CREATE TABLE [dbo].[time_series](
[ID] [int] IDENTITY(1,1) NOT NULL,
[start_date] [date] NOT NULL,
[end_date] [date] NOT NULL,
[time_series_definition_ID] [int] NOT NULL,
[source] [nchar](30) NULL,
[description] [nvarchar](100) NULL,
[update_time] [datetime2](0) NOT NULL,
CONSTRAINT [PK_time_series] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[time_series] WITH CHECK ADD CONSTRAINT [FK_time_series_time_series_definition] FOREIGN KEY([time_series_definition_ID])
REFERENCES [dbo].[time_series_definition] ([ID])
CREATE TABLE [dbo].[data_values](
[ID] [int] IDENTITY(1,1) NOT NULL,
[date_time] [datetime2](0) NOT NULL,
[time_series_ID] [int] NOT NULL,
[value] [decimal](19, 8) NULL,
CONSTRAINT [PK_data_values] PRIMARY KEY CLUSTERED
(
[ID] ASC
))
ALTER TABLE [dbo].[data_values] WITH CHECK ADD CONSTRAINT [FK_data_values_time_series] FOREIGN KEY([time_series_ID])
REFERENCES [dbo].[time_series] ([ID])
The values [start_date], [end_date] are redundant, but I believe they might improve query speed when the start/end of the series is known prior to the lookup in the [data_values] table.
The [duration_sec] is there to save space in the [data_values] table, since the values are evenly spaced within a specific series.
So given this design, what is the best index/partition strategy to enable fast lookup of different series at a given time or time interval?
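To make the target access pattern concrete, here is a sketch of the kind of query the design needs to serve quickly, together with one plausible supporting index (the series IDs, dates, and index name are invented):
-- values from two different series at the same points in time
SELECT a.[date_time], a.[value] AS series_1_value, b.[value] AS series_2_value
FROM [dbo].[data_values] a
JOIN [dbo].[data_values] b ON b.[date_time] = a.[date_time]
WHERE a.[time_series_ID] = 1
AND b.[time_series_ID] = 2
AND a.[date_time] >= '2016-01-01' AND a.[date_time] < '2016-01-08';
-- keeps each series' rows ordered by time, so both sides of the join
-- can be range-scanned
CREATE NONCLUSTERED INDEX [IX_data_values_series_time]
ON [dbo].[data_values] ([time_series_ID], [date_time])
INCLUDE ([value]);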

Simple SQL Server Delete fails

I am seeing intermittent failures upon a simple delete.
Essentially I have a temporary note that has many entries. Each entry has a classification which is a lookup value. Once this note is completed, it gets sent to a note repository, and the temporary version needs to be deleted.
I can't replicate it reliably, but on occasion, when calling the stored procedure that does the delete of the temp note, only SOME of the entries get deleted. Coincidentally (?), the entry left behind has always been of one specific classification type.
After many many many attempts I was able to reproduce the issue while running SQL Server Profiler. Despite trying to catch Attention, ErrorLog, EventLog, Exception, and Execution Warnings, the resulting profile shows nothing out of the ordinary.
None of the involved tables are large. In fact they're minuscule. ~100-1000 at any given time in Entry, ~100 in Draft, 9 in Classification, 3 in Category.
I don't believe it should matter, but just in case, this stored procedure is being called from Entity Framework.
Any ideas? Any ideas on what to try for troubleshooting? I'm completely at a loss. Thanks in advance for any help.
Here is the stored procedure for deletion:
CREATE PROCEDURE [NoteDraft].[ClearNoteDraft]
@DraftId BIGINT
AS
BEGIN
SET NOCOUNT ON;
DELETE FROM NoteDraft.[Entry]
WHERE DraftId = @DraftId
DELETE FROM NoteDraft.Draft
WHERE Id = @DraftId
END
Here are the table definitions (with some columns left out for brevity as noted.)
CREATE TABLE [NoteDraft].[Category]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](50) NOT NULL,
[SortOrder] [int] NULL,
CONSTRAINT [PK_Category]
PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
CREATE TABLE [NoteDraft].[Classification]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[CategoryId] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
[SortOrder] [int] NULL,
CONSTRAINT [PK_Classification]
PRIMARY KEY CLUSTERED ([Id] ASC)
)
CREATE TABLE [NoteDraft].[Draft]
(
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[DateModified] [datetime] NOT NULL CONSTRAINT [DF_TestNoteDraft_DateModified] DEFAULT (getdate()),
[AccountNumber] [varchar](30) NULL,
--...10 other biz columns...
CONSTRAINT [PK_Notes]
PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
CREATE TABLE [NoteDraft].[Entry]
(
[DraftId] [bigint] NOT NULL,
[ClassificationId] [int] NOT NULL,
[Body] [varchar](2100) NULL,
CONSTRAINT [PK_Entry]
PRIMARY KEY CLUSTERED ([DraftId] ASC, [ClassificationId] ASC)
) ON [PRIMARY]
ALTER TABLE [NoteDraft].[Classification] WITH CHECK
ADD CONSTRAINT [FK_Classification_Category]
FOREIGN KEY([CategoryId]) REFERENCES [NoteDraft].[Category] ([Id])
GO
ALTER TABLE [NoteDraft].[Classification] CHECK CONSTRAINT [FK_Classification_Category]
GO
ALTER TABLE [NoteDraft].[Entry] WITH CHECK
ADD CONSTRAINT [FK_Entry_Classification]
FOREIGN KEY([ClassificationId]) REFERENCES [NoteDraft].[Classification] ([Id])
GO
ALTER TABLE [NoteDraft].[Entry] CHECK CONSTRAINT [FK_Entry_Classification]
GO
ALTER TABLE [NoteDraft].[Entry] WITH CHECK
ADD CONSTRAINT [FK_Entry_Draft]
FOREIGN KEY([DraftId]) REFERENCES [NoteDraft].[Draft] ([Id])
GO
ALTER TABLE [NoteDraft].[Entry] CHECK CONSTRAINT [FK_Entry_Draft]
GO
As with everything simple like this, the answer wasn't where I was looking.
Turns out, there's an event listener on the page that's re-inserting the records post deletion.
Still having trouble figuring out why the listener is running, but at least I know what's going on.

SQL Server database design for high volume stock market price data

I am writing an application to store and retrieve stock market price data, which is inserted on a daily basis. I am storing the data for each asset (stock) for most of the markets in the world. This is my current design of the tables:
Country table:
CREATE TABLE [dbo].[List_Country]
(
[CountryId] [char](2) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
[CurrenyCode] [nvarchar](5) NULL,
[CurrencyName] [nvarchar](50) NULL
CONSTRAINT [PK_dbo.List_Country]
PRIMARY KEY CLUSTERED ([CountryId] ASC)
)
Asset table:
CREATE TABLE [dbo].[List_Asset]
(
[AssetId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
[CountryId] [char](2) NOT NULL,
CONSTRAINT [PK_dbo.List_Asset]
PRIMARY KEY CLUSTERED ([AssetId] ASC)
)
Foreign key constraint on Country:
ALTER TABLE [dbo].[List_Asset] WITH CHECK
ADD CONSTRAINT [FK_dbo.List_Asset_dbo.List_Country_CountryId]
FOREIGN KEY([CountryId])
REFERENCES [dbo].[List_Country] ([CountryId])
ON DELETE CASCADE
GO
Stock_Price table:
CREATE TABLE [dbo].[Stock_Price_Data]
(
[StockPriceDataId] [int] IDENTITY(1,1) NOT NULL,
[AssetId] [int] NOT NULL,
[PriceDate] [datetime] NOT NULL,
[Open] [int] NOT NULL,
[High] [int] NOT NULL,
[Low] [int] NOT NULL,
[Close] [int] NOT NULL,
[Volume] [int] NOT NULL,
CONSTRAINT [PK_dbo.Stock_Price_Data]
PRIMARY KEY CLUSTERED ([StockPriceDataId] ASC)
)
Foreign key constraint on Asset:
ALTER TABLE [dbo].[Stock_Price_Data] WITH CHECK
ADD CONSTRAINT [FK_dbo.Stock_Price_Data_dbo.List_Asset_AssetId]
FOREIGN KEY([AssetId])
REFERENCES [dbo].[List_Asset] ([AssetId])
ON DELETE CASCADE
The concern I have at the moment is that the Stock_Price_Data table would be filled with a high volume of rows, i.e. for a specific market in a country, there can easily be 20,000 assets. Thus, in a year (260 days of trading), I could potentially have 5.2 million rows for each country.
The application does not restrict a user from accessing data other than default country (which is setup during login).
Is it a good idea to have separate table (i.e. Stock_Price_Data_AU) for each country? Or is there a better way to design the database for the above scenario?
First of all - I'd drop the _Data from the table name - it's overkill.
If you are reasonably certain that users will always filter the data by country - i.e. only looking at one country at a time - then I'd consider partitioning the table by country ID; this way SQL Server will use partition elimination to pick only the relevant data. You get the ease of maintenance of one table, but performance as if there were a separate table per country. (I'm assuming you have Enterprise Edition.) If your load works on a per-country basis too, then you can even switch out the partition and then drop the indexes to get even faster loads.
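A rough sketch of those mechanics, assuming Enterprise Edition, that the table has been renamed to Stock_Price, and that a [CountryId] char(2) column has been added to it (the boundary values and index name are examples only):
CREATE PARTITION FUNCTION [pf_Country] (char(2))
AS RANGE RIGHT FOR VALUES ('AU', 'GB', 'US')
GO
CREATE PARTITION SCHEME [ps_Country]
AS PARTITION [pf_Country] ALL TO ([PRIMARY])
GO
-- rebuild the clustered index on the scheme so each row lands in its
-- country's partition (the existing clustered PK would first need to
-- be dropped or recreated as nonclustered)
CREATE CLUSTERED INDEX [CIX_Stock_Price_Country]
ON [dbo].[Stock_Price] ([CountryId], [AssetId], [PriceDate])
ON [ps_Country] ([CountryId])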

Large Table handling [duplicate]

I have a relatively large table (currently 2 million records) and would like to know if it's possible to improve performance for ad-hoc queries. The word ad-hoc being key here. Adding indexes is not an option (there are already indexes on the columns which are queried most commonly).
Running a simple query to return the 100 most recently updated records:
select top 100 * from ER101_ACCT_ORDER_DTL order by er101_upd_date_iso desc
Takes several minutes. See execution plan below:
Additional detail from the table scan:
SQL Server Execution Times:
CPU time = 3945 ms, elapsed time = 148524 ms.
The server is pretty powerful (from memory: 48GB RAM, 24-core processor) running SQL Server 2008 R2 x64.
Update
I found this code to create a table with 1,000,000 records. I thought I could then run SELECT TOP 100 * FROM testEnvironment ORDER BY mailAddress DESC on a few different servers to find out if my disk access speeds were poor on the server.
WITH t1(N) AS (SELECT 1 UNION ALL SELECT 1),
t2(N) AS (SELECT 1 FROM t1 x, t1 y),
t3(N) AS (SELECT 1 FROM t2 x, t2 y),
Tally(N) AS (SELECT TOP 98 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),
Tally2(N) AS (SELECT TOP 5 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM t3 x, t3 y),
Combinations(N) AS (SELECT DISTINCT LTRIM(RTRIM(RTRIM(SUBSTRING(poss,a.N,2)) + SUBSTRING(vowels,b.N,1)))
FROM Tally a
CROSS JOIN Tally2 b
CROSS APPLY (SELECT 'B C D F G H J K L M N P R S T V W Z SCSKKNSNSPSTBLCLFLGLPLSLBRCRDRFRGRPRTRVRSHSMGHCHPHRHWHBWCWSWTW') d(poss)
CROSS APPLY (SELECT 'AEIOU') e(vowels))
SELECT IDENTITY(INT,1,1) AS ID, a.N + b.N AS N
INTO #testNames
FROM Combinations a
CROSS JOIN Combinations b;
SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName
INTO #testNames2
FROM (SELECT firstName, secondName
FROM (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows
N AS firstName
FROM #testNames
ORDER BY NEWID()) a
CROSS JOIN (SELECT TOP 1000 --1000 * 1000 = 1,000,000 rows
N AS secondName
FROM #testNames
ORDER BY NEWID()) b) innerQ;
SELECT firstName, secondName,
firstName + '.' + secondName + '#fake.com' AS eMail,
CAST((ABS(CHECKSUM(NEWID())) % 250) + 1 AS VARCHAR(3)) + ' ' AS mailAddress,
(ABS(CHECKSUM(NEWID())) % 152100) + 1 AS jID,
IDENTITY(INT,1,1) AS ID
INTO #testNames3
FROM #testNames2
SELECT IDENTITY(INT,1,1) AS ID, firstName, secondName, eMail,
mailAddress + b.N + b.N AS mailAddress
INTO testEnvironment
FROM #testNames3 a
INNER JOIN #testNames b ON a.jID = b.ID;
--CLEAN UP USELESS TABLES
DROP TABLE #testNames;
DROP TABLE #testNames2;
DROP TABLE #testNames3;
But on the three test servers the query ran almost instantaneously. Can anyone explain this?
Update 2
Thank you for the comments - please keep them coming... they led me to try changing the primary key index from non-clustered to clustered, with rather interesting (and unexpected?) results.
Non-clustered:
SQL Server Execution Times:
CPU time = 3634 ms, elapsed time = 154179 ms.
Clustered:
SQL Server Execution Times:
CPU time = 2650 ms, elapsed time = 52177 ms.
How is this possible? Without an index on the er101_upd_date_iso column how can a clustered index scan be used?
Update 3
As requested- this is the create table script:
CREATE TABLE [dbo].[ER101_ACCT_ORDER_DTL](
[ER101_ORG_CODE] [varchar](2) NOT NULL,
[ER101_ORD_NBR] [int] NOT NULL,
[ER101_ORD_LINE] [int] NOT NULL,
[ER101_EVT_ID] [int] NULL,
[ER101_FUNC_ID] [int] NULL,
[ER101_STATUS_CDE] [varchar](2) NULL,
[ER101_SETUP_ID] [varchar](8) NULL,
[ER101_DEPT] [varchar](6) NULL,
[ER101_ORD_TYPE] [varchar](2) NULL,
[ER101_STATUS] [char](1) NULL,
[ER101_PRT_STS] [char](1) NULL,
[ER101_STS_AT_PRT] [char](1) NULL,
[ER101_CHG_COMMENT] [varchar](255) NULL,
[ER101_ENT_DATE_ISO] [datetime] NULL,
[ER101_ENT_USER_ID] [varchar](10) NULL,
[ER101_UPD_DATE_ISO] [datetime] NULL,
[ER101_UPD_USER_ID] [varchar](10) NULL,
[ER101_LIN_NBR] [int] NULL,
[ER101_PHASE] [char](1) NULL,
[ER101_RES_CLASS] [char](1) NULL,
[ER101_NEW_RES_TYPE] [varchar](6) NULL,
[ER101_RES_CODE] [varchar](12) NULL,
[ER101_RES_QTY] [numeric](11, 2) NULL,
[ER101_UNIT_CHRG] [numeric](13, 4) NULL,
[ER101_UNIT_COST] [numeric](13, 4) NULL,
[ER101_EXT_COST] [numeric](11, 2) NULL,
[ER101_EXT_CHRG] [numeric](11, 2) NULL,
[ER101_UOM] [varchar](3) NULL,
[ER101_MIN_CHRG] [numeric](11, 2) NULL,
[ER101_PER_UOM] [varchar](3) NULL,
[ER101_MAX_CHRG] [numeric](11, 2) NULL,
[ER101_BILLABLE] [char](1) NULL,
[ER101_OVERRIDE_FLAG] [char](1) NULL,
[ER101_RES_TEXT_YN] [char](1) NULL,
[ER101_DB_CR_FLAG] [char](1) NULL,
[ER101_INTERNAL] [char](1) NULL,
[ER101_REF_FIELD] [varchar](255) NULL,
[ER101_SERIAL_NBR] [varchar](50) NULL,
[ER101_RES_PER_UNITS] [int] NULL,
[ER101_SETUP_BILLABLE] [char](1) NULL,
[ER101_START_DATE_ISO] [datetime] NULL,
[ER101_END_DATE_ISO] [datetime] NULL,
[ER101_START_TIME_ISO] [datetime] NULL,
[ER101_END_TIME_ISO] [datetime] NULL,
[ER101_COMPL_STS] [char](1) NULL,
[ER101_CANCEL_DATE_ISO] [datetime] NULL,
[ER101_BLOCK_CODE] [varchar](6) NULL,
[ER101_PROP_CODE] [varchar](8) NULL,
[ER101_RM_TYPE] [varchar](12) NULL,
[ER101_WO_COMPL_DATE] [datetime] NULL,
[ER101_WO_BATCH_ID] [varchar](10) NULL,
[ER101_WO_SCHED_DATE_ISO] [datetime] NULL,
[ER101_GL_REF_TRANS] [char](1) NULL,
[ER101_GL_COS_TRANS] [char](1) NULL,
[ER101_INVOICE_NBR] [int] NULL,
[ER101_RES_CLOSED] [char](1) NULL,
[ER101_LEAD_DAYS] [int] NULL,
[ER101_LEAD_HHMM] [int] NULL,
[ER101_STRIKE_DAYS] [int] NULL,
[ER101_STRIKE_HHMM] [int] NULL,
[ER101_LEAD_FLAG] [char](1) NULL,
[ER101_STRIKE_FLAG] [char](1) NULL,
[ER101_RANGE_FLAG] [char](1) NULL,
[ER101_REQ_LEAD_STDATE] [datetime] NULL,
[ER101_REQ_LEAD_ENDATE] [datetime] NULL,
[ER101_REQ_STRK_STDATE] [datetime] NULL,
[ER101_REQ_STRK_ENDATE] [datetime] NULL,
[ER101_LEAD_STDATE] [datetime] NULL,
[ER101_LEAD_ENDATE] [datetime] NULL,
[ER101_STRK_STDATE] [datetime] NULL,
[ER101_STRK_ENDATE] [datetime] NULL,
[ER101_DEL_MARK] [char](1) NULL,
[ER101_USER_FLD1_02X] [varchar](2) NULL,
[ER101_USER_FLD1_04X] [varchar](4) NULL,
[ER101_USER_FLD1_06X] [varchar](6) NULL,
[ER101_USER_NBR_060P] [int] NULL,
[ER101_USER_NBR_092P] [numeric](9, 2) NULL,
[ER101_PR_LIST_DTL] [numeric](11, 2) NULL,
[ER101_EXT_ACCT_CODE] [varchar](8) NULL,
[ER101_AO_STS_1] [char](1) NULL,
[ER101_PLAN_PHASE] [char](1) NULL,
[ER101_PLAN_SEQ] [int] NULL,
[ER101_ACT_PHASE] [char](1) NULL,
[ER101_ACT_SEQ] [int] NULL,
[ER101_REV_PHASE] [char](1) NULL,
[ER101_REV_SEQ] [int] NULL,
[ER101_FORE_PHASE] [char](1) NULL,
[ER101_FORE_SEQ] [int] NULL,
[ER101_EXTRA1_PHASE] [char](1) NULL,
[ER101_EXTRA1_SEQ] [int] NULL,
[ER101_EXTRA2_PHASE] [char](1) NULL,
[ER101_EXTRA2_SEQ] [int] NULL,
[ER101_SETUP_MSTR_SEQ] [int] NULL,
[ER101_SETUP_ALTERED] [char](1) NULL,
[ER101_RES_LOCKED] [char](1) NULL,
[ER101_PRICE_LIST] [varchar](10) NULL,
[ER101_SO_SEARCH] [varchar](9) NULL,
[ER101_SSB_NBR] [int] NULL,
[ER101_MIN_QTY] [numeric](11, 2) NULL,
[ER101_MAX_QTY] [numeric](11, 2) NULL,
[ER101_START_SIGN] [char](1) NULL,
[ER101_END_SIGN] [char](1) NULL,
[ER101_START_DAYS] [int] NULL,
[ER101_END_DAYS] [int] NULL,
[ER101_TEMPLATE] [char](1) NULL,
[ER101_TIME_OFFSET] [char](1) NULL,
[ER101_ASSIGN_CODE] [varchar](10) NULL,
[ER101_FC_UNIT_CHRG] [numeric](13, 4) NULL,
[ER101_FC_EXT_CHRG] [numeric](11, 2) NULL,
[ER101_CURRENCY] [varchar](3) NULL,
[ER101_FC_RATE] [numeric](12, 5) NULL,
[ER101_FC_DATE] [datetime] NULL,
[ER101_FC_MIN_CHRG] [numeric](11, 2) NULL,
[ER101_FC_MAX_CHRG] [numeric](11, 2) NULL,
[ER101_FC_FOREIGN] [numeric](12, 5) NULL,
[ER101_STAT_ORD_NBR] [int] NULL,
[ER101_STAT_ORD_LINE] [int] NULL,
[ER101_DESC] [varchar](255) NULL
) ON [PRIMARY]
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PRT_SEQ_1] [varchar](12) NULL
SET ANSI_PADDING ON
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PRT_SEQ_2] [varchar](120) NULL
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TAX_BASIS] [char](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_RES_CATEGORY] [char](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_DECIMALS] [char](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TAX_SEQ] [varchar](7) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MANUAL] [char](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_LC_RATE] [numeric](12, 5) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_FC_RATE] [numeric](12, 5) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_PL_RATE] [numeric](12, 5) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_DIFF] [char](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_UNIT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_EXT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_MIN_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TR_MAX_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_UNIT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_EXT_CHRG] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_MIN_CHRG] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_MAX_CHRG] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TAX_RATE_TYPE] [char](1) NULL
SET ANSI_PADDING ON
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ORDER_FORM] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_FACTOR] [int] NULL
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MGMT_RPT_CODE] [varchar](6) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ROUND_CHRG] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_WHOLE_QTY] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SET_QTY] [numeric](15, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SET_UNITS] [numeric](15, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SET_ROUNDING] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SET_SUB] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TIME_QTY] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_GL_DISTR_PCT] [numeric](7, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_REG_SEQ] [int] NULL
SET ANSI_PADDING ON
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ALT_DESC] [varchar](255) NULL
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_REG_ACCT] [varchar](8) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_DAILY] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_AVG_UNIT_CHRG] [varchar](1) NULL
SET ANSI_PADDING ON
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ALT_DESC2] [varchar](255) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_CONTRACT_SEQ] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ORIG_RATE] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_DISC_PCT] [decimal](17, 10) NULL
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_DTL_EXIST] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ORDERED_ONLY] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_STDATE] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_STTIME] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_ENDATE] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_ENTIME] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_RATE] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_UNITS] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_BASE_RATE] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_COMMIT_QTY] [numeric](11, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MM_QTY_USED] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MM_CHRG_USED] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_TEXT_1] [varchar](50) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_1] [numeric](13, 3) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_2] [numeric](13, 3) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_3] [numeric](13, 3) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_BASE_RATE] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_REV_DIST] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_COVER] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_RATE_TYPE] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_USE_SEASONAL] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TAX_EI] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TAXES] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_FC_TAXES] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PL_TAXES] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_FC_QTY] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_LEAD_HRS] [numeric](6, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_STRIKE_HRS] [numeric](6, 2) NULL
SET ANSI_PADDING ON
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_CANCEL_USER_ID] [varchar](10) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ST_OFFSET_HRS] [numeric](7, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_EN_OFFSET_HRS] [numeric](7, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MEMO_FLAG] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MEMO_EXT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MEMO_EXT_CHRG_PL] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MEMO_EXT_CHRG_TR] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MEMO_EXT_CHRG_FC] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TIME_QTY_EDIT] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SURCHARGE_PCT] [decimal](17, 10) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_INCL_EXT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_INCL_EXT_CHRG_FC] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_CARRIER] [varchar](6) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SETUP_ID2] [varchar](8) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHIPPABLE] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_CHARGEABLE] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_ALLOW] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_START] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_NBR_END] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_SUPPLIER] [varchar](8) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_TRACK_ID] [varchar](40) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_REF_INV_NBR] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_NEW_ITEM_STS] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MSTR_REG_ACCT_CODE] [varchar](8) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ALT_DESC3] [varchar](255) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ALT_DESC4] [varchar](255) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ALT_DESC5] [varchar](255) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SETUP_ROLLUP] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MM_COST_USED] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_AUTO_SHIP_RCD] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_FIXED] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ITEM_EST_TBD] [varchar](3) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ROLLUP_PL_UNIT_CHRG] [numeric](13, 4) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ROLLUP_PL_EXT_CHRG] [numeric](13, 2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_GL_ORD_REV_TRANS] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_DISCOUNT_FLAG] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SETUP_RES_TYPE] [varchar](6) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SETUP_RES_CODE] [varchar](12) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PERS_SCHED_FLAG] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PRINT_STAMP] [datetime] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SHOW_EXT_CHRG] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PRINT_SEQ_NBR] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_PAY_LOCATION] [varchar](3) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_MAX_RM_NIGHTS] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_USE_TIER_COST] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_UNITS_SCHEME_CODE] [varchar](6) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_ROUND_TIME] [varchar](2) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_LEVEL] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_SETUP_PARENT_ORD_LINE] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_BADGE_PRT_STS] [varchar](1) NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_EVT_PROMO_SEQ] [int] NULL
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD [ER101_REG_TYPE] [varchar](12) NULL
/****** Object: Index [PK__ER101_ACCT_ORDER] Script Date: 04/15/2012 20:24:37 ******/
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL] ADD CONSTRAINT [PK__ER101_ACCT_ORDER] PRIMARY KEY CLUSTERED
(
[ER101_ORD_NBR] ASC,
[ER101_ORD_LINE] ASC,
[ER101_ORG_CODE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
The table is 2.8 GB in size with index size standing at 3.9 GB.
Simple Answer: NO. You cannot help ad hoc queries on a 238 column table with a 50% Fill Factor on the Clustered Index.
Detailed Answer:
As I have stated in other answers on this topic, index design is both art and science, and there are so many factors to consider that there are few, if any, hard and fast rules. You need to consider: the volume of DML operations vs SELECTs, the disk subsystem, other indexes / triggers on the table, the distribution of data within the table, whether queries use SARGable WHERE conditions, and several other things that I can't even remember right now.
I can say that no help can be given for questions on this topic without an understanding of the Table itself, its indexes, triggers, etc. Now that you have posted the table definition (still waiting on the Indexes but the Table definition alone points to 99% of the issue) I can offer some suggestions.
First, if the table definition is accurate (238 columns, 50% Fill Factor) then you can pretty much ignore the rest of the answers / advice here ;-). Sorry to be less-than-political here, but seriously, it's a wild goose chase without knowing the specifics. And now that we see the table definition it becomes quite a bit clearer as to why a simple query would take so long, even when the test queries (Update #1) ran so quickly.
The main problem here (and in many poor-performance situations) is bad data modeling. 238 columns is not prohibited just like having 999 indexes is not prohibited, but it is also generally not very wise.
Recommendations:
First, this table really needs to be remodeled. If this is a data warehouse table then maybe it's tolerable, but if not, these fields really need to be broken up into several tables which can all share the same PK. You would have a master record table, and the child tables would hold dependent info grouped by commonly associated attributes; the PK of those tables is the same as the PK of the master table, and hence is also an FK to the master table. There will be a 1-to-1 relationship between the master and all child tables.
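A hedged sketch of that shape; the child table name and the column grouping are invented, and only the PK columns come from the real table:
CREATE TABLE [dbo].[ER101_ACCT_ORDER_MASTER]
(
[ER101_ORG_CODE] [varchar](2) NOT NULL,
[ER101_ORD_NBR] [int] NOT NULL,
[ER101_ORD_LINE] [int] NOT NULL,
CONSTRAINT [PK_ER101_ACCT_ORDER_MASTER]
PRIMARY KEY CLUSTERED ([ER101_ORD_NBR] ASC, [ER101_ORD_LINE] ASC, [ER101_ORG_CODE] ASC)
)
GO
-- one of several hypothetical child tables, 1-to-1 with the master;
-- its PK is the master's PK and is also the FK back to the master
CREATE TABLE [dbo].[ER101_ACCT_ORDER_CHARGES]
(
[ER101_ORG_CODE] [varchar](2) NOT NULL,
[ER101_ORD_NBR] [int] NOT NULL,
[ER101_ORD_LINE] [int] NOT NULL,
[ER101_UNIT_CHRG] [numeric](13, 4) NULL,
[ER101_EXT_CHRG] [numeric](11, 2) NULL,
CONSTRAINT [PK_ER101_ACCT_ORDER_CHARGES]
PRIMARY KEY CLUSTERED ([ER101_ORD_NBR] ASC, [ER101_ORD_LINE] ASC, [ER101_ORG_CODE] ASC),
CONSTRAINT [FK_ER101_ACCT_ORDER_CHARGES_MASTER]
FOREIGN KEY ([ER101_ORD_NBR], [ER101_ORD_LINE], [ER101_ORG_CODE])
REFERENCES [dbo].[ER101_ACCT_ORDER_MASTER] ([ER101_ORD_NBR], [ER101_ORD_LINE], [ER101_ORG_CODE])
)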
The use of ANSI_PADDING OFF is disturbing, not to mention inconsistent within the table due to the various column additions over time. Not sure if you can fix that now, but ideally you would always have ANSI_PADDING ON, or at the very least have the same setting across all ALTER TABLE statements.
Consider creating 2 additional Filegroups: Tables and Indexes. It is best not to put your stuff in PRIMARY as that is where SQL Server stores all of its data and meta-data about your objects. You create your table and clustered index (as that is the data for the table) on [Tables], and all non-clustered indexes on [Indexes].
Increase the Fill Factor from 50%. This low number is likely why your index space is larger than your data space. Doing an index rebuild at 50% recreates the data pages with a max of 4KB (out of the total 8KB page size) used for your data, so your table is spread out over a wide area.
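For example (only the index and table names are taken from the script above):
-- rebuild the clustered index with densely packed pages; 100 is shown,
-- but pick a value based on how often existing rows are updated
ALTER INDEX [PK__ER101_ACCT_ORDER] ON [dbo].[ER101_ACCT_ORDER_DTL]
REBUILD WITH (FILLFACTOR = 100)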
If most or all queries have "ER101_ORG_CODE" in the WHERE condition, then consider moving that to the leading column of the clustered index. Assuming that it is used more often than "ER101_ORD_NBR". If "ER101_ORD_NBR" is used more often then keep it. It just seems, assuming that the field names mean "OrganizationCode" and "OrderNumber", that "OrgCode" is a better grouping that might have multiple "OrderNumbers" within it.
Minor point, but if "ER101_ORG_CODE" is always 2 characters, then use CHAR(2) instead of VARCHAR(2) as it will save a byte in the row header which tracks variable width sizes and adds up over millions of rows.
As others here have mentioned, using SELECT * will hurt performance. Not only due to it requiring SQL Server to return all columns and hence be more likely to do a Clustered Index Scan regardless of your other indexes, but it also takes SQL Server time to go to the table definition and translate * into all of the column names. It should be slightly faster to specify all 238 column names in the SELECT list though that won't help the Scan issue. But do you ever really need all 238 columns at the same time anyway?
Good luck!
UPDATE
For the sake of completeness to the question "how to improve performance on a large table for ad-hoc queries", it should be noted that while it will not help for this specific case, IF someone is using SQL Server 2012 (or newer when that time comes) and IF the table is not being updated, then using Columnstore Indexes is an option. For more details on that new feature, look here:
http://msdn.microsoft.com/en-us/library/gg492088.aspx (I believe these were made to be updateable starting in SQL Server 2014).
UPDATE 2
Additional considerations are:
Enable compression on the Clustered Index. This option became available in SQL Server 2008, but as an Enterprise Edition-only feature. However, as of SQL Server 2016 SP1, Data Compression was made available in all editions! Please see the MSDN page for Data Compression for details on Row and Page Compression.
If you cannot use Data Compression, or if it won't provide much benefit for a particular table, then IF you have a column of a fixed-length type (INT, BIGINT, TINYINT, SMALLINT, CHAR, NCHAR, BINARY, DATETIME, SMALLDATETIME, MONEY, etc) and well over 50% of the rows are NULL, then consider enabling the SPARSE option which became available in SQL Server 2008. Please see the MSDN page for Use Sparse Columns for details.
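Hedged examples of both options against the table from the question (the SPARSE column is picked purely for illustration; check the actual NULL ratio first):
-- page compression on the table (and thus its clustered index);
-- Enterprise-only before SQL Server 2016 SP1
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL]
REBUILD WITH (DATA_COMPRESSION = PAGE)
GO
-- mark a mostly-NULL fixed-length column as SPARSE (SQL Server 2008+)
ALTER TABLE [dbo].[ER101_ACCT_ORDER_DTL]
ALTER COLUMN [ER101_EVT_ID] ADD SPARSE
GO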
There are a few issues with this query (and this applies to every query).
Lack of index
Lack of an index on the er101_upd_date_iso column is the most important thing, as Oded has already mentioned.
Without a matching index (the lack of which can cause a table scan) there is no chance of running fast queries on big tables.
If you cannot add indexes (for various reasons, including that there is no point in creating an index for just one ad-hoc query), I would suggest a few workarounds (which can be used for ad-hoc queries):
1. Use temporary tables
Create a temporary table on the subset (rows and columns) of data you are interested in.
The temporary table should be much smaller than the original source table, can be indexed easily (if needed), and caches the subset of data which you are interested in.
To create the temporary table you can use code like:
-- copy records from last month to a temporary table;
-- SELECT ... INTO creates #my_temporary_table on the fly
SELECT
*
INTO
#my_temporary_table
FROM
er101_acct_order_dtl WITH (NOLOCK)
WHERE
er101_upd_date_iso > DATEADD(month, -1, GETDATE())
-- you can add any index you need on the temp table
CREATE INDEX idx_er101_upd_date_iso ON #my_temporary_table(er101_upd_date_iso)
-- run other queries on temporary table (which can be indexed)
SELECT TOP 100
*
FROM
#my_temporary_table
ORDER BY
er101_upd_date_iso DESC
Pros:
Easy to do for any subset of data.
Easy to manage -- it's temporary and it's a table.
Doesn't affect overall system performance like a view can.
Temporary table can be indexed.
You don't have to care about it -- it's temporary :).
Cons:
It's a snapshot of the data -- but probably this is good enough for most ad-hoc queries.
2. Common table expression -- CTE
Personally I use CTEs a lot with ad-hoc queries -- they help a lot with building (and testing) a query piece by piece.
See example below (the query starting with WITH).
Pros:
Easy to build, starting from a big view and then selecting and filtering what you really need.
Easy to test.
Cons:
Some people dislike CTEs -- CTE queries can seem long and difficult to understand.
3. Create views
Similar to the above, but create views instead of temporary tables (if you play with the same queries often and you have an MS SQL version which supports indexed views).
You can create views or indexed views on the subset of data you are interested in
and run queries on the view -- which should contain only the interesting subset of data, much smaller than the whole table.
Pros:
Easy to do.
It's up to date with source data.
Cons:
Possible only for a defined subset of data.
Could be inefficient for large tables with a high rate of updates.
Not so easy to manage.
Can affect overall system performance.
I am not sure indexed views are available in every version of MS SQL.
Selecting all columns
Running a star query (SELECT * FROM) on a big table is not a good thing...
If you have large columns (like long strings) it takes a lot of time to read them from disk
and pass them over the network.
I would try to replace * with the column names which you really need.
Or, if you need all columns, try to rewrite the query to something like this (using a common table expression):
;WITH recs AS (
SELECT TOP 100
id as rec_id -- select primary key only
FROM
er101_acct_order_dtl
ORDER BY
er101_upd_date_iso DESC
)
SELECT
er101_acct_order_dtl.*
FROM
recs
JOIN
er101_acct_order_dtl
ON
er101_acct_order_dtl.id = recs.rec_id
ORDER BY
er101_upd_date_iso DESC
Dirty reads
The last thing which could speed up the ad-hoc query is allowing dirty reads with the table hint WITH (NOLOCK).
Instead of the hint you can set the transaction isolation level to read uncommitted:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
or set proper SQL Management Studio setting.
I assume that for ad-hoc queries dirty reads are good enough.
You are getting a table scan there, meaning that you do not have an index defined on er101_upd_date_iso, or, if that column is part of an existing index, the index can't be used (possibly it is not the leading column).
Adding missing indexes will help performance no end.
there are already indexes on the columns which are queried most commonly
That does not mean they are used in this query (and they probably are not).
I suggest reading Finding the Causes of Poor Performance in SQL Server by Gail Shaw, part 1 and part 2.
The question specifically states the performance needs to be improved for ad-hoc queries, and that indexes can't be added. So taking that at face value, what can be done to improve performance on any table?
Since we're considering ad-hoc queries, the WHERE clause and the ORDER BY clause can contain any combination of columns. This means that almost regardless of what indexes are placed on the table, there will be some queries that require a table scan, as seen above in the query plan of a poorly performing query.
Taking this into account, let's assume there are no indexes at all on the table apart from a clustered index on the primary key. Now let's consider what options we have to maximize performance.
Defragment the table
As long as we have a clustered index then we can defragment the table using DBCC INDEXDEFRAG (deprecated) or preferably ALTER INDEX.
This will minimize the number of disk reads required to scan the table and will improve speed.
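For example, against the table from the question:
-- full rebuild of the clustered index; for light fragmentation,
-- REORGANIZE can be used instead
ALTER INDEX [PK__ER101_ACCT_ORDER] ON [dbo].[ER101_ACCT_ORDER_DTL] REBUILD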
Use the fastest disks possible. You don't say what disks you're using, but use SSDs if you can.
Optimize tempdb. Put tempdb on the fastest disks possible, again SSDs. See this SO Article and this RedGate article.
As stated in other answers, using a more selective query will return less data and should therefore be faster.
Now let's consider what we can do if we are allowed to add indexes.
If we weren't talking about ad-hoc queries, then we would add indexes specifically for the limited set of queries being run against the table.
Since we are discussing ad-hoc queries, what can be done to improve speed most of the time?
Add a single column index to each column. This should give SQL Server at least something to work with to improve the speed for the majority of queries, but won't be optimal.
Add specific indexes for the most common queries so they are optimized (a sketch follows this list).
Add additional specific indexes as required by monitoring for poorly performing queries.
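As a sketch of the second point above, a specific index for the query in the question might look like this (the index name is invented):
CREATE NONCLUSTERED INDEX [IX_ER101_UPD_DATE_ISO]
ON [dbo].[ER101_ACCT_ORDER_DTL] ([ER101_UPD_DATE_ISO] DESC)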
Edit
I've run some tests on a 'large' table of 22 million rows. My table only has six columns but does contain 4GB of data. My machine is a respectable desktop with 8GB RAM and a quad-core CPU, and it has a single Agility 3 SSD.
I removed all indexes apart from the primary key on the Id column.
A similar query to the problem one given in the question takes 5 seconds if SQL server is restarted first and 3 seconds subsequently. The database tuning advisor obviously recommends adding an index to improve this query, with an estimated improvement of > 99%. Adding an index results in a query time of effectively zero.
What's also interesting is that my query plan is identical to yours (with the clustered index scan), but the index scan accounts for 9% of the query cost and the sort the remaining 91%. I can only assume your table contains an enormous amount of data and/or your disks are very slow or located over a very slow network connection.
Even if you have indexes on some columns that are used in some queries, the fact that your 'ad-hoc' query causes a table scan shows that you don't have sufficient indexes to allow this query to complete efficiently.
For date ranges in particular it is difficult to add good indexes.
Just looking at your query, the db has to sort all the records by the selected column to be able to return the first n records.
Does the db also do a full table scan without the order by clause? Does the table have a primary key? Without a PK, the db will have to work harder to perform the sort.
How is this possible? Without an index on the er101_upd_date_iso column how can a clustered index scan be used?
An index is a B-tree where each leaf node points to a 'bunch of rows' (called a 'page' in SQL Server internal terminology); that is the case when the index is a non-clustered index.
A clustered index is a special case in which the leaf nodes hold the 'bunch of rows' themselves (rather than pointing to them). That is why:
1) There can be only one clustered index on the table.
This also means the whole table is stored as the clustered index, which is why you see an index scan rather than a table scan.
2) An operation that utilizes the clustered index is generally faster than one using a non-clustered index.
Read more at http://msdn.microsoft.com/en-us/library/ms177443.aspx
For the problem you have, you should really consider adding this column to an index. As you said, adding a new index (or a column to an existing index) increases INSERT/UPDATE costs, but it might be possible to remove some underutilized index (or a column from an existing index) and replace it with 'er101_upd_date_iso'.
If index changes are not possible, I recommend adding statistics on the column; it can speed things up when the column has some correlation with indexed columns:
http://msdn.microsoft.com/en-us/library/ms188038.aspx
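For example (the statistics name is invented):
CREATE STATISTICS [ST_ER101_UPD_DATE_ISO]
ON [dbo].[ER101_ACCT_ORDER_DTL] ([ER101_UPD_DATE_ISO])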
BTW, you will get much more help if you can post the table schema of ER101_ACCT_ORDER_DTL,
and the existing indexes too... probably the query could be rewritten to use some of them.
One of the reasons your 1M-row test ran quicker is likely that the temp tables live entirely in memory and would only go to disk if your server experiences memory pressure. You can either re-craft your query to remove the ORDER BY, add a good clustered index and covering index(es) as previously mentioned, or query the DMVs to check for IO pressure to see if it is hardware-related.
-- From Glenn Berry
-- Clear Wait Stats (consider clearing and running wait stats query again after a few minutes)
-- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
-- Check Task Counts to get an initial idea what the problem might be
-- Avg Current Tasks Count, Avg Runnable Tasks Count, Avg Pending Disk IO Count across all schedulers
-- Run several times in quick succession
SELECT AVG(current_tasks_count) AS [Avg Task Count],
AVG(runnable_tasks_count) AS [Avg Runnable Task Count],
AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255 OPTION (RECOMPILE);
-- Sustained values above 10 suggest further investigation in that area
-- High current_tasks_count is often an indication of locking/blocking problems
-- High runnable_tasks_count is a good indication of CPU pressure
-- High pending_disk_io_count is an indication of I/O pressure
I know that you said that adding indexes is not an option, but that would be the only option to eliminate the table scan you have. When you do a scan, SQL Server reads all 2 million rows in the table to fulfill your query.
This article provides more info, but remember: Seek = good, Scan = bad.
Second, can't you eliminate the select * and select only the columns you need?
Third, no "where" clause? Even if you have an index, since you are reading everything the best you will get is an index scan (which is better than a table scan, but not a seek, which is what you should aim for).
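For illustration, a more selective version of the query might look like this (the column list and the seven-day window are made up; a matching index on the date column could then seek instead of scan):
SELECT TOP 100
[ER101_ORG_CODE], [ER101_ORD_NBR], [ER101_ORD_LINE], [ER101_UPD_DATE_ISO]
FROM [dbo].[ER101_ACCT_ORDER_DTL]
WHERE [ER101_UPD_DATE_ISO] >= DATEADD(day, -7, GETDATE())
ORDER BY [ER101_UPD_DATE_ISO] DESC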
I know it's been quite a while since the beginning... There is a lot of wisdom in all these answers. Good indexing is the first thing when trying to improve a query. Well, almost the first: the most-first (so to speak) is making changes to the code so that it's efficient.
So, after all's been said and done, if one has a query with no WHERE, or the WHERE condition is not selective enough, there is only one way to get the data: a TABLE SCAN (INDEX SCAN). If one needs all the columns from a table, then a TABLE SCAN will be used - no question about it. This might be a heap scan or a clustered index scan, depending on the type of data organization.
The only remaining way to speed things up (if at all possible) is to make sure that as many cores as possible are used to do the scan: OPTION (MAXDOP 0). I'm ignoring the subject of storage, of course, but one should make sure that one has unlimited RAM, which goes without saying :)

DB advice needed for performance of a 'SessionVisit' table

I have a 'SessionVisit' table which collects data about user visits.
The script for this table is below. There may be 25,000 rows added a day.
The table CREATE statement is below. My database knowledge is definitely not up to scratch as far as understanding the implications of such a schema.
Can anyone give me their 2c of advice on some of these issues:
Do I need to worry about ROWSIZE for this schema in SQL Server 2008? I'm not even sure how the 8KB row size works in 2008. Am I wasting a lot of space if I'm not using all 8KB?
How should I purge old records I don't want? Will new rows fill in the empty space from dropped rows?
Any advice on indexes
I know this is quite general in nature. Any 'obvious' or non-obvious info would be appreciated.
Here's the table :
USE [MyDatabase]
GO
/****** Object: Table [dbo].[SessionVisit] Script Date: 06/06/2009 16:55:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[SessionVisit](
[SessionGUID] [uniqueidentifier] NOT NULL,
[SessionVisitId] [int] IDENTITY(1,1) NOT NULL,
[timestamp] [timestamp] NOT NULL,
[SessionDate] [datetime] NOT NULL CONSTRAINT [DF_SessionVisit_SessionDate] DEFAULT (getdate()),
[UserGUID] [uniqueidentifier] NOT NULL,
[CumulativeVisitCount] [int] NOT NULL CONSTRAINT [DF_SessionVisit_CumulativeVisitCount] DEFAULT ((0)),
[SiteUserId] [int] NULL,
[FullEntryURL] [varchar](255) NULL,
[SiteCanonicalURL] [varchar](100) NULL,
[StoreCanonicalURL] [varchar](100) NULL,
[CampaignId] [int] NULL,
[CampaignKey] [varchar](50) NULL,
[AdKeyword] [varchar](50) NULL,
[PartnerABVersion] [varchar](10) NULL,
[ABVersion] [varchar](10) NULL,
[UserAgent] [varchar](255) NULL,
[Referer] [varchar](255) NULL,
[KnownRefererId] [int] NULL,
[HostAddress] [varchar](20) NULL,
[HostName] [varchar](100) NULL,
[Language] [varchar](50) NULL,
[SessionLog] [xml] NULL,
[OrderDate] [datetime] NULL,
[OrderId] [varchar](50) NULL,
[utmcc] [varchar](1024) NULL,
[TestSession] [bit] NOT NULL CONSTRAINT [DF_SessionVisit_TestSession] DEFAULT ((0)),
[Bot] [bit] NULL,
CONSTRAINT [PK_SessionVisit] PRIMARY KEY CLUSTERED
(
[SessionGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[SessionVisit] WITH CHECK ADD CONSTRAINT [FK_SessionVisit_KnownReferer] FOREIGN KEY([KnownRefererId])
REFERENCES [dbo].[KnownReferer] ([KnownRefererId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_KnownReferer]
GO
ALTER TABLE [dbo].[SessionVisit] WITH CHECK ADD CONSTRAINT [FK_SessionVisit_SiteUser] FOREIGN KEY([SiteUserId])
REFERENCES [dbo].[SiteUser] ([SiteUserId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_SiteUser]
I see SessionGUID and SessionVisitId - why have both a uniqueidentifier and an Identity(1,1) on the same table? Seems redundant to me.
I see Referer and KnownRefererId - think about getting the referer from the KnownRefererId if possible. This will help reduce excess writes.
I see CampaignKey and CampaignId - again, get it from the campaigns table if possible.
I see OrderId and OrderDate. I'm sure you can get the order date from the orders table, correct?
I see HostAddress and HostName - do you really need the name? Usually the hostname doesn't serve much purpose and can be easily misleading.
I see multiple dates and timestamps - is any of this duplicated?
How about that SessionLog column? I see that it's XML. Is it a lot of data? Is it data you may already have in other columns? If so, get rid of the XML or the duplicated columns. Using SQL 2008 you can parse data out of that XML column when reporting and possibly eliminate a few extra columns (and thus writes). Are you going to be in trouble in the future when developers add more to that XML? XML to me just screams 'a lot of excessive writing'.
Mitch says to remove the primary key. Personally I would leave the index on the table. Since it is clustered that will help speed up write times as the DB will always write new rows at the end of the table on the disk.
Strip out some of this duplicate information and you'll probably do just fine writing a row each visit.
Well, I'd recommend NOT inserting a few k of data with EVERY page!
First thing I'd do would be to see how much of this information I could get from a 3rd party analytics tool, perhaps combined with log analysis. That should allow you to drop a lot of the fields.
25k inserts a day isn't much, but the catch here is that the busier your site gets, the more load this is going to put on the db. Perhaps you could build a queuing system that batches the writes, but really, most of this information is already in the logs.
Agree with Chris that you would probably be better off using log analysis (check out Microsoft's free Log Parser).
Failing that, I would remove the Foreign Key constraints from your SessionVisit table.
You mentioned row size; the varchars in your table do not pre-allocate to their maximum length (more like 4 + 4 bytes for an empty field, approximately). But saying that, a general rule is to keep rows as 'lean' as possible.
Also, I would remove the primary key from the SessionGUID (GUID) column. It won't help you much.
That's also an awful lot of NULLs in that table. I think you should group together the columns that must be non-NULL at the same time. In fact, you should do a better analysis of the data you're writing, rather than lumping it all together in a single table.
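As a hedged sketch of that idea, the order columns (NULL for most visits) could move into their own 1-to-0..1 child table (the table name is invented):
CREATE TABLE [dbo].[SessionVisitOrder]
(
[SessionGUID] [uniqueidentifier] NOT NULL,
[OrderDate] [datetime] NOT NULL,
[OrderId] [varchar](50) NOT NULL,
-- a row exists only for visits that produced an order, so these
-- columns can be NOT NULL here
CONSTRAINT [PK_SessionVisitOrder] PRIMARY KEY CLUSTERED ([SessionGUID] ASC),
CONSTRAINT [FK_SessionVisitOrder_SessionVisit]
FOREIGN KEY ([SessionGUID]) REFERENCES [dbo].[SessionVisit] ([SessionGUID])
)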
