Speed up retrieval of distinct values for dropdown via caching - sql-server

Overview
In my ASP.NET MVC application, I have several pages that use a DataRecord search feature. The site admin configures which DataRecord fields are available as search criteria and which of a few input types each field uses. One of those input types is a dropdown, populated with the distinct DataRecord values of that field that are relevant to the current search context.
I'm looking to decrease the amount of time it takes to create these dropdowns, and am open to suggestions.
I'll list out things in the following manner:
SQL Structure
Sample Query
Business Rules
Miscellaneous Info (may or may not be relevant, but I didn't want to rule anything out)
SQL Structure
Listed from broadest to narrowest scope, with only the relevant fields shown. Each table has a one-to-many relationship with the table that follows. Keep in mind these were all created and maintained via EF Code First with Migrations.
CREATE TABLE [dbo].[CompanyInfoes](
[Id] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_dbo.CompanyInfoes] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[BusinessLines](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Company_Id] [int] NOT NULL,
CONSTRAINT [PK_dbo.BusinessLines] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
ALTER TABLE [dbo].[BusinessLines] WITH CHECK ADD CONSTRAINT [FK_dbo.BusinessLines_dbo.CompanyInfoes_Company_Id] FOREIGN KEY([Company_Id])
REFERENCES [dbo].[CompanyInfoes] ([Id])
ALTER TABLE [dbo].[BusinessLines] CHECK CONSTRAINT [FK_dbo.BusinessLines_dbo.CompanyInfoes_Company_Id]
CREATE TABLE [dbo].[DataFiles](
[Id] [int] IDENTITY(1,1) NOT NULL,
[FileStatus] [int] NOT NULL,
[FileEnvironment] [int] NOT NULL,
[BusinessLine_Id] [int] NOT NULL,
CONSTRAINT [PK_dbo.DataFiles] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
ALTER TABLE [dbo].[DataFiles] WITH CHECK ADD CONSTRAINT [FK_dbo.DataFiles_dbo.BusinessLines_BusinessLine_Id] FOREIGN KEY([BusinessLine_Id])
REFERENCES [dbo].[BusinessLines] ([Id])
ON DELETE CASCADE
ALTER TABLE [dbo].[DataFiles] CHECK CONSTRAINT [FK_dbo.DataFiles_dbo.BusinessLines_BusinessLine_Id]
CREATE TABLE [dbo].[DataRecords](
[Id] [int] IDENTITY(1,1) NOT NULL,
[File_Id] [int] NOT NULL,
[Field1] [nvarchar](max) NULL,
[Field2] [nvarchar](max) NULL,
...
[Field20] [nvarchar](max) NULL,
CONSTRAINT [PK_dbo.DataRecords] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
ALTER TABLE [dbo].[DataRecords] WITH CHECK ADD CONSTRAINT [FK_dbo.DataRecords_dbo.DataFiles_File_Id1] FOREIGN KEY([File_Id])
REFERENCES [dbo].[DataFiles] ([Id])
ON DELETE CASCADE
ALTER TABLE [dbo].[DataRecords] CHECK CONSTRAINT [FK_dbo.DataRecords_dbo.DataFiles_File_Id1]
Sample Query (as generated by EF)
SELECT [Distinct1].[Field2] AS [Field2]
FROM ( SELECT DISTINCT
[Extent1].[Field2] AS [Field2]
FROM [dbo].[DataRecords] AS [Extent1]
INNER JOIN [dbo].[DataFiles] AS [Extent2] ON [Extent1].[File_Id] = [Extent2].[Id]
WHERE ([Extent2].[BusinessLine_Id] IN (4, 5, 6, 7, 8, 11, 12, 13, 14)) AND (0 = [Extent2].[FileEnvironment]) AND (1 = [Extent2].[FileStatus])
) AS [Distinct1]
Business Rules
The values within the Dropdown should be based on the viewing User's BusinessLine access ([BusinessLine_Id] clause in query), and the current page that the search is being used in conjunction with ([FileEnvironment] and [FileStatus]).
Which of the 20 DataRecords fields should be presented as a Dropdown for searching is controlled by a site admin via an admin page, and is configured at a company level. Company A may have a Dropdown for Field1, Company B may have one for Field5, Field7, and Field18, and Company C may not have any Dropdowns whatsoever.
While the layout and format of the DataRecords is consistent from company to company, the usage of Field1 - Field20, and therefore the uniqueness of their values, is not. Company A may have 3 unique values for Field1 across 900k records (which is why a Dropdown for Field1 makes sense for them), while Company B may have a unique value in Field1 for every DataRecord.
Everything database related is maintained via EF Migrations, and the site is set to auto-apply migrations on App Startup (or on Deploy in the case of the Azure staging site). Anything recommended from a database perspective must be implementable programmatically through migrations, so that upgrading or instancing the site and database can happen without manual intervention by someone with db access. Also, any database changes must not interfere with the Code First Migrations that are created when models change (i.e. I cannot be blocked from renaming a column because some rogue index added outside of annotations exists).
Similar to the previous point, the Dropdown configuration is controlled via the site, so anything that needs to be done must be addable and removable on demand at runtime.
Relevant data changes that occur through normal use of the site, though not necessarily by the current user:
FileStatus of a DataFile changes from 0 to 1 or 2
Which BusinessLines the current user can access changes
Additional BusinessLines are added
Relevant data changes that occur outside of the site (via an importer app that is part of the same solution as the site, and can therefore be modified if necessary):
New DataFiles and DataRecords are added
Additional BusinessLines are added (not a copy/paste error, they can be added through the importer as well)
Miscellaneous Info
The site is deployed to many locations, but in each deployment the site-to-database relationship is 1:1, so in-memory caching is not out of the question.
There is only one Site Admin who controls which fields are represented as Dropdowns, and he can be educated about the ramifications of making frequent changes and the cache rebuilds each change may trigger, if necessary. He is also familiar with the data in each field at a Company level, and knows which fields are good candidates for Dropdowns.
Just to give a little data quantity context, in just over 2.5 months, the number of DataRecords for one company went from 558k to 924k. So obviously the solution should be able to work with an ever-growing amount of data.
Offloading the loading of the values to an AJAX request, so as not to hold up the page load, is a good solution in general, but not one I can use here.

Two quick items jump out here:
1) Add a nonclustered index on [File_Id] on the DataRecords table with the returned [Field2] column as an INCLUDE. That keeps the query from needing a bookmark lookup to find Field2 after the ON clause has done the main work of finding the IDs. (The clustered index already contains every column, so the INCLUDE belongs on a nonclustered index.)
2) Not sure why there is a double select happening. I don't think it has a big impact, but the outer query just reselects what the inner one already selected as distinct, without even changing the name...
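For point 1, a minimal sketch of the kind of index meant (the index name is mine; the column names come from the question). Since the schema must stay migration-managed, this could be emitted from a code-first migration's Sql() call rather than run by hand:
CREATE NONCLUSTERED INDEX [IX_DataRecords_File_Id_Field2]
ON [dbo].[DataRecords] ([File_Id])
INCLUDE ([Field2]);
A similar index would be needed for each field exposed as a Dropdown, which is worth weighing against the admin's ability to reconfigure Dropdown fields at runtime.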

Related

Access Linked Table from SQL Shows #Delete

I created this table:
CREATE TABLE [dbo].[dbo_Country]
(
[Country] [nvarchar](100) NOT NULL,
[ISO3166Code] [smallint] NULL,
[CountryEn] [nvarchar](255) NULL,
[Abriviation] [nvarchar](255) NULL,
CONSTRAINT [dbo_Country$PrimaryKey]
PRIMARY KEY CLUSTERED ([Country] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Then I linked it to an MS Access database. When I try to open the table and view the information, every cell shows #Deleted.
Does anyone have a solution?
#Deleted is normally shown when rows have been deleted in the underlying database table while the table is open in Access. They may have been deleted by you in another window or by other users. #Deleted will not show initially when you open the table in Access. Press Shift+F9 to requery; the deleted rows should disappear.
Set a default of 0 for the number column (ISO3166Code)
(and update all existing rows so the column = 0).
Add a row version column (timestamp - NOT date time).
Re-link your table(s).
This is a long-known issue. With bit (or int) fields that are null, you will get that error. As noted, also add a timestamp column (not a datetime column).
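A minimal sketch of those steps in T-SQL against the table from the question (the constraint and new column names are mine):
-- Default the number column to 0 and backfill existing rows
ALTER TABLE [dbo].[dbo_Country] ADD CONSTRAINT [DF_dbo_Country_ISO3166Code] DEFAULT (0) FOR [ISO3166Code];
UPDATE [dbo].[dbo_Country] SET [ISO3166Code] = 0 WHERE [ISO3166Code] IS NULL;
-- rowversion is the current name for the timestamp type; not a datetime
ALTER TABLE [dbo].[dbo_Country] ADD [RowVer] rowversion;
Then re-link the table in Access.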

Updates on a specific table in a database via Informatica take a long time at a specific time interval

I am an Informatica developer.
I have a dimension table in a shared application database (SQL Server).
I have 4 keys on the table, and a unique nonclustered index on this table.
At the Informatica target, we have the same 4 columns set as key columns.
We do a lookup on this dimension table on the same 4 keys to flag whether it's an insert or an update.
I have a schedule in Informatica where this job runs 6 times a day at different intervals. For all the morning loads, the job runs quickly; the throughput of updates is around 1000 rec/sec.
But for the evening load only, and for this particular table only, the throughput drops to 12-15 rec/sec. This started about a month ago.
We suspected that something else might be locking this table, or that something was wrong at the database end, so we contacted the DBAs. They enabled a trace on this particular table but were not able to identify anything. If anyone could assist me in any way, or hint at where I might look, it would really be great.
The Informatica server is also a shared server, but at the times the performance issue occurs, both the SQL Server and the Informatica server are lightly loaded. If I have missed anything, kindly let me know and I can provide additional information. The table has 94 columns, and its definition goes as below, with A being the surrogate key and B, C, D, and E being the keys:
CREATE TABLE TEMP(
A [int] IDENTITY(1,1) NOT NULL,
B [varchar](8) NOT NULL,
C [varchar](11) NOT NULL,
D [varchar](3) NOT NULL,
E [varchar](2) NOT NULL,
F [varchar](1) NULL,
.
.
.
CP [char](1) NULL,
CONSTRAINT [PK_T_TEMP] PRIMARY KEY CLUSTERED
(
[A] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY],
CONSTRAINT [IX_T_TEMP] UNIQUE NONCLUSTERED
(
[A] ASC,
[B] ASC,
[C] ASC,
[D] ASC,
[E] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
All the columns are getting updated based on the 4 keys. So the update statement looks like below.
UPDATE TEMP
SET
F=?,
G=?,
H=?,
..
..
..
CP=?
WHERE B=? AND C=? AND D=? AND E=?;
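One observation on the DDL above, offered as a guess rather than a diagnosis: the unique constraint IX_T_TEMP leads with the identity column A, so the update's WHERE clause on B, C, D, and E cannot seek on it. A sketch of an index that would match that predicate (the name is mine; verify against the actual execution plan first):
CREATE NONCLUSTERED INDEX [IX_TEMP_BusinessKeys] ON TEMP (B, C, D, E);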

SQL performance advice

I know that it can be a tricky question and that it highly depends on context.
I have a database with a table containing around 10 million rows (each row contains 4 varchar fields).
There is a nonclustered index on the field (a varchar) used in the WHERE clause.
I'm a bit confused, because selecting a row using a WHERE on the indexed column takes around a second to complete.
Any advice to improve this response time?
Would a clustered index be a good solution here?
Here is the table definition:
CREATE TABLE [dbo].[MYTABLE](
[ID] [uniqueidentifier] NOT NULL DEFAULT (newid()),
[CreationDate] [datetime] NOT NULL DEFAULT (getdate()),
[UpdateDate] [datetime] NULL,
[FIELD1] [varchar](9) NOT NULL,
[FIELD2] [varchar](max) NOT NULL,
[FIELD3] [varchar](1) NULL,
[FIELD4] [varchar](4) NOT NULL,
CONSTRAINT [PK_MYTABLE] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Here is the index definition:
CREATE UNIQUE NONCLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
(
[FIELD1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
And here is the query I use (very basic):
SELECT * FROM MYTABLE WHERE FIELD1 = 'DATA'
For this case, since you are selecting all columns (*), changing your nonclustered index to a clustered one will improve the time: accessing columns a nonclustered index does not cover (neither as key columns nor via INCLUDE) requires retrieving another page containing the actual data.
Since you can't have more than one clustered index per table, you will have to drop the existing one (the PRIMARY KEY in this case) and recreate it afterwards.
ALTER TABLE dbo.MYTABLE DROP CONSTRAINT [PK_MYTABLE] -- This might fail if you have foreign keys
-- Recreate the primary key as nonclustered, freeing the clustered slot for FIELD1
ALTER TABLE dbo.MYTABLE ADD CONSTRAINT PK_MYTABLE PRIMARY KEY NONCLUSTERED (ID)
DROP INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
CREATE CLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE] (FIELD1)
Index access times can also vary greatly depending on their fragmentation (if you have many inserts, deletes, or updates with values that don't fall after the last or before the first existing one).
Also keep in mind that if you are doing other operations like joins, function calls, or additional WHERE filters, the engine might decide not to use the indexes.
If you are certain that the column used in your WHERE clause is unique, you may create a UNIQUE CLUSTERED INDEX on that column.
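For example, a sketch against the table above (run after making the primary key nonclustered as shown earlier; the build will fail if FIELD1 contains duplicates):
CREATE UNIQUE CLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE] ([FIELD1]);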
If values in that column are not unique, you can implement COVERING INDEXES. For example, if you had this index:
CREATE NONCLUSTERED INDEX IX_Column4 ON MyTable(Column4)
INCLUDE (Column1, Column2);
When executing the following query:
SELECT Column1, Column2 FROM MyTable WHERE Column4 LIKE 'Something';
You would (most likely) be using the IX_Column4 index. But when executing something like:
SELECT Column1, Column2, Column3 FROM MyTable WHERE Column4 LIKE 'Something';
You will not benefit from the advantages this kind of index has to offer.
If rows in your table are regularly INSERTED, DELETED, or UPDATED, you should check for INDEX FRAGMENTATION and REBUILD or REORGANIZE the indexes.
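A sketch of that check (sys.dm_db_index_physical_stats is the standard DMV; the 5%/30% thresholds below are common rules of thumb, not hard limits):
-- Report fragmentation for every index on the table
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MYTABLE'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id;
-- Roughly: REORGANIZE between 5% and 30% fragmentation, REBUILD above 30%
ALTER INDEX [IX_FIELD1] ON [dbo].[MYTABLE] REORGANIZE;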
I would recommend the following articles in case you want to learn more about indexes:
Available index types, check out the guidelines offered for each kind of index.
SQL Server Index Design Guide
Hope it helps.

Multiple cascade paths warning and not sure why. The alternative if I can't?

I originally had the following two tables.
tbl_Vehicle
VehicleID int identity
tbl_VehicleAssignment
RecordID int identity
VehicleID int FK
EmployeeID int
Original bit
When a vehicle comes in, it has an original assignee, noted by the [Original] column in the second table above. ON DELETE CASCADE is also set, so if a vehicle is deleted, so are any associated assignment records. No problem there.
I decided this wasn't the best way to organize the data; adding rules on the VehicleAssignment table to enforce that only one record per vehicle can have a 1 for Original seemed expensive in the long run. When I say that, I mean I have several tables like this, some of which will grow quite large, so I'm thinking of performance.
I then decided to change my tables like so:
tbl_Vehicle
VehicleID int identity
Orig_Assignment int NULL
tbl_VehicleAssignment
RecordID int identity
VehicleID int FK
EmployeeID int
The problem is, I then wanted to add a constraint so that if RecordID in tbl_VehicleAssignment was deleted, it set Orig_Assignment back to NULL.
I'm getting the following error:
Introducing FOREIGN KEY constraint 'FK_Orig_Vehicle_VehicleAssignment'
on table 'tbl_Vehicle' may cause cycles or multiple cascade paths.
Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other
FOREIGN KEY constraints.
But I'm not sure why exactly. If a vehicle is deleted, it cascades and deletes assignment records; I'd also like [Orig_Assignment] set to NULL when its associated record is deleted. I don't see how the two cross paths to cause multiple cascade paths. Interestingly, if I use the table designer wizard to create this, it actually saves the constraint in the tbl_VehicleAssignment table but fails on the tbl_Vehicle table, and although closing the dialog and looking at the relationships makes the tables appear correct, something is obviously wrong.
If I can't get around this, what would be the best method?
1. Trigger
I'd rather stay away from this
2. Another table for originals that allows nulls
So it would be like:
tbl_VehicleOriginal
VehicleID int
Orig_Assignment int NULL
Orig_SomethingElse int NULL
Orig_Etc. int NULL
This would create a row for every row in the tbl_Vehicle table.
3. Another table for originals but only when present
Such as:
tbl_VehicleOriginal
VehicleID int PK
RecordType PK
RecordID FK
I suppose my question is, why can't I just add the constraint? If it's not possible, what's the best way to organize the data?
UPDATE - Scripts to build the example
CREATE TABLE [dbo].[A_Vehicle](
[VehicleID] [int] IDENTITY(1,1) NOT NULL,
[Orig_Assignment] [int] NULL,
CONSTRAINT [PK_A_Vehicle] PRIMARY KEY CLUSTERED
(
[VehicleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[A_VehicleAssignment](
[RecordID] [int] IDENTITY(1,1) NOT NULL,
[VehicleID] [int] NOT NULL,
CONSTRAINT [PK_A_VehicleAssignment] PRIMARY KEY CLUSTERED
(
[RecordID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[A_VehicleAssignment] WITH CHECK ADD CONSTRAINT [FK_A_VehicleAssignment_A_Vehicle] FOREIGN KEY([VehicleID])
REFERENCES [dbo].[A_Vehicle] ([VehicleID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[A_VehicleAssignment] CHECK CONSTRAINT [FK_A_VehicleAssignment_A_Vehicle]
GO
-- The below will fail
ALTER TABLE [dbo].[A_Vehicle] WITH CHECK ADD CONSTRAINT [FK_Orig_A_VehicleAssignment_A_Vehicle] FOREIGN KEY([Orig_Assignment])
REFERENCES [dbo].[A_VehicleAssignment] ([RecordID])
ON DELETE SET NULL
GO

Which approach is better for this scenario?

We have the following table:
CREATE TABLE [dbo].[CampaignCustomer](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CampaignID] [int] NOT NULL,
[CustomerID] [int] NULL,
[CouponCode] [nvarchar](20) NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[ModifiedDate] [datetime] NULL,
[Active] [bit] NOT NULL,
CONSTRAINT [PK_CampaignCustomer] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
and the following Unique Index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode] ON [dbo].[CampaignCustomer]
(
[CouponCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 20) ON [PRIMARY]
GO
We do pretty constant queries using the CouponCode and other foreign keys (not shown above for simplicity). The CampaignCustomer table has almost 4 million records and growing. We also run campaigns that don't require Coupon Codes, so we don't insert those records. Now we need to start tracking those campaigns too, for another purpose. So we have 2 options:
Change the CouponCode column to allow nulls and create a unique filtered index that excludes nulls, letting the table grow even bigger and faster.
Create a separate table for tracking all campaigns for this specific purpose.
Keep in mind that the CampaignCustomer table is used very often for redeeming coupons and inserting new ones. The bottom line is that we don't want a customer redeeming a coupon to be left waiting until they give up, or other processes to fail. So, from an efficiency perspective, which option do you think is best, and why?
I'd go for the filtered index... you're storing the same data so keep it in the same table.
Splitting the table is refactoring when you probably don't need it and adds complexity.
Do you have problems with 4 million rows? It's not that much, especially for such a narrow table.
I'm against a duplicate table for the sake of a single column
Allowing the couponcode to be null means that someone could accidentally create a record where the value is NULL when it should be a valid couponcode
I would create a couponcode value that indicates a non-coupon, rather than resorting to indicator columns like "isCoupon" or "isNonCouponCampaign", and use a filtered index to ignore the "nocoupon" value.
Which leads to my next point - I don't see a foreign key reference, but it would be key to knowing what coupons existed and which ones were actually used. Some of the columns in the existing table could be moved up to the parent couponcode table...
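A sketch of the filtered-index option (assuming CouponCode is made nullable; the index mirrors the existing IX_CampaignCustomer_CouponCode). If you instead use the sentinel "nocoupon" value suggested above, the WHERE clause would exclude that value rather than NULL:
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode]
ON [dbo].[CampaignCustomer] ([CouponCode])
WHERE [CouponCode] IS NOT NULL;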
