I have two tables which are heavily queried by multiple users. On average, 100+ (update/select) requests per second are made against these tables.
Parent
Child
*GrandParent is not involved in the join, so I mentioned only two tables.
I need to reorder all children for each parent. There can be 3000-4000 parents, and each parent may have around the same number of children.
Column Types:
ParentID GUID
ChildIndex int
FileID Varchar
IsDeleted bit
The tables have a clustered index on the PK and non-clustered indexes on the columns used in the WHERE clause.
UPDATE C
SET C.ChildIndex = T.ReOrderedChildIndex
FROM [Child] C
INNER JOIN
(
    SELECT ROW_NUMBER() OVER (PARTITION BY dbo.Child.[ParentID] ORDER BY [ChildIndex] ASC) AS ReOrderedChildIndex,
           dbo.Child.ChildIndex,
           dbo.Child.FileID,
           dbo.Child.ParentID
    FROM dbo.Child WITH (NOLOCK)
    INNER JOIN dbo.Parent WITH (NOLOCK) ON dbo.Child.ParentID = dbo.Parent.ParentID
    WHERE (dbo.Parent.GrandParentID = 1) AND (dbo.Child.IsDeleted = 0)
) T ON C.FileID = T.FileID AND (C.ParentID = T.ParentID) AND (C.IsDeleted = 0)
It looks like the above query takes a long time and makes the SELECT queries wait, even though I have used WITH (NOLOCK) in all data-selection stored procedures.
There is another query which reorders parents in the same way the children are reordered above.
In Activity Monitor, locks are shown for the SELECT stored procedures.
What is the best way to perform the reordering?
I am having the following issues and believe they stem from these queries:
1- Deadlocks occur randomly.
2- Connection pool timeouts occur often.
*The database is accessed by a Windows application using EntLib 4.0 with connection pooling enabled, max pool size 200.
SQL Server 2008 R2
I'd recommend restructuring your data to a more flexible schema. This schema will allow multiple levels so you can merge GrandParent, Parent, and Child into one logical relationship table and one logical details table. You'll also be able to take advantage of indexes to reduce locks and improve performance.
You'll have to re-build your hierarchy after any relationship changes. The way I wrote the script below should minimize this impact on your system. You will no longer be updating the entire table, just the pieces that have changed.
Schema:
CREATE TABLE dbo.EntityName
(
ID INT IDENTITY(1,1),
ParentID INT -- Todo: Add foreign key back to dbo.EntityName
-- Todo: Add primary key
);
GO
CREATE TABLE dbo.Hierarchy
(
ParentID INT, -- Todo: Add foreign key back to dbo.EntityName
ChildID INT, -- Todo: Add foreign key back to dbo.EntityName
ChildLevel INT
);
GO
Populate script (slightly rough around the edges):
CREATE PROCEDURE [dbo].[uspBuildHierarchy]
AS
BEGIN
SET NOCOUNT ON;
CREATE TABLE #Hierarchy
(
ParentID INT,
ChildID INT,
ChildLevel INT
);
-- Add the root of your hierarchy
INSERT INTO #Hierarchy VALUES (1, 1, 0);
DECLARE @ChildLevel INT = 1,
        @LastCount INT = 1;

WHILE (@LastCount > 0)
BEGIN
    INSERT INTO #Hierarchy
    SELECT
        E.ParentID,
        E.ID,
        @ChildLevel -- parents matched at (@ChildLevel - 1), so their children sit at @ChildLevel
    FROM dbo.EntityName E
    INNER JOIN #Hierarchy H ON H.ChildID = E.ParentID
        AND H.ChildLevel = (@ChildLevel - 1)
    LEFT JOIN #Hierarchy EH ON EH.ParentID = E.ParentID
        AND EH.ChildID = E.ID
    WHERE EH.ChildLevel IS NULL;

    SET @LastCount = @@ROWCOUNT;
    SET @ChildLevel = @ChildLevel + 1;
END
MERGE INTO dbo.Hierarchy OH
USING
(
SELECT
ParentID,
ChildID,
ChildLevel
FROM #Hierarchy
) NH
ON OH.ParentID = NH.ParentID
AND OH.ChildID = NH.ChildID
WHEN MATCHED AND OH.ChildLevel <> NH.ChildLevel THEN
UPDATE
SET ChildLevel = NH.ChildLevel
WHEN NOT MATCHED THEN
INSERT
VALUES
(
NH.ParentID,
NH.ChildID,
NH.ChildLevel
)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
END
GO
Query for all of an entity's children:
SELECT *
FROM dbo.EntityName E
INNER JOIN dbo.Hierarchy H ON H.ChildID = E.ID
AND H.ParentID = @EntityNameID;
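As a usage sketch (the @EntityNameID value below is just an example), you would re-run the build procedure after relationship changes and then query the hierarchy directly:
-- Rebuild after dbo.EntityName parent/child links change, then fetch one entity's descendants.
EXEC dbo.uspBuildHierarchy;

DECLARE @EntityNameID INT = 1; -- example value

SELECT E.*, H.ChildLevel
FROM dbo.EntityName E
INNER JOIN dbo.Hierarchy H ON H.ChildID = E.ID
    AND H.ParentID = @EntityNameID
ORDER BY H.ChildLevel;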
I need to generate a sequential number in the database, and I cannot use SEQUENCE or IDENTITY.
There is a table in the database called File where all the files that users send in different areas of the system are stored.
It contains the id (primary key), name, type, folder, number, hash...
CREATE TABLE dbo.[File]
(
FileId uniqueidentifier NOT NULL,
Name nvarchar(30) NOT NULL,
FileTypeId int NOT NULL,
FileFolderId int NOT NULL,
Number int NOT NULL,
Hash nvarchar(50) NOT NULL
...
) ON [PRIMARY]
Then, for each feature, there is a table extending the properties of the File table; an example is ContractFile.
It has the same id as the File table, a few more fields, and the id of the Contract table, creating the relationship.
CREATE TABLE dbo.ContractFile
(
FileId uniqueidentifier NOT NULL,
ContractId uniqueidentifier NOT NULL
...
) ON [PRIMARY]
So the filename should follow a pattern.
050#H4G5H4G244#001.pdf
050#H4G5H4G244#002.pdf
060#H4G5H4G244#001.pdf
The first 3 digits are a code from the FileType table.
The digits in the middle are the code from the Contract table.
And the last 3 are the sequence in which it was inserted.
So the sequence is grouped by FileType and Contract.
So I created a trigger on the ContractFile table: on insert, it gets the biggest number for that FileType and Contract, adds 1, and sets the Number field of the File table.
Then the file name is updated (in the same trigger):
CREATE TRIGGER [dbo].[tgContractFileInsert]
ON [dbo].[ContractFile]
FOR INSERT
AS
BEGIN
    SET NOCOUNT ON

    UPDATE dbo.[File]
    SET Number = COALESCE(
            (SELECT MAX(AR.Number)
             FROM dbo.ContractFile NOA
             INNER JOIN dbo.[File] AR
                 ON AR.FileId = NOA.FileId
             WHERE NOA.ContractId = I.ContractId AND
                   AR.FileTypeId = T.FileTypeId
            ),
            0) + 1
    FROM dbo.[File] T WITH (XLOCK)
    INNER JOIN Inserted I
        ON I.FileId = T.FileId
    WHERE T.Number IS NULL

    UPDATE dbo.[File]
    SET Name = dbo.fnFileName(AP.Code, NOB.Code, T.Number, T.Name)
    FROM dbo.[File] T
    INNER JOIN Inserted I
        ON I.FileId = T.FileId
    INNER JOIN dbo.FileType AP
        ON AP.FileTypeId = T.FileTypeId
    INNER JOIN dbo.Contract NOB
        ON NOB.ContractId = I.ContractId
END
At first it works, but when a large volume is inserted, a deadlock occurs.
And from what I can see, inserting more than one record at a time will also end up assigning the same number, since the Inserted table brings multiple records and the +1 does not account for this.
How could I solve this? What is the best way?
How can I avoid the deadlock, keep the sequence correct even when inserting more than one record at a time, and still get good performance?
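For illustration only, here is a hedged sketch of how the numbering step inside the trigger could be made set-based with ROW_NUMBER(), so that each row in Inserted gets its own value; it does not, by itself, address the locking/deadlock concern:
-- Illustrative sketch: give each inserted row its own number by combining the
-- current MAX per (Contract, FileType) with ROW_NUMBER() over the Inserted rows.
UPDATE T
SET Number = COALESCE(MaxNumber.Number, 0) + N.Seq
FROM dbo.[File] T
INNER JOIN
(
    SELECT I.FileId,
           I.ContractId,
           F.FileTypeId,
           ROW_NUMBER() OVER (PARTITION BY I.ContractId, F.FileTypeId
                              ORDER BY I.FileId) AS Seq
    FROM Inserted I
    INNER JOIN dbo.[File] F ON F.FileId = I.FileId
) N ON N.FileId = T.FileId
OUTER APPLY
(
    SELECT MAX(AR.Number) AS Number
    FROM dbo.ContractFile NOA
    INNER JOIN dbo.[File] AR ON AR.FileId = NOA.FileId
    WHERE NOA.ContractId = N.ContractId
      AND AR.FileTypeId = N.FileTypeId
) MaxNumber
WHERE T.Number IS NULL;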
In a relational database (SQL), I have a parent entity that can have 0..n related child entities. The parent entity is uniquely identified in part by its collection of related child entities, such that I should not be able to have two similar parents with the same collection of children.
So I could have Parent 1 with Child 1 and Child 2, and Parent 2 with Child 2 and Child 3, but I cannot have another parent with Child 2 and Child 3.
Ideally, I would like to enforce this uniqueness using a database constraint. I've considered storing a hash of all child records with the parent, but was wondering if there was an easier / more standard way of accomplishing this.
Any ideas?
This kind of constraint is tricky because SQL has no relational equality operator, i.e. no simple way of evaluating A=B where A and B are sets of rows. Standard SQL does support nested tables, but unfortunately SQL Server does not.
One possible answer is a predicate like the following, which checks for any identical families in a table:
NOT EXISTS (
SELECT 1
FROM family f, family g
WHERE f.child = g.child
AND f.parent <> g.parent
GROUP BY f.parent, g.parent
HAVING COUNT(*) = (SELECT COUNT(*) FROM family WHERE parent = f.parent)
AND COUNT(*) = (SELECT COUNT(*) FROM family WHERE parent = g.parent)
)
Notice that this query doesn't attempt to deal with childless families. In set-theoretic terms two empty sets are necessarily identical. If you want to allow for childless families then you would have to decide whether two childless families should be deemed identical or not.
SQL is not a truly relational language and it falls well short of what a relational language ought to be capable of. Tutorial D is an example of a real relational language that does support relational equality and relation-valued attributes. In Tutorial D you can in principle represent each family as a value of a single attribute in a relvar. That family attribute can also be a key and therefore duplicate families would not be allowed.
Thanks for the help from those who suggested using a trigger. This is roughly what I have and seems to be working.
CREATE TRIGGER [dbo].[trig_Parent_Child_Uniqueness]
ON [dbo].[Parent_Child]
AFTER INSERT, UPDATE
AS
BEGIN
IF EXISTS (
SELECT 1
FROM Parent p1
--Compare each pair of parents
JOIN Parent p2 ON p1.ParentId <> p2.ParentId
WHERE NOT EXISTS (
--Find any children that are different
SELECT 1
FROM (
SELECT ChildId FROM Parent_Child c1
WHERE c1.ParentId = p1.ParentId
) as c1
FULL OUTER JOIN (
SELECT ChildId FROM Parent_Child c2
WHERE c2.ParentId = p2.ParentId
) as c2 ON c2.ChildId = c1.ChildId
WHERE c1.ChildId IS NULL OR c2.ChildId IS NULL
)
) ROLLBACK;
END;
EDIT: Or a better solution, adapted from @sqlvogel
CREATE TRIGGER [dbo].[trig_Parent_Child_Uniqueness]
ON [dbo].[Parent_Child]
AFTER INSERT, UPDATE
AS
BEGIN
IF EXISTS (
SELECT 1
FROM Parent_Child p1
FULL JOIN Parent_Child p2 ON p1.ParentId <> p2.ParentId
AND p1.ChildId = p2.ChildId
GROUP BY p1.ParentId
HAVING COUNT(p1.ParentId) = COUNT(*)
AND COUNT(p2.ParentId) = COUNT(*)
) ROLLBACK;
END;
This is a bit yucky as it includes triggers and cursors :(
It includes a column in the parent table which is based upon the children
Set up:
CREATE TABLE Parent
(
Id INT Primary Key,
Name VARCHAR(50),
ChildItems VARCHAR(200) NOT NULL UNIQUE
)
CREATE TABLE Child
(
Id INT Primary Key,
Name VARCHAR(50)
)
CREATE TABLE ParentChild
(
Id INT Identity Primary Key,
ParentId INT,
ChildId Int
)
Triggers
-- This gives the unique column a default based upon the id of the parent
CREATE TRIGGER trg_Parent ON Parent
INSTEAD OF Insert
AS
SET NOCOUNT ON
INSERT INTO Parent (Id, Name, ChildItems)
SELECT Id, Name, '/' + CAST(Id As Varchar(10)) + '/'
FROM Inserted
GO
-- This updates the parent with a path based upon child items
-- If the exact same child items exist for another parent then this fails
-- because of the unique index
CREATE Trigger trg_ParentChild ON ParentChild
AFTER Insert, Update
AS
DECLARE @ParentId INT = 0
DECLARE @ChildItems VARCHAR(8000) = ''

DECLARE parentCursor CURSOR FOR
    SELECT DISTINCT ParentId
    FROM Inserted

OPEN parentCursor
FETCH NEXT FROM parentCursor INTO @ParentId

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @ChildItems = COALESCE(@ChildItems + '/ ', '') + CAST(ChildID As Varchar(10))
    FROM ParentChild
    WHERE ParentId = @ParentId
    ORDER BY ChildId

    UPDATE Parent
    SET ChildItems = @ChildItems
    WHERE Id = @ParentId

    FETCH NEXT FROM parentCursor INTO @ParentId
    SET @ChildItems = ''
END

CLOSE parentCursor
DEALLOCATE parentCursor
GO
Data Setup
INSERT INTO Parent (Id, Name)
VALUES (1, 'Parent1'), (2,'Parent2'), (3, 'Parent3')
INSERT INTO Child (Id, Name)
VALUES (1,'Child1'), (2,'Child2'), (3,'Child3'), (4,'Child4')
Now insert some data
-- This one succeeds
INSERT INTO ParentChild (ParentId, ChildId)
VALUES (1,1),(1,2),(2,2),(2,3)
-- This one Fails
INSERT INTO ParentChild (ParentId, ChildId) VALUES (3,1),(3,2)
My query:
INSERT into PriceListRows (PriceListChapterId,[No])
SELECT TOP 250 100943 ,N'2'
FROM #AnyTable
This query works fine, and the following exception is raised as desired:
The INSERT statement conflicted with the CHECK constraint
"CK_PriceListRows_RowNo_Is_Not_Unqiue_In_PriceList". The conflict
occurred in database "TadkarWeb", table "dbo.PriceListRows".
but after changing SELECT TOP 250 to SELECT TOP 251 (yes, just changing 250 to 251!) the query runs successfully without any check constraint exception!
Why this odd behavior?
NOTES:
My check constraint is a function which checks some sort of uniqueness. It queries about 4 tables.
I checked on both SQL Server 2012 SP2 and SQL Server 2014 SP1
** EDIT 1 **
Check constraint function:
ALTER FUNCTION [dbo].[CheckPriceListRows_UniqueNo] (
    @rowNo nvarchar(50),
    @rowId int,
    @priceListChapterId int,
    @projectId int)
RETURNS bit
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM RowInfsView
               WHERE PriceListId = (SELECT PriceListId
                                    FROM ChapterInfoView
                                    WHERE Id = @priceListChapterId)
                 AND (@rowId IS NULL OR Id <> @rowId)
                 AND No = @rowNo
                 AND (@projectId IS NULL OR
                      (ProjectId IS NULL OR ProjectId = @projectId)))
        RETURN 0 -- Error

    --It is ok!
    RETURN 1
END
** EDIT 2 **
Check constraint code (what SQL Server 2012 produces):
ALTER TABLE [dbo].[PriceListRows] WITH NOCHECK ADD CONSTRAINT [CK_PriceListRows_RowNo_Is_Not_Unqiue_In_PriceList] CHECK (([dbo].[tfn_CheckPriceListRows_UniqueNo]([No],[Id],[PriceListChapterId],[ProjectId])=(1)))
GO
ALTER TABLE [dbo].[PriceListRows] CHECK CONSTRAINT [CK_PriceListRows_RowNo_Is_Not_Unqiue_In_PriceList]
GO
** EDIT 3 **
Execution plans are here : https://www.dropbox.com/s/as2r92xr14cfq5i/execution%20plans.zip?dl=0
** EDIT 4 **
RowInfsView definition is :
SELECT dbo.PriceListRows.Id, dbo.PriceListRows.No, dbo.PriceListRows.Title, dbo.PriceListRows.UnitCode, dbo.PriceListRows.UnitPrice, dbo.PriceListRows.RowStateCode, dbo.PriceListRows.PriceListChapterId,
dbo.PriceListChapters.Title AS PriceListChapterTitle, dbo.PriceListChapters.No AS PriceListChapterNo, dbo.PriceListChapters.PriceListCategoryId, dbo.PriceListCategories.No AS PriceListCategoryNo,
dbo.PriceListCategories.Title AS PriceListCategoryTitle, dbo.PriceListCategories.PriceListClassId, dbo.PriceListClasses.No AS PriceListClassNo, dbo.PriceListClasses.Title AS PriceListClassTitle,
dbo.PriceListClasses.PriceListId, dbo.PriceLists.Title AS PriceListTitle, dbo.PriceLists.Year, dbo.PriceListRows.ProjectId, dbo.PriceListRows.IsTemplate
FROM dbo.PriceListRows INNER JOIN
dbo.PriceListChapters ON dbo.PriceListRows.PriceListChapterId = dbo.PriceListChapters.Id INNER JOIN
dbo.PriceListCategories ON dbo.PriceListChapters.PriceListCategoryId = dbo.PriceListCategories.Id INNER JOIN
dbo.PriceListClasses ON dbo.PriceListCategories.PriceListClassId = dbo.PriceListClasses.Id INNER JOIN
dbo.PriceLists ON dbo.PriceListClasses.PriceListId = dbo.PriceLists.Id
The explanation is that your execution plan is using a "wide" (index by index) update plan.
The rows are inserted into the clustered index at step 1 in the plan. And the check constraints are validated for each row at step 2.
No rows are inserted into the non clustered indexes until all rows have been inserted into the clustered index.
This is because there are two blocking operators between the clustered index insert / constraints checking and the non clustered index inserts. The eager spool (step 3) and the sort (step 4). Both of these produce no output rows until they have consumed all input rows.
The plan for the scalar UDF uses the non clustered index to try and find matching rows.
At the point the check constraint runs no rows have yet been inserted into the non clustered index so this check comes up empty.
When you insert fewer rows you get a "narrow" (row by row) update plan and avoid the problem.
My advice is to avoid this kind of validation in check constraints. It is difficult to be sure that the code will work correctly in all circumstances (such as different execution plans and isolation levels), and additionally they block parallelism in queries against the table. Try to do it declaratively (a unique constraint that needs to join onto other tables can often be achieved with an indexed view).
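As a hedged sketch of that indexed-view idea (ignoring the ProjectId special case and assuming the base tables behind RowInfsView), the unique index is what enforces the rule:
-- Sketch only: enforce "No is unique within a price list" declaratively.
CREATE VIEW dbo.vw_PriceListRowNo
WITH SCHEMABINDING
AS
SELECT cls.PriceListId, r.No
FROM dbo.PriceListRows AS r
JOIN dbo.PriceListChapters   AS ch  ON ch.Id  = r.PriceListChapterId
JOIN dbo.PriceListCategories AS cat ON cat.Id = ch.PriceListCategoryId
JOIN dbo.PriceListClasses    AS cls ON cls.Id = cat.PriceListClassId;
GO
-- Creating the unique index fails if existing data already violates the rule.
CREATE UNIQUE CLUSTERED INDEX IX_vw_PriceListRowNo
    ON dbo.vw_PriceListRowNo (PriceListId, No);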
A simplified repro is
CREATE FUNCTION dbo.F(@Z INT)
RETURNS BIT
AS
BEGIN
    RETURN CASE WHEN EXISTS (SELECT * FROM dbo.T1 WHERE Z = @Z) THEN 0 ELSE 1 END
END
GO
CREATE TABLE dbo.T1
(
ID INT IDENTITY PRIMARY KEY,
X INT,
Y CHAR(8000) DEFAULT '',
Z INT,
CHECK (dbo.F(Z) = 1),
CONSTRAINT IX_X UNIQUE (X, ID),
CONSTRAINT IX_Z UNIQUE (Z, ID)
)
--Fails with check constraint error
INSERT INTO dbo.T1 (Z)
SELECT TOP (10) 1 FROM master..spt_values;
/*I get a wide update plan for TOP (2000) but this may not be reliable
across instances so using trace flag 8790 to get a wide plan. */
INSERT INTO dbo.T1 (Z)
SELECT TOP (10) 2 FROM master..spt_values
OPTION (QUERYTRACEON 8790);
GO
/*Confirm only the second insert succeeded (Z=2)*/
SELECT * FROM dbo.T1;
DROP TABLE dbo.T1;
DROP FUNCTION dbo.F;
It's possible that you are encountering an incorrect optimization of a query, but without having the data in all the involved tables, we cannot reproduce the bug.
However, for this kind of checks, I recommend using triggers instead of check constraints based on functions. In a trigger, you could use a SELECT statement to debug why it's not working as expected. For example:
CREATE TRIGGER trg_PriceListRows_CheckUnicity ON PriceListRows
FOR INSERT, UPDATE
AS
IF @@ROWCOUNT > 0 BEGIN
/*
SELECT * FROM inserted i
INNER JOIN RowInfsView r
ON r.PriceListId = (
SELECT c.PriceListId
FROM ChapterInfoView c
WHERE c.Id = i.priceListChapterId
)
AND r.Id <> i.Id
AND r.No = i.No
AND (r.ProjectId=i.ProjectId OR r.ProjectId IS NULL AND i.ProjectId IS NULL)
*/
IF EXISTS (
SELECT * FROM inserted i
WHERE EXISTS (
SELECT * FROM RowInfsView r
WHERE r.PriceListId = (
SELECT c.PriceListId
FROM ChapterInfoView c
WHERE c.Id = i.priceListChapterId
)
AND r.Id <> i.Id
AND r.No = i.No
AND (r.ProjectId=i.ProjectId OR r.ProjectId IS NULL AND i.ProjectId IS NULL)
)
) BEGIN
RAISERROR ('Duplicate rows!',16,1)
ROLLBACK
RETURN
END
END
This way, you can see what is being checked and correct your views and/or existing data.
I want to write a trigger on the view VW_BANKBRANCH:
If the inserted row contains a bank code that exists in the bank table, then update the
bName column of the bank table with the inserted data.
If not, insert rows into the bank table to reflect the new information.
But my trigger is not working.
My tables
CREATE TABLE bank(
code VARCHAR(30) PRIMARY KEY,
bName VARCHAR(50)
);
CREATE TABLE branch(
brNum INT PRIMARY KEY,
brName VARCHAR(50),
braddress VARCHAR(50),
bcode VARCHAR(30) REFERENCES bank(code)
);
CREATE VIEW VW_BANKBRANCH
AS
SELECT code,bname,brnum,brName
FROM bank ,branch
WHERE code=bcode
My trigger
CREATE TRIGGER tr_VW_BANKBRANCH_INSERT ON VW_BANKBRANCH
INSTEAD OF INSERT
AS
BEGIN
DECLARE @insertedBankCode INT
@insertedbname varchar
@insertedbrnum int
@insertedbrName varchar
SELECT @insertedBankCode = code
FROM INSERTED
IF(@insertedBankCode=code)
SET code=@insertedBankCode
bname=@insertedbname
brnum=@insertedbrnum
brName=@insertedbrName
ELSE
insert(code,bname,brnum,brName)
END
I've adapted the instead of trigger on the view below - I'm assuming you want to upsert both bank and branch accordingly (although note that the branch address is not currently in the view).
That said, I would be careful of (ab)using an instead of trigger on an INSERT to do upserts - this might not be entirely intuitive to the reader.
Also, remember that the INSERTED pseudo-table can contain a SET of rows, so the logic needs to be adjusted to a set-based approach accordingly.
CREATE TRIGGER tr_VW_BANKBRANCH_INSERT ON VW_BANKBRANCH
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
UPDATE b
SET bname = i.bname
FROM bank b
INNER JOIN inserted i
ON i.code = b.code;
UPDATE br
SET
br.brName = i.brName,
br.braddress = NULL -- TODO add this to the view
FROM branch br
INNER JOIN inserted i
ON br.bcode = i.code
AND br.brNum = i.brNum;
INSERT INTO bank(code, bname)
SELECT code, bname
FROM inserted i
WHERE NOT EXISTS
(SELECT 1 FROM bank b WHERE b.code = i.Code);
INSERT INTO Branch(brNum, brName, braddress, bcode)
SELECT brNum, brName, NULL, code
FROM inserted i
WHERE NOT EXISTS
(SELECT 1
FROM branch br
WHERE br.bcode = i.Code AND br.brNum = i.brNum);
END;
GO
SqlFiddle here - I've also adjusted the view to use a JOIN, rather than the old style of WHERE joins.
If you have SQL Server 2008 or later, you could also use MERGE instead of separate inserts and updates.
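A hedged sketch of what that MERGE could look like for the bank table (the branch table would get a second MERGE along the same lines, inside the same INSTEAD OF INSERT trigger):
-- Upsert bank rows from the inserted set in a single statement.
MERGE dbo.bank AS b
USING (SELECT DISTINCT code, bname FROM inserted) AS i
    ON b.code = i.code
WHEN MATCHED THEN
    UPDATE SET bName = i.bname
WHEN NOT MATCHED THEN
    INSERT (code, bName) VALUES (i.code, i.bname);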
We have a table where we store all the exceptions (message, stackTrace, etc..), the table is getting big and we would like to reduce it.
There are plenty of repeated StackTraces, Messages, etc., but enabling compression produces a modest size reduction (10%), while I think much bigger benefits could come if SQL Server would somehow intern the strings in a per-column hash table.
I could get some of the benefits if I normalize the table and extract StackTraces to another one, but exception messages, exception types, etc.. are also repeated.
Is there a way to enable string interning for some column in Sql Server?
There is no built-in way to do this. You could easily do something like:
SELECT MessageID = IDENTITY(INT, 1, 1), Message
INTO dbo.Messages
FROM dbo.HugeTable GROUP BY Message;
ALTER TABLE dbo.HugeTable ADD MessageID INT;
UPDATE h
SET h.MessageID = m.MessageID
FROM dbo.HugeTable AS h
INNER JOIN dbo.Messages AS m
ON h.Message = m.Message;
ALTER TABLE dbo.HugeTable DROP COLUMN Message;
Now you'll need to do a few things:
Change your logging procedure to perform an upsert to the Messages table (see the sketch after this list)
Add proper indexes to the messages table (wasn't sure of Message data type) and PK
Add FK to MessageID column
Rebuild indexes on HugeTable to reclaim space
Do this in a test environment first!
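For the first item, here is a minimal sketch of the upsert, wrapped in a hypothetical helper procedure and assuming an nvarchar(max) Message column (adjust to the real data type); it is not hardened for concurrency:
-- Hypothetical helper used by the logging procedure: look up the message,
-- insert it if new, and return its MessageID.
CREATE PROCEDURE dbo.usp_GetOrAddMessageID
    @Message nvarchar(max),
    @MessageID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT @MessageID = MessageID
    FROM dbo.Messages
    WHERE Message = @Message;

    IF @MessageID IS NULL
    BEGIN
        INSERT INTO dbo.Messages (Message) VALUES (@Message);
        SET @MessageID = SCOPE_IDENTITY();
    END
END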
Aaron's posting answers the question of adding interning to a table, but afterwards you will need to modify your application code and stored procedures to work with the new schema.
...or so you might think. You can actually create a VIEW that returns data matching the old schema, and you can also support INSERT operations on the view too, which are translated into child operations on the Messages and HugeTable tables. For readability I'll use the names InternedStrings and ExceptionLogs for the tables.
So if the old table was this:
CREATE TABLE ExceptionLogs (
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message nvarchar(1024) NOT NULL,
ExceptionType nvarchar(512) NOT NULL,
StackTrace nvarchar(4096) NOT NULL
)
And the new tables are:
CREATE TABLE InternedStrings (
StringId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Value nvarchar(max) NOT NULL
)
CREATE TABLE ExceptionLogs2 ( -- note the new name
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message int NOT NULL,
ExceptionType int NOT NULL,
StackTrace int NOT NULL
)
Add an index to InternedStrings to make the value lookups faster:
CREATE UNIQUE NONCLUSTERED INDEX IX_U_InternedStrings_Value ON InternedStrings ( Value ASC )
Then you would also have a VIEW:
CREATE VIEW ExceptionLogs AS
SELECT
    LogId,
    MessageStrings.Value       AS Message,
    ExceptionTypeStrings.Value AS ExceptionType,
    StackTraceStrings.Value    AS StackTrace
FROM
    ExceptionLogs2
    INNER JOIN InternedStrings AS MessageStrings ON
        MessageStrings.StringId = ExceptionLogs2.Message
    INNER JOIN InternedStrings AS ExceptionTypeStrings ON
        ExceptionTypeStrings.StringId = ExceptionLogs2.ExceptionType
    INNER JOIN InternedStrings AS StackTraceStrings ON
        StackTraceStrings.StringId = ExceptionLogs2.StackTrace
And to handle INSERT operations from unmodified clients:
CREATE TRIGGER ExceptionLogsInsertHandler
ON ExceptionLogs INSTEAD OF INSERT AS
BEGIN
    -- Note: handles a single inserted row, as discussed below.
    DECLARE @message       nvarchar(1024),
            @exceptionType nvarchar(512),
            @stackTrace    nvarchar(4096);

    SELECT @message = Message,
           @exceptionType = ExceptionType,
           @stackTrace = StackTrace
    FROM inserted;

    DECLARE @messageId int = ( SELECT StringId FROM InternedStrings WHERE Value = @message );
    IF @messageId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @message );
        SET @messageId = SCOPE_IDENTITY();
    END

    DECLARE @exceptionTypeId int = ( SELECT StringId FROM InternedStrings WHERE Value = @exceptionType );
    IF @exceptionTypeId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @exceptionType );
        SET @exceptionTypeId = SCOPE_IDENTITY();
    END

    DECLARE @stackTraceId int = ( SELECT StringId FROM InternedStrings WHERE Value = @stackTrace );
    IF @stackTraceId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @stackTrace );
        SET @stackTraceId = SCOPE_IDENTITY();
    END

    INSERT INTO ExceptionLogs2 ( Message, ExceptionType, StackTrace )
    VALUES ( @messageId, @exceptionTypeId, @stackTraceId );
END
Note this TRIGGER can be improved: it only supports single-row insertions, and it is not entirely concurrency-safe. Because previous data is never mutated, the risk is that two concurrent inserts try to add the same string to InternedStrings, and because of the UNIQUE index the second insert will fail. There are different possible ways to handle this, such as using a TRANSACTION and changing the queries to use HOLDLOCK and UPDLOCK, as sketched below.
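A hedged sketch of that pattern, shown for the Message lookup only (the other two lookups would follow the same shape):
-- Serialize the lookup-then-insert for one string value.
DECLARE @message nvarchar(1024) = N'example'; -- in the trigger this comes from inserted

BEGIN TRANSACTION;

DECLARE @messageId int =
(
    SELECT StringId
    FROM InternedStrings WITH (UPDLOCK, HOLDLOCK)
    WHERE Value = @message
);

IF @messageId IS NULL
BEGIN
    INSERT INTO InternedStrings ( Value ) VALUES ( @message );
    SET @messageId = SCOPE_IDENTITY();
END

COMMIT TRANSACTION;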