Incrementing revision numbers in table's composite key - sql-server

I'm running SQL Server 2014 locally for a database that will be deployed to an Azure SQL V12 database.
I have a table that stores values of extensible properties for a business-entity object, in this case the three tables look like this:
CREATE TABLE Widgets (
WidgetId bigint IDENTITY(1,1),
...
)
CREATE TABLE WidgetProperties (
PropertyId int IDENTITY(1,1),
Name nvarchar(50),
Type int -- 0 = int, 1 = string, 2 = date, etc
)
CREATE TABLE WidgetPropertyValues (
WidgetId bigint,
PropertyId int,
Revision int,
DateTime datetimeoffset(7),
Value varbinary(255),
CONSTRAINT [PK_WidgetPropertyValues] PRIMARY KEY CLUSTERED (
[WidgetId] ASC,
[PropertyId] ASC,
[Revision] ASC
)
)
ALTER TABLE dbo.WidgetPropertyValues WITH CHECK ADD CONSTRAINT FK_WidgetPropertyValues_WidgetProperties FOREIGN KEY( PropertyId )
REFERENCES dbo.WidgetProperties ( PropertyId )
ALTER TABLE dbo.WidgetPropertyValues WITH CHECK ADD CONSTRAINT FK_WidgetPropertyValues_Widgets FOREIGN KEY( WidgetId )
REFERENCES dbo.Widgets ( WidgetId )
So you see how WidgetId, PropertyId, Revision is a composite key and the table stores the entire history of Values (the current values are obtained by getting the rows with the biggest Revision number for each WidgetId + PropertyId).
I want to know how I can set up the Revision column to increment by 1 for each WidgetId + PropertyId. I want data like this:
WidgetId, PropertyId, Revision, DateTime, Value
------------------------------------------------
1 1 1 123
1 1 2 456
1 1 3 789
1 2 1 012
IDENTITY wouldn't work because it's global to the table and the same applies with SEQUENCE objects.
Update: I can think of a possible solution using an INSTEAD OF INSERT trigger:
CREATE TRIGGER WidgetPropertyValueInsertTrigger ON WidgetPropertyValues
INSTEAD OF INSERT
AS
BEGIN
    -- Assumes a single-row INSERT; the values are read from the INSERTED pseudo-table.
    DECLARE @maxRevision int
    SELECT @maxRevision = ISNULL( MAX( v.Revision ), 0 )
    FROM INSERTED AS i
    LEFT JOIN WidgetPropertyValues AS v ON v.WidgetId = i.WidgetId AND v.PropertyId = i.PropertyId
    INSERT INTO WidgetPropertyValues ( WidgetId, PropertyId, Revision, DateTime, Value )
    SELECT i.WidgetId, i.PropertyId, @maxRevision + 1, i.DateTime, i.Value
    FROM INSERTED AS i
END
(For the uninitiated, INSTEAD OF INSERT triggers run instead of any INSERT operation on the table, compared to a normal AFTER INSERT trigger, which runs after the INSERT operation.)
I think this would be concurrency-safe because all INSERT operations have an implicit transaction, and any associated triggers are executed in the same transaction context, which should mean it's safe. Unless anyone can claim otherwise?

Your code has a race condition: a concurrent transaction might select and insert the same Revision between your SELECT and your INSERT. That could cause occasional primary key violations in a concurrent environment (forcing you to retry the entire transaction).
Instead of retrying the whole transaction, a better strategy is to retry only the INSERT. Simply put your code in a loop, and if a key violation (and only a key violation) happens, increment the Revision and try again.
Something like this (writing from my head):
DECLARE @maxRevision int = (
    SELECT
        ISNULL(MAX(v.Revision), 0)
    FROM
        WidgetPropertyValues AS v
        INNER JOIN INSERTED AS i
            ON v.WidgetId = i.WidgetId
            AND v.PropertyId = i.PropertyId
);
WHILE 0 = 0 BEGIN
    SET @maxRevision = @maxRevision + 1;
    BEGIN TRY
        INSERT INTO WidgetPropertyValues (WidgetId, PropertyId, Revision, DateTime, Value)
        SELECT
            i.WidgetId,
            i.PropertyId,
            @maxRevision,
            i.DateTime,
            i.Value
        FROM
            INSERTED AS i;
        BREAK;
    END TRY
    BEGIN CATCH
        -- The error was different from key violation,
        -- in which case we just pass it back to caller.
        IF ERROR_NUMBER() <> 2627
            THROW;
        -- Otherwise, this was a key violation, and we can let the loop
        -- enter the next iteration (to retry with the incremented value).
    END CATCH
END
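The retry-only-the-INSERT idea can also be tried outside SQL Server. The sketch below is a Python model using the standard sqlite3 module, not T-SQL. The in-memory table, the helper name insert_with_next_revision and its known_max parameter (a deliberately stale MAX(Revision), standing in for a concurrent writer) are assumptions made for illustration. SQLite reports every constraint failure as IntegrityError, whereas the T-SQL above checks for error 2627 specifically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WidgetPropertyValues ("
    " WidgetId INTEGER, PropertyId INTEGER, Revision INTEGER, Value BLOB,"
    " PRIMARY KEY (WidgetId, PropertyId, Revision))"
)


def insert_with_next_revision(conn, widget_id, property_id, value,
                              known_max=None, max_attempts=10):
    """Insert under the next free revision, retrying only the INSERT on a key clash."""
    if known_max is None:
        known_max = conn.execute(
            "SELECT COALESCE(MAX(Revision), 0) FROM WidgetPropertyValues"
            " WHERE WidgetId = ? AND PropertyId = ?",
            (widget_id, property_id),
        ).fetchone()[0]
    revision = known_max
    for _ in range(max_attempts):
        revision += 1
        try:
            conn.execute(
                "INSERT INTO WidgetPropertyValues"
                " (WidgetId, PropertyId, Revision, Value) VALUES (?, ?, ?, ?)",
                (widget_id, property_id, revision, value),
            )
            return revision
        except sqlite3.IntegrityError:
            # The only constraint on this table is the primary key, so this
            # means the revision is already taken; try the next number.
            continue
    raise RuntimeError("could not allocate a revision")


revisions = [
    insert_with_next_revision(conn, 1, 1, b"123"),
    insert_with_next_revision(conn, 1, 1, b"456"),
    insert_with_next_revision(conn, 1, 1, b"789"),
    insert_with_next_revision(conn, 1, 2, b"012"),
]
print(revisions)  # [1, 2, 3, 1]
```

Calling the helper with known_max=0 for WidgetId 1 and PropertyId 1 after the inserts above simulates a writer that read MAX(Revision) too early: revisions 1 to 3 collide and the row lands on revision 4.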

Related

T-SQL logic for roll up and group by

I have a question to collapse or roll up data based on the logic below.
How can I implement it?
The logic that allows episodes to be condensed into a single continuous care episode is a discharge code of 22 followed by an admission code of 4 on the same day.
continuous care implementation update
EPN--is a business_key.
episode_continuous_care_key is an artificial key that can be a row number function.
Below is the table structure.
drop table #source
CREATE TABLE #source(patidid varchar(20),epn int,preadmitdate datetime,adminttime varchar(10),
admitcode varchar(10),datedischarge datetime,disctime varchar(10),disccode varchar(10))
INSERT INTO #source VALUES
(1849,1,'4/23/2020','7:29',1,'7/31/2020','9:03',22)
,(1849,2,'7/31/2020','11:00',4,'7/31/2020','12:09',22)
,(1849,3,'7/31/2020','13:10',4,'8/24/2020','10:36',10)
,(1849,4,'8/26/2020','12:25',2,null,null,null)
,(1850,1,'4/23/2020','7:33',1,'6/29/2020','7:30',22)
,(1850,2,'6/29/2020','9:35',4,'7/8/2020','10:51',7)
,(1850,3,'7/10/2020','11:51',3,'7/29/2020','9:12',7)
,(1850,4,'7/31/2020','11:00',2,'8/6/2020','10:24',22)
,(1850,5,'8/6/2020','12:26',4,null,null,null)
,(1851,1,'4/23/2020','7:35',1,'6/24/2020','13:45',22)
,(1851,2,'6/24/2020','15:06',4,'9/24/2020','15:00',2)
,(1851,3,'12/4/2020','8:59',0,null,null,null)
,(1852,1,'4/23/2020','7:37',1,'7/6/2020','11:15',20)
,(1852,2,'7/8/2020','10:56',0,'7/10/2020','11:46',2)
,(1852,3,'7/10/2020','11:47',2,'7/28/2020','13:16',22)
,(1852,4,'7/28/2020','15:17',4,'8/4/2020','11:37',22)
,(1852,5,'8/4/2020','13:40',4,'11/18/2020','15:43',2)
,(1852,6,'12/2/2020','15:23',2,null,null,null)
,(1853,1,'4/23/2020','7:40',1,'7/1/2020','8:30',22)
,(1853,2,'7/1/2020','14:57',4,'12/4/2020','12:55',7)
,(1854,1,'4/23/2020','7:44',1,'7/31/2020','13:07',20)
,(1854,2,'8/3/2020','16:30',0,'8/5/2020','9:32',2)
,(1854,3,'8/5/2020','10:34',2,'8/24/2020','8:15',22)
,(1854,4,'8/24/2020','10:33',4,'12/4/2020','7:30',22)
,(1854,5,'12/4/2020','9:13',4,null,null,null)
That Excel sheet image says little about your database design so I invented my own version that more or less resembles your image. With a proper database design the first step of the solution should not be required...
Unpivot timestamp information so that admission timestamp and discharge timestamps become one column.
I used a common table expression Log1 for this action.
Use the codes to filter out the start of the continuous care periods. Those are the admissions, marked with Code.IsAdmission = 1 in my database design.
Also add the next period start as another column by using the lead() function.
These are all the actions from Log2.
Add a row number as continuous care key.
Using the next period start date, find the current continuous period end date with a cross apply.
Replace empty period end dates with the current date using the coalesce() function.
Calculate the difference as the continuous care period duration with the datediff() function.
Sample data
create table Codes
(
Code int,
Description nvarchar(50),
IsAdmission bit
);
insert into Codes (Code, Description, IsAdmission) values
( 1, 'First admission', 1),
( 2, 'Re-admission', 1),
( 4, 'Campus transfer IN', 0),
(10, 'Trial visit', 0),
(22, 'Campus transfer OUT', 0);
create table PatientLogs
(
PatientId int,
AdmitDateTime smalldatetime,
AdmitCode int,
DischargeDateTime smalldatetime,
DischargeCode int
);
insert into PatientLogs (PatientId, AdmitDateTime, AdmitCode, DischargeDateTime, DischargeCode) values
(1849, '2020-04-23 07:29', 1, '2020-07-31 09:03', 22),
(1849, '2020-07-31 11:00', 4, '2020-07-31 12:09', 22),
(1849, '2020-07-31 13:10', 4, '2020-08-24 10:36', 10),
(1849, '2020-08-26 12:25', 2, null, null);
Solution
with Log1 as
(
select updt.PatientId,
case updt.DateTimeType
when 'AdmitDateTime' then updt.AdmitCode
when 'DischargeDateTime' then updt.DischargeCode
end as Code,
updt.LogDateTime,
updt.DateTimeType
from PatientLogs pl
unpivot (LogDateTime for DateTimeType in (AdmitDateTime, DischargeDateTime)) updt
),
Log2 as (
select l.PatientId,
l.Code,
l.LogDateTime,
lead(l.LogDateTime) over(partition by l.PatientId order by l.LogDateTime) as LogDateTimeNext
from Log1 l
join Codes c
on c.Code = l.Code
where c.IsAdmission = 1
)
select la.PatientId,
row_number() over(partition by la.PatientId order by la.LogDateTime) as ContCareKey,
la.LogDateTime as AdmitDateTime,
coalesce(ld.LogDateTime, convert(smalldatetime, getdate())) as DischargeDateTime,
datediff(day, la.LogDateTime, coalesce(ld.LogDateTime, convert(smalldatetime, getdate()))) as ContStay
from Log2 la -- log admission
outer apply ( select top 1 l1.LogDateTime
from Log1 l1
where l1.PatientId = la.PatientId
and l1.LogDateTime < la.LogDateTimeNext
order by l1.LogDateTime desc ) ld -- log discharge
order by la.PatientId,
la.LogDateTime;
Result
PatientId ContCareKey AdmitDateTime DischargeDateTime ContStay
--------- ----------- ---------------- ----------------- --------
1849 1 2020-04-23 07:29 2020-08-24 10:36 123
1849 2 2020-08-26 12:25 2021-02-03 12:49 161
Fiddle to see things in action with intermediate results.
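For comparison, the rule stated in the question (a discharge code of 22 followed by an admission code of 4 on the same day continues the episode) can be modelled directly. This is a minimal Python sketch, not the T-SQL approach above: the function names roll_up and stay_days and the dictionary layout are illustrative assumptions, and the rows are the PatientLogs sample data for patient 1849.

```python
from datetime import datetime

# (patient, admit time, admit code, discharge time, discharge code)
# None means the patient has not been discharged yet.
episodes = [
    (1849, datetime(2020, 4, 23, 7, 29), 1, datetime(2020, 7, 31, 9, 3), 22),
    (1849, datetime(2020, 7, 31, 11, 0), 4, datetime(2020, 7, 31, 12, 9), 22),
    (1849, datetime(2020, 7, 31, 13, 10), 4, datetime(2020, 8, 24, 10, 36), 10),
    (1849, datetime(2020, 8, 26, 12, 25), 2, None, None),
]


def roll_up(rows):
    """Merge an episode into the previous one when the previous discharge code
    is 22 and this admission has code 4 on the same calendar day."""
    merged = []
    for patient, admit, admit_code, discharge, discharge_code in sorted(
        rows, key=lambda r: (r[0], r[1])
    ):
        prev = merged[-1] if merged else None
        continues_prev = (
            prev is not None
            and prev["patient"] == patient
            and prev["discharge_code"] == 22
            and admit_code == 4
            and prev["discharge"].date() == admit.date()
        )
        if continues_prev:
            # Extend the running episode to this row's discharge.
            prev["discharge"] = discharge
            prev["discharge_code"] = discharge_code
        else:
            merged.append({
                "patient": patient,
                "admit": admit,
                "discharge": discharge,
                "discharge_code": discharge_code,
            })
    return merged


def stay_days(episode, today):
    """Count calendar-day boundaries, as datediff(day, ...) does."""
    end = episode["discharge"] or today
    return (end.date() - episode["admit"].date()).days


care = roll_up(episodes)
for ep in care:
    print(ep["patient"], ep["admit"], ep["discharge"])
```

With this sample the first three rows collapse into one episode running from 2020-04-23 07:29 to 2020-08-24 10:36, and the re-admission on 2020-08-26 starts a second, still open episode.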
Here is a T-SQL solution that contains primary and foreign key relationships.
To make it a bit more realistic, I added a simple "Patient" table.
I put all your "codes" into a single table which should make it easier to manage the codes.
I do not understand the purpose of your concept of "continuous care" so I just added an "is first" binary column to the Admission table.
You might also consider adding something about the medical condition for which the patient is being treated.
CREATE SCHEMA Codes
GO
CREATE TABLE dbo.Code
(
codeNr int NOT NULL,
description nvarchar(50),
CONSTRAINT Code_PK PRIMARY KEY(codeNr)
)
GO
CREATE TABLE dbo.Patient
(
patientNr int NOT NULL,
birthDate date NOT NULL,
firstName nvarchar(max) NOT NULL,
lastName nvarchar(max) NOT NULL,
CONSTRAINT Patient_PK PRIMARY KEY(patientNr)
)
GO
CREATE TABLE dbo.Admission
(
admitDateTime time NOT NULL,
patientNr int NOT NULL,
admitCode int,
isFirst bit,
CONSTRAINT Admission_PK PRIMARY KEY(patientNr, admitDateTime)
)
GO
CREATE TABLE dbo.Discharge
(
dischargeDateTime time NOT NULL,
patientNr int NOT NULL,
dischargeCode int NOT NULL,
CONSTRAINT Discharge_PK PRIMARY KEY(patientNr, dischargeDateTime)
)
GO
ALTER TABLE dbo.Admission ADD CONSTRAINT Admission_FK1 FOREIGN KEY (patientNr) REFERENCES dbo.Patient (patientNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Admission ADD CONSTRAINT Admission_FK2 FOREIGN KEY (admitCode) REFERENCES dbo.Code (codeNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Discharge ADD CONSTRAINT Discharge_FK1 FOREIGN KEY (patientNr) REFERENCES dbo.Patient (patientNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Discharge ADD CONSTRAINT Discharge_FK2 FOREIGN KEY (dischargeCode) REFERENCES dbo.Code (codeNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO

How to create a column null or not-null dependent on the value of another column?

I'm using database first approach with EF core and trying to figure out a clean solution to the below problem -
Consider a Student attendance table (irrelevant columns removed) below that stores date of class and allows the student to enter his class rating -
create table Student (
Id int Identity(1, 1) not null,
ClassDate smalldatetime not null,
ClassRatingByStudent varchar(250) not null
)
This is a webapp where school attendance system automatically populates the above table at EOD and then the student (let's say a few days later) is required to add class ratings. When the table is populated by the school attendance system, there is nothing in the ClassRatingByStudent column. Then when the student logs in, he must add the rating.
As you see, ClassRatingByStudent must be null when the school attendance system populates the table and must be not-null when the student saves his changes. One obvious solution is to make the ClassRatingByStudent column nullable and handle it in the code, but I'm wondering if there is a neater database-level (or maybe EF-level) solution, or some sort of pattern/architecture guideline for this type of scenario?
I don't know, but maybe a CHECK constraint could help you:
CREATE TABLE TestTable(
ID int NOT NULL IDENTITY,
RatingAllowed bit NOT NULL DEFAULT 0, -- switcher
RatingValue varchar(250),
CONSTRAINT PK_TestTable PRIMARY KEY(ID),
CONSTRAINT CK_TestTable_RatingValue CHECK( -- constraint
CASE
WHEN RatingAllowed=0 AND RatingValue IS NULL THEN 1
WHEN RatingAllowed=1 AND RatingValue IS NOT NULL THEN 1
ELSE 0
END=1
)
)
INSERT TestTable(RatingAllowed,RatingValue)VALUES(0,NULL)
INSERT TestTable(RatingAllowed,RatingValue)VALUES(1,'AAA')
-- The INSERT statement conflicted with the CHECK constraint "CK_TestTable_RatingValue"
INSERT TestTable(RatingAllowed,RatingValue)VALUES(0,'AAA')
INSERT TestTable(RatingAllowed,RatingValue)VALUES(1,NULL)
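The same four inserts can be replayed without a SQL Server instance. The sketch below uses Python's standard sqlite3 module as a stand-in, so the column types and the OR-based form of the CHECK expression are adapted assumptions rather than the T-SQL above. The helper try_insert is hypothetical and only reports whether the row was accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TestTable (
        ID INTEGER PRIMARY KEY,
        RatingAllowed INTEGER NOT NULL DEFAULT 0,  -- the switch column
        RatingValue TEXT,
        CONSTRAINT CK_TestTable_RatingValue CHECK (
            (RatingAllowed = 0 AND RatingValue IS NULL)
            OR (RatingAllowed = 1 AND RatingValue IS NOT NULL)
        )
    )
""")


def try_insert(allowed, value):
    """Return True if the row satisfies the CHECK constraint, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO TestTable (RatingAllowed, RatingValue) VALUES (?, ?)",
            (allowed, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(0, None),   # attendance system row: no rating yet
    try_insert(1, "AAA"),  # student row: rating supplied
    try_insert(0, "AAA"),  # rating without the switch
    try_insert(1, None),   # switch without a rating
]
print(results)  # [True, True, False, False]
```

In this design the student's save has to set the switch and the rating in the same UPDATE, otherwise the constraint rejects the intermediate state.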
I also found a variant that uses another table as the switch:
CREATE TABLE TableA(
ID int NOT NULL IDENTITY PRIMARY KEY,
StudentID int NOT NULL,
Grade int
)
CREATE TABLE TableB(
StudentID int NOT NULL PRIMARY KEY
)
GO
-- auxiliary function
CREATE FUNCTION GradeIsAllowed(@StudentID int)
RETURNS bit
BEGIN
DECLARE @Result bit=CASE WHEN EXISTS(SELECT * FROM TableB WHERE StudentID=@StudentID) THEN 1 ELSE 0 END
RETURN @Result
END
GO
-- constraint to check
ALTER TABLE TableA ADD CONSTRAINT CK_TableA_Grade CHECK(
CASE dbo.GradeIsAllowed(StudentID) -- then we can use the function here
WHEN 1 THEN CASE WHEN Grade IS NOT NULL THEN 1 ELSE 0 END
WHEN 0 THEN CASE WHEN Grade IS NULL THEN 1 ELSE 0 END
END=1)
GO
-- Tests
INSERT TableB(StudentID)VALUES(2) -- allowed student
INSERT TableA(StudentID,Grade)VALUES(1,NULL) -- OK
INSERT TableA(StudentID,Grade)VALUES(2,5) -- OK
INSERT TableA(StudentID,Grade)VALUES(1,4) -- Error
INSERT TableA(StudentID,Grade)VALUES(2,NULL) -- Error
INSERT TableB(StudentID)VALUES(1) -- add 1
UPDATE TableA SET Grade=4 WHERE StudentID=1 -- OK
UPDATE TableA SET Grade=NULL WHERE StudentID=1 -- Error

How can I use a trigger to allow an incremented, user-assigned ID?

I am moving a small database from MS Access into SQL Server. Each year, the users would create a new Access database and have clean data, but this change will put data across the years into one pot. The users have relied on the autonumber value in Access as a reference for records. That is very inaccurate if, say, 238 records are removed.
So I am trying to accommodate them with an id column they can control (somewhat). They will not see the real primary key in the SQL table, but I want to give them an ID they can edit, but still be unique.
I've been working with this trigger, but it has taken much longer than I expected.
Everything SEEMS TO work fine, except I don't understand why I have the same data in my INSERTED table as the table the trigger is on. (See note in code.)
ALTER TRIGGER [dbo].[trg_tblAppData]
ON [dbo].[tblAppData]
AFTER INSERT,UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @NewUserEnteredId int = 0;
DECLARE @RowIdForUpdate int = 0;
DECLARE @CurrentUserEnteredId int = 0;
DECLARE @LoopCount int = 0;
--*** Loop through all records to be updated because the values will be incremented.
WHILE (1 = 1)
BEGIN
SET @LoopCount = @LoopCount + 1;
IF (@LoopCount > (SELECT Count(*) FROM INSERTED))
BREAK;
SELECT TOP 1 @RowIdForUpdate = ID, @CurrentUserEnteredId = UserEnteredId FROM INSERTED WHERE ID > @RowIdForUpdate ORDER BY ID DESC;
IF (@RowIdForUpdate IS NULL)
BREAK;
-- WHY IS THERE A MATCH HERE? HAS THE RECORD ALREADY BEEN INSERTED?
IF EXISTS (SELECT UserEnteredId FROM tblAppData WHERE UserEnteredId = @CurrentUserEnteredId)
BEGIN
SET @NewUserEnteredId = (SELECT Max(t1.UserEnteredId) + 1 FROM tblAppData t1);
END
ELSE
SET @NewUserEnteredId = @CurrentUserEnteredId;
UPDATE tblAppData
SET UserEnteredId = @NewUserEnteredId
FROM tblAppData a
WHERE a.ID = @RowIdForUpdate
END
END
Here is what I want to accomplish:
When new record(s) are added, it should increment values from the Max existing
When a user overrides a value, it should check to see the existence of that value. If found restore the existing value, otherwise allow the change.
This trigger allows for multiple rows being added at a time.
It is great for this to be efficient for future use, but in reality, they will only add 1,000 records a year.
I wouldn't use a trigger to accomplish this.
Here is a script you can use to create a sequence (op didn't tag version), create the primary key, use the sequence as your special id, and put a constraint on the column.
create table dbo.test (
testid int identity(1,1) not null primary key clustered
, myid int null constraint UQ_ unique
, somevalue nvarchar(255) null
);
create sequence dbo.myid
as int
start with 1
increment by 1;
alter table dbo.test
add default next value for dbo.myid for myid;
insert into dbo.test (somevalue)
select 'this' union all
select 'that' union all
select 'and' union all
select 'this';
insert into dbo.test (myid, somevalue)
select 33, 'oops';
select *
from dbo.test
insert into dbo.test (somevalue)
select 'oh the fun';
select *
from dbo.test
--| This should error
insert into dbo.test (myid, somevalue)
select 3, 'This is NO fun';
Here is the result set:
testid myid somevalue
1 1 this
2 2 that
3 3 and
4 4 this
5 33 oops
6 5 oh the fun
And at the very end a test, which will error.

One Bowler can not bowl two consecutive over in cricket

I am working on Cricket Project. I have a table OverDetails. I want to insert data in this table.
ID OverNumber BowlerID InningsID
1 1 150 1
2 4 160 1
3 3 165 1
4 2 150 1
Row_1, Row_2 and Row_3 are legal. Row_4 is not legal, because one bowler cannot bowl two consecutive overs in one innings. The overs are not necessarily added to the database in consecutive order.
I have added a constraint in SQL Server.
#Constraint_1
ALTER TABLE OverDetails ADD CONSTRAINT UniqueOverInInning
UNIQUE(OverNumber, BowlerID, InningsID);
This constraint works perfectly.
I need a check like this:
#Constraint_2
ALTER TABLE OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (OverNumber + 1 != OverNumber and BowlerID + 1 != BowlerID
and InningsID + 1 != InningsID)
You need a function which returns a last BowlerID from a given InningID:
CREATE FUNCTION dbo.GetBowlerID
( @InningId INT, @OverNumber INT, @BowlerID INT)
RETURNS INT
AS
BEGIN
RETURN (SELECT top 1 CASE WHEN
(SELECT BowlerID
FROM OverDetails
WHERE InningsId = @InningId AND OverNumber = @OverNumber - 1 ) = @BowlerID
OR
(SELECT BowlerID
FROM OverDetails
WHERE InningsId = @InningId AND OverNumber = @OverNumber + 1 ) = @BowlerID
THEN 1 else 0 end)
END
Then you can put it into a check constraint:
ALTER TABLE OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (dbo.GetBowlerID(InningsId, OverNumber, BowlerID)=0)
Check constraints cannot directly reference other rows' data. There are some techniques that try to use UDFs to get around this limitation, but they tend not to work well, especially in this case, where I presume the insert of row 4 should also be blocked if it had a BowlerID of 165, since that would mean overs 2 and 3 shared a bowler.
Instead, we can implement this with a pair of views. I usually put DRI somewhere in the name of views like this to indicate that they're there for Declarative Referential Integrity reasons, not because I intend people to query them.
create table dbo.Bowling (
ID int not null,
OverNumber int not null,
BowlerID int not null,
InningsID int not null,
constraint PK_Bowling PRIMARY KEY (ID),
constraint UQ_Bowling_Overs UNIQUE (OverNumber,InningsID)
)
go
create view dbo.Bowling_DRI_SuccessiveOvers_Odd
with schemabinding
as
select
(OverNumber/2) as OddON,
BowlerID
from
dbo.Bowling
go
create unique clustered index UQ_Bowling_DRI_SuccessiveOvers_Odd on dbo.Bowling_DRI_SuccessiveOvers_Odd (OddON,BowlerID)
go
create view dbo.Bowling_DRI_SuccessiveOvers_Even
with schemabinding
as
select
((OverNumber+1)/2) as EvenON,
BowlerID
from
dbo.Bowling
go
create unique clustered index UQ_Bowling_DRI_SuccessiveOvers_Even on dbo.Bowling_DRI_SuccessiveOvers_Even (EvenON,BowlerID)
go
insert into dbo.Bowling(ID,OverNumber,BowlerID,InningsID) values
(1,1,150,1),
(2,4,160,1),
(3,3,165,1)
go
insert into dbo.Bowling(ID,OverNumber,BowlerID,InningsID) values
(4,2,150,1)
This final insert generates the error:
Msg 2601, Level 14, State 1, Line 37 Cannot insert duplicate key row
in object 'dbo.Bowling_DRI_SuccessiveOvers_Even' with unique index
'UQ_Bowling_DRI_SuccessiveOvers_Even'. The duplicate key value is (1,
150). The statement has been terminated.
Hopefully, you can see the trick I'm employing to make these views check your desired constraint: it's set up so that rows are paired with either their (logical, based on OverNumber) successor or predecessor by dividing the OverNumber by two using integer maths.
We then apply unique constraints on these pairs and including the BowlerID. Only if the same bowler bowls two successive overs will we generate more than one row with the same (OddON/EvenON) and BowlerID values.
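To see why two views are enough, the pairing can be modelled in a few lines of Python. This is only a sketch of the idea, not the indexed views themselves: the class name BowlingTable and its sets are illustrative stand-ins for the unique indexes. It also adds the innings to each key, which is an assumption on my part, since the views as written select only the halved over number and BowlerID.

```python
class BowlingTable:
    """In-memory stand-in for dbo.Bowling plus its two unique view indexes."""

    def __init__(self):
        self.overs = set()  # UNIQUE (OverNumber, InningsID)
        self.odd = set()    # unique index on the "Odd" view: OverNumber / 2
        self.even = set()   # unique index on the "Even" view: (OverNumber + 1) / 2

    def insert(self, over_number, bowler_id, innings_id):
        """Return True if the row is accepted, False if any unique key clashes."""
        over_key = (over_number, innings_id)
        # Overs 2k and 2k+1 share the same odd_key; overs 2k-1 and 2k share the
        # same even_key. Every consecutive pair therefore collides in exactly one set.
        odd_key = (over_number // 2, bowler_id, innings_id)
        even_key = ((over_number + 1) // 2, bowler_id, innings_id)
        if over_key in self.overs or odd_key in self.odd or even_key in self.even:
            return False
        self.overs.add(over_key)
        self.odd.add(odd_key)
        self.even.add(even_key)
        return True


table = BowlingTable()
results = [
    table.insert(1, 150, 1),  # accepted
    table.insert(4, 160, 1),  # accepted
    table.insert(3, 165, 1),  # accepted
    table.insert(2, 150, 1),  # rejected: bowler 150 already bowled over 1
    table.insert(2, 165, 1),  # rejected: bowler 165 already bowled over 3
    table.insert(2, 170, 1),  # accepted: a different bowler
]
print(results)  # [True, True, True, False, False, True]
```

A bowler returning after a gap, for example bowler 150 in over 5, is still accepted because neither halved key matches an existing one.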
Maybe this one?
create function dbo.chk_fnk (@OverNumber int, @BowlerID int, @InningsID int)
returns int
as
begin
return
case when
exists (select *
from dbo.OverDetails
where BowlerID = @BowlerID and abs(OverNumber - @OverNumber) = 1 and InningsID = @InningsID)
then 1
else 0
end;
end;
go
ALTER TABLE dbo.OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (dbo.chk_fnk(OverNumber, BowlerID, InningsID) = 0);

Sql Server string interning

We have a table where we store all the exceptions (message, stackTrace, etc..), the table is getting big and we would like to reduce it.
There are plenty of repeated StackTraces, Messages, etc, but enabling compression produces a modest size reduction (10%) while I think much bigger benefits could come if somehow Sql Server will intern the strings in some per-column hash-table.
I could get some of the benefits if I normalize the table and extract StackTraces to another one, but exception messages, exception types, etc.. are also repeated.
Is there a way to enable string interning for some column in Sql Server?
There is no built-in way to do this. You could easily do something like:
SELECT MessageID = IDENTITY(INT, 1, 1), Message
INTO dbo.Messages
FROM dbo.HugeTable GROUP BY Message;
ALTER TABLE dbo.HugeTable ADD MessageID INT;
UPDATE h
SET h.MessageID = m.MessageID
FROM dbo.HugeTable AS h
INNER JOIN dbo.Messages AS m
ON h.Message = m.Message;
ALTER TABLE dbo.HugeTable DROP COLUMN Message;
Now you'll need to do a few things:
Change your logging procedure to perform an upsert to the Messages table
Add proper indexes to the messages table (wasn't sure of Message data type) and PK
Add FK to MessageID column
Rebuild indexes on HugeTable to reclaim space
Do this in a test environment first!
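The first item in the list, the upsert into the Messages table, is the piece the logging code has to get right. Below is a minimal sketch of that get-or-create step in Python with the standard sqlite3 module. It is a model of the idea, not SQL Server code: INSERT OR IGNORE is SQLite syntax, and the helper names intern_message and log_exception and the LogID column are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages ("
    " MessageID INTEGER PRIMARY KEY,"
    " Message TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE HugeTable ("
    " LogID INTEGER PRIMARY KEY,"
    " MessageID INTEGER NOT NULL REFERENCES Messages (MessageID))"
)


def intern_message(message):
    """Return the MessageID for a message, adding it to Messages if it is new."""
    # The UNIQUE constraint makes the insert a no-op for an existing message.
    conn.execute("INSERT OR IGNORE INTO Messages (Message) VALUES (?)", (message,))
    return conn.execute(
        "SELECT MessageID FROM Messages WHERE Message = ?", (message,)
    ).fetchone()[0]


def log_exception(message):
    """Store one log row that references the interned message."""
    conn.execute(
        "INSERT INTO HugeTable (MessageID) VALUES (?)", (intern_message(message),)
    )


for text in ["timeout", "null reference", "timeout", "timeout"]:
    log_exception(text)

message_count = conn.execute("SELECT COUNT(*) FROM Messages").fetchone()[0]
log_count = conn.execute("SELECT COUNT(*) FROM HugeTable").fetchone()[0]
print(message_count, log_count)  # 2 4
```

Four log rows end up sharing two stored strings, which is where the space saving comes from when messages repeat heavily.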
Aaron's posting answers the questions of adding interning to a table, but afterwards you will need to modify your application code and stored-procedures to work with the new schema.
...or so you might think. You can actually create a VIEW that returns data matching the old schema, and you can also support INSERT operations on the view too, which are translated into child operations on the Messages and HugeTable tables. For readability I'll use the names InternedStrings and ExceptionLogs for the tables.
So if the old table was this:
CREATE TABLE ExceptionLogs (
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message nvarchar(1024) NOT NULL,
ExceptionType nvarchar(512) NOT NULL,
StackTrace nvarchar(4096) NOT NULL
)
And the new tables are:
CREATE TABLE InternedStrings (
StringId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Value nvarchar(max) NOT NULL
)
CREATE TABLE ExceptionLogs2 ( -- note the new name
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message int NOT NULL,
ExceptionType int NOT NULL,
StackTrace int NOT NULL
)
Add an index to InternedStrings to make the value lookups faster:
CREATE UNIQUE NONCLUSTERED INDEX IX_U_InternedStrings_Value ON InternedStrings ( Value ASC )
Then you would also have a VIEW:
CREATE VIEW ExceptionLogs AS
SELECT
LogId,
MessageStrings .Value AS Message,
ExceptionTypeStrings.Value AS ExceptionType,
StackTraceStrings .Value AS StackTrace
FROM
ExceptionLogs2
INNER JOIN InternedStrings AS MessageStrings ON
MessageStrings.StringId = ExceptionLogs2.Message
INNER JOIN InternedStrings AS ExceptionTypeStrings ON
ExceptionTypeStrings.StringId = ExceptionLogs2.ExceptionType
INNER JOIN InternedStrings AS StackTraceStrings ON
StackTraceStrings.StringId = ExceptionLogs2.StackTrace
And to handle INSERT operations from unmodified clients:
CREATE TRIGGER ExceptionLogsInsertHandler
ON ExceptionLogs INSTEAD OF INSERT AS
BEGIN
    -- Handles single-row inserts only: each lookup reads the one row in the inserted pseudo-table.
    DECLARE @messageId int = ( SELECT s.StringId FROM InternedStrings AS s INNER JOIN inserted AS i ON s.Value = i.Message )
    IF @messageId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) SELECT Message FROM inserted
        SET @messageId = SCOPE_IDENTITY()
    END
    DECLARE @exceptionTypeId int = ( SELECT s.StringId FROM InternedStrings AS s INNER JOIN inserted AS i ON s.Value = i.ExceptionType )
    IF @exceptionTypeId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) SELECT ExceptionType FROM inserted
        SET @exceptionTypeId = SCOPE_IDENTITY()
    END
    DECLARE @stackTraceId int = ( SELECT s.StringId FROM InternedStrings AS s INNER JOIN inserted AS i ON s.Value = i.StackTrace )
    IF @stackTraceId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) SELECT StackTrace FROM inserted
        SET @stackTraceId = SCOPE_IDENTITY()
    END
    INSERT INTO ExceptionLogs2 ( Message, ExceptionType, StackTrace )
    VALUES ( @messageId, @exceptionTypeId, @stackTraceId )
END
Note this TRIGGER can be improved: it only supports single-row insertions and is not entirely concurrency-safe. Because existing data is never mutated, the only risk is that two concurrent inserts try to add the same string to InternedStrings, in which case the UNIQUE index makes one of them fail. There are different possible ways to handle this, such as using a TRANSACTION and changing the queries to use the HOLDLOCK and UPDLOCK hints.

Resources