I need to collapse (roll up) episode rows based on the logic below.
How can I implement it?
The logic that allows episodes to be condensed into a single continuous care episode is a discharge code of 22 followed by an admission code of 4 on the same day.
[Image: continuous care implementation update]
EPN is a business key.
episode_continuous_care_key is an artificial (surrogate) key that can be generated with the row_number() function.
Below is the table structure.
DROP TABLE IF EXISTS #source -- IF EXISTS requires SQL Server 2016 or later
CREATE TABLE #source(patidid varchar(20),epn int,preadmitdate datetime,adminttime varchar(10),
admitcode varchar(10),datedischarge datetime,disctime varchar(10),disccode varchar(10))
INSERT INTO #source VALUES
(1849,1,'4/23/2020','7:29',1,'7/31/2020','9:03',22)
,(1849,2,'7/31/2020','11:00',4,'7/31/2020','12:09',22)
,(1849,3,'7/31/2020','13:10',4,'8/24/2020','10:36',10)
,(1849,4,'8/26/2020','12:25',2,null,null,null)
,(1850,1,'4/23/2020','7:33',1,'6/29/2020','7:30',22)
,(1850,2,'6/29/2020','9:35',4,'7/8/2020','10:51',7)
,(1850,3,'7/10/2020','11:51',3,'7/29/2020','9:12',7)
,(1850,4,'7/31/2020','11:00',2,'8/6/2020','10:24',22)
,(1850,5,'8/6/2020','12:26',4,null,null,null)
,(1851,1,'4/23/2020','7:35',1,'6/24/2020','13:45',22)
,(1851,2,'6/24/2020','15:06',4,'9/24/2020','15:00',2)
,(1851,3,'12/4/2020','8:59',0,null,null,null)
,(1852,1,'4/23/2020','7:37',1,'7/6/2020','11:15',20)
,(1852,2,'7/8/2020','10:56',0,'7/10/2020','11:46',2)
,(1852,3,'7/10/2020','11:47',2,'7/28/2020','13:16',22)
,(1852,4,'7/28/2020','15:17',4,'8/4/2020','11:37',22)
,(1852,5,'8/4/2020','13:40',4,'11/18/2020','15:43',2)
,(1852,6,'12/2/2020','15:23',2,null,null,null)
,(1853,1,'4/23/2020','7:40',1,'7/1/2020','8:30',22)
,(1853,2,'7/1/2020','14:57',4,'12/4/2020','12:55',7)
,(1854,1,'4/23/2020','7:44',1,'7/31/2020','13:07',20)
,(1854,2,'8/3/2020','16:30',0,'8/5/2020','9:32',2)
,(1854,3,'8/5/2020','10:34',2,'8/24/2020','8:15',22)
,(1854,4,'8/24/2020','10:33',4,'12/4/2020','7:30',22)
,(1854,5,'12/4/2020','9:13',4,null,null,null)
That Excel sheet image says little about your database design so I invented my own version that more or less resembles your image. With a proper database design the first step of the solution should not be required...
Unpivot timestamp information so that admission timestamp and discharge timestamps become one column.
I used a common table expression Log1 for this action.
Use the codes to keep only the rows that start a continuous care period. Those are the admissions, marked with Code.IsAdmission = 1 in my database design.
Also add the next period start as another column by using the lead() function.
These are all the actions from Log2.
Add a row number as continuous care key.
Using the next period start date, find the current continuous period end date with a cross apply.
Replace empty period end dates with the current date using the coalesce() function.
Calculate the difference as the continuous care period duration with the datediff() function.
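The roll-up rule stated in the question (a discharge code of 22 followed by an admission code of 4 on the same day continues the episode) can also be sketched outside the database. The following is a minimal Python sketch, not the SQL solution itself: the names EPISODES, parse and roll_up are hypothetical, and the rows are the patient 1849 sample rows with date and time already combined.

```python
from datetime import datetime

# (patient, epn, admit timestamp, admit code, discharge timestamp, discharge code)
EPISODES = [
    (1849, 1, "2020-04-23 07:29", 1, "2020-07-31 09:03", 22),
    (1849, 2, "2020-07-31 11:00", 4, "2020-07-31 12:09", 22),
    (1849, 3, "2020-07-31 13:10", 4, "2020-08-24 10:36", 10),
    (1849, 4, "2020-08-26 12:25", 2, None, None),
]

def parse(ts):
    """Turn 'YYYY-MM-DD HH:MM' into a datetime; None stays None (still admitted)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M") if ts else None

def roll_up(episodes):
    """Merge an episode into the previous one when the previous discharge code
    is 22 and this admission has code 4 on the same calendar day."""
    periods = []
    for patient, _epn, admit, admit_code, discharge, disc_code in sorted(
        episodes, key=lambda e: (e[0], e[1])
    ):
        admit_at, discharge_at = parse(admit), parse(discharge)
        prev = periods[-1] if periods and periods[-1]["patient"] == patient else None
        continues = (
            prev is not None
            and prev["discharge_code"] == 22
            and admit_code == 4
            and prev["discharge"] is not None
            and prev["discharge"].date() == admit_at.date()
        )
        if continues:
            # Extend the running period to this episode's discharge.
            prev["discharge"], prev["discharge_code"] = discharge_at, disc_code
        else:
            periods.append({
                "patient": patient,
                "key": prev["key"] + 1 if prev else 1,  # plays the role of row_number()
                "admit": admit_at,
                "discharge": discharge_at,
                "discharge_code": disc_code,
            })
    return periods

periods = roll_up(EPISODES)
for p in periods:
    print(p["patient"], p["key"], p["admit"], p["discharge"])
```

For patient 1849 this yields two periods: one from 2020-04-23 07:29 to 2020-08-24 10:36 and one open period starting 2020-08-26 12:25.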
Sample data
create table Codes
(
Code int,
Description nvarchar(50),
IsAdmission bit
);
insert into Codes (Code, Description, IsAdmission) values
( 1, 'First admission', 1),
( 2, 'Re-admission', 1),
( 4, 'Campus transfer IN', 0),
(10, 'Trial visit', 0),
(22, 'Campus transfer OUT', 0);
create table PatientLogs
(
PatientId int,
AdmitDateTime smalldatetime,
AdmitCode int,
DischargeDateTime smalldatetime,
DischargeCode int
);
insert into PatientLogs (PatientId, AdmitDateTime, AdmitCode, DischargeDateTime, DischargeCode) values
(1849, '2020-04-23 07:29', 1, '2020-07-31 09:03', 22),
(1849, '2020-07-31 11:00', 4, '2020-07-31 12:09', 22),
(1849, '2020-07-31 13:10', 4, '2020-08-24 10:36', 10),
(1849, '2020-08-26 12:25', 2, null, null);
Solution
with Log1 as
(
    select updt.PatientId,
           case updt.DateTimeType
             when 'AdmitDateTime' then updt.AdmitCode
             when 'DischargeDateTime' then updt.DischargeCode
           end as Code,
           updt.LogDateTime,
           updt.DateTimeType
    from PatientLogs pl
    unpivot (LogDateTime for DateTimeType in (AdmitDateTime, DischargeDateTime)) updt
),
Log2 as (
    select l.PatientId,
           l.Code,
           l.LogDateTime,
           lead(l.LogDateTime) over(partition by l.PatientId order by l.LogDateTime) as LogDateTimeNext
    from Log1 l
    join Codes c
      on c.Code = l.Code
    where c.IsAdmission = 1
)
select la.PatientId,
       row_number() over(partition by la.PatientId order by la.LogDateTime) as ContCareKey,
       la.LogDateTime as AdmitDateTime,
       coalesce(ld.LogDateTime, convert(smalldatetime, getdate())) as DischargeDateTime,
       datediff(day, la.LogDateTime, coalesce(ld.LogDateTime, convert(smalldatetime, getdate()))) as ContStay
from Log2 la -- log admission
outer apply ( select top 1 l1.LogDateTime
              from Log1 l1
              where l1.PatientId = la.PatientId
                and l1.LogDateTime < la.LogDateTimeNext
              order by l1.LogDateTime desc ) ld -- log discharge
order by la.PatientId,
         la.LogDateTime;
Result
PatientId ContCareKey AdmitDateTime DischargeDateTime ContStay
--------- ----------- ---------------- ----------------- --------
1849 1 2020-04-23 07:29 2020-08-24 10:36 123
1849 2 2020-08-26 12:25 2021-02-03 12:49 161
Fiddle to see things in action with intermediate results.
Here is a T-SQL solution that contains primary and foreign key relationships.
To make it a bit more realistic, I added a simple "Patient" table.
I put all your "codes" into a single table which should make it easier to manage the codes.
I do not understand the purpose of your concept of "continuous care" so I just added an "is first" binary column to the Admission table.
You might also consider adding something about the medical condition for which the patient is being treated.
CREATE SCHEMA Codes
GO
CREATE TABLE dbo.Code
(
codeNr int NOT NULL,
description nvarchar(50),
CONSTRAINT Code_PK PRIMARY KEY(codeNr)
)
GO
CREATE TABLE dbo.Patient
(
patientNr int NOT NULL,
birthDate date NOT NULL,
firstName nvarchar(max) NOT NULL,
lastName nvarchar(max) NOT NULL,
CONSTRAINT Patient_PK PRIMARY KEY(patientNr)
)
GO
CREATE TABLE dbo.Admission
(
admitDateTime datetime NOT NULL,
patientNr int NOT NULL,
admitCode int,
isFirst bit,
CONSTRAINT Admission_PK PRIMARY KEY(patientNr, admitDateTime)
)
GO
CREATE TABLE dbo.Discharge
(
dischargeDateTime datetime NOT NULL,
patientNr int NOT NULL,
dischargeCode int NOT NULL,
CONSTRAINT Discharge_PK PRIMARY KEY(patientNr, dischargeDateTime)
)
GO
ALTER TABLE dbo.Admission ADD CONSTRAINT Admission_FK1 FOREIGN KEY (patientNr) REFERENCES dbo.Patient (patientNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Admission ADD CONSTRAINT Admission_FK2 FOREIGN KEY (admitCode) REFERENCES dbo.Code (codeNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Discharge ADD CONSTRAINT Discharge_FK1 FOREIGN KEY (patientNr) REFERENCES dbo.Patient (patientNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE dbo.Discharge ADD CONSTRAINT Discharge_FK2 FOREIGN KEY (dischargeCode) REFERENCES dbo.Code (codeNr) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
I'm using the database-first approach with EF Core and trying to figure out a clean solution to the problem below.
Consider the Student attendance table below (irrelevant columns removed), which stores the date of a class and lets the student enter a rating for that class:
create table Student (
Id int Identity(1, 1) not null,
ClassDate smalldatetime not null,
ClassRatingByStudent varchar(250) not null
)
This is a web app where the school attendance system automatically populates the above table at the end of the day, and the student is then required (let's say a few days later) to add class ratings. When the table is populated by the school attendance system, there is nothing in the ClassRatingByStudent column. Then, when the student logs in, he must add the rating.
As you can see, ClassRatingByStudent must be null when the school attendance system populates the table and must be non-null when the student saves his changes. One obvious solution is to make the ClassRatingByStudent column nullable and handle it in code, but I'm wondering whether there is a neater database-level (or maybe EF-level) solution, or some pattern or architectural guideline for this type of scenario.
I'm not sure, but a CHECK constraint might help:
CREATE TABLE TestTable(
ID int NOT NULL IDENTITY,
RatingAllowed bit NOT NULL DEFAULT 0, -- switcher
RatingValue varchar(250),
CONSTRAINT PK_TestTable PRIMARY KEY(ID),
CONSTRAINT CK_TestTable_RatingValue CHECK( -- constraint
CASE
WHEN RatingAllowed=0 AND RatingValue IS NULL THEN 1
WHEN RatingAllowed=1 AND RatingValue IS NOT NULL THEN 1
ELSE 0
END=1
)
)
INSERT TestTable(RatingAllowed,RatingValue)VALUES(0,NULL)
INSERT TestTable(RatingAllowed,RatingValue)VALUES(1,'AAA')
-- Both of the following fail with: The INSERT statement conflicted with the CHECK constraint "CK_TestTable_RatingValue"
INSERT TestTable(RatingAllowed,RatingValue)VALUES(0,'AAA')
INSERT TestTable(RatingAllowed,RatingValue)VALUES(1,NULL)
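To see how the switcher column plays out in the attendance scenario from the question, here is a small runnable sketch. It is only an approximation: it uses Python's built-in sqlite3 module instead of SQL Server, writes the CASE expression as an equivalent OR of two conditions, and the helper name accepted is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TestTable (
        ID            INTEGER PRIMARY KEY,
        RatingAllowed INTEGER NOT NULL DEFAULT 0,  -- switcher
        RatingValue   TEXT,
        CONSTRAINT CK_TestTable_RatingValue CHECK (
            (RatingAllowed = 0 AND RatingValue IS NULL) OR
            (RatingAllowed = 1 AND RatingValue IS NOT NULL)
        )
    )
""")

def accepted(sql, params=()):
    """Run one statement and report whether the CHECK constraint let it through."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# The attendance system creates the row without a rating.
system_insert = accepted(
    "INSERT INTO TestTable (RatingAllowed, RatingValue) VALUES (0, NULL)")
# Saving a rating without flipping the switch is rejected ...
rating_only = accepted(
    "UPDATE TestTable SET RatingValue = ? WHERE ID = 1", ("Great class",))
# ... while flipping the switch and saving the rating together is accepted.
rating_with_switch = accepted(
    "UPDATE TestTable SET RatingAllowed = 1, RatingValue = ? WHERE ID = 1",
    ("Great class",))
# Once the switch is on, the rating can no longer be cleared.
clear_rating = accepted("UPDATE TestTable SET RatingValue = NULL WHERE ID = 1")

print(system_insert, rating_only, rating_with_switch, clear_rating)
# True False True False
```

The point of the design is that the application has to set the switch and the rating in the same statement, so a half-saved state never reaches the table.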
I also found a variant of the check that uses another table as the switcher:
CREATE TABLE TableA(
ID int NOT NULL IDENTITY PRIMARY KEY,
StudentID int NOT NULL,
Grade int
)
CREATE TABLE TableB(
StudentID int NOT NULL PRIMARY KEY
)
GO
-- auxiliary function
CREATE FUNCTION GradeIsAllowed(@StudentID int)
RETURNS bit
BEGIN
    DECLARE @Result bit=CASE WHEN EXISTS(SELECT * FROM TableB WHERE StudentID=@StudentID) THEN 1 ELSE 0 END
    RETURN @Result
END
GO
-- constraint to check
ALTER TABLE TableA ADD CONSTRAINT CK_TableA_Grade CHECK(
CASE dbo.GradeIsAllowed(StudentID) -- then we can use the function here
WHEN 1 THEN CASE WHEN Grade IS NOT NULL THEN 1 ELSE 0 END
WHEN 0 THEN CASE WHEN Grade IS NULL THEN 1 ELSE 0 END
END=1)
GO
-- Tests
INSERT TableB(StudentID)VALUES(2) -- allowed student
INSERT TableA(StudentID,Grade)VALUES(1,NULL) -- OK
INSERT TableA(StudentID,Grade)VALUES(2,5) -- OK
INSERT TableA(StudentID,Grade)VALUES(1,4) -- Error
INSERT TableA(StudentID,Grade)VALUES(2,NULL) -- Error
INSERT TableB(StudentID)VALUES(1) -- add 1
UPDATE TableA SET Grade=4 WHERE StudentID=1 -- OK
UPDATE TableA SET Grade=NULL WHERE StudentID=1 -- Error
I am working on a cricket project. I have a table OverDetails and I want to insert data into it.
ID OverNumber BowlerID InningsID
1 1 150 1
2 4 160 1
3 3 165 1
4 2 150 1
Rows 1, 2 and 3 are legal. Row 4 is not, because a bowler cannot bowl two consecutive overs in one innings (bowler 150 would bowl overs 1 and 2). Overs are not necessarily inserted into the database in consecutive order.
I have added a constraint in SQL Server.
#Constraint_1
ALTER TABLE OverDetails ADD CONSTRAINT UniqueOverInInning
UNIQUE(OverNumber, BowlerID, InningsID);
This constraint works perfectly.
I need a check like this:
#Constraint_2
ALTER TABLE OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (OverNumber + 1 != OverNumber and BowlerID + 1 != BowlerID
and InningsID + 1 != InningsID)
You need a function that checks whether a neighbouring over (OverNumber - 1 or OverNumber + 1) in the given innings was bowled by the same BowlerID. It returns 1 if so and 0 otherwise:
CREATE FUNCTION dbo.GetBowlerID
( @InningId INT, @OverNumber INT, @BowlerID INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT top 1 CASE WHEN
        (SELECT BowlerID
         FROM OverDetails
         WHERE InningsId = @InningId AND OverNumber = @OverNumber - 1 ) = @BowlerID
        OR
        (SELECT BowlerID
         FROM OverDetails
         WHERE InningsId = @InningId AND OverNumber = @OverNumber + 1 ) = @BowlerID
        THEN 1 else 0 end)
END
Then you can put it into a check constraint:
ALTER TABLE OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (dbo.GetBowlerID(InningsId, OverNumber, BowlerID)=0)
Check constraints cannot directly reference other rows' data. There are some techniques that try to use UDFs to get around this limitation, but they tend not to work well, especially in this case, where I presume the insert of row 4 should also be blocked if it had a BowlerID of 165, since that would mean overs 2 and 3 shared a bowler.
Instead, we can implement this with a pair of views. I usually put DRI somewhere in the name of views like this to indicate that they're there for Declarative Referential Integrity reasons, not because I intend people to query them.
create table dbo.Bowling (
ID int not null,
OverNumber int not null,
BowlerID int not null,
InningsID int not null,
constraint PK_Bowling PRIMARY KEY (ID),
constraint UQ_Bowling_Overs UNIQUE (OverNumber,InningsID)
)
go
create view dbo.Bowling_DRI_SuccessiveOvers_Odd
with schemabinding
as
select
(OverNumber/2) as OddON,
BowlerID
from
dbo.Bowling
go
create unique clustered index UQ_Bowling_DRI_SuccessiveOvers_Odd on dbo.Bowling_DRI_SuccessiveOvers_Odd (OddON,BowlerID)
go
create view dbo.Bowling_DRI_SuccessiveOvers_Even
with schemabinding
as
select
((OverNumber+1)/2) as EvenON,
BowlerID
from
dbo.Bowling
go
create unique clustered index UQ_Bowling_DRI_SuccessiveOvers_Even on dbo.Bowling_DRI_SuccessiveOvers_Even (EvenON,BowlerID)
go
insert into dbo.Bowling(ID,OverNumber,BowlerID,InningsID) values
(1,1,150,1),
(2,4,160,1),
(3,3,165,1)
go
insert into dbo.Bowling(ID,OverNumber,BowlerID,InningsID) values
(4,2,150,1)
This final insert generates the error:
Msg 2601, Level 14, State 1, Line 37 Cannot insert duplicate key row
in object 'dbo.Bowling_DRI_SuccessiveOvers_Even' with unique index
'UQ_Bowling_DRI_SuccessiveOvers_Even'. The duplicate key value is (1,
150). The statement has been terminated.
Hopefully, you can see the trick I'm employing to make these views check your desired constraint: each row is paired with either its logical successor or its logical predecessor (based on OverNumber) by dividing the OverNumber by two using integer maths.
We then apply a unique constraint to each pairing, including the BowlerID. Only if the same bowler bowls two successive overs will we generate more than one row with the same (OddON/EvenON) and BowlerID values. Note that, as written, the views ignore InningsID, so they only suit a single innings; to cover several, add InningsID to both views and to their unique indexes.
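The pairing trick can be tried out without SQL Server. The sketch below is an approximation under stated assumptions: it uses Python's sqlite3 module, replaces the two indexed views with SQLite unique indexes on expressions (a different mechanism that enforces the same rule), and includes InningsID in both indexes. The index names and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bowling (
    ID         INTEGER PRIMARY KEY,
    OverNumber INTEGER NOT NULL,
    BowlerID   INTEGER NOT NULL,
    InningsID  INTEGER NOT NULL,
    UNIQUE (OverNumber, InningsID)
);
-- OverNumber / 2 pairs overs (2,3), (4,5), ...
CREATE UNIQUE INDEX UQ_Bowling_Odd
    ON Bowling (InningsID, OverNumber / 2, BowlerID);
-- (OverNumber + 1) / 2 pairs overs (1,2), (3,4), ...
CREATE UNIQUE INDEX UQ_Bowling_Even
    ON Bowling (InningsID, (OverNumber + 1) / 2, BowlerID);
""")

def try_insert(row):
    """Insert one (ID, OverNumber, BowlerID, InningsID) row; False if rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Bowling (ID, OverNumber, BowlerID, InningsID) "
                "VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

rows = [
    (1, 1, 150, 1),  # over 1
    (2, 4, 160, 1),  # over 4
    (3, 3, 165, 1),  # over 3
    (4, 2, 150, 1),  # over 2 by the bowler of over 1: rejected
    (5, 2, 165, 1),  # over 2 by the bowler of over 3: rejected
    (6, 2, 150, 2),  # same bowler, different innings: accepted
]
results = [try_insert(r) for r in rows]
print(results)
# [True, True, True, False, False, True]
```

Two pairings are needed because each one only catches half of the adjacent pairs: overs 1 and 2 share a key only under (OverNumber + 1) / 2, while overs 2 and 3 share a key only under OverNumber / 2.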
Maybe this one?
create function dbo.chk_fnk (@OverNumber int, @BowlerID int, @InningsID int)
returns int
as
begin
    return
        case when
            exists (select *
                    from dbo.OverDetails
                    where BowlerID = @BowlerID and abs(OverNumber - @OverNumber) = 1 and InningsID = @InningsID)
        then 1
        else 0
        end;
end;
go
ALTER TABLE dbo.OverDetails ADD CONSTRAINT UniqueConsecutiveBowlerInOneInning
CHECK (dbo.chk_fnk(OverNumber, BowlerID, InningsID) = 0);
We have a table where we store all the exceptions (message, stack trace, etc.). The table is getting big and we would like to reduce its size.
There are plenty of repeated stack traces, messages, etc., but enabling compression produces only a modest size reduction (10%), while I think much bigger savings would be possible if SQL Server could somehow intern the strings in a per-column hash table.
I could get some of the benefit by normalizing the table and extracting the stack traces into another one, but exception messages, exception types, etc. are also repeated.
Is there a way to enable string interning for a column in SQL Server?
There is no built-in way to do this. You could easily do something like:
SELECT MessageID = IDENTITY(INT, 1, 1), Message
INTO dbo.Messages
FROM dbo.HugeTable GROUP BY Message;
ALTER TABLE dbo.HugeTable ADD MessageID INT;
UPDATE h
SET h.MessageID = m.MessageID
FROM dbo.HugeTable AS h
INNER JOIN dbo.Messages AS m
ON h.Message = m.Message;
ALTER TABLE dbo.HugeTable DROP COLUMN Message;
Now you'll need to do a few things:
Change your logging procedure to perform an upsert to the Messages table
Add proper indexes to the messages table (wasn't sure of Message data type) and PK
Add FK to MessageID column
Rebuild indexes on HugeTable to reclaim space
Do this in a test environment first!
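The first item on that list, a logging routine that upserts into the Messages table, could look roughly like the sketch below. It is written against SQLite through Python's sqlite3 module rather than SQL Server, so INSERT OR IGNORE stands in for whatever upsert pattern the real procedure would use; the function name log_exception and the LogID column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (
    MessageID INTEGER PRIMARY KEY,
    Message   TEXT NOT NULL UNIQUE   -- the unique index doubles as the lookup index
);
CREATE TABLE HugeTable (
    LogID     INTEGER PRIMARY KEY,
    MessageID INTEGER NOT NULL REFERENCES Messages (MessageID)
);
""")

def log_exception(message):
    """Store the message text once and log a row that points at it."""
    with conn:  # one transaction per logged exception
        # Adds the text only if it is not interned yet.
        conn.execute(
            "INSERT OR IGNORE INTO Messages (Message) VALUES (?)", (message,))
        (message_id,) = conn.execute(
            "SELECT MessageID FROM Messages WHERE Message = ?", (message,)
        ).fetchone()
        conn.execute(
            "INSERT INTO HugeTable (MessageID) VALUES (?)", (message_id,))
    return message_id

ids = [log_exception(m)
       for m in ["Timeout expired", "Null reference", "Timeout expired"]]
print(ids)
# [1, 2, 1]
```

The repeated message reuses the existing MessageID, so HugeTable grows by one small integer per log entry while Messages only grows when a new text appears.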
Aaron's post answers the question of how to add interning to a table, but afterwards you will need to modify your application code and stored procedures to work with the new schema.
...or so you might think. You can actually create a VIEW that returns data matching the old schema, and you can also support INSERT operations on the view too, which are translated into child operations on the Messages and HugeTable tables. For readability I'll use the names InternedStrings and ExceptionLogs for the tables.
So if the old table was this:
CREATE TABLE ExceptionLogs (
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message nvarchar(1024) NOT NULL,
ExceptionType nvarchar(512) NOT NULL,
StackTrace nvarchar(max) NOT NULL
)
And the new tables are:
CREATE TABLE InternedStrings (
StringId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Value nvarchar(max) NOT NULL
)
CREATE TABLE ExceptionLogs2 ( -- note the new name
LogId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Message int NOT NULL,
ExceptionType int NOT NULL,
StackTrace int NOT NULL
)
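Add an index to InternedStrings to make the value lookups faster. Note that SQL Server does not accept an nvarchar(max) column as an index key, so for this index to be created Value needs a bounded length that fits within the index key size limit, or the index has to be built on a persisted hash of the value instead: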
CREATE UNIQUE NONCLUSTERED INDEX IX_U_InternedStrings_Value ON InternedStrings ( Value ASC )
Then you would also have a VIEW:
CREATE VIEW ExceptionLogs AS
SELECT
    LogId,
    MessageStrings.Value AS Message,
    ExceptionTypeStrings.Value AS ExceptionType,
    StackTraceStrings.Value AS StackTrace
FROM
    ExceptionLogs2
    INNER JOIN InternedStrings AS MessageStrings ON
        MessageStrings.StringId = ExceptionLogs2.Message
    INNER JOIN InternedStrings AS ExceptionTypeStrings ON
        ExceptionTypeStrings.StringId = ExceptionLogs2.ExceptionType
    INNER JOIN InternedStrings AS StackTraceStrings ON
        StackTraceStrings.StringId = ExceptionLogs2.StackTrace
And to handle INSERT operations from unmodified clients:
CREATE TRIGGER ExceptionLogsInsertHandler
ON ExceptionLogs INSTEAD OF INSERT AS
BEGIN
    SET NOCOUNT ON;
    -- Single-row version: copy the one inserted row into variables.
    DECLARE @message nvarchar(max), @exceptionType nvarchar(max), @stackTrace nvarchar(max);
    SELECT @message = Message, @exceptionType = ExceptionType, @stackTrace = StackTrace
    FROM inserted;
    -- Look each string up and intern it if it is not there yet.
    DECLARE @messageId int = (SELECT StringId FROM InternedStrings WHERE Value = @message);
    IF @messageId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @message );
        SET @messageId = SCOPE_IDENTITY();
    END
    DECLARE @exceptionTypeId int = (SELECT StringId FROM InternedStrings WHERE Value = @exceptionType);
    IF @exceptionTypeId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @exceptionType );
        SET @exceptionTypeId = SCOPE_IDENTITY();
    END
    DECLARE @stackTraceId int = (SELECT StringId FROM InternedStrings WHERE Value = @stackTrace);
    IF @stackTraceId IS NULL
    BEGIN
        INSERT INTO InternedStrings ( Value ) VALUES ( @stackTrace );
        SET @stackTraceId = SCOPE_IDENTITY();
    END
    INSERT INTO ExceptionLogs2 ( Message, ExceptionType, StackTrace )
    VALUES ( @messageId, @exceptionTypeId, @stackTraceId );
END
Note this TRIGGER can be improved: it only supports single-row insertions and is not entirely concurrency-safe. Because existing data is never mutated, the only risk is that two sessions try to add the same string to the InternedStrings table at the same time, in which case the UNIQUE index makes one of the inserts fail. There are different possible ways to handle this, such as using a TRANSACTION and adding the holdlock and updlock hints to the lookup queries.
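The view-plus-trigger arrangement can be exercised end to end in a small sandbox. The sketch below is an approximation that uses SQLite through Python's sqlite3 module, not SQL Server: SQLite fires INSTEAD OF triggers once per row and exposes the row as NEW instead of the inserted table, and INSERT OR IGNORE replaces the explicit lookups. The sample strings are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InternedStrings (
    StringId INTEGER PRIMARY KEY,
    Value    TEXT NOT NULL UNIQUE
);
CREATE TABLE ExceptionLogs2 (
    LogId         INTEGER PRIMARY KEY,
    Message       INTEGER NOT NULL REFERENCES InternedStrings (StringId),
    ExceptionType INTEGER NOT NULL REFERENCES InternedStrings (StringId),
    StackTrace    INTEGER NOT NULL REFERENCES InternedStrings (StringId)
);
-- Presents the old schema to unmodified clients.
CREATE VIEW ExceptionLogs AS
SELECT l.LogId,
       m.Value AS Message,
       t.Value AS ExceptionType,
       s.Value AS StackTrace
FROM ExceptionLogs2 AS l
JOIN InternedStrings AS m ON m.StringId = l.Message
JOIN InternedStrings AS t ON t.StringId = l.ExceptionType
JOIN InternedStrings AS s ON s.StringId = l.StackTrace;

CREATE TRIGGER ExceptionLogsInsertHandler
INSTEAD OF INSERT ON ExceptionLogs
BEGIN
    -- Intern each string once; duplicates are skipped by the UNIQUE constraint.
    INSERT OR IGNORE INTO InternedStrings (Value) VALUES (NEW.Message);
    INSERT OR IGNORE INTO InternedStrings (Value) VALUES (NEW.ExceptionType);
    INSERT OR IGNORE INTO InternedStrings (Value) VALUES (NEW.StackTrace);
    INSERT INTO ExceptionLogs2 (Message, ExceptionType, StackTrace)
    VALUES (
        (SELECT StringId FROM InternedStrings WHERE Value = NEW.Message),
        (SELECT StringId FROM InternedStrings WHERE Value = NEW.ExceptionType),
        (SELECT StringId FROM InternedStrings WHERE Value = NEW.StackTrace)
    );
END;
""")

# An unmodified client keeps inserting plain strings through the view.
samples = [
    ("Timeout expired", "SqlException", "at Foo.Bar()"),
    ("Timeout expired", "SqlException", "at Foo.Baz()"),
]
with conn:
    conn.executemany(
        "INSERT INTO ExceptionLogs (Message, ExceptionType, StackTrace) "
        "VALUES (?, ?, ?)", samples)

logged = conn.execute(
    "SELECT Message, ExceptionType, StackTrace FROM ExceptionLogs ORDER BY LogId"
).fetchall()
interned = conn.execute("SELECT COUNT(*) FROM InternedStrings").fetchone()[0]
print(logged, interned)
```

Reading through the view returns the original strings, while only four distinct values are stored for the six strings that were inserted.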