I am developing a real-time auction site for a school project. We can't make any changes to the design of the database.
The 'Item' table has a column for the expiration date (the day the auction expires) and the expiration time (the exact time at which the auction expires). It also has a column that indicates whether the auction is open or closed. This [AuctionClosed?] column needs to be updated as soon as the expiration date and time are reached, which has to happen in real time.
We're using an SQL Server database and the website runs on PHP7. The only possible solution I've found is to run a job every second, but this is too much overhead.
This is the function I want to use to check the column:
CREATE FUNCTION dbo.fn_isAuctionClosed (@Item BIGINT)
RETURNS BIT
AS
BEGIN
    DECLARE @expirationDay DATE = (SELECT expirationDate FROM Item WHERE itemId = @Item);
    DECLARE @expirationTime TIME = (SELECT expirationTime FROM Item WHERE itemId = @Item);

    IF (CAST(GETDATE() AS DATE) = @expirationDay AND CAST(GETDATE() AS TIME) >= @expirationTime)
        OR CAST(GETDATE() AS DATE) > @expirationDay
        RETURN 1;

    RETURN 0;
END
And this is the procedure that updates the column:
CREATE PROCEDURE updateAuctionClosed @Item BIGINT
AS
UPDATE Item
SET [AuctionClosed?] = dbo.fn_isAuctionClosed(@Item)
WHERE itemId = @Item;
To be more specific, what you really want here is a computed column. Like I said in the comments, because the column relies on the current date and time, it won't be deterministic. This means it can't be PERSISTED; instead it is calculated every time the column is referenced (a PERSISTED column actually has its value stored, and that value is recalculated only when the row is affected in some way). Even so, it can be calculated as follows:
ALTER TABLE Item DROP COLUMN [AuctionClosed?]; --You can't alter a column to a computed column, so we have to DROP it first
ALTER TABLE Item ADD [AuctionClosed?] AS CASE WHEN CONVERT(datetime, expirationDate) + CONVERT(datetime, expirationTime) <= GETDATE() THEN 1 ELSE 0 END; --1 once the expiration has passed
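With the computed column in place there is no job to run at all - the value is evaluated whenever it is read. As a quick illustration (a sketch, using the column semantics defined above, 1 meaning closed):
SELECT itemId
FROM Item
WHERE [AuctionClosed?] = 0; -- auctions still open, computed at query time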
On a side note, I recommend against special characters in an object's name. Stick to alphanumeric characters only and (if you must) underscores (_), as these don't force the name to be a delimited identifier.
Let's say I have a small DB table (MSSQL) with only two fields, like this:
date (Date)   daily_counter (Int)
-----------   -------------------
2021-07-18    0
2021-07-18    1
2021-07-18    2
2021-07-19    0
I want to insert a new fifth row with the value "2021-07-19" in the date field, and I want to know what daily_counter is for my new row.
As you can perhaps tell from the example, daily_counter is supposed to auto-increase, starting over each day.
So, since there is already a row with that date and the value 0 in daily_counter, I want my new row's daily_counter to be 1 without sending the value 1 in the query.
How should I think when designing this type of table and data? Any hints appreciated. Thanks!
Kr
Gustav
UPDATE:
OK, I think I've got something that could work. The only downside would be when deleting and adding new rows, as the new counter value could be one that was previously used, deleted and added again.
Aside from that, I think I've got something I can use.
It might not be pretty, but it looks like this.
It seems to work even when there is no row yet for the current day.
DECLARE @date DATE;
SET @date = '2021-07-22';

DECLARE @daily_counter INT;
SET @daily_counter = (SELECT MAX(daily_counter) FROM mytable WHERE date = @date);
SET @daily_counter = @daily_counter + 1; -- stays NULL when there is no row for the day yet

IF @daily_counter IS NULL
BEGIN
    SET @daily_counter = 1;
END

INSERT INTO mytable (date, daily_counter)
OUTPUT @daily_counter
VALUES (@date, @daily_counter);
Thanks again for the help!
It's not possible to make the database do this automatically in the row itself; built-in auto-numbering would give you a single counter across all dates (a SEQUENCE would be good for that).
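For illustration, if a single counter shared across all dates were acceptable, a minimal sketch (hypothetical sequence name) would be:
-- one counter shared by every date, not per day
CREATE SEQUENCE dbo.seq_counter AS INT START WITH 1;
INSERT INTO mytable (date, daily_counter)
VALUES ('2021-07-19', NEXT VALUE FOR dbo.seq_counter);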
What you can do is use the row_number() function to simulate this at the point where you query the data:
SELECT *, row_number() over (partition by [date] order by [date])
FROM ...
Unfortunately, this will still fail if you need to preserve the original position following deletes, but there's not a good way to do this right now in a database without triggers or application code.
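As a side note, the two-step version from the question can be collapsed into a single statement, which also closes the window in which a concurrent insert could read the same MAX. A sketch, assuming the same mytable as above:
DECLARE @date DATE = '2021-07-22';
-- compute MAX + 1 and insert in one atomic statement;
-- the locking hints stop a concurrent insert from reading the same MAX
INSERT INTO mytable (date, daily_counter)
SELECT @date, COALESCE(MAX(daily_counter), 0) + 1
FROM mytable WITH (UPDLOCK, HOLDLOCK)
WHERE date = @date;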
I have a table (in SQL Server 2014) including multiple running totals (by different dates) - not an ideal design, but imagine a very large number of rows and users able to pick a specified time period - we don't want to calculate SUMs from the beginning of time just to get the running total up to that period every time.
I am looking for an elegant way to update those running totals when multiple rows are updated.
The actual scenario is an account reconciliation - the table stores money transactions for which we have the event date (e.g. when a thing was sold), the transaction date (e.g. the invoice date) and the payment date (when the invoice was paid). For each of these there is a running total, e.g. (much simplified)
CREATE TABLE MyTransaction (
Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
EventDate DATETIME NOT NULL,
TransactionDate DATETIME,
PaymentDate DATETIME,
Amount INT, -- assume whole numbers for sake of it
RunningTotalByEventDate INT,
RunningTotalByTransactionDate INT,
RunningTotalByPaymentDate INT,
IsCancelled BIT DEFAULT (0)
)
... with indexes on dates as needed, etc., and assume for the sake of the example that the date/times are unique (in practice there are uniquifiers and other stuff).
Inserting a transaction is fine(ish) - the best I have come up with is three separate queries, each updating the running total by the relevant date... or one query with logic... so after inserting a new row (with obviously-named variables passed into a stored proc)...
UPDATE MyTransaction SET RunningTotalByEventDate += @Amount
WHERE EventDate > @EventDate
and so on for the other two running totals, or a single query like...
UPDATE MyTransaction
SET RunningTotalByEventDate += CASE WHEN EventDate > @EventDate THEN @Amount ELSE 0 END,
    RunningTotalByTransactionDate += CASE WHEN TransactionDate > @TransactionDate THEN @Amount ELSE 0 END,
    RunningTotalByPaymentDate += CASE WHEN PaymentDate > @PaymentDate THEN @Amount ELSE 0 END
WHERE EventDate > @EventDate
   OR TransactionDate > @TransactionDate
   OR PaymentDate > @PaymentDate
Now I need to cancel transactions, e.g. an invoice is written off - the requirement is to leave the row in, but remove the effect - so the row stays with its Amount, but the cancelled flag is set and the row has no effect on the running totals. Unfortunately an invoice may have multiple transactions (e.g. several part payments), so there could be several transaction rows to update.
My best option so far for updating the multiple running totals is to loop/cursor around the (expected to be few) updated rows and reduce the subsequent running totals much as we increased them when adding a row - so for each time around the loop we have the three update queries (or one with logic) to update the three running totals.
A single UPDATE won't work, since it will only update a target row once (and if two part payments are being cancelled, we need to update it twice to take off each amount). I've played variously with window functions but cannot see a way to do this neatly with a single set-wise query.
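For reference, a minimal sketch of that per-row loop (assuming the ids to cancel arrive in a table variable @CancelIds) might look like this:
DECLARE @CancelIds TABLE (Id INT PRIMARY KEY); -- filled by the caller
DECLARE @Id INT, @Amount INT,
        @EventDate DATETIME, @TransactionDate DATETIME, @PaymentDate DATETIME;

DECLARE cancel_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT t.Id, t.Amount, t.EventDate, t.TransactionDate, t.PaymentDate
    FROM MyTransaction t
    JOIN @CancelIds c ON c.Id = t.Id
    WHERE t.IsCancelled = 0;

OPEN cancel_cur;
FETCH NEXT FROM cancel_cur INTO @Id, @Amount, @EventDate, @TransactionDate, @PaymentDate;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- take this row's amount back out of all subsequent running totals
    UPDATE MyTransaction
    SET RunningTotalByEventDate -= CASE WHEN EventDate > @EventDate THEN @Amount ELSE 0 END,
        RunningTotalByTransactionDate -= CASE WHEN TransactionDate > @TransactionDate THEN @Amount ELSE 0 END,
        RunningTotalByPaymentDate -= CASE WHEN PaymentDate > @PaymentDate THEN @Amount ELSE 0 END
    WHERE EventDate > @EventDate
       OR TransactionDate > @TransactionDate
       OR PaymentDate > @PaymentDate;

    -- flag the row itself as cancelled
    UPDATE MyTransaction SET IsCancelled = 1 WHERE Id = @Id;

    FETCH NEXT FROM cancel_cur INTO @Id, @Amount, @EventDate, @TransactionDate, @PaymentDate;
END
CLOSE cancel_cur;
DEALLOCATE cancel_cur;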
So given a list of MyTransaction.Id values to cancel (e.g. in a table, table variable or CSV string list), what's the best way to update the various running totals?
Any ideas (and apologies for the rambling question) are very welcome.
I have a trigger on a table for insert, delete and update that, on its first line, gets the current date with GetDate().
The trigger compares the deleted and inserted tables to determine which field has been changed, and stores the id, the datetime and the changed field in another table. This combination must be unique.
A stored procedure does an insert and an update sequentially on the table. Sometimes I get a violation of the primary key, and I suspect that GetDate() returns the same value for both.
How can I make GetDate() return different values in the trigger?
EDIT
Here is the code of the trigger
CREATE TRIGGER dbo.TR
ON table
FOR DELETE, INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @dt DATETIME;
    SELECT @dt = GETDATE();

    INSERT INTO tableLog (id, date, field, old, new)
    SELECT I.id, @dt, 'field', D.field, I.field
    FROM INSERTED I
    LEFT JOIN DELETED D ON I.id = D.id
    WHERE ISNULL(I.field, -1) <> ISNULL(D.field, -1);
END
and the code of the calls
...
INSERT INTO table (anotherfield)
VALUES (@anotherfield)

IF @@ROWCOUNT = 1 SET @ID = @@IDENTITY
...
UPDATE table
SET field = @field
WHERE Id = @ID
...
Sometimes GetDate() differs by 7 milliseconds between the two calls (insert and update), and sometimes it returns the same value.
That's not exactly a full solution, but try using SYSDATETIME instead, and of course make sure the target table can store datetime2 values with enough precision (microseconds and beyond).
Note that you can't force different datetime values regardless of precision (unless you start counting ticks yourself), as things can simply happen at the same time within any given precision.
If stretching to microseconds doesn't solve the issue on a practical level, I think you will have to either redesign this logging schema (perhaps add an identity column on top of what you have) or resort to a dirty trick - like wrapping the insert in a try/catch block and bumping the timestamp by a microsecond (nanosecond?) in a loop until the insert succeeds. Definitely not something I would recommend.
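A minimal sketch of that change (assuming the log table and trigger from the question, and that the date column can be widened without breaking the existing key):
-- store sub-millisecond precision in the log
ALTER TABLE tableLog ALTER COLUMN date datetime2(7);
-- in the trigger, use the high-precision clock instead of GETDATE()
DECLARE @dt datetime2(7);
SELECT @dt = SYSDATETIME();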
Look at this answer: SQL Server: intrigued by GETDATE()
If you are inserting multiple rows, they will all use the same value of GetDate(), so you could try wrapping it in a UDF to get unique values. But as I said, this is just a guess unless you post the code of your trigger so we can see what you are actually doing.
It sounds like you're trying to create an audit trail - but now you want to forge some of the entries?
I'd suggest instead adding a rowversion column to the table and including that in your uniqueness criteria - either instead of or as well as the datetime value that is being recorded.
In this way, even if two rows are inserted with identical date/time data, you can still tell the actual insertion order.
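A sketch of that suggestion (hypothetical constraint name; rowversion values are assigned automatically and are strictly increasing within the database):
ALTER TABLE tableLog ADD RowVer rowversion;
-- include it in the uniqueness criteria so identical datetimes no longer collide
ALTER TABLE tableLog ADD CONSTRAINT UQ_tableLog_id_date_field_rowver
    UNIQUE (id, date, field, RowVer);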
I am trying to constrain a SQL Server Database by a Start Date and End Date such that I can never double book a resource (i.e. no overlapping or duplicate reservations).
Assume my resources are numbered such that the table looks like
ResourceId, StartDate, EndDate, Status
So let's say I have resource #1. I want to make sure that I cannot have a reservation for 1/8/2017 thru 1/16/2017 and a separate reservation for 1/10/2017 thru 1/18/2017 for the same resource.
A couple more complications: a StartDate for a resource can be the same as the EndDate for the resource. So 1/8/2017 thru 1/16/2017 and 1/16/2017 thru 1/20/2017 is OK (i.e., one person can check in on the same day another person checks out).
Furthermore, the Status field indicates whether the booking of the resource is Active or Cancelled. So we can ignore all cancelled reservations.
We have protected against these overlapping or double-booking reservations in code (stored procs and C#) when saving, but we are hoping to add an extra layer of protection by adding a DB constraint.
Is this possible in SQL Server ?
Thanks in Advance
You can use a CHECK constraint to make sure startdate is on or before EndDate easily enough:
CONSTRAINT [CK_Tablename_ValidDates] CHECK ([EndDate] >= [StartDate])
A constraint won't help with preventing an overlapping date range. You can instead use a TRIGGER to enforce this by creating a FOR INSERT, UPDATE trigger that rolls back the transaction if it detects a duplicate:
CREATE TRIGGER [TR_Tablename_NoOverlappingDates] ON [MyTable] FOR INSERT, UPDATE AS
IF EXISTS(SELECT * from inserted INNER JOIN [MyTable] ON blah blah blah ...) BEGIN
ROLLBACK TRANSACTION;
RAISERROR('hey, no overlapping date ranges here, buddy', 16, 1);
RETURN;
END
Another option is to create an indexed view that finds duplicates and put a unique index on that view, which will be violated if more than one record exists. This is usually accomplished with a dummy table of two rows cartesian-joined to an aggregate view that selects the duplicate id - a single duplicate then produces two rows in the view with the same fake id value under the unique index.
I've done both, I like the trigger approach better.
Drawing from this answer here: Date range overlapping check constraint.
First, check to make sure there are not existing overlaps:
select *
from dbo.Reservation as r
where exists (
select 1
from dbo.Reservation i
where i.ResourceId = r.ResourceId
and i.ReservationId != r.ReservationId
and isnull(i.EndDate,'20990101') > r.StartDate
and isnull(r.EndDate,'20990101') > i.StartDate
);
go
If it is all clear, then create your function.
There are a couple of different ways to write the function, e.g. we could skip the StartDate and EndDate and use something based only on ReservationId like the query above, but I will use this as the example:
create function dbo.udf_chk_Overlapping_StartDate_EndDate (
#ResourceId int
, #StartDate date
, #EndDate date
) returns bit as
begin;
declare #r bit = 1;
if not exists (
select 1
from dbo.Reservation as r
where r.ResourceId = #ResourceId
and isnull(#EndDate ,'20991231') > r.StartDate
and isnull(r.EndDate,'20991231') > #StartDate
and r.[Status] = 'Active'
group by r.ResourceId
having count(*)>1
)
set #r = 0;
return #r;
end;
go
Then add your constraint:
alter table dbo.Reservation
add constraint chk_Overlapping_StartDate_EndDate
check (dbo.udf_chk_Overlapping_StartDate_EndDate(ResourceId,StartDate,EndDate)=0);
go
Last: Test it.
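For example, a quick smoke test (hypothetical values, assuming the Reservation columns used above) might be:
-- first booking goes in fine
INSERT INTO dbo.Reservation (ResourceId, StartDate, EndDate, [Status])
VALUES (1, '2017-01-08', '2017-01-16', 'Active');
-- overlaps the booking above, so the CHECK constraint should reject it
INSERT INTO dbo.Reservation (ResourceId, StartDate, EndDate, [Status])
VALUES (1, '2017-01-10', '2017-01-18', 'Active');
-- back-to-back is fine: the start date equals the previous end date
INSERT INTO dbo.Reservation (ResourceId, StartDate, EndDate, [Status])
VALUES (1, '2017-01-16', '2017-01-20', 'Active');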
I have been using partitioning with a postgreSQL database for a while. My database has grown quite a lot and does so nicely with partitioning. Unfortunately I now seem to have hit another barrier in speed and am trying to figure out some ways to speed up the database even more.
My basic setup is as follows:
I have one master table called database_data from which all the partitions inherit. I chose to have one partition per month and name them like: database_data_YYYY_MM which works nicely.
By analyzing my data usage, I noticed that I mostly do insert operations on the table and only some updates. The updates, moreover, occur only on a certain kind of row: I have a column called channel_id (a FK to another table). The rows I update always have a channel_id from a set of maybe 50 IDs, so this would be a great way of distinguishing the rows that are never updated from the ones that potentially are.
I figured it would speed up my setup further if I used the partitioning to have, per month, one table of insert-only data and one of potentially updated data, as my updates would have to check fewer rows each time.
I could of course use the "simple" partitioning I am using now and add another table for each month called database_data_YYYY_MM_update, with the special constraints on it and on the database_data_YYYY_MM table so the query planner can distinguish between the tables.
I was, however, thinking that I sometimes have operations which operate on all data of a given month, updateable or not. In such a case I could combine the two tables, but there could be an easier way for such queries.
So now to my real question:
Is "two layer" partitioning possible in PostgreSQL? What I mean by that is, that instead of having two tables for each month inheriting from the master table, I would only have one table per month directly inheriting from the master table e.g. database_data_YYYY_MM and then have two more tables inheriting from that table, one for the insert only data e.g. database_data_YYYY_MM_insert and one for the updateable data e.g. database_data_YYYY_MM_update.
Would this speed up the query planning at all? I would guess that it would be faster if the query planner could eliminate both tables at once if the intermediate table was eliminated.
The obvious advantage here would be that I could operate on all data of one month by simply using the table database_data_YYYY_MM and for my updates use the child table directly.
Any drawbacks that I am not thinking of?
Thank you for your thoughts.
Edit 1:
I don't think a schema is strictly necessary to answer my question, but if it helps understanding, I'll provide a sample schema:
CREATE TABLE database_data (
id bigint PRIMARY KEY,
channel_id bigint, -- This is a FK to another table
timestamp TIMESTAMP WITH TIME ZONE,
value DOUBLE PRECISION
)
I have a trigger on the database_data table that generates the partitions on demand:
CREATE OR REPLACE FUNCTION function_insert_database_data() RETURNS TRIGGER AS $BODY$
DECLARE
thistablename TEXT;
thisyear INTEGER;
thismonth INTEGER;
nextmonth INTEGER;
nextyear INTEGER;
BEGIN
-- determine year and month of timestamp
thismonth = extract(month from NEW.timestamp AT TIME ZONE 'UTC');
thisyear = extract(year from NEW.timestamp AT TIME ZONE 'UTC');
-- determine next month for timespan in check constraint
nextyear = thisyear;
nextmonth = thismonth + 1;
if (nextmonth >= 13) THEN
nextmonth = nextmonth - 12;
nextyear = nextyear +1;
END IF;
-- Assemble the tablename
thistablename = 'database_data_' || thisyear || '_' || thismonth;
-- We loop until the insert is successful, to catch the case where another connection
-- simultaneously creates the table; if that happens, we can simply retry the insert
LOOP
-- try to insert into table
BEGIN
EXECUTE 'INSERT INTO ' || quote_ident(thistablename) || ' SELECT ($1).*' USING NEW;
-- Return NEW inserts the data into the main table allowing insert statements to return the values like "INSERT INTO ... RETURNING *"
-- This requires us to use another trigger to delete the data again afterwards
RETURN NEW;
-- If the table does not exist, create it
EXCEPTION
WHEN UNDEFINED_TABLE THEN
BEGIN
-- Create table with check constraint on timestamp
EXECUTE 'CREATE TABLE ' || thistablename || ' (CHECK ( timestamp >= TIMESTAMP WITH TIME ZONE '''|| thisyear || '-'|| thismonth ||'-01 00:00:00+00''
AND timestamp < TIMESTAMP WITH TIME ZONE '''|| nextyear || '-'|| nextmonth ||'-01 00:00:00+00'' ), PRIMARY KEY (id)
) INHERITS (database_data)';
-- Add any trigger and indices to the table you might need
-- Insert the new data into the new table
EXECUTE 'INSERT INTO ' || quote_ident(thistablename) || ' SELECT ($1).*' USING NEW;
RETURN NEW;
EXCEPTION WHEN DUPLICATE_TABLE THEN
-- another thread seems to have created the table already. Simply loop again.
END;
-- Don't insert anything on other errors
WHEN OTHERS THEN
RETURN NULL;
END;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER trigger_insert_database_data
BEFORE INSERT ON database_data
FOR EACH ROW EXECUTE PROCEDURE function_insert_database_data();
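For example (made-up values), a row inserted through the master table is routed to the matching monthly partition by this trigger:
-- ends up in database_data_2015_11, which is created on first use
INSERT INTO database_data (id, channel_id, timestamp, value)
VALUES (1, 2, '2015-11-05 12:00:00+00', 3.14);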
As for sample data: let's assume we have only two channels, 1 and 2. Channel 1 is insert-only data and channel 2 is updateable.
My two layer approach would be something like:
Main table:
CREATE TABLE database_data (
id bigint PRIMARY KEY,
channel_id bigint, -- This is a FK to another table
timestamp TIMESTAMP WITH TIME ZONE,
value DOUBLE PRECISION
)
Intermediate table:
CREATE TABLE database_data_2015_11 (
    CHECK ( timestamp >= TIMESTAMP WITH TIME ZONE '2015-11-01 00:00:00+00' AND timestamp < TIMESTAMP WITH TIME ZONE '2015-12-01 00:00:00+00' ),
    PRIMARY KEY (id)
) INHERITS (database_data);
Partitions:
CREATE TABLE database_data_2015_11_insert (
    CHECK (channel_id = 1),
    PRIMARY KEY (id)
) INHERITS (database_data_2015_11);
CREATE TABLE database_data_2015_11_update (
    CHECK (channel_id = 2),
    PRIMARY KEY (id)
) INHERITS (database_data_2015_11);
Of course I would then need another trigger on the intermediate table to create the child tables on demand.
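A minimal routing sketch for that trigger (creation-on-demand omitted; only the two example channels are handled):
CREATE OR REPLACE FUNCTION function_insert_database_data_2015_11() RETURNS TRIGGER AS $BODY$
BEGIN
    -- channel 2 is the updateable data, everything else is insert-only
    IF NEW.channel_id = 2 THEN
        INSERT INTO database_data_2015_11_update VALUES (NEW.*);
    ELSE
        INSERT INTO database_data_2015_11_insert VALUES (NEW.*);
    END IF;
    -- the row is already stored in a child table, so don't store it here as well
    RETURN NULL;
END;
$BODY$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_insert_database_data_2015_11
BEFORE INSERT ON database_data_2015_11
FOR EACH ROW EXECUTE PROCEDURE function_insert_database_data_2015_11();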
It's a clever idea, but sadly it doesn't seem to work. If I have a parent table with 1000 direct children and I run a SELECT that should pull from just one child, EXPLAIN ANALYZE gives me a planning time of around 16 ms. On the other hand, if I have just 10 direct children, and they all have 10 children, and those all have 10 children, I get a planning time of about 29 ms. I was surprised - I really thought it would work!
Here is some Ruby code I used to generate my tables:
0.upto(999) do |i|
if i % 100 == 0
min_group_id = i
max_group_id = min_group_id + 100
puts "CREATE TABLE datapoints_#{i}c (check (group_id > #{min_group_id} and group_id <= #{max_group_id})) inherits (datapoints);"
end
if i % 10 == 0
min_group_id = i
max_group_id = min_group_id + 10
puts "CREATE TABLE datapoints_#{i}x (check (group_id > #{min_group_id} and group_id <= #{max_group_id})) inherits (datapoints_#{i / 100 * 100}c);"
end
puts "CREATE TABLE datapoints_#{i + 1} (check (group_id = #{i + 1})) inherits (datapoints_#{i / 10 * 10}x);"
end