I have a table (in SQL Server 2014) containing multiple running totals (by different dates) - not an ideal design, but imagine a very large number of rows and users able to pick a specified time period - we don't want to calculate SUMs from the start of time to get the running total up to that period every time.
I am looking for an elegant way to update those running totals when multiple rows are updated.
The actual scenario is an account reconciliation - the table stores money transactions for which we have the event date (e.g. when a thing was sold), the transaction date (e.g. the invoice date) and the payment date (when the invoice was paid). For each of these there is a running total, e.g. (much simplified)
CREATE TABLE MyTransaction (
Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
EventDate DATETIME NOT NULL,
TransactionDate DATETIME,
PaymentDate DATETIME,
Amount INT, -- assume whole numbers for sake of it
RunningTotalByEventDate INT,
RunningTotalByTransactionDate INT,
RunningTotalByPaymentDate INT,
IsCancelled BIT DEFAULT (0)
)
... with indexes on dates as needed, etc. and assume for sake of example that the date/times are unique (in practice there are uniqueifiers and other stuff).
Inserting a transaction is fine(ish) - the best I have come up with is three separate queries, each updating the running total by the relevant date... or one query with logic... so after inserting a new row (with obviously-named variables passed into a stored proc)...
UPDATE MyTransaction SET RunningTotalByEventDate += @Amount
WHERE EventDate > @EventDate
and so on for the other two running totals, or a single query like...
UPDATE MyTransaction
SET RunningTotalByEventDate += CASE WHEN EventDate > @EventDate THEN @Amount ELSE 0 END,
    RunningTotalByTransactionDate += CASE WHEN TransactionDate > @TransactionDate THEN @Amount ELSE 0 END,
    RunningTotalByPaymentDate += CASE WHEN PaymentDate > @PaymentDate THEN @Amount ELSE 0 END
WHERE EventDate > @EventDate
   OR TransactionDate > @TransactionDate
   OR PaymentDate > @PaymentDate
Now I need to cancel transactions, e.g. an invoice is written off - the requirement is to leave the row in, but remove the effect - so the row stays with its Amount, but the cancelled flag is set and the row has no effect on the running totals. Unfortunately an invoice may have multiple transactions (e.g. several part payments), so there could be several transaction rows to update.
My best option so far for updating the multiple running totals is to loop/cursor around the (expected to be few) updated rows and reduce the subsequent running totals much as we increased them when adding a row - so for each time around the loop we have the three update queries (or one with logic) to update the three running totals.
A single UPDATE won't work, since it will only update a target row once (and if two part payments are being cancelled, we need to update it twice to take off each amount). I've played variously with windowed functions but cannot see a way to do this neatly with a single query set-wise.
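For concreteness, the loop looks roughly like this (a sketch, assuming the Ids to cancel arrive in a table variable @CancelIds):

DECLARE @CancelIds TABLE (Id INT PRIMARY KEY); -- populated by the caller

DECLARE @Id INT, @Amount INT,
        @EventDate DATETIME, @TransactionDate DATETIME, @PaymentDate DATETIME;

DECLARE cancel_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT t.Id, t.Amount, t.EventDate, t.TransactionDate, t.PaymentDate
    FROM MyTransaction AS t
    JOIN @CancelIds AS c ON c.Id = t.Id
    WHERE t.IsCancelled = 0; -- skip anything already cancelled

OPEN cancel_cursor;
FETCH NEXT FROM cancel_cursor INTO @Id, @Amount, @EventDate, @TransactionDate, @PaymentDate;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Reverse this row's effect on each running total, mirroring the
    -- single-query insert logic above.
    UPDATE MyTransaction
    SET RunningTotalByEventDate -= CASE WHEN EventDate > @EventDate THEN @Amount ELSE 0 END,
        RunningTotalByTransactionDate -= CASE WHEN TransactionDate > @TransactionDate THEN @Amount ELSE 0 END,
        RunningTotalByPaymentDate -= CASE WHEN PaymentDate > @PaymentDate THEN @Amount ELSE 0 END
    WHERE EventDate > @EventDate
       OR TransactionDate > @TransactionDate
       OR PaymentDate > @PaymentDate;

    UPDATE MyTransaction SET IsCancelled = 1 WHERE Id = @Id;

    FETCH NEXT FROM cancel_cursor INTO @Id, @Amount, @EventDate, @TransactionDate, @PaymentDate;
END

CLOSE cancel_cursor;
DEALLOCATE cancel_cursor;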
So given a list of MyTransaction.Id values to cancel (e.g. in a table, table variable or CSV string list), what's the best way to update the various running totals?
Any ideas (and apologies for the rambling question) are very welcome.
I am trying to set up a SCD of Type 2 for historical records within my Customer table. Attached is how the Customer table is set up alongside the expected outcome. Note that the Customer table in practice has 2 million distinct Customer IDs. I tried to use the query below, but the Start_Date and End_Date are repeating for each row.
SELECT t.Customer_ID, t.Lifecyle_ID, t.Date As Start_Date,
LEAD(t.Date) OVER (ORDER BY t.Date) AS End_Date
FROM Customer AS t
I think a three step query is likely needed.
1. Use LEAD and LAG, partitioned by Customer and ordered by date, to peek at the next row's values for both Date and Lifecycle.
2. Use a CASE statement to emit a value for End Date when the current row's Lifecycle <> the next row's lifecycle (otherwise emit NULL). Now do the same using LAG for the Effective Date.
3. Group By or Distinct on the output from Step #2.
Hopefully that makes sense. I'll try to post a code example later today, but hopefully that's enough to get you started.
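In the meantime, here is a rough sketch of those three steps, assuming the Customer_ID, Lifecyle_ID and Date columns from the question, and that a lifecycle value does not recur for the same customer:

WITH Peeked AS (
    -- Step 1: peek at the neighbouring rows per customer
    SELECT Customer_ID, Lifecyle_ID, [Date],
           LAG(Lifecyle_ID) OVER (PARTITION BY Customer_ID ORDER BY [Date]) AS PrevLifecycle,
           LEAD(Lifecyle_ID) OVER (PARTITION BY Customer_ID ORDER BY [Date]) AS NextLifecycle,
           LEAD([Date]) OVER (PARTITION BY Customer_ID ORDER BY [Date]) AS NextDate
    FROM Customer
), Boundaries AS (
    -- Step 2: emit dates only on the rows where the lifecycle changes
    SELECT Customer_ID, Lifecyle_ID,
           CASE WHEN PrevLifecycle IS NULL OR PrevLifecycle <> Lifecyle_ID
                THEN [Date] END AS Start_Date,
           CASE WHEN NextLifecycle <> Lifecyle_ID
                THEN NextDate END AS End_Date
    FROM Peeked
)
-- Step 3: collapse each lifecycle stretch to a single row
SELECT Customer_ID, Lifecyle_ID,
       MAX(Start_Date) AS Start_Date,
       MAX(End_Date) AS End_Date -- NULL for the current record
FROM Boundaries
GROUP BY Customer_ID, Lifecyle_ID;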
I am developing a real-time auction site for a school project. We can't make any changes to the design of the database.
The 'Item' table has a column for the expiration date (the day the auction expires) and the expiration time (the exact time at which the auction expires). It also has a column that indicates whether the auction is opened or closed. This [AuctionClosed?] column needs to be updated when the expiration date and time are reached, which has to happen in real-time.
We're using an SQL Server database and the website runs on PHP7. The only possible solution I've found is to run a job every second, but this is too much overhead.
This is the function I want to use to check the column:
CREATE FUNCTION dbo.fn_isAuctionClosed (@Item BIGINT)
RETURNS BIT
AS
BEGIN
    DECLARE @expirationDay DATE = (SELECT expirationDate FROM Item WHERE itemId = @Item)
    DECLARE @expirationTime TIME = (SELECT expirationTime FROM Item WHERE itemId = @Item)

    IF (CAST(GETDATE() AS DATE) = @expirationDay AND CAST(GETDATE() AS TIME) >= @expirationTime)
        OR CAST(GETDATE() AS DATE) > @expirationDay
        RETURN 1

    RETURN 0
END
And this is the procedure that updates the column:
CREATE PROCEDURE updateAuctionClosed @Item BIGINT
AS
UPDATE Item
SET [AuctionClosed?] = dbo.fn_isAuctionClosed(@Item)
WHERE itemId = @Item
To be more specific, what you really want here is a computed column. Like I said in the comments, because the column relies on the current date and time, it won't be deterministic. This means it can't be PERSISTED; it will instead be calculated every time the column is referenced (a PERSISTED column actually has its value stored, and it is recalculated only when the row is affected in some way). Even so, it can be defined as follows:
ALTER TABLE Item DROP COLUMN [AuctionClosed?]; -- You can't alter an existing column into a computed column, so we have to DROP it first
ALTER TABLE Item ADD [AuctionClosed?] AS CASE WHEN CONVERT(datetime, expirationDate) + CONVERT(datetime, expirationTime) <= GETDATE() THEN 1 ELSE 0 END; -- 1 once the expiration moment has passed
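Because the value is computed each time it is read, any query against the column reflects the current moment, with no per-second job needed, e.g.:

SELECT itemId
FROM Item
WHERE [AuctionClosed?] = 0; -- still-open auctions, evaluated at query time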
On a side note, I recommend against special characters in an object's name. Stick to alphanumeric characters only, and (if you must) underscores (_), as these don't force the name to be treated as a delimited identifier.
We are logging realtime data every second to a SQL Server database and we want to generate charts from 10 Million rows or more. At the moment we use something like the code below. The goal is to get at least 1000-2000 values to pass into the chart.
In the query below, we average each successive group of n rows, where n depends on how much data we pick out of the LargeTable. This works fine for up to 200,000 selected rows, but beyond that it is way too slow.
SELECT
    AVG(X),
    AVG(Y)
FROM
    (SELECT
         X, Y,
         (Id / @AvgCount) AS [Group]
     FROM
         [LargeTable]
     WHERE
         Timestmp > @From
         AND Timestmp < @Till) j
GROUP BY
    [Group]
ORDER BY
    [Group];
Then we tried selecting only every n'th row from LargeTable and averaging that data to get more performance, but it takes nearly the same time.
SELECT
    X, Y
FROM
    (SELECT
         X, Y,
         ROW_NUMBER() OVER (ORDER BY Id) AS rownr
     FROM
         LargeTable
     WHERE
         Timestmp >= @From
         AND Timestmp <= @Till) a
WHERE
    a.rownr % (@count / 10000) = 0;
It is only pseudo code! We have indexes on all relevant columns.
Are there better and faster ways to get chart data?
I can think of two approaches to improve the performance of the charts:
Trying to improve the performance of the queries.
Reducing the amount of data needed to be read.
It's almost impossible for me to improve the performance of the queries without the full DDL and execution plans, so I suggest reducing the amount of data to be read.
The key is summarizing groups at a given granularity level as the data comes in, and storing the results in a separate table like the following:
CREATE TABLE SummarizedData
(
    GroupId int PRIMARY KEY,
    FromDate datetime,
    ToDate datetime,
    SumX float,
    SumY float,
    GroupCount int
)
GroupId should be equal to Id/100 or Id/1000, depending on how much granularity you want in the groups. With larger groups you get coarser granularity but more efficient charts.
I'm assuming the LargeTable Id column increases monotonically, so you can store the last Id that has been processed in another table called SummaryProcessExecutions.
You would need a stored procedure ExecuteSummaryProcess that:
1. Reads LastProcessedId from SummaryProcessExecutions
2. Reads the last Id in LargeTable and stores it in a @NewLastProcessedId variable
3. Summarizes all rows from LargeTable with Id > @LastProcessedId and Id <= @NewLastProcessedId, storing the results in the SummarizedData table
4. Stores the @NewLastProcessedId variable in the SummaryProcessExecutions table
You can execute ExecuteSummaryProcess stored procedure frequently in a SQL Server Agent Job.
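A minimal sketch of that procedure, assuming Id/1000 groups and a one-row SummaryProcessExecutions table with a LastProcessedId column (both illustrative assumptions):

CREATE PROCEDURE ExecuteSummaryProcess
AS
BEGIN
    DECLARE @LastProcessedId bigint, @NewLastProcessedId bigint;

    -- Steps 1 and 2: previous high-water mark and current last Id
    SELECT @LastProcessedId = LastProcessedId FROM SummaryProcessExecutions;
    SELECT @NewLastProcessedId = MAX(Id) FROM LargeTable;

    -- Round down to a complete group so no group spans two runs
    SET @NewLastProcessedId = (@NewLastProcessedId / 1000) * 1000 - 1;

    -- Step 3: summarize the new rows into SummarizedData
    INSERT INTO SummarizedData (GroupId, FromDate, ToDate, SumX, SumY, GroupCount)
    SELECT Id / 1000, MIN(Timestmp), MAX(Timestmp), SUM(X), SUM(Y), COUNT(*)
    FROM LargeTable
    WHERE Id > @LastProcessedId AND Id <= @NewLastProcessedId
    GROUP BY Id / 1000;

    -- Step 4: advance the high-water mark
    UPDATE SummaryProcessExecutions SET LastProcessedId = @NewLastProcessedId;
END

The chart query then reads SummarizedData instead of LargeTable, computing SumX / GroupCount and SumY / GroupCount per group.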
I believe that grouping by date would be a better choice than grouping by Id. It would simplify things. The SummarizedData GroupId column would not be related to LargeTable Id and you would not need to update SummarizedData rows, you would only need to insert rows.
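For instance, the group key could be derived from the timestamp instead, at an assumed minute grain (the WHERE clause limiting the run to not-yet-processed complete minutes is omitted here):

-- One group per elapsed minute since an arbitrary epoch (grain assumed)
SELECT DATEDIFF(minute, '20000101', Timestmp) AS GroupId,
       MIN(Timestmp) AS FromDate, MAX(Timestmp) AS ToDate,
       SUM(X) AS SumX, SUM(Y) AS SumY, COUNT(*) AS GroupCount
FROM LargeTable
GROUP BY DATEDIFF(minute, '20000101', Timestmp);

Each run would then insert only the minutes that have fully elapsed since the previous run.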
Since the time to scan the table increases with the number of rows in it, I assume there is no index on the Timestmp column. An index like the one below may speed up your query:
CREATE NONCLUSTERED INDEX [IDX_Timestmp] ON [LargeTable](Timestmp) INCLUDE(X, Y, Id)
Please note that creating such an index may take a significant amount of time, and it will impact your inserts too.
I'm working on leave software, and my problem is that I need to reset the leave days to the default number of days (30) after one year. Would you please help me with that?
PS: I'm using VB.NET and SQL Server.
create table Addemployees
(
Fname varchar (500),
Lname varchar (500),
ID int not null identity(1, 1) primary key,
CIN varchar (500),
fromD date,
toD date,
Email varchar(500),
phone varchar(500),
Leave_num int
)
This is the table that contains the column Leave_num, which holds the leave numbers inserted by the user.
update addemployees
set leave_num = 30
As for how you trigger this logic: there are many ways you could go about it. You'll need some sort of scheduler, like an Agent job or whatever else you have at your disposal, to run this process on a recurring, scheduled basis. The key thing is not to keep updating Leave_num if it's already been updated. You could maintain an extra column on each row indicating the last time it was reset. This is probably the simplest approach, but if it's truly an all-or-nothing type thing, and those dates will all be the same, that's sort of a waste of space.
You could then either create a separate table which just contains information about when these once-a-year jobs run, or something like an Extended Property (which is a little more involved to set up).
Whatever solution you choose, just save off the date (or even just the year); then, when your process runs, if the difference since the last update is greater than a year (or if the year of the last update is less than the current year), run your update, and then refresh however you're storing that information, be it columns, a separate table, or an extended property.
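A minimal sketch of the extra-column approach (LastResetDate is an assumed new column, named here for illustration):

ALTER TABLE Addemployees ADD LastResetDate date NULL;

-- Run from a recurring scheduled job (e.g. a daily Agent job).
-- DATEDIFF(year, ...) counts calendar-year boundaries, i.e. the
-- "year of the last update is less than the current year" variant.
UPDATE Addemployees
SET Leave_num = 30,
    LastResetDate = GETDATE()
WHERE LastResetDate IS NULL
   OR DATEDIFF(year, LastResetDate, GETDATE()) >= 1;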
I am not sure how to ask this question, but here goes. I am trying to write a procedure to run each night that checks all unpaid invoices for a business and adds a service charge if needed. I need to query the unpaid invoices, check the datediff() between the creation date and the current date, and then at certain values like 15 or 30 days do several inserts and updates to other tables to add the service charge and update balances. From what I read, a loop is not the way to go, but I am not sure how to keep track of the current invoice or how to do inserts while I am inside a large update statement. Here is some pseudocode of what I need:
select * from invoice where ispaid = 0
set days = currentdate - invoicecreationdate
switch (days)
case 30
insert servicecharge
update invoice
update balance
case 60
insert servicecharge
update invoice
update balance due
case 90
insert servicecharge
update invoice
update balance
I know this isn't much to go on, but I will take any help I can get. I am not sure how this can work without a loop, because I have several statements to run within each case that need to know which invoice we are currently dealing with.
A loop wouldn't be so bad in your case. Each pass through the loop adds 30 days to the "past due" window and it appears that you intend to process all of the applicable rows for each window as a set. That's goodness.
Alternatively, you could use something like this to generate the appropriate date ranges:
declare @Today as Date = GetDate();

select DateAdd( day, -( AgingDays + 30 ), @Today ) as StartDate,
    DateAdd( day, -( AgingDays + 1 ), @Today ) as EndDate, PenaltyPercent
from ( values ( 30, 2 ), ( 60, 5 ), ( 90, 10 ) ) as PastDueIntervals( AgingDays, PenaltyPercent )
It can be easily extended to carry additional data for each range. By JOINing this with your Invoice table you can process all the applicable invoices at once.
Depending on the size of your tables, it may make sense to generate a temporary table that contains the invoice id, the past-due interval, and any other applicable data. That table can then be used to supply the information needed to update all three tables.
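For instance, a sketch of building that temporary table from the date ranges above (the Invoices columns InvoiceId, InvoiceDate and IsPaid are assumed names):

declare @Today as Date = GetDate();

-- Collect unpaid invoices into their past-due window
select I.InvoiceId, R.AgingDays, R.PenaltyPercent
into #PastDueInvoices
from Invoices as I inner join
    ( select DateAdd( day, -( AgingDays + 30 ), @Today ) as StartDate,
             DateAdd( day, -( AgingDays + 1 ), @Today ) as EndDate,
             AgingDays, PenaltyPercent
      from ( values ( 30, 2 ), ( 60, 5 ), ( 90, 10 ) ) as PastDueIntervals( AgingDays, PenaltyPercent ) ) as R
    on I.InvoiceDate between R.StartDate and R.EndDate
where I.IsPaid = 0;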
A useful trick is to include a CASE in UPDATE statements, e.g.:
update I
set WatchList = case when Aging >= 60 then 1 else WatchList end,
...
from Invoices as I inner join
#PastDueInvoices as PDI on PDI.InvoiceId = I.InvoiceId
This will set the watch list flag if the temporary table indicates that the invoice is 60 days or more past due, otherwise leave it unchanged.