I have a table that holds one record per item, with a column for each day of the month, so each row is ITEM, Day1, Day2, Day3, ... I have to run update statements that walk through each row day by day, where the calculation for the current day needs some information from the previous day.
We have required daily quantities. Because the order goes out in boxes (which are a fixed size) and the calculated quantities are in pieces, the system has to round up to the next whole number of boxes. Any "extra quantity" is carried over to the next day to reduce the number of boxes needed.
For example, for ONE of those records in the table described earlier (the box size is 100)
My current code fetches a record, calculates the requirements for one day, moves on to the next day and repeats. I have to do this for each record. It's very inefficient, especially since the records are processed sequentially.
Is there any way to parallelize this on SQL Server Standard? I'm thinking of something like a buffer to which I submit each row as a job, with the system managing the resources and running the queries.
If the buffer idea is not feasible, is there any way to 'chunk' these rows and run the chunks in parallel?
Not sure if this helps, but I played around with your data and was able to calculate the figures without row-by-row handling as such. I transposed the figures with unpivot and calculated the values using running total + lag, so this requires SQL Server 2012 or newer:
declare @BOX int = 100

; with C1 as (
    -- turn the Day1..Day4 columns into one row per day
    SELECT
        Day, Quantity
    FROM
        (SELECT * from Table1 where Type = 'Quantity') T1
    UNPIVOT
        (Quantity FOR Day IN (Day1, Day2, Day3, Day4)) AS up
),
C2 as (
    -- running total of the rounding surplus, modulo the box size
    select Day, Quantity,
        sum(ceiling(convert(numeric(5,2), Quantity) / @BOX) * @BOX - Quantity)
            over (order by Day asc) % @BOX as Extra
    from C1
),
C3 as (
    -- subtract the previous day's surplus from today's quantity
    select
        Day, Quantity,
        Quantity - isnull(Lag(Extra) over (order by Day asc), 0) as Required,
        Extra
    from C2
)
select
    Day, Quantity, Required,
    ceiling(convert(numeric(5,2), Required) / @BOX) as Boxes, Extra
from C3
Example in SQL Fiddle
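To make the carry-over rule concrete, here is a minimal Python sketch of the day-by-day arithmetic that the running total + lag query reproduces. The function name plan_boxes and the daily quantities are made up for illustration; only the box size of 100 comes from the question.

```python
def plan_boxes(quantities, box_size):
    """Return one (required, boxes, extra) tuple per day, carrying surplus forward."""
    plan = []
    extra = 0  # pieces left over from earlier days
    for quantity in quantities:
        required = quantity - extra                # <= 0 when the surplus covers the day
        boxes = max(0, -(-required // box_size))   # ceiling division, never negative
        extra = boxes * box_size - required        # surplus carried to the next day
        plan.append((required, boxes, extra))
    return plan


# Hypothetical daily quantities for one item; box size 100 as in the question.
plan = plan_boxes([150, 120, 90, 40], 100)
for day, (required, boxes, extra) in enumerate(plan, start=1):
    print(f"Day{day}: required={required} boxes={boxes} extra={extra}")
```

Each item's result depends only on that item's own row, which is why a set-based query can handle all items in one statement instead of looping over them.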
I have an invoicing solution that uses Azure SQL to store and calculate invoice data. I have been asked to provide 'credit' functionality: rather than recovering charges from the customer, the totals are deducted from an amount of available credit and reflected in the invoice. For example, solution xyz may have 1,500 worth of charges; deducted from an available credit of 10,000, the charge is effectively zeroed and 8,500 credit remains. Unfortunately, after several days I haven't been able to work out how to do this.
I am able to get a list of items and their costs from sql easily:
invoice_id  contact_id  solution_id  total     date
202104-015  52          10000        30317.27  2021-05-22
202104-015  52          10001        2399.90   2021-05-22
202104-015  52          10005        8302.27   2021-05-22
202104-015  52          10060        3625.22   2021-05-22
202104-015  52          10111        22.87     2021-05-22
202104-015  52          10115        435.99    2021-05-22
I have another table that shows the credit available for the given contact:
id  credit_id  owner_id  total_applied  date_applied
1   C00001     52        500000.00      2021-05-14
I have tried using the following SQL statement, based on another stackoverflow question to subtract from the previous row, thinking each row would then reflect the remaining credit:
Select
    s.invoice_id,
    s.solution_id,
    s.total as 'total',
    -- lag() only looks at the single previous row
    cr.total_remaining - coalesce(lag(s.total) over (order by s.solution_id), 0) as credit_available,
    s.date
from
    invoices s
    left join credits cr on
        cr.credit_id = 'C00001'
Whilst this does subtract, it only subtracts from the row above it, not all of the rows above it:
invoice_id  solution_id  total     credit_available  date
202104-015  10000        30317.27  500000.00         2021-05-22
202104-015  10001        2399.90   469682.73         2021-05-22
202104-015  10005        8302.27   497600.10         2021-05-22
202104-015  10060        3625.22   491697.73         2021-05-22
202104-015  10111        22.87     496374.78         2021-05-22
202104-015  10115        435.99    499977.13         2021-05-22
I've also tried various queries with a mess of case statements.
I'm at the point where I am contemplating using PowerShell or similar to do the task instead (loop through each solution, check if there is enough available credit, update a deduction table, go to the next one, etc.), but I'd rather keep it all in SQL if I can.
Anyone have some pointers for this beginner?
You don't need window functions for this; use a sub-query that sums the totals of the previous invoice rows. Be sure to index the table appropriately so that performance is not a problem.
There are two sub-queries: one for the sum of the previous totals and another to get the date of the next credit for the contact_id.
SELECT [inv].[invoice_id],
       [inv].[solution_id],
       [inv].[total],
       -- subquery that sums the previous totals
       [cr].[total_applied] - COALESCE((
           SELECT SUM([inv_inner].[total])
           FROM [dbo].[invoices] AS [inv_inner]
           WHERE [inv_inner].[solution_id] < [inv].[solution_id]
       ), 0) AS [credit_available],
       [inv].[date]
FROM [dbo].[invoices] [inv]
LEFT JOIN [dbo].[credits] [cr]
    ON [cr].[owner_id] = [inv].[contact_id]
    -- here, we make sure that the credit is available for the correct period
    -- invoice date >= credit date_applied
    AND [inv].[date] >= [cr].[date_applied]
    -- and invoice date < next date_applied or tomorrow, in case there is no next date_applied
    AND [inv].[date] < COALESCE((
        SELECT MIN([cr2].[date_applied])
        FROM [dbo].[credits] [cr2]
        WHERE [cr2].[owner_id] = [cr].[owner_id]
          AND [cr2].[date_applied] > [cr].[date_applied]
    ), GETDATE()+1)
    AND [cr].[credit_id] = 'C00001';
This query works, but it is for this question only. Please study it and adapt to your real world problem.
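For comparison, the "credit remaining before this row" figure can also be written as a windowed running sum whose frame covers all previous rows, not just the one above. The sketch below is not the query from this answer: it runs on an in-memory SQLite database through Python's sqlite3 module so that it is self-contained (window functions need SQLite 3.25 or newer), and it uses the sample rows from the question. The same SUM ... OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) frame is also available in SQL Server 2012 and later, but treat the exact statement as an untested assumption there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (invoice_id TEXT, contact_id INTEGER,
                           solution_id INTEGER, total REAL, date TEXT);
    CREATE TABLE credits (credit_id TEXT, owner_id INTEGER,
                          total_applied REAL, date_applied TEXT);
    INSERT INTO invoices VALUES
        ('202104-015', 52, 10000, 30317.27, '2021-05-22'),
        ('202104-015', 52, 10001,  2399.90, '2021-05-22'),
        ('202104-015', 52, 10005,  8302.27, '2021-05-22'),
        ('202104-015', 52, 10060,  3625.22, '2021-05-22'),
        ('202104-015', 52, 10111,    22.87, '2021-05-22'),
        ('202104-015', 52, 10115,   435.99, '2021-05-22');
    INSERT INTO credits VALUES ('C00001', 52, 500000.00, '2021-05-14');
""")

# The frame ends one row before the current row, so each row shows the
# credit that is still available before its own total is deducted.
rows = conn.execute("""
    SELECT inv.solution_id,
           inv.total,
           cr.total_applied - COALESCE(SUM(inv.total) OVER (
               PARTITION BY inv.contact_id
               ORDER BY inv.solution_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) AS credit_available
    FROM invoices AS inv
    JOIN credits AS cr ON cr.owner_id = inv.contact_id
    ORDER BY inv.solution_id
""").fetchall()

available = [round(r[2], 2) for r in rows]
for (solution_id, total, _), credit in zip(rows, available):
    print(solution_id, total, credit)
```

Unlike LAG, which only sees the single previous row, the frame accumulates every earlier row, which is the behaviour the question asks for.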
This is a pretty complex scenario. Sadly, I cannot spend the time to offer a complete solution here, but I can give you some tips and points of attention:
Be sure to determine the actual remaining credit based on the complete invoice history. If you introduce filtering (in a WHERE-clause, for example, or by including joins with other tables), the results should not be affected by it. You should probably pre-calculate the available credit per invoice detail record in a temporary table or in a CTE and use that data in your main query.
Make sure that you regard the date_applied value of the credit. Before a credit is applied to a customer, that customer should probably have less credit or no credit at all. That should be reflected correctly on historical invoices, I guess.
Make sure you determine the correct amount of total credit. It is unclear from the information provided in your question how that should be determined/calculated. Is only the latest total_applied value from the credits table active? Or should all the historical total_applied values be summed to get the total available credit?
Include a correct join between your invoices table and your credits table. Currently, this join is hard coded in your query.
Also regard actual payments by customers. Payments have effect on the available credit, I assume. Also note that, unless you are OK with a history that changes, you need to regard the payment dates as well (just like the credit change dates).
I'm not sure how you would solve your scenario using PowerShell... I do know for sure, that this can be tackled with SQL.
I cannot say anything about the resulting performance, however. These kinds of calculations surely come with a price tag attached in that regard. If you need high performance, I guess it might be more practical to include columns in your invoices table to physically store the available credit with each invoice detail record.
Edit
I have experimented a little with your scenario and your additional comments.
My solution implementation uses two CTEs:
The first CTE (cte_invoice_credit_dates) retrieves the date of the active credit record for specific invoice IDs.
The second CTE (cte_contact_invoice_summarized_totals) calculates the invoice totals of all the invoices of a specific contact. Since you want to summarize on solution detail per invoice as well, I also included the solution ID per invoice in the querying logic.
The main query selects all columns from the invoices table and uses the data from the two CTEs to calculate three additional columns in the result set:
Column credit_assigned represents the total assigned credit at the invoice's date.
Column summarized_total shows the contact's cumulative invoice total.
Column credit_available shows the remaining credit.
WITH
    [cte_invoice_credit_dates] AS (
        SELECT DISTINCT
            I.[invoice_id],
            C.[date_applied]
        FROM
            [invoices] AS I
            OUTER APPLY (SELECT TOP (1) [date_applied]
                         FROM [credits]
                         WHERE
                             [owner_id] = I.[contact_id] AND
                             [date_applied] <= I.[date]
                         ORDER BY [date_applied] DESC) AS C
    ),
    [cte_contact_invoice_summarized_totals] AS (
        SELECT
            I.[contact_id],
            I.[invoice_id],
            I.[solution_id],
            SUM(H.[total]) AS [total]
        FROM
            [invoices] AS I
            INNER JOIN [invoices] AS H ON
                H.[contact_id] = I.[contact_id] AND
                H.[invoice_id] = I.[invoice_id] AND
                H.[solution_id] <= I.[solution_id] AND
                H.[date] <= I.[date]
        GROUP BY
            I.[contact_id],
            I.[invoice_id],
            I.[solution_id]
    )
SELECT
    I.[invoice_id],
    I.[contact_id],
    I.[solution_id],
    I.[total],
    I.[date],
    COALESCE(C.[total_applied], 0) AS [credit_assigned],
    H.[total] AS [summarized_total],
    COALESCE(C.[total_applied] - H.[total], 0) AS [credit_available]
FROM
    [invoices] AS I
    INNER JOIN [cte_contact_invoice_summarized_totals] AS H ON
        H.[contact_id] = I.[contact_id] AND
        H.[invoice_id] = I.[invoice_id] AND
        H.[solution_id] = I.[solution_id]
    LEFT JOIN [cte_invoice_credit_dates] AS CD ON
        CD.[invoice_id] = I.[invoice_id]
    LEFT JOIN [credits] AS C ON
        C.[owner_id] = I.[contact_id] AND
        C.[date_applied] = CD.[date_applied]
ORDER BY
    I.[invoice_id],
    I.[solution_id];
I have a table (call it "DimMonth") that I often want to select a subset of successive rows from by some numeric column (say "Month"). I always specify the min / max row in this subset, as well as the number of rows in the subset. DimMonth.Month is an integer that represents year and month (in format YYYYMM), with values like 202001, 202012, 202103, etc. There are no keys or indexes defined for the table (although, Month is a foreign key to other tables). What is the best way to go about selecting this subset of rows?
For example, say @month = 202103, and that I want to select it and the 3 months before it. So, I expect a result like:
202103
202102
202101
202012
As far as I know, due to order of execution, even though the following solution works sometimes, I can't rely upon it to work all the time:
SELECT TOP 4
    Month
FROM
    dbo.DimMonth
WHERE
    Month <= @month
ORDER BY
    Month DESC
....since SELECT is executed before ORDER BY.
A solution which I know works but is tedious to write every time (and is costly for the CTE, since the result set will grow over time) is:
WITH
all_months_before_desired_month AS
(
    SELECT
        Month,
        ROW_NUMBER() OVER(
            ORDER BY
                Month DESC
        ) AS RowNum
    FROM
        dbo.DimMonth
    WHERE
        Month <= @month
)
SELECT
    Month
FROM
    all_months_before_desired_month
WHERE
    RowNum BETWEEN 1 AND 4
;
I think the right answer here is to define a key or an index (so that I can use my first solution, but without the ORDER BY), but I'm not sure.
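TOP is applied after ORDER BY, so your first query is reliable: the rows are filtered by WHERE, sorted by Month DESC, and only then are the first 4 kept.
It does not matter that ORDER BY is written at the end of the statement.
If you have no ORDER BY, the rows come back in an unpredictable order that is also subject to change, so TOP without ORDER BY returns an arbitrary set of rows. Adding a key or an index does not change that.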
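A quick way to convince yourself is to load the months in a scrambled order and check that the row limit is still applied to the sorted rows. This is only a sketch on an in-memory SQLite database via Python's sqlite3 module, where LIMIT 4 stands in for SQL Server's TOP 4; the month values are made up for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key or index, as in the question.
conn.execute("CREATE TABLE DimMonth (Month INTEGER)")

# Deliberately inserted out of order.
months = [202012, 202104, 202101, 202011, 202103, 202102, 202010]
conn.executemany("INSERT INTO DimMonth VALUES (?)", [(m,) for m in months])

# WHERE filters first, ORDER BY sorts, and only then is the row limit applied.
rows = conn.execute(
    "SELECT Month FROM DimMonth WHERE Month <= ? ORDER BY Month DESC LIMIT 4",
    (202103,),
).fetchall()
result = [r[0] for r in rows]
print(result)
```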
I am trying to create a report for labour effectiveness in a manufacturing business that links to 2 x distinct MS SQL databases.
"Database A" contains information about what employees were doing when they were on shift.
"Database B" contains information about what time an employee clocked in or out for payroll info.
One of the comparisons I want to report on, is the total time an employee was logged into a job vs the total time they were clocked in the building. The data is related on employee number. The main report is linked to database A and is grouped on "Employee Name", within the group footer, there is a sub report linked to Database B and the employee name is passed through as a parameter.
My problem is the way database B records clocking in. I am using a SQL command to collect the data:
SELECT
tws.date_and_time
, CONVERT(date, tws.date_and_time) AS 'Date'
,te.first_name
, te.last_name
, concat(te.first_name,' ', te.last_name) AS 'ConcatName'
, CASE WHEN tws.flag IN (1,3) THEN 1 ELSE 0 END as manual_adjustment
, ROW_NUMBER() OVER (PARTITION BY tw.date_and_time ORDER BY tws.date_and_time ASC) AS swipe_number
, ROW_NUMBER() OVER (PARTITION BY tw.date_and_time ORDER BY tws.date_and_time ASC) % 2 AS in_swipe
FROM temployee te
INNER JOIN twork tw
ON te.employee_id = tw.employee_id AND tw.type = 1000
INNER JOIN twork_swipe tws
on tw.work_id = tws.work_id
The "swipe_number" details what number swipe that record is in the time period (e.g. 1st, 2nd, 3rd etc.)
The "in_swipe" displays 1 if this is the employee clocking in or 0 if it is the employee clocking out.
I am grouping the sub report on date.
This is relatively straightforward if an employee clocks in and out once on the same day, but I am struggling to work out how to handle an employee who clocks in and out multiple times during a shift (for a break, for example) or who clocks in on one day and out on another (a night shift, for example).
I need to sum the total time an employee is clocked in, so I need to evaluate if the 1st clock of the day is an "in swipe", (swipe_number = 1 AND in_swipe = 1) if it is not, it should not be recorded as the difference between swipe_number 2 and 1, it should be from the start of the day (00:00:0000) to swipe_number = 1 as this indicates the employee has been there since midnight.
Likewise if the last (or only) "swipe_number" of the day is an "in swipe", the time should be recorded as between that time and 23:59:5999.
Outside of this, I need to find the time between the date time fields where swipe numbers = 2 & 1, 4 & 3, 6 & 5 etc. (no fixed number of times a swipe can occur).
Can this be handled dynamically in formulas?
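To pin down the rule described above, here is a small Python sketch of the per-day calculation. It is not a report formula; the function name clocked_in_time and the sample swipes are hypothetical, and it assumes the swipes for one employee and one calendar day are already sorted and flagged as in or out. It also counts an open shift up to midnight of the next day rather than 23:59:59.

```python
from datetime import datetime, timedelta


def clocked_in_time(swipes):
    """Total time clocked in for one employee on one calendar day.

    swipes: list of (timestamp, is_in) tuples sorted by timestamp.
    """
    if not swipes:
        return timedelta(0)
    day_start = swipes[0][0].replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = day_start + timedelta(days=1)

    total = timedelta(0)
    # A leading "out" swipe means the employee has been in since midnight.
    open_since = None if swipes[0][1] else day_start
    for timestamp, is_in in swipes:
        if is_in:
            if open_since is None:
                open_since = timestamp
        elif open_since is not None:
            total += timestamp - open_since
            open_since = None
    # A trailing "in" swipe means the employee stays until the end of the day.
    if open_since is not None:
        total += day_end - open_since
    return total


day_shift = [
    (datetime(2021, 3, 1, 8, 0), True),
    (datetime(2021, 3, 1, 12, 0), False),
    (datetime(2021, 3, 1, 12, 30), True),
    (datetime(2021, 3, 1, 17, 0), False),
]
night_shift = [
    (datetime(2021, 3, 2, 6, 0), False),  # end of the shift started the day before
    (datetime(2021, 3, 2, 22, 0), True),  # start of the next night shift
]
print(clocked_in_time(day_shift), clocked_in_time(night_shift))
```

Because the pairs are matched while walking the swipes in order, the number of swipes per day does not need to be known in advance.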
I have a booking system that allows a user to book places for 30 min timeslots (e.g. 1pm, 1:30pm, 2pm etc...)
In the sql database I may have one booking for 10am, a booking for 1pm and two for 2pm. I am trying to display a view of all 30 min booking slots in between a date time range displaying number of current bookings for each slot.
I am not storing each slot explicitly as it's not very efficient. Is there a way to make sql return 'empty' timeslots in a single query? I don't want to create a timeslot array then query each timeslot individually for the total count of bookings.
I am using sql server and asp.net mvc6 as my technology base. Some suggestions on technique would be appreciated.
Thanks.
You need to build a table of 30-minute intervals and left join it to your bookings table to get all time slots.
This query generates 30-minute interval times starting from startDate; 12 time slots are generated in total, which you can modify as needed.
declare @startDate datetime = '2014-01-12 12:00:00'
;with cte(value, nextval, n)
as
(
    select CONVERT(VARCHAR(5), @startDate, 108) as value,
        dateadd(minute, datediff(minute, 0, @startDate) + 30, 0) as nextval, 1 as n
    union all
    select CONVERT(VARCHAR(5), cte.nextval, 108) as value,
        dateadd(minute, datediff(minute, 0, cte.nextval) + 30, 0) as nextval, n + 1
    from cte
    where n < 12 -- the anchor row is slot 1, so this stops at 12 slots
)
select * from cte
left join Table1
    on cte.nextval = Table1.timeslotvalue
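The same idea, extended to return a booking count per slot (including the empty ones), can be sketched as follows. This runs on an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the table name bookings, the column slot_start and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, slot_start TEXT)")
conn.executemany(
    "INSERT INTO bookings (slot_start) VALUES (?)",
    [
        ("2014-01-12 10:00:00",),
        ("2014-01-12 13:00:00",),
        ("2014-01-12 14:00:00",),
        ("2014-01-12 14:00:00",),
    ],
)

# The recursive CTE generates every 30-minute slot in [start, end);
# the LEFT JOIN keeps slots that have no booking, and COUNT(b.id) yields 0 for them.
query = """
WITH RECURSIVE slots(slot_start) AS (
    SELECT :start
    UNION ALL
    SELECT datetime(slot_start, '+30 minutes')
    FROM slots
    WHERE datetime(slot_start, '+30 minutes') < :end
)
SELECT s.slot_start, COUNT(b.id) AS bookings
FROM slots AS s
LEFT JOIN bookings AS b ON b.slot_start = s.slot_start
GROUP BY s.slot_start
ORDER BY s.slot_start
"""
rows = conn.execute(
    query, {"start": "2014-01-12 10:00:00", "end": "2014-01-12 15:00:00"}
).fetchall()
for slot_start, count in rows:
    print(slot_start, count)
```

Counting b.id instead of * is what makes the unbooked slots come out as 0 rather than 1.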
We have a table of transactions which is structured like the following :
TranxID int (PK and Identity field)
ItemID int
TranxDate datetime
TranxAmt money
TranxAmt can be positive or negative, so the running total of this field (for any ItemID) will go up and down as time goes by. Getting the current total is obviously simple, but what I'm after is a performant way of getting the highest value of the running total and the TranxDate when this occurred. Note that TranxDate is not unique, and due to some backdating the ID field is not necessarily in the same sequence as TranxDate for a given Item.
Currently we're doing something like this (@tblTranx is a table variable containing just the transactions for a given Item):
SELECT Top 1 @HighestTotal = z.TotalToDate, @DateHighest = z.TranxDate
FROM
    (SELECT a.TranxDate, a.TranxID, Sum(b.TranxAmt) AS TotalToDate
     FROM @tblTranx AS a
     INNER JOIN @tblTranx AS b ON a.TranxDate >= b.TranxDate
     GROUP BY a.TranxDate, a.TranxID) AS z
ORDER BY z.TotalToDate DESC
(The TranxID grouping removes the issue caused by duplicate date values)
This, for one Item, gives us the HighestTotal and the TranxDate when this occurred. Rather than run this on the fly for tens of thousands of entries, we only calculate this value when the app updates the relevant entry and record the value in another table for use in reporting.
The question is, can this be done in a better way so that we can work out these values on the fly (for multiple items at once) without falling into the RBAR trap (some ItemIDs have hundreds of entries). If so, could this then be adapted to get the highest values of subsets of transactions (based on a TransactionTypeID not included above). I'm currently doing this with SQL Server 2000, but SQL Server 2008 will be taking over soon here so any SQL Server tricks can be used.
SQL Server sucks in calculating running totals.
Here's a solution for your very query (which groups by dates):
WITH q AS
(
    SELECT TranxDate, SUM(TranxAmt) AS TranxSum
    FROM t_transaction
    GROUP BY
        TranxDate
),
m (TranxDate, TranxSum) AS
(
    SELECT MIN(TranxDate), SUM(TranxAmt)
    FROM (
        SELECT TOP 1 WITH TIES *
        FROM t_transaction
        ORDER BY
            TranxDate
    ) q
    UNION ALL
    SELECT DATEADD(day, 1, m.TranxDate),
        m.TranxSum + q.TranxSum
    FROM m
    CROSS APPLY
    (
        SELECT TranxSum
        FROM q
        WHERE q.TranxDate = DATEADD(day, 1, m.TranxDate)
    ) q
    WHERE m.TranxDate <= GETDATE()
)
SELECT TOP 1 *
FROM m
ORDER BY
    TranxSum DESC
OPTION (MAXRECURSION 0)
You need to have an index on TranxDate for this to work fast.
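For readers on newer versions: SQL Server 2012 and later support an ordered SUM ... OVER, which is not available in the SQL Server 2000/2008 versions mentioned in the question. With it, the highest running total per item can be found for all items at once. The sketch below is an alternative to the recursive CTE above, not a rewrite of it; it runs on an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25 or newer), and the sample transactions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_transaction (TranxID INTEGER PRIMARY KEY, ItemID INTEGER, "
    "TranxDate TEXT, TranxAmt INTEGER)"
)
conn.executemany(
    "INSERT INTO t_transaction (ItemID, TranxDate, TranxAmt) VALUES (?, ?, ?)",
    [
        (1, "2009-01-01", 100),
        (1, "2009-01-02", -30),
        (1, "2009-01-02", 80),   # duplicate date for the same item
        (1, "2009-01-03", -200),
        (2, "2009-01-01", 10),
        (2, "2009-01-05", -5),
    ],
)

# The default window frame includes all rows with the same TranxDate, so
# transactions on one date share a single end-of-day running total.
rows = conn.execute("""
    WITH running AS (
        SELECT ItemID, TranxDate,
               SUM(TranxAmt) OVER (PARTITION BY ItemID
                                   ORDER BY TranxDate) AS TotalToDate
        FROM t_transaction
    ),
    ranked AS (
        SELECT ItemID, TranxDate, TotalToDate,
               ROW_NUMBER() OVER (PARTITION BY ItemID
                                  ORDER BY TotalToDate DESC, TranxDate) AS rn
        FROM running
    )
    SELECT ItemID, TranxDate, TotalToDate
    FROM ranked
    WHERE rn = 1
    ORDER BY ItemID
""").fetchall()
print(rows)
```

Ties on the highest total are broken by the earliest date here; that choice is an assumption, since the question does not say which date should win.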