LAG and/or LEAD alternative without window function in PostgreSQL

I'm using PostgreSQL (though a solution in any SQL syntax would be fine). I have a table like this:
employee | department | salary
1 | sales | 30,000
2 | sales | 25,000
3 | marketing | 45,000
4 | marketing | 55,000
so on...
What I want to achieve is:
employee | department | salary | difference
1 | sales | 30,000 | 0/null
2 | sales | 25,000 | 5,000
3 | marketing | 45,000 | 0/null
4 | marketing | 55,000 | 10,000
So technically I want to extract the difference in value between consecutive rows; however, I can't use window functions (I don't know why, but avoiding them is a requirement of this challenge).
In a perfect world, we'd be able to use the lag() or lead() functions partitioned by department name and store the difference in another column, but I don't know how to do it without them.
I tried subqueries in multiple ways, but every time I ended up with NULL or 0 in the new column.

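You can self-join the table, matching each employee with the previous row within the same department. This relies on employee numbers being consecutive within a department (the join condition is t1.employee = t2.employee + 1); rows with no such predecessor get NULL as the difference.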
SELECT t1.employee, t1.department, t1.salary, ABS(t2.salary - t1.salary) AS difference
FROM tab t1
LEFT JOIN tab t2
ON t1.department = t2.department AND t1.employee = t2.employee +1
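If employee numbers are not guaranteed to be consecutive, a correlated subquery that picks the closest smaller employee number in the same department may be more robust. The following is a minimal sketch, not the answerer's method: it runs the SQL through Python's sqlite3 module so that it is self-contained, and the row for employee 7 is a hypothetical addition used only to show a gap in the numbering. The `ORDER BY ... LIMIT 1` form is also accepted by PostgreSQL.

```python
import sqlite3


def salary_differences(rows):
    """Return (employee, department, salary, difference) without window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (employee INTEGER PRIMARY KEY, department TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.employee, t1.department, t1.salary,
               -- previous row = largest employee number below the current one
               -- within the same department; NULL when there is none
               ABS(t1.salary - (
                   SELECT t2.salary
                   FROM tab AS t2
                   WHERE t2.department = t1.department
                     AND t2.employee < t1.employee
                   ORDER BY t2.employee DESC
                   LIMIT 1
               )) AS difference
        FROM tab AS t1
        ORDER BY t1.employee
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_rows = [
    (1, "sales", 30000),
    (2, "sales", 25000),
    (3, "marketing", 45000),
    (4, "marketing", 55000),
    (7, "sales", 28000),  # hypothetical row: employee numbers 5 and 6 are missing
]

for row in salary_differences(sample_rows):
    print(row)
```

With this data, employee 7 is compared with employee 2 (the previous sales row), which the `employee + 1` join would not match.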

Related

SQL Server - deducting from available credit

I have an invoicing solution that uses Azure SQL to store and calculate invoice data. I have been asked to provide 'credit' functionality, so that rather than recovering customers' charges, the totals are deducted from an amount of available credit and reflected in the invoice (solution xyz may have 1,500 worth of charges, but deducting that from available credit of 10,000 means it is effectively zeroed and leaves 8,500 credit remaining). Unfortunately, after several days I haven't been able to work out how to do this.
I am able to get a list of items and their costs from sql easily:
invoice_id | contact_id | solution_id | total | date
202104-015 | 52 | 10000 | 30317.27 | 2021-05-22
202104-015 | 52 | 10001 | 2399.90 | 2021-05-22
202104-015 | 52 | 10005 | 8302.27 | 2021-05-22
202104-015 | 52 | 10060 | 3625.22 | 2021-05-22
202104-015 | 52 | 10111 | 22.87 | 2021-05-22
202104-015 | 52 | 10115 | 435.99 | 2021-05-22
I have another table that shows the credit available for the given contact:
id | credit_id | owner_id | total_applied | date_applied
1 | C00001 | 52 | 500000.00 | 2021-05-14
I have tried using the following SQL statement, based on another stackoverflow question to subtract from the previous row, thinking each row would then reflect the remaining credit:
Select
    invoice_id,
    solution_id,
    sum(total) as 'total',
    cr.total_remaining - coalesce(lag(total) over (order by s.solution_id), 0) as credit_available,
    date
from
    invoices s
left join credits cr on
    cr.credit_id = 'C00001'
Whilst this does subtract, it only subtracts from the row above it, not all of the rows above it:
invoice_id | solution_id | total | credit_available | date
202104-015 | 10000 | 30317.27 | 500000.00 | 2021-05-22
202104-015 | 10001 | 2399.90 | 469682.73 | 2021-05-22
202104-015 | 10005 | 8302.27 | 497600.10 | 2021-05-22
202104-015 | 10060 | 3625.22 | 491697.73 | 2021-05-22
202104-015 | 10111 | 22.87 | 496374.78 | 2021-05-22
202104-015 | 10115 | 435.99 | 499977.13 | 2021-05-22
I've also tried various queries with a mess of CASE statements.
I'm at the point where I am contemplating using PowerShell or similar to do the task instead (loop through each solution, check if there is enough available credit, update a deduction table, go to the next one, etc.), but I'd rather keep it all in SQL if I can.
Does anyone have some pointers for this beginner?
You don't need window functions; use a sub-query that sums the totals of the previous invoices. Be sure to index the table correctly so that performance is not a problem.
There are two sub-queries, one for the previous total sum and another to get the date of the next credit for contact_id.
SELECT [inv].[invoice_id],
       [inv].[solution_id],
       [inv].[total],
       -- subquery that sums the previous totals
       [cr].[total_applied] - COALESCE((
           SELECT SUM([inv_inner].[total])
           FROM [dbo].[invoices] AS [inv_inner]
           WHERE [inv_inner].[solution_id] < [inv].[solution_id]
       ), 0) AS [credit_available],
       [inv].[date]
FROM [dbo].[invoices] [inv]
LEFT JOIN [dbo].[credits] [cr]
    ON [cr].[owner_id] = [inv].[contact_id]
    -- here, we make sure that the credit is available for the correct period
    -- invoice date >= credit date_applied
    AND [inv].[date] >= [cr].[date_applied]
    -- and invoice date < next date_applied or tomorrow, in case there are no next date_applied
    AND [inv].[date] < COALESCE((
        SELECT MIN([cr2].[date_applied])
        FROM [dbo].[credits] [cr2]
        WHERE [cr2].[owner_id] = [cr].[owner_id]
        AND [cr2].[date_applied] > [cr].[date_applied]
    ), GETDATE()+1)
    AND [cr].[credit_id] = 'C00001';
This query works, but it is for this question only. Please study it and adapt to your real world problem.
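To see the running-total sub-query in isolation, here is a minimal sketch using the invoice lines from the question. It makes several assumptions that are not in the original: it runs on SQLite through Python's sqlite3 module rather than on Azure SQL, it stores amounts as integer cents (the hypothetical column `total_cents`) to avoid floating-point rounding, it passes the credit as a parameter instead of joining the credits table, and it adds a `contact_id` condition to the sub-query so that only the same contact's lines are summed.

```python
import sqlite3


def credit_before_each_line(credit_cents, lines):
    """Return (solution_id, credit available before that line), amounts in cents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (contact_id INTEGER, solution_id INTEGER, total_cents INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", lines)
    query = """
        SELECT inv.solution_id,
               -- credit minus the sum of all earlier lines of the same contact
               ? - COALESCE((
                   SELECT SUM(prev.total_cents)
                   FROM invoices AS prev
                   WHERE prev.contact_id = inv.contact_id
                     AND prev.solution_id < inv.solution_id
               ), 0) AS credit_available_cents
        FROM invoices AS inv
        ORDER BY inv.solution_id
    """
    result = conn.execute(query, (credit_cents,)).fetchall()
    conn.close()
    return result


# Invoice lines from the question, converted to cents.
invoice_lines = [
    (52, 10000, 3031727),
    (52, 10001, 239990),
    (52, 10005, 830227),
    (52, 10060, 362522),
    (52, 10111, 2287),
    (52, 10115, 43599),
]

for solution_id, cents in credit_before_each_line(50000000, invoice_lines):
    print(solution_id, f"{cents / 100:.2f}")
```

Changing `<` to `<=` in the sub-query would instead report the credit remaining after each line has been deducted.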
This is a pretty complex scenario. Sadly, I cannot spend the time to offer a complete solution here. I can, however, give you some tips and points of attention:
Be sure to determine the actual remaining credit based on the complete invoice history. If you introduce filtering (in a WHERE-clause, for example, or by including joins with other tables), the results should not be affected by it. You should probably pre-calculate the available credit per invoice detail record in a temporary table or in a CTE and use that data in your main query.
Make sure that you regard the date_applied value of the credit. Before a credit is applied to a customer, that customer should probably have less credit or no credit at all. That should be reflected correctly on historical invoices, I guess.
Make sure you determine the correct amount of total credit. It is unclear from the information provided in your question how that should be determined/calculated. Is only the latest total_applied value from the credits table active? Or should all the historical total_applied values be summed to get the total available credit?
Include a correct join between your invoices table and your credits table. Currently, this join is hard coded in your query.
Also regard actual payments by customers. Payments have effect on the available credit, I assume. Also note that, unless you are OK with a history that changes, you need to regard the payment dates as well (just like the credit change dates).
I'm not sure how you would solve your scenario using PowerShell... I do know for sure, that this can be tackled with SQL.
I cannot say anything about the resulting performance, however. These kinds of calculations surely come with a price tag attached in that regard. If you need high performance, I guess it might be more practical to include columns in your invoices table to physically store the available credit with each invoice detail record.
Edit
I have experimented a little with your scenario and your additional comments.
My solution implementation uses two CTEs:
The first CTE (cte_invoice_credit_dates) retrieves the date of the active credit record for specific invoice IDs.
The second CTE (cte_contact_invoice_summarized_totals) calculates the invoice totals of all the invoices of a specific contact. Since you want to summarize on solution detail per invoice as well, I also included the solution ID per invoice in the querying logic.
The main query selects all columns from the invoices table and uses the data from the two CTEs to calculate three additional columns in the result set:
Column credit_assigned represents the total assigned credit at the invoice's date.
Column summarized_total shows the contact's cumulative invoice total.
Column credit_available shows the remaining credit.
WITH
[cte_invoice_credit_dates] AS (
    SELECT DISTINCT
        I.[invoice_id],
        C.[date_applied]
    FROM
        [invoices] AS I
        OUTER APPLY (SELECT TOP (1) [date_applied]
                     FROM [credits]
                     WHERE
                         [owner_id] = I.[contact_id] AND
                         [date_applied] <= I.[date]
                     ORDER BY [date_applied] DESC) AS C
),
[cte_contact_invoice_summarized_totals] AS (
    SELECT
        I.[contact_id],
        I.[invoice_id],
        I.[solution_id],
        SUM(H.[total]) AS [total]
    FROM
        [invoices] AS I
        INNER JOIN [invoices] AS H ON
            H.[contact_id] = I.[contact_id] AND
            H.[invoice_id] = I.[invoice_id] AND
            H.[solution_id] <= I.[solution_id] AND
            H.[date] <= I.[date]
    GROUP BY
        I.[contact_id],
        I.[invoice_id],
        I.[solution_id]
)
SELECT
    I.[invoice_id],
    I.[contact_id],
    I.[solution_id],
    I.[total],
    I.[date],
    COALESCE(C.[total_applied], 0) AS [credit_assigned],
    H.[total] AS [summarized_total],
    COALESCE(C.[total_applied] - H.[total], 0) AS [credit_available]
FROM
    [invoices] AS I
    INNER JOIN [cte_contact_invoice_summarized_totals] AS H ON
        H.[contact_id] = I.[contact_id] AND
        H.[invoice_id] = I.[invoice_id] AND
        H.[solution_id] = I.[solution_id]
    LEFT JOIN [cte_invoice_credit_dates] AS CD ON
        CD.[invoice_id] = I.[invoice_id]
    LEFT JOIN [credits] AS C ON
        C.[owner_id] = I.[contact_id] AND
        C.[date_applied] = CD.[date_applied]
ORDER BY
    I.[invoice_id],
    I.[solution_id];

MSAccess/SQL lookup table for match field based on sum of current table.field

I've been battling this for the last week with many attempted solutions. I want to return the unique names in a table with the sum of their points and their current dance level based on that sum. Ultimately I want to compare the returned dance level with the one stored against the customer in the customer table and show only the records where the two dance levels are different (the stored dance level and the calculated dance level based on the current sum of the points).
The final solution will be a web page using ADODB connection to MSAccess DB (2013). But for starters just want it to work in MSAccess.
I have a MSAccess DB (2013) with the following tables.
PointsAllocation
CustomerID Points
100 2
101 1
102 1
100 1
101 4
DanceLevel
DLevel Threshold
Beginner 2
Intermediate 4
Advanced 6
Customer
CID Firstname Dancelevel1
100 Bob Beginner
101 Mary Beginner
102 Jacqui Beginner
I want to find the current DLevel for each customer by using the SUM of their Points in the first table. I have this first...
SELECT SUM(Points), CustomerID FROM PointsAllocation GROUP BY CustomerID
Works well and gives me total points per customer. I can then INNER JOIN this to the customer table to get the persons name. Perfect.
Now I want to add the DLevel from the DanceLevel table to the results where the SUM total is used to lookup the Threshold and not exceed the value so I get the following:
(1) (2) (3) (4)
Bob 3 Beginner Intermediate
Mary 5 Beginner Advanced
Where...
(1) Customer.Firstname
(2) SUM(PointsAllocation.Points)
(3) Customer.Dancelevel1
(4) Dancelevel.DLevel
Jacqui is not shown because her SUM of Points is less than or equal to 2, giving her a calculated dance level of Beginner, which already matches her Dancelevel1 in the Customer table.
Any ideas anyone?
You can start from the customer table because you want to list every customer. Then left join it with a subquery that calculates the dance levels and point totals. The innermost subquery totals the points; the enclosing subquery joins those totals to every dance level whose threshold the total does not exceed and selects the minimum of those thresholds. Then left join on the DanceLevel table again on the threshold value to get the level's description.
Select Customer.Firstname,
CustomerDanceLevels.Points,
Customer.Dancelevel1,
Dancelevel.DLevel
from Customer
left join
(select CustomerID, Points, Min(Threshold) Threshold
from
(select CustomerID, sum(Points) Points
from PointsAllocation
group by CustomerID
) PointsTotal
left join DanceLevel
on PointsTotal.Points <= DanceLevel.Threshold
group by CustomerID, Points
) CustomerDanceLevels
on Customer.CID = CustomerDanceLevels.CustomerID
left join DanceLevel
on CustomerDanceLevels.Threshold = DanceLevel.Threshold
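The query above lists every customer; the question also asks to show only the customers whose stored level differs from the calculated one. The sketch below illustrates that extra filter on the question's sample data. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on MS Access, and it uses inner joins plus a scalar subquery instead of the nested left joins, so customers without points, or with more points than the highest threshold, are simply not returned.

```python
import sqlite3


def mismatched_levels():
    """Return customers whose stored dance level differs from the calculated one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PointsAllocation (CustomerID INTEGER, Points INTEGER);
        INSERT INTO PointsAllocation VALUES (100, 2), (101, 1), (102, 1), (100, 1), (101, 4);
        CREATE TABLE DanceLevel (DLevel TEXT, Threshold INTEGER);
        INSERT INTO DanceLevel VALUES ('Beginner', 2), ('Intermediate', 4), ('Advanced', 6);
        CREATE TABLE Customer (CID INTEGER, Firstname TEXT, Dancelevel1 TEXT);
        INSERT INTO Customer VALUES
            (100, 'Bob', 'Beginner'), (101, 'Mary', 'Beginner'), (102, 'Jacqui', 'Beginner');
    """)
    query = """
        SELECT c.Firstname, p.Points, c.Dancelevel1, d.DLevel
        FROM Customer AS c
        JOIN (
            -- total points per customer
            SELECT CustomerID, SUM(Points) AS Points
            FROM PointsAllocation
            GROUP BY CustomerID
        ) AS p ON p.CustomerID = c.CID
        -- calculated level = level with the smallest threshold not exceeded by the total
        JOIN DanceLevel AS d ON d.Threshold = (
            SELECT MIN(t.Threshold) FROM DanceLevel AS t WHERE t.Threshold >= p.Points
        )
        -- keep only customers whose stored level is out of date
        WHERE d.DLevel <> c.Dancelevel1
        ORDER BY c.CID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in mismatched_levels():
    print(row)
```

On the sample data this returns Bob (3 points, Intermediate) and Mary (5 points, Advanced); Jacqui is filtered out because her calculated level is still Beginner.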

SQL Statement to total all employee records

I have a SQL statement that does not return all employee names.
Table employee_list contains all employees for the company.
Table apps contains the employee that is assigned to the app
Table details contains the total dollar amount for the order
My query will not group and total for employees that did not have any apps. For example employee John had 5 apps for $250, Bill had 2 apps for $75 and Henry had 0 apps for $0 (no rows in apps or details table for Henry).
My query returns:
John 5 250.00
Bill 2 75.00
I need it to return
John 5 250.00
Bill 2 75.00
Henry 0 0.00
Any ideas? Here is my current code
SELECT employee_list.Fullname,
count(apps.acntnum),
sum(details.cost)
FROM employee_list
left join apps on employee_list.Fullname=apps.EmployeeName
LEFT JOIN details ON (apps.ID=details.ObjOwner_ID AND details.Active=1)
Group BY
employee_list.Fullname
The important thing is to be using a LEFT JOIN from your employee_list table and any subsequent tables you're joining to, and to not do anything that will filter out NULLs from the right-hand tables (because the NULLs would be for the 'missing' rows).
Your query is fine, but I suspect you're using it in a wider query, where you may inadvertently have an INNER JOIN or mention one of the columns in a WHERE clause.
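The difference between filtering in the ON clause and filtering in the WHERE clause can be reproduced with a small experiment. This is a sketch with made-up rows that match the totals in the question (John 5 apps for 250, Bill 2 apps for 75, Henry none); the account numbers, IDs, individual costs, and the use of SQLite via Python's sqlite3 module are all assumptions. It also wraps the sum in COALESCE, because a plain SUM returns NULL rather than 0 for an employee with no rows.

```python
import sqlite3

LEFT_JOIN_FILTER_IN_ON = """
    SELECT e.Fullname, COUNT(a.acntnum) AS app_count, COALESCE(SUM(d.cost), 0) AS total_cost
    FROM employee_list AS e
    LEFT JOIN apps AS a ON e.Fullname = a.EmployeeName
    LEFT JOIN details AS d ON a.ID = d.ObjOwner_ID AND d.Active = 1
    GROUP BY e.Fullname
    ORDER BY total_cost DESC
"""

# Same query, but the Active filter is moved to WHERE. For Henry, d.Active is NULL,
# so the WHERE clause discards his row and the LEFT JOIN behaves like an INNER JOIN.
LEFT_JOIN_FILTER_IN_WHERE = """
    SELECT e.Fullname, COUNT(a.acntnum) AS app_count, COALESCE(SUM(d.cost), 0) AS total_cost
    FROM employee_list AS e
    LEFT JOIN apps AS a ON e.Fullname = a.EmployeeName
    LEFT JOIN details AS d ON a.ID = d.ObjOwner_ID
    WHERE d.Active = 1
    GROUP BY e.Fullname
    ORDER BY total_cost DESC
"""


def run_both():
    """Run both variants on the same hypothetical data and return their results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee_list (Fullname TEXT);
        CREATE TABLE apps (ID INTEGER, acntnum TEXT, EmployeeName TEXT);
        CREATE TABLE details (ObjOwner_ID INTEGER, cost INTEGER, Active INTEGER);
        INSERT INTO employee_list VALUES ('John'), ('Bill'), ('Henry');
        INSERT INTO apps VALUES
            (1, 'A1', 'John'), (2, 'A2', 'John'), (3, 'A3', 'John'),
            (4, 'A4', 'John'), (5, 'A5', 'John'),
            (6, 'A6', 'Bill'), (7, 'A7', 'Bill');
        INSERT INTO details VALUES
            (1, 50, 1), (2, 50, 1), (3, 50, 1), (4, 50, 1), (5, 50, 1),
            (6, 50, 1), (7, 25, 1);
    """)
    in_on = conn.execute(LEFT_JOIN_FILTER_IN_ON).fetchall()
    in_where = conn.execute(LEFT_JOIN_FILTER_IN_WHERE).fetchall()
    conn.close()
    return in_on, in_where


in_on, in_where = run_both()
print("filter in ON:   ", in_on)
print("filter in WHERE:", in_where)
```

Only the first variant keeps Henry, which is the behaviour described above: any condition on a right-hand table that ends up in WHERE removes the NULL-extended rows.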
I agree with all the other answers, however, you could also try this....
SELECT employee_list.Fullname,
(SELECT count(apps.acntnum) FROM apps WHERE employee_list.Fullname=apps.EmployeeName) AS Cnt,
(SELECT sum(details.cost) FROM apps LEFT JOIN details ON (apps.ID=details.ObjOwner_ID AND details.Active=1) WHERE employee_list.Fullname=apps.EmployeeName) AS cost
FROM employee_list
This will always return the full list of employees, and separately go and count/sum the other values.
This answer does not take performance into account.

MSSQL Comparing rows same table

Hi, I'm looking to compare several rows and check if a certain condition is true/false.
The table has several columns; the ones I'm interested in are:
Events.Badgeno
Events.Name
Events.Date
Events.Time
Events.Region_id
Events.Data
The region ID can either be 1 or 2.
I want to check whether the same badgeno registers with a different region within a specified date/time difference, say 10 minutes (could be 10 minutes before or 10 minutes after).
I'm looking to show the records which don't have a record against the 2 regions.
As a further note, it should only be within the first and last records of that badge per day.
Normally each record should have a region 1 and a region 2 record at the start and end, but there may be multiple region 1 records throughout the day.
Any suggestions for the best method?
Id Date Time Name Badgeid Region
3385033 27/02/2014 08:16:11 FirstName Surname 5304 2
I think something like this would work
SELECT e.Badgeno, e.Name, e.Date, e.Time, e.Region_id, e.Data
FROM events e
LEFT JOIN events e1
    ON e1.Badgeno = e.Badgeno
    AND e1.Region_id <> e.Region_id
    -- a record in the other region less than 10 minutes before or after
    AND DATEDIFF(minute, e1.Date + e1.Time, e.Date + e.Time) > -10
    AND DATEDIFF(minute, e1.Date + e1.Time, e.Date + e.Time) < 10
-- keep only the rows for which no such record exists
WHERE e1.Region_id IS NULL
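The same requirement (records with no counterpart in the other region within 10 minutes) can also be written with NOT EXISTS. The following is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of SQL Server, it stores date and time in one hypothetical `ts` text column, the three sample rows are invented, the 10-minute window is treated as inclusive, and it does not implement the "first and last record per day" restriction from the question.

```python
import sqlite3


def unmatched_events(rows, window_seconds=600):
    """Return events with no event for the same badge in another region within the window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (badgeno INTEGER, ts TEXT, region_id INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    query = """
        SELECT e.badgeno, e.ts, e.region_id
        FROM events AS e
        WHERE NOT EXISTS (
            SELECT 1
            FROM events AS o
            WHERE o.badgeno = e.badgeno
              AND o.region_id <> e.region_id
              -- strftime('%s', ...) yields seconds since the Unix epoch
              AND ABS(CAST(strftime('%s', o.ts) AS INTEGER)
                      - CAST(strftime('%s', e.ts) AS INTEGER)) <= ?
        )
        ORDER BY e.ts
    """
    result = conn.execute(query, (window_seconds,)).fetchall()
    conn.close()
    return result


# Hypothetical sample: the morning pair matches, the 17:00 record has no partner.
sample_events = [
    (5304, "2014-02-27 08:16:11", 2),
    (5304, "2014-02-27 08:18:00", 1),
    (5304, "2014-02-27 17:00:00", 1),
]

print(unmatched_events(sample_events))
```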
You should provide sample data.
This query is not complete, but you can try something with ROW_NUMBER/RANK/DENSE_RANK and PARTITION BY, and then check the generated number column.
select *,
    row_number() over (partition by Badgeno, Region_id order by Badgeno) as rn
from Events
-- where <your date/time condition>

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So let's say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return those records between 'rank' 2 and 4, so my result set is Bob, Sid and Billy. However, rank is not part of the table and should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted, the numbering will be inconsistent, and what if I wanted the results in reverse?
This is probably very simple. The reason I ask is that I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset: if I only return one page's worth of data from the database, this is obviously going to be quicker than returning a complete set of 3000 records and then having to subselect from that set.
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after comment, that rank is not existing in structure:
Select top 100 * from table order by ID;
And if you want to choose subsequent results, you can take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100 order by ID;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings, but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset and then finding the page (then perhaps fabricating a smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to fetch only the number of rows for the page.
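To return only the rows between two ranks, the correlated-subquery ranking can be wrapped in a derived table and filtered. The sketch below shows the idea on the question's five rows; it is run on SQLite through Python's sqlite3 module rather than on the Access engine, the rank is based on first_name as guessed above, and the alias `row_rank` is a hypothetical name chosen to avoid clashing with any built-in rank function.

```python
import sqlite3


def rows_between_ranks(people, low, high):
    """Rank rows by first_name with a correlated subquery, then keep ranks low..high."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (first_name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", people)
    query = """
        SELECT ranked.row_rank, ranked.first_name, ranked.age
        FROM (
            SELECT T1.first_name, T1.age,
                   -- rank = 1 + number of rows that sort before this one
                   (SELECT COUNT(*) + 1
                    FROM MyTable AS T2
                    WHERE T1.first_name > T2.first_name) AS row_rank
            FROM MyTable AS T1
        ) AS ranked
        WHERE ranked.row_rank BETWEEN ? AND ?
        ORDER BY ranked.row_rank
    """
    result = conn.execute(query, (low, high)).fetchall()
    conn.close()
    return result


people = [("Max", 23), ("Bob", 40), ("Sid", 25), ("Billy", 18), ("Sally", 19)]

for row in rows_between_ranks(people, 2, 4):
    print(row)
```

Because the ranking here is alphabetical, ranks 2 to 4 are Bob, Max and Sally rather than the Bob, Sid and Billy of the question, whose ordering rule is not stated.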
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC
If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE first_name = 'Bob'
OR first_name = 'Sid'
OR first_name = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see the Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field, want to read 3 rows, and know the ID of the first row you want to read, then you can use "TOP" to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can query Bob, Sid and Billy with the following query.
SELECT TOP 3 first_name, age
From Table
WHERE ID >= 2
ORDER BY ID
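The same idea extends to paging: remember the last ID of the page just shown and start the next page after it. This is a minimal sketch with assumptions: it runs on SQLite through Python's sqlite3 module, where `LIMIT` stands in for Access's `TOP`, and the table name `people` and the function `fetch_page` are hypothetical.

```python
import sqlite3


def fetch_page(conn, start_id, page_size):
    """Return up to page_size rows whose id is >= start_id, in id order."""
    query = """
        SELECT id, first_name, age
        FROM people
        WHERE id >= ?
        ORDER BY id
        LIMIT ?   -- SQLite stand-in for Access's SELECT TOP n
    """
    return conn.execute(query, (start_id, page_size)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Max", 23), (2, "Bob", 40), (6, "Sid", 25), (8, "Billy", 18), (15, "Sally", 19)],
)

first_page = fetch_page(conn, 2, 3)
# The next page starts just after the last id seen, so gaps in the ids do not matter.
next_page = fetch_page(conn, first_page[-1][0] + 1, 3)
print(first_page)
print(next_page)
```

Each page is located through the primary key, so the database does not have to produce and discard the rows of earlier pages.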
