SUM of Column based on Distinct value of other Column

I have a job workflow table in SQL Server that stores the task start_time and end_time for each page, along with the total points for that page.
The start_time and end_time for a page can be duplicated on a particular date, i.e. the employee may have started and stopped the task on that page many times.
My query:
SELECT DISTINCT tu.user_id, tjd.job_id, tjd.job_case_no,
ISNULL(tu.first_name, tu.user_name) AS Name,
COUNT(tjw.total_pages) AS total_pages,
tjd.job_status_id_fk, tjd.drawing_type_id_fk,
SUM(tjw.total_points) AS Points
FROM dbo.tbl_job_workflow tjw
LEFT JOIN dbo.tbl_user tu ON tu.user_id = tjw.user_id_fk
LEFT JOIN dbo.tbl_job_details tjd ON job_id=job_id_fk
WHERE isnull(tjd.job_case_no,'')<>'' AND tjw.start_time>='2016-06-28' AND tjw.end_time<='2016-06-28'
GROUP BY tjd.job_case_no,tu.first_name,tu.user_name,tu.user_id,tjd.job_id,job_status_id_fk,tjd.drawing_type_id_fk
Sample Output
user_id job_id job_case_no Name total_pages Points
4 298 Testcase_17062016_0244PM Emp1 1 6
4 346 TestCase-01 Emp1 2 4
27 346 TestCase-01 Emp2 11 11
27 350 5435435 Emp2 1 1
4 350 5435435 Emp1 5 5
In the above output, for case TestCase-01 and employee Emp2, the points should be 10, not 11.
I need the sum of the points of the distinct pages worked on in a day.
For example, if a job has 4 pages and page 1, worth 2 points, was worked on twice on the same day, then the sum of the points for that day should be 2, not 4.
Could anyone guide me on how to get the sum under this condition?

I got the sum by passing the variables to a function:
;WITH CTE_Table
AS
(
SELECT DISTINCT tjw.total_points,tjw.total_pages
FROM dbo.tbl_job_workflow tjw WHERE user_id_fk=@user_id AND job_id_fk=@job_id
)
SELECT SUM(total_points) FROM CTE_Table
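The same de-duplication can also be done for all users and jobs in one set-based query instead of one function call per row. The following is only a sketch: the table variable, the page_no column, and the sample rows are hypothetical and invented for illustration, since the question does not show the real column that identifies a page.

```sql
-- Hypothetical stand-in for dbo.tbl_job_workflow; page_no is an assumed column.
DECLARE @job_workflow TABLE (
    user_id_fk INT, job_id_fk INT, page_no INT,
    start_time DATETIME, end_time DATETIME, total_points INT);

-- Made-up rows: page 1 is worked on twice on the same day.
INSERT INTO @job_workflow VALUES
(27, 346, 1, '2016-06-28T09:00:00', '2016-06-28T09:30:00', 2),
(27, 346, 1, '2016-06-28T11:00:00', '2016-06-28T11:20:00', 2),
(27, 346, 2, '2016-06-28T12:00:00', '2016-06-28T12:40:00', 3);

WITH distinct_pages AS (
    -- Collapse repeated start/stop rows to one row per user, job, day and page.
    SELECT DISTINCT user_id_fk, job_id_fk,
           CAST(start_time AS DATE) AS work_date,
           page_no, total_points
    FROM @job_workflow
)
SELECT user_id_fk, job_id_fk, work_date,
       COUNT(*)          AS total_pages,
       SUM(total_points) AS Points
FROM distinct_pages
GROUP BY user_id_fk, job_id_fk, work_date;
```

With these sample rows the query returns a single row with 2 pages and 5 points, because the second visit to page 1 is not counted again. This relies on a page always carrying the same total_points value.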

Related

Finding A Time When A Value Changed

I am still learning many new things about SQL such as PARTITION BY and CTEs. I am currently working on a query which I have cobbled together from a similar question I found online. However, I cannot seem to get it to work as intended.
The problem is as follows: I have been tasked with showing rank promotions in an organization from the beginning of 2022 to today. I am working with 2 primary tables, an EMPLOYEES table and a PERIODS table. The PERIODS table captures a snapshot of every employee each month, including their rank at the time. Each of these months is also assigned a PeriodID (e.g. Jan 2022 = PeriodID 131). The EMPLOYEES table holds the employee's current rank. Ranks are stored as an int (e.g. 1, 2, 3, with 1 being the lowest rank). It is possible for an employee to rank up more than once in any given month.
I have simplified the used query as much as I can for the sake of this problem. Query follows as:
;WITH x AS
(
SELECT
e.EmployeeID, p.PeriodID, p.RankID,
rn = ROW_NUMBER() OVER (PARTITION BY e.EmployeeID ORDER BY p.PeriodID DESC)
FROM employees e
LEFT JOIN periods p on p.EmployeeID= e.EmployeeID
WHERE p.PeriodID <= 131 AND p.PeriodID >=118 --This is the time range mentioned above
),
rest AS (SELECT * FROM x WHERE rn > 1)
SELECT
main.EmployeeID,
PeriodID = MIN(
CASE
WHEN main.CurrentRankID = Rest.RankID
THEN rest.PeriodID ELSE main.PeriodID
END),
main.RankID, rest.RankID
FROM x AS main LEFT OUTER JOIN rest ON main.EmployeeID = rest.EmployeeID
AND rest.rn >1
LEFT JOIN periods p on p.EmployeeID = e.EmployeeID
WHERE main.rn = 1
AND NOT EXISTS
(
SELECT 1 FROM rest AS rest2
WHERE EmployeeID = rest.EmployeeID
AND rn < rest.rn
AND main.RankID <> rest.RankID
)
and p.PeriodID <= 131 AND p.PeriodID >=118
GROUP BY main.EmployeeID, main.PeriodID, main.RankID, rest.RankID
As mentioned before, this query was borrowed from a similar question and modified for my own use. I imagine the bones of the query are good and maybe I have messed up a variable somewhere, but I cannot seem to locate the problem line. The end goal is for the query to return a table showing the EmployeeID, the PeriodID, the rank they are being promoted from, and the rank they are being promoted to in the month the promotion was earned. Similar to the below.
EmployeeID  PeriodID  PreviousRankID  NewRank
123         131       1               2
123         133       2               3
Instead, my query is spitting out repeating previous/current ranks and the PeriodIDs seem to be static (such as what is shown below).
EmployeeID  PeriodID  PreviousRankID  NewRank
123         131       1               1
123         131       1               1
I am hoping someone with a greater knowledge base on these functions is able to quickly notice my mistake.

If we assume some example DML/DDL (it's really helpful to provide this with your question):
DECLARE @Employees TABLE (EmployeeID INT IDENTITY, Name VARCHAR(20), RankID INT);
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Employees (Name, RankID) VALUES ('Jonathan', 10),('Christopher', 10),('James', 10),('Jean-Luc', 8);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
(1,1,1),(2,1,1),(3,1,1),(4,1,8 ),(5,1,10),(6,1,10),
(1,2,1),(2,2,1),(3,2,1),(4,2,8 ),(5,2,8 ),(6,2,10),
(1,3,1),(2,3,1),(3,3,7),(4,3,10),(5,3,10),(6,3,10),
(1,4,1),(2,4,1),(3,4,1),(4,4,8 ),(5,4,9 ),(6,4,8 );
Then we can accomplish what I think you're looking for using an OUTER APPLY that aggregates the values based on the current-row values:
SELECT e.EmployeeID, e.Name, e.RankID AS CurrentRank, ap.PeriodID AS ThisPeriod,
p.PeriodID AS LastRankChangePeriodID, p.RankID AS LastRankChangedFrom,
ap.RankID - p.RankID AS LastRankChanged
FROM @Employees e
LEFT OUTER JOIN @Periods ap
ON e.EmployeeID = ap.EmployeeID
OUTER APPLY (
-- Latest earlier period in which this employee held a different rank
SELECT EmployeeID, MAX(PeriodID) AS PeriodID
FROM @Periods
WHERE EmployeeID = e.EmployeeID
AND RankID <> ap.RankID
AND PeriodID < ap.PeriodID
GROUP BY EmployeeID
) a
LEFT OUTER JOIN @Periods p
ON a.EmployeeID = p.EmployeeID
AND a.PeriodID = p.PeriodID
ORDER BY e.EmployeeID, ap.PeriodID DESC
Using the correlated subquery we get a view of the data which we can filter using the current-row values, and we aggregate that to return the period we're looking for (the latest one before this period that has a different rank). Then it's just a join back to the Periods table to get the values.
You used a LEFT JOIN, so I've preserved that behaviour using an OUTER APPLY. If you wanted to keep only the rows with a previous rank change, it would be a CROSS APPLY instead.
EmployeeID  Name         CurrentRank  ThisPeriod  LastRankChangePeriodID  LastRankChangedFrom  LastRankChanged
1           Jonathan     10           6           4                       8                    2
1           Jonathan     10           5           4                       8                    2
1           Jonathan     10           4           3                       1                    7
1           Jonathan     10           3           NULL                    NULL                 NULL
1           Jonathan     10           2           NULL                    NULL                 NULL
1           Jonathan     10           1           NULL                    NULL                 NULL
2           Christopher  10           6           5                       8                    2
2           Christopher  10           5           3                       1                    7
2           Christopher  10           4           3                       1                    7
2           Christopher  10           3           NULL                    NULL                 NULL
2           Christopher  10           2           NULL                    NULL                 NULL
2           Christopher  10           1           NULL                    NULL                 NULL
3           James        10           6           3                       7                    3
3           James        10           5           3                       7                    3
3           James        10           4           3                       7                    3
3           James        10           3           2                       1                    6
3           James        10           2           NULL                    NULL                 NULL
3           James        10           1           NULL                    NULL                 NULL
4           Jean-Luc     8            6           5                       9                    -1
4           Jean-Luc     8            5           4                       8                    1
4           Jean-Luc     8            4           3                       1                    7
4           Jean-Luc     8            3           NULL                    NULL                 NULL
4           Jean-Luc     8            2           NULL                    NULL                 NULL
4           Jean-Luc     8            1           NULL                    NULL                 NULL
Now we can see what the previous change looked like for each period. Currently Jonathan has RankID 10. The last time it was different was in PeriodID 4, when it was 8. The same is true when looking back from PeriodID 5. In PeriodID 4 he had RankID 8, and prior to that he had RankID 1. Before that his rank hadn't changed.
Jean-Luc was actually demoted as his last change. I don't know if this is possible within your model.
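If only the rows where the rank actually changed are wanted (the shape asked for in the question), a window function is another option. This is a sketch using LAG, which assumes SQL Server 2012 or later; it is a different technique from the OUTER APPLY above, and the column aliases PreviousRankID and NewRankID are chosen here for illustration. It reuses the sample rows for employee 1.

```sql
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
(1,1,1),(2,1,1),(3,1,1),(4,1,8),(5,1,10),(6,1,10);

WITH ranked AS (
    SELECT EmployeeID, PeriodID, RankID AS NewRankID,
           -- Rank held in the employee's previous snapshot (NULL for the first one)
           LAG(RankID) OVER (PARTITION BY EmployeeID ORDER BY PeriodID) AS PreviousRankID
    FROM @Periods
)
SELECT EmployeeID, PeriodID, PreviousRankID, NewRankID
FROM ranked
WHERE PreviousRankID <> NewRankID   -- also drops the first snapshot, where the previous rank is NULL
ORDER BY EmployeeID, PeriodID;
```

For this data the query returns two rows: PeriodID 4 (rank 1 to 8) and PeriodID 5 (rank 8 to 10). Because the Periods table holds one snapshot per month, two promotions inside the same month would show up as a single change.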

Query to Find the "Balance Amount after each Payment of corresponding Bill" in SQL Server

PaymentID SupplyInvoiceID Date TotalBill BillPaidAmount Remaining Bill
1 1 05-04-2018 2,10,000 20,000 1,90,000
2 1 10-05-2018 2,10,000 60,000 1,30,000
3 1 13-06-2018 2,10,000 1,30,000 0
4 2 10-05-2018 80,000 40,000 40,000
5 2 13-06-2018 80,000 20,000 20,000
6 2 13-06-2018 80,000 20,000 0
Each bill is paid in installments on different dates, as shown above. How can I find the remaining bill amount after each partial payment of a bill?
I used the following Query:
SELECT siph.SupplyPaymentID,si.SupplyInvoiceID,
siph.DateOfPayment,si.TotalBill, siph.BillPaidAmount,
si.TotalBill - SUM(siph.BillPaidAmount) over(order by siph.SupplyPaymentID asc) as RemainingBillAmount,
siph.PaymentMode
from SupplyInvoicePaymentHistory siph inner join
SupplyInvoice si
on siph.SupplyInvoiceID = si.SupplyInvoiceID
But it works only for the payments of the 1st SupplyInvoiceID. As soon as I enter payments for the 2nd and later SupplyInvoiceIDs, I get the wrong result, as follows:
PaymentID SupplyInvoiceID Date TotalBill BillPaidAmount Remaining Bill
1 1 05-04-2018 2,10,000 20,000 1,90,000
2 1 10-05-2018 2,10,000 60,000 1,30,000
3 1 13-06-2018 2,10,000 1,30,000 0
4 2 10-05-2018 80,000 40,000 -1,70,000
5 2 13-06-2018 80,000 20,000 -1,90,000
6 2 15-06-2018 80,000 20,000 -2,10,000
Please help me get the correct result, as tabulated in the first table of this question.

You need to add PARTITION BY clause to your sum() over () to make it a cumulative sum for each Invoice ID.
Add this to your RemainingBillAmount column:
... - SUM(...) over (partition by si.SupplyInvoiceID ...)
Entire query:
SELECT siph.SupplyPaymentID,si.SupplyInvoiceID,
siph.DateOfPayment,si.TotalBill, siph.BillPaidAmount,
si.TotalBill - SUM(siph.BillPaidAmount) over(partition by si.SupplyInvoiceID order by siph.SupplyPaymentID asc) as RemainingBillAmount,
siph.PaymentMode
from SupplyInvoicePaymentHistory siph inner join
SupplyInvoice si
on siph.SupplyInvoiceID = si.SupplyInvoiceID
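To see the effect of the partition in isolation, here is a self-contained sketch that puts the figures from the question into a single table variable. The table variable @Payments and its column names are assumptions made for illustration; the real query joins two tables as shown above.

```sql
DECLARE @Payments TABLE (PaymentID INT, InvoiceID INT, TotalBill INT, Paid INT);
INSERT INTO @Payments (PaymentID, InvoiceID, TotalBill, Paid) VALUES
(1, 1, 210000,  20000), (2, 1, 210000, 60000), (3, 1, 210000, 130000),
(4, 2,  80000,  40000), (5, 2,  80000, 20000), (6, 2,  80000,  20000);

SELECT PaymentID, InvoiceID, TotalBill, Paid,
       -- Running total restarts for every invoice because of PARTITION BY
       TotalBill - SUM(Paid) OVER (PARTITION BY InvoiceID
                                   ORDER BY PaymentID
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Remaining
FROM @Payments
ORDER BY PaymentID;
```

The Remaining column comes out as 190000, 130000, 0 for invoice 1 and 40000, 20000, 0 for invoice 2. The explicit ROWS frame is optional here; it only makes the running-total intent visible.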

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for each day because of which the partition by row number doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"

One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
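A self-contained variant of the APPLY approach, using the sample data from the question, is sketched below. It assumes the window means the current day plus the two preceding days, which is what the expected output implies (the 4/14 row sums to 40, so 4/11 is excluded). The table variable @t is a stand-in for the real table. Writing the filter as a range on the date column rather than wrapping it in DATEDIFF may also let an index on (Customer_ID, Date) be used, though that is not verified here.

```sql
DECLARE @t TABLE (Customer_ID INT, [Date] DATE, Amount INT);
INSERT INTO @t (Customer_ID, [Date], Amount) VALUES
(1, '20140411', 20), (1, '20140413', 10), (1, '20140414', 30), (1, '20140418', 25),
(2, '20140515', 15), (2, '20140621', 25), (2, '20140622', 35), (2, '20140623', 10);

SELECT t1.Customer_ID, t1.[Date], t1.Amount, o.Rolling_3_Day_Sum
FROM @t AS t1
CROSS APPLY (
    -- Sum this customer's rows dated from two days before the current row up to the current row
    SELECT SUM(t2.Amount) AS Rolling_3_Day_Sum
    FROM @t AS t2
    WHERE t2.Customer_ID = t1.Customer_ID
      AND t2.[Date] BETWEEN DATEADD(DAY, -2, t1.[Date]) AND t1.[Date]
) AS o
ORDER BY t1.Customer_ID, t1.[Date];
```

This returns 20, 30, 40, 25 for customer 1 and 15, 25, 60, 70 for customer 2, matching the expected table in the question.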

T-SQL how to calculate datediff from previous or next row on log?

I use MS SSMS 2008 R2 to extract data from our company management software, which registers our employee actions and schedules. The table has an ID field, which is unique to each entry. job is the activity the user is performing. user is the user ID. start_time and duration are exactly that. Then there is a "type", where 0 is login (the user logs into the job) and 1 is available time (while performing a job the user may be available or not). "reason" is the reason why the user has become unavailable (break, coffee, lunch, training, etc). Type 0 entries have no reason, so reason is always NULL for them.
I need to extract the unavailable times by reason, and all I have been able to achieve is to do a DATEADD of duration to start_time in order to get end_time and then use Excel to manually calculate the times for each row.
The SQL table looks like this:
id job user start_time duration type reason
4436812 3 758 05-06-2015 09:00 125670 0 NULL
4436814 3 758 05-06-2015 09:00 6970 1 1004
4436944 3 758 05-06-2015 09:14 39280 1 1004
4437119 3 758 05-06-2015 10:20 0 1 1002
4437172 3 758 05-06-2015 10:35 18470 1 1004
4437312 3 758 05-06-2015 11:09 3960 1 1004
4437350 3 758 05-06-2015 11:16 0 1 1006
4437360 3 758 05-06-2015 11:19 30080 1 1004
4437638 3 758 05-06-2015 12:13 6730 1 1004
4437695 3 758 05-06-2015 12:24 0 1 1007
4438227 3 758 05-06-2015 13:43 NULL 0 NULL
4438228 3 758 05-06-2015 13:43 NULL 1 NULL
(job = 3 and user = 758)
This is the query I made:
select CONVERT(date,start_time) Data, a.job, a.user, convert(varchar(15),convert(datetime,a.start_time),108) StartTime, a.duration duracao,
convert(varchar(15),convert(datetime,DATEADD(second,a.duration/10,a.start_time)),108) EndTime, a.type, a.reason
from schedule_log a
where a.job = 3
and a.user = 758
and CONVERT(date,start_time) = '20150605'
order by a.start_time, a.type
Which translates to:
Date job user LogTime Avail NotAvail
2015-06-05 3 758 04:44:01 04:10:23 00:33:38
So, for each reason, I have to do a DATEDIFF from the end time (start + duration) to either the next type 1 start_time or the previous type 0 end time, whichever happened first (the user may become unavailable and then log off).
How do I do this?
ps: duration is in tenths of second.

OK, here is my updated suggestion. It is broken into four steps for clarity, but the temp tables are unnecessary; they could become subqueries.
Step 1: Calculate the end time for each period of activity, excluding logins.
Step 2: Join each row to the row that occurred immediately after it, to get the unavailable time following each reason. Note: some of your timestamps do not line up properly, possibly as a result of storing duration in seconds but timestamps only to the minute.
Step 3: Total the unavailable time, and subtract from the duration of the login to get the available time.
Step 4: Total the unavailable time by reason.
-- Step 1: end time of each availability row (logins excluded), numbered per job and user
SELECT *
,dateadd(s, duration / 10, start_time) AS Endtime
,row_number() OVER (
PARTITION BY job ,[user] ORDER BY start_time, [type]
) AS RN
INTO #temp2
FROM MyTable
WHERE [type] = 1;

-- Step 2: pair each row with the next one to get the unavailable gap that follows it
SELECT a.[user]
,a.job
,a.reason
,a.start_time
,a.type
,a.duration / 10 AS AvailableSeconds
,datediff(s, a.Endtime, b.start_time) AS UnavailableSeconds
INTO #temp3
FROM #temp2 a
LEFT JOIN #temp2 b
ON a.[user] = b.[user]
AND a.job = b.job
AND a.RN = b.RN - 1;

-- Step 3: total unavailable time, subtracted from the login duration to get available time
SELECT cast(a.start_time AS DATE) AS [Date]
,a.job
,a.[user]
,b.duration / 10 AS LogTime
,b.duration / 10 - sum(UnavailableSeconds) AS Avail
,sum(UnavailableSeconds) AS NotAvail
FROM #temp3 a
LEFT JOIN MyTable b
ON a.job = b.job
AND a.[user] = b.[user]
AND b.[type] = 0
AND b.duration IS NOT NULL
GROUP BY cast(a.start_time AS DATE)
,a.job
,a.[user]
,b.duration;

-- Step 4: unavailable time per reason
SELECT cast(a.start_time AS DATE) AS [Date]
,a.job
,a.[user]
,a.reason
,sum(UnavailableSeconds) AS NotAvail
FROM #temp3 a
WHERE reason IS NOT NULL
GROUP BY cast(a.start_time AS DATE)
,a.job
,a.[user]
,a.reason;

Select fields from different rows

I have the following table Test
id value type
1 100 prime
1 200 13 month
2 120 prime
2 300 13 month
How can I get the following result
id valuePrime typePrime valueMonth typeMonth
1 100 prime 200 13 month
2 120 prime 300 13 month

Looking at this, you could split the data on the type 'prime':
select id, value as ValPrime, 'prime' as TypePrime from Test where type = 'prime'
then select the remaining rows:
select id, value as ValMonth, type as TypeMonth from Test where type != 'prime'
and then join the two results on id.
But this is a workaround and not a really good solution :)

Modify your schema:
id parentid value type
---------------------------
1 null 100 prime
2 1 200 13 month
3 null 120 prime
4 3 300 13 month
and query like this:
SELECT a.id, a.value AS valuePrime, a.type AS typePrime, b.value AS valueMonth, b.type AS typeMonth
FROM Test AS a
INNER JOIN Test AS b
ON a.id=b.parentid

This might work for you, but beware that it only works if each id has exactly two records, one 'prime' row and one other row, as you have shown above. I would still suggest a change in the schema. The following query might help you for the time being.
SELECT q1.id, q1.value AS valuePrime, q1.type AS typePrime,
q2.value AS valueMonth, q2.type AS typeMonth
FROM Test AS q1 INNER JOIN Test AS q2 ON q1.id = q2.id AND q1.type = 'prime' AND q2.type <> 'prime'
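Another way to get one row per id without a self-join or a schema change is conditional aggregation. This is a sketch of that alternative technique, not what the answers above use; it assumes the type values are exactly 'prime' and '13 month' and that each id has at most one row of each type. The table variable @Test stands in for the real table.

```sql
DECLARE @Test TABLE (id INT, value INT, type VARCHAR(20));
INSERT INTO @Test (id, value, type) VALUES
(1, 100, 'prime'), (1, 200, '13 month'),
(2, 120, 'prime'), (2, 300, '13 month');

SELECT id,
       -- Each CASE yields a value only for its own type; MAX picks that single non-NULL value
       MAX(CASE WHEN type = 'prime'    THEN value END) AS valuePrime,
       MAX(CASE WHEN type = 'prime'    THEN type  END) AS typePrime,
       MAX(CASE WHEN type = '13 month' THEN value END) AS valueMonth,
       MAX(CASE WHEN type = '13 month' THEN type  END) AS typeMonth
FROM @Test
GROUP BY id
ORDER BY id;
```

For the sample rows this returns (1, 100, prime, 200, 13 month) and (2, 120, prime, 300, 13 month). An id that is missing one of the two types still appears, with NULLs in the corresponding columns.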