SUMIF greater than and Workday column - sql-server

So I'm trying to convert an Excel table into SQL and I'm having difficulty coming up with the last 2 columns. Below, find my Excel table that is fully functional (in green) and a table for the code that I have in SQL so far (in yellow). I need help replicating columns C and D; I pasted the Excel formula I'm using so you can see what I'm trying to do:
Here's the code that I have so far:
WITH
cte_DistinctScheduling AS (
SELECT DISTINCT
s.JobNo
FROM
dbo.Scheduling s
WHERE
s.WorkCntr = 'Framing')
SELECT
o.OrderNo,
o.Priority AS [P],
SUM(r.TotEstHrs)/ROUND((8*w.CapacityFactor*(w.UtilizationPct/100)),2) AS
[Work Days Left],
Cast(GetDate()+ROUND(SUM(r.TotEstHrs)/ROUND((8*w.CapacityFactor*
(w.UtilizationPct/100)),2),3) AS DATE) AS DueDate
FROM OrderDet o JOIN cte_DistinctScheduling ds ON o.JobNo = ds.JobNo
JOIN OrderRouting r ON o.JobNo = r.JobNo
JOIN WorkCntr w ON r.WorkCntr = w.ShortName
WHERE r.WorkCntr = 'Framing'
AND o.OrderNo NOT IN ('44444', '77777')
GROUP BY o.OrderNo, o.Priority, ROUND((8*w.CapacityFactor*
(w.UtilizationPct/100)),2)
ORDER BY o.Priority DESC;
My work days left column in SQL gets the right amount for that particular row, but I need it to sum itself and everything with a P value above it and then add that to today's date, while taking workdays into account. I don't see a Workday function in SQL from what I've been reading, so I'm wondering what are some creative solutions? Could perhaps a CASE statement be the answer to both of my questions? Thanks in advance

It took me a while to understand how the Excel table helps, and I'm still having a hard time absorbing the rest; I can't tell if it's a me thing or a you thing. In any case...
First, I've mocked up something to test SUM per your rationale, the idea is doing a self-JOIN and summing everything from that JOIN side, relying on the fact that NULLs will come up for anything that shouldn't be summed:
DECLARE @TABLE TABLE(P int, [Value] int)
INSERT INTO @TABLE SELECT 1, 5
INSERT INTO @TABLE SELECT 2, 6
INSERT INTO @TABLE SELECT 3, 2
INSERT INTO @TABLE SELECT 4, 4
INSERT INTO @TABLE SELECT 5, 9
SELECT T1.P, [SUM] = SUM(ISNULL(T2.[Value], 0))
FROM @TABLE AS T1
LEFT JOIN @TABLE AS T2 ON T2.P <= T1.P
GROUP BY T1.P
ORDER BY P DESC
Second, workdays is a topic that comes up regularly. In case you didn't, consider reading a little about it from previous questions, I even posted an answer on one question last week, and the thread as a whole had several references.
Thirdly, we could use table definitions and sample data loaded on SQL itself, something like I did above.
Lastly, could you please check result of UtilizationPct / 100? If that's an integer-like data type, you're probably getting a bad result on it.
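As a rough sketch of the two points above (the running sum and the integer-division check), the same idea can be written with a windowed SUM instead of a self-join. This is run through Python's built-in sqlite3 module only so that it is self-contained; the table name t and the literal 85 are made up for illustration, and the windowed form assumes SQL Server 2012 or later if you port it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the mock-up data above
conn.execute("CREATE TABLE t (P INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 5), (2, 6), (3, 2), (4, 4), (5, 9)],
)

# Running total: each row sums itself plus every row with a smaller P,
# which is what the self-join on T2.P <= T1.P computes.
running = conn.execute(
    """
    SELECT P,
           SUM(Value) OVER (ORDER BY P ROWS UNBOUNDED PRECEDING) AS RunningSum
    FROM t
    ORDER BY P DESC
    """
).fetchall()
print(running)

# Integer division truncates: dividing by 100.0 avoids it.
int_div, real_div = conn.execute("SELECT 85 / 100, 85 / 100.0").fetchone()
print(int_div, real_div)
```

If UtilizationPct turns out to be an integer column, dividing by 100.0 (or casting first) is one way to keep the fraction.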

Related

SQL - Attain Previous Transaction Information [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the latest selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, it's actually exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
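A minimal sketch of the LAG approach with gaps in the Id sequence, run through Python's sqlite3 module so it is self-contained. The table name readings and its data are invented for illustration; on SQL Server this needs 2012 or later, so it would not run on the 2008 instance mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the Ids deliberately skip values
conn.execute("CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 15), (5, 12), (9, 20)],
)

# LAG looks at the previous row in Id order, regardless of gaps.
rows = conn.execute(
    """
    SELECT Id, value - LAG(value) OVER (ORDER BY Id) AS diff
    FROM readings
    ORDER BY Id
    """
).fetchall()

# The first row has no predecessor, so its diff is NULL; drop it to get n-1 rows.
diffs = [row for row in rows if row[1] is not None]
print(diffs)
```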
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare @temp table (value int, primaryKey int, tempid int identity)
insert into @temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from @temp t1
join @temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter  previous  difference
1        0         1
2        1         1
3        2         1
4        3         1
5        4         1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following function to get the current row value and the previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer

Rows to columns without PIVOT in SQL Server

I have 3 tables which contain this data:
Table 1:
Table 2:
Table 3:
Output:
I have tried using Pivot but it has to have an aggregate function in it.
SELECT
project_code, project_name, fk_prj_project_id,
[A], [B], [C], [D]
FROM
(SELECT
project_code, project_name, employee_name,
fk_prj_project_id, fk_prj_project_id AS nm,
activity_details
FROM
PRJ_MST_PROJECT AS a
LEFT JOIN
PRJ_TNS_DAILY_SUMMARY AS b ON a.pk_prj_project_id = b.fk_prj_project_id
LEFT JOIN
HRM_EMP_MST_EMPLOYEE AS c ON b.fk_hrm_emp_employee_id = c.pk_hrm_emp_employee_id
WHERE
a.project_status = 0
AND b.transaction_status = 1
AND CONVERT(date, b.transaction_date, 103) = CONVERT(date, '15/04/2021', 103)) x
PIVOT
(MAX(nm)
FOR nm IN ([A], [B], [C], [D])
) p
The problem is you set your PIVOT to look for values of nm in A, B, C, and D, but nm is an alias for fk_prj_project_id, which has possible values of 1, 2, 3, 4, and 5. So there are no A, B, C, or D values to be had. I don't even see a name for the column that holds A, B, C, and D, but whatever column that is needs to be what you put in the "FOR ___ IN" section of your pivot.
Test your query by commenting out the reference to the pivot columns in the SELECT and comment out the word PIVOT and everything after it and re-run your query. You should see some column with values A, B, C, D. If you don't, fix your query so you do. Once you do, that column is what you PIVOT on (put it between FOR and IN in the pivot block).
Oh, and if you provide data in a usable format, people might run your query and give you directly usable results. It's a lot to ask to have people enter your data just to help you, so meet them halfway. A link to sqlfiddle is ideal, but even just a bunch of DECLARE @T1 and INSERT INTO @T1 VALUES statements is usually enough to get significantly better help.
EDIT:
Nice job with the Fiddle!
OK, so using your data, we can test out actual queries. For PIVOT to work, we need a column to look up (employee name), a column to aggregate (activity_details), and some columns that will be constant across the rows produced (the project's name and ID). You're working with text, not numbers, so your aggregation can't be mathematical, leaving you with pretty much just MAX or MIN. To make sure you get the right (newest) one, I first built a table of comments and numbered them by how new they were, then I picked just the newest comment for each (project, user) pair. cteCommentsNewest is the result of that.
Now with a clean (and verified) table to pivot, the actual pivot syntax is simple. Well, as simple as Pivot can be, it's inherently pretty confusing IMHO, but structuring it this way keeps the actual PIVOT as clean as possible.
Note that the query is in twice, I tested it as a static query before converting it to dynamic because it's much easier to troubleshoot a static query, then I left it in in case you want to experiment with it. You don't need it for the final solution to work.
Here's the final code, fully tested and producing the specified output:
DECLARE @cols3 AS NVARCHAR(MAX)
DECLARE @query3 AS NVARCHAR(MAX)=''
DECLARE @dt varchar(100)='14/04/2021'
select @cols3 = STUFF((SELECT ',' + QUOTENAME(employee_name)
from dbo.HRM_EMP_MST_EMPLOYEE
order by employee_name
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--SELECT @cols3 --Test column list for dynamic query
--Test the core functions of pivot before making dynamic
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)
SELECT *
FROM cteCommentsNewest as N --TEST up to this point to see the raw table
PIVOT (MAX(activity_details) FOR employee_name IN (A, B, C) ) as P
--Put the working query, modified for dynamic columns, into a variable
set @query3 = N'
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)SELECT *
FROM cteCommentsNewest as N
PIVOT (MAX(activity_details) FOR employee_name IN (' + @cols3 + ') ) as P
'
exec sp_executesql @query3
which produces the following output
project_code  project_name  A                                B     C
MOA20171      Project A     some remark By Employee A on 14  NULL  some remark By Employee C on 14
MOA20172      Project B     NULL                             NULL  some remark By Employee C on 15
MOA20173      Project C     NULL                             NULL  NULL
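Since the question title asks about doing this without PIVOT, here is a rough sketch of the usual alternative: conditional aggregation with MAX(CASE ...). It is not taken from the answer above. The table name comments and its rows are made up, the column list is hard-coded (a dynamic version would still need dynamic SQL), and it runs through Python's sqlite3 module only so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already de-duplicated input: one comment per (project, employee)
conn.execute(
    "CREATE TABLE comments (project_code TEXT, employee_name TEXT, activity_details TEXT)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        ("MOA20171", "A", "remark by A"),
        ("MOA20171", "C", "remark by C"),
        ("MOA20172", "C", "another remark by C"),
    ],
)

# One MAX(CASE ...) per output column; MAX ignores NULLs, so a column
# stays NULL when the employee has no comment on that project.
pivoted = conn.execute(
    """
    SELECT project_code,
           MAX(CASE WHEN employee_name = 'A' THEN activity_details END) AS A,
           MAX(CASE WHEN employee_name = 'B' THEN activity_details END) AS B,
           MAX(CASE WHEN employee_name = 'C' THEN activity_details END) AS C
    FROM comments
    GROUP BY project_code
    ORDER BY project_code
    """
).fetchall()
print(pivoted)
```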

Select one record, grab "n" records before it, and iterate through them to see if they're sequential

So, I'd like to grab a record from a table of results. Let's say that this is our "sample" record.
Once I have the sample record, I'd like to grab 10 results down the table, and check to see if the sample is sequential within this list of 10 results.
So, if our sample record was 124, I'd like to grab the 10 records before it, and check to see if they follow the sequence of 123, 122, 121, 120, etc.
Once I know that the sample result is in fact sequential down to 10 records, I would like to insert that record into a different table for keeping.
I am using SQL Server and T-SQL to do this, and pulling my hair out trying to do so. If anyone could offer any advice, I would GREATLY appreciate it. Here's what I have so far (with some data removed), with no idea if I'm on the right track.
declare @TestTable as table (a char(15), RowNumber integer)
declare @SampleNumber as char(15)
insert into @TestTable (a, RowNumber)
select top 10
[NUMBERS],
ROW_NUMBER() over (order by a) as RowNumber
from [TABLE]
where
[NUMBERS] like [CONDITIONS]
order by [NUMBERS] desc
With this, I'm trying to grab the result and also a set of row numbers, allowing me to iterate through them based on that row number. But, I'm getting an "Invalid column name 'a'" error when running. Feel free to forget about that error and write something totally new though, because I don't even know if I'm on the right track.
Again, any help would be appreciated.
I am not sure how well this would perform on a larger dataset, but as Peter Smith mentioned, this is possible by using lag to see what the value of the row x rows prior in an ordered window was, though be aware this will run for all rows in your table and return all those that meet the criteria, rather than randomly sampling:
-- Create a not quite sequential dataset
declare @t table(n int);
with n as
(
select row_number() over (order by (select null)) as n
,abs(checksum(newid())) % 14 as r
from sys.all_objects
)
insert into @t
select n
from n
where r > 2;
-- Output the original dataset
select *
from @t;
-- Only return rows that come after a certain number of sequential numbers
declare @seq int = 10;
with l as
(
select n
,n - lag(n,@seq,null) over (order by n) as l
from @t
)
select n
from l
where l = @seq;
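To see why the lag trick works, here is a small deterministic sketch with a run length of 3 instead of 10, executed through Python's sqlite3 module so it is self-contained. The data and the table name t are invented, and the logic assumes the values are distinct integers: if the value 3 rows back is exactly 3 less, the 3 rows in between must be consecutive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
# Hypothetical data with gaps at 5 and 8
values = [1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13]
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

# Keep only rows whose value 3 rows earlier is exactly 3 smaller,
# i.e. rows preceded by 3 sequential numbers.
rows = conn.execute(
    """
    WITH l AS (
        SELECT n, n - LAG(n, 3) OVER (ORDER BY n) AS gap
        FROM t
    )
    SELECT n FROM l WHERE gap = 3 ORDER BY n
    """
).fetchall()
sequential = [row[0] for row in rows]
print(sequential)
```

Rows that pass the filter could then be inserted into the keep table with an INSERT ... SELECT.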

TSQL Subquery filter to performance problem

I'm working on this query. When I assign a date filter from subquery to main query, the response time increases from 1 second to 4.5 minutes.
I don't know how to solve this problem and fix my query. I'm writing the query and the methods I've tried.
Thank you for your help.
My query:
select
START_DATE as DATE,
[MINUTE] as MIN,
map1.LT,
ISNULL((SELECT
(SELECT CAST((main.MIN) AS FLOAT)) /
(
(nullif(
(select cast(
(select
sum(MIN2)
from fooTable2 d2
CROSS APPLY (select Top(1) LT from FooMap2 where x = d2.x) k2
where k2.LT = map1.LT
**-- PROBLEM CODE START**
and YEAR(d2.DATE) = YEAR(main.DATE) and MONTH(d2.DATE) = MONTH(main.DATE)
**-- PROBLEM CODE END**
) as float)),0))
as XX,
.......
......
from Table1 main
OUTER APPLY (select Top(1) LT from FooMap where x = main.x) map1
I tried creating a virtual table.
But not working.
declare @child table ([Year] smallint, [Month] smallint, [Total] float,[LTCode] nvarchar(20))
insert into @child ([LTCode],[Year],[Month],[Total])
(select
k2.LT,YEAR(d2.DATE) as YIL,MONTH(d2.DATE) as AY,sum(MIN) as SURE
from DURUS d2
CROSS APPLY (select Top(1) LT from FooMap2 where x = d2.x) k2
group by k2.LT,YEAR(d2.DATE),MONTH(d2.DATE))
...
....
(select [Total] from @child where [YEAR] = YEAR(main.DATE) and [MONTH] = MONTH(main.DATE) and [LTCode] = map1.LT)
What should I do ?
The root problem is the data model. You need to filter on month and year but store your data as a DATE, DATETIME or similar. There is no easy way to make this fast:
and YEAR(d2.DATE) = YEAR(main.DATE)
and MONTH(d2.DATE) = MONTH(main.DATE)
WHERE FUNCTION(Input) = FUNCTION(Input) forces a scan against each table, having two such filters means you are touching/evaluating each value (d2.date and main.date) twice for each row in each table. To fix this your best options include:
Adding a persisted computed column on each table for year and month, then adding the appropriate index (on year, month, with all columns involved in your query added as Include columns).
Use an indexed view to pre-join Durus and main, not simple but doable.
Learn how to create and utilize a correctly indexed calendar table. This will require some effort but will also change your career.
Work other filters on the Left side of your joins...
For example: add a WHERE clause after from fooTable2 d2 to filter out any additional rows before the join.
Common ways to optimize:
1) remove calculations from the left in filters
2) render subqueries in separate temporary tables or in intermediate CTE queries
3) do not use table variables
4) in the early stages, try to filter out as much data as possible before the joins
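Point 1 can be illustrated with a small sketch: wrapping the date column in a function usually prevents an index seek, while an equivalent half-open date range does not. This uses Python's sqlite3 module and SQLite's strftime as a stand-in for YEAR()/MONTH(); the events table, its index, and the data are invented, and SQL Server's optimizer will differ in the details, so treat the plan output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with an index on the date column
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT, minutes INTEGER)"
)
conn.execute("CREATE INDEX ix_events_date ON events (event_date)")
conn.executemany(
    "INSERT INTO events (event_date, minutes) VALUES (?, ?)",
    [
        ("2021-03-31", 10),
        ("2021-04-01", 20),
        ("2021-04-15", 30),
        ("2021-04-30", 40),
        ("2021-05-01", 50),
    ],
)

# Function applied to the column: every row has to be evaluated.
wrapped = "SELECT SUM(minutes) FROM events WHERE strftime('%Y-%m', event_date) = '2021-04'"
# Same month expressed as a half-open range on the bare column.
ranged = (
    "SELECT SUM(minutes) FROM events "
    "WHERE event_date >= '2021-04-01' AND event_date < '2021-05-01'"
)

total_wrapped = conn.execute(wrapped).fetchone()[0]
total_ranged = conn.execute(ranged).fetchone()[0]

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN holds the human-readable step
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_wrapped = plan(wrapped)
plan_ranged = plan(ranged)
print(total_wrapped, plan_wrapped)
print(total_ranged, plan_ranged)
```

Both queries return the same total; only the range form lets the engine search the index rather than scan the table. The exact plan wording varies by SQLite version.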

SQL Server Comparing Subsequent Rows for Duplicates

I am trying to write a SQL Server query but have had no luck and was wondering if anyone may have any ideas on how to achieve my query.
What i'm trying to do:
I have a table with several columns naming the ones that i am dealing with TaskID, StatusCode, Timestamp. Now this table just holds tasks for one of our systems that run throughout the day and when something runs it gets a timestamp and the statuscode depending on the status for that task.
Sometimes what happens is the task table will be updated with a new timestamp but the statusCode will not have changed since the last update of the task so for two or more consecutive rows of a given task the statusCode can be the same. When i say consecutive rows i mean with regards to timestamp.
So example task 88 could have twenty rows at statusCode 2 after which the status code changes to something else.
Now what i am trying to do with no luck at the moment is to retrieve a list from this table of all the tasks and the statuscodes and the timestamps but in the case where i have more than one consecutive row for a task with the same statuscode i just want to take the first row with the lowest timestamp and ignore the rest of the row until the statuscode for that task changes.
To make it simpler in this case you can assume that i have a taskid which i am filtering on so i am just looking at a single task.
Does anyone have any ideas as to how I can do this, or perhaps something that I could read to help me?
Thanks
Irfan.
Here are a couple of ways of getting what you want:
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp < T1.status_timestamp
LEFT OUTER JOIN My_Table T3 ON
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp AND
T3.status_timestamp > T2.status_timestamp
WHERE
T3.task_id IS NULL AND
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
or
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp = (
SELECT
MAX(status_timestamp)
FROM
My_Table T3
WHERE
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp)
WHERE
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
Both methods rely on there being no exact matches of the status_timestamp values (two rows can't have the same exact status_timestamp for a given task_id.)
Something like
select TaskID,StatusCode,Min(TimeStamp)
from table
group by TaskID,StatusCode
order by 1,2
Note that if a statuscode can repeat, you will need an additional field, but hopefully this can point you in the right direction...
Something like the following should get you in the right direction....
CREATE TABLE #T
(
TaskId INT
,StatusCode INT
,StatusTimeStamp DATETIME
)
INSERT INTO #T
SELECT 1, 1, '2009-12-01 14:20'
UNION SELECT 1, 2, '2009-12-01 16:20'
UNION SELECT 1, 2, '2009-12-02 09:15'
UNION SELECT 1, 2, '2009-12-02 12:15'
UNION SELECT 1, 3, '2009-12-02 18:15'
;WITH CTE AS
(
SELECT TaskId
,StatusCode
,StatusTimeStamp
,ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode ORDER BY TaskId, StatusTimeStamp DESC) AS RNUM
FROM #T
)
SELECT TaskId
,StatusCode
,StatusTimeStamp
FROM CTE
WHERE RNUM = 1
DROP TABLE #T
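If a status code can come back later for the same task, partitioning by TaskId and StatusCode merges the separate runs. One alternative, sketched here under the assumption that LAG is available (SQL Server 2012 or later), is to compare each row with the previous row for the task and keep only the rows where the status changed. The sketch runs through Python's sqlite3 module so it is self-contained; the table name task_log is made up, and the last data row is an addition to the sample above to show a recurring status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_log (TaskId INTEGER, StatusCode INTEGER, StatusTimeStamp TEXT)"
)
conn.executemany(
    "INSERT INTO task_log VALUES (?, ?, ?)",
    [
        (1, 1, "2009-12-01 14:20"),
        (1, 2, "2009-12-01 16:20"),
        (1, 2, "2009-12-02 09:15"),
        (1, 2, "2009-12-02 12:15"),
        (1, 3, "2009-12-02 18:15"),
        (1, 2, "2009-12-03 08:00"),  # assumed extra row: status 2 recurs
    ],
)

# Keep a row only when its status differs from the previous row of the
# same task (or when there is no previous row at all).
rows = conn.execute(
    """
    WITH ordered AS (
        SELECT TaskId, StatusCode, StatusTimeStamp,
               LAG(StatusCode) OVER (
                   PARTITION BY TaskId ORDER BY StatusTimeStamp
               ) AS PrevStatus
        FROM task_log
    )
    SELECT TaskId, StatusCode, StatusTimeStamp
    FROM ordered
    WHERE PrevStatus IS NULL OR PrevStatus <> StatusCode
    ORDER BY TaskId, StatusTimeStamp
    """
).fetchall()
print(rows)
```

Like the self-join versions earlier in the thread, this relies on timestamps being unique within a task.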
