How to create a week counter? - sql-server

Example of what I am trying to do:
I have 10 employees. They all started on different days throughout the year. Each one potentially gets paid once a week. I want to query their first paycheck and call that week 1 for all employees. Each subsequent paycheck will then be 2...3...through 13. So basically I want to see what each employee's first 13 weeks on the job looked like, stacked against each other. The catch is the word "potentially" above: employees might not get paid every week, and I would want to see a zero for those weeks. I know this is tough because there is no record to read for such a week. I would expect my output to look something like this:
I was thinking of using a tally table of some kind and adding 7 days to the hire date over and over. I am open to any idea.

You can use Row_Number() as shown below.
SELECT Week
,EmployeeId
,[Paycheck Date]
,Amount
,Row_Number() OVER (
PARTITION BY EmployeeId ORDER BY [Paycheck Date]
) AS WkNo
FROM Yourtable
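To generate a week-number table and join it to each employee's numbered paychecks, you can try something like this. Cross joining the week numbers with the list of employees gives every employee a row for every week number, with 0 where no paycheck has that number:
;WITH WeekTable(n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM WeekTable WHERE n < 13 -- first 13 weeks
),
Paychecks AS
(
SELECT EmployeeId
,[Paycheck Date]
,Amount
,Row_Number() OVER (
PARTITION BY EmployeeId ORDER BY [Paycheck Date]
) AS WkNo
FROM Yourtable
)
SELECT e.EmployeeId
,w.n AS WkNo
,ISNULL(p.Amount, 0) AS Amount
FROM WeekTable w
CROSS JOIN (SELECT DISTINCT EmployeeId FROM Yourtable) e -- one row per employee per week
LEFT JOIN Paychecks p ON p.EmployeeId = e.EmployeeId AND p.WkNo = w.n
ORDER BY e.EmployeeId, w.n
OPTION (MAXRECURSION 1000);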
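ROW_NUMBER() counts paychecks, not calendar weeks. An employee who is not paid in week 3 therefore gets WkNo 3 on their next paycheck, and the zero ends up at the end of the 13 rows. Below is a minimal sketch of a date-based alternative. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, with julianday differences playing the role of DATEDIFF(DAY, ...). It assumes weeks are 7-day blocks counted from each employee's first paycheck. The table and column names (paychecks, employee_id, paycheck_date, amount) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: employee 1 is paid in weeks 1, 2 and 4; employee 2 in weeks 1 and 3.
PAYCHECKS = [
    (1, "2020-01-03", 500.0),
    (1, "2020-01-10", 500.0),
    (1, "2020-01-24", 250.0),
    (2, "2020-03-06", 400.0),
    (2, "2020-03-20", 400.0),
]

QUERY = """
WITH RECURSIVE weeks(n) AS (          -- tally of week numbers 1..13
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM weeks WHERE n < 13
),
first_pay AS (                        -- each employee's first paycheck date
    SELECT employee_id, MIN(paycheck_date) AS first_date
    FROM paychecks
    GROUP BY employee_id
),
numbered AS (                         -- week index = whole 7-day blocks since the first paycheck, plus 1
    SELECT p.employee_id,
           CAST((julianday(p.paycheck_date) - julianday(f.first_date)) / 7 AS INTEGER) + 1 AS week_no,
           p.amount
    FROM paychecks p
    JOIN first_pay f ON f.employee_id = p.employee_id
)
SELECT f.employee_id, w.n AS week_no, COALESCE(SUM(x.amount), 0) AS amount
FROM first_pay f
CROSS JOIN weeks w                    -- every employee gets all 13 weeks
LEFT JOIN numbered x
       ON x.employee_id = f.employee_id AND x.week_no = w.n
GROUP BY f.employee_id, w.n
ORDER BY f.employee_id, w.n
"""


def first_13_weeks(rows):
    """Return (employee_id, week_no, amount) rows, with 0 for weeks without a paycheck."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE paychecks (employee_id INTEGER, paycheck_date TEXT, amount REAL)")
    con.executemany("INSERT INTO paychecks VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

With the sample data, the first four rows are (1, 1, 500.0), (1, 2, 500.0), (1, 3, 0), (1, 4, 250.0): the missed week shows up where it happened. If two paychecks fall in the same 7-day block, the sketch sums them.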

Related

Update latest record with previous record values

I have a table called Audits with CompanyId, Date and AuditNumber fields. Each company can have many audits, and the AuditNumber field keeps track of how many audits that company has.
I'm trying to update the date of every company's latest audit record to its previous audit's date + 5 years. Say CompanyId 12345 has 3 audits: I want to set the date of the 3rd audit (the latest one) to the 2nd audit's date + 5 years, and do the same for the latest record of every company.
So far I've tried a WHILE loop, but I'm pretty stuck, as it's not doing exactly what I want:
DECLARE @counter INT = 1;
WHILE (@counter <= (SELECT COUNT(*) FROM Audits WHERE AuditNumber > 1))
BEGIN
UPDATE Audits
SET Date = CASE
WHEN AuditNumber > 1 THEN (SELECT TOP 1 DATEADD(YEAR, 5, Date) FROM Audits WHERE AuditNumber < (SELECT MAX(AuditNumber) FROM Audits))
END
WHERE AuditNumber > 1
SET @counter = @counter + 1
END
I'm no expert on SQL, but this just sets Date from the first earlier date it can find, because of the SELECT TOP 1. If I remove the TOP 1, the subquery returns more than one row and SQL Server raises an error.
Any help would be appreciated.
Thanks!
No need for a procedure and a loop. I would recommend window functions and an updatable cte for this:
with cte as (
select date,
row_number() over(partition by CompanyId order by AuditNumber desc) rn,
lag(date) over(partition by CompanyId order by AuditNumber) lag_date
from audits
)
update cte
set date = dateadd(year, 5, lag_date)
where rn = 1 and lag_date is not null;
The common table expression ranks records having the same company by descending audit number, and retrieves the date of the previous audit. The outer query filters on the top record per group, and updates the date to 5 years after the previous date.
You did not say what to do when a company has just one audit. I added a condition so that those rows, if any, are not updated.
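To try the ROW_NUMBER()/LAG() idea locally, here is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in. SQLite cannot update through a CTE the way SQL Server can, so the sketch keeps the same ranking CTE and updates the base table with correlated subqueries instead. It needs SQLite 3.25 or later for window functions. The sample rows are made up.

```python
import sqlite3

# Hypothetical Audits rows: (CompanyId, Date, AuditNumber). Company 2 has a single audit.
AUDITS = [
    (1, "2010-01-01", 1),
    (1, "2014-06-15", 2),
    (1, "2016-03-01", 3),
    (2, "2012-09-30", 1),
]

UPDATE_LATEST = """
WITH ranked AS (
    SELECT CompanyId,
           AuditNumber,
           ROW_NUMBER() OVER (PARTITION BY CompanyId ORDER BY AuditNumber DESC) AS rn,
           LAG(Date) OVER (PARTITION BY CompanyId ORDER BY AuditNumber) AS lag_date
    FROM Audits
)
UPDATE Audits
SET Date = (SELECT date(r.lag_date, '+5 years')       -- previous audit's date + 5 years
            FROM ranked r
            WHERE r.CompanyId = Audits.CompanyId
              AND r.AuditNumber = Audits.AuditNumber)
WHERE EXISTS (SELECT 1
              FROM ranked r
              WHERE r.CompanyId = Audits.CompanyId
                AND r.AuditNumber = Audits.AuditNumber
                AND r.rn = 1                           -- latest audit per company only
                AND r.lag_date IS NOT NULL)            -- skip companies with a single audit
"""


def update_latest_audits(rows):
    """Load the rows, apply the update, and return (CompanyId, AuditNumber, Date) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Audits (CompanyId INTEGER, Date TEXT, AuditNumber INTEGER)")
    con.executemany("INSERT INTO Audits VALUES (?, ?, ?)", rows)
    con.execute(UPDATE_LATEST)
    result = con.execute(
        "SELECT CompanyId, AuditNumber, Date FROM Audits ORDER BY CompanyId, AuditNumber"
    ).fetchall()
    con.close()
    return result
```

With the sample rows, only company 1's third audit changes (to 2019-06-15). Company 2 is left alone because its only audit has no previous date.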
You can add a ROW_NUMBER to your table first, then join the numbered rows to themselves ON A1.CompanyId=A2.CompanyId AND A1.IND=1 AND A2.IND=2. Each joined row then holds the latest record and the previous record together, and you can use it to update the original table:
WITH A AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY CompanyId ORDER BY AuditNumber DESC) IND FROM Audits
), B AS
(
SELECT A1.CompanyId, A1.AuditNumber, A2.[Date]
FROM A A1
INNER JOIN A A2 ON A1.CompanyId = A2.CompanyId AND A1.IND = 1 AND A2.IND = 2
)
UPDATE _Audits SET _Audits.[Date] = DATEADD(YEAR, 5, B.[Date])
FROM B
INNER JOIN Audits _Audits ON B.CompanyId = _Audits.CompanyId AND B.AuditNumber = _Audits.AuditNumber;

Creating a table with percentiles from raw database

I have a database with about 160 million records in it, segmented by a code called TMC. A TMC represents a section of highway that is measured every five minutes for speed and travel time, so the TMC isn't a unique identifier: it is the same for every five-minute reading, on every day of the year, for that section of highway. There are 3,440 unique TMCs, and for each TMC I am trying to calculate percentiles of travel time over an entire year for a specific time of day.
I can get the percentile code to work, but I do not understand how to create and update a table in SQL so that the percentiles can be stored in it. Something about the WITH statement used to get the percentiles does not mesh well with UPDATE. I normally just run a SELECT, copy the data into Excel, and then reimport it into my SQL database, but I am trying to automate this process as much as possible.
Here is the code that I got so far.
create table TMCF5 (
TMC_code varchar(50),
P95M varchar(50),
P50M varchar(50),
P95A varchar(50),
P50A varchar(50))
go
WITH PERCENTILES_Afternoon AS (SELECT TMC_code, EPOCH, percentile_CONT(.95)
WITHIN GROUP (ORDER BY cast(travel_time_minutes as float)) OVER (PARTITION BY TMC_code) AS P95afternoon, percentile_CONT(.50)
WITHIN GROUP (ORDER BY cast(travel_time_minutes as float)) OVER (PARTITION BY TMC_code) AS P50afternoon FROM [dbo].[AR_2018_TRUCKS_1_3]
WHERE DATEPART(HOUR, EPOCH) between 16 and 17 AND (WKDAY != 'SAT' and WKDAY != 'SUN'))
insert tmcf5 (tmc_code) select tmc_code from percentiles_afternoon group by tmc_code
go
WITH PERCENTILES_Afternoon2 AS (SELECT TMC_code, EPOCH, percentile_CONT(.95)
WITHIN GROUP (ORDER BY cast(travel_time_minutes as float)) OVER (PARTITION BY TMC_code) AS P95afternoon, percentile_CONT(.50)
WITHIN GROUP (ORDER BY cast(travel_time_minutes as float)) OVER (PARTITION BY TMC_code) AS P50afternoon FROM [dbo].[AR_2018_TRUCKS_1_3]
WHERE DATEPART(HOUR, EPOCH) between 16 and 17 AND (WKDAY != 'SAT' and WKDAY != 'SUN'))
update TMCF5 set tmcF5.p95A = percentiles_Afternoon2.P95Afternoon from percentiles_afternoon2
join percentiles_afternoon2 on tmcf5.tmc_code = percentiles_afternoon2.tmc_code
Based on the error message you posted in the comments, the problem is in this statement:
update TMCF5 set tmcF5.p95A = percentiles_Afternoon2.P95Afternoon from percentiles_afternoon2
join percentiles_afternoon2 on tmcf5.tmc_code = percentiles_afternoon2.tmc_code
It lists percentiles_afternoon2 as both tables in the join. You presumably meant one of them to be tmcf5, i.e. from tmcf5 join percentiles_afternoon2 on tmcf5.tmc_code = percentiles_afternoon2.tmc_code.
Also, if your first insert statement only serves to bring in your tmc_codes, you can simplify it to:
-- no CTE needed
insert tmcf5 (tmc_code)
select
distinct tmc_code
from AR_2018_TRUCKS_1_3;
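Instead of inserting the codes first and then updating from a CTE, you can fill the table with a single INSERT ... SELECT. In SQL Server, PERCENTILE_CONT is an analytic function that requires an OVER clause, so that version would typically pair it with SELECT DISTINCT TMC_code. The sketch below uses SQLite through Python's sqlite3 module as a stand-in. SQLite has no PERCENTILE_CONT, so the sketch registers a hypothetical user-defined aggregate named percentile_cont that uses the same linear-interpolation definition, and it groups by tmc_code. Table names are lowercased from the question, and the readings are invented.

```python
import math
import sqlite3


class PercentileCont:
    """User-defined aggregate: percentile by linear interpolation between
    neighbouring sorted values (the PERCENTILE_CONT definition)."""

    def __init__(self):
        self.values = []
        self.p = 0.5

    def step(self, value, p):
        # Called once per row in the group; NULL travel times are ignored.
        if value is not None:
            self.values.append(float(value))
        self.p = p

    def finalize(self):
        if not self.values:
            return None
        v = sorted(self.values)
        pos = self.p * (len(v) - 1)  # 0-based fractional position
        lo = math.floor(pos)
        hi = min(lo + 1, len(v) - 1)
        return v[lo] + (pos - lo) * (v[hi] - v[lo])


READINGS = [  # (tmc_code, epoch, wkday, travel_time_minutes), invented sample data
    ("A", "2018-03-05 16:00:00", "MON", "1"),
    ("A", "2018-03-05 16:05:00", "MON", "2"),
    ("A", "2018-03-06 17:10:00", "TUE", "3"),
    ("A", "2018-03-07 16:30:00", "WED", "4"),
    ("A", "2018-03-08 17:55:00", "THU", "5"),
    ("A", "2018-03-10 16:00:00", "SAT", "100"),  # weekend: filtered out
    ("A", "2018-03-05 08:00:00", "MON", "50"),   # morning: filtered out
    ("B", "2018-03-05 16:00:00", "MON", "2"),
    ("B", "2018-03-06 16:00:00", "TUE", "4"),
]


def build_percentile_table(rows):
    con = sqlite3.connect(":memory:")
    con.create_aggregate("percentile_cont", 2, PercentileCont)
    con.execute("CREATE TABLE readings (tmc_code TEXT, epoch TEXT, wkday TEXT, travel_time_minutes TEXT)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    # Percentiles are stored as numbers (REAL) rather than varchar.
    con.execute("CREATE TABLE tmcf5 (tmc_code TEXT, p95a REAL, p50a REAL)")
    # One INSERT ... SELECT computes and stores the weekday 16:00-17:59 percentiles.
    con.execute(
        """
        INSERT INTO tmcf5 (tmc_code, p95a, p50a)
        SELECT tmc_code,
               percentile_cont(CAST(travel_time_minutes AS REAL), 0.95),
               percentile_cont(CAST(travel_time_minutes AS REAL), 0.50)
        FROM readings
        WHERE CAST(strftime('%H', epoch) AS INTEGER) BETWEEN 16 AND 17
          AND wkday NOT IN ('SAT', 'SUN')
        GROUP BY tmc_code
        """
    )
    result = con.execute("SELECT tmc_code, p95a, p50a FROM tmcf5 ORDER BY tmc_code").fetchall()
    con.close()
    return result
```

For TMC "A" the filtered values are 1 to 5, which gives a 95th percentile of 4.8 and a median of 3.0. The weekend and morning rows do not affect the result.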

SELECT records that were not yesterday

I have the following CTE
;
WITH cte
AS
(
select t.UserId, t.Date
from (select
Date
, UserId
, row_number() over(partition by UserId order by Date desc) as RowNumber
from dbo.Income_Expenses) as t
where t.RowNumber = 1
)
If I make a selection on it, I'll get the following results:
Date UserId RowNumber
2015-05-10 00:00:00.000 6 1
2015-05-08 00:00:00.000 7 1
Basically I get the last record that has been inserted by every user.
Now, when I select from the CTE, I want only the records dated the day before yesterday or earlier.
I.e. today is May 10th, so I want all the records from May 8th and earlier (8th, 7th, etc., but not the 9th or 10th).
I tried some expressions with DATEADD and DATEDIFF, but none of them worked.
Can someone help me?
Try adding this condition to the CTE's WHERE clause, after t.RowNumber = 1: and datediff(day, t.Date, getdate()) >= 2
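This works because DATEDIFF(day, start, end) counts the midnight boundaries crossed between the two values, not elapsed 24-hour periods. As a result, >= 2 keeps May 8th and drops May 9th and 10th whatever the time of day. The small Python sketch below models that behaviour so it can be checked. It is not SQL Server itself, and the helper names are hypothetical.

```python
from datetime import datetime


def datediff_day(start: datetime, end: datetime) -> int:
    """Model of DATEDIFF(day, start, end): the time of day is ignored,
    only calendar-date boundaries are counted."""
    return (end.date() - start.date()).days


def not_recent(latest_per_user, today):
    """Keep (user_id, date) pairs whose latest date is at least two calendar days before today."""
    return [(uid, d) for uid, d in latest_per_user if datediff_day(d, today) >= 2]
```

Using the question's example, with today set to May 10th 2015 at 09:30, user 7 (latest record May 8th) is kept and user 6 (May 10th) is dropped. Note that 23:59 on the 8th to 00:01 on the 10th already counts as 2, even though only about a day has elapsed.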

T-SQL select values grouped by week, zero if no values present for week

I am trying to group values (sales data) by week in SQL Server. For items with no sales in a certain week, I still want to get the week number and year, with a sum of 0.
The sales ledger table has computed columns for year and week number, by which I group.
Right now my query looks like this:
select ItemNumber, sum(Amount), year, week
from JournalPosition
group by week, year, ItemNumber
order by ItemNumber asc, year desc, week desc
What would be an efficient way to accomplish this without having to implement a data warehouse? (A stored procedure or temporary table would be fine for me.)
You need to generate a list of all the weeks you want to include in your query and join onto it. You can either store these in a pre-generated table or use a CTE. The question "How to get the start and end dates of all weeks between two dates in SQL Server?" shows how to build such a CTE.
You can use a recursive CTE built from the dates in your table:
declare @StartDate datetime, @EndDate datetime
-- Year is the computed year column: start on Jan 1 of the first year, end on Dec 31 of the last year
set @StartDate = (select datefromparts(min([Year]), 1, 1) from JournalPosition)
set @EndDate = (select datefromparts(max([Year]), 12, 31) from JournalPosition)
;with Days as (
-- one row per calendar day, so no week number is skipped at a year boundary
select @StartDate as d
union all
select dateadd(day, 1, d) from Days where d < @EndDate
),
Weeks as (
-- assumes the computed week column is also based on datepart(week, ...)
select distinct datepart(year, d) as YearNumber, datepart(week, d) as WeekNumber
from Days
),
Items as (
select distinct ItemNumber from JournalPosition
)
select i.ItemNumber, isnull(sum(jp.Amount), 0) as Amount, w.YearNumber, w.WeekNumber
from Items i
cross join Weeks w -- every item paired with every week
left join JournalPosition jp
on jp.ItemNumber = i.ItemNumber and jp.[year] = w.YearNumber and jp.[week] = w.WeekNumber
group by i.ItemNumber, w.YearNumber, w.WeekNumber
order by i.ItemNumber asc, w.YearNumber desc, w.WeekNumber desc
option (maxrecursion 32767);
But maybe it's better not to use recursion (see http://www.sqlservercentral.com/Forums/Topic779830-338-1.aspx).
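The step that produces the zero rows is pairing every item with every week before the outer join. That way each (item, year, week) combination exists even when it has no ledger rows. Below is a minimal sketch of that pattern, using Python's sqlite3 as a stand-in for SQL Server. To keep it short, it assumes a single year and weeks 1 to last_week. A SQL Server version would take the week list from a recursive CTE or a calendar table instead. The sample ledger rows are hypothetical.

```python
import sqlite3

# Hypothetical ledger rows: (ItemNumber, Amount, year, week).
LEDGER = [
    ("X1", 10.0, 2015, 1),
    ("X1", 5.0, 2015, 1),
    ("X1", 7.0, 2015, 3),
    ("Y2", 4.0, 2015, 2),
]

QUERY = """
WITH RECURSIVE weeks(year, week) AS (   -- every week of the requested range
    SELECT ?, 1
    UNION ALL
    SELECT year, week + 1 FROM weeks WHERE week < ?
),
items AS (SELECT DISTINCT ItemNumber FROM JournalPosition)
SELECT i.ItemNumber, w.year, w.week, COALESCE(SUM(j.Amount), 0) AS total
FROM items i
CROSS JOIN weeks w                      -- every item paired with every week
LEFT JOIN JournalPosition j
       ON j.ItemNumber = i.ItemNumber AND j.year = w.year AND j.week = w.week
GROUP BY i.ItemNumber, w.year, w.week
ORDER BY i.ItemNumber ASC, w.year DESC, w.week DESC
"""


def weekly_totals(rows, year, last_week):
    """Return (ItemNumber, year, week, total) with 0 for weeks without sales."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE JournalPosition (ItemNumber TEXT, Amount REAL, year INTEGER, week INTEGER)")
    con.executemany("INSERT INTO JournalPosition VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (year, last_week)).fetchall()
    con.close()
    return result
```

For item X1, week 2 of 2015 comes back with a total of 0. A FULL JOIN to the week list would instead produce a row with a NULL ItemNumber.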

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (omitted above to save space). That is the source of the query's slowness, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over how the query is generated, so if I could find more efficient SQL that returns the same results, I could modify the query generator accordingly. Any pointers would be greatly appreciated.
I haven't tested it, but I think it can be done with:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
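One quick way to check that the conditional aggregation returns the same counts as the correlated subquery is to run both against the same data. In this sketch, SQLite (via Python's sqlite3) stands in for SQL Server, and date() stands in for the generated [dateExpr]. The rows are made up. The rewritten form should need only one pass over table1 instead of a subquery per group.

```python
import sqlite3

ROWS = [  # (dateCol, column2), invented sample data
    ("2020-01-01 08:00:00", "A"),
    ("2020-01-01 12:30:00", "B"),
    ("2020-01-01 18:45:00", "A"),
    ("2020-01-02 09:15:00", "B"),
    ("2020-01-03 10:00:00", "A"),
]

# Original shape: a correlated COUNT(*) evaluated for every group.
CORRELATED = """
SELECT date(m.dateCol) AS domainValue,
       (SELECT COUNT(*) FROM table1 t
         WHERE date(t.dateCol) = date(m.dateCol) AND t.column2 = 'A') AS series1
FROM table1 m
GROUP BY date(m.dateCol)
ORDER BY domainValue
"""

# Rewritten shape: one grouped pass, counting matches with CASE.
CONDITIONAL = """
SELECT date(dateCol) AS domainValue,
       SUM(CASE WHEN column2 = 'A' THEN 1 ELSE 0 END) AS series1
FROM table1
GROUP BY date(dateCol)
ORDER BY domainValue
"""


def run(query):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (dateCol TEXT, column2 TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Both queries return one row per day that has data, including days with a count of 0 such as 2020-01-02. Days with no rows at all appear in neither result, which is the gap the calendar-table answers address.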
The fastest way to do this would be to use calendar tables. Create a SQL table with an entry for every month for as many years ahead as you need. Then select from that calendar table, joining in the entries from table1 whose dates fall between the start and end date of each month. If your clustered index is on dateCol in table1, the query should run very quickly.
EDIT: Example query. This assumes a months table exists with two columns, StartDate and EndDate, where EndDate is midnight on the first day of the next month. The clustered index on the months table should be on StartDate.
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
WHERE table1.column2 = 'A'
GROUP BY months.StartDate;
With Calendar As
(
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
From Table1
Union All
Select DateAdd(d, 1, [date])
From Calendar
Where [date] < (
Select Max( DateAdd(d, DateDiff(d, 0, dateCol), 0) )
From Table1
)
)
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.
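The calendar approach also returns days with no matching rows. Because the join compares dateCol against a half-open range instead of wrapping the column in functions, an index on dateCol should remain usable in SQL Server. Below is a sketch of the same idea in SQLite through Python's sqlite3, with date(..., '+1 day') in place of DateAdd. The bounds CTE, the pk column, and the sample rows are hypothetical.

```python
import sqlite3

ROWS = [  # (pk, dateCol, column2), invented sample data
    (1, "2020-01-01 08:00:00", "A"),
    (2, "2020-01-01 18:45:00", "A"),
    (3, "2020-01-03 10:00:00", "A"),
    (4, "2020-01-03 11:00:00", "B"),
]

QUERY = """
WITH RECURSIVE
bounds(first_day, last_day) AS (                -- first and last calendar day, time stripped
    SELECT date(MIN(dateCol)), date(MAX(dateCol)) FROM table1
),
calendar(day) AS (                              -- one row per day in that range
    SELECT first_day FROM bounds
    UNION ALL
    SELECT date(c.day, '+1 day') FROM calendar c, bounds b WHERE c.day < b.last_day
)
SELECT c.day, COUNT(t.pk) AS total
FROM calendar c
LEFT JOIN table1 t
       ON t.dateCol >= c.day                    -- half-open range per day,
      AND t.dateCol < date(c.day, '+1 day')     -- no function applied to dateCol
      AND t.column2 = 'A'
GROUP BY c.day
ORDER BY c.day
"""


def daily_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (pk INTEGER, dateCol TEXT, column2 TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

With the sample rows, 2020-01-02 appears with a total of 0 even though table1 has no rows on that day.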
