Not sure where to start searching. I have a script that returns multiple metrics from tables, using an "as of" date (each Monday). I was able to collect the past year's Monday as-of dates; now I want to write a script that will use those dates instead of running the query manually 52 times.
The end table looks like this:
Office | Metric_1 | Metric_2 | As_of_Date
-------+----------+----------+-----------
12     | 2000000  | 1        | 2017-06-28
15     | 4000000  | 2        | 2017-06-28
20     | 8000000  | 4        | 2017-06-28
I'd greatly appreciate any direction or help.
Thank you
The end result table would look like this:
Office | Metric_1 | Metric_2 | As_of_Date
-------+----------+----------+-----------
12     | 2000000  | 1        | 2017-06-28
15     | 4000000  | 2        | 2017-06-28
20     | 8000000  | 4        | 2017-06-28
12     | 2000000  | 1        | 2017-05-15
15     | 4000000  | 2        | 2017-05-15
20     | 8000000  | 4        | 2017-05-15
If I didn't get you wrong, what you need is to find all the data whose as_of_date falls within this year, so all you need to do is limit the date to this year. Some examples of how to do it are below:
select * from table where as_of_date >= to_date('2017-01-01','yyyy-mm-dd') and as_of_date <= to_date('2017-12-31','yyyy-mm-dd')
or the better way:
select * from table WHERE EXTRACT(YEAR FROM as_of_date) = 2017
UPD: as you stated in your comment, in order to get data from DataTable according to the DateTable you mentioned, you can use a join like the one below:
SELECT A.* FROM DATATABLE A RIGHT JOIN DATETABLE B ON A.AS_OF_DATE = B.AS_OF_DATE
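If the metrics script currently takes a single as-of-date parameter, the same join idea can replace the 52 manual runs entirely: cross the collected Monday dates into the metrics query so each date produces its own result set in one pass. A minimal sketch, assuming hypothetical table names MondayDates and MetricsSource and illustrative aggregate columns:

```sql
-- One set of metric rows per collected Monday, in a single statement.
-- MondayDates and MetricsSource are placeholder names for your own tables.
SELECT m.Office,
       SUM(m.Amount)   AS Metric_1,
       COUNT(m.DealID) AS Metric_2,
       d.As_of_Date
FROM   MondayDates d
JOIN   MetricsSource m
       ON m.EventDate <= d.As_of_Date  -- each Monday sees data up to that date
GROUP BY m.Office, d.As_of_Date;
```

The aggregates here are illustrative; the point is that joining against the date table turns the loop into one set-based query whose output matches the "end result table" shape above.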
I need to select or update badge records that have a date difference of more than 30 days after the previous visit. A SELECT query to find them is enough, so I can then update them.
It's difficult to explain in detail, but I'll try with an example:
(This is an access system where people scan a badge and the timestamp is recorded.)
I only need the records where a badge entered the system more than 30 days after its previous scan, plus the very first scan of each badge.
The example table below marks the records I need (5 records).
Only records with the same badge number must be compared and updated.
Is this possible using T-SQL?
Example:
+------------------+---------+
| TimeStamp        | Badge   |
+------------------+---------+
| 19-10-2022 10:18 | Badge1  | <--- **select** (more than 30 days after previous scan)
| 01-01-2022 12:18 | Badge1  | <--- ok (less than 30 days)
| 08-12-2021 13:23 | Badge1  | <--- ok (less than 30 days)
| 20-11-2021 11:18 | Badge1  | <--- ok (less than 30 days)
| 22-10-2021 13:18 | Badge1  | <--- **select** (more than 30 days after previous scan)
| 23-08-2020 14:18 | Badge1  | <--- **select** (first entrance)
| 01-01-2022 09:18 | Badge12 | <--- ok (less than 30 days)
| 02-12-2021 10:18 | Badge12 | <--- **select** (more than 30 days after previous scan)
| 29-10-2021 23:18 | Badge12 | <--- ok (less than 30 days)
| 25-10-2021 12:18 | Badge12 | <--- **select** (first entrance)
+------------------+---------+
use this fiddle to have the example db and my wrong answer https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c1528618004f0fe6bb6319e8e638abae
Help others help you. Post a script that contains DDL and sample data that can be used as the basis for writing code.
with cte as (
select *, ROW_NUMBER() over (partition by Badge order by Timestamp) as rno
from #x
)
select cte.*, prior.rno as prno, datediff(day, prior.TimeStamp, cte.Timestamp) as ddif
from cte
left join cte as prior on cte.badge = prior.badge and cte.rno - 1 = prior.rno
where cte.rno = 1 or datediff(day, prior.TimeStamp, cte.Timestamp) > 30
order by cte.Badge, cte.TimeStamp;
This should work, but I have no way of testing on 2008; see the fiddle for a demonstration. Comment out the WHERE clause to see all the rows and the columns that are computed for the query logic. This uses ROW_NUMBER to generate a sequence number and then simply self-joins on that value to simulate LAG.
updated fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a24d23f54030d7aadd8f889819cd4512
;WITH Ordered AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY Badge ORDER BY CONVERT(DATETIME, [scandate] ,103) DESC) rn
FROM History
)
SELECT M.*, DATEDIFF(dd, p.[scandate],m.[scandate]) DaysGap
FROM Ordered M
LEFT JOIN Ordered P
ON M.rn = P.rn-1
AND M.Badge = P.Badge
WHERE P.rn IS NULL -- first entrance
OR DATEDIFF(dd, p.[scandate],m.[scandate]) > 30
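For completeness, on SQL Server 2012 or later the self-join can be replaced with LAG directly. A sketch under the same assumptions as the query above (a History table with scandate and Badge columns):

```sql
-- LAG fetches the previous scan for the same badge in one pass (SQL Server 2012+).
WITH Gaps AS (
    SELECT *,
           LAG(scandate) OVER (PARTITION BY Badge
                               ORDER BY scandate) AS prev_scan
    FROM History
)
SELECT *, DATEDIFF(day, prev_scan, scandate) AS DaysGap
FROM Gaps
WHERE prev_scan IS NULL                        -- first entrance
   OR DATEDIFF(day, prev_scan, scandate) > 30; -- more than 30 days since previous scan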
I am trying to insert a running total column into a SQL Server table as part of a stored procedure. I am needing this for a financial database so I am dealing with accounts and departments. For example, let's say I have this data set:
Account | Dept | Date | Value | Running_Total
--------+--------+------------+----------+--------------
5000 | 40 | 2018-02-01 | 10 | 15
5000 | 40 | 2018-01-01 | 5 | 5
4000 | 40 | 2018-02-01 | 10 | 30
5000 | 30 | 2018-02-01 | 15 | 15
4000 | 40 | 2017-12-01 | 20 | 20
The Running_Total column provides a historical sum of dates less than or equal to each row's date value. However, the account and dept must match for this to be the case.
I was able to get close by using
SUM(Value) OVER (PARTITION BY Account, Dept, Date)
but it does not go back and get the previous months...
Any ideas? Thanks!
You are close. You need an order by:
Sum(Value) over (partition by Account, Dept order by Date)
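One caveat worth noting: with an ORDER BY but no explicit frame, the default window frame is RANGE UNBOUNDED PRECEDING, which treats rows that tie on Date as one group, so two same-day rows for the same account/dept would each show the combined total. An explicit ROWS frame avoids that. A sketch against the sample data above, with Ledger as a placeholder name for the source table:

```sql
SELECT Account, Dept, [Date], Value,
       SUM(Value) OVER (PARTITION BY Account, Dept
                        ORDER BY [Date]
                        ROWS UNBOUNDED PRECEDING) AS Running_Total
FROM Ledger;  -- Ledger is a placeholder for your table
```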
Take an example: I have the following transaction table, with transaction values for each department in each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
--------------+------------+-----------+------+-------+-----------
1             | Dep1       | T1        | 2014 | 13    |
2             | Dep1       | T1        | 2014 | 43    |
3             | Dep1       | T2        | 2014 | 36    |
300           | Dep1       | T1        | 2017 | 28    |
301           | Dep2       | T1        | 2014 | 24    |
I would like to calculate a moving average for each transaction over rows from the same department, taking the window from 6 trimesters before to 2 trimesters before the current row's trimester. For example, for transaction 300 in T1 2017, I'd like the average of transaction values for Dep1 from T1-2015 to T2-2016.
How can I achieve this with a sliding window function in SQL Server 2014? My thought is that I should use something like
SELECT
AVG(VALUES) OVER
(PARTITION BY DEPARTMENT ORDER BY TRIMESTER,
YEAR RANGE [Take the range from previous 6 to 2 trimesters])
How would we define the RANGE clause? I suppose I cannot use ROWS, because the number of rows in the window is unknown.
The same question for the median: how would we rewrite this to calculate the median instead of the mean?
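Since SQL Server's RANGE frames only support UNBOUNDED and CURRENT ROW, one way to sketch this is to first aggregate to one row per department and trimester, number those rows with a dense trimester sequence, and then use a ROWS frame, so each offset is exactly one trimester. This assumes a hypothetical table name Transactions and that every trimester in the range has at least one transaction (gaps would shift the ROWS offsets):

```sql
-- One row per department/trimester, carrying sum and count so the moving
-- average weights every transaction equally (not an average of averages).
WITH PerTrimester AS (
    SELECT Department, [Year], Trimester,
           SUM(Value * 1.0) AS TrimSum,
           COUNT(*)         AS TrimCnt,
           DENSE_RANK() OVER (PARTITION BY Department
                              ORDER BY [Year], Trimester) AS seq
    FROM Transactions
    GROUP BY Department, [Year], Trimester
)
SELECT Department, [Year], Trimester,
       SUM(TrimSum) OVER (PARTITION BY Department ORDER BY seq
                          ROWS BETWEEN 6 PRECEDING AND 2 PRECEDING)
       / NULLIF(SUM(TrimCnt) OVER (PARTITION BY Department ORDER BY seq
                          ROWS BETWEEN 6 PRECEDING AND 2 PRECEDING), 0)
       AS MovingAvg
FROM PerTrimester;
```

The per-trimester result can be joined back to the transaction table to attach the moving average to each transaction row. The median is harder: PERCENTILE_CONT does not accept a ROWS frame, so it typically requires a CROSS APPLY back to the detail rows of the window instead.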
I feel like this isn't too bad of a problem, but I've been looking for a solution for the greater part of the day to no avail. I've seen plenty of other solutions that don't seem to help me; they were about getting non-unique columns along with a GROUP BY and an aggregate function.
The problem
I have a table of historical data as follows:
ID | source | value | date
---+--------+-------+-----------
1 | 12 | 10 | 2016-11-16
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
4 | 13 | 40 | 2016-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
I'm trying to get data before a certain date (within a loop, to cover different ranges), then getting the sum of the values grouped by source. So, as an example: "get all records before 30 days ago, and sum the values of the unique sources, using the most recent dated entry for each".
So the first step was to remove entries with dates not in the range, an easy where date < getdate()-30 for example to get:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
Now my issue is finding a way to group by source and take the max date, and then sum up the result across all sources. The idea here is that we don't know when the last entry is, so before the specified date we get all records, take the newest entry for each unique source, and sum those up to get the total value at that time.
So the next step would be to group by source using the max of date, resulting in :
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
5 | 13 | 50 | 2015-11-16
The final step would be to sum the values; this process is then repeated to get the summed value for multiple dates, so this would result in the row
value | date
-------+-----------
70 | getdate() - 30
to use for the rest.
Where I'm stuck
I'm trying to group by source and use the max of date to get the most recent entry for each unique source, but if I use the aggregate function or group by, then I can't preserve the ID or value columns to stick with the chosen max row. It's totally possible I'm just misunderstanding how aggregate functions work.
Progress so far
The best place I've gotten to yet is something like
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
)
select ???, max(date)
from dataInDateRange
group by source
But I'm not seeing how I can do this without somehow preserving a unique ID for the row that has the max date for each source so then I can go back and sum up the numbers.
Thank you great people for any help/guidance/lessons
Use row_number():
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
), rows as (
select *,
row_number() over (partition by source
order by date desc) as rn
from dataInDateRange
)
SELECT *
FROM rows
WHERE rn = 1
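To finish the asker's final step (a single total per cut-off date), the rn = 1 rows can be summed in the same statement. A sketch under the same temp-table assumptions:

```sql
with dataInDateRange as (
    select *
    from #historicalData hd
    where hd.date < getdate() - 30
), rows as (
    select *,
           row_number() over (partition by source
                              order by date desc) as rn
    from dataInDateRange
)
-- Sum the newest entry per source into one row for this cut-off date.
select sum(value) as value, getdate() - 30 as [date]
from rows
where rn = 1;
```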
I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also wrote a query with CROSS APPLY, but here it seems you cannot "force" a custom sort order the way you can with UNPIVOT; you can only sort by a value within the table, if I understand correctly. At least here I do get the result I want, but the rows are in the wrong order and I do not have the row names like Perc10, which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Did I miss something, or do something wrong? I'm using MSSQL 2014, and the output goes into SSRS. Thanks a lot for any hint in advance!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand that your CROSS APPLY solution comes back with the right data, but not in the correct output format. You can add constant values to your VALUES clause and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder