How to compute the moving average over the last n hours - sql-server

I am trying to efficiently compute (using SQL Server 2008) the moving average of ProductCount over a period of 24 hours. For every row in the ProductInventory table, I'd like to know the average ProductCount (for that product) over the last 24 hours. One problem with our data is that not all dates/hours are present (see the example below). If a timestamp is missing, it means that the ProductCount was 0.
I have a table with millions of rows with a Date, Product and Count. Below is a simplified example of the data I have to deal with.
Any idea how to achieve that?
EDIT: One other piece of data that I need is the MIN and MAX ProductCount for the period (i.e. 24h). Computing the MIN/MAX is a bit trickier because of the missing values...
+---------------------+-------------+--------------+
| Date | ProductName | ProductCount |
+---------------------+-------------+--------------+
| 2012-01-01 00:00:00 | Banana | 15000 |
| 2012-01-01 01:00:00 | Banana | 16000 |
| 2012-01-01 02:00:00 | Banana | 17000 |
| 2012-01-01 05:00:00 | Banana | 12000 |
| 2012-01-01 00:00:00 | Apple | 5000 |
| 2012-01-01 05:00:00 | Apple | 6000 |
+---------------------+-------------+--------------+
SQL
CREATE TABLE ProductInventory (
[Date] DATETIME,
[ProductName] NVARCHAR(50),
[ProductCount] INT
)
INSERT INTO ProductInventory VALUES ('2012-01-01 00:00:00', 'Banana', 15000)
INSERT INTO ProductInventory VALUES ('2012-01-01 01:00:00', 'Banana', 16000)
INSERT INTO ProductInventory VALUES ('2012-01-01 02:00:00', 'Banana', 17000)
INSERT INTO ProductInventory VALUES ('2012-01-01 05:00:00', 'Banana', 12000)
INSERT INTO ProductInventory VALUES ('2012-01-01 00:00:00', 'Apple', 5000)
INSERT INTO ProductInventory VALUES ('2012-01-01 05:00:00', 'Apple', 6000)

Well, because a missing hour means a count of 0, the average always covers all 24 hours, which actually makes this simpler: you just need to SUM the product count and divide it by a fixed number (24). So I think this will get the results you want (though in this particular case, a cursor may actually be faster):
SELECT A.*, B.ProductCount / 24.0 AS DailyMovingAverage -- 24.0 avoids integer division of the INT sum
FROM ProductInventory A
OUTER APPLY ( SELECT SUM(ProductCount) ProductCount
FROM ProductInventory
WHERE ProductName = A.ProductName
-- the 24 hourly slots ending at the current row; missing hours add nothing to the SUM
AND [Date] BETWEEN DATEADD(HOUR,-23,A.[Date]) AND A.[Date]) B
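To make the "missing hour counts as 0, so divide by 24" rule concrete, here is a small Python sketch (not SQL) of the same computation. It assumes at most one row per product per hour; the function name `moving_average` is purely illustrative. It is a brute-force O(n·k) loop meant to show the logic, not a replacement for the set-based query on millions of rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def moving_average(rows, window_hours=24):
    """rows: list of (timestamp, product, count), at most one row per product per hour.
    Returns (timestamp, product, count, average), where the average covers the
    window_hours hourly slots ending at the row's own hour (inclusive).
    Absent hours count as 0, so the divisor is always window_hours."""
    history = defaultdict(dict)  # product -> {timestamp: count}
    for ts, product, count in rows:
        history[product][ts] = count

    result = []
    for ts, product, count in rows:
        start = ts - timedelta(hours=window_hours - 1)  # like DATEADD(HOUR, -23, ...)
        total = sum(c for t, c in history[product].items() if start <= t <= ts)
        result.append((ts, product, count, total / window_hours))  # true division, like / 24.0
    return result

SAMPLE = [
    (datetime(2012, 1, 1, 0), "Banana", 15000),
    (datetime(2012, 1, 1, 1), "Banana", 16000),
    (datetime(2012, 1, 1, 2), "Banana", 17000),
    (datetime(2012, 1, 1, 5), "Banana", 12000),
    (datetime(2012, 1, 1, 0), "Apple", 5000),
    (datetime(2012, 1, 1, 5), "Apple", 6000),
]

for row in moving_average(SAMPLE):
    print(row)
```

For the Banana row at 05:00 the window holds 15000 + 16000 + 17000 + 12000 = 60000, so the average is 60000 / 24 = 2500.0.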

I extended Lamak's answer to include MIN/MAX:
SELECT *
FROM ProductInventory A
OUTER APPLY (
SELECT
SUM(ProductCount) / 24.0 AS DailyMovingAverage, -- 24.0 avoids integer division
MAX(ProductCount) AS MaxProductCount,
CASE COUNT(*) WHEN 24 THEN MIN(ProductCount) ELSE 0 END AS MinProductCount
FROM ProductInventory
WHERE ProductName = A.ProductName
AND [Date] BETWEEN DATEADD(HOUR, -23, A.[Date]) AND A.[Date]) B
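To account for missing records, check that there were indeed 24 records in the last 24 hours before using MIN(ProductCount). If there are fewer, at least one hour is missing and its implicit count of 0 is the minimum, so return 0 instead. (This relies on there being at most one row per product per hour.)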
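The same MIN/MAX rule, sketched in Python for one product's hourly history (an illustration, not SQL). It assumes counts are non-negative, so the largest stored value is also the window maximum; `window_stats` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def window_stats(history, ts, window_hours=24):
    """history: {hourly timestamp: count} for ONE product, counts assumed >= 0.
    Returns (average, minimum, maximum) over the window_hours hours ending at ts.
    Absent hours are treated as a count of 0."""
    start = ts - timedelta(hours=window_hours - 1)
    present = [c for t, c in history.items() if start <= t <= ts]
    average = sum(present) / window_hours
    # Same rule as CASE COUNT(*) WHEN 24 THEN MIN(...) ELSE 0 END:
    # if any hour is missing, its implicit 0 is the minimum.
    minimum = min(present) if len(present) == window_hours else 0
    # With non-negative counts the largest stored value is the maximum;
    # an empty window would contain only implicit zeros.
    maximum = max(present) if present else 0
    return average, minimum, maximum

banana = {
    datetime(2012, 1, 1, 0): 15000,
    datetime(2012, 1, 1, 1): 16000,
    datetime(2012, 1, 1, 2): 17000,
    datetime(2012, 1, 1, 5): 12000,
}
print(window_stats(banana, datetime(2012, 1, 1, 5)))
```

With only 4 of the 24 hours present, the minimum for Banana at 05:00 comes out as 0 while the maximum is 17000.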
Working SQL Fiddle, with a bunch (bushel?) of Oranges added to show the MinProductCount working

Related

Compare value between current date and yesterday on the same table (PostgreSQL)

First of all, I hope you can understand my poor English :))
I have a table like this:
product | value | trx_date
apple | 100 | 2020-06-01
apple | 300 | 2020-06-02
apple | 500 | 2020-06-03
and I need to create a report like this (let's say today is 2020-06-03):
product | yesterday | current_date | delta
apple | 300 | 500 | 200
I'm confused about how to write a query (PostgreSQL) that compares those values. FYI, I update this table every day. I tried a query with ('1 day'::interval), but it always shows all dates before 2020-06-03, i.e. 2020-06-01 and 2020-06-02.
I appreciate your help.
Use the window function lead or lag to 'combine' data from following rows (lead) or previous rows (lag) into the current row. In this case I use the lag function to get yesterday's value.
select product, yesterday, today, today-yesterday delta
from ( select p.product, p.value today
, lag(value) over (partition by p.product
order by p.trx_date) yesterday
, p.trx_date
from products p
) d
where trx_date = '2020-06-03'::date ;
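A Python sketch (not PostgreSQL) of what `lag(value) over (partition by product order by trx_date)` does, followed by the outer filter. It is only an illustration; `with_lag` is a hypothetical name.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

def with_lag(rows):
    """rows: (product, value, trx_date).
    Emulates lag(value) OVER (PARTITION BY product ORDER BY trx_date):
    each row is paired with the previous row's value for the same product
    (None for a product's first row), plus the delta."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 2))  # partition by product, order by date
    for product, group in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, value, trx_date in group:
            delta = None if previous is None else value - previous
            out.append((product, trx_date, previous, value, delta))
            previous = value
    return out

rows = [
    ("apple", 100, date(2020, 6, 1)),
    ("apple", 300, date(2020, 6, 2)),
    ("apple", 500, date(2020, 6, 3)),
]
# Filter AFTER the window step, like the outer WHERE in the SQL answer
report = [r for r in with_lag(rows) if r[1] == date(2020, 6, 3)]
print(report)
```

Two design points carry over to the SQL. The date filter has to sit outside the subquery: filtering first would leave no earlier row for lag to read. Also, lag returns the previous *row*, not the previous calendar day, so if a day is missing you get the last earlier value.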
Using CTE:
https://www.postgresql.org/docs/12/queries-with.html
An example:
CREATE TABLE product_table (product varchar, value integer, trx_date date);
-- ISO dates avoid depending on the DateStyle setting
INSERT INTO product_table VALUES ('apple', 100, '2020-06-01'), ('apple', 300, '2020-06-02'), ('apple', 500, '2020-06-03');
WITH prev AS (
SELECT
product,
value
FROM
product_table
WHERE
trx_date = '2020-06-03'::date - '1 day'::interval
)
SELECT
pt.product,
prev.value AS yesterday,
pt.value AS "current_date",
pt.value - prev.value AS delta
FROM
product_table AS pt
JOIN prev ON prev.product = pt.product -- pair each product only with its own previous-day row
WHERE
pt.trx_date = '2020-06-03';
product | yesterday | current_date | delta
---------+-----------+--------------+-------
apple | 300 | 500 | 200
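For contrast with lag, the CTE approach matches on the previous *calendar* day. Here is a Python sketch (not SQL) of that lookup, joining on both product and date. It behaves like a LEFT JOIN, keeping a product that has no row for yesterday. The "banana" rows are hypothetical data added only to show that case.

```python
from datetime import date, timedelta

def compare_with_yesterday(rows, today):
    """rows: (product, value, trx_date). For each product with a row on `today`,
    look up the row dated exactly one calendar day earlier for the SAME product."""
    by_key = {(product, d): value for product, value, d in rows}
    report = []
    for (product, d), value in sorted(by_key.items()):
        if d != today:
            continue
        yesterday = by_key.get((product, today - timedelta(days=1)))
        delta = None if yesterday is None else value - yesterday
        report.append((product, yesterday, value, delta))
    return report

rows = [
    ("apple", 100, date(2020, 6, 1)),
    ("apple", 300, date(2020, 6, 2)),
    ("apple", 500, date(2020, 6, 3)),
    ("banana", 40, date(2020, 6, 1)),  # hypothetical product with no 2020-06-02 row
    ("banana", 70, date(2020, 6, 3)),
]
print(compare_with_yesterday(rows, date(2020, 6, 3)))
```

Here banana gets no "yesterday" value, whereas the lag version would report 40 (its last earlier row).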

Update a column with LastExclusionDate

In SQL Server 2012, I have a table t1 where we store a list of excluded products.
I would like to add a column LastExclusionDate to store the date since when the product has been excluded.
Every day the product is inserted into the table if it is excluded. If it is not, there is no row for that day, so the next time the product is excluded there will be a date gap since the previous insert.
I would like to find a T-SQL query to update the LastExclusionDate column.
I would like to use it to populate the LastExclusionDate column the first time (initialisation) and then every day to update the column when we insert a new row.
I've tried this query, but I don't know how to get LastExclusionDate!
;WITH Cte AS
(
SELECT
product_id,
CreationDate,
LAG(CreationDate) OVER (PARTITION BY Product_ID ORDER BY CreationDate) AS GapStart,
(DATEDIFF(DAY, LAG(CreationDate) OVER (PARTITION BY Product_id ORDER BY CreationDate), CreationDate) -1) AS GapDays
FROM
#t1
)
SELECT *
FROM cte
Here's some sample data:
+------------+--------------+--------------------------------+
| product_id | CreationDate | LastExclusionDate_(toPopulate) |
+------------+--------------+--------------------------------+
| 100 | 2018-05-01 | 2018-05-01 |
| 100 | 2018-05-02 | 2018-05-01 |
| 100 | 2018-05-03 | 2018-05-01 |
| 100 | 2018-06-01 | 2018-06-01 |
| 100 | 2018-06-02 | 2018-06-01 |
| 200 | 2018-09-01 | 2018-09-01 |
| 200 | 2018-09-02 | 2018-09-01 |
| 200 | 2018-09-17 | 2018-09-17 |
+------------+--------------+--------------------------------+
Thanks
The idea in finding gap-less sequences is to compare the series to a gap-less sequence and find groups of records where the difference between the two doesn't change. For example, when the date increases one day at a time and a row number increases by one as well, the difference between them stays the same, and we have found a group:
WITH
cte (product_id, CreationDate, grp) AS (
SELECT product_id, CreationDate
, DATEDIFF(day, '19000101', CreationDate)
- ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY CreationDate)
FROM #t1
)
SELECT product_id, CreationDate
, MIN(CreationDate) OVER (PARTITION BY product_id, grp) AS LastExclusionDate
FROM cte
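The "date minus row number" trick, sketched in Python (not T-SQL) against the sample data from the question. It assumes at most one row per product per day (with duplicates, ROW_NUMBER would split an island; DENSE_RANK is the usual fix). The names are illustrative.

```python
from collections import defaultdict
from datetime import date

ANCHOR = date(1900, 1, 1)  # plays the role of '19000101' in DATEDIFF

def last_exclusion_dates(rows):
    """rows: (product_id, creation_date), at most one row per product per day.
    Returns (product_id, creation_date, last_exclusion_date) sorted by product and date."""
    by_product = defaultdict(list)
    for product_id, d in rows:
        by_product[product_id].append(d)

    result = []
    for product_id in sorted(by_product):
        dates = sorted(by_product[product_id])
        island_start = {}
        tagged = []
        for rn, d in enumerate(dates, start=1):  # rn ~ ROW_NUMBER()
            grp = (d - ANCHOR).days - rn  # constant while dates are consecutive
            island_start.setdefault(grp, d)  # dates ascend, so first seen = MIN
            tagged.append((d, grp))
        for d, grp in tagged:
            result.append((product_id, d, island_start[grp]))
    return result

sample = [
    (100, date(2018, 5, 1)), (100, date(2018, 5, 2)), (100, date(2018, 5, 3)),
    (100, date(2018, 6, 1)), (100, date(2018, 6, 2)),
    (200, date(2018, 9, 1)), (200, date(2018, 9, 2)), (200, date(2018, 9, 17)),
]
for row in last_exclusion_dates(sample):
    print(row)
```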
For ongoing daily insertions it can be done with something like this.
INSERT INTO <yourTable>
SELECT
newProduct.[product_id],
newProduct.[creationDate],
ISNULL(existingProduct.[lastExclusionDate], newProduct.[creationDate]) AS [lastExclusionDate]
FROM
(SELECT <@product_id> AS [product_id], <@creationDate> AS [creationDate]) AS newProduct
LEFT JOIN <yourTable> existingProduct -- yesterday's row for the same product, if there is one
ON existingProduct.[product_id] = newProduct.[product_id]
AND existingProduct.[creationDate] = DATEADD(DAY, -1, newProduct.[creationDate])
I've got a demo here: http://rextester.com/BDEO23118. It's a larger demo than necessary because it uses the code above with the data you provided to populate a table row by row, as you might in a daily update process. It then does individual insertions using this code with some new dates so you can see how it handles new ranges. (Just an FYI: rextester displays result dates in day.month.year hh:mm:ss format, but if you run the script in Management Studio it will output them in DATE format.)
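The daily-insert rule (inherit yesterday's LastExclusionDate if yesterday's row exists, otherwise start a new range) can be sketched in Python with a dict standing in for the table. The in-memory `table` and `insert_exclusion` are hypothetical stand-ins, not the demo's code, and rows must be inserted in date order per product, as a daily job would.

```python
from datetime import date, timedelta

def insert_exclusion(table, product_id, creation_date):
    """table: dict mapping (product_id, creation_date) -> last_exclusion_date.
    Mirrors ISNULL(existing.lastExclusionDate, new.creationDate) with a
    LEFT JOIN on yesterday's row for the same product."""
    yesterday = table.get((product_id, creation_date - timedelta(days=1)))
    value = yesterday if yesterday is not None else creation_date
    table[(product_id, creation_date)] = value
    return value

table = {}
for product_id, d in [
    (100, date(2018, 5, 1)), (100, date(2018, 5, 2)), (100, date(2018, 5, 3)),
    (100, date(2018, 6, 1)), (100, date(2018, 6, 2)),
    (200, date(2018, 9, 1)), (200, date(2018, 9, 2)), (200, date(2018, 9, 17)),
]:
    insert_exclusion(table, product_id, d)
print(table)
```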

SQL Server join and insert data

My name is Thorsten and I'm new to SQL Server. I am facing a problem after setting up a join... I joined two tables, and that has worked so far, but I don't have enough knowledge to proceed.
Here is table1:
| Item | validDate | Price |
| ---- | --------- | ----- |
| A | 01.01.2017 | 100 |
| A | 31.03.2017 | 100 |
| A | 01.04.2017 | 120 |
| A | 31.07.2017 | 120 |
Now I want to create a table that includes rows filling the gaps in table1:
| Item | validDate | Price |
| ---- | --------- | ----- |
| A | 01.01.2017 | 100 |
| A | 28.02.2017 | 100 |
| A | 31.03.2017 | 120 |
... and so on.
My idea was to join table1 to a date table in which every month end is included. But I also have to insert the gaps by creating new rows. What code would solve this issue?
As mentioned, I'm a beginner, so I hope I was able to describe my problem.
Thanks in advance for help!
Try the query below:
DECLARE @Table TABLE (Item VARCHAR(2), validDate DATE, Price INT)
INSERT INTO @Table VALUES
('A','2017-01-01',100),
('A','2017-03-31',100),
('A','2017-04-01',120),
('A','2017-07-31',120)
SELECT Item,DATEADD(DD,-1,validDate) AS Date,Price FROM @Table
OUTPUT
Item Date Price
A 2016-12-31 100
A 2017-03-30 100
A 2017-03-31 120
A 2017-07-30 120
I'm not sure if I understand you correctly, but to fill the 'gaps' in your table you need to insert the last day of each month. Here's a script that will do that for you. Since I'm fairly new to SQL too, this might not be the best solution, but it worked for me. Please note that it will not insert records with dates that already exist in your validDate column:
declare @dateVar date = '2017-01-01' -- the script calculates month ends starting from this date. DON'T modify the day value
declare @yearVar int = 2017 -- the script inserts month ends until the end of this year
declare @endDates table
(
item nvarchar(1),
endOfMonthDate date
)
while datepart(year, @dateVar) = @yearVar
begin
insert into @endDates
(item, endOfMonthDate)
values (
'A',
dateadd(day, -1, dateadd(month, 1, @dateVar)) -- last day of @dateVar's month
)
set @dateVar = dateadd(month, 1, @dateVar)
end
insert into dbo.table1
(Item, validDate)
select item, endOfMonthDate
from @endDates
where endOfMonthDate not in (
select validDate
from dbo.table1)
Now, updating the records with the correct prices is a little tricky. First we set the Price for the last day of each month based on the price from the beginning of the month.
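update t1
set t1.Price = t2.Price
from dbo.table1 as t1
join dbo.table1 as t2
on month(t1.validDate) = month(t2.validDate)
where t1.Price is null -- only the month-end rows inserted above
and t2.Price is not null -- copy from a row in the same month that already has a price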
Then we update the remaining records that still have no price with the previous month's price:
declare @loopVar int = 0
declare @nullsNumb int = (select sum(case when table1.Price is null then 1 else 0 end) from table1) -- number of NULLs in the Price column
while @loopVar < @nullsNumb -- not a great solution: each pass copies the previous month's price into records that still have no price
begin
update dbo.table1
set table1.Price = t2.Price
from table1
left join table1 as t2
on month(table1.validDate) = MONTH(t2.validDate) + 1
where table1.Price is null
set @loopVar = @loopVar + 1
end
Here's how the data in table1 look now when ordered by validDate:
Item validDate Price
A 2017-01-01 100
A 2017-01-31 100
A 2017-02-28 100
A 2017-03-31 100
A 2017-04-01 120
A 2017-04-30 120
A 2017-05-31 120
A 2017-06-30 120
A 2017-07-31 120
A 2017-08-31 120
A 2017-09-30 120
A 2017-10-31 120
A 2017-11-30 120
A 2017-12-31 120
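The end result above can be reproduced with a simpler idea: add every missing month end, then carry the latest earlier price forward. The Python sketch below (not T-SQL) uses that forward fill rather than the answer's month-matching updates. On this data both give the same table, but they are not guaranteed to agree in general. `fill_month_ends` is a hypothetical name and handles a single item.

```python
import calendar
from datetime import date

def fill_month_ends(rows, year):
    """rows: list of (item, valid_date, price) for ONE item.
    Adds a row for every month end of `year` that is not already present,
    then gives each price-less row the price of the latest earlier row."""
    item = rows[0][0]
    prices = {d: p for _, d, p in rows}
    for month in range(1, 13):
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        prices.setdefault(month_end, None)  # don't touch dates that already exist
    filled = []
    current = None
    for d in sorted(prices):
        if prices[d] is not None:
            current = prices[d]
        filled.append((item, d, current))  # forward fill
    return filled

rows = [
    ("A", date(2017, 1, 1), 100),
    ("A", date(2017, 3, 31), 100),
    ("A", date(2017, 4, 1), 120),
    ("A", date(2017, 7, 31), 120),
]
for row in fill_month_ends(rows, 2017):
    print(row)
```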
Let me know if I was able to help.

SQL Query to duplicate records by number of days

I have a database with job numbers, scheduled dates, and scheduled hours, such as this:
J410 | 11/14/2016 | 50|
I have been asked to produce a report with one line for each day of the job like this:
J410 | 11/14/2016 | 10 |
J410 | 11/15/2016 | 10 |
J410 | 11/16/2016 | 10 |
J410 | 11/17/2016 | 10 |
J410 | 11/18/2016 | 10 |
The logic is that we assume 10 hour days, so the total number of hours divided by 10 = the number of days, then the users want a line for each day.
I can easily get the number of days like this:
SELECT CEILING(Hours / 10.0) - Note that some hours don't divide evenly by 10 so I am rounding up.
I don't have the slightest idea how to attack the problem of creating (for reporting only) additional lines for each date.
My initial thoughts are to select the records into a temp table and then select each record and use a WHILE statement to duplicate the records until the number of days have been reached.
Can anyone provide a better idea?
If it helps
Declare @YourTable table (JobNumber varchar(25),Date date,Hours int)
Insert Into @YourTable values
('J410','11/14/2016',50)
;with cte0(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)) -- 10 rows
,cteN(N) As (Select Row_Number() over (Order By (Select NULL)) From cte0 N1, cte0 N2, cte0 N3) -- numbers 1..1000
Select A.JobNumber
,Date = DateAdd(DD,N-1,Date)
,Hours = cast(Hours/CEILING(Hours/10.0) as decimal(10,2)) -- spread the hours evenly over the days
From @YourTable A
Join cteN B on N<=CEILING(Hours/10.0) -- one row per day
Returns
JobNumber Date Hours
J410 2016-11-14 10.00
J410 2016-11-15 10.00
J410 2016-11-16 10.00
J410 2016-11-17 10.00
J410 2016-11-18 10.00
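The same numbers-table logic in a Python sketch (not T-SQL): CEILING(hours / 10) days, hours split evenly, one row per consecutive calendar day (weekends included, as in the query). `split_job` and the job "J999" in the note below are hypothetical.

```python
import math
from datetime import date, timedelta

def split_job(job, start, hours, hours_per_day=10):
    """One row per day for ceil(hours / hours_per_day) consecutive calendar days,
    spreading the hours evenly like Hours / CEILING(Hours / 10.0) in the query."""
    if hours <= 0:
        return []  # avoids dividing by zero days
    days = math.ceil(hours / hours_per_day)
    per_day = round(hours / days, 2)
    return [(job, start + timedelta(days=n), per_day) for n in range(days)]

for row in split_job("J410", date(2016, 11, 14), 50):
    print(row)
```

A job with 45 hours would still get 5 days, but with 9.0 hours each, because the total is divided evenly rather than filling 10-hour days first.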
Use a Numbers Table and add a day to your existing table until the date limit is reached...

Cleaning up old record to a specific date: How to select the old record?

I posted a question here that I now need to put into practice. I edited it a few times to match the current requirement, and now I want to make it clearer, as a final solution for myself as well.
My table:
Items | Price | UpdateAt
1 | 2000 | 02/02/2015
2 | 4000 | 06/04/2015
1 | 2500 | 05/25/2015
3 | 2150 | 07/05/2015
4 | 1800 | 07/05/2015
5 | 5540 | 08/16/2015
4 | 1700 | 12/24/2015
5 | 5200 | 12/26/2015
2 | 3900 | 01/01/2016
4 | 2000 | 06/14/2016
As you can see, this table keeps each item's current price as well as its older prices from before the latest update.
Now I need to find the rows that:
have an UpdateAt more than 1 year ago,
have had the price updated at least once since then, and
are not the item's most up-to-date price.
Why those conditions? Because I need to clean up the records older than 1 year from that table while still keeping the full item list.
So with those conditions, the result from the above table should be :
Items | Price | UpdateAt
1 | 2000 | 02/02/2015
2 | 4000 | 06/04/2015
4 | 1800 | 07/05/2015
The update of item 1 at 02/02/2015 should be selected, while its second update at 05/25/2015, though also over 1 year old, should not, because it is the most up-to-date price for item 1.
Item 3 isn't in the list because it has never been updated, so its price has stayed the same until now and I don't need to clean it up.
At first I thought it wouldn't be so hard, and I thought I already had an answer, but as I proceeded it turned out not to be that easy.
@Tim Biegeleisen provided me with an answer to the last question, but it doesn't select the items whose price didn't change at all over the year, which I'm having to deal with now.
I need a solution to clean up the table effectively. It doesn't have to follow the 3 conditions above, as long as it produces the result I need: the records that need to be deleted.
Try this:
DECLARE @Prices TABLE(Items INT, Price DECIMAL(10,2), UpdateAt DATETIME)
INSERT INTO @Prices
VALUES
(1, 2000, '02/02/2015')
,(2, 4000, '06/04/2015')
,(1, 2500, '05/25/2015')
,(3, 2150, '07/05/2015')
,(4, 1800, '07/05/2015')
,(5, 5540, '08/16/2015')
,(4, 1700, '12/24/2015')
,(5, 5200, '12/26/2015')
,(2, 3900, '01/01/2016')
,(4, 2000, '06/14/2016')
SELECT p.Items, p.Price, p.UpdateAt
FROM @Prices p
LEFT JOIN ( SELECT
p1.Items,
p1.UpdateAt,
ROW_NUMBER() OVER (PARTITION BY p1.Items ORDER BY p1.UpdateAt DESC) AS RowNo
FROM @Prices p1
) AS hp ON hp.Items = p.Items
AND hp.UpdateAt = p.UpdateAt
WHERE hp.RowNo > 1 -- spare one price for each item at any date
AND p.UpdateAt < DATEADD(YEAR, -1, GETDATE()) -- remove only prices older than a year
The result at the time of posting (the query depends on GETDATE(), so it changes over time) is:
Items Price UpdateAt
----------- --------------------------------------- -----------------------
1 2000.00 2015-02-02 00:00:00.000
2 4000.00 2015-06-04 00:00:00.000
4 1800.00 2015-07-05 00:00:00.000
This query will return the dataset you're looking for:
SELECT t1.Items, t1.Price, t1.UpdateAt
FROM
(
SELECT
t2.Items,
t2.Price,
t2.UpdateAt,
ROW_NUMBER() OVER (PARTITION BY t2.Items ORDER BY t2.UpdateAt DESC) AS rn
FROM [Table] AS t2
) AS t1
WHERE t1.rn > 1
AND t1.UpdateAt < DATEADD(year, -1, GETDATE())
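Both answers use the same two filters: not the newest row per item (ROW_NUMBER() > 1) and older than one year. Below is a Python sketch (not T-SQL) of that selection. It takes `today` as a parameter instead of GETDATE(), so the result is reproducible. The date 2016-07-15 used in the usage lines is a hypothetical "now" chosen to fall in the window where the sample result above holds.

```python
from collections import defaultdict
from datetime import date

def rows_to_delete(rows, today):
    """rows: (item, price, update_at). Returns rows that are not the item's latest
    price (ROW_NUMBER() ... ORDER BY UpdateAt DESC > 1) and are older than one year."""
    try:
        cutoff = today.replace(year=today.year - 1)  # like DATEADD(YEAR, -1, ...)
    except ValueError:  # Feb 29 has no counterpart: DATEADD gives Feb 28
        cutoff = today.replace(year=today.year - 1, day=28)
    by_item = defaultdict(list)
    for row in rows:
        by_item[row[0]].append(row)
    result = []
    for item_rows in by_item.values():
        newest_first = sorted(item_rows, key=lambda r: r[2], reverse=True)
        for row in newest_first[1:]:  # skip row number 1 (the current price)
            if row[2] < cutoff:
                result.append(row)
    return sorted(result, key=lambda r: (r[0], r[2]))

prices = [
    (1, 2000, date(2015, 2, 2)), (2, 4000, date(2015, 6, 4)),
    (1, 2500, date(2015, 5, 25)), (3, 2150, date(2015, 7, 5)),
    (4, 1800, date(2015, 7, 5)), (5, 5540, date(2015, 8, 16)),
    (4, 1700, date(2015, 12, 24)), (5, 5200, date(2015, 12, 26)),
    (2, 3900, date(2016, 1, 1)), (4, 2000, date(2016, 6, 14)),
]
print(rows_to_delete(prices, date(2016, 7, 15)))
```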
