I have two tables. The first table, Section, has the following schema:
SecID | Date | SecReturn
-------|---------------|--------------
208 | 2015-04-01 | 0.00355
208 | 2015-04-02 | -0.00578
208 | 2015-04-03 | 0.00788
208 | 2015-04-04 | 0.08662
105 | 2015-04-01 | 0.00786
and the second table, SectionDates, has this schema:
SecID | MonthlyDate | DailyDate
------|---------------|-------------
208 | 2015-04-02 | 2015-04-03
105 | 2015-04-01 | 2015-04-01
I want to calculate the running product over the SecReturn column of the Section table, within the date range (DailyDate to MonthlyDate) taken from the second table, SectionDates.
The running product is calculated for each SecID using this formula:
Date       | SecReturn     | SectionTotal
-----------|---------------|--------------------
2015-04-01 | X (let's say) | (1+X)-1
2015-04-02 | Y             | (1+X)(1+Y)-1
2015-04-03 | Z             | (1+X)(1+Y)(1+Z)-1
After applying the above calculation, the computed value for 2015-04-01 is (1+0.00355)-1; for 2015-04-02 it is (1+0.00355)(1-0.00578)-1; for 2015-04-03 it is (1+0.00355)(1-0.00578)(1+0.00788)-1; and so on.
The final output:
SecID | Date | SectionTotal
-------|------------|-----------------
105 | 2015-04-01 | 0.00786
208 | 2015-04-01 | 0.00355
208 | 2015-04-02 | -0.0022505
208 | 2015-04-03 | 0.0056117
You can try the following query:
SELECT SecID, [Date], [SecReturn],
       ROUND((1 + SecReturn) * COALESCE(v, 1) - 1, 5) AS SectionTotal
FROM mytable AS t1
OUTER APPLY (
    -- Product of (1 + SecReturn) over all earlier rows for the same SecID,
    -- computed as EXP(SUM(LOG(...))).
    SELECT EXP(SUM(LOG(SecReturn + 1))) AS v
    FROM mytable AS t2
    WHERE t1.SecID = t2.SecID AND t1.[Date] > t2.[Date]
) AS t3
OUTER APPLY, available since SQL Server 2005, fetches all of the records that participate in the running-multiplication calculation.
Using the formula for a multiplication aggregate found in this post, you can obtain the desired result.
Demo here
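On SQL Server 2012 or later, the same EXP/SUM/LOG trick can be written as a windowed running sum, avoiding the self-apply entirely. A minimal sketch, assuming the same mytable layout and that no SecReturn is -1 or lower (LOG would fail on a non-positive argument):

SELECT SecID, [Date], SecReturn,
       -- Running product of (1 + SecReturn) per SecID in date order;
       -- the default frame includes the current row, matching SectionTotal.
       ROUND(EXP(SUM(LOG(1 + SecReturn))
                 OVER (PARTITION BY SecID ORDER BY [Date])) - 1, 5) AS SectionTotal
FROM mytable;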
Related
I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear for each ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this:
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins needed to create the second table, but then I use a fast-forward cursor to iterate through each of its records, with the loop body looking like the following:
DECLARE @curID int
DECLARE @curMin int
DECLARE @curMax int

FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT SUM(Value)
    FROM ValTable
    WHERE Year >= @curMin AND Year <= @curMax AND ID = @curID
    GROUP BY ID

    FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
END
Having found the sum of values between the specified years, I can connect it back to the second table and wind up with the desired result (the third table).
However, the second table is in reality roughly 4 million rows, so this iteration is extremely time consuming (generating roughly 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
With a GROUP BY, the sum is computed only for the ID in question; since the min year and max year belong to that same ID, you don't need a second query. The query below should give you exactly what you need. If you have a different requirement, let me know.
SELECT ID, MIN(Year) AS MinYear, MAX(Year) AS MaxYear, SUM(Value) AS SumVal
FROM tablenameyoudidnotsay
GROUP BY ID
You could use a query like the one below, where TableA is your first table and TableB is the second one:
SELECT *,
       (SELECT SUM(Value)
        FROM TableA
        WHERE TableA.ID = TableB.ID
          AND TableA.Year BETWEEN TableB.MinYear AND TableB.MaxYear) AS SumValue
FROM TableB
You can put your criteria into a join and obtain the result all as one set, which should be faster:
SELECT b.ID, b.MinYear, b.MaxYear, SUM(a.Value) AS SumVal
FROM Table2 b
JOIN Table1 a ON a.ID = b.ID AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.ID, b.MinYear, b.MaxYear
I have an interesting conundrum, and I am using SQL Server 2012 or SQL Server 2016 (T-SQL, obviously). I have a list of products, each with its own UPC code. These products have a discontinue date, and a UPC gets recycled to a new product after the discontinue date. So let's say I have the following in the Item_UPCs table:
Item Key | Item Desc | UPC | UPC Discontinue Date
123456 | Shovel | 0009595959 | 2018-04-01
123456 | Shovel | 0007878787 | NULL
234567 | Rake | 0009595959 | NULL
As you can see, I have a UPC that gets recycled to a new product. Unfortunately, I don't have an effective date in the item UPC table, but I do have one in an items table recording when an item was added to the system. But let's ignore that.
Here's what I want to do:
For every inventory record up to the discontinue date, show the unique UPC associated with that date. An inventory record consists of the "Inventory Date", the "Purchase Cost", the "Purchase Quantity", the "Item Description", and the "Item UPC".
Once the discontinue date has passed (i.e., it's the next day), start showing only the UPC that is in effect.
Make sure that no duplicate data exists and that the UPCs are truly "attached" to each row according to the date in the query.
Here is an example of the inventory details table:
Inv_Key | Trans_Date | Item_Key | Purch_Qty | Purch_Cost
123 | 2018-05-12 | 123456 | 12.00 | 24.00
108 | 2018-03-22 | 123456 | 8.00 | 16.00
167 | 2018-07-03 | 234567 | 12.00 | 12.00
An example query:
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Item_UPCs iu ON iu.Item_Key = id.Item_Key
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
The real query I have is far too long to post here; it has three CTEs plus the resulting query. This is simply a mockup. Here is an example result set:
Site_ID | Item_Key | Item_Desc | Item_Department | Item_Category | UPC | UPC_Discontinue_Date | Trans_Date | Purch_Cost | Purch_Qty
2457 | 123456 | Shovel | Digging Tools | Shovels | 0009595959 | 2018-04-01 | 2018-03-22 | 16.00 | 8.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-03-22 | 16.00 | 8.00
2457 | 234567 | Rakes | Garden Tools | Rakes | 0009595959 | NULL | 2018-07-03 | 12.00 | 12.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-05-12 | 24.00 | 12.00
Do any of you know how I can "assign" a UPC to a specific range of dates in my query and then "assign" an updated UPC to the item for every effective date thereafter?
Many thanks!
Given your current Item_UPCs table, you can generate effective start dates from the discontinue date using the LAG analytic function:
WITH Effective_UPCs AS (
    SELECT [Item_Key]
         , [Item_Desc]
         , [UPC]
         -- The start date is the discontinue date of the previous UPC for this
         -- item, or, failing that, of the previous item to carry this UPC.
         , COALESCE(LAG([UPC_Discontinue_Date])
                        OVER (PARTITION BY [Item_Key]
                              ORDER BY COALESCE([UPC_Discontinue_Date]
                                              , DATEFROMPARTS(9999, 12, 31)))
                  , LAG([UPC_Discontinue_Date])
                        OVER (PARTITION BY [UPC]
                              ORDER BY COALESCE([UPC_Discontinue_Date]
                                              , DATEFROMPARTS(9999, 12, 31)))
                   ) AS [UPC_Start_Date]
         , [UPC_Discontinue_Date]
    FROM Item_UPCs i
)
SELECT * FROM Effective_UPCs;
This yields the following results:
| Item_Key | Item_Desc | UPC | UPC_Start_Date | UPC_Discontinue_Date |
|----------|-----------|------------|----------------|----------------------|
| 123456 | Shovel | 0007878787 | 2018-04-01 | (null) |
| 123456 | Shovel | 0009595959 | (null) | 2018-04-01 |
| 234567 | Rake | 0009595959 | 2018-04-01 | (null) |
This produces a fully open-ended interval, in which both the start and discontinue dates can be null, indicating that the row is effective for all time. To use this in your query, simply reference the Effective_UPCs CTE in place of the Item_UPCs table and add a couple of additional predicates to take the effective dates into account:
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Effective_UPCs iu
ON iu.Item_Key = id.Item_Key
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
Note that the above query uses a partially open range (UPC_Start_Date < Trans_Date <= UPC_Discontinue_Date, rather than <= for both inequalities). This prevents transactions occurring exactly on the discontinue date from matching both the prior and the next Item_Key record. If transactions that occur exactly on the discontinue date should match the new record rather than the old, simply swap the two inequalities:
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date <= id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date < iu.UPC_Discontinue_Date)
instead of
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
I have a table that has values like these:
Table 1 :
Name | DateTimeFrom | DateTimeTo
A | 2017-02-03 02:00 | 2017-02-10 23:55
B | 2017-01-03 14:00 | 2017-05-10 19:55
And another table that has values like these :
Table 2:
Name | Date | Hour | Value
A | 2017-01-01 | 00:00 | 0.25
A | 2017-01-01 | 00:15 | 0.25
A | 2017-01-01 | 00:30 | 0
A | 2017-01-01 | 00:45 | 0
A | 2017-01-01 | 01:00 | 0.25
[...] contains values of 0 or 0.25 every 15 minutes
Result :
Name | DateTimeFrom | DateTimeTo | Value
A | 2017-02-03 02:00 | 2017-02-10 23:55 | 345.0
B | 2017-01-03 14:00 | 2017-05-10 19:55 | 1202
I've created a view that contains all the columns from Table 1 and the SUM of all the values from Table 2 that fall in the date range from Table 1. The problem is that Table 2 contains more than 3 million rows, and the SELECT takes about 10 minutes...
Is there a way to speed up the process?
I tried to create an index on Table 2, but I don't know which index (clustered? on which columns?) I must create to lower the execution time.
Edit (here is the query) :
SELECT Table1.Name, Table1.DateTimeFrom, Table1.DateTimeTo, SUM(Table2.Value) AS Value
FROM Table1
LEFT OUTER JOIN Table2
    ON Table1.Name = Table2.Name
    AND Table1.DateTimeFrom <= CAST(Table2.Date AS DATETIME) + CAST(Table2.Hour AS DATETIME)
    AND (CASE WHEN Table1.DateTimeTo IS NULL THEN GETDATE() ELSE Table1.DateTimeTo END)
        > CAST(Table2.Date AS DATETIME) + CAST(Table2.Hour AS DATETIME)
GROUP BY Table1.Name, Table1.DateTimeFrom, Table1.DateTimeTo
OP (Swapper): are you trying to return only the past 2 days?
Start with a nonclustered index on Table 2's date column, INCLUDEing the value column; a sketch follows below.
Then add a filter for only the data set you need; no one can consume 3 million records. Something like WHERE DateTimeFrom > DATEADD(month, -1, SYSDATETIME()) (in the view definition).
A second thought: why compute this data over and over again via a view? Consider materializing it into a table.
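A minimal sketch of that index and filter, assuming the Table2 column names from the question; keying on Name as well (an addition on my part, since the join also matches on it) lets the join seek per name, and INCLUDE makes the index covering:

-- Hypothetical index; names are taken from the question's Table2.
CREATE NONCLUSTERED INDEX IX_Table2_Name_Date_Hour
    ON Table2 ([Name], [Date], [Hour])
    INCLUDE ([Value]);

-- In the view definition, keep only the ranges you actually need
-- (a one-month cutoff is an assumption):
-- WHERE Table1.DateTimeFrom > DATEADD(month, -1, SYSDATETIME())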
In SQL Server, I have a simple table that stores an amount and a balance, like this:
ID | Date | Amount | Balance
-------------------------------------
101 | 1/15/2017 | 3.00 | 67.50
102 | 1/16/2017 | 5.00 | 72.50
103 | 1/19/2017 | 9.00 | 81.50
104 | 1/20/2017 | -2.00 | 79.50
If I change the amount of a record, I need to update the balance of every record after it.
ID | Date | Amount | Balance
-------------------------------------
101 | 1/15/2017 | 3.00 | 67.50
102 | 1/16/2017 | *5.02* | *72.52*
103 | 1/19/2017 | 9.00 | *81.52*
104 | 1/20/2017 | -2.00 | *79.52*
By now I have more than 100 million records in this table. To do this work, I don't want to use a SQL cursor or a client program; that would submit plenty of UPDATE statements and take several hours to finish.
Can it be done in one SQL statement that re-calculates the balance of the entire table?
You can easily do it in a single SQL statement using SUM() OVER, e.g.:
WITH tot AS (
    -- Running total of Amount in ID order.
    SELECT ID, SUM(Amount) OVER (ORDER BY ID) AS Balance
    FROM YourTable
)
UPDATE tab
SET tab.Balance = tot.Balance
FROM YourTable AS tab
JOIN tot ON tot.ID = tab.ID
If the balance is reset by any other column, then use that column in a PARTITION BY clause and include it in the join; see the sketch below.
Now, if you are inserting a new row, you can simply run this update query with a WHERE clause.
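For illustration, a sketch of that partitioned variant; AccountID is a hypothetical reset column that is not in the original table:

WITH tot AS (
    -- The running balance restarts for each AccountID (hypothetical column).
    SELECT ID, AccountID,
           SUM(Amount) OVER (PARTITION BY AccountID ORDER BY ID) AS Balance
    FROM YourTable
)
UPDATE tab
SET tab.Balance = tot.Balance
FROM YourTable AS tab
JOIN tot ON tot.ID = tab.ID AND tot.AccountID = tab.AccountID;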
Take an example: I have the following transaction table, holding the transaction values of each department for each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
1             | Dep1       | T1        | 2014 | 13    |
2             | Dep1       | T1        | 2014 | 43    |
3             | Dep1       | T2        | 2014 | 36    |
300           | Dep1       | T1        | 2017 | 28    |
301           | Dep2       | T1        | 2014 | 24    |
I would like to calculate a moving average for each transaction within the same department, over a window running from 6 trimesters before to 2 trimesters before the current row's trimester. For example, for transaction 300 in T1 2017, I'd like the average of the transaction values for Dep1 from T1 2015 to T2 2016.
How can I achieve this with a sliding window function in SQL Server 2014? My thought is that I should use something like:
SELECT
    AVG(Value) OVER
        (PARTITION BY Department
         ORDER BY Year, Trimester
         RANGE [take the range from 6 to 2 trimesters before])
How would we define the RANGE clause? I suppose I cannot use ROWS, because the number of rows in the window is unknown.
The same question applies to the median: how would we rewrite this to calculate the median instead of the mean?
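SQL Server 2014's RANGE frame only supports UNBOUNDED and CURRENT ROW, so a value-based window like this cannot be expressed directly. One workaround is to number the trimesters sequentially and aggregate over an APPLY. A sketch under stated assumptions: 3 trimesters per year, labels like 'T1', and a table named Transactions (all hypothetical names):

WITH T AS (
    -- Map each (Year, Trimester) to a sequential trimester index,
    -- assuming 3 trimesters per year and labels 'T1' through 'T3'.
    SELECT *,
           [Year] * 3 + CAST(SUBSTRING(Trimester, 2, 1) AS int) AS TrimSeq
    FROM Transactions
)
SELECT t1.TransactionID, t1.Department, t1.Trimester, t1.[Year], t1.[Value],
       w.MovingAvg
FROM T AS t1
OUTER APPLY (
    -- Average over the same department, 6 to 2 trimesters before this row;
    -- CAST avoids integer truncation if Value is an int column.
    SELECT AVG(CAST(t2.[Value] AS decimal(18, 4))) AS MovingAvg
    FROM T AS t2
    WHERE t2.Department = t1.Department
      AND t2.TrimSeq BETWEEN t1.TrimSeq - 6 AND t1.TrimSeq - 2
) AS w;

For the median, one option is to replace the AVG subquery with SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY t2.[Value]) OVER () against the same filtered rows; PERCENTILE_CONT is available from SQL Server 2012 onward.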