Calculate Weighted Average in SQL Server

I am trying to calculate a weighted average based on the following calculations.
I have a dataset that looks something like this:
item | Date Sent | Date Received
1 | 2 Feb 10am | 3 Feb 10am
1 | 6 Feb 11am | 6 Feb 12pm
2 | 2 Feb 10am | 3 Feb 10am
2 | 6 Feb 11am | 6 Feb 12pm
I then need to calculate the average based on the time differences (rounded down to whole hours), which gives this frequency table:
Time Diff | Count |
1 | 2 |
12 | 2 |
So in this case the weighted average (each time difference multiplied by its count, divided by the total count) would be:
(1 * 2 + 12 * 2) / (2 + 2)
I have already written the SQL query to calculate the aggregate table:
select
floor(datediff(hh, dateSent, dateReceived)) as hrs,
count(item) as freq
from
table
group by
floor(datediff(hh, dateSent, dateReceived))
having
floor(datediff(hh, dateSent, dateReceived)) < 100
order by
floor(datediff(hh, dateSent, dateReceived)) asc;
Should I use a subquery? I am not proficient in SQL, and although I have tried, I keep getting syntax errors.
Can somebody help me to get the SQL query to get the weighted average?

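If what you mean by "weighted average" is the average of all time differences, wrap your query in a derived table and weight each hrs value by its freq (a plain AVG(a.hrs) would count each distinct hour value once and ignore freq):
select sum(a.hrs * a.freq) * 1.0 / sum(a.freq) as weighted_avg -- * 1.0 avoids integer division
from
(
select floor(datediff(hh,dateSent,dateReceived)) as hrs,
count(item) as freq from [table] -- TABLE is a reserved word: bracket it or use your real table name
group by floor(datediff(hh,dateSent,dateReceived))
having floor(datediff(hh,dateSent,dateReceived)) <100
-- ORDER BY is not allowed in a derived table without TOP or OFFSET, so it stays commented out
-- order by floor(datediff(hh,dateSent,dateReceived)) asc
) a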
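As a quick check of how the weighted and unweighted averages differ, here is a Python sketch that runs the same derived-table pattern against an in-memory SQLite database. It assumes the hour differences are already computed (SQLite has no DATEDIFF) and uses a hypothetical table name, diffs. With the question's data the counts are equal, so every method agrees; the second, hypothetical data set shows AVG over the grouped rows drifting away from the weighted result.

```python
import sqlite3

def averages(hours):
    """Return (weighted, raw, unweighted) averages for a list of hour differences."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table `diffs`: one row per item, hour difference precomputed.
    conn.execute("CREATE TABLE diffs (item INTEGER, hrs INTEGER)")
    conn.executemany("INSERT INTO diffs VALUES (?, ?)", list(enumerate(hours)))
    # Inner query = the frequency table; outer query weights each hrs by its freq.
    # Multiplying by 1.0 avoids integer division.
    weighted = conn.execute(
        """SELECT SUM(a.hrs * a.freq) * 1.0 / SUM(a.freq)
           FROM (SELECT hrs, COUNT(item) AS freq FROM diffs GROUP BY hrs) a"""
    ).fetchone()[0]
    # A plain AVG over the ungrouped rows gives the same weighted average.
    raw = conn.execute("SELECT AVG(hrs) FROM diffs").fetchone()[0]
    # AVG over the grouped rows counts each distinct hrs once and ignores freq.
    unweighted = conn.execute(
        "SELECT AVG(a.hrs) FROM (SELECT hrs FROM diffs GROUP BY hrs) a"
    ).fetchone()[0]
    conn.close()
    return weighted, raw, unweighted

if __name__ == "__main__":
    print(averages([1, 1, 12, 12]))  # question's data -> (6.5, 6.5, 6.5)
    print(averages([1, 1, 1, 12]))   # hypothetical skew -> (3.75, 3.75, 6.5)
```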

Related

DAX - Divide a column over itself with different filters to get percentages

In Power Pivot I have tables along the lines of:
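Table 1
| Year | Month | Branch_ID | Store_ID | Article | Value |
|------|-------|-----------|----------|--------------------|-------|
| 2022 | 10 | 1 | 1 | Sales | 100 |
| 2022 | 10 | 1 | 2 | Sales | 200 |
| 2022 | 10 | 1 | 2 | Operating expenses | 50 |
| 2022 | 10 | 1 | 1 | Operating expenses | 80 |
| 2022 | 10 | 1 | 2 | Cost of Sales | 20 |
| 2022 | 10 | 1 | 1 | Cost of Sales | 30 |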
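Table 2
| Year | Month | Branch_ID | Store_ID | Article | Value |
|------|-------|-----------|----------|-------------|-------|
| 2022 | 10 | 1 | 1 | Sales_Ecomm | 20 |
| 2022 | 10 | 1 | 2 | Sales_Ecomm | 15 |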
Table 3
| Article |
|--------------------|
| Sales |
| Operating expenses |
| Cost of Sales |
| Sales_Ecomm |
There are multiple branches and months, so these columns may not be ignored.
Table 1 and table 2 are separate. Table 3 is connected to both so that I could build a pivot table.
In the pivot table I want to have all articles re-evaluated as a percentage of Sales, i.e. I am trying to get a pivot table along the lines of:
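| Store ID | Sales: Value | Sales: % of sales | Operating expenses: Value | Operating expenses: % of sales | Cost of Sales: Value | Cost of Sales: % of sales | Sales_Ecomm: Value | Sales_Ecomm: % of sales |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 100.00% | 80 | 80.00% | 30 | 30.00% | 20 | 20.00% |
| 2 | 200 | 100.00% | 50 | 25.00% | 20 | 10.00% | 15 | 7.50% |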
I have a measure
Val. := sum(table1[Value]) + sum(table2[value])
which seems to be working for absolute values of the articles.
However, I can't seem to come up with an appropriate DAX measure for percentages. I have tried:
%_of_Sales := [Val.] / calculate([Val.], filter(table3; table3[Article]="Sales"))
but it only works for Sales itself (100%) and yields #NUM! for every other article in the pivot table.
How do I define a ratio measure so that every article is evaluated against Sales?
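You're missing a crucial ALL. Without it, FILTER only iterates the table3 rows that are visible in the current pivot cell, so in every column other than Sales it returns an empty table, the denominator is blank, and dividing by it produces #NUM!. ALL( table3 ) removes that filter so the Sales row is always found (DIVIDE additionally returns a blank instead of an error when the denominator is blank):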
=
DIVIDE(
[Val.],
CALCULATE(
[Val.],
FILTER(
ALL( table3 ),
table3[Article] = "Sales"
)
)
)
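Because table3 has only the Article column, this is equivalent to the shorter form below, whose boolean filter CALCULATE expands to FILTER( ALL( table3[Article] ), table3[Article] = "Sales" ):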
=
DIVIDE(
[Val.],
CALCULATE(
[Val.],
table3[Article] = "Sales"
)
)
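To make the expected numbers concrete, here is a plain-Python sketch (not DAX) of what the corrected measure computes, using the question's rows and assuming a single year, month and branch as in the sample. Values from both tables are summed per store and article, as [Val.] does, and each is divided by the Sales value of the same store, which is the denominator the ALL( table3 ) filter makes available in every article column. The function name is hypothetical.

```python
from collections import defaultdict

# (Store_ID, Article, Value) rows from Table 1 and Table 2 in the question.
TABLE1 = [(1, "Sales", 100), (2, "Sales", 200), (2, "Operating expenses", 50),
          (1, "Operating expenses", 80), (2, "Cost of Sales", 20), (1, "Cost of Sales", 30)]
TABLE2 = [(1, "Sales_Ecomm", 20), (2, "Sales_Ecomm", 15)]

def percent_of_sales(*tables):
    """Return {(store, article): (value, share_of_sales)}."""
    # Counterpart of [Val.]: sum Value across all tables per store and article.
    val = defaultdict(float)
    for table in tables:
        for store, article, value in table:
            val[(store, article)] += value
    result = {}
    for (store, article), value in val.items():
        # Denominator ignores the current article and uses Sales for the same store,
        # like CALCULATE([Val.], FILTER(ALL(table3), table3[Article] = "Sales")).
        sales = val.get((store, "Sales"))
        # Like DIVIDE, give None (blank) instead of an error when there is no Sales value.
        result[(store, article)] = (value, value / sales if sales else None)
    return result

if __name__ == "__main__":
    for key, (value, share) in sorted(percent_of_sales(TABLE1, TABLE2).items()):
        print(key, value, "n/a" if share is None else f"{share:.2%}")
```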

SQL Server loop to run a script based off a date?

Not sure where to start searching. Basically, I have a script that returns multiple metrics from tables. It uses an as-of date (each Monday). I was able to collect the past year's Mondays as "as of dates". Now I want to write a script that uses those dates instead of running it manually 52 times.
The output for a single as-of date looks like this:
Office | Metric_1| Metric_2|As_of_Date|
12 | 2000000 | 1 |2017-06-28|
15 | 4000000 | 2 |2017-06-28|
20 | 8000000 | 4 |2017-06-28|
I'd greatly appreciate any direction or help.
Thank you
The end result table would look like this:
Office | Metric_1| Metric_2|As_of_Date|
12 | 2000000 | 1 |2017-06-28|
15 | 4000000 | 2 |2017-06-28|
20 | 8000000 | 4 |2017-06-28|
12 | 2000000 | 1 |2017-05-15|
15 | 4000000 | 2 |2017-05-15|
20 | 8000000 | 4 |2017-05-15|
If I understand you correctly, you need all the rows whose as_of_date falls in this year, so you only need to restrict the date to this year. For example:
select * from [table] where as_of_date >= '20170101' and as_of_date < '20180101'
or, more compactly (though wrapping the column in YEAR() stops SQL Server from using an index seek on as_of_date):
select * from [table] where YEAR(as_of_date) = 2017
Update: as you stated in your comment, to get the rows from DataTable that match the dates in the DateTable you mentioned, you can use a join like this:
SELECT A.* FROM DATATABLE A INNER JOIN DATETABLE B ON A.AS_OF_DATE = B.AS_OF_DATE
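If you really do need to run the per-date logic once for each as-of date, the pattern is a loop that collects each run's rows and tags them with the date, which produces the end result table from the question. Here is a minimal Python sketch of that pattern: mondays builds the list of as-of dates for a year, and metrics_for is a hypothetical placeholder for your existing script that just returns fixed sample rows. In SQL Server, joining your date table to the query, as in the update above, is usually the set-based alternative to such a loop.

```python
from datetime import date, timedelta

def mondays(year):
    """Return every Monday in the given calendar year."""
    d = date(year, 1, 1)
    d += timedelta(days=(7 - d.weekday()) % 7)  # weekday(): Monday == 0
    result = []
    while d.year == year:
        result.append(d)
        d += timedelta(days=7)
    return result

def metrics_for(as_of_date):
    """Hypothetical stand-in for the existing per-date script.

    Returns (Office, Metric_1, Metric_2) rows; here just fixed sample values.
    """
    return [(12, 2000000, 1), (15, 4000000, 2), (20, 8000000, 4)]

def run_all(dates):
    """Run the per-date logic for each date and stack the results with As_of_Date."""
    rows = []
    for as_of in dates:
        for office, m1, m2 in metrics_for(as_of):
            rows.append((office, m1, m2, as_of))
    return rows

if __name__ == "__main__":
    dates = mondays(2017)
    print(len(dates), dates[0], dates[-1])  # 52 2017-01-02 2017-12-25
    print(len(run_all(dates)))              # 156
```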

Dynamic table/output each month for report

I need to create a table and a report, and I'm not sure how to make the output display in the correct order each month.
Using SQL Server 2012 and SSRS 2016 as the output, I need to create a rolling report that displays the last 12 months with their corresponding values. Each month, the oldest month drops off.
What's the best table design for something like this, and how do you control the output so that the oldest month drops off and the report keeps rolling?
A sample of the desired output is below. Next month I need to drop Dec - 15 and add Jan - 17, with the columns sorted chronologically (oldest first) so that the previous month is always the last column in the report.
| Desc | Dec - 15 | Jan - 16 | Feb - 16 | (rest of months) | Nov - 16 | Dec - 16 |
|---|---|---|---|---|---|---|
| Loss | 1,000 | 2500 | 1700 | 123 | 4565 | 3433 |
| Expense | 2,000 | 3200 | 900 | 456 | 1223 | 4445 |
| Reserve | 3,000 | 3300 | 400 | 789 | 4747 | 4444 |
You need to use a matrix.
In your dataset, add two columns (if they are not already present): the first is the column header and the second is the column sort order. Something like this:
[Header] = LEFT(DATENAME(MONTH, DateValue), 3) + ' - ' + RIGHT(CONVERT(VARCHAR(4), YEAR(DateValue)), 2)
, [HeaderSort] = CONVERT(VARCHAR(4), YEAR(DateValue)) + RIGHT('0' + CONVERT(VARCHAR(2), MONTH(DateValue)), 2)
Set the matrix column group to your Header value and set the sort order to your HeaderSort value.
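The two expressions can be sanity-checked outside SSRS. The Python sketch below builds the Header and HeaderSort values for a rolling window of the 12 months before the report date, oldest first, which is also the range the dataset would have to be filtered to so that the oldest month drops off each month. The function name and the assumption that the window ends with the previous month are illustrative, not part of the answer.

```python
from datetime import date

# Fixed English abbreviations, matching LEFT(DATENAME(MONTH, ...), 3) under an English language setting.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def rolling_months(report_date, months=12):
    """Return [(Header, HeaderSort)] for the `months` months before report_date, oldest first."""
    # Count months from year 0 so stepping back across January is plain subtraction.
    current = report_date.year * 12 + (report_date.month - 1)
    result = []
    for idx in range(current - months, current):
        year, month0 = divmod(idx, 12)
        header = f"{MONTH_ABBR[month0]} - {year % 100:02d}"   # e.g. "Dec - 15"
        sort_key = f"{year:04d}{month0 + 1:02d}"              # e.g. "201512"
        result.append((header, sort_key))
    return result

if __name__ == "__main__":
    for header, sort_key in rolling_months(date(2017, 1, 10)):
        print(sort_key, header)  # 201601 Jan - 16 ... 201612 Dec - 16
```

Because HeaderSort is a zero-padded yyyymm string, sorting it as text also sorts the columns chronologically.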

Using SQL Server windowing function to get running total by fiscal year

I'm using SQL Server 2014. I have a Claims table containing totals of claims made per month in my system:
+-----------+-------------+------------+
| Claim_ID | Claim_Date | Nett_Total |
+-----------+-------------+------------+
| 1 | 31 Jan 2012 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 |
| 3 | 31 Mar 2012 | 35344.33 |
| 4 | 30 Apr 2012 | 142355.63 |
| etc. | etc. | etc. |
+-----------+-------------+------------+
For a report I am writing I need to be able to produce a cumulative running total that resets to zero at the start of each fiscal year (in my country this is from March 1 to February 28/29 of the following year).
The report will look similar to the table, with an extra running total column, something like:
+-----------+-------------+------------+---------------+
| Claim_ID | Claim_Date | Nett_Total | Running Total |
+-----------+-------------+------------+---------------+
| 1 | 31 Jan 2012 | 321454.67 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 | 844997.01 |
| 3 | 31 Mar 2012 | 35344.33 | 35344.33 | (restart at 0
| 4 | 30 Apr 2012 | 142355.63 | 177699.96 | for new yr)
| etc. | etc. | etc. | |
+-----------+-------------+------------+---------------+
I know windowing functions are very powerful and I've used them in rudimentary ways in the past to get overall sums and averages while avoiding needing to group my resultset rows. I have an intuition that I will need to employ the 'preceding' keyword to get the running total for the current fiscal year each row falls into, but I can't quite grasp how to express the fiscal year as a concept to use in the 'preceding' clause (or if indeed it's possible to use a date range in this way).
Any help on how to "phrase" the fiscal year for the PRECEDING clause would be enormously helpful.
I think you should try this:
/* Create Table*/
CREATE TABLE dbo.Claims (
Claim_ID int
,Claim_Date datetime
,Nett_Total decimal(10,2)
);
/* Insert Testrows*/
INSERT INTO dbo.Claims VALUES
(1, '20120101', 10000)
,(2, '20120202', 10000)
,(3, '20120303', 10000)
,(4, '20120404', 10000)
,(5, '20120505', 10000)
,(6, '20120606', 10000)
,(7, '20120707', 10000)
,(8, '20120808', 10000)
Query the data:
SELECT Claim_ID, Claim_Date, Nett_Total,
SUM(Nett_Total) OVER (PARTITION BY YEAR(DATEADD(month,-2,Claim_Date))
ORDER BY Claim_Date, Claim_ID ROWS UNBOUNDED PRECEDING) AS [Running Total]
FROM dbo.Claims;
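The trick: PARTITION BY YEAR(DATEADD(month,-2,Claim_Date)) starts a new partition, and therefore a new running total, every year, but it first shifts each date back two months so that it fits your fiscal year: March 1 becomes January 1, and February dates fall into the previous year.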
Output:
Claim_ID |Claim_Date |Nett_Total |Running Total
---------+---------------------------+------------+-------------
1 |2012-01-01 00:00:00.000 |10000.00 |10000.00
2 |2012-02-02 00:00:00.000 |10000.00 |20000.00
3 |2012-03-03 00:00:00.000 |10000.00 |10000.00 <- New partition
4 |2012-04-04 00:00:00.000 |10000.00 |20000.00
5 |2012-05-05 00:00:00.000 |10000.00 |30000.00
6 |2012-06-06 00:00:00.000 |10000.00 |40000.00
7 |2012-07-07 00:00:00.000 |10000.00 |50000.00
8 |2012-08-08 00:00:00.000 |10000.00 |60000.00
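The month-shift trick is easy to verify outside SQL Server. This Python sketch keeps one running total per fiscal year, where the fiscal year is the calendar year of the date shifted back two months (the answer's YEAR(DATEADD(month,-2,Claim_Date))), and restarts the total whenever a new fiscal year begins. The function names are hypothetical, and the rows are assumed to arrive already ordered by date, as the window's ORDER BY guarantees in SQL.

```python
from datetime import date
from decimal import Decimal

def fiscal_year(d, start_month=3):
    """Calendar year of d shifted back (start_month - 1) months.

    With start_month=3 this matches YEAR(DATEADD(month, -2, d)): March 2012
    to February 2013 all map to 2012.
    """
    return d.year if d.month >= start_month else d.year - 1

def running_totals(claims):
    """claims: (claim_id, claim_date, nett_total) tuples, ordered by claim_date."""
    totals = {}  # fiscal year -> running total, one "partition" per fiscal year
    out = []
    for claim_id, claim_date, nett in claims:
        fy = fiscal_year(claim_date)
        totals[fy] = totals.get(fy, Decimal("0")) + nett
        out.append((claim_id, claim_date, nett, totals[fy]))
    return out

if __name__ == "__main__":
    claims = [  # rows from the question
        (1, date(2012, 1, 31), Decimal("321454.67")),
        (2, date(2012, 2, 29), Decimal("523542.34")),
        (3, date(2012, 3, 31), Decimal("35344.33")),
        (4, date(2012, 4, 30), Decimal("142355.63")),
    ]
    for row in running_totals(claims):
        print(row[0], row[1], row[3])
```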

Aggregate function on one column, group by on another, leave a third unaffected

I feel like this isn't too hard a problem, but I've spent the greater part of the day looking for a solution, to no avail. Plenty of the solutions I've seen are for getting non-unique columns along with a GROUP BY and an aggregate function, which doesn't seem to help me.
The problem
I have a table of historical data as follows:
ID | source | value | date
---+--------+-------+-----------
1 | 12 | 10 | 2016-11-16
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
4 | 13 | 40 | 2016-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
I'm trying to get data before a certain date (within a loop over different ranges), then get the sum of the values grouped by source. For example: "get all records from before 30 days ago, and sum the values of the unique sources, using the most recent entry for each".
So the first step was to remove entries with dates outside the range, which is easy with, for example, where date < getdate() - 30, giving:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
Now my issue is finding a way to group by source, take the max date, and then sum the result across all sources. The idea here is that we don't know when the last entry was, so we get all records before the specified date, take the newest entry for each unique source, and sum those to get the total value at that time.
So the next step would be to group by source using the max of date, resulting in:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
5 | 13 | 50 | 2015-11-16
The final step would be to sum the values; this process is then repeated to get the sum for multiple dates, so for this date the result would be the row
value | date
-------+-----------
70 | getdate() - 30
to use for the rest.
Where I'm stuck
I'm trying to group by source and use the max of date to get the most recent entry for each unique source, but if I use the aggregate function or group by, then I can't preserve the ID or value columns to stick with the chosen max row. It's totally possible I'm just misunderstanding how aggregate functions work.
Progress so far
The best place I've gotten to yet is something like
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
)
select ???, max(date)
from dataInDateRange
group by source
But I'm not seeing how I can do this without somehow preserving a unique ID for the row that has the max date for each source so then I can go back and sum up the numbers.
Thank you great people for any help/guidance/lessons
Use ROW_NUMBER() to number the rows of each source from newest to oldest, then keep only the first row per source:
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
), rows as (
select *,
row_number() over (partition by source
order by date desc) as rn
from dataInDateRange
)
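SELECT *          -- the most recent row for each source
FROM rows
WHERE rn = 1
-- for the total, replace SELECT * with SELECT SUM(value) AS total_value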
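The same steps (filter by date, keep the newest row per source, sum) can be sketched in plain Python to check the expected result. The rows are the question's sample; the cutoff of 2016-10-17 is a hypothetical stand-in for getdate() - 30 when run on 2016-11-16, and the function name is illustrative.

```python
from datetime import date

HISTORY = [  # (ID, source, value, date) from the question
    (1, 12, 10, date(2016, 11, 16)),
    (2, 12, 20, date(2015, 11, 16)),
    (3, 12, 30, date(2014, 11, 16)),
    (4, 13, 40, date(2016, 11, 16)),
    (5, 13, 50, date(2015, 11, 16)),
    (6, 13, 60, date(2014, 11, 16)),
]

def total_as_of(rows, cutoff):
    """Sum the most recent value per source among rows dated before cutoff."""
    latest = {}  # source -> (date, value), the counterpart of ROW_NUMBER() ... rn = 1
    for _id, source, value, d in rows:
        if d < cutoff and (source not in latest or d > latest[source][0]):
            latest[source] = (d, value)
    return sum(value for _, value in latest.values())

if __name__ == "__main__":
    print(total_as_of(HISTORY, date(2016, 10, 17)))  # 70
```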
