Average number of rows by hour based on total number of days - database

Dear all
In Power BI, using DirectQuery, I would like to have the sum of occurrences by hour per day, divided by the total number of days.
Let me provide you with some sample data.
DataTable:
ID;DATE;HOUR
715;2019-10-19;15:47:37
181;2019-10-19;15:56:11
349;2019-10-19;15:57:25
6ec;2019-10-19;15:58:16
57e;2019-10-19;16:02:35
860;2019-10-19;16:03:42
7a5;2019-10-19;16:03:52
978;2019-10-19;16:05:19
da0;2019-10-20;11:00:45
c2d;2019-10-20;23:04:53
355;2019-10-20;23:04:53
4f5;2019-10-20;23:05:10
396;2019-10-21;14:42:24
5f7;2019-10-21;14:43:37
93a;2019-10-21;14:55:36
a44;2019-10-21;14:59:21
264;2019-10-21;15:05:20
f48;2019-10-21;15:07:01
And a summarized Dimension Table with the values present in DataTable:
DimHourTable:
COMPLETEHOUR;HOUR24
15:47:37;15
15:56:11;15
15:57:25;15
15:58:16;15
16:02:35;16
16:03:42;16
16:03:52;16
16:05:19;16
11:00:45;11
23:04:53;23
23:04:53;23
23:05:10;23
14:42:24;14
14:43:37;14
14:55:36;14
14:59:21;14
15:05:20;15
15:07:01;15
Note: there is a relationship with a Both Directions cross filter between DataTable[HOUR] and DimHourTable[COMPLETEHOUR].
I'm now doing this:
formula1: Occurrences = COUNTA ( DataTable[id] )
formula2: CountDays = DISTINCTCOUNT ( DataTable[date] )
formula3: Avg_Occurrences = DIVIDE ( [Occurrences], [CountDays] )
Then I'm putting the following in a matrix:
Rows: DimHourTable[HOUR24]
Values: Avg_Occurrences
With that Sample Data, this is the average I'm getting.
11 -> 1
14 -> 4
15 -> 3
16 -> 4
23 -> 3
It ends up dividing by the number of days that contain that specific hour.
But, in reality, I would like to have this:
11 -> 0.33
14 -> 1.33
15 -> 2
16 -> 1.33
23 -> 1
I would like to divide the occurrences by the total number of days present in the DataTable, regardless of whether a day contains that specific hour or not.
Does someone have an idea how to solve it?
Thanks in advance!

Try using this for formula2
CountDays = CALCULATE ( DISTINCTCOUNT ( DataTable[date] ), ALL ( 'DataTable' ) )
When you are using slicers, you can also try
CountDays = CALCULATE ( DISTINCTCOUNT ( DataTable[date] ), ALLSELECTED ( 'DataTable' ) )
The ALL and ALLSELECTED functions remove the filter context created by DimHourTable[HOUR24], which you put on the rows of your matrix visual.
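Putting the pieces together, a minimal sketch of the three measures with the corrected denominator could look like this (names taken from the question; a starting point rather than a tested solution):
Occurrences = COUNTA ( DataTable[id] )
CountDays = CALCULATE ( DISTINCTCOUNT ( DataTable[date] ), ALL ( 'DataTable' ) )
Avg_Occurrences = DIVIDE ( [Occurrences], [CountDays] )
With the sample data above this gives, for example, 6 occurrences at hour 15 divided by 3 total days = 2, which matches the desired output.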

Related

Need to generate from and to numbers based on the result set with a specified interval

I have the below requirement.
The input is as below.
Create table Numbers
(
Num int
)
Insert into Numbers
values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15)
Create table FromTo
(
FromNum int
,ToNum int
)
Select * From FromTo
Output should be as below.
FromNum ToNum
1 5
6 10
11 15
The actual requirement is as below.
I need to load the data from a column into a table which will have thousands of records with different numbers.
Consider something like the below.
1,2,5,7,9,11,15,34,56,78,98,123,453,765 etc..
I need to load these into another table which has FROM and TO columns, with intervals of 5000. For example, if within the first 5000 I only have numbers up to 3000, my first row should have FromNum 1 and ToNum 3000. Second row: if the data does not go up to 10000 and the next number starts at 12312 (this is the second row's FromNum), the ToNum value should be +5000, i.e. 17312. Here too, if we don't have numbers all the way up to 17312, the ToNum needs to be the largest number between 12312 and 17312.
Output should be as below.
FromNum ToNum
1 3205
1095806 1100805
1100808 1105806
1105822 1110820
Can you guys please help me with the solution for the above.
Thanks in advance.
What you may try in this situation is to group data and get the expected results:
DECLARE @interval int = 5
INSERT INTO FromTo (FromNum, ToNum)
SELECT MIN(Num) AS FromNum, MAX(Num) AS ToNum
FROM Numbers
GROUP BY (Num - 1) / @interval
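The query above groups a contiguous range into fixed buckets anchored at 1. For the actual requirement, where each new range should start at the first number not covered by the previous one, one simple (if not set-based) sketch is a WHILE loop; this assumes SQL Server and the Numbers / FromTo tables defined above, and should be treated as a starting point only:
-- sketch: assumes SQL Server and the Numbers / FromTo tables from the question
DECLARE @interval int = 5000;
DECLARE @from int, @to int;
SET @from = (SELECT MIN(Num) FROM Numbers);
WHILE @from IS NOT NULL
BEGIN
    -- largest existing number within @interval of the current start
    SET @to = (SELECT MAX(Num) FROM Numbers WHERE Num < @from + @interval);
    INSERT INTO FromTo (FromNum, ToNum) VALUES (@from, @to);
    -- next range starts at the first number not covered by this one
    SET @from = (SELECT MIN(Num) FROM Numbers WHERE Num >= @from + @interval);
END
On the 1..15 sample with an interval of 5 this produces (1, 5), (6, 10), (11, 15); on sparse data each new range restarts at the next existing number, as in the second expected output. A set-based rewrite would be preferable for very large tables.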

Summation based on unique entries of two arrays | Speed Issue

I have 3 arrays of size 803500*1 with the following details:
Rid: It can contain any number
RidID: It contains elements from 1 to 184 in random order. Each element appears multiple times.
r: It contains elements 0,1,2,...12. All elements (except zero) appear nearly 3400 to 3700 times at random indices in this array.
Following may be useful for generating sample data:
Rid = rand(803500,1);
RidID = randi(184,803500,1);
r = randi(13,803500,1)-1; %This may not be a good sample for r as per previously mentioned details?
What I want to do:
I want to calculate the sum of those entries of Rid which correspond to each positive unique entry of r and each unique entry of RidID.
This may be clearer with the code which I wrote for this problem:
RNum = numel(unique(RidID));
RSum = ones(RNum,12); %Preallocating for better speed
for i=1:12
RperM = r ==i;
for j = 1:RNum
RSum(j,i) = sum(Rid(RperM & (RidID==j)));
end
end
Issue:
My code works, but it takes 5 seconds on average on my computer, and I have to do this calculation nearly a thousand times. If this time could be reduced from 5 seconds to at least half of that, I'd be very happy. But how do I optimize this? I don't mind whether it is improved with vectorization or a better-written loop.
I am using MATLAB R2017b.
You can use accumarray:
u = unique(RidID);
% accumulate Rid into a matrix whose rows are RidID values and whose
% columns are r values shifted by one, so that r == 0 lands in column 1
A = accumarray([RidID r+1], Rid);
% keep only the rows for RidIDs that actually occur, and drop the r == 0 column
RSum = A(u, 2:13);
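To see how the rows and columns of A line up with the original loop, a quick sanity check on a small sample (same variable names as in the question, sizes reduced for speed; a sketch, not part of the original answer) might be:
rng(0);                         % hypothetical small sample
n = 1000;
Rid   = rand(n,1);
RidID = randi(184,n,1);
r     = randi(13,n,1) - 1;
% vectorised: row index is the RidID value, column k holds the sums for r == k-1
A = accumarray([RidID, r+1], Rid, [184, 13]);
RSum_vec = A(:, 2:13);          % drop the r == 0 column
% original double loop for comparison
RSum_loop = zeros(184, 12);
for i = 1:12
    for j = 1:184
        RSum_loop(j,i) = sum(Rid(r == i & RidID == j));
    end
end
max(abs(RSum_vec(:) - RSum_loop(:)))   % should be ~0, up to floating point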
This is slower than accumarray as suggested by rahnema, but using findgroups and splitapply may save memory.
In your example, there may be thousands of zero-valued elements in the resulting matrix, where a combination of RidID and r does not occur. In this case a stacked result would be more memory efficient, like so:
RidID | r | Rid_sum
-------------------------
1 | 1 | 100
2 | 1 | 200
4 | 2 | 85
...
This can be achieved with the following code:
[ID, rn, RidIDn] = findgroups(r,RidID); % Get unique combo ID for 'r' and 'RidID'
RSum = splitapply( @sum, Rid, ID ); % Sum for each ID
output = table( RidIDn, rn, RSum ); % Nicely formatted table output
% Get rid of elements where r == 0
output( output.rn == 0, : ) = [];
You could convert this to the same output as the accumarray method, but it's already a slower method...
% Convert to an 'unstacked' 2D matrix (optional): rows indexed by RidID, columns by r
RSum = full( sparse( output.RidIDn, output.rn, output.RSum ) );

Calculate measure involving dates on a factTable with DAX

I have this problem:
given a "Movements" fact table that holds a list of warehouse transactions,
I want to know how many items arrived and how many were shipped (and this is trivial), but also how many are "In Order" at a particular time (and this is the difficult part).
So, each line can either be a receipt (it has a positive "qIn" value) or a shipment (a positive qOut).
For example a very simple list of records could be:
ID Item TransactionDate OrderDate qIn qOut
1 A 2019-01-30 2019-01-10 5 0
2 A 2019-02-20 2019-01-15 3 0
3 A 2019-03-12 2019-01-20 0 6
4 A 2019-03-30 2019-02-20 20 0
That means:
On TransactionDate 2019-01-30, item A arrived in a quantity of 5.
The order for this had been created on 2019-01-10, so for those 20 days there was a quantity of 5 of item A "ordered".
However, when I look at the end of January, I should see 0 for this transaction in the "ordered" measure, because it arrived on January 30.
Instead, for the second record, at the end of January I should see that a quantity of 3 was "in order", because the actual arrival was on 2019-02-20.
So, in the end, the Excel pivot table should show a situation similar to this:
Year              2019
Month        January        February         March
             IN | Ord       IN | Ord       IN | Ord
Item
A             5 |   3        3 |  20       20 |   0
The simple measure of qIn is:
qIN := SUM ( Transactions[qIn] )
The measure of ordered quantity I have come up with so far (which does nothing!):
orderedQty :=
CALCULATE (
SUMX ( Transactions; Transactions[qIn] );
DATESBETWEEN (
Transactions[TransactionDate];
MINX ( Transactions; Transactions[OrderDate] );
MAXX ( Transactions; Transactions[TransactionDate] )
)
)
EDIT
The "InOrder" measure should be "additive" in the sense that it should not only take into account what has happened in the current month, but also how much of the InOrder from past months is yet to be received.
With a picture (which is still to be drawn...) the whole thing would be clearer, at least from a logic perspective. However, even with a picture, I can't see how to extract "direct measures" from that logic.
Instead, exploiting the measures already provided by @Olly, the problem could be reformulated as:
InOrderFromOtherMonths := Sum (qIn) where Order Month <> Current Month
(i.e. how many arrived in the current month from orders placed in past months)
InOrder := Total sum of (ORDER measure) - InOrderFromOtherMonths
PS.
I have created an Excel file with a little more interesting example.
In that file, using the "direct measure picture" the InOrder for January would be:
ID 2 + ID 5 + ID 6 (orders still open at the end of January).
In values = 3+9+17=29
With the "indirect" measure would be:
Total sum of ORDER = 15+23+12=50
InOrderFromOtherMonths = 6+15=21
InOrder = Total sum of ORDER - InOrderFromOtherMonths = 50 - 21 = 29
Create a Calendar table, including a YYYY-MM field. If you don't already have a calendar table, you can automatically create one in PowerPivot: Design > Date Table > New
Create an ACTIVE relationship between Calendar[Date] and Transactions[TransactionDate]
Create an INACTIVE relationship between Calendar[Date] and Transactions[OrderDate]
Now create your measures:
Measure IN:
IN:=SUM ( Transactions[qIn] )
Measure ORDERS:
ORDERS:=
CALCULATE (
SUM ( Transactions[qIn] ),
USERELATIONSHIP ( 'Calendar'[Date], Transactions[OrderDate] )
)
Measure ORDER:
ORDER:=
IF (
HASONEVALUE ( 'Calendar'[YYYY-MM] ),
CALCULATE (
[ORDERS],
FORMAT ( Transactions[TransactionDate], "YYYY-MM" ) <> VALUES ( 'Calendar'[YYYY-MM] )
)
)
And pivot to suit:
EDIT
After your question edit, I'm finding some of your labels confusing - but try creating the following measures:
Measure: Ordered
Ordered:=
CALCULATE (
SUM ( Movements[qIn] ),
USERELATIONSHIP ( 'Calendar'[Date], Movements[OrdDate] )
)
Measure: Received
Received:= SUM ( Movements[qIn] )
Measure: Outstanding
Outstanding:=
VAR EOMaxDate =
EOMONTH ( LASTDATE ( 'Calendar'[Date] ), 0 )
RETURN
IF (
ISBLANK ( [Ordered] ) && ISBLANK ( [Received] ),
BLANK(),
CALCULATE (
[Ordered] - [Received],
FILTER (
ALL ( 'Calendar'),
'Calendar'[Date] <= EOMaxDate
)
)
)
Now use those three measures in your pivot.
See https://excel.solutions/so_55596609-2/ for example XLSX file

In SSRS, how can I add a row to aggregate all the rows that don't match a filter?

I'm working on a report that shows transactions grouped by type.
Type Total income
------- --------------
A 575
B 244
C 128
D 45
E 5
F 3
Total 1000
I only want to provide details for transaction types that represent more than 10% of the total income (i.e. A-C). I'm able to do this by applying a filter to the group:
Type Total income
------- --------------
A 575
B 244
C 128
Total 1000
What I want to display is a single row just above the total row that has a total for all the types that have been filtered out (i.e. the sum of D-F):
Type Total income
------- --------------
A 575
B 244
C 128
Other 53
Total 1000
Is this even possible? I've tried using running totals and conditionally hidden rows within the group. I've tried Iif inside Sum. Nothing quite seems to do what I need and I'm butting up against scope issues (e.g. "the value expression has a nested aggregate that specifies a dataset scope").
If anyone can give me any pointers, I'd be really grateful.
EDIT: Should have specified, but at present the dataset actually returns individual transactions:
ID Type Amount
---- ------ --------
1 A 4
2 A 2
3 B 6
4 A 5
5 B 5
The grouping is done using a row group in the tablix.
One solution is to solve that in the SQL source of your dataset instead of inside SSRS:
SELECT
CASE
WHEN CAST([Total income] AS FLOAT) / SUM([Total income]) OVER (PARTITION BY 1) >= 0.10 THEN [Type]
ELSE 'Other'
END AS [Type]
, [Total income]
FROM Source_Table
See also SQL Fiddle
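Since the edit says the dataset actually returns individual transactions, a hedged variant of the same idea that aggregates per type first might look like the following (the table name Transactions and the columns Type / Amount are assumed from the sample in the edit):
-- sketch: table and column names assumed from the question's edit
SELECT [Type], SUM([Total income]) AS [Total income]
FROM (
    SELECT
        CASE
            WHEN SUM(Amount) >= 0.10 * SUM(SUM(Amount)) OVER () THEN [Type]
            ELSE 'Other'
        END AS [Type],
        SUM(Amount) AS [Total income]
    FROM Transactions
    GROUP BY [Type]
) t
GROUP BY [Type];
The inner query labels any type below 10% of the grand total as 'Other'; the outer query then collapses those rows into a single 'Other' line.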
Try to solve this in SQL, see SQL Fiddle.
SELECT I.*
,(
CASE
WHEN I.TotalIncome >= (SELECT Sum(I2.TotalIncome) / 10 FROM Income I2) THEN 10
ELSE 1
END
) AS TotalIncomePercent
FROM Income I
After this, create two sum groups.
SUM(TotalIncome * TotalIncomePercent) / 10
SUM(TotalIncome * TotalIncomePercent)
A second approach may be to use a calculated column in SSRS. Try to create a calculated column with the above CASE expression. If it allows you to create it, you can use it in the same way as in the SQL approach.
1) To show income greater than 10%, use a row visibility condition like
=iif(reportitems!total_income.value/10<= I.totalincome,true,false)
Here reportitems!total_income.value is the value of the total income textbox, which will be the total value of the detail group,
and I.totalincome is the current field value.
2) Add one more row outside the detail group for the other income, and use an expression such as
= reportitems!total_income.value-sum(iif(reportitems!total_income.value/10<= I.totalincome,I.totalincome,nothing))

how to calculate rolling volatility

I am trying to design a function that will calculate 30 day rolling volatility.
I have a file with 3 columns: date, and daily returns for 2 stocks.
How can I do this? I have a problem in summing the first 30 entries to get my vol.
Edit:
So it will read an excel file, with 3 columns: a date, and daily returns.
daily.ret = read.csv("abc.csv")
e.g. date stock1 stock2
01/01/2000 0.01 0.02
etc etc, with years of data. I want to calculate rolling 30 day annualised vol.
This is my function:
calc_30day_vol = function()
{
  # squared daily returns
  stock1 = daily.ret$stock1^2
  stock2 = daily.ret$stock2^2
  j = 30
  # approximate trading days per year, assuming ~10 years of data in the file
  approx_days_in_year = length(stock1) / 10
  vol_1 = rep(NA, length(stock1))
  vol_2 = rep(NA, length(stock2))
  for (i in 1:(length(stock1) - 29))
  {
    vol_1[j] = sqrt( (approx_days_in_year / 30) * sum(stock1[i:j]) )
    vol_2[j] = sqrt( (approx_days_in_year / 30) * sum(stock2[i:j]) )
    j = j + 1
  }
  # return both volatility series
  list(vol_1 = vol_1, vol_2 = vol_2)
}
So stock1 and stock2 are the squared daily returns from the Excel file, needed to calculate vol. The entries before day 30 of vol_1 and vol_2 stay empty, since we are calculating 30-day vol. I am trying to sum the squared daily returns over the first 30 entries, and then move the window down by one index on each iteration.
So from day 1-30, day 2-31, day 3-32, etc, hence why I have defined "j".
I'm new at R, so apologies if this sounds rather silly.
This should get you started.
First I have to create some data that look like you describe
library(quantmod)
getSymbols(c("SPY", "DIA"), src='yahoo')
m <- merge(ROC(Ad(SPY)), ROC(Ad(DIA)), all=FALSE)[-1, ]
dat <- data.frame(date=format(index(m), "%m/%d/%Y"), coredata(m))
tmpfile <- tempfile()
write.csv(dat, file=tmpfile, row.names=FALSE)
Now I have a csv with data in your very specific format.
Use read.zoo to read csv and then convert to an xts object (there are lots of ways to read data into R. See R Data Import/Export)
r <- as.xts(read.zoo(tmpfile, sep=",", header=TRUE, format="%m/%d/%Y"))
# each column of r has daily log returns for a stock price series
# use `apply` to apply a function to each column.
vols.mat <- apply(r, 2, function(x) {
#use rolling 30 day window to calculate standard deviation.
#annualize by multiplying by square root of time
runSD(x, n=30) * sqrt(252)
})
#`apply` returns a `matrix`; `reclass` to `xts`
vols.xts <- reclass(vols.mat, r) #class as `xts` using attributes of `r`
tail(vols.xts)
# SPY.Adjusted DIA.Adjusted
#2012-06-22 0.1775730 0.1608266
#2012-06-25 0.1832145 0.1640912
#2012-06-26 0.1813581 0.1621459
#2012-06-27 0.1825636 0.1629997
#2012-06-28 0.1824120 0.1630481
#2012-06-29 0.1898351 0.1689990
#Clean-up
unlink(tmpfile)
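If you would rather stay close to your original read.csv / data-frame approach, a rough sketch with zoo::rollapply (assuming the file layout from the question, squared daily returns as in your function, and roughly 252 trading days per year) could be:
library(zoo)
# assumes abc.csv has columns: date, stock1, stock2 (daily returns)
daily.ret <- read.csv("abc.csv")
# 30-day rolling annualised volatility from squared daily returns,
# assuming ~252 trading days per year
sq <- as.matrix(daily.ret[, c("stock1", "stock2")])^2
roll_vol <- rollapply(sq, width = 30,
                      FUN = function(x) sqrt(252 / 30 * sum(x)),
                      align = "right", fill = NA)
head(roll_vol, 35)   # the first 29 rows are NA; row 30 uses days 1-30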
