Probability density function with large mu and sigma values? - arrays

I am using the following function :
pd=makedist('normal',mu,sigma);
y = pdf(pd,speed)
mu and sigma are each 50x1 and speed is 3000x1. Passing one value of mu, sigma and speed at a time gives the expected output. How can I pass all of the values at once so that I end up with a data set containing every y value?
I think I have to use a for loop but am unsure how to do it.

mu = rand(50,1);
sigma = rand(50,1);
speed = rand(3000,1);
y = zeros(numel(mu),numel(speed));
for k = 1:numel(mu)
pd = makedist('normal',mu(k),sigma(k));
y(k,:) = pdf(pd,speed); %store in for loop
end
By preallocating the output you can fill it one row at a time in a single loop over the mu/sigma pairs. Your output is now indexed as y(k,j): the first index selects the mu/sigma pair and the second selects the speed entry.
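For readers working outside MATLAB, the same "one row per mu/sigma pair" layout can be sketched in plain Python. This is only an illustration: the names normal_pdf and pdf_grid are hypothetical, and the density is computed from the closed-form normal formula rather than through makedist/pdf.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x; sigma is assumed positive."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_grid(mus, sigmas, xs):
    """Return a len(mus) x len(xs) table; row k uses the pair (mus[k], sigmas[k])."""
    return [[normal_pdf(x, m, s) for x in xs] for m, s in zip(mus, sigmas)]

# Two parameter pairs evaluated at three points gives a 2 x 3 table.
grid = pdf_grid([0.0, 1.0], [1.0, 2.0], [0.0, 1.0, 2.0])
```

Row k of grid plays the role of y(k,:) in the loop above.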

Related

How can I merge my data set based on the time in MATLAB?

I want to merge two data sets into one. They were sampled at different times, so I cannot simply concatenate them.
Every time stamp has to be kept, and where both sets have a sample at the same time the two values should be added.
data_1_y_axes=[0,1,3,5,4,6,8,9,7]
time_1_x_axes=[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2]
data_2_y_axes=[0,2,4,5,2,7,5,7,5]
time_2_x_axes=[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09]
plot(time_1_x_axes,data_1_y_axes)
hold on
plot(time_2_x_axes,data_2_y_axes)
My expected data will be as follows:
New_data=[ 0, 2,4,5,2+0,1,7,3,5+5,7+4,6,5+8,9,7]
New_time=[.002,.004,.006,.009,.02,.03,.04,.05,.06,.07,.08,.09,.1,.2]
How can I do this?
Here is another way to do it without using a for loop, which should run faster on large inputs:
data_1_y_axes=[0,1,3,5,4,6,8,9,7]
time_1_x_axes=[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2]
data_2_y_axes=[0,2,4,5,2,7,5,7,5]
time_2_x_axes=[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09]
[time_merged,i1,i2] = intersect(time_1_x_axes, time_2_x_axes)
data_merged = data_1_y_axes(i1) + data_2_y_axes(i2)
[time1_remaining, ir1] = setdiff(time_1_x_axes, time_merged)
[time2_remaining, ir2] = setdiff(time_2_x_axes, time_merged)
[time_merged, idx] = sort([time_merged time_1_x_axes(ir1) time_2_x_axes(ir2)])
data_merged = [data_merged data_1_y_axes(ir1) data_2_y_axes(ir2)]
data_merged = data_merged(idx)
plot(time_merged,data_merged)
You could combine the x and y axis arrays, then aggregate by x-axis value using
unique (to get the unique x values and, for each original sample, the index of its x value) and
accumarray (to add up all of the y values that share an x-axis index).
Using your example data, this would look like:
y1 =[0,1,3,5,4,6,8,9,7];
x1 =[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2];
y2 =[0,2,4,5,2,7,5,7,5];
x2 =[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09];
x = [x1, x2]; % Combine x axis data
y = [y1, y2]; % Combine y axis data
[x, ~, idx] = unique( x(:) ); % Get unique x, and their indices
y = accumarray( idx, y(:) ); % Add up y values according to x value index (column in, column out)
Aggregating Sample Values With Different Sampling Times
The following approach combines the data into two vectors, Time_Vector and Data_Vector. The unique() function then finds the distinct sample times in Time_Vector. For each distinct time, a for-loop uses find() to locate every index in Time_Vector with that time, indexes Data_Vector with those indices to collect all values recorded at that time, and adds them up with sum().
data_1_y_axes = [0,1,3,5,4,6,8,9,7];
time_1_x_axes = [0.02,0.03,0.05,0.06,0.07,0.08,0.09,0.1,0.2];
data_2_y_axes = [0,2,4,5,2,7,5,7,5];
time_2_x_axes = [0.002,0.004,0.006,0.009,0.02,0.04,0.06,0.07,0.09];
Data_Vector = [data_1_y_axes data_2_y_axes];
Time_Vector = [time_1_x_axes time_2_x_axes];
Unique_Times = unique(Time_Vector);
Output_Data = zeros(size(Unique_Times)); % preallocate one sum per unique time
for Sample_Index = 1: length(Unique_Times)
Time_Value = Unique_Times(Sample_Index);
Indices_With_Matching_Time = find(Time_Vector == Time_Value);
Output_Data(Sample_Index) = sum(Data_Vector(Indices_With_Matching_Time));
end
plot(Unique_Times,Output_Data);
Ran using MATLAB R2019b
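The aggregation idea shared by the answers above (group samples by time stamp, then sum each group) can also be sketched in plain Python. The function name merge_by_time is hypothetical. The sketch assumes equal time stamps compare exactly equal as floats, which holds for the literals in the question but may need rounding for measured data.

```python
from collections import defaultdict

def merge_by_time(times_a, data_a, times_b, data_b):
    """Sum values that share a time stamp and return (sorted times, summed values)."""
    totals = defaultdict(float)
    all_times = list(times_a) + list(times_b)
    all_data = list(data_a) + list(data_b)
    for t, y in zip(all_times, all_data):
        totals[t] += y  # exact float match is an assumption, see the note above
    merged_times = sorted(totals)
    return merged_times, [totals[t] for t in merged_times]

new_time, new_data = merge_by_time(
    [.02, .03, .05, .06, .07, .08, .09, .1, .2], [0, 1, 3, 5, 4, 6, 8, 9, 7],
    [.002, .004, .006, .009, .02, .04, .06, .07, .09], [0, 2, 4, 5, 2, 7, 5, 7, 5],
)
```

With the question's data, new_time and new_data reproduce the expected New_time and New_data vectors.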

Random sampling of elements from an array based on a target condition

I have an array (let's call it ElmInfo) of size Nx2 representing a geometry. Column 1 holds the element number and column 2 holds the element volume. The element volumes vary widely. The total volume V of all elements can be obtained in MATLAB as:
V=sum(ElmInfo(:,2));
I want to randomly sample elements from ElmInfo, without repetition, so that the volumes of the sampled elements add up to a target volume V1. Note that V1 is less than V, so the number of elements to sample is not known in advance. For example, one sampling run might pick 10 elements while another picks 15.
There is no built-in MATLAB function that directly meets this target condition. How can I implement this in MATLAB?
I finally got an answer to my question. Here is the solution I received from a contributor at MATLAB Central. I am posting it here for the convenience of the Stack Overflow community.
TotVol=sum(ElmInfo(:,2));
DefVf = 1.5; % This is the volume fraction I want to sample
% Target sample volume
DefVolm_target = TotVol*(DefVf/100);
% **************************************
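v = ElmInfo(:,2);
n = numel(v); % number of elements (300 in the original data)
tol = 1e-6;
sample = [];
sample_indices = []; % stays empty if no sample is found within maxits
maxits = 10000;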
for count = 1:maxits
p = randperm(n);
s = cumsum(v(p));
k = find(abs(s - DefVolm_target) < tol);
if ~isempty(k)
sample_indices = p(1:k(1));
sample = v(sample_indices);
fprintf('Sample found after %d iterations\n', count);
break
end
end
DefVol_sim=sum(sample);
sampled_Elm=sort(sample_indices);
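The shuffle-and-accumulate idea in the code above can be sketched in plain Python as follows. The function name sample_to_target and its parameters are hypothetical. The sketch assumes non-negative volumes, so it can stop early once the running total overshoots the target, and it returns None when no permutation hits the target within the tolerance.

```python
import random

def sample_to_target(volumes, target, tol, max_iters=10000, rng=None):
    """Return sorted indices whose volumes sum to target within tol, or None."""
    if rng is None:
        rng = random.Random()
    indices = list(range(len(volumes)))
    for _ in range(max_iters):
        rng.shuffle(indices)  # new random order, no repetition within a sample
        running = 0.0
        for count, idx in enumerate(indices, start=1):
            running += volumes[idx]
            if abs(running - target) < tol:
                return sorted(indices[:count])
            if running > target + tol:
                break  # overshoot; valid because volumes are assumed non-negative
    return None

picked = sample_to_target([1.0] * 10, 3.0, 1e-9, rng=random.Random(0))
missed = sample_to_target([1.0, 1.0], 0.5, 1e-9, max_iters=5)
```

As with the MATLAB version, a tight tolerance on irregular volumes may never be met, so the caller should handle the None case.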

Finite difference derivative of an array

I am trying to take a derivative of an array but am having trouble. The array is two dimensional, x and y directions. I would like to take a derivative along x and along y using central difference discretization. The array has random values of numbers, no values are NaN. I will provide a basic portion of the code below to illustrate my point (assume the array u is defined and has some initial values already inputted into it)
integer :: i,j
integer, parameter :: nx=10, ny=10
real, dimension(-nx:nx, -ny:ny) :: u,v,w
real, parameter :: h
do i=-nx,nx
do j=-ny,ny
v = (u(i+1,j)-u(i-1,j))/(2*h)
w = (u(i,j+1)-u(i,j-1))/(2*h)
end do
end do
Note, assume the array u is defined and filled up before I find v,w. v,w are supposed to be derivatives of the array u along x and along y,respectively. Is this the correct way to take a derivative of an array?
I can see several problems in your code.
1. You must be careful about what you put on the left-hand side.
v = (u(i+1,j)-u(i-1,j))/(2*h)
means that the whole array v will be set to the same number everywhere. You don't want this in a loop. In a loop you want to set just one point at a time
v(i,j) = (u(i+1,j)-u(i-1,j)) / (2*h)
2. You are accessing the array out of bounds. You can keep the simple loop, but you must use the boundary points as "ghost points" which store the boundary values. If I assume that the points -nx, nx, -ny and ny lie on the boundary, then you can only compute the derivative using the central difference inside the domain:
do i=-nx+1,nx-1
do j=-ny+1,ny-1
v(i,j) = (u(i+1,j)-u(i-1,j)) / (2*h)
w(i,j) = (u(i,j+1)-u(i,j-1)) / (2*h)
end do
end do
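If you need the derivative on the boundary, you must use a one-sided difference in the direction normal to that boundary. For example, on the edge i = nx the x derivative is one-sided, while the y derivative can still use the central difference:
do j=-ny+1,ny-1
v(nx,j) = (u(nx,j)-u(nx-1,j)) / h
w(nx,j) = (u(nx,j+1)-u(nx,j-1)) / (2*h)
end do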
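To make the interior/boundary split concrete, here is a minimal sketch in plain Python (not Fortran). The function name gradient_2d is hypothetical, the grid is assumed uniform with spacing h in both directions, and each dimension is assumed to have at least two points. Interior points use central differences and edge points use one-sided differences, as described above.

```python
def gradient_2d(u, h):
    """Return (dudx, dudy) for a 2-D list u[i][j] on a uniform grid of spacing h."""
    nx, ny = len(u), len(u[0])
    dudx = [[0.0] * ny for _ in range(nx)]
    dudy = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            # derivative along the first index (x)
            if i == 0:
                dudx[i][j] = (u[1][j] - u[0][j]) / h          # forward, one-sided
            elif i == nx - 1:
                dudx[i][j] = (u[i][j] - u[i - 1][j]) / h      # backward, one-sided
            else:
                dudx[i][j] = (u[i + 1][j] - u[i - 1][j]) / (2 * h)  # central
            # derivative along the second index (y)
            if j == 0:
                dudy[i][j] = (u[i][1] - u[i][0]) / h
            elif j == ny - 1:
                dudy[i][j] = (u[i][j] - u[i][j - 1]) / h
            else:
                dudy[i][j] = (u[i][j + 1] - u[i][j - 1]) / (2 * h)
    return dudx, dudy

# Test field u = x^2 + 3y sampled at x = i*h, y = j*h.
h = 0.5
u = [[(i * h) ** 2 + 3 * (j * h) for j in range(4)] for i in range(5)]
dudx, dudy = gradient_2d(u, h)
```

The central difference is exact for this quadratic test field at interior points, while the one-sided values at the edges are only first-order accurate.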

Store values from a time series function in an array using a for loop in R

I am working with Bank of America time series data for stock prices. I am trying to store the forecasted value for a specific step ahead (in this case 1:20 steps) in an array. I then need to subtract each value of the array from each value of the test array. Then I have to square each value of the array, sum all the squared values of the array, then divide by N (N = number of steps forecasted ahead).
I have the following so far. Also, the quantmod and fpp libraries are needed for this.
# ---------Bank of America----------
library(quantmod)
library(fpp)
BAC = getSymbols('BAC',from='2009-01-02',to='2014-10-15',auto.assign=FALSE)
BAC.adj = BAC$BAC.Adjusted
BAC.daily=dailyReturn(BAC.adj,type='log')
test = tail(BAC.daily, n = 20)
train = head(BAC.daily, n = 1437)
Trying to write a function to forecast, extract requisite value (point forecast for time i), then store it in an array where I can perform operations on that array (i.e. - add, multiply, exponentiate, sum the values of the array)
MSE = function(N){
for(i in 1:(N)){
x = forecast(model1, h = i)
y = x$mean
w = as.matrix(as.double(as.matrix(unclass(y))))
p = array(test[i,]-w[i,])
}
}
and we also have:
model1 = Arima(train, order = c(0,2,0))
MSE = function(N){
result = vector("list", length = (N))
for(i in 1:(N)){
x = forecast(model1, h = i)
point_forecast = as.double(as.matrix(unclass(x$mean)))
result[i] = point_forecast
}
result = as.matrix(do.call(cbind, result))
}
Neither of these functions has worked so far. When I run the MSE function, I get the following warnings:
> MSE(20)
There were 19 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
2: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
3: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
4: In result[i] = point_forecast :
When I run the MSE2 function, I get the following output:
MSE2(20)
[1] -0.15824
When putting a print statement inside, it printed out 'p' as a singular number, just like above (even though that had been run for i = 20). The x,y, and w variable in the MSE2 function act as vectors as far as storing the output, so I do not understand why p does not as well.
I appreciate any help in this matter, thank you.
Sincerely,
Mitchell Healy
Your question has two MSE functions: one in the first code block and one in the second code block.
Also, library(forecast) is needed to run Arima and forecast.
My understanding of what you are trying to do in the first paragraph is to compute the 20-step ahead forecast error. That is, what is the error in forecasts from model1 20 days ahead, based on your test data. This can be done in the code below:
model1 <- Arima(train, order = c(0,2,0))
y_fcst<-forecast(model1,h=20)$mean
errors<-as.vector(y_fcst)-as.vector(test)
MSE.fcst<-mean(errors^2)
However, I'm not sure what you're trying to do here: an ARIMA(0,2,0) model simply models the differences in returns as a random walk. That is, the model differences the returns twice and assumes that the twice-differenced data is white noise. No parameters other than $\sigma^2$ are estimated.
Rob Hyndman has a blog post covering computing errors from rolling forecasts.
My solution to finding the MSE is below. I used log adjusted daily return data from Bank of America gathered through quantmod. Then I subsetted the data (which had length 1457) into training[1:1437] and testing[1438:1457].
The solution is:
forc = function(N){
fcst = matrix(data = NA, nrow = N) # named fcst so it does not shadow forecast()
for(i in 1:N){
# rolling window of 1437 log returns (BAC.daily, as described above), refit at each step
fit = Arima(BAC.daily[i:(1437+(i-1))], order = c(0,0,4))
x = forecast(fit, h = 1)
fcst[i,] = as.numeric(x$mean)
}
error = as.numeric(test) - as.numeric(fcst)
error_squared = error^2
sum_error_squared = sum(error_squared)
MSE = sum_error_squared/N
MSE
}
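The arithmetic described in the question (subtract, square, sum, divide by N) and the idea of one-step forecasts made from a moving origin can be sketched in plain Python. Everything here is illustrative: mean_squared_error, rolling_one_step and last_value are hypothetical names, and last_value is a naive stand-in forecaster, not an ARIMA fit. Unlike the fixed-length window in the R code above, this sketch uses an expanding history.

```python
def mean_squared_error(forecasts, actuals):
    """Average of squared forecast errors; both inputs must have the same length."""
    if not forecasts or len(forecasts) != len(actuals):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(forecasts)

def rolling_one_step(series, n_test, forecaster):
    """One-step-ahead forecasts for the last n_test points, each using all earlier data."""
    start = len(series) - n_test
    return [forecaster(series[:start + i]) for i in range(n_test)]

def last_value(history):
    """Naive stand-in forecaster (an assumption): predict the most recent observation."""
    return history[-1]

series = [1.0, 2.0, 4.0, 7.0, 11.0]
forecasts = rolling_one_step(series, 2, last_value)
mse = mean_squared_error(forecasts, series[-2:])
```

Swapping last_value for a function that fits a model to the history would give the same rolling evaluation for that model.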

Plot Representative sample of large data set - Matlab

I have a large data set with two arrays, say x and y. The arrays have over 1 million data points in size. Is there a simple way to do a scatter plot of only 2000 of these points but have it be representative of the entire set?
I'm thinking along the lines of creating another array r ; r = max(x)*rand(2000,1) to get a random sample of the x array. Is there a way to then find where a value in r is equal to, or close to a value in x ? They wouldn't have to be in the same indexed location but just throughout the whole matrix. We could then plot the y values associated with those found x values against r
I'm just not sure how to code this. Is there a better way than doing this?
I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
r = max(x)*rand(2000,1);
for i = 1:length(r)
[~,z(i)] = min(abs(x-r(i)));
end
plot(x(z),y(z),'.')
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every n points, something like (I haven't tested this, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = round(length(x)/npoints);
plot(x(1:interval:end),y(1:interval:end))
Perhaps the easiest way is to use the round function to convert the values to integers, which can then be compared exactly. For example, if you want to find points that are within 0.1 of the values of r, multiply the values by 10 (that is, divide by 0.1) first, then round:
r = max(x) * rand(2000,1);
rr = round(r / 0.1);
xx = round(x / 0.1);
inRR = ismember(xx, rr)
plot(x(inRR), y(inRR));
By dividing by 0.1, any values that have the same integer value are within 0.1 of each other.
ismember returns a 1 for each value of xx if that value is in rr, otherwise a 0. These can be used to select entries to plot.
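A different route to a representative subset, which none of the answers above spell out, is to draw random indices without replacement and keep the matching (x, y) pairs together. Here is a minimal sketch in plain Python. The name subsample_pairs is hypothetical, and uniform sampling over indices is an assumption about what "representative" means for the data.

```python
import random

def subsample_pairs(x, y, k, seed=None):
    """Pick k distinct indices uniformly at random and return the matching x and y values."""
    rng = random.Random(seed)
    k = min(k, len(x))  # never ask for more points than exist
    idx = sorted(rng.sample(range(len(x)), k))
    return [x[i] for i in idx], [y[i] for i in idx]

x = list(range(10000))
y = [2 * v for v in x]
xs, ys = subsample_pairs(x, y, 2000, seed=1)
```

Because indices rather than values are sampled, dense regions of the data stay dense in the subset, which is usually what a scatter plot needs.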

Resources