I want to merge two data sets into one. Because they are sampled at different times, I cannot simply concatenate them.
Where both data sets have a sample at the same time, the values should be added; every other sample should keep its original time.
data_1_y_axes=[0,1,3,5,4,6,8,9,7]
time_1_x_axes=[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2]
data_2_y_axes=[0,2,4,5,2,7,5,7,5]
time_2_x_axes=[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09]
plot(time_1_x_axes,data_1_y_axes)
hold on
plot(time_2_x_axes,data_2_y_axes)
My expected data will be as follows:
New_data=[ 0, 2,4,5,2+0,1,7,3,5+5,7+4,6,5+8,9,7]
New_time=[.002,.004,.006,.009,.02,.03,.04,.05,.06,.07,.08,.09,.1,.2]
How can I do this?
Here is another way to do it without using a for loop. This will run much faster:
data_1_y_axes=[0,1,3,5,4,6,8,9,7]
time_1_x_axes=[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2]
data_2_y_axes=[0,2,4,5,2,7,5,7,5]
time_2_x_axes=[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09]
[time_merged,i1,i2] = intersect(time_1_x_axes, time_2_x_axes)
data_merged = data_1_y_axes(i1) + data_2_y_axes(i2)
[time1_remaining, ir1] = setdiff(time_1_x_axes, time_merged)
[time2_remaining, ir2] = setdiff(time_2_x_axes, time_merged)
[time_merged, idx] = sort([time_merged time_1_x_axes(ir1) time_2_x_axes(ir2)])
data_merged = [data_merged data_1_y_axes(ir1) data_2_y_axes(ir2)]
data_merged = data_merged(idx)
plot(time_merged,data_merged)
You could combine the x and y axis arrays, then aggregate by x-axis values using
unique (to get the unique x values and, for each y value, the index of its x value) and
accumarray (to add up all of the y values that share an x-axis index).
Using your example data, this would look like:
y1 =[0,1,3,5,4,6,8,9,7];
x1 =[.02,0.03,.05,.06,.07,0.08,0.09,.1,.2];
y2 =[0,2,4,5,2,7,5,7,5];
x2 =[.002,0.004,.006,.009,.02,0.04,0.06,.07,.09];
x = [x1, x2]; % Combine x axis data
y = [y1, y2]; % Combine y axis data
[x, ~, idx] = unique( x(:) ); % Get unique x, and their indices
y = accumarray( idx, y ); % Add up y values according to x value index
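The same group-by-time-and-sum idea can be sketched outside MATLAB. The version below is a minimal Python illustration, not a translation of the answer's code: the function name `merge_series` and the `ndigits` parameter are assumptions made for this example. Rounding the time stamps before grouping is an extra precaution, so that times differing only by floating-point noise still land in the same group.

```python
from collections import defaultdict


def merge_series(t1, y1, t2, y2, ndigits=9):
    """Sum samples that share a time stamp; keep all other samples unchanged."""
    totals = defaultdict(float)
    for t, y in zip(list(t1) + list(t2), list(y1) + list(y2)):
        # Round the key so times that differ only by rounding noise are grouped.
        totals[round(t, ndigits)] += y
    times = sorted(totals)
    return times, [totals[t] for t in times]


# Example data from the question.
t1 = [.02, .03, .05, .06, .07, .08, .09, .1, .2]
y1 = [0, 1, 3, 5, 4, 6, 8, 9, 7]
t2 = [.002, .004, .006, .009, .02, .04, .06, .07, .09]
y2 = [0, 2, 4, 5, 2, 7, 5, 7, 5]

new_time, new_data = merge_series(t1, y1, t2, y2)
```

With the question's data this yields 14 time stamps, and the values at 0.02, 0.06, 0.07 and 0.09 are the sums of the two series.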
Aggregating Sample Values With Different Sampling Times
The following approach combines the data into two single vectors, Time_Vector and Data_Vector. The unique() function is then used to find the unique sample times within Time_Vector. A for-loop uses the find() function to locate the indices where each sample time occurs. The values at those indices (all the values that occur at a given sample time) are then extracted by indexing and added together with the sum() function.
data_1_y_axes = [0,1,3,5,4,6,8,9,7];
time_1_x_axes = [0.02,0.03,0.05,0.06,0.07,0.08,0.09,0.1,0.2];
data_2_y_axes = [0,2,4,5,2,7,5,7,5];
time_2_x_axes = [0.002,0.004,0.006,0.009,0.02,0.04,0.06,0.07,0.09];
Data_Vector = [data_1_y_axes data_2_y_axes];
Time_Vector = [time_1_x_axes time_2_x_axes];
Unique_Times = unique(Time_Vector);
for Sample_Index = 1:length(Unique_Times)
    Time_Value = Unique_Times(Sample_Index);
    Indices_With_Matching_Time = find(Time_Vector == Time_Value);
    Output_Data(Sample_Index) = sum(Data_Vector(Indices_With_Matching_Time));
end
plot(Unique_Times,Output_Data);
Ran using MATLAB R2019b
I am trying to take the derivative of an array but am having trouble. The array is two-dimensional, with x and y directions. I would like to take the derivative along x and along y using a central difference discretization. The array holds arbitrary numeric values, none of which are NaN. Below is a basic portion of the code to illustrate my point (assume the array u is already defined and filled with initial values).
integer :: i,j
integer, parameter :: nx=10, ny=10
real, dimension(-nx:nx, -ny:ny) :: u,v,w
real, parameter :: h
do i=-nx,nx
  do j=-ny,ny
    v = (u(i+1,j)-u(i-1,j))/(2*h)
    w = (u(i,j+1)-u(i,j-1))/(2*h)
  end do
end do
Note that the array u is defined and filled before I compute v and w, which are supposed to be the derivatives of u along x and along y, respectively. Is this the correct way to take a derivative of an array?
I can see several problems in your code.
1. You must be careful about what you have on the left-hand side.
v = (u(i+1,j)-u(i-1,j))/(2*h)
means that the whole array v will be set to the same number everywhere. You don't want this in a loop. In a loop you want to set just one point at a time:
v(i,j) = (u(i+1,j)-u(i-1,j)) / (2*h)
2. You are accessing the array out of bounds. You can keep the simple loop, but you must use the boundary points as "ghost points" which store the boundary values. If I assume that the points -nx, nx, -ny and ny are lying on the boundary, then you can only compute the derivative using the central difference inside the domain:
do i=-nx+1,nx-1
  do j=-ny+1,ny-1
    v(i,j) = (u(i+1,j)-u(i-1,j)) / (2*h)
    w(i,j) = (u(i,j+1)-u(i,j-1)) / (2*h)
  end do
end do
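If you need the derivative on the boundary, you must use a one-sided difference in the direction normal to that boundary. Along the boundary itself the central difference, with its 2*h denominator, still applies:
do j=-ny+1,ny-1
  v(nx,j) = (u(nx,j)-u(nx-1,j)) / h
  w(nx,j) = (u(nx,j+1)-u(nx,j-1)) / (2*h)
end do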
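As a way to check the stencils above, here is a minimal Python sketch (not the Fortran from the answer) that uses central differences in the interior and one-sided differences on the edges. The helper names `diff_1d` and `gradient_2d`, the zero-based indexing and the uniform spacing `h` are assumptions made for this illustration. On a linear test field both stencils are exact, which makes the result easy to verify.

```python
def diff_1d(values, h):
    """Derivative of a 1-D sequence: central inside, one-sided at both ends."""
    n = len(values)
    out = [0.0] * n
    for k in range(n):
        if k == 0:
            out[k] = (values[1] - values[0]) / h          # forward difference
        elif k == n - 1:
            out[k] = (values[k] - values[k - 1]) / h      # backward difference
        else:
            out[k] = (values[k + 1] - values[k - 1]) / (2 * h)  # central
    return out


def gradient_2d(u, h):
    """Return (du/dx, du/dy) for u[i][j], with i along x and j along y."""
    nx, ny = len(u), len(u[0])
    # d/dx: differentiate along i for each fixed j, then transpose back.
    cols = [diff_1d([u[i][j] for i in range(nx)], h) for j in range(ny)]
    v = [[cols[j][i] for j in range(ny)] for i in range(nx)]
    # d/dy: differentiate along j for each fixed i.
    w = [diff_1d(u[i], h) for i in range(nx)]
    return v, w


# Linear test field u = 3x + 2y, so du/dx = 3 and du/dy = 2 everywhere.
h = 0.5
u = [[3 * (i * h) + 2 * (j * h) for j in range(4)] for i in range(5)]
v, w = gradient_2d(u, h)
```

Note that the one-sided differences are only first-order accurate, so on a non-linear field the edge values will be less accurate than the interior ones.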
I am working with Bank of America time series data for stock prices. I am trying to store the forecasted value for a specific step ahead (in this case 1:20 steps) in an array. I then need to subtract each value of the array from each value of the test array. Then I have to square each value of the array, sum all the squared values of the array, then divide by N (N = number of steps forecasted ahead).
I have the following so far. Also, the quantmod and fpp libraries are needed for this.
# ---------Bank of America----------
library(quantmod)
library(fpp)
BAC = getSymbols('BAC',from='2009-01-02',to='2014-10-15',auto.assign=FALSE)
BAC.adj = BAC$BAC.Adjusted
BAC.daily=dailyReturn(BAC.adj,type='log')
test = tail(BAC.daily, n = 20)
train = head(BAC.daily, n = 1437)
I am trying to write a function that forecasts, extracts the required value (the point forecast for time i), and stores it in an array on which I can then perform operations (add, multiply, exponentiate, sum the values of the array):
MSE = function(N){
  for(i in 1:(N)){
    x = forecast(model1, h = i)
    y = x$mean
    w = as.matrix(as.double(as.matrix(unclass(y))))
    p = array(test[i,]-w[i,])
  }
}
and we also have:
model1 = Arima(train, order = c(0,2,0))
MSE = function(N){
  result = vector("list", length = (N))
  for(i in 1:(N)){
    x = forecast(model1, h = i)
    point_forecast = as.double(as.matrix(unclass(x$mean)))
    result[i] = point_forecast
  }
  result = as.matrix(do.call(cbind, result))
}
Neither of these functions has worked so far. When I run the MSE function, I get the following warnings:
> MSE(20)
There were 19 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
2: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
3: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
4: In result[i] = point_forecast :
When I run the MSE2 function, I get the following output:
MSE2(20)
[1] -0.15824
When I put a print statement inside, it printed 'p' as a single number, just like above (even though the function had been run for i = 20). The x, y, and w variables in the MSE2 function behave as vectors when storing the output, so I do not understand why p does not as well.
I appreciate any help in this matter, thank you.
Sincerely,
Mitchell Healy
Your question has two MSE functions: one in the first code block and one in the second code block.
Also, library(forecast) is needed to run Arima and forecast.
My understanding of what you are trying to do in the first paragraph is to compute the forecast error over a 20-step horizon. That is, how large is the error in the forecasts from model1 for the next 20 days, measured against your test data? This can be done with the code below:
library(forecast)  # provides Arima() and forecast()
model1 <- Arima(train, order = c(0,2,0))
y_fcst <- forecast(model1, h = 20)$mean
errors <- as.vector(y_fcst) - as.vector(test)
MSE.fcst <- mean(errors^2)
However, I'm not sure what you're trying to do here: an ARIMA(0,2,0) model is simply modelling the differences in returns as a random walk. That is, this model just differences the returns twice and assumes this twice-differenced data is white noise. There are no parameters other than $\sigma^2$ being estimated.
Rob Hyndman has a blog post covering computing errors from rolling forecasts.
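The error calculation in that answer (subtract, square, average) can be written out step by step. The following is a minimal Python sketch of the arithmetic only. The function name `mean_squared_error` and the example numbers are made up for illustration and are not Bank of America data.

```python
def mean_squared_error(forecasts, actuals):
    """Average of the squared differences between forecasts and actual values."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return sum(e * e for e in errors) / len(errors)


# Illustrative numbers only (hypothetical, not real returns).
example_forecast = [0.5, 0.0, -0.5, 1.0]
example_actual = [0.0, 0.0, 0.5, 0.0]
mse = mean_squared_error(example_forecast, example_actual)
```

Here the errors are 0.5, 0, -1 and 1, so the squared errors sum to 2.25 and the MSE over N = 4 steps is 0.5625.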
My solution to finding the MSE is below. I used log adjusted daily return data from Bank of America gathered through quantmod. Then I subsetted the data (which had length 1457) into training[1:1437] and testing[1438:1457].
The solution is:
forc = function(N){
  forecast = matrix(data = NA, nrow = (N) )
  for(i in 1:N){
    fit = Arima(BAC.adj[(1+(i-1)):(1437+(i-1))], order = c(0,0,4))
    x = forecast(fit, h = 1)
    forecast[i,] = as.numeric(x$mean)
  }
  error = test - forecast
  error_squared = error^2
  sum_error_squared = sum(error_squared)
  MSE = sum_error_squared/N
  MSE
}
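The structure of that function (slide a fixed-length training window forward one step at a time, make a one-step forecast, compare it with the next observation) can be sketched independently of the model. In the Python illustration below, `rolling_one_step_mse` and `window_mean` are hypothetical names, and the window-mean forecaster is only a stand-in for the ARIMA fit. It is not the method used in the answer.

```python
def rolling_one_step_mse(series, window, n_forecasts, forecaster):
    """MSE of one-step-ahead forecasts made from a sliding training window."""
    if len(series) < window + n_forecasts:
        raise ValueError("series is too short for the requested forecasts")
    squared_errors = []
    for i in range(n_forecasts):
        train = series[i:i + window]      # window shifted forward by i steps
        actual = series[i + window]       # first observation after the window
        prediction = forecaster(train)    # one-step-ahead point forecast
        squared_errors.append((actual - prediction) ** 2)
    return sum(squared_errors) / n_forecasts


def window_mean(values):
    """Stand-in forecaster (assumption): predict the mean of the window."""
    return sum(values) / len(values)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mse = rolling_one_step_mse(series, window=2, n_forecasts=4,
                           forecaster=window_mean)
```

In this toy series every forecast is off by 1.5, so the MSE is 2.25. Any function that maps a training window to a point forecast could be passed in as `forecaster`.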
I have a large data set with two arrays, say x and y. The arrays have over 1 million data points in size. Is there a simple way to do a scatter plot of only 2000 of these points but have it be representative of the entire set?
I'm thinking along the lines of creating another array r, with r = max(x)*rand(2000,1), to get a random sample of the x array. Is there a way to then find where a value in r is equal to, or close to, a value in x? The matches would not have to be at the same index, just anywhere in the array. We could then plot the y values associated with those found x values against r.
I'm just not sure how to code this. Is there a better way than doing this?
I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
r = max(x)*rand(2000,1);
for i = 1:length(r)
    [~,z(i)] = min(abs(x-r(i)));
end
plot(x(z),y(z),'.')
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
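The loop above scans all of x once for every value in r. If x is sorted (or a sorted copy is used), the closest value can instead be found by binary search. The Python sketch below illustrates that alternative. The function name `nearest_index` is an assumption for this example, and it presumes the input list is already in ascending order.

```python
import bisect


def nearest_index(sorted_values, target):
    """Index of the element of an ascending list that is closest to target."""
    pos = bisect.bisect_left(sorted_values, target)
    if pos == 0:
        return 0                              # target is below the first value
    if pos == len(sorted_values):
        return len(sorted_values) - 1         # target is above the last value
    before, after = sorted_values[pos - 1], sorted_values[pos]
    # Pick whichever neighbour is closer; ties go to the lower index.
    return pos if after - target < target - before else pos - 1


values = [0.0, 1.0, 2.5, 4.0]
closest = nearest_index(values, 2.4)
```

Here `closest` is 2, the index of 2.5. If x is not sorted, the indices returned refer to the sorted copy and would have to be mapped back to the original order.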
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every n points, something like (I haven't tested this, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = round(length(x)/npoints);
plot(x(1:interval:end),y(1:interval:end))
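Two simple ways of picking a subset are taking every n-th point, as above, or drawing random indices. Both are sketched below in Python. The function names `stride_sample` and `random_sample` and the fixed `seed` are assumptions for this illustration. Sampling indices rather than x values keeps each x paired with its own y.

```python
import random


def stride_sample(x, y, npoints):
    """Take roughly every (len(x) // npoints)-th point, at most npoints."""
    step = max(1, len(x) // npoints)
    return x[::step][:npoints], y[::step][:npoints]


def random_sample(x, y, npoints, seed=0):
    """Pick npoints distinct indices at random and keep the x/y pairs."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    idx = sorted(rng.sample(range(len(x)), npoints))
    return [x[i] for i in idx], [y[i] for i in idx]


# Small stand-in data set (the question's arrays have over a million points).
x = list(range(10000))
y = [2 * value for value in x]

xs_stride, ys_stride = stride_sample(x, y, 2000)
xs_random, ys_random = random_sample(x, y, 2000)
```

Stride sampling preserves the ordering structure of the data, whereas random index sampling is less likely to line up with any periodic pattern in it.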
Perhaps the easiest way is to use the round function to convert the values to integers, which can then be compared. For example, if you want to find points that are within 0.1 of the values of r, divide the values by 0.1 (that is, multiply them by 10) first, then round:
r = max(x) * rand(2000,1);
rr = round(r / 0.1);
xx = round(x / 0.1);
inRR = ismember(xx, rr);
plot(x(inRR), y(inRR));
By dividing by 0.1, any values that have the same integer value are within 0.1 of each other.
ismember returns a 1 for each value of xx if that value is in rr, otherwise a 0. These can be used to select entries to plot.
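The round-then-compare idea can be sketched in Python as follows. The function name `select_near` and the parameter `tol` are assumptions for this illustration. One caveat about the technique itself: two values closer than `tol` can still round to different integers (0.149 and 0.151 with a tolerance of 0.1, for example), so this selects values that fall in the same bin rather than every value within `tol`.

```python
def select_near(x, r, tol):
    """Indices of x whose rounded bin (value / tol) also occurs among r."""
    bins = {round(value / tol) for value in r}
    return [i for i, value in enumerate(x) if round(value / tol) in bins]


x = [0.0, 0.52, 1.01, 3.3]
r = [0.5, 1.0]
selected = select_near(x, r, tol=0.1)
```

Here `selected` is `[1, 2]`: 0.52 shares a bin with 0.5, and 1.01 shares a bin with 1.0.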