I created an array of values:
binBorder=exp(0:5)
# 1.000000 2.718282 7.389056 20.085537 54.598150 148.413159
which gives me an array of length 6 in this case. Now I want to create a second array containing the number exactly halfway between each pair of adjacent numbers. In this case it should have five elements and contain the values:
1.000000 - ( 1.000000 - 2.718282) / 2
2.718282 - ( 2.718282 - 7.389056) / 2
7.389056 - ( 7.389056 - 20.085537) / 2
20.085537 - (20.085537 - 54.598150) / 2
54.598150 - (54.598150 - 148.413159) / 2
Is there a built-in function for such things? I need it for the calculation of the bin center (that should be a common problem). Or is the following code the "easiest solution"?
> bb1 = exp(0:4)
> bb2 = exp(1:5)
> bb = bb1 + ((bb2 - bb1) / 2)
> bb
I'm a newcomer to R, so I'm not sure how problems are usually solved: mostly with built-in functions, or by constructing something like the solution I made up?
Thanks for your help,
Sven
Your solution can be rewritten using subsetting to avoid the intermediate variables:
(binBorder[1:5]+binBorder[-1])/2
[1] 1.859141 5.053669 13.737297 37.341843 101.505655
In fact, more generally you could write the following function:
midPoints <- function(x){
(x[-length(x)]+x[-1])/2
}
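For comparison, the same pairwise-midpoint idea can be sketched in Python (an illustration only, not part of the original answer). `mid_points` is a hypothetical name mirroring the R `midPoints` above; pairing each element with its right neighbour plays the role of combining x[-length(x)] and x[-1]:

```python
import math


def mid_points(x):
    """Return the midpoint of each pair of adjacent values (len(x) - 1 values)."""
    # zip(x, x[1:]) pairs every element with its right neighbour,
    # like combining x[-length(x)] and x[-1] in the R version.
    return [(a + b) / 2 for a, b in zip(x, x[1:])]


bin_border = [math.exp(k) for k in range(6)]  # same borders as exp(0:5)
print([round(m, 6) for m in mid_points(bin_border)])
```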
The function filter does what you are asking for. When used in the following way, it calculates the 2-period moving average:
filter(binBorder, c(0.5, 0.5), sides=1)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] NA 1.859141 5.053669 13.737297 37.341843 101.505655
The only (slight) downside of filter is that it returns a value of class ts (for time series).
You can avoid that by calling convolve:
convolve(binBorder, c(0.5, 0.5), type="filter")
[1] 1.859141 5.053669 13.737297 37.341843 101.505655
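Both filter and convolve compute a weighted sum over a sliding window. A minimal Python sketch of that operation, keeping only complete windows (so there is no leading NA as with filter), might look like this. `moving_weighted_sum` is a hypothetical helper; it applies the weights in window order, which makes no difference for the symmetric c(0.5, 0.5) used here:

```python
import math


def moving_weighted_sum(x, weights):
    """Weighted sum over every complete window of len(weights) consecutive values.

    Only full windows are returned, so the result has len(x) - len(weights) + 1
    elements (no leading NA as with R's filter(..., sides = 1)).
    """
    k = len(weights)
    return [sum(w * v for w, v in zip(weights, x[i:i + k]))
            for i in range(len(x) - k + 1)]


bin_border = [math.exp(k) for k in range(6)]
centers = moving_weighted_sum(bin_border, [0.5, 0.5])  # 2-period moving average
```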
Isn't this easily handled by diff()?
binBorder <- exp(0:5)
binBorder[1:5] + diff(binBorder)/2
I would like to randomly choose a certain number of elements from an array so that they always respect a minimum distance from each other.
For example, having a vector a <- seq(1,1000), how can I pick 20 elements with a minimum distance of 15 between each other?
For now, I am using a simple loop in which I reject a pick whenever it is too close to any already-picked element, but it is cumbersome and tends to be slow when the number of elements to pick is high. Is there a best practice or function for this?
EDIT - Summary of answers and analysis
So far I have received two working answers, which I wrapped in two functions.
# dash2 approach
# ---------------
rand_pick_min <- function(ar, min.dist, n.picks){
  stopifnot(is.numeric(min.dist),
            is.numeric(n.picks), n.picks%%1 == 0)
  if(length(ar)/n.picks < min.dist)
    stop('The number of picks exceeds the maximum number of divisions that the array allows which is: ',
         floor(length(ar)/min.dist))
  picked <- array(NA, n.picks)
  copy <- ar
  for (i in 1:n.picks) {
    stopifnot(length(copy) > 0)
    # sample by index: sample(copy, 1) would draw from 1:copy if only one candidate were left
    picked[i] <- copy[sample.int(length(copy), 1)]
    copy <- copy[ abs(copy - picked[i]) >= min.dist ]
  }
  return(picked)
}
# denis approach
# ---------------
rand_pick_min2 <- function(ar, min.dist, n.picks){
  require(Surrogate)
  stopifnot(is.numeric(min.dist),
            is.numeric(n.picks), n.picks%%1 == 0)
  if(length(ar)/n.picks < min.dist)
    stop('The number of picks exceeds the maximum number of divisions that the array allows which is: ',
         floor(length(ar)/min.dist))
  lar <- length(ar)
  dist <- Surrogate::RandVec(a=min.dist, b=(lar-(n.picks)*min.dist),
                             s=lar, n=(n.picks+1), m=1, Seed=sample(1:lar, size = 1))$RandVecOutput
  return(cumsum(round(dist))[1:n.picks])
}
Using the same example, I ran three tests. First, whether the minimum distance is actually respected:
# Libs
require(ggplot2)
require(microbenchmark)
# Inputs
a <- seq(1, 1000) # test vector
md <- 15 # min distance
np <- 20 # number of picks
# Run
dist_vec <- c(sapply(1:500, function(x) c(dist(rand_pick_min(a, md, np))))) # sol 1
dist_vec2 <- c(sapply(1:500, function(x) c(dist(rand_pick_min2(a, md, np))))) # sol 2
# Tests - break the min
cat('Any distance breaking the min in sol 1?', any(dist_vec < md), '\n') # FALSE
cat('Any distance breaking the min in sol 2?', any(dist_vec2 < md), '\n') # FALSE
Second, I looked at the distribution of the resulting distances, obtaining the first two plots in order of solution (sol1 [A] is dash2's solution, while sol2 [B] is denis's).
pa <- ggplot() + theme_classic() +
geom_density(aes_string(x = dist_vec), fill = 'lightgreen') +
geom_vline(aes_string(xintercept = mean(dist_vec)), col = 'darkred') + xlab('Distances')
pb <- ggplot() + theme_classic() +
geom_density(aes_string(x = dist_vec2), fill = 'lightgreen') +
geom_vline(aes_string(xintercept = mean(dist_vec2)), col = 'darkred') + xlab('Distances')
print(pa)
print(pb)
Lastly, I measured the computation times of the two approaches as follows, obtaining the last figure.
comp_times <- microbenchmark::microbenchmark(
'solution_1' = rand_pick_min(a, md, np),
'solution_2' = rand_pick_min2(a, md, np),
times = 500
)
ggplot2::autoplot(comp_times); ggsave('stckoverflow2.png')
Given these results, I am wondering whether the distance distributions are what should be expected, or whether they are a deviation caused by the methods applied.
EDIT2 - Answer to the last question, following the comment made by denis
Using many more sampling runs (5000), I produced a probability density plot of the resulting positions, and indeed your approach contains some artefact that makes your solution (B) deviate from the one I needed. Nonetheless, it would be interesting to be able to enforce a specific final distribution of positions.
If you want to avoid hit-and-miss methods, you will have to translate your problem into a sampling of distances with a constraint on the sum of those distances.
Basically, here is how I translate what you want: your N sampled positions are equivalent to N+1 distances, each ranging from the minimum distance to the size of your vector minus N*mindist (the case where all your samples are packed together). You then need to constrain the sum of the distances to equal 1000 (the size of your vector).
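The gap formulation also has an exact counterpart that needs no fixed-sum sampler. Note that this is a different, swapped-in technique (a "stars and bars" shift), not the RandVec approach of this answer: draw N distinct values from the shortened range 1..L-(N-1)(d-1), sort them, and add (i-1)(d-1) to the i-th one. Each valid configuration corresponds to exactly one draw, so all configurations with gaps of at least d are equally likely. A minimal Python sketch, where `pick_min_dist` and its `rng` parameter are hypothetical names:

```python
import random


def pick_min_dist(L, d, n, rng=random):
    """Pick n sorted positions from 1..L with pairwise gaps of at least d.

    Every valid configuration is equally likely: sample n distinct values
    from the shortened range 1..L-(n-1)*(d-1), sort them, then spread them
    out by adding i*(d-1) to the i-th value (0-based i).
    """
    span = L - (n - 1) * (d - 1)
    if span < n:
        raise ValueError("cannot fit n picks with this minimum distance")
    base = sorted(rng.sample(range(1, span + 1), n))
    # Consecutive base values differ by >= 1, so shifted ones differ by >= d.
    return [v + i * (d - 1) for i, v in enumerate(base)]


pos = pick_min_dist(1000, 15, 20)
```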
This solution uses RandVec from the Surrogate package (see "Random sampling to give an exact sum"), which samples values with a fixed sum.
library(Surrogate)
a <- seq(1,1000)
mind <- 15
N <- 20
dist <- Surrogate::RandVec(a=mind, b=(1000-(N)*mind), s=1000, n=(N+1), m=1, Seed=sample(1:1000, size = 1))$RandVecOutput
pos <- cumsum(round(dist))[1:N]
pos
> pos
[1] 22 59 76 128 204 239 289 340 389 440 489 546 567 607 724 773 808 843 883 927
dist holds the sampled distances. You reconstruct the positions by taking the cumulative sum of the distances, which gives pos, the vector of index positions.
The advantage is that any valid configuration can be obtained, and the sampling is supposed to be random. As for speed, I don't know; you'll need to compare it to your method on your big-data case.
Here is a histogram of 1000 tries:
I think the best solution, which guarantees randomness in some sense (I'm not exactly sure what sense!) may be:
1. Pick a random element.
2. Remove all elements that are too close to that element.
3. Pick another element.
4. Return to 2.
So:
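min_dist <- 15
a <- seq(1, 1000)
picked <- integer(20)
copy <- a
for (i in 1:20) {
  stopifnot(length(copy) > 0)
  # sample by index: sample(copy, 1) would draw from 1:copy if only one candidate were left
  picked[i] <- copy[sample.int(length(copy), 1)]
  copy <- copy[ abs(copy - picked[i]) >= min_dist ]
}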
Whether this is faster than sample-and-reject may depend on the characteristics of the original vector. Also, as you can see, you are not guaranteed to be able to get all the elements you want, though in your particular case there won't be a problem because 19 intervals of width 30 could never cover the whole of seq(1, 1000).
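The pick-and-remove loop translates directly to Python. A minimal sketch, where `pick_and_remove` and its `rng` parameter are hypothetical names; like the R code, it stops with an error if the candidates run out before n picks:

```python
import random


def pick_and_remove(values, min_dist, n, rng=random):
    """Pick n elements; after each pick, drop candidates closer than min_dist."""
    candidates = list(values)
    picked = []
    for _ in range(n):
        if not candidates:
            raise RuntimeError("ran out of candidates before picking n elements")
        choice = rng.choice(candidates)  # uniform over the remaining candidates
        picked.append(choice)
        # Keep only candidates at least min_dist away from the new pick.
        candidates = [c for c in candidates if abs(c - choice) >= min_dist]
    return picked


picked = pick_and_remove(range(1, 1001), 15, 20)
```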
I would like to compute the product of every n adjacent elements of a vector. The number n of elements to be multiplied should be given as an input to the function.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Currently I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
In this example n = 3, but it can take any positive integer value.
Depending on whether n is odd or even, or length(v) is odd or even, I sometimes get the right answer and sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponential of the sum of their logarithms:
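Written out for a window of n elements starting at index k (a sketch of the identity; for negative or complex entries the complex logarithm is meant, which is why the implementation takes the real part of the result for real inputs):

$$
\prod_{i=k}^{k+n-1} v_i \;=\; \exp\!\left(\sum_{i=k}^{k+n-1} \log v_i\right)
$$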
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
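For real inputs, the same log-sum idea can be sketched without complex logarithms by summing log|v| and tracking signs and zeros separately. This is a minimal Python illustration under those assumptions, not Dev-iL's code; `moving_product` is a hypothetical name. It updates the window sums incrementally, so the work does not grow with n, and, as noted above, the results may need rounding:

```python
import math


def moving_product(v, n):
    """Products of every n consecutive real values, via sums of logarithms.

    log|x| is summed over a sliding window; negative and zero entries are
    counted separately, because log is undefined for them in the reals.
    """
    if not 1 <= n <= len(v):
        raise ValueError("need 1 <= n <= len(v)")
    log_abs = [math.log(abs(x)) if x != 0 else 0.0 for x in v]
    is_neg = [1 if x < 0 else 0 for x in v]
    is_zero = [1 if x == 0 else 0 for x in v]

    s, neg, zero = sum(log_abs[:n]), sum(is_neg[:n]), sum(is_zero[:n])
    out = []
    for i in range(len(v) - n + 1):
        if i > 0:  # slide the window: add element i+n-1, drop element i-1
            s += log_abs[i + n - 1] - log_abs[i - 1]
            neg += is_neg[i + n - 1] - is_neg[i - 1]
            zero += is_zero[i + n - 1] - is_zero[i - 1]
        out.append(0.0 if zero else (-1) ** neg * math.exp(s))
    return out


print(moving_product([1, 2, 2, 1, 3, 1], 3))
```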
Update
Inspired by Dev-iL's well-thought-out answer comes this handy solution, which does not require MATLAB R2016a or later:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication into a sum, so that a moving sum can be used, which in turn can be realised by convolution.
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
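The prefix-product trick can be sketched in Python with itertools.accumulate. As the answer notes, it assumes there are no zeros in the input (a zero prefix product would make the division fail), and for long inputs the prefix products may overflow or lose precision. `window_products` is a hypothetical name:

```python
import operator
from itertools import accumulate


def window_products(a, n):
    """Products of every n consecutive elements via prefix products.

    x[k] is the product of the first k elements (x[0] = 1), so the window
    a[k:k+n] has product x[k+n] / x[k]. Assumes no zeros in a.
    """
    x = [1] + list(accumulate(a, operator.mul))
    return [x[k + n] / x[k] for k in range(len(a) - n + 1)]


print(window_products([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 2))
```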
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is to use gallery, as explained in thewaywewalk's answer.
I think the problem is in your indexing. The line for ii = 1:(length(v)-2) does not give the correct range of ii.
Try this:
function out = max_product(in, n)
k = n - 1;                       % offset from the first to the last element of a window
out = zeros(length(in) - k, 1);  % one product per complete window (column vector)
for i = 1:length(in) - k
    out(i) = prod(in(i:i+k));
end
end
Your code works when restated like so:
p = zeros(1, length(v)-(n-1));  % one product per window
for ii = 1:(length(v)-(n-1))
    p(ii) = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
Using bsxfun, you create a matrix in which each row contains n consecutive elements, then take prod along the 2nd dimension of the matrix. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
Some other solutions have been updated, and some, such as Dev-iL's answer, outperform others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.
I am just learning MATLAB, and I am having difficulty creating a row array of 3 elements.
I wrote this code:
Source = randi ([0,1],1,3);
which gave me outputs such as
[1,1,0].....
[0,1,1]....
but I want exactly one 1 and two zeros in the output, not two ones and one zero.
I know my approach is wrong: randi draws each element independently as 0 or 1, so the output is only sometimes of the form [0,0,1] or [1,0,0].
What I need is exactly one 1 however many times I repeat it, i.e. only [0,0,1], [0,1,0] or [1,0,0].
I hope someone can help.
Thank you.
Ujwal
Here's a way using randperm:
n = 3; %// total number of elements
m = 1; %// number of ones
x = [ones(1,m) zeros(1,n-m)];
x = x(randperm(numel(x)));
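The same idea (build a vector with exactly m ones, then permute it randomly) as a minimal Python sketch; `random_ones` is a hypothetical name, and random.shuffle stands in for randperm:

```python
import random


def random_ones(n, m, rng=random):
    """Return a length-n list with exactly m ones in random positions."""
    x = [1] * m + [0] * (n - m)
    rng.shuffle(x)  # uniform random permutation, like x(randperm(numel(x)))
    return x


print(random_ones(3, 1))
```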
Here are a couple of alternative solutions to your problem.
Create zero-filled matrix and set random element to one:
x = zeros(1, 3);
x(randi(3)) = 1;
Create 1x3 eye matrix and randomly circshift it:
x = circshift(eye(1,3), [0, randi(3)]);
Let's say that I've got a data file called 'myData.dat' of the form
x y
0 0
1 1
2 2
4 3
8 4
16 5
I need to find the following things from this data:
1. the slope for the point pairs 0 to 5, 1 to 5, 2 to 5, 3 to 5 and 4 to 5;
2. the y-intercept for the same pairs;
3. the equation of the line connecting each of these pairs.
Then I need to plot the data and overlay the lines; below is a picture of what I'm asking for.
I know how to obtain the slope and y-intercept for a single pair of points, and how to plot the data and the equation of the line. For example, for points 1 and 5:
set table
plot "myData.dat" using 0:($0==0 ? y1=$2 : $2)
plot "myData.dat" using 0:($0==4 ? y5=$2 : $2)
unset table
m1 = (y5 - y1)/(5-1)
b1 = y1 - m1*1
y1(x) = m1*x + b1
I'm new to iteration (and gnuplot) and I think there's something wrong with my syntax. I've tried a number of things and they haven't worked. My best guess is that it would be in the form
plot for [i=1:4] using 0:($0==1 ? y.i=$1 : $1)
do for [i=1:5]{
m.i = (y5 - y.i)/(5-i)
b.i = y.i - m.i*1
y.i(x) = m.i*x + b.i
}
set multiplot
plot "myData.dat" w lp
plot for [i=1:4] y.1(x)
unset multiplot
So what is going wrong? Is gnuplot able to concatenate the loop counter to variable names?
Your syntax is incorrect. Although there are other ways to do what you want, for instance using word(var,i), the most straightforward fix to what you already have is to use eval to evaluate a string to which you can concatenate variables:
do for [i=1:4]{
eval "m".i." = (y5 - y".i.")/(5-".i.")"
eval "b".i." = y".i." - m".i."*".i
eval "y".i."(x) = m".i."*x + b".i
}
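Outside gnuplot, the arithmetic itself is short. A minimal Python sketch computing the slope and intercept for each pair (point i, last point); note the assumption that it uses the x and y columns of the sample table, whereas the question's gnuplot code uses the row number as x. `line_through` is a hypothetical helper:

```python
# Sample data from myData.dat (x and y columns).
xs = [0, 1, 2, 4, 8, 16]
ys = [0, 1, 2, 3, 4, 5]


def line_through(x1, y1, x2, y2):
    """Slope m and y-intercept b of the line y = m*x + b through two points."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


last = len(xs) - 1
lines = [line_through(xs[i], ys[i], xs[last], ys[last]) for i in range(last)]
for i, (m, b) in enumerate(lines):
    print(f"points {i} to {last}: y = {m:.4f}*x + {b:.4f}")
```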
I am relatively new to R programming. I am writing code that generates an array of numbers:
[1] 0.5077399, 0.4388107, 0.3858783, 0.3462711, 0.3170844, 0.2954411, 0.2789464, 0.2658839,
[9] 0.2551246, 0.2459498
Note: I manually separated the values by commas for ease on the eyes :)
I want to pick the first 3 numbers from this array that are below 0.3: [0.2954411, 0.2789464, 0.2658839]. In addition to picking these values, I want to get the positions of those three values within the array. In this case, I want the code to give me [6,7,8].
How would I write code to do this?
I greatly appreciate the help.
For a similar simulated set,
y <- c(2, 4,6, 8)
ind <- which(y < 6) ## for finding indices 1 and 2
val <- y[y<6] ## for picking values 2 and 4
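The question also asks for only the first three matches; in R that could be done with head(ind, 3) and head(val, 3). A minimal Python sketch of the same selection, returning 1-based positions to match R's which() (`first_below` is a hypothetical name):

```python
from itertools import islice


def first_below(values, threshold, k):
    """Return the first k values below threshold and their 1-based positions."""
    matches = ((i, v) for i, v in enumerate(values, start=1) if v < threshold)
    first = list(islice(matches, k))  # stop after k matches
    positions = [i for i, _ in first]
    picked = [v for _, v in first]
    return picked, positions


data = [0.5077399, 0.4388107, 0.3858783, 0.3462711, 0.3170844,
        0.2954411, 0.2789464, 0.2658839, 0.2551246, 0.2459498]
vals, idx = first_below(data, 0.3, 3)
```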