Array Manipulation - randomly choosing elements - arrays

Suppose I have an array of length N. I want to choose n positions randomly, make them zero and then add the existing elements to the next non-zero element.
For example, suppose r = (r1,r2,r3,r4,r5), N = 5. Let n = 2. And the randomly picked positions are 3rd and 4th. Then I want to transform r to
r_new = (r1, r2, 0, 0, r3+r4+r5).
Instead if the randomly selected positions were 1 and 3, then I want to have
r_new = (0, r1 + r2, 0, r3+r4, r5).
I am coding in MATLAB. Here is my current code.
u = randperm(T);
ind = sort(u(1:n(i)));
tmp = r(ind);
r(ind) = 0;
x = find( r );
I am not necessarily looking for MATLAB code. Pseudocode would be helpful enough.

I'm assuming the last position can never be selected, otherwise the intended behaviour is undefined. So you randomly select n positions uniformly distributed from 1 up to N-1 (not up to N).
Here's one approach:
1. Select n distinct random positions from 1 to N-1, and sort them. Call the resulting vector of positions pos. This can be easily done with randperm and sort.
2. For each value in pos, say p, accumulate r(p) into r(p+1), and set r(p) to zero. This is done with a for loop.
In step 2, if position p+1 happens to belong to pos too, the accumulated value will be moved further to the right in a subsequent iteration. This works because pos has been sorted, so the randomly selected positions are processed from left to right.
r = [3 5 4 3 7 2 8]; %// data
n = 2; %// number of positions
pos = sort(randperm(numel(r)-1,n)); %// randomly select positions, and sort them
for p = pos
r([p p+1]) = [0 r(p)+r(p+1)]; %// process position p
end
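Since the question says pseudocode is enough, here is a minimal Python sketch of the same left-to-right carry. The function names merge_into_next and random_merge are illustrative, indices are 0-based, and the last index is excluded from the draw, as assumed in this answer.

```python
import random

def merge_into_next(r, positions):
    """Zero each selected position and carry its value one step right."""
    out = list(r)
    for p in sorted(positions):   # left to right, so carried values chain
        out[p + 1] += out[p]
        out[p] = 0
    return out

def random_merge(r, n, rng=random):
    # Assumption: the last index is never selected, so p + 1 always exists.
    positions = rng.sample(range(len(r) - 1), n)
    return merge_into_next(r, positions)
```

With r = [1, 2, 3, 4, 5], selecting the 3rd and 4th positions (indices 2 and 3) gives [1, 2, 0, 0, 12], which matches the first example in the question.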

Assuming N, n and r are already generated, we select n distinct random indexes (randi can return the same index more than once, so randperm is used instead):
inds = randperm(N,n);
Then to achieve the desired results you can loop as follows:
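inds = sort(inds);
for ii=1:numel(inds)
    if(inds(ii)<N)
        % carry the value into the next position
        r(inds(ii)+1)=r(inds(ii)+1) +r(inds(ii));
    end
    % zero only the position just processed, so carried values are not lost
    r(inds(ii))=0;
end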
This will create the desired outcome of adding the values to the next index that wasn't selected to be set to 0.
Note I had to assume an edge case where if the last index is set to 0, then its value is not added to anything.

Related

How can I update a range within an array with a sequence

Given an array of values, how can I update a range with a sequence within that array, efficiently?
Updates are performed multiple times. After all updates are performed, we can query any index of the array for its final value.
If we update a value of v at index i, every element at index j is increased with a value of max { v - | i - j | , 0 }
For example.
array = {1,1,1,1,1,1}
Now I do an update at index 4 with a value of 3 the resulting array will look like this:
array = {1,1,2,3,4,3}
I want to perform both operations efficiently.
You can't update a range of elements "efficiently". Questions like these are always about figuring out how to avoid updating a range of elements altogether.
To figure out this one, consider two operations:
INTEGRATE(A) takes an array and replaces every element A[i] with sum(A[0]...A[i]).
DIFF(A) takes an array and replaces every element with its difference from the previous element (the first element is left unaltered).
These operations have some important properties:
They are inverses: INTEGRATE(DIFF(A)) = DIFF(INTEGRATE(A)) = A for all arrays A; and
They are linear: If A = B+C, then INTEGRATE(A) = INTEGRATE(B) + INTEGRATE(C), and similarly for DIFF.
Your final array is the sum of the original array, plus a whole bunch of those "triangle" arrays. Let's say it's A + T1 + T2 + T3... etc.
Each one of those triangles has a whole bunch of non-zero elements, but watch what happens when you apply DIFF twice:
[0,0,1,2,3,2,1,0,0] -> [0,0,1,1,1,-1,-1,-1,0] -> [0,0,1,0,0,-2,0,0,1]
The result has only 3 non-zero elements. That gives us a way to calculate your final array quickly.
Let D(X) = DIFF(DIFF(X)) and let I(X) = INTEGRATE(INTEGRATE(X)). Then instead of calculating A + T1 + T2 + T3..., you calculate I( D(A) + D(T1) + D(T2) + D(T3)... )
Since all those D(Tx) have at most 3 non-zero elements, it's quick and easy to add them into the result.
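The answer deliberately omits full code, so the following is only one possible sketch of the idea, for the case where all updates come before any lookup. The name apply_triangle_updates, the 0-based (index, value) update pairs, and the left padding (used so that triangles cut off by the start of the array still have their three non-zero second differences) are assumptions of this sketch. By linearity it integrates only the triangles and adds A afterwards.

```python
from itertools import accumulate

def apply_triangle_updates(values, updates):
    """Add max(v - |i - j|, 0) to values[j] for every update (i, v)."""
    n = len(values)
    pad = max((v for _, v in updates), default=0)  # room left of index 0
    d2 = [0] * (n + pad + 1)                       # summed D(T) arrays
    for i, v in updates:
        if v <= 0:
            continue
        # A triangle has exactly three non-zero second differences.
        for pos, delta in ((i - v + 1, 1), (i + 1, -2), (i + v + 1, 1)):
            idx = pos + pad
            if idx < len(d2):        # entries past the end never matter
                d2[idx] += delta
    triangles = list(accumulate(accumulate(d2)))   # INTEGRATE twice
    return [a + t for a, t in zip(values, triangles[pad:pad + n])]
```

For the example in the question, apply_triangle_updates([1, 1, 1, 1, 1, 1], [(4, 3)]) returns [1, 1, 2, 3, 4, 3]. Each update costs O(1) and the final pass is linear in the array length plus the largest update value.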
I'm deliberately explaining how to solve it without giving you full code. This approach also handles the complex case of interleaved updates and lookups, which makes it more complex than what Matt Timmermans came up with.
You obviously can't use an array as your representation. It makes lookups fast, but an update with value k will be an O(k) operation.
Our second try is to just keep a list of the updates. Now updates are O(1), but after m updates a lookup is O(m).
What we need is to have a way to store updates such that both adding an update and doing a lookup are fast.
The first step is to change an update from "update at a value" to "update a range by a linear rule". That is, currently you say:
update at 4 by 3
Instead we'd say:
from 2 to 3:
update by x - 1
from 4 to 5:
update by 7 - x
This isn't yet a win. But it becomes one when you rewrite the ranges in terms of a standard set of intervals. First the original array
from 0 to 5 1 + 0x
Now the array after update:
from 0 to 5, 1 + 0x +
from 2 to 3, -1 + x
from 4 to 5, 7 - x
This can be represented compactly in 2 arrays:
m = [0, 0, 1, 0, -1, 0]
b = [1, 0, -1, 0, 7, 0]
And as complicated as it feels, now both updates and lookups wind up with O(log(n)) work.
For example for a lookup:
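def rising_binary(n):
    # Yield 0, then the prefixes of n obtained by adding its set bits
    # from the highest bit down (for n = 5: 0, 4, 5). These are the
    # indices whose intervals contain n in the m and b arrays above.
    prefix = 0
    yield prefix
    power = 1 << n.bit_length()
    while power:
        if n & power:
            prefix += power
            yield prefix
        power >>= 1
...
answer = 0
for idx in rising_binary(k):
    answer += m[idx] * k + b[idx]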
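As a check on the m and b arrays above, here is a self-contained sketch of the lookup. It assumes that index i stores the rule for the interval starting at i, so the rules covering k sit at the binary prefixes of k taken from the highest set bit down. The names binary_prefixes and lookup are illustrative.

```python
def binary_prefixes(k):
    """Yield 0, then k's set bits added one by one from the highest down."""
    prefix = 0
    yield prefix
    for shift in range(k.bit_length() - 1, -1, -1):
        bit = 1 << shift
        if k & bit:
            prefix += bit
            yield prefix

def lookup(m, b, k):
    # Sum every linear rule m[i] * x + b[i] whose interval contains k.
    return sum(m[i] * k + b[i] for i in binary_prefixes(k))

m = [0, 0, 1, 0, -1, 0]
b = [1, 0, -1, 0, 7, 0]
result = [lookup(m, b, k) for k in range(6)]
```

Here result comes out as [1, 1, 2, 3, 4, 3], the array after the update at index 4 with value 3.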

Julia / Cellular Automata: efficient way to get neighborhood

I'd like to implement a cellular automaton (CA) in Julia. Dimensions should be wrapped, this means: the left neighbor of the leftmost cell is the rightmost cell etc.
One crucial question is: how do I get the neighbors of a cell to compute its state in the next generation? As dimensions should be wrapped and Julia does not allow negative indices (as Python does), I had this idea:
Considered a 1D CA, one generation is a one-dimensional array:
0 0 1 0 0
What if we create a two dimensional Array, where the first row is shifted right and the third is shifted left, like this:
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
Now the first column contains the states of the first cell and its neighbors, etc.
I think this can easily be generalized to two or more dimensions.
First question: do you think this is a good idea, or is this a wrong track?
EDIT: Answer to first question was no, second Question and code example discarded.
Second question: If the approach is basically ok, please have a look at the following sketch:
EDIT: Other approach, here is a stripped down version of a 1D CA, using mod1() for getting neighborhood-indices, as Bogumił Kamiński suggested.
for any cell:
- A array of all indices
- B array of all neighborhood states
- C states converted to one integer
- D lookup next state
function digits2int(digits, base=10)
int = 0
for digit in digits
int = int * base + digit
end
return int
end
gen = [0,0,0,0,0,1,0,0,0,0,0]
rule = [0,1,1,1,1,0,0,0]
function nextgen(gen, rule)
values = [mod1.(x .+ [-1,0,1], size(gen)) for x in 1:length(gen)] # A
values = [gen[value] for value in values] # B
values = [digits2int(value, 2) for value in values] # C
values = [rule[value+1] for value in values] # D
return values
end
for _ in 1:100
global gen
println(gen)
gen = nextgen(gen, rule)
end
Next step should be to extend it to two dimensions, will try it now...
The way I typically do it is to use mod1 function for wrapped indexing.
With this approach, whatever the dimensionality of your array a, moving from position x by a delta dx only requires writing mod1(x+dx, size(a, 1)), if x indexes the first dimension of the array.
Here is a simple example of a random walk on a 2D torus counting the number of times a given cell was visited (here I additionally use broadcasting to handle all dimensions in one expression):
function randomwalk()
a = zeros(Int, 8, 8)
pos = (1,1)
for _ in 1:10^6
# Von Neumann neighborhood
dpos = rand(((1,0), (-1,0), (0,1), (0,-1)))
pos = mod1.(pos .+ dpos, size(a))
a[pos...] += 1
end
a
end
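For comparison, the same wrapped indexing can be sketched in Python, where the % operator on 0-based indices plays the role of mod1. This is only an illustration of the idea applied to the 1D automaton from the question; next_generation is a made-up name, and rule_table is assumed to be indexed by the neighborhood read as a 3-bit number, like rule in the question.

```python
def next_generation(cells, rule_table):
    """One step of a 1D cellular automaton on a ring of cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[(i - 1) % n]    # wraps to the last cell when i == 0
        right = cells[(i + 1) % n]   # wraps to the first cell at the end
        out.append(rule_table[4 * left + 2 * cells[i] + right])
    return out
```

Calling next_generation([0,0,0,0,0,1,0,0,0,0,0], [0,1,1,1,1,0,0,0]) returns [0,0,0,0,1,1,1,0,0,0,0].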
Usually, if the CA has cells that are only dependent on the cells next to them, it's simpler just to "wrap" the vector by adding the last element to the front and the first element to the back, doing the simulation, and then "unwrap" by taking the first and last elements away again to get the result length the same as the starting array length. For the 1-D case:
const lines = 10
const start = ".........#........."
const rules = [90, 30, 14]
rule2poss(rule) = [rule & (1 << (i - 1)) != 0 for i in 1:8]
cells2bools(cells) = [cells[i] == '#' for i in 1:length(cells)]
bools2cells(bset) = prod([bset[i] ? "#" : "." for i in 1:length(bset)])
function transform(bset, ruleposs)
newbset = map(x->ruleposs[x],
[bset[i + 1] * 4 + bset[i] * 2 + bset[i - 1] + 1
for i in 2:length(bset)-1])
vcat(newbset[end], newbset, newbset[1])
end
const startset = cells2bools(start)
for rul in rules
println("\nUsing Rule $rul:")
bset = vcat(startset[end], startset, startset[1]) # wrap ends
rp = rule2poss(rul)
for _ in 1:lines
println(bools2cells(bset[2:end-1])) # unwrap ends
bset = transform(bset, rp)
end
end
As long as only the adjacent cells are used in the simulation for any given cell, this is correct.
If you extend this to a 2D matrix, you would also "wrap" the first and last rows as well as the first and last columns, and so forth.
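The padding idea can also be sketched in a few lines of Python. This is an illustration under the same adjacent-cells-only assumption, not a port of the Julia code above; step_with_padding is a hypothetical name, and rule_table is assumed to be indexed by 4*left + 2*center + right.

```python
def step_with_padding(cells, rule_table):
    """One 1D automaton step using a copy padded with the wrapped ends."""
    # Wrap: last cell goes in front, first cell goes at the back.
    padded = [cells[-1]] + list(cells) + [cells[0]]
    # Only interior positions are evaluated, so the result is unwrapped
    # and has the same length as the input.
    return [
        rule_table[4 * padded[i - 1] + 2 * padded[i] + padded[i + 1]]
        for i in range(1, len(padded) - 1)
    ]
```

The padded copy removes all index arithmetic from the inner loop, at the cost of rebuilding the padding at every step.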

matlab: how to speed up the count of consecutive values in a cell array

I have the 137x19 cell array Location(1,4).loc and I want to find the number of times that horizontal consecutive values are present in Location(1,4).loc. I have used this code:
x=Location(1,4).loc;
y={x(:,1),x(:,2)};
for ii=1:137
cnt(ii,1)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3)};
for ii=1:137
cnt(ii,2)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3),x(:,4)};
for ii=1:137
cnt(ii,3)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1})&strcmp(x(:,4),y{1,4}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3),x(:,4),x(:,5)};
for ii=1:137
cnt(ii,4)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1})&strcmp(x(:,4),y{1,4}{ii,1})&strcmp(x(:,5),y{1,5}{ii,1}));
end
... continue for all the columns. This code runs and gives me the correct result, but it's not automated and it's slow. Can you give me ideas to automate and speed up the code?
I think I will write an answer to this since I've not done so for a while.
First convert your cell array to a matrix; this will ease the following steps by a lot. Then diff is the way to go:
A = randi(5,[137,19]);
DiffA = diff(A')'; %// diff along each row gives a 137-by-18 matrix, where each value has the previous value in its row subtracted from it.
So a 0 in DiffA would represent 2 consecutive numbers in A are equal, 2 consecutive 0s would mean 3 consecutive numbers in A are equal.
idx = DiffA==0;
cnt(:,1) = sum(idx,2);
To do 3 consecutive number counts, you could do something like:
idx2 = abs(DiffA(:,1:end-1))+abs(DiffA(:,2:end)) == 0;
cnt(:,2) = sum(idx2,2);
Or use another diff. The abs is needed so that a negative and a positive difference cannot cancel out to 0; with it, only 0 + 0 gives 0. You can continue this pattern by doing:
idx3 = abs(DiffA(:,1:end-2))+abs(DiffA(:,2:end-1))+abs(DiffA(:,3:end)) == 0;
cnt(:,3) = sum(idx3,2);
In loop format:
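absDiffA = abs(DiffA);
W = size(absDiffA,2); %// assumption: count runs up to the full row length
cnt = zeros(size(A,1),W);
for ii = 1:W
    cnt(:,ii) = sum(absDiffA == 0,2); %// runs of ii+1 equal values
    absDiffA = absDiffA(:,1:end-1) + absDiffA(:,2:end);
end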
NOTE: this method counts [0,0,0] twice when evaluating 2 consecutives, and once when evaluating 3 consecutives.
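To make the counting convention in the note concrete, here is a small Python sketch that counts overlapping windows directly, without diff. It is a plain reference version rather than a fast one, and it also works on strings, so no conversion of the cell contents is assumed. The names count_runs and count_table are illustrative.

```python
def count_runs(row, k):
    """Count windows of k adjacent entries in row that are all equal."""
    return sum(
        all(row[i + j] == row[i] for j in range(1, k))
        for i in range(len(row) - k + 1)
    )

def count_table(rows, max_k):
    # One output row per input row; columns are k = 2, 3, ..., max_k.
    return [[count_runs(row, k) for k in range(2, max_k + 1)] for row in rows]
```

For the row [1, 1, 1, 2, 2] this gives 3 for k = 2 (the triple 1, 1, 1 contributes two overlapping pairs) and 1 for k = 3.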

Substitute a vector value with two values in MATLAB

I have to create a function that takes as input a vector v and three scalars a, b and c. The function replaces every element of v that is equal to a with a two element array [b,c].
For example, given v = [1,2,3,4] and a = 2, b = 5, c = 5, the output would be:
out = [1,5,5,3,4]
My first attempt was to try this:
v = [1,2,3,4];
v(2) = [5,5];
However, I get an error, so I do not understand how to put two values in the place of one in a vector, i.e. shift all the following values one position to the right so that the new two values fit in the vector and, therefore, the size of the vector will increase in one. In addition, if there are several values of a that exist in v, I'm not sure how to replace them all at once.
How can I do this in MATLAB?
Here's a solution using cell arrays:
% remember the indices where a occurs
ind = (v == a);
% split array such that each element of a cell array contains one element
v = mat2cell(v, 1, ones(1, numel(v)));
% replace appropriate cells with two-element array
v(ind) = {[b c]};
% concatenate
v = cell2mat(v);
Like rayryeng's solution, it can replace multiple occurrences of a.
The problem mentioned by siliconwafer, that the array changes size, is solved here by temporarily keeping the partial arrays in the cells of a cell array. Converting back to an array concatenates these parts.
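The same split-then-concatenate idea can be sketched in Python, with a list of small lists standing in for the cell array. This is only an analogue of the MATLAB code above, and replace_with_pair is a hypothetical name.

```python
def replace_with_pair(v, a, b, c):
    """Replace every element of v equal to a by the two elements b, c."""
    # Like the cell array: one small part per original element.
    parts = [[b, c] if x == a else [x] for x in v]
    # Like cell2mat: concatenate the parts back into one flat list.
    return [y for part in parts for y in part]
```

For example, replace_with_pair([1, 2, 3, 4], 2, 5, 5) returns [1, 5, 5, 3, 4].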
Something I would do is to first find the values of v that are equal to a which we will call ind. Then, create a new output vector that has the output size equal to numel(v) + numel(ind), as we are replacing each value of a that is in v with an additional value, then use indexing to place our new values in.
Assuming that you have created a row vector v, do the following:
%// Find all locations that are equal to a
ind = find(v == a);
%// Allocate output vector
out = zeros(1, numel(v) + numel(ind));
%// Determine locations in output vector that we need to
%// modify to place the value b in
indx = ind + (0:numel(ind)-1);
%// Determine locations in output vector that we need to
%// modify to place the value c in
indy = indx + 1;
%// Place values of b and c into the output
out(indx) = b;
out(indy) = c;
%// Get the rest of the values in v that are not equal to a
%// and place them in their corresponding spots.
rest = true(1,numel(out));
rest([indx,indy]) = false;
out(rest) = v(v ~= a);
The indx and indy statements are rather tricky, but certainly not hard to understand. For each index in v that is equal to a, what happens is that we need to shift the vector over by 1 for each index / location of v that is equal to a. The first value requires that we shift the vector over to the right by 1, then the next value requires that we shift to the right by 1 with respect to the previous shift, which means that we actually need to take the second index and shift by the right by 2 as this is with respect to the original index.
The next value requires that we shift to the right by 1 with respect to the second shift, or shifting to the right by 3 with respect to the original index and so on. These shifts define where we're going to place b. To place c, we simply take the indices generated for placing b and move them over to the right by 1.
What's left is to populate the output vector with those values that are not equal to a. We simply define a logical mask where the indices used to populate the output array have their locations set to false while the rest are set to true. We use this to index into the output and find those locations that are not equal to a to complete the assignment.
Example:
v = [1,2,3,4,5,4,4,5];
a = 4;
b = 10;
c = 11;
Using the above code, we get:
out =
1 2 3 10 11 5 10 11 10 11 5
This successfully replaces every value that is 4 in v with the tuple of [10,11].
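The index arithmetic may be easier to follow in a 0-based sketch. The Python below mirrors the steps of the MATLAB code (ind, indx, indy, then the remaining slots); the function name replace_by_index_shift is made up for this illustration.

```python
def replace_by_index_shift(v, a, b, c):
    """Replace each a in v by b, c using precomputed output positions."""
    ind = [i for i, x in enumerate(v) if x == a]        # where a occurs
    out = [None] * (len(v) + len(ind))
    # The k-th match (k = 0, 1, ...) has been pushed right by k slots.
    indx = [i + shift for shift, i in enumerate(ind)]   # slots for b
    for i in indx:
        out[i] = b
        out[i + 1] = c                                  # indy = indx + 1
    taken = set(indx) | {i + 1 for i in indx}
    rest = (x for x in v if x != a)                     # values to keep
    for pos in range(len(out)):
        if pos not in taken:
            out[pos] = next(rest)
    return out
```

With v = [1, 2, 3, 4, 5, 4, 4, 5], a = 4, b = 10 and c = 11, the matches sit at indices 3, 5 and 6, so b lands at 3, 6 and 8, and the result is [1, 2, 3, 10, 11, 5, 10, 11, 10, 11, 5].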
I think that strrep deserves a mention here.
Although it's called string replacement and warns for non-char input, it still works perfectly fine for other numbers as well (including integers, doubles and even complex numbers).
v = [1,2,3,4]
a = 2, b = 5, c = 5
out = strrep(v, a, [b c])
Warning: Inputs must be character arrays or cell arrays of strings.
out =
1 5 5 3 4
You are not attempting to overwrite an existing value in the vector. You're attempting to change the size of the vector (meaning the number of rows or columns in the vector) because you're adding an element. This will always result in the vector being reallocated in memory.
Create a new vector, using the first and last half of v.
Let's say your index is stored in the variable index.
index = 2;
newValues = [5, 5];
x = [ v(1:index-1), newValues, v(index+1:end) ]
x =
1 5 5 3 4

Matlab: Help in implementing quantized time series

I am having trouble implementing this code due to the variable s_k being logical 0/1. In what way can I implement this statement?
s_k is a random sequence of 0/1 generated using a rand() and quantizing the output of rand() by its mean given below. After this, I don't know how to implement. Please help.
N =1000;
input = rand(N,1); % uniform samples on (0,1), whose mean is 0.5
s = (input>=0.5); %converting into logical 0/1;
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i =1 : tmax+N-1 %// Change here
y(i+1) = 4*y(i)*(1-y(i)); %nonlinear model for generating the input to Autoregressive model
end
s = (y>=0.5);
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) =0;
for j =2 : N
z(j) = 0.195 *z(j-1) + x(j);
end
You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to access into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum along the rows. So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.
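The sliding window can also be written out with plain loops. The Python sketch below mirrors the two MATLAB statements above, using the weight 2^(-k+N+1) exactly as it appears in the code; the function names are illustrative, and no +1 offset is needed because Python indexes from 0.

```python
def sliding_indices(tmax, N):
    """Rows are offsets 0..N-1, columns are start times 0..tmax."""
    return [[t + j for t in range(tmax + 1)] for j in range(N)]

def quantized_output(s, tmax, N):
    """Sum s[k] * 2^(-k+N+1) over each window of N consecutive k."""
    ind = sliding_indices(tmax, N)
    return [
        sum(s[ind[j][t]] * 2.0 ** (-ind[j][t] + N + 1) for j in range(N))
        for t in range(tmax + 1)
    ]
```

sliding_indices(5, 3) reproduces the ind matrix shown above, and s needs at least tmax + N entries.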

Resources