Modifying an array during iteration

Consider the excerpt:
a = [1, 2, 3, 4, 5]
a.each { |e| a.shift ; p e ; p a }
that outputs:
1
[2, 3, 4, 5]
3
[3, 4, 5]
5
[4, 5]
This reveals that each is implemented in terms of an index (the printed 1 is the element at position 0, 3 is the element at position 1, and 5 is the element at position 2 at the moment it is printed).
An alternative would be to print 1, 2, 3.
Is this behaviour intended? Or is it just an implementation detail, so that someday Array might be reimplemented and this behaviour change?

Yes, this is a well-known behaviour.
The behaviour is most likely due to implementation reasons, i.e. efficiency.
But whether or not it is just an implementation detail does not matter much. Ruby developers care about not breaking backward compatibility, and it is very unlikely that such basic behaviour will ever be altered. Even if they did go ahead and change it, there would be a reasonably long transition period.
Regardless of that, modifying an array during iteration is usually bad from the point of view of readability. In the relevant cases, you should probably iterate over a duplicate of the array instead (e.g. a.dup.each).
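As a quick illustration of the duplicate-and-iterate advice (shown in Python purely as a cross-language analogy; the pitfall is the same for Python lists):
a = [1, 2, 3, 4, 5]
for e in list(a):   # list(a) takes a snapshot before iterating
    a.pop(0)        # the original can now be mutated freely
    print(e, a)     # e takes the values 1, 2, 3, 4, 5 in order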

How to index a Julia array

I am having trouble understanding what seems like an inconsistent behavior in Julia.
X = reshape(1:100, 10, 10)
b = [1 5 9]
X[2, :][b] # returns the correct array
X[2, :][1 5 9] # throws an error
Can someone explain why using the variable b works to index an array but not when I write the index myself?
Since x = X[2,:] is just a vector, we can simplify the example to indexing behavior on vectors.
x[v], where v is a collection of integers, returns that subset of x. Thus x[[1,5,9]] uses the getindex(x::Vector, i::AbstractArray) dispatch.
Note that x[[1 5 9]] also works, because [1 5 9] is a row vector (a 1×3 matrix), which is still an AbstractArray. But x[1 5 9] is not indexing at all; that syntax means something else, namely typed concatenation, as in:
v = Float64[1 5 9]
returns a row vector with element type Float64.
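For what it's worth, the same distinction exists in Python with NumPy (an aside for comparison only; note NumPy is 0-based, unlike Julia):
import numpy as np
x = np.arange(1, 11)    # a plain vector
print(x[[1, 5, 9]])     # indexing with a list of positions works
# x[1 5 9]              # a syntax error here as well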
I have figured out a solution.
Rather than write X[2, :][1 5 9] I should have written X[2, :][[1 5 9]].
I believe this makes sense if we imagine indexing in two dimensions the second time. It also makes it possible to write more complicated indices, like X[2:4, :][[1 3], [1 3]].

Average between arrays of different length

I'm trying to develop a sort of very simple machine learning example to recognize similarity between arrays.
For this reason I'm trying to calculate the average of 2 arrays with different lengths.
For example if I have:
array_1 = [0, 4, 5];
array_2 = [4, 2, 7];
The average is:
average_array = [2, 3, 6];
But how can I manage to calculate the average if I have the following situation:
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
As you can see, the arrays have different lengths.
Is there an algorithm that I can apply to solve this problem?
Does anyone have an idea or some suggestion?
Of course I can consider the missing values of the second array as 0, and evaluate the average as, for example:
average_array = [2, 3, 6, 5, 3.5];
or consider the values as "null" and have:
average_array = [2, 3, 6, 10, 7];
But are these two approaches any good?
Or is there something smarter?
Thanks for your help!!
To answer your question, we really need more information on what you are trying to achieve.
I'm trying to develop a sort of very simple machine learning example
to recognize similarity between arrays. For this reason I'm trying to
calculate the average between 2 arrays with different length.
Depending on your use case, similarity might be defined completely differently.
For instance:
if the array encodes sound information, you might want to measure similarity as "does this sound clip occur in that one" or "are the main frequencies (which would correspond to chords) the same"
if the array encodes image information (properly DFT-ed and zig-zag-encoded), you might not care about the high frequencies (the end of the array) and only measure the difference between the first few values of the array
if the array encodes some kind of composition of elements (e.g. this essay contains the keyword "matrix" 40 times and the keyword "SVM" 27 times), the difference in values might be very important
General advice:
Think about what you're measuring
Decide what's important
But in general, have a look at smoothing algorithms, for instance Kneser-Ney or Good-Turing smoothing. They explicitly deal with comparing vectors of probabilities that may differ in length (in other words, have explicit zero entries):
https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
If, after taking the average of the arrays, you intend to take the magnitude of the difference between each array and the average array, then you are probably on the right track: you will measure dissimilarity by the magnitude of the difference.
But for arrays of different lengths, I propose that you also take the index of the extra elements into consideration.
For
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
the average should be average_array = [2, 3, 6, 6.5, 5.5], where
6.5 = (10 + 3) / 2, i.e. (value + index) / 2, with the missing element counted as 0,
and
5.5 = (7 + 4) / 2, computed the same way.
The reason for taking the index into consideration is that the difference in length is also dealt with by this approach. However, this is just my 2 cents; there may well be better algorithms out there.
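A sketch of that index-weighted average (the function name and structure are mine, just to make the arithmetic concrete):
def index_weighted_average(a, b):
    # make a the longer array
    if len(b) > len(a):
        a, b = b, a
    # overlapping part: plain element-wise average
    out = [(u + v) / 2 for u, v in zip(a, b)]
    # extra elements: average each value with its own index,
    # the missing counterpart counting as 0
    out += [(v + i) / 2 for i, v in enumerate(a[len(b):], start=len(b))]
    return out

index_weighted_average([0, 4, 5, 10, 7], [4, 2, 7])  # [2.0, 3.0, 6.0, 6.5, 5.5]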
You should also take a look at this post

Efficient way of finding sequential numbers across multiple arrays?

I'm not looking for any code or to have anything done for me. I need some help getting started in the right direction but do not know how to go about it. If someone could provide some resources on how to go about solving these problems, I would very much appreciate it. I've sat with my notebook and am having trouble designing an algorithm that can do what I'm trying to do.
I can probably do:
foreach element in array1
    foreach element in array2
        check if array1[i] == array2[j] + x
I believe this would work for both forward and backward sequences, and for the multiples just check array1[i] % array2[j] == 0. I have a list which contains int arrays and am getting list[index] (for array1) and list[index+1] for array2, but this solution can get complex and lengthy fast, especially with large arrays and a large list of those arrays. Thus, I'm searching for a better solution.
I'm trying to come up with an algorithm for finding sequential numbers in different arrays.
For example:
[1, 5, 7] and [9, 2, 11] would find that 1 and 2 are sequential.
This should also work for multiple sequences in multiple arrays. So if there is a third array of [24, 3, 15], it will also include 3 in that sequence, and continue on to the next array until there isn't a number that matches the last sequential element + 1.
It also should be able to find more than one sequence between arrays.
For example:
[1, 5, 7] and [6, 3, 8] would find that 5 and 6 are sequential and also 7 and 8 are sequential.
I'm also interested in finding reverse sequences.
For example:
[1, 5, 7] and [9, 4, 11] would return that 5 and 4 are reverse sequential.
Example with all:
[1, 5, 8, 11] and [2, 6, 7, 10] would return 1 and 2 are sequential, 5 and 6 are sequential, 8 and 7 are reverse sequential, 11 and 10 are reverse sequential.
It can also overlap:
[1, 5, 7, 9] and [2, 6, 11, 13] would return 1 and 2 sequential, 5 and 6 sequential and also 7 and 6 reverse sequential.
I also want to expand this to check numbers with a difference of x (above examples check with a difference of 1).
In addition to all of that (although this might be a different question), I also want to check for multiples,
Example:
[5, 7, 9] and [10, 27, 8] would return 5 and 10 as multiples, 9 and 27 as multiples.
and numbers with the same ones place.
Example:
[3, 5, 7] and [13, 23, 25] would return that 3, 13, and 23 have the same ones digit (and likewise 5 and 25).
Use a dictionary (set or hashmap)
dictionary1 = {}
Go through each item in the first array and add it to the dictionary.
[1, 5, 7]
Now dictionary1 = {1:true, 5:true, 7:true}
dictionary2 = {}
Now go through each item in [6, 3, 8] and look up whether it's part of a sequence.
6 is part of a sequence because dictionary1[6+1] == true
so dictionary2[6] = true
We get dictionary2 = {6:true, 8:true}
Now set dictionary1 = dictionary2 and dictionary2 = {}, and go to the third array... and so on.
We only keep track of sequences.
Since each lookup is O(1) and we do 2 lookups per number (e.g. 6-1 and 6+1), the total is O(N), where N is the total number of values across all the arrays.
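A sketch of this approach (the function shape and the pair bookkeeping are my own additions; x is the step, 1 in the examples above):
def find_sequences(arrays, x=1):
    # values from the previous array that can still extend a chain
    prev = set(arrays[0])
    pairs = []
    for arr in arrays[1:]:
        cur = set()
        for v in arr:
            if v - x in prev:                      # ..., v-x, v  (forward)
                pairs.append((v - x, v, "sequential"))
                cur.add(v)
            if v + x in prev:                      # ..., v+x, v  (reverse)
                pairs.append((v + x, v, "reverse sequential"))
                cur.add(v)
        prev = cur   # only sequence members survive, as described above
    return pairs

print(find_sequences([[1, 5, 7], [6, 3, 8]]))
# [(5, 6, 'sequential'), (7, 6, 'reverse sequential'), (7, 8, 'sequential')]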
The brute force approach outlined in your pseudocode will be O(c^n) (exponential), where c is the average number of elements per array and n is the total number of arrays.
If the input space is sparse (meaning there will be more missing numbers on average than presenting numbers), then one way to speed up this process is to first create a single sorted set of all the unique numbers from all your different arrays. This "master" set will then allow you to early exit (i.e. break statements in your loops) on any sequences which are not viable.
For example, if we have input arrays [1, 5, 7] and [6, 3, 8] and [9, 11, 2], the master ordered set would be {1, 2, 3, 5, 6, 7, 8, 9, 11}. If we are looking for n+1 type sequences, we could skip checking any sequence that contains a 3, 9, or 11 (because the n+1 value is not present at the next position in the sorted set). While the speedups are not drastic in this particular example, if you have hundreds of input arrays and a very large range of values for n (sparsity), the speedups should be exponential, because you will be able to early exit on many permutations. If the input space is not sparse (such as in this example, where we don't have many holes), the speedups will be less than exponential.
A further improvement would be to store a "master" set of key-value pairs, where the key is the n value as shown in the example above, and the value is a list of the indices of the arrays that contain that value. The master set of the previous example would then be: {[1, 0], [2, 2], [3, 1], [5, 0], [6, 1], [7, 0], [8, 1], [9, 2], [11, 2]}. With this architecture, scan time could potentially be as low as O(c*n), because you can traverse this single sorted master set looking for valid sequences instead of looping over all the sub-arrays. By also requiring the array indices to increment, you can immediately see that the 1->2 sequence can be skipped because the arrays are not in the correct order, and the same for the 2->3 sequence, etc. Note that this toy example is somewhat oversimplified, because in practice you would need a list of indices for the value portion of each key-value pair; this is necessary whenever the same value of n appears in multiple arrays (duplicate values). A sketch of this key-value architecture follows.
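Here is such a sketch (in Python; the names are illustrative, and the value side is a list of array indices so duplicates are handled):
from collections import defaultdict

def build_master(arrays):
    # value -> list of indices of the arrays that contain it
    master = defaultdict(list)
    for i, arr in enumerate(arrays):
        for v in arr:
            master[v].append(i)
    return dict(sorted(master.items()))

def viable_steps(master, x=1):
    # keep only the (v, v+x) pairs whose owning arrays are consecutive
    steps = []
    for v, owners in master.items():
        for j in master.get(v + x, []):
            if any(j == i + 1 for i in owners):
                steps.append((v, v + x))
    return steps

m = build_master([[1, 5, 7], [6, 3, 8], [9, 11, 2]])
print(viable_steps(m))   # [(5, 6), (7, 8), (8, 9)]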

Vectorizing Matlab replace array values from start to end

I have an array in which I want to replace values at a known set of indices with the value immediately preceding it. As an example, my array might be
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
and the indices of values to be replaced by previous values might be
y = [2, 3, 8];
I want this replacement to occur from left to right, or else start to finish. That is, the value at index 2 should be replaced by the value at index 1, before the value at index 3 is replaced by the value at index 2. The result using the arrays above should be
[1, 1, 1, 4, 5, 6, 7, 7, 9, 0]
However, if I use the obvious method to achieve this in Matlab, my result is
>> x(y) = x(y-1)
x =
1 1 2 4 5 6 7 7 9 0
Hopefully you can see what happened: the right-hand side x(y-1) is evaluated all at once using the original values, so the value at index 3 was replaced by the old value at index 2, not by the freshly updated one.
My question is this: is there some way of achieving my desired result in a simple way, without brute-force looping over the arrays or doing something time-consuming like reversing the arrays around?
Well, practically this is still a loop, but the number of iterations equals the length of the longest run of consecutive indices in y:
while ~isequal(x(y), x(y-1))
    x(y) = x(y-1);
end
Using nancumsum you can achieve a fully vectorized version. Nevertheless, in most cases the solution karakfa provided is probably the one to prefer; only in extreme cases, with long runs of consecutive indices in y, is this code faster.
c1 = [0, diff(y)==1];          % mark indices that continue a consecutive run
c1(c1==0) = nan;
shift = nancumsum(c1, 2, 4);   % cumulative run length, restarting at each gap
src = y - 1;                   % default source: the element just before each target
src(~isnan(shift)) = src(~isnan(shift)) - shift(~isnan(shift));  % separate source vector, so shifted targets are not clobbered
x(y) = x(src)
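For reference, here is the intended left-to-right semantics written as a plain loop (Python here, converting the 1-based MATLAB indices inline); this is the behaviour the vectorized version must reproduce:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [2, 3, 8]                # 1-based indices, as in the MATLAB example
for i in y:
    x[i - 1] = x[i - 2]      # each replacement sees the earlier replacements
print(x)                     # [1, 1, 1, 4, 5, 6, 7, 7, 9, 0]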

What is the best way to perform vector crossover in genetic algorithm?

I'm using a genetic algorithm "to learn" the best parameters for a draughts/checkers AI. These parameters are stored in a vector of doubles.
[x1 x2 x3 x4 x5 x6 x7 x8 x9]
Currently I do crossover using two simple methods: one-point crossover and two-point crossover. Unfortunately, in my opinion, these methods are not good enough.
For example if I have a genetic pool with:
[10 20 1]
[30 10 9]
[100 1 10]
If the theoretical optimum for the x1 value is 50, I can never find it by crossover. My only hope is to spawn a mutation with x1 = 50 that is good enough to pass into the next generation.
So, is there a better way to perform crossover with an array of numbers?
It seems that you have an encoding problem, not a crossover problem. If you want more variability in the chromosome, encode the data as a sequence of bytes (or even bits).
Suppose you have 3 integer parameters; then you can represent them as a 3*4 = 12 byte vector:
{114,2,0,214, // first 32-bit int
14,184,220,7, // second 32-bit int
145,2,32,12, // etc...
}
Then, after crossover, your ints will evolve with great variability. You can also use not one- or two-point crossover but uniform crossover, where at each chromosome position you randomly decide which parent's gene to use (a sketch follows below). That way you get even more variability. But keep in mind that too much variability in crossover is also disastrous: it results in a population that may never reach an optimal solution, because even sub-optimal solutions are torn apart by big random fluctuations in the crossover operation. Stabilized evolution is the main keyword here.
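A sketch of uniform crossover over a flat gene sequence (works for bytes or doubles alike; the function name is mine):
import random

def uniform_crossover(p1, p2):
    # at every position, take the gene from a randomly chosen parent
    return [random.choice(pair) for pair in zip(p1, p2)]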
Another approach is not to use a genetic algorithm but evolution strategy algorithms, which change all genes in the chromosome. This approach is only feasible if the number of different gene versions is not very big, so it may not fit your problem with floats/doubles.
HTH!
It really depends on the fitness function. In the crossover you could also average over the values (again, if that makes sense for the fitness function), but this would probably drive the algorithm to converge too easily to a population of very similar individuals.
I think it is the mutation that should drive the single values toward the best ones: you should get 50 through mutation if you can't get it through crossover.
Consider doing some kind of local search on the single individuals as well (a memetic algorithm).
There exists a huge number of possible crossover (and mutation) operators, and the literature on them is nearly endless. If you wish to keep that representation (a vector of doubles), you might want to look at simulated binary crossover or blend crossover, and the Gaussian mutation operator; they are most likely going to help you find children that are blends of their parents' genes rather than simple exchanges.
For example, simulated binary crossover with eta = 0.5 could give (randomization is implied), from these two parents
[30 10 9]
[100 1 10]
the two children
[52 8 9]
[77 2 10]
As far as I know, almost all major EC frameworks implement those operators (Open Beagle, ECJ, DEAP, EO, etc.)
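A sketch of blend crossover (BLX-alpha), one of the operators mentioned above (the exact formula of simulated binary crossover differs, but the parent-blending idea is the same; the function name is mine):
import random

def blend_crossover(p1, p2, alpha=0.5):
    # each child gene is drawn uniformly from an interval that extends
    # alpha beyond the range spanned by the two parent genes
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        spread = hi - lo
        child.append(random.uniform(lo - alpha * spread, hi + alpha * spread))
    return child

blend_crossover([30, 10, 9], [100, 1, 10])  # e.g. [52.3, 7.8, 9.4]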
The crossover algorithm in my GA is different from what you are using (not better, just different). In short, rather than substitution, I coded crossover as an array splicing/concatenation operation in which the splicing point is randomized (and also 'synchronized', so that when the two spliced portions are assembled, the resulting child vector is the same length as each parent).
I think it's much easier to explain in code:
import random

DOMAIN_LENGTH = 14

def crossover(v1, v2):
    # splice point away from the ends, so both parents contribute genes
    crossover_point = random.randint(1, DOMAIN_LENGTH - 2)
    return v1[:crossover_point] + v2[crossover_point:]
# create a simple function to generate a couple of 'parent' vectors
>>> fnx = lambda n: [random.choice(range(10)) for c in range(n)]
# now generate those parent vectors
>>> v1 = fnx(DOMAIN_LENGTH)
>>> v2 = fnx(DOMAIN_LENGTH)
>>> v1
[7, 9, 5, 6, 6, 7, 6, 9, 8, 6, 6, 4, 5, 8]
>>> v2
[2, 2, 9, 7, 1, 4, 6, 9, 0, 7, 1, 9, 3, 0]
>>> len(v1); len(v2)
14
14
# create the child vector via crossover
>>> child_01 = crossover(v1, v2)
>>> child_01
[7, 9, 9, 7, 1, 4, 6, 9, 0, 7, 1, 9, 3, 0]
>>> len(child_01)
14
so for:
a domain size (vector length) of 5,
a crossover_point of 2, and
the two parent vectors [4, 3, 2, 4, 8] and [1, 3, 1, 6, 3],
then:
>>> p1 = [4, 3, 2, 4, 8]
>>> p2 = [1, 3, 1, 6, 3]
# fragment contributed from the first parent:
>>> f1 = p1[:2]
>>> f1
[4, 3]
# fragment contributed from the second parent:
>>> f2 = p2[2:]
>>> f2
[1, 6, 3]
# now just concatenate the two fragments to produce the child fragment
>>> child = f1 + f2
>>> child
[4, 3, 1, 6, 3]
>>> len(child) == len(p2)
True
