Repeated array sampling yields duplicates. Why? (Ruby) - arrays

In Ruby:
I sample element from array. I see duplicate (same element) every 30 samples or so. Sometimes as close as 5-6 samples apart. Why?
This is my code:
some_array = IO.readlines("file with 5000 unique elements")
some_array.shuffle!
#random_element = some_array.sample
puts #random_element

If you want n random non-duplicate elements from your array, you should call some_array.sample(n).
Sample doesn't guarantee that two consecutive calls will not contain duplicates; it guarantees that all elements chosen in one call won't.

Related

Scala Collections - Making a new array from an existing using certain rules

I'm trying to make a function in Scala that when -
println(thisIsAFunction(Array(1,1,4,5,4,4)))
is called, it will give me a new array that is always half the size of the array input (the array input will always be an even number and bigger than 2), and in the new array it will not contain the same number twice. So for example, in the above statement, it would return an array consisting of the numbers 1, 4 and 5. I have been trying to approach this a number of ways but can't get my head round it.
def thisIsAFunction(a: Array[Int]) =
a.distinct.take(a.length/2)
The distinct method removes any duplicates from the Array so that it does not contain the same number twice.
The take methods reads the first n elements of the result, so take(a.length/2) gives an Array that is half the length of the original.

Match each element of one array with elements of other array without loops

I want to match each element of one array (lessnum) with elements of the other array say (cc). Then multiply with a number from the third array (gl). I am doing using loops. The length of arrays are very large therefore it takes couple of hours. Is it possible to do without loops or make it faster. Here is the code, I am doing,
uniquec=sort(unique(cc));
maxc=max(uniquec);
c35p=0.35*maxc;
lessnum=uniquec(uniquec<=c35p);
greaternum=uniquec(uniquec>c35p);
gl=linspace(1,2,length(lessnum));
gr=linspace(2,1,length(greaternum));
newC=zeros(size(cc));
for i=1:length(gl)
newC(cc==lessnum(i))= cc(cc==lessnum(i)).*gl(i);
end
for i=1:length(gr)
newC(cc==greaternum(i))= cc(cc==greaternum(i)).*gr(i);
end
What you need to do is instead of storing the values that are less than or greater than c35p in lessnum and greaternum, respectively, you should store the indices of these numbers. That way, you can directly access the newC variable using these indices and then multiply your linearly generated values.
Further modifications are explained in the code itself. If you have any confusion you can read the help for unique
Here is the modified code (I assume that cc is a one-dimensional array)
%randomly generate a cc vector
cc = randi(100, 1, 10);
% modified code below
[uniquec, ~, induniquec]=unique(cc, 'sorted'); % modified to explicitly specify the inbuilt sorting capability of unique and generate the indicies of unique values in the array
maxc=max(uniquec);
c35p=0.35*maxc;
lessnum=uniquec<=c35p; % instead of lessnum=uniquec(uniquec<=c35p);
greaternum=uniquec>c35p; % instead of greaternum=uniquec(uniquec>c35p);
gl=linspace(1,2,sum(lessnum));
gr=linspace(2,1,sum(greaternum));
% now there is no need for 'for' loops. We first modify the unique values as specified and then regenerate the required matrix using the indices obtained previously
newC=uniquec;
newC(lessnum) = newC(lessnum) .* gl;
newC(greaternum) = newC(greaternum) .* gr;
newC = newC(induniquec);
This new code will run much faster than the original one but is much more memory intensive depending on the number of unique values in your original array.

No. of paths in integer array

There is an integer array, for eg.
{3,1,2,7,5,6}
One can move forward through the array either each element at a time or can jump a few elements based on the value at that index. For e.g., one can go from 3 to 1 or 3 to 7, then one can go from 1 to 2 or 1 to 2(no jumping possible here), then one can go 2 to 7 or 2 to 5, then one can go 7 to 5 only coz index of 7 is 3 and adding 7 to 3 = 10 and there is no tenth element.
I have to only count the number of possible paths to reach the end of the array from start index.
I could only do it recursively and naively which runs in exponential time.
Somebody plz help.
My recommendation: use dynamic programming.
If this key word is sufficient and you want the challenge to find a possible solution on your own, dont read any further!
Here a possible DP-algorithm on the example input {3,1,2,7,5,6}. It will be your job to adjust on the general problem.
create array sol length 6 with just zeros in it. the array will hold the number of ways.
sol[5] = 1;
for (i = 4; i>=0;i--) {
sol[i] = sol[i+1];
if (i+input[i] < 6 && input[i] != 1)
sol[i] += sol[i+input[i]];
}
return sol[0];
runtime O(n)
As for the directed graph solution hinted in the comments :
Each cell in the array represents a node. Make an directed edge from each node to the node accessable. Basically you can then count more easily the number of ways by just looking at the outdegrees on the nodes (since there is no directed cycle) however it is a lot of boiler plate to actual program it.
Adjusting the recursive solution
another solution would be to pruning. This is basically equivalent to the DP-algorithm. The exponentiel time comes from the fact, that you calculate values several times. Eg function is recfunc(index). The initial call recFunc(0) calls recFunc(1) and recFunc(3) and so on. However recFunc(3) is bound to be called somewhen again, which leads to a repeated recursive calculation. To prune this you add a Map to hold all already calculated values. If you make a call recFunc(x) you lookup in the map if x was already calculated. If yes, return the stored value. If not, calculate, store and return it. This way you get a O(n) too.

Reallocating/Erasing numpy array vs. new allocation in loop

In my program I need to work with arrays roughly 500x500 to 1500x1500 within a function that is looped over 1000's of times. In each iteration, I need to start with an array that has the same form (whose dimensions are fixed across all iterations). The initial values will be:
[0 0 0 ... 1]
[0 0 0 ... 1]
....
However, the contents of the array will be modified within the loop. What is the most efficient way to "reset" the array to this format so I can pass the same array to the function every time without having to allocate a new set of memory every time? (I know the range of rows that were modified)
I have tried:
a[first_row_modified:last_row_modified,:] = 0.
a[first_row_modified:last_row_modified,:-1] = 1.
but it takes roughly the same amount of time as just creating a new array every time with the following:
a = zeros((sizeArray, sizeArray))
a[:,-1] = 1.
Is there a faster way to effectively "erase" the array and change the last column to ones? I think this is similar to this question, clearing elements of numpy array , although my array doesn't change sizes and i didn't see the definitive answer to the previously asked question.
No; I think the way you are doing it is about as fast as it gets.

Help with a special case of permutations algorithm (not the usual)

I have always been interested in algorithms, sort, crypto, binary trees, data compression, memory operations, etc.
I read Mark Nelson's article about permutations in C++ with the STL function next_perm(), very interesting and useful, after that I wrote one class method to get the next permutation in Delphi, since that is the tool I presently use most. This function works on lexographic order, I got the algo idea from a answer in another topic here on stackoverflow, but now I have a big problem. I'm working with permutations with repeated elements in a vector and there are lot of permutations that I don't need. For example, I have this first permutation for 7 elements in lexographic order:
6667778 (6 = 3 times consecutively, 7 = 3 times consecutively)
For my work I consider valid perm only those with at most 2 elements repeated consecutively, like this:
6676778 (6 = 2 times consecutively, 7 = 2 times consecutively)
In short, I need a function that returns only permutations that have at most N consecutive repetitions, according to the parameter received.
Does anyone know if there is some algorithm that already does this?
Sorry for any mistakes in the text, I still don't speak English very well.
Thank you so much,
Carlos
My approach is a recursive generator that doesn't follow branches that contain illegal sequences.
Here's the python 3 code:
def perm_maxlen(elements, prefix = "", maxlen = 2):
if not elements:
yield prefix + elements
return
used = set()
for i in range(len(elements)):
element = elements[i]
if element in used:
#already searched this path
continue
used.add(element)
suffix = prefix[-maxlen:] + element
if len(suffix) > maxlen and len(set(suffix)) == 1:
#would exceed maximum run length
continue
sub_elements = elements[:i] + elements[i+1:]
for perm in perm_maxlen(sub_elements, prefix + element, maxlen):
yield perm
for perm in perm_maxlen("6667778"):
print(perm)
The implentation is written for readability, not speed, but the algorithm should be much faster than naively filtering all permutations.
print(len(perm_maxlen("a"*100 + "b"*100, "", 1)))
For example, it runs this in milliseconds, where the naive filtering solution would take millenia or something.
So, in the homework-assistance kind of way, I can think of two approaches.
Work out all permutations that contain 3 or more consecutive repetitions (which you can do by treating the three-in-a-row as just one psuedo-digit and feeding it to a normal permutation generation algorithm). Make a lookup table of all of these. Now generate all permutations of your original string, and look them up in lookup table before adding them to the result.
Use a recursive permutation generating algorthm (select each possibility for the first digit in turn, recurse to generate permutations of the remaining digits), but in each recursion pass along the last two digits generated so far. Then in the recursively called function, if the two values passed in are the same, don't allow the first digit to be the same as those.
Why not just make a wrapper around the normal permutation function that skips values that have N consecutive repetitions? something like:
(pseudocode)
funciton custom_perm(int max_rep)
do
p := next_perm()
while count_max_rerps(p) < max_rep
return p
Krusty, I'm already doing that at the end of function, but not solves the problem, because is need to generate all permutations and check them each one.
consecutive := 1;
IsValid := True;
for n := 0 to len - 2 do
begin
if anyVector[n] = anyVector[n + 1] then
consecutive := consecutive + 1
else
consecutive := 1;
if consecutive > MaxConsecutiveRepeats then
begin
IsValid := False;
Break;
end;
end;
Since I do get started with the first in lexographic order, ends up being necessary by this way generate a lot of unnecessary perms.
This is easy to make, but rather hard to make efficient.
If you need to build a single piece of code that only considers valid outputs, and thus doesn't bother walking over the entire combination space, then you're going to have some thinking to do.
On the other hand, if you can live with the code internally producing all combinations, valid or not, then it should be simple.
Make a new enumerator, one which you can call that next_perm method on, and have this internally use the other enumerator, the one that produces every combination.
Then simply make the outer enumerator run in a while loop asking the inner one for more permutations until you find one that is valid, then produce that.
Pseudo-code for this:
generator1:
when called, yield the next combination
generator2:
internally keep a generator1 object
when called, keep asking generator1 for a new combination
check the combination
if valid, then yield it

Resources