How to shuffle an array using a seed whereby the sequence of elements remains consistent when additional items are added to the array - arrays

Note: This question can apply to any programming language, for example Python or JavaScript.
How would you shuffle an array of elements deterministically with a seed, but where the following is also guaranteed:
If you add an additional element to the array before shuffling, the sequence of the original elements remains the same as with shuffling the original array.
I can probably explain this better with an example:
Let's say the array [a, b, c] is shuffled with seed 123, and this results in the output [c, a, b].
As you can see, b comes after a, and a comes after c.
We add an additional element to the end of the array, [a, b, c, d], and proceed to shuffle with seed 123.
This time, b must still come after a, and a must still come after c.
The output might be [c, a, d, b] or [d, c, a, b], but cannot be [b, a, c, d].
The same must apply if we continue to add more elements.
Edit: The positions of each element in the shuffled list should be completely random (certain positions should not be biased for a certain element), if mathematically possible.

(I am pretty new to python)
You are adding the additional element at the end, but you could instead insert it at a random position after the shuffle:
import random
x = ['a','b','c']
random.Random(123).shuffle(x)
print(x)
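x = ['a','b','c']
rng = random.Random(123)  # one seeded generator for both the shuffle and the insert
rng.shuffle(x)
x.insert(rng.randint(0, len(x)), 'd')  # seeded, so the insert position is reproducible
print(x)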
But this will become problematic if more elements are added.
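One way to avoid that problem is to stop shuffling positions and instead sort by a per-element pseudo-random key. The following is only a minimal sketch, not the method from the answer above: stable_shuffle is a hypothetical helper, and it assumes the elements are distinct and have a stable repr(). Because each key depends only on the seed and the element itself, adding elements cannot change the relative order of the existing ones.

```python
import hashlib

def stable_shuffle(items, seed):
    """Return items ordered by a pseudo-random key derived from (seed, item).

    Hypothetical helper: assumes distinct items with a stable repr().
    """
    def key(item):
        # Hash the seed together with the item; the key ignores list position.
        digest = hashlib.sha256(f"{seed}:{item!r}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(items, key=key)

small = stable_shuffle(["a", "b", "c"], 123)
large = stable_shuffle(["a", "b", "c", "d"], 123)
# Dropping the new element from the larger result gives back the smaller one.
print(small, large)
```

With a good hash the keys behave like independent random numbers, so no position should be noticeably biased toward a particular element, although this sketch does not prove that.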

Related

Julia: How to efficiently sort subarrays of 2 large arrays in parallel?

I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such a sort! function (as efficient as possible) is welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays
n=10^6; x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #changing entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view, it's just a copy. Instead, make the views like this:
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in 2:length(I)  # iterate over positions in I, not over its values
sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
is what you are looking for.
P.S. @view a[2:3], @view(a[2:3]) or the @views macro can help make things more readable.
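The same idea (sort each segment of a in place and apply the same permutation to b) can be sketched in Python. This is an illustration only, not a translation of the Julia code: sort_segments is a hypothetical helper, and ptr is assumed to hold 0-based, half-open segment boundaries, unlike Julia's 1-based colptr.

```python
def sort_segments(a, b, ptr):
    """Sort each segment a[ptr[k]:ptr[k+1]] in place and permute b the same way.

    Hypothetical helper; ptr holds 0-based, half-open segment boundaries.
    """
    for start, stop in zip(ptr, ptr[1:]):
        # Indices of this segment, ordered by the corresponding values of a.
        perm = sorted(range(start, stop), key=a.__getitem__)
        a[start:stop] = [a[i] for i in perm]
        b[start:stop] = [b[i] for i in perm]

a = [3, 1, 2, 9, 7]
b = ["x", "y", "z", "p", "q"]
sort_segments(a, b, [0, 3, 5])
print(a, b)
```

Slice assignment writes into the original lists, which is the role the views play in the Julia version.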
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.

Tensorflow: analogue for numpy.take?

Is there an analogue of numpy.take?
I want to form an (N+1)-dimensional array from an N-dimensional array; more precisely, from an array with shape (B, H, W, C) I want to make a (B, H, W, X, C) array.
I suppose that for my case there is a solution even without such a general operation. But I'm really unsure whether, if I write code with multiple intermediate operations and tensors (shifting, repeating and so on), TF will be able to optimize it and remove unnecessary operations. Moreover, I suppose that such code would be unclean and just awful.
I want to add a dimension with shifted values, i.e. for the (H,W)->(H,W,3) case the indices must be
[
[[0,0], #[0,-1], may be padding with zeros but for now pad with edge value
[0,0],
[0,1]],
[[0,0],
[0,1],
[0,2]]
...
[[1,0],
[1,0],
[1,1]],
[[1,0],
[1,1],
[1,2]],
...
]
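For reference, the index pattern listed above can be generated programmatically. The following is a plain-Python sketch under the edge-padding assumption mentioned in the listing; shifted_indices and its offsets parameter are hypothetical names, not part of NumPy or TensorFlow.

```python
def shifted_indices(height, width, offsets=(-1, 0, 1)):
    """Build [row, col] index pairs with shape (height, width, len(offsets), 2).

    Hypothetical helper: columns shifted past the border are clamped to the
    edge, which corresponds to padding with the edge value.
    """
    return [
        [
            [[row, min(max(col + off, 0), width - 1)] for off in offsets]
            for col in range(width)
        ]
        for row in range(height)
    ]

idx = shifted_indices(2, 3)
print(idx[0][0])  # [[0, 0], [0, 0], [0, 1]]
print(idx[0][1])  # [[0, 0], [0, 1], [0, 2]]
```

Indices of this shape are the kind of input a gather-style operation expects, where the last axis addresses a position in the source array.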
I thought about tf.scatter_nd (https://www.tensorflow.org/api_docs/python/tf/scatter_nd) but for now I don't understand how to use it. If I understand correctly, I can't use indices whose shape is larger than the shape of the update array (i.e. I can't use indices with shape (3,4,5,3) and an update with shape (3,4,3) or even (3,4,1,3)). If so, this operation seems useless unless I first build an intermediate array with the shape I need in the result.
UPD: maybe I'm wrong and tensor operations (shifting, tiling and so on) are a more appropriate and efficient solution.
But in any case I think that an analogue of np.take would be useful.
The closest functions in TensorFlow to np.take are tf.gather and tf.gather_nd.
tf.gather_nd is more general than tf.gather (and np.take) as it can slice through several dimensions at once.
A noticeable restriction of tf.gather[_nd] compared to np.take, at least in the TensorFlow version this answer was written for, is that they slice through the first dimensions of the tensor only: you can't slice through inner dimensions. When you want to slice through an arbitrary dimension (as in your case), you need to transpose the array to put the slice dimensions first, gather, then transpose back. Newer releases add an axis argument to tf.gather.
Example code for tf.gather replacing np.take:
import numpy as np
a = np.array([5, 7, 42])
b = np.random.randint(0, 3, (2, 3, 4))
c = a[b]
result_numpy = np.take(a, b)
print(a, b, c, result_numpy)
import tensorflow as tf
a = tf.convert_to_tensor(a)
b = tf.convert_to_tensor(b)
# c = a[b] # does not work
result_tf = tf.gather(a, b)
print(a, b, result_tf)
assert(np.array_equal(result_numpy, result_tf.numpy()))

Fill a vector in Julia with a repeated list

I would like to create a column vector X by repeating a smaller column vector G of length h a number n of times. The final vector X will be of length h*n. For example
G = [1;2;3;4] #column vector of length h
X = [1;2;3;4;1;2;3;4;1;2;3;4] #i.e. X = [G;G;G], column vector of length h*n (here n = 3)
I can do this in a loop but is there an equivalent to the 'fill' function that can be used without the dimensions going wrong. When I try to use fill for this case, instead of getting one column vector of length h*n I get a column vector of length n where each row is another vector of length h. For example I get the following:
X = [[1,2,3,4];[1,2,3,4];[1,2,3,4];[1,2,3,4]]
This doesn't make sense to me as I know that the ; symbol is used to show elements in a row and the space is used to show elements in a column. Why is there the , symbol used here and what does it even mean? I can access the first row of the final output X by X[1] and then any element of this by X[1][1] for example.
Either I would like to use some 'fill' equivalent or some sort of 'flatten' function if it exists, to flatten all the elements of the X into one column vector with each entry being a single number.
I have also tried the reshape function on the output but I can't get this to work either.
Thanks Dan Getz for the answer:
repeat([1, 2, 3, 4], outer = 4)
Type ?repeat at the REPL to learn about this useful function.
In older versions of Julia, repmat was an alternative, but it has now been deprecated and absorbed into repeat
As @DanGetz has pointed out in a comment, repeat is the function you want. From the docs:
repeat(A, inner = Int[], outer = Int[])
Construct an array by repeating the entries of A. The i-th element of inner specifies the number of times that the individual entries of the i-th dimension of A should be repeated. The i-th element of outer specifies the number of times that a slice along the i-th dimension of A should be repeated.
So an example that does what you want is:
X = repeat(G; outer=[k])
where G is the array to be repeated, and k is the number of times to repeat it.
I will also attempt to answer your confusion about the result of fill. Julia (like most languages) makes a distinction between vectors containing numbers and numbers themselves. We know that fill(5, 5) produces [5, 5, 5, 5, 5], which is a one-dimensional array (a vector) where each element is 5.
Note that fill([5], 5), however, produces a one-dimensional array (a vector) where each element is [5], itself a vector. This prints as
5-element Array{Array{Int64,1},1}:
[5]
[5]
[5]
[5]
[5]
and we see from the type that this is indeed a vector of vectors. That of course is not the same thing as the concatenation of vectors. Note that [[5]; [5]; [5]; [5]; [5]] is syntax for concatenation, and will return [5, 5, 5, 5, 5] as you might expect. But although ; syntax (vcat) does concatenation, fill does not do concatenation.
Mathematically (under certain definitions), we may imagine R^(kn) to be distinct from (though isomorphic to) (R^k)^n, for instance, where R^k is the set of k-tuples of real numbers. fill constructs an object of the latter, whereas repeat constructs an object of the former.
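The same distinction shows up with plain Python lists, which may help if the Julia output looks confusing. This is only an analogy, not Julia code: [G] * n builds a list of n lists (like fill), while G * n concatenates n copies (like repeat).

```python
G = [1, 2, 3, 4]
n = 3

nested = [G] * n  # list of lists, analogous to fill(G, n)
flat = G * n      # one flat list, analogous to repeat(G, outer = n)

# Flattening the nested form recovers the concatenated form.
flattened = [value for inner in nested for value in inner]
print(len(nested), len(flat), flattened == flat)  # 3 12 True
```

Note that nested[0] is itself a list, which mirrors being able to write X[1][1] on the Julia result of fill.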
As long as you are working with 1-dimensional arrays (Vectors)...
X=repmat(G,4) should do it in older versions of Julia; where repmat is no longer available, repeat(G, 4) does the same.
--
On another note, Julia makes no distinction between row and column vector, they are both one-dimensional arrays.
[1,2,3]==[1;2;3] returns true as they are both 3-element Array{Int64,1} or vectors (Array{Int,1} == Vector{Int} returns true)
This is one of the differences between Matlab and Julia...
If, for some specific reason you want to do it, you can create 2-dimensional Arrays (or Matrices) with one of the dimensions equal to 1.
For example:
C = [1 2 3 4] will create a 1x4 Array{Int64,2} the 2 there indicates the dimensions of the Array.
D = [1 2 3 4]' will create a 4x1 Array{Int64,2}.
In this case, C == D returns false of course. But neither is a Vector for Julia, they are both Matrices (Array{Int,2} == Matrix{Int} returns true).

Inflate array (add additional dimension with copies of itself)

I need to perform basic calculations on arrays of different dimensionality.
The recommended solution seems to be to inflate all arrays to match size.
For example, Array B with dimension 10x100. And array C with dimension 10x10000. The result (e.g. by multiplication) should be an array D of size 10x100x10000.
Therefore I would "inflate" B by the "10000" dimension, and C by the "100" dimension and simply do B*C.
Now, what is the best (fastest) way to achieve this inflation?
A slow method illustrating the desired outcome:
B <- array(dim=c(10,100),rnorm(n=10*100)) # small array
A <- array(dim=c(10,100,10000)) # creating empty big array
A[] <- B # "inflate" B into A, creating 10'000 copies of B
Ideas?
Idea 1
Just tried this, about 3x faster:
A <- rep(B,10000)
dim(A) <- c(10,100,10000)
Still searching..

Inserting zeros into an array and looping using a for loop

I have several calculated arrays, for example a, b and c (there are more than three). Please note this is just an example; the real arrays are much larger and not this simple.
a=[1,2,3,4,5] b=[10,20,30,40,50] c=[100,200,300,400,500]
I want a for loop that inserts their values into an array of zeros, so that the new_abc array looks like this at each step:
1st for loop step new_abc=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
2nd for loop step new_abc=[1,0,0,2,0,0,3,0,0,4,0,0,5,0,0]
3rd for loop step new_abc=[1,10,0,2,20,0,3,30,0,4,40,0,5,50,0]
4th for loop step new_abc=[1,10,100,2,20,200,3,30,300,4,40,400,5,50,500]
how can I do this with a for loop?
I started with the code below which gives me the zeros
a=[1,2,3,4,5]
new_abc=zeros(1,length(a)*(3));
But I'm not sure how to place the values of the arrays a, b and c into the correct locations of new_abc using a for loop.
I know I could place all the arrays into one large array and do a reshape, but the calculated arrays I use become too large and I run out of RAM, so reading / calculating each array and inserting it into one common array new_abc using a for loop works best.
I'm running Octave 3.8.1, which is like MATLAB.
This should do it. You can put a,b,c into a cell array. (you can also put them in a matrix...)
new_abc = zeros(1, 3*numel(a));
in = {a, b, c};
for k = 1:3
new_abc(k:3:end) = in{k};
end
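For comparison, the same strided assignment can be sketched in Python, where an extended slice with step 3 plays the role of k:3:end. This is an illustration alongside the Octave answer, using the small example arrays from the question.

```python
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
c = [100, 200, 300, 400, 500]

sources = (a, b, c)
new_abc = [0] * (len(sources) * len(a))

# Write each source array into every third slot, starting at offset k.
for k, values in enumerate(sources):
    new_abc[k::len(sources)] = values

print(new_abc)
# [1, 10, 100, 2, 20, 200, 3, 30, 300, 4, 40, 400, 5, 50, 500]
```

Only one source array has to be available at a time inside the loop, which matches the memory constraint described in the question.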
