Generating/Placing k random items in a 2d array - arrays

I have a 2d array, a matrix of sorts (m x n). I need to place a '1' in k cells, and the probability should be equal for each cell.
For example, if k=3, we pick randomly where to place the three '1's:
[0, 0, 0, 0]
[0, 1, 1, 0]
[1, 0, 0, 0]
At first, I tackled this by generating a random number modulo m * n (rows * columns).
But, that means that we could theoretically get to the end of the matrix without generating a single '1'.
Then I read about the Fisher–Yates shuffle, but wasn't sure whether it would be wise, or even feasible, to implement this with it.
What is an efficient way to implement this?

This is essentially a problem of "sampling without replacement": Out of mn cells, choose k of them without replacement. There are many approaches to this problem, depending on how big mn is in relation to k. If mn is relatively small, then a Fisher–Yates shuffle will work well; make a list of cells, shuffle them, then take the first k cells to set to 1.
For more details, see my sections on sampling without replacement and shuffling. (Part of my comment moved here:) Different sampling algorithms have different tradeoffs in terms of time and space. For example, a Fisher–Yates shuffle has time and space complexity of O(mn), while a partial shuffle can have time complexity O(k).
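For illustration, here is a minimal Python sketch of the "small mn" case using sampling without replacement (random.sample picks k distinct cells, with each k-subset equally likely); the function and variable names are my own:

import random

def place_k_ones(m, n, k):
    """Return an m x n grid of 0s with exactly k cells set to 1, chosen uniformly."""
    grid = [[0] * n for _ in range(m)]
    # Sample k distinct cell indices out of the m*n cells, without replacement
    for cell in random.sample(range(m * n), k):
        grid[cell // n][cell % n] = 1
    return grid

for row in place_k_ones(3, 4, 3):
    print(row)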

Related

Search for a specific permutation of a permutation subset with constraints

I am searching for one permutation P, consisting of p1...pn, from the following subset S.
S is defined by the labels L1...Lk, where each L contains pi...pj, and where the inverse of P has at most k-1 decreasing adjacent elements. k <= n.
Example:
n := 4
k := 2
L1 := 1,2
L2 := 3,4
L := L1,L2,L1,L2
S := 1324,1423,2314,2413
one solution would be P := 1342
P := 3142 would not be a solution because it has 2 decreasing adjacent elements, but at most 1 is allowed because k = 2.
Does an algorithm therefore exist to find a P in S as defined by L?
Currently I use brute force to find one permutation P, but it very quickly becomes unusably slow.
So each of L1, ..., Lk is a consecutive set of elements. At each place we see Li, Lj in the definition of L, one of three things is true:
i < j in which case it is ascending.
i = j in which case it could be ascending or descending.
i > j in which case it must be descending.
By counting the number of places where case 3 is true, we get a minimum number of descending elements already in the definition of L.
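As a tiny illustration of this counting step (a sketch; representing L as a list of label indices is my own choice):

def forced_descents(L):
    # L is the label sequence as indices, e.g. L1,L2,L1,L2 -> [1, 2, 1, 2].
    # Case 3 (i > j) forces a descent at that adjacent pair.
    return sum(1 for a, b in zip(L, L[1:]) if a > b)

print(forced_descents([1, 2, 1, 2]))  # 1 forced descent, at the L2,L1 transition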
Next, for each Li we have a pattern we can write down with len(Li)-1 symbols, each either ';' or ',', where a ';' means that there are elements of other Ljs between two members of Li, and a ',' means that the Li elements are adjacent, so the order of the elements may result in a descent. We want to know: "For each possible number of descents within Li, how many permutations of Li have that number of descents?"
We will think of building the permutations as follows:
The first element goes at position 0.
The second element goes to position 0 or 1. (If at 0, the first element is moved.)
The third element goes to position 0, 1, or 2.
etc
A descent is when the next element is smaller than the previous, at a transition matching a ','.
We actually will want the following data structure for later use:
cache[Li] gives:
by how many elements are chosen:
by the last element chosen:
by the number of descents we will add:
how many ways of finishing this permutation
So we can write a recursive function that takes:
The pattern for Li.
How many elements have been chosen.
What index was last chosen.
It then returns a dictionary mapping descents to count of ways to finish the permutation for Li.
Memoize that and we get our desired data structure.
Now we'll repeat the idea. We want:
cache2[i] gives:
by number of descents to use:
how many permutations of L[i], L[i+1], ..., L[k] meet it.
Again we can write a recursive function using cache to calculate this, and we can memoize it to get cache2.
And NOW we can reverse the process.
We know how many descents came from the definition of L.
We know the distribution of remaining descents from cache2[1], so we can randomly pick how many descents there will be meeting our condition among L1...Lk.
For each of L1...Lk we can look at cache[Li][1][0] and cache2[i+1] to figure out, with the correct probability, how many descents there will be within Li.
For each Li we can look at how many descents we want to wind up with, its pattern, and cache2[Li] to figure out a random sequence of inserts that winds up with the right pattern. The first insert is always at 0. After that you always know the size, where the last insert was, and how many descents are left. So for each possible next insert you figure out whether it counts as a descent (look at both the pattern and whether it is before the last insert) and the number of ways to finish from there. Then you can choose the next insert randomly with the right probability.
For each Li we can turn the pattern of inserts into the list of values in order. (I will explain this step more.)
We can now follow the pattern of L and fill in all of the values.
Now for step 5, let's illustrate with your example from the chat. Suppose that L2 = [4, 5, 6] and the pattern of inserts we came up with was [0, 1, 0]. How do we figure out the arrangement of values?
Well first we do our inserts:
[1]
[1, 2]
[3, 1, 2]
This says that the first element (4) goes to the third place, the second (5) to the first, and the third (6) to the second. So our permutation for L2 is [5, 6, 4].
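For concreteness, here is a small sketch of that conversion (the function name is mine; it simply mechanizes the worked example, reading the resulting insert list the same way as above):

def inserts_to_permutation(values, inserts):
    # Perform the inserts: the t-th element goes to position inserts[t-1]
    places = []
    for element, pos in enumerate(inserts, start=1):
        places.insert(pos, element)
    # Read the result as in the example above: the i-th entry of the list
    # tells us the place where the i-th value of Li ends up
    result = [None] * len(values)
    for element, place in enumerate(places, start=1):
        result[place - 1] = values[element - 1]
    return result

print(inserts_to_permutation([4, 5, 6], [0, 1, 0]))  # [5, 6, 4]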
This will be a lot of code to write. But it will be polynomial. Specifically if m is the count of the most common label, cache will have total size at most O(k m^2). Thanks to memoization, each entry takes O(m) to calculate. Everything else is small relative to that. So total space is O(k m^2) and time is O(k m^3).

Check if array is sorted using O(1)

Is it possible to check (in Java) if an array is sorted or not with O(1) worst time complexity?
It's impossible to have a correct algorithm with better than O(n) complexity.
Let's prove it by contradiction. If we are given an algorithm with better than O(n) complexity, we can provide a test array
{1, 2, 3, ..., n}
where n is large enough that the algorithm has to skip some items (note that if the algorithm inspects all items, it has at least O(n) time complexity). If the algorithm returns false it's incorrect; if it returns true, we have to create one more test. Let m be the item which is not inspected:
{1, 2, 3, ... m - 1, m, m + 1,... n}
^
Not inspected by the algorithm.
Let's create the test array as it was before but change m into n + 1 (or 1 if m == n)
{1, 2, 3, ..., m - 1, n + 1, m + 1, ... n}
^
we changed m into n + 1
Since m is not inspected, the algorithm returns true, which is now incorrect. So any algorithm with time complexity better than O(n) is incorrect; or, to put it differently, there is no correct algorithm with better than O(n) time complexity.
As others suggest, since you need to access all of the N array elements, you end up with O(N) complexity. But this is true only when the accesses take place sequentially. If you have N processors at your disposal, you can access all elements in one go and get O(1) complexity. But this is just a theoretical dream. Still, in practice today, most computers sport many cores, and most languages offer parallel constructs. For example, Java has parallel streams. So you will never reach O(1), but you may do better than O(N).
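The question is about Java, but as a rough sketch of the same idea in Python (splitting the adjacent-pair checks across worker processes with multiprocessing; the chunking scheme and names are mine, and the total work is of course still O(N)):

from multiprocessing import Pool

def chunk_is_sorted(args):
    a, lo, hi = args
    # Check the adjacent pairs (i, i+1) for i in [lo, hi)
    return all(a[i] <= a[i + 1] for i in range(lo, hi))

def is_sorted_parallel(a, workers=4):
    n = len(a)
    if n < 2:
        return True
    # Split the n-1 adjacent-pair checks across the workers
    pairs = n - 1
    chunks = [(a, w * pairs // workers, (w + 1) * pairs // workers) for w in range(workers)]
    with Pool(workers) as pool:
        return all(pool.map(chunk_is_sorted, chunks))

if __name__ == "__main__":
    print(is_sorted_parallel(list(range(1000))))                       # True
    print(is_sorted_parallel([1, 3, 2] + list(range(4, 1000))))        # False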

Array operations for maximum sum

Given an array A consisting of N elements. Our task is to find the maximal subarray sum after applying the following operation exactly once:
Select any subarray and set all the elements in it to zero.
E.g., if the array is -1 4 -1 2, then the answer is 6, because we can choose the -1 at index 2 as a subarray and make it 0. The resulting array after applying the operation is -1 4 0 2, and the maximum sum subarray is 4 + 0 + 2 = 6.
My approach was to find the start and end indexes of the minimum sum subarray, set all of that subarray's elements to 0, and after that find the maximum sum subarray. But this approach is wrong.
Starting simple:
First, let us start with the part of the question: Finding the maximal subarray sum.
This can be done via dynamic programming:
a = [1, 2, 3, -2, 1, -6, 3, 2, -4, 1, 2, 3]
a = [-1, -1, 1, 2, 3, 4, -6, 1, 2, 3, 4]

def compute_max_sums(a):
    res = []
    currentSum = 0
    for x in a:
        if currentSum > 0:
            res.append(x + currentSum)
            currentSum += x
        else:
            res.append(x)
            currentSum = x
    return res

res = compute_max_sums(a)
print(res)
print(max(res))
Quick explanation: we iterate through the array. As long as the running sum is positive, it is worth appending the whole block to the next number. If it drops to zero or below at any point, we discard the whole "tail" sequence, since it is no longer profitable to keep it, and we start anew. At the end we have an array where the j-th element is the maximal sum of a subarray i:j for some 0 <= i <= j.
The rest is just a matter of finding the maximal value in the array.
Back to the original question
Now that we have solved the simplified version, it is time to look further. We can now select a subarray to be deleted (zeroed) to increase the maximal sum. The naive solution would be to try every possible subarray and repeat the steps above. This would unfortunately take too long¹. Fortunately, there is a way around this: we can think of the zeroes as a bridge between two maxima.
There is one more thing to address, though: currently, for the j-th element we only know that its tail is somewhere behind it, so if we were to take the maximum and the second biggest element from the array, their tails could overlap, which would be a problem since we would be counting some of the elements more than once.
Overlapping tails
How to mitigate this "overlapping tails" issue?
The solution is to compute everything once more, this time from the end to the start. This gives us two arrays: one where the j-th element has its tail i pointing towards the left end of the array (i.e. i <= j), and the other where the reverse is true. Now, if we take x from the first array and y from the second array, we know that if index(x) < index(y), then their respective subarrays are non-overlapping.
We can now proceed to try every suitable x, y pair; there are O(n²) of them. Since we don't need any further computation (we already precomputed the values), this is the final complexity of the algorithm: the preparation cost us only O(n), so it imposes no additional penalty.
Here be dragons
So far, what we did was rather straightforward. The following section is not that complex, but there are going to be some moving parts. Time to brush up on max heaps:
Accessing the max is in constant time
Deleting any element is O(log(n)) if we have a reference to that element. (We can't find an element in O(log(n)); however, if we know where it is, we can swap it with the last element of the heap, delete it, and bubble the swapped element down in O(log(n)).)
Adding any element into the heap is O(log(n)) as well.
Building a heap can be done in O(n)
That being said, since we need to go from start to the end, we can build two heaps, one for each of our pre-computed arrays.
We will also need a helper array that will give us quick index -> element-in-heap access to get the delete in log(n).
The first heap will start empty - we are at the start of the array, the second one will start full - we have the whole array ready.
Now we can iterate over the whole array. In each step i we:
Compare the max(heap1) + max(heap2) with our current best result to get the current maximum. O(1)
Add the i-th element from the first array into the first heap - O(log(n))
Remove the i-th indexed element from the second heap (this is why we have to keep the references in a helper array) - O(log(n))
The resulting complexity is O(n * log(n)).
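For illustration, here is a hedged Python sketch of that pass; it reuses compute_max_sums from above and replaces the helper-array delete with lazy deletion in a heapq heap, which keeps the O(n * log(n)) bound (the function name and details are mine):

import heapq

def max_sum_after_zeroing(a):
    max_sums = compute_max_sums(a)                                           # tails extend to the left
    rev_max_sums = [x for x in reversed(compute_max_sums([y for y in reversed(a)]))]  # tails extend to the right
    n = len(a)
    best = 0                                        # same baseline as the quadratic illustration below
    left = []                                       # max-heap (stored negated) over max_sums[0..i]
    right = [(-v, j) for j, v in enumerate(rev_max_sums)]
    heapq.heapify(right)
    removed = [False] * n                           # lazy-deletion flags for the right heap
    for i in range(n - 1):
        heapq.heappush(left, -max_sums[i])
        removed[i] = True                           # index i may no longer be used on the right
        while right and removed[right[0][1]]:
            heapq.heappop(right)                    # discard stale maxima
        if right:
            best = max(best, -left[0] - right[0][0])
    return best

print(max_sum_after_zeroing([-1, 4, -1, 2]))        # 6, matching the example in the question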
Update:
Just a quick illustration of the O(n²) solution, since the OP nicely and politely asked. Man oh man, I'm not your bro.
Note 1: Getting the solution won't help you as much as figuring out the solution on your own.
Note 2: The fact that the following code gives the correct answer is not a proof of its correctness. While I'm fairly certain that my solution should work, it is worth looking into why it works (if it works) rather than just looking at one example of it working.
input = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
reverse_input = [x for x in reversed(input)]
max_sums = compute_max_sums(input)
rev_max_sums = [x for x in reversed(compute_max_sums(reverse_input))]
print(max_sums)
print(rev_max_sums)

current_max = 0
for i in range(len(max_sums)):
    if i < len(max_sums) - 1:
        for j in range(i + 1, len(rev_max_sums)):
            if max_sums[i] + rev_max_sums[j] > current_max:
                current_max = max_sums[i] + rev_max_sums[j]
print(current_max)
¹ There are n possible beginnings, n possible ends, and the complexity of the code we have is O(n), resulting in a complexity of O(n³). Not the end of the world, but it's not nice either.

Sort an array so the difference of elements a[i]-a[i+1]<=a[i+1]-a[i+2]

My mind has been blown since last week, when I began trying to sort an array of N elements by this condition: the difference between two consecutive elements must always be less than or equal to the difference between the next two elements. For example:
A[4] = { 10, 2, 7, 4 }
It is possible to rearrange that array this way:
{2, 7, 10, 4} because (2 - 7 = -5) < (7 - 10 = -3) < (10 - 4 = 6)
{4, 10, 7, 2} because (4 - 10 = -6) < (10 - 7 = 3) < (7 - 2 = 5)
One solution I considered was just shuffling the array and checking each time whether it met the condition; this works for a small number of elements, but is time consuming or even impossible for a larger number of elements.
Another was trying to move elements around the array with loops, hoping again to meet the requirement, but this method is also very time consuming and sometimes not possible.
Trying to find an algorithm doesn't seem to have any result but there must be something.
Thank you very much in advance.
I normally don't just provide code, but this question intrigued me, so here's a brute-force solution, that might get you started.
The concept will always be slow because the individual elements in the list to be sorted are not independent of each other, so they cannot be sorted using traditional O(N log N) algorithms. However, the differences can be sorted that way, which simplifies checking for a solution, and permutations could be checked in parallel to speed up the processing.
import itertools

def is_diff_sorted(qa):
    # The differences qa[i] - qa[i+1] must be non-decreasing
    diffs = [qa[i] - qa[i+1] for i in range(len(qa)-1)]
    for i in range(len(diffs)-1):
        if diffs[i] > diffs[i+1]:
            return False
    return True

a = [2, 4, 7, 10]
#a = [1, 4, 6, 7, 20]
a.sort()
for perm in itertools.permutations(a):
    if is_diff_sorted(perm):
        print("Solution:", perm)   # print the permutation that was found, not the sorted input
        break
This condition is related to differentiation. The (negative) difference between neighbouring elements has to be steady or increasing with increasing index. Multiply the condition by -1 and you get
a[i+1] - a[i] >= a[i+2] - a[i+1]
or
0 >= (a[i+2] - a[i+1]) - (a[i+1] - a[i])
So the 2nd derivative has to be 0 or negative, which is the same as having the first derivative stay the same or change downwards, like e.g. portions of the upper half of a circle. That does not mean that the first derivative itself has to start out positive or negative, just that it never changes upward.
The algorithmic problem is that this can't be a simple sort, since you never compare just 2 elements of the list; you have to compare three at a time (i, i+1, i+2).
So the only thing you know, apart from trying random permutations, is given in Klas' answer (values first rising, if at all, then falling, if at all), but that is not a sufficient condition, since you can still have a positive 2nd derivative within his two sets (rising/falling).
So is there a solution much faster than random shuffling? I can only think of the following argument (similar to Klas' answer). For a given vector, a solution is more likely if you separate the data into a 1st segment that is rising or steady (not falling) and a 2nd segment that is falling or steady (not rising), with neither being empty. Likely an argument could be made that the two segments should be of approximately equal size. The rising segment should contain the data that are closer together, and the falling segment should contain the data that are further apart. So one could start with the mean, look for data that are close to it, move them to the first set, then look for more widely spaced data and move them to the 2nd set. So a histogram might help.
[4 7 10 2] --> diff [3 3 -8] --> 2nd diff [0 -11]
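A tiny Python sketch of this check (the helper name is mine), reproducing the example row above:

def second_diffs_nonpositive(a):
    d1 = [a[i + 1] - a[i] for i in range(len(a) - 1)]       # first differences
    d2 = [d1[i + 1] - d1[i] for i in range(len(d1) - 1)]    # second differences
    return d1, d2, all(x <= 0 for x in d2)

print(second_diffs_nonpositive([4, 7, 10, 2]))  # ([3, 3, -8], [0, -11], True)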
Here is a solution based on a backtracking algorithm.
1. Sort the input array in non-increasing order.
2. Start dividing the array's values into two subsets: put the largest element into both subsets (this will be the "middle" element), then place the second largest one into an arbitrary subset.
3. Sequentially put the remaining elements into either subset. If this cannot be done without violating the "difference" condition, use the other subset. If neither subset is acceptable, roll back and change the preceding decisions.
4. Reverse one of the arrays produced in step 3 and concatenate it with the other array.
Below is a Python implementation (it is not perfect; the worst defect is the recursive implementation: while recursion is quite common for backtracking algorithms, this particular algorithm seems to work in linear time, and recursion is not good for very large input arrays).
def is_concave_end(a, x):
    # Can x be appended to a while keeping the "difference" condition at the end?
    return a[-2] - a[-1] <= a[-1] - x

def append_element(sa, halves, labels, which, x):
    labels.append(which)
    halves[which].append(x)
    if len(labels) == len(sa) or split_to_halves(sa, halves, labels):
        return True
    if which == 1 or not is_concave_end(halves[1], halves[0][-1]):
        halves[which].pop()
        labels.pop()
        return False
    # Backtrack: move the last element of the first half to the second half and retry
    labels[-1] = 1
    halves[1].append(halves[0][-1])
    halves[0].pop()
    if split_to_halves(sa, halves, labels):
        return True
    halves[1].pop()
    labels.pop()

def split_to_halves(sa, halves, labels):
    x = sa[len(labels)]
    if len(halves[0]) < 2 or is_concave_end(halves[0], x):
        return append_element(sa, halves, labels, 0, x)
    if is_concave_end(halves[1], x):
        return append_element(sa, halves, labels, 1, x)

def make_concave(a):
    sa = sorted(a, reverse=True)
    halves = [[sa[0]], [sa[0], sa[1]]]
    labels = [0, 1]
    if split_to_halves(sa, halves, labels):
        return list(reversed(halves[1][1:])) + halves[0]

print(make_concave([10, 2, 7, 4]))
It is not easy to produce a good data set to test this algorithm: a plain set of random numbers is either too simple for this algorithm or has no solutions at all. Here I tried to generate a set that is "difficult enough" by mixing together two sorted lists, each satisfying the "difference" condition. Still, this data set is processed in linear time. And I have no idea how to prepare any data set that would demonstrate more-than-linear time complexity for this algorithm...
Note that since the difference should be ever-rising, any solution will have its elements first in rising order and then in falling order. The length of either of the two "suborders" may be 0, so a solution could consist of a strictly rising or strictly falling sequence.
The following algorithm will find any solutions:
Divide the set into two sets, A and B. Empty sets are allowed.
Sort A in rising order and B in falling order.
Concatenate the two sorted sets: AB
Check if you have a solution.
Do this for all possible divisions into A and B.
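A brute-force sketch of these steps in Python (it reuses is_diff_sorted from the brute-force answer above; exponential in N, so for illustration only):

def find_concave_arrangement(a):
    n = len(a)
    for mask in range(1 << n):                       # every division into A and B
        A = sorted(a[i] for i in range(n) if mask & (1 << i))                         # rising part
        B = sorted((a[i] for i in range(n) if not mask & (1 << i)), reverse=True)     # falling part
        candidate = A + B
        if is_diff_sorted(candidate):
            return candidate
    return None

print(find_concave_arrangement([10, 2, 7, 4]))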
Expanding on #roadrunner66's analysis, the solution is to take the two smallest elements of the original array and make them the first and last in the target array; take the next two smallest elements and make them the second and next-to-last; and keep going until all the elements are placed into the target. Notice that which one goes to the left and which one to the right doesn't matter.
Sorting the original array facilitates the process (finding smallest elements becomes trivial), so the time complexity is O(n log n). The space complexity is O(n), because it requires a target array. I don't know off-hand if it is possible to do it in-place.
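A minimal sketch of this pairing construction (the function name is mine); on the question's example it reproduces one of the listed solutions:

def arrange_concave(a):
    s = sorted(a)
    result = [None] * len(s)
    lo, hi = 0, len(s) - 1
    # Take the smallest remaining elements in pairs: one to the left end, one to the right
    for i, x in enumerate(s):
        if i % 2 == 0:
            result[lo] = x
            lo += 1
        else:
            result[hi] = x
            hi -= 1
    return result

print(arrange_concave([10, 2, 7, 4]))  # [2, 7, 10, 4]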

Haskell: In the Repa library ... lists are not Elts, but what about "k-tuples"

I want to write an algorithm that traverses a 2-dimensional matrix of size k by n+1 (say), where each element in the array is a list of elements. These lists vary in size: they may be of length 1, 2, ..., k. I can even say for sure that in the first row they will all be of length 1, in the second row of length 2, ..., in the kth row of length k. I imagine that Haskell has some sort of mechanism for "k-tuples", I just don't know what it is. Even if the type were indexed to some fixed size, that would be okay; it would mean a small performance hit, but not too bad.
Any suggestions?
If you have k rows of n columns, each with k elements, you could perhaps use a list of k x n matrices (one per row) for the same thing?
In the repa head repository they have a slightly different design where you can have elements which are non-unboxed types - you could use lists (or vectors) there.
http://code.ouroborus.net/repa/

Resources