Given a string of red and blue balls, find min number of swaps to club the colors together - arrays

We are given a string of the form: RBBR, where R - red and B - blue.
We need to find the minimum number of swaps required in order to club the colors together. In the above case that answer would be 1 to get RRBB or BBRR.
I feel like an algorithm to sort a partially sorted array would be useful here since a simple sort would give us the number of swaps, but we want the minimum number of swaps.
Any ideas?
This is allegedly a Microsoft interview question according to this.

Take one pass over the string and count the number of reds (#R) and the number of blues (#B). Then take a second pass counting the number of reds in the first #R balls (#r) and the number of blue balls in the first #B balls (#b). The lesser of (#R - #r) and (#B - #b) will be the minimum number of swaps needed.
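The two counting passes above can be sketched in a few lines of Python. This is only an illustration; the function name `min_swaps` is made up here, and it assumes the string contains nothing but the characters R and B.

```python
def min_swaps(balls):
    # First pass: count the reds (#R); the rest are blues (#B).
    reds = balls.count("R")
    blues = len(balls) - reds
    # Second pass: reds already inside the first #R slots (#r),
    # and blues already inside the first #B slots (#b).
    reds_in_front = balls[:reds].count("R")
    blues_in_front = balls[:blues].count("B")
    # Every misplaced ball needs one swap; try both target layouts.
    return min(reds - reds_in_front, blues - blues_in_front)


print(min_swaps("RBBR"))
```

For the RBBR example this gives 1, matching the question.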

We are given the string S that we have to convert to the final string F = R^a B^b or B^b R^a. The number of differences between S and F should be even because for every misplaced R there will be a complementary misplaced B. So why not find the minimum number of differences between S and both possible F's and divide that by 2?
For example, you're given S = RBRRBRBR which should convert to
RRRRRBBB
or
BBBRRRRR
Comparing the differences between S and F for each character for each possibility, there are 4 differences for each possible final string so regardless the minimum is 2 swaps.

Let's look at your example. You know that the end state will be RRBB or BBRR. In other words, the end state is always nRmB or mBnR, where n is the number of R's and m is the number of B's in your string.
Since the end state is defined, maybe some sort of path-finding algorithm would be a good approach for this? How about considering each swap as a state change and thinking of a heuristic function to approximate the number of swaps still needed?
I'm just throwing an idea in the air, but I hope this helps.

Start with two indices simultaneously from the right and left end of the string. Advance the left index until you find an R. Advance the right index backwards until you find a B. Swap them. Repeat until the left index meets the right index, and count the swaps. Then, do the same, but look for B on the left and R on the right. The minimum is the lower of both swap counts.
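A rough Python sketch of this two-index idea, assuming the input contains only R and B. The helper names `swaps_to_group` and `min_swaps_two_pointers` are invented for the example; `left_color` is the color that should end up on the left.

```python
def swaps_to_group(balls, left_color):
    # Count swaps needed so every `left_color` ball ends up on the left.
    items = list(balls)
    lo, hi = 0, len(items) - 1
    swaps = 0
    while True:
        # Skip balls on the left that are already in place.
        while lo < hi and items[lo] == left_color:
            lo += 1
        # Skip balls on the right that are already in place.
        while lo < hi and items[hi] != left_color:
            hi -= 1
        if lo >= hi:
            break
        items[lo], items[hi] = items[hi], items[lo]
        swaps += 1
    return swaps


def min_swaps_two_pointers(balls):
    # Try both final layouts and keep the cheaper one.
    return min(swaps_to_group(balls, "R"), swaps_to_group(balls, "B"))
```

Each index only moves towards the other, so one call is a single pass over the string.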

I think the number of swaps can be derived from the number of inversions required to sort the vector. This is the example of doing the same with permutation vector.

This isn't a technical answer, but I looked at this more intuitively.
RRBBBBR can be reduced to RBR, since a group of R's can be moved as a single block. This means that the array is really just N sets of RB.
The only thing that matters is the number of N sets of RB blocks (including incomplete blocks for the last one).
RBR -> 1 swap to get to RRB (2 sets of RB block, RB and R)
RBRB-> 1 swap to get to RRBB (2 full sets of RB blocks)
RBRBRB-> 2 swaps to get to RRRBBB (3 full sets of RB blocks)
RBRBRBRB -> 4 sets of RB = 3 swaps
So to generalize this, the number of swaps needed = N sets of RB block (including incomplete blocks) and subtract 1.

Related

Algorithm for searching an array for 5 elements which sum to a value

[I asked lately a similar question, Search unsorted array for 3 elements which sum to a value
and got wonderful answers, thank you all! :)]
I need your help for solving the following problem:
I am looking for an algorithm, the time-complexity must be ϴ( n³ ).
The algorithm searches an unsorted array (of n integers) for 5 different integers
which sum to a given z.
E.g.: for the input: ({2,5,7,6,3,4,9,8,21,10} , 22)
the output should be true for we can sum up 2+7+6+3+4=22
(the sorting doesn't really matter. The array can be sorted first without affecting the complexity.
So you can look at the problem as if the array is already sorted.)
-No memory constraints-
-We only know that the array elements are n integers.-
Any help would be appreciated.
Algorithm:
1) Generate an array consisting of pairs of your initial integers and sort it. That step will take O(n^2 * log (n^2)) time.
2) Choose a value from your initial array. O(n) ways.
3) Now you have a very similar problem to the linked one. You have to choose two pairs such that their sum will be equal to z - chosen value. Thankfully, you have an array of all pairs, already sorted, of length O(n^2). Finding such pairs should be straightforward -- same thing you did in a 3 integer sum problem. You make two pointers and move both of them O(n^2) times in total.
O(n^3) total complexity.
You may get into some problems with finding pairs that consist of your chosen value. Skip every pair that consists of your chosen value (just move the pointer further when you reach such a pair like it never existed).
Let's say that you have two pairs, p1 and p2, such that sum(p1) + sum(p2) + chosen value = z. If all of the integers in p1 and p2 are different, you have the solution. If not, that's where it gets a little bit messy.
Let's fix p1 and check the next value after p2. It may have the same sum as p2 since two different pairs can have same sum. If it does, definitely you will not have the same collision with p1 as you had with p2, but you may get a collision with the other integer of p1. If so, check the second value after p2, if it also has the same sum -- it definitely won't have any collision with p1.
So assuming that there are at least 3 pairs with same sum as p1 or p2, you will always find a solution checking 3 values for fixed p1 or checking 3 values for fixed p2.
The only possibility left is that there are less than 3 pairs with same sum as p1 and there are less than 3 pairs with same sum as p2. You can choose them in up to 4 ways -- just check each possibility.
It is a bit unpleasant, but in constant amount of operations you are able to handle such problems. That means the total complexity is O(n^3).
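For comparison, here is a short Python sketch that also does O(n^3) work but avoids the collision handling. Note that it is not the sorted-pairs method described above: it swaps in a hash set of pair sums (so the bound is expected time) and fixes the middle index of the five, so the left pair and the right pair can never share an element. The name `has_five_sum` is made up, and "5 different integers" is read as five different positions in the array.

```python
def has_five_sum(values, z):
    n = len(values)
    left_sums = set()  # sums of all pairs (i, j) with i < j < c
    for c in range(2, n - 2):
        # Index c - 1 has just become available on the left side,
        # so add every pair that ends there: O(n) work per c.
        j = c - 1
        for i in range(j):
            left_sums.add(values[i] + values[j])
        target = z - values[c]
        # Try every pair (k, l) strictly to the right of c: O(n^2) per c.
        for k in range(c + 1, n):
            for l in range(k + 1, n):
                if target - values[k] - values[l] in left_sums:
                    return True
    return False


print(has_five_sum([2, 5, 7, 6, 3, 4, 9, 8, 21, 10], 22))
```

On the example from the question this prints True (2+7+6+3+4=22).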

Interleaving array {a1,a2,....,an,b1,b2,...,bn} to {a1,b1,a2,b2,a3,b3} in O(n) time and O(1) space

I have to interleave a given array of the form
{a1,a2,....,an,b1,b2,...,bn}
as
{a1,b1,a2,b2,a3,b3}
in O(n) time and O(1) space.
Example:
Input - {1,2,3,4,5,6}
Output- {1,4,2,5,3,6}
This is the arrangement of elements by indices:
Initial Index    Final Index
0                0
1                2
2                4
3                1
4                3
5                5
By observation after taking some examples, I found that ai (i<n/2) goes from index (i) to index (2i) & bi (i>=n/2) goes from index (i) to index (((i-n/2)*2)+1). You can verify this yourselves. Correct me if I am wrong.
However, I am not able to correctly apply this logic in code.
My pseudo code:
for (i = 0 ; i < n ; i++)
    if (i < n/2)
        swap(arr[i], arr[2*i]);
    else
        swap(arr[i], arr[((i-n/2)*2)+1]);
It's not working.
How can I write an algorithm to solve this problem?
Element bn is in the correct position already, so let's forget about it and only worry about the other N = 2n-1 elements. Notice that N is always odd.
Now the problem can be restated as "move the element at each position i to position 2i % N"
The item at position 0 doesn't move, so let's start at position 1.
If you start at position 1 and move it to position 2%N, you have to remember the item at position 2%N before you replace it. Then the one from position 2%N goes to position 4%N, the one from 4%N goes to 8%N, etc., until you get back to position 1, where you can put the remaining item into the slot you left.
You are guaranteed to return to slot 1, because N is odd and multiplying by 2 mod an odd number is invertible. You are not guaranteed to cover all positions before you get back, though. The whole permutation will break into some number of cycles.
If you can start this process at one element from each cycle, then you will do the whole job. The trouble is figuring out which ones are done and which ones aren't, so you don't cover any cycle twice.
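To make the cycle-following concrete, here is a small Python sketch. It deliberately cheats on the space constraint: it keeps a `done` list of O(n) flags to know which cycles were already handled, which is exactly the bookkeeping that cycle leaders are meant to replace. The function name is made up, and the input is assumed to have even length.

```python
def interleave_by_cycles(arr):
    size = len(arr)
    modulus = size - 1          # N = 2n - 1; the last element never moves
    done = [False] * max(modulus, 0)
    for start in range(1, modulus):
        if done[start]:
            continue            # this cycle was already rotated
        carried = arr[start]
        pos = start
        while True:
            nxt = (2 * pos) % modulus
            # Drop the carried item into its slot, pick up what was there.
            arr[nxt], carried = carried, arr[nxt]
            done[nxt] = True
            pos = nxt
            if pos == start:
                break
    return arr


print(interleave_by_cycles([1, 2, 3, 4, 5, 6]))
```

With 6 elements there is a single cycle (1, 2, 4, 3); with 8 elements there are two (1, 2, 4 and 3, 6, 5), which is why one starting point is not enough in general.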
I don't think you can do this for arbitrary N in a way that meets your time and space constraints... BUT if N = 2^x-1 for some x, then this problem is much easier, because each cycle includes exactly the cyclic shifts of some bit pattern. You can generate single representatives for each cycle (called cycle leaders) in constant time per index. (I'll describe the procedure in an appendix at the end.)
Now we have the basis for a recursive algorithm that meets your constraints.
Given [a1...an,b1...bn]:
1. Find the largest x such that 2^x <= 2n
2. Rotate the middle elements to create [a1...am,b1...bm,am+1...an,bm+1...bn], where m = 2^(x-1), so that the first part holds 2^x elements
3. Interleave the first part of the array in linear time using the above-described procedure, since it will have modulus 2^x-1
4. Recurse to interleave the last part of the array.
Since the last part of the array we recurse on is guaranteed to be at most half the size of the original, we have this recurrence for the time complexity:
T(N) = O(N) + T(N/2)
= O(N)
And note that the recursion is a tail call, so you can do this in constant space.
Appendix: Generating cycle leaders for shifts mod 2^x-1
A simple algorithm for doing this is given in a paper called "An algorithm for generating necklaces of beads in 2 colors" by Fredricksen and Kessler. You can get a PDF here: https://core.ac.uk/download/pdf/82148295.pdf
The implementation is easy. Start with x 0s, and repeatedly:
1. Set the lowest order 0 bit to 1. Let this be bit y
2. Copy the lower order bits starting from the top
3. The result is a cycle leader if x-y divides x
4. Repeat until you have all x 1s
For example, if x=8 and we're at 10011111, the lowest 0 is bit 5. We switch it to 1 and then copy the remainder from the top to give 10110110. 8-5=3, though, and 3 does not divide 8, so this one is not a cycle leader and we continue to the next.
The algorithm I'm going to propose is probably not O(n).
It's not based on swapping elements but on moving elements, which could probably be O(1) per move if you have a list and not an array.
Given 2N elements, at each iteration i (starting from 0) you take the element in position N + i and move it to position 2*i + 1
a1,a2,a3,...,an,b1,b2,b3,...,bn
| |
a1,b1,a2,a3,...,an,b2,b3,...,bn
| |
a1,b1,a2,b2,a3,...,an,b3,...,bn
| |
a1,b1,a2,b2,a3,b3,...,an,...,bn
and so on.
example with N = 4
1,2,3,4,5,6,7,8
1,5,2,3,4,6,7,8
1,5,2,6,3,4,7,8
1,5,2,6,3,7,4,8
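As a sketch only, the moves in the example above can be written with a Python list. `pop` and `insert` on a Python list shift elements, so each move costs O(n) here and the whole thing is O(n^2); a linked list is what would make a single move cheap. The function name is made up, and the input is assumed to have even length.

```python
def interleave_by_moves(items):
    n = len(items) // 2
    for i in range(n):
        # b(i+1) currently sits at index n + i; move it right after a(i+1),
        # which is index 2*i + 1 in the partially interleaved list.
        items.insert(2 * i + 1, items.pop(n + i))
    return items


print(interleave_by_moves([1, 2, 3, 4, 5, 6, 7, 8]))
```

The intermediate states are the same rows as in the N = 4 example.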
One idea which is a little complex is supposing each location has the following value:
1, 3, 5, ..., 2n-1 | 2, 4, 6, ..., 2n
a1,a2, ..., an | b1, b2, ..., bn
Then use in-place merging of two sorted arrays, as explained in this article, in O(n) time and O(1) space complexity. However, we need to manage this indexing during the process.
There is a practical linear time* in-place algorithm described in this question. Pseudocode and C code are included.
It involves swapping the first 1/2 of the items into the correct place, then unscrambling the permutation of the 1/4 of the items that got moved, then repeating for the remaining 1/2 array.
Unscrambling the permutation uses the fact that left items move into the right side with an alternating "add to end, swap oldest" pattern. We can find the i'th index in this permutation with this rule:
For even i, the end was at i/2.
For odd i, the oldest was added to the end at step (i-1)/2
*The number of data moves is definitely O(N). The question asks for the time complexity of the unscramble index calculation. I believe it is no worse than O(lg lg N).

Find way to separate array so each subarrays sum is less or equal to a number

I have a mathematical/algorithmic problem here.
Given an array of numbers, find a way to separate it into 5 subarrays, so that the sum of each subarray is less than or equal to a given number. All numbers from the initial array must go to one of the subarrays and be part of one sum.
So the input to the algorithm would be:
d - the number that each subarray's sum has to be less than or equal to
A - the array of numbers that will be separated into different subarrays, each number being part of one sum
Algorithm complexity must be polynomial.
Thank you.
If by "subarray" you mean "subset" as opposed to "contiguous slice", it is impossible to find a polynomial time algorithm for this problem (unless P = NP). The Partition Problem is to partition a list of numbers into two sets such that the sums of both sets are equal. It is known to be NP-complete. The partition problem can be reduced to your problem as follows:
Suppose that x_1, ..., x_n are positive numbers that you want to partition into 2 sets such that their sums are equal. Let d be this common sum (which would be the sum of the x_i divided by 2). Extend the x_i to an array, A, of size n+3 by adding three copies of d. The total of A is then 5d, so clearly the only way to partition A into 5 subarrays so that the sum of each is less than or equal to d is if the sum of each actually equals d. This would in turn require 3 of the subarrays to have length 1, each consisting of the number d. The remaining 2 subarrays would be exactly a partition of the original n numbers.
On the other hand, if there are additional constraints on what the numbers are and/or what the subarrays need to be, there might be a polynomial solution. But, if so, you should clearly spell out what these constraints are.
Set up of the problem:
d : the upper bound for the subarray
A : the initial array
Assuming A is not sorted.
(Heuristic)
Algorithm:
1. Sort A in ascending order using a standard sorting algorithm -> O(n log n)
2. Check if the largest element of A is greater than d -> (constant)
   if yes, no solution
   if no, continue
3. Sum up all the elements in A, denoted S. Check if S/5 > d -> O(n)
   if yes, no solution
   if no, continue
4. Using a greedy approach, create a new subarray Asi and add the next biggest element aj of the sorted A to Asi, as long as the sum of Asi does not exceed d. Remove aj from the sorted A -> O(n)
   Repeat step 4 until either of these conditions is satisfied:
   I. When creating subarray Asi, there are only 5-i elements left.
      In this case, put each remaining element into its own subarray; done.
   II. i = 5. There are 5 subarrays created.
The algorithm described above is bounded by O(nlogn) therefore in polynomial time.
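One possible reading of the greedy step, sketched in Python. To be clear about what is assumed: this uses first-fit on the values sorted in decreasing order, which is a common bin-packing heuristic and not necessarily the exact procedure meant above; the numbers are assumed non-negative; and the name `greedy_split` and the `parts` parameter are invented for the example. Being a heuristic, it can return None even when a valid split exists.

```python
def greedy_split(numbers, d, parts=5):
    # Quick rejections (steps 2 and 3): one element too large, or total too large.
    if any(v > d for v in numbers) or sum(numbers) > parts * d:
        return None
    groups = [[] for _ in range(parts)]
    totals = [0] * parts
    # Place the biggest values first, each into the first group with room.
    for value in sorted(numbers, reverse=True):
        for idx in range(parts):
            if totals[idx] + value <= d:
                groups[idx].append(value)
                totals[idx] += value
                break
        else:
            return None  # the heuristic got stuck; this is not a proof
    return groups


print(greedy_split([5, 4, 3, 2, 1], 5))
```

Some of the returned groups may be empty when fewer than 5 are needed.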

Array balancing

We have two arrays a[] and b[], and we need to find the minimum absolute difference between the sums of the two arrays and the minimum number of moves needed to reach that difference.
Example:
a[ ] = {70,30,33,23,4,4,34,95} sum = 293
b[ ] = {50,10,10,7} sum = 77
Move 95 and 23 from array a to b.
Move 10 from array b to a.
After moving, the sum of each array becomes 185.
Output is 0 , 3 (difference between the two arrays, number of moves)
The first part of your problem, "find minimum absolute difference between sum of two arrays a & b", is a variation of the Knapsack problem. Wikipedia defines that as "Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible."
To see this, combine all the values in a and in b into a new array ab and find half the sum of its values. You want to find elements in ab that sum to that half-sum, or as close to it as possible. You could then place those values in a and the rest in b, and that is one of the ways to get the minimum absolute difference.
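A small Python sketch of that first part only (the minimum difference, not the number of moves). It assumes integer values and builds the set of every reachable subset sum, which is pseudo-polynomial: fine for small sums, hopeless for large ones. The function name is made up, and a split that leaves one array empty is allowed here.

```python
def min_abs_difference(a, b):
    values = a + b
    total = sum(values)
    # Every sum that some subset of the combined values can reach.
    reachable = {0}
    for v in values:
        reachable |= {s + v for s in reachable}
    # If one side sums to s, the other sums to total - s.
    return min(abs(total - 2 * s) for s in reachable)


a = [70, 30, 33, 23, 4, 4, 34, 95]
b = [50, 10, 10, 7]
print(min_abs_difference(a, b))
```

For the example arrays this prints 0, since the combined total 370 can be split into 185 + 185.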
To find your "minimum number of moves" we could find all the ways to solve the knapsack problem, then for each solution find how many moves it would take to get back to the original a and b (or the original b and a if that takes fewer moves).
The computational complexity of just the first part of your problem is famously NP-complete, so expect a long-running program for any sizable arrays. The Wikipedia article has a variety of algorithms to solve that first part of your problem, so you can start there and make a choice of algorithms.
No wonder this is a competitive-programming problem!

Find the Element Occurring b times in an an array of size n*k+b

Description
Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive Step: Find the median element of the current array slice in linear time and partition around it as in Quick Sort. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur on the left of the first occurrence of M. All M elements are next to each other and all elements larger than M are on the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but descending into only one part of the division at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).
Clarification: Let's say n=20 and k = 10. Then, there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element, let's say it is 1111. If S<1111 then the index of the leftmost occurrence of 1111 will not be a multiple of 10, because the 7 copies of S sit to its left. If S>=1111 then the index will be a multiple of 10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4} There are 6 elements on the left of 3. 6 is a multiple of k=3. So each element must be occurring 3 times there. So S>=3. Recurse on the right side. And so on.
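A hedged Python sketch of this divide-and-discard idea. Two things differ from the description and are assumptions of the sketch: it picks a random pivot instead of a true linear-time median, so the running time is linear only in expectation, and it copies the slices into new lists instead of partitioning in place. The function name is made up.

```python
import random


def find_b_times(values, k):
    items = list(values)
    while True:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        larger = [v for v in items if v > pivot]
        pivot_count = len(items) - len(smaller) - len(larger)
        if pivot_count != k:
            # Every other value occurs exactly k times, so this is S.
            return pivot
        if len(smaller) % k != 0:
            # The left side is not made of full groups of k: S is in there.
            items = smaller
        else:
            items = larger


print(find_b_times([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3))
```

On the full example above (k = 3) the element occurring only twice is 4.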
An idea using cyclic groups.
To guess i-th bit of answer, follow this procedure:
Count how many numbers in array has i-th bit set, store as cnt
If cnt % k is non-zero, then i-th bit of answer is set. Otherwise it is clear.
To guess whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
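from functools import reduce
from random import sample, shuffle

def generate_test_data(n, k, b):
    # Draw n + 1 distinct values so the b-times element is unique.
    values = sample(range(1001), n + 1)
    k_rep, b_rep = values[:n], values[n:]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep:", k_rep)
    print("b_rep:", b_rep)
    return numbers

def solve(data, k):
    # 10 bits are enough for values up to 1000.
    cnts = [0] * 10
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # Bit i of the answer is set when its count is not a multiple of k.
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

# Pass the same k that was used to generate the data.
print("Answer:", solve(generate_test_data(10, 15, 13), 15))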
In order to have a constant height B-tree containing n distinct elements, with height h constant, you need z=n^(1/h) children per nodes: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
Example, with n=1000000, h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, log(z)=log(n^(1/h))=log((N/k)^(1/h)) is constant when h is chosen properly based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (set will remove the duplicates):
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
