How to find all the subarrays with XOR 0?

The problem is to find all the subarrays of a given array whose elements XOR to zero.
For example, if the array is [13,8,5,3,3], the solution should give the index ranges of all such subarrays: 0-2, 3-4, and 0-4.
The question is similar to the one asked here.
The only difference is that I want the indices of all the subarrays that satisfy the equation A0 xor A1 xor ... xor An = 0.

This is a fairly straightforward extension of the linked question. In Python,
def zero_xor_subarrays(array):
    # Multivalued map from the XOR of array[:i] to i, for all i.
    prefix_xor_to_stops = {0: [0]}
    prefix_xor = 0
    for j, x in enumerate(array):
        prefix_xor ^= x
        # Returns the list associated with prefix_xor; inserts [] if not present.
        stops = prefix_xor_to_stops.setdefault(prefix_xor, [])
        for i in stops:
            yield (i, j + 1)
        stops.append(j + 1)
As before, the idea is that a subarray array[i:j] has XOR zero if and only if the XOR of array[:i] equals the XOR of array[:j]. For each subsequent element of the array, we compute the XOR of the prefix ending at that element from the XOR of the prefix ending at the previous element, then look up all of the solutions i to the above equation. Then we insert the new association and continue.
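For the question's example this yields half-open (i, j) pairs, so (0, 3) is the subarray 0-2 (the wrapper function name above is mine, not part of the original answer):

>>> list(zero_xor_subarrays([13, 8, 5, 3, 3]))
[(0, 3), (0, 5), (3, 5)]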

If you want to modify the answer mentioned in the linked post, I hope you have understood that solution well.
The thing missing from that solution is that it only stores the first index at which a particular prefix XOR occurs; other indices where the same xorSum occurs are not tracked. So what you have to do is modify the map to keep a list (a vector in C++) of indices for each xorSum.

If you have two different prefixes of the array with equal XOR, say a prefix of length x1 and a prefix of length x2 (x1 < x2), then the subarray from x1 + 1 to x2 has XOR equal to 0. Make a dictionary (BST, hash table, anything similar) and store in it pairs of (value of prefix XOR, prefixes that give that value). Any two elements with the same value give you one subarray. You can also find them using a Trie if you like.
Using Trie:
At the beginning the Trie consists of a single node and no edges. We want to add numbers to it. It is also convenient to index them, since we want to find all subarrays. Each node of the Trie that represents some numbers (multiple in case of duplicates) will store a list of their indices, so we can easily recover the subarrays.
To add a number n with index i, write n in binary and start from the initial node. If the most significant bit of n is 0, follow the edge labelled 0 if it exists; if not, create a new edge labelled 0 pointing to a new node and move there (same thing for 1). Keep doing this until every bit of n has been consumed, then add index i to the list of indices in the node you ended up in.
Make a variable prefsum = 0
For each i = 1 to n:
    add prefsum to the Trie with index i
    set prefsum = prefsum ^ array[i]
    check if the value prefsum exists in the Trie; for each index v stored with it, the subarray between indices v and i has XOR equal to 0
Total complexity is O(n * log(max value in array))
It may not be better than using a BST or hash map, but it is a popular trick that especially shines in some problems involving the XOR operation.
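A minimal Python sketch of the binary-Trie variant described above (class and function names are mine; BITS assumes the array values fit in 30 bits):

class TrieNode:
    def __init__(self):
        self.children = {}   # bit (0 or 1) -> TrieNode
        self.indices = []    # indices of prefixes whose XOR ends at this node

BITS = 30  # assumption: values fit in 30 bits

def trie_add(root, number, index):
    node = root
    for b in range(BITS - 1, -1, -1):
        node = node.children.setdefault((number >> b) & 1, TrieNode())
    node.indices.append(index)

def trie_find(root, number):
    node = root
    for b in range(BITS - 1, -1, -1):
        node = node.children.get((number >> b) & 1)
        if node is None:
            return []
    return node.indices

def zero_xor_subarrays_trie(array):
    root, prefsum, result = TrieNode(), 0, []
    trie_add(root, 0, 0)              # empty prefix
    for i, x in enumerate(array, 1):
        prefsum ^= x
        for v in trie_find(root, prefsum):
            result.append((v, i))     # array[v:i] has XOR 0
        trie_add(root, prefsum, i)
    return result

print(zero_xor_subarrays_trie([13, 8, 5, 3, 3]))  # [(0, 3), (0, 5), (3, 5)]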

I will write the code blocks in Python 3.7.
Let l = [] be the list of resulting (i, j) tuples.
The most efficient and simple way to deal with this problem is:
Step 1: Calculate the XOR of prefixes:

arr = [13, 8, 5, 3, 3]
n = len(arr)
xorArr = [0] * n     # prefix XORs
xorArr[0] = arr[0]
for i in range(1, n):
    xorArr[i] = xorArr[i - 1] ^ arr[i]
Step 2: Check whether xorArr[i] = 0 at any point; if yes, then arr[:i+1] is one subarray whose XOR is zero, so the check fits inside the same loop:

for i in range(1, n):
    xorArr[i] = xorArr[i - 1] ^ arr[i]
    if xorArr[i] == 0:
        l.append((0, i))
Step 3: Now make a dictionary that stores the list of indices at which each value occurs in xorArr:

d = {xorArr[0]: [0]}
for x in range(1, n):
    if xorArr[x] in d:
        d[xorArr[x]].append(x)
    else:
        d[xorArr[x]] = [x]
Step 4: Make a function that pairs up (i, j) for every element in d[xorArr[x]] and adds the pairs to l:

from itertools import combinations

def pair_up(arr):
    return list(combinations(arr, 2))

for x in d.values():
    if len(x) == 1:      # values that occur only once cannot produce pairs
        continue
    else:                # if the same value is present at i and j (i < j), then
        l += pair_up(x)  # every pair (i, j) is valid: xor(arr[i+1 : j+1]) = 0
P.S.: You don't have to worry about sorting, as the indices in each list of d are already in increasing order. Hope this helps.
Do upvote. Cheers!
Edit:
Complexity of the code: O(n + P), where P is the number of reported pairs, i.e. the sum of (frequency choose 2) over the distinct values in xorArr; in the worst case this is O(n^2).
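Putting the four steps together as one runnable function (a consolidation of the answer above, not part of it; note that step 2's pairs (0, i) denote arr[0 : i+1], while step 4's pairs (i, j) denote arr[i+1 : j+1]):

from itertools import combinations

def zero_xor_pairs(arr):
    n = len(arr)
    xorArr = [0] * n
    xorArr[0] = arr[0]
    l = []
    for i in range(1, n):                 # steps 1 and 2
        xorArr[i] = xorArr[i - 1] ^ arr[i]
        if xorArr[i] == 0:
            l.append((0, i))
    d = {}                                # step 3
    for x in range(n):
        d.setdefault(xorArr[x], []).append(x)
    for idxs in d.values():               # step 4
        l += combinations(idxs, 2)
    return l

print(zero_xor_pairs([13, 8, 5, 3, 3]))   # [(0, 2), (0, 4), (2, 4)]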

Related

Given a sorted array of integers find subarrays such that the largest elements of the subarrays are within some distance of the smallest

For example, given an array
a = [1, 2, 3, 7, 8, 9]
and an integer
i = 2. Find maximal subarrays where the distance between the largest and the smallest elements is at most i. The output for the example above would be:
[1,2,3] [7,8,9]
The subarrays are maximal in the following sense: given two subarrays A and B, there exists no element b in B such that A + b still satisfies the condition. Does a polynomial-time algorithm exist for this problem?
This problem might be solved in linear time using method of two pointers and two deques storing indices, the first deque keeps minimum, another keeps maximum in sliding window.
Deque for minimum (similar for maximum):

current_minimum = a[minq.front]

Adding the i-th element of the array:  // at the right index
    while (!minq.empty and a[minq.back] > a[i]):
        // the last element has no chance to become a minimum because the newer one is better
        minq.pop_back
    minq.push_back(i)

Extracting the j-th element:  // at the left index
    if (!minq.empty and minq.front == j)
        minq.pop_front
So the min-deque always contains indices of a non-decreasing sequence of values.
Now set the left and right indices to 0, insert index 0 into the deques, and start moving right. At every step, add the new index into the deques as described, and check that the left..right interval is still good. When the range becomes too wide (the min-max distance is exceeded), stop moving the right index, check the length of the last good interval, and compare it with the best length so far.
Now move the left index, removing elements from the deques. When max-min becomes good again, stop moving left and continue with right. Repeat until the end of the array.
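A Python sketch of this scheme (names are mine; it is one way to report the maximal windows, not the only one: it records for every right end the smallest valid left end, then keeps the windows not contained in a later one):

from collections import deque

def maximal_subarrays(a, limit):
    # For each right end, find the smallest left such that
    # max(a[left:right+1]) - min(a[left:right+1]) <= limit.
    n = len(a)
    minq, maxq = deque(), deque()  # indices; a[minq] non-decreasing, a[maxq] non-increasing
    left = 0
    left_for = [0] * n
    for right in range(n):
        while minq and a[minq[-1]] > a[right]:
            minq.pop()
        minq.append(right)
        while maxq and a[maxq[-1]] < a[right]:
            maxq.pop()
        maxq.append(right)
        while a[maxq[0]] - a[minq[0]] > limit:
            if minq[0] == left:
                minq.popleft()
            if maxq[0] == left:
                maxq.popleft()
            left += 1
        left_for[right] = left
    # a window is maximal iff it is not contained in the next one
    return [a[left_for[r]:r + 1] for r in range(n)
            if r == n - 1 or left_for[r + 1] > left_for[r]]

print(maximal_subarrays([1, 2, 3, 7, 8, 9], 2))  # [[1, 2, 3], [7, 8, 9]]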

Sorting an array of integers in nlog(n) time without using comparison operators

Imagine you have an array of integers, but you aren't allowed to access any of the values (so no Arr[i] > Arr[i+1] or the like). The only way to discern the integers from one another is a query() function: this function takes a subset of elements as input and returns the number of unique integers in this subset. The goal is to partition the integers into groups based on their values: integers in the same group should have the same value, while integers in different groups have different values.
The catch: the code has to be O(n log n), or in other words the query() function can only be called O(n log n) times.
I've spent hours optimizing different algorithms in Python, but all of them have been O(n^2). For reference, here's the code I start out with:
import random

n = 100
querycalls = 0
secretarray = [random.randint(0, n - 1) for i in range(n)]

def query(items):
    global querycalls
    querycalls += 1
    return len(set(items))

groups = []
secretarray is a large random list of numbers of length n, querycalls keeps track of how many times the function has been called, and groups is where the results go.
The first thing I did was try to create an algorithm based on merge sort (split the array down and then merge based on the query() value), but I could never get it below O(n^2).
Let's say you have an element x and an array of distinct elements, A = [x0, x1, ..., x_{k-1}], and you want to know whether x is equivalent to some element in the array and, if yes, to which element.
What you can do is a simple recursion (let's call it check-eq):
Check if query([x] + A) == k + 1. If yes, then you know that x is distinct from every element in A.
Otherwise, you know that x is equivalent to some element of A. Let A1 = A[:k//2], A2 = A[k//2:]. If query([x] + A1) == len(A1), then you know that x is equivalent to some element in A1, so recurse into A1. Otherwise recurse into A2.
This recursion takes at most O(log k) steps. Now, let our initial array be T = [x0, x1, ..., x_{n-1}]. A will be an array of "representatives" of the groups of elements. First take A = [x0] and x = x1, and use check-eq to see if x1 is in the same group as x0. If not, let A = [x0, x1]; otherwise do nothing. Proceed with x = x2, and so on.
Complexity is of course O(n log n), because check-eq is called exactly n-1 times and each call takes O(log n) queries.
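An iterative sketch of check-eq and the grouping loop, under the assumptions above (the function names are mine; it reuses the query() from the question):

def find_group(x, reps):
    # Index of the representative equivalent to x, or None.
    if not reps or query([x] + reps) == len(reps) + 1:
        return None
    lo, hi = 0, len(reps)   # invariant: x matches exactly one of reps[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query([x] + reps[lo:mid]) == mid - lo:
            hi = mid        # the match is in the left half
        else:
            lo = mid        # the match is in the right half
    return lo

def group_elements(elements):
    reps, result = [], []   # one representative and one index list per group
    for i, x in enumerate(elements):
        g = find_group(x, reps)
        if g is None:
            reps.append(x)
            result.append([i])
        else:
            result[g].append(i)
    return result

groups = group_elements(secretarray)
print(querycalls)  # O(n log n) calls in total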

Minimum steps needed to make all elements equal by adding adjacent elements

I have an array A of size N. All elements are positive integers. In one step, I can add two adjacent elements and replace them with their sum; that is, the array size reduces by 1. Now I need to make all the elements equal by performing the minimum number of steps.
For example: A = [1,2,3,2,1,3].
Step 1: Merge index 0 and 1 ==> A = [3,3,2,1,3]
Step 2: Merge index 2 and 3 (of new array) ==> [3,3,3,3]
Hence number of steps are 2.
I couldn't think of a direct solution, so I tried a recursive approach: merge adjacent pairs one by one and return the minimum level reached when either the array size is 1 or all elements are equal.
Below is the code I tried:
# Checks if all the elements are same or not
def check(A):
    if len(set(A)) == 1:
        return True
    return False

# Recursive function to find min steps
def min_steps(N, A, level):
    # If all are equal, return the level
    if N == 1 or check(A):
        return level
    # Initialize min variable
    mn = float('+inf')
    # Try merging all one by one and recur
    for i in range(N - 1):
        temp = A[:]
        temp[i] += temp[i + 1]
        temp.pop(i + 1)
        mn = min(mn, min_steps(N - 1, temp, level + 1))
    return mn
This solution has a complexity of O(N^N). I want to reduce it to polynomial time, near O(N^2) or O(N^3). Can anyone help me modify this solution or tell me if I am missing something?
Combining any k adjacent pairs of elements (even if they include elements formed from previous combining steps) leaves exactly n-k elements in total, each of which we can map back to the contiguous subarray of the original array consisting of the elements that were added together to form it. So, this problem is equivalent to partitioning the array into the largest possible number of contiguous subarrays such that all subarrays have the same sum: any adjacent pair of elements within the same subarray can be combined into a single element, and this process repeated within the subarray, with adjacent pairs chosen in any order, until all its elements have been combined into one.
So, if there are n elements and they sum to T, then a simple O(nT) algorithm is:
For i from 0 to T:
Try partitioning the elements into subarrays each having sum i. This amounts to scanning along the array, greedily adding the current element to the current subarray if the sum of elements in the current subarray is strictly < i. When we reach a total of exactly i, the current subarray ends and a new subarray (initially having sum 0) begins. If adding the current element takes us above the target of i, or if we run out of elements before reaching the target, stop this scan and try the next outer loop iteration (value of i). OTOH if we get to the end, having formed k subarrays in the process, stop and report n-k as the optimal (minimum possible) number of combining moves.
A small speedup would be to only try target i values that evenly divide T.
EDIT: To improve the time complexity from O(nT) to O(n^2), it suffices to only try target i values corresponding to sums of prefixes of the array (since there must be a subarray containing the first element, and this subarray can only have such a sum).
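A sketch of the O(n^2) version in Python (the function name is mine), trying only proper prefix sums as the common block sum:

def min_merge_steps(a):
    # Fewest adjacent-merge steps = n minus the maximum number of
    # contiguous blocks with equal sums (see the argument above).
    total = sum(a)
    best_blocks = 1              # merging everything into one block always works
    target = 0
    for x in a[:-1]:             # candidate block sums: proper prefix sums
        target += x              # elements are positive, so target > 0
        if total % target:
            continue             # the common block sum must divide the total
        blocks = current = 0
        for y in a:
            current += y
            if current == target:
                blocks += 1
                current = 0
            elif current > target:
                break            # the greedy scan failed for this target
        else:
            best_blocks = max(best_blocks, blocks)
    return len(a) - best_blocks

print(min_merge_steps([1, 2, 3, 2, 1, 3]))  # 2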

Maximize number of inversion count in array

We are given an unsorted array A of integers (duplicates allowed) with size N, possibly large. We can count the number of pairs of indices i < j for which A[i] < A[j]; let's call this count X.
We can change maximum one element from the array with a cost equal to the difference in absolute values (for instance, if we replace element on index k with the new number K, the cost Y is | A[k] - K |).
We can only replace this element with other elements found in the array.
We want to find the minimum possible value of X + Y.
Some examples:
[1,2,2] should return 1 (change the 1 to 2 such that the array becomes [2,2,2])
[2,2,3] should return 1 (change the 3 to 2)
[2,1,1] should return 0 (because no changes are necessary)
[1,2,3,4] should return 6 (this is already the minimum possible value)
[4,4,5,5] should return 3 (this can be accomplished by changing the first 4 into a 5 or the last 5 into a 4)
The number of pairs can be found with a naive O(n²) solution, here in Python:
def calc_x(arr):
    n = len(arr)
    cnt = 0
    for i in range(n):
        for j in range(i + 1, n):
            if arr[j] > arr[i]:
                cnt += 1
    return cnt
A brute-force solution is easily written, for example:

def f(arr):
    best_val = calc_x(arr)
    used = set(arr)
    for i, v in enumerate(arr):
        for replacement in used:
            if replacement == v:
                continue
            arr2 = arr[:i] + [replacement] + arr[i + 1:]
            y = abs(replacement - v)
            x = calc_x(arr2)
            best_val = min(best_val, x + y)
    return best_val
We can count for each element the number of items right of it larger than itself in O(n*log(n)) using for instance an AVL-tree or some variation on merge sort.
However, we still have to search which element to change and what improvement it can achieve.
This was given as an interview question and I would like some hints or insights as how to solve this problem efficiently (data structures or algorithm).
Definitely go for an O(n log n) approach when counting inversions.
We can see that when you change the value at index k, you can either:
1) increase it, and then possibly reduce the number of counted pairs with elements bigger than it, but increase the number with elements smaller than it
2) decrease it (and the opposite happens)
Let's try not to count x every time you change a value. What do you need to know?
In case 1):
You have to know how many elements on the left are smaller than your new value v and how many elements on the right are bigger than it. You can easily check that in O(n). So what is your x now? You can count it with the following formula:

prev_val - your previous value
prev_x   - x counted at the beginning of the program
prev_l   - number of elements on the left smaller than prev_val
prev_r   - number of elements on the right bigger than prev_val
v        - new value
l        - number of elements on the left smaller than v
r        - number of elements on the right bigger than v

new_x = prev_x + r + l - prev_l - prev_r
In the second case you pretty much do the opposite thing.
Right now you get something like O(n^3) instead of O(n^3 log n), which is probably still bad. Unfortunately, that's all I have come up with for now. I'll definitely tell you if I come up with something better.
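A quick sketch of this O(n) update (my function name; the counting is naive, and calc_x is the function from the question):

def new_x_after_change(arr, prev_x, k, v):
    # x after replacing arr[k] with v, using the formula above.
    prev_val = arr[k]
    prev_l = sum(a < prev_val for a in arr[:k])
    prev_r = sum(a > prev_val for a in arr[k + 1:])
    l = sum(a < v for a in arr[:k])
    r = sum(a > v for a in arr[k + 1:])
    return prev_x + r + l - prev_l - prev_r

arr = [4, 4, 5, 5]
prev_x = calc_x(arr)   # 4
best = min(new_x_after_change(arr, prev_x, k, v) + abs(arr[k] - v)
           for k in range(len(arr)) for v in set(arr))
print(min(best, prev_x))   # 3, matching the example in the question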
EDIT: What about the memory limit? Is there any? If not, you can, for each element in the array, make two sets holding the elements before and after the current one. Then you can find the number of smaller/bigger elements in O(log n), making your time complexity O(n^2 log n).
EDIT 2: We can also check which element would be best to change to a value v, for every possible value v. You can make two sets and add/erase elements from them while checking each element, keeping the time complexity at O(n^2 log n) without using too much space. The algorithm would be:
1) determine every value v that any element could be changed to, and calculate x
2) for each possible value v:
       make two sets, push all elements into the second one
       for each element e in the array:
           move the previous element (if there is one) into the first set and erase e from the second set, then count the number of bigger/smaller elements in sets 1 and 2 and calculate the new x
EDIT 3: Instead of making two sets, you could use a prefix sum for each value. That's O(n^2) already, but I think we can go even better than this.

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size (n*k+b) where n elements occur k times each and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution would be hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree (in which the frequencies are stored) constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice in linear time (as in the selection step of quicksort). Let the median element be M.
After this step you have an array where all elements smaller than M are to the left of the first occurrence of M, all occurrences of M are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S < M or S >= M. Recurse on either the left slice or the right slice.
So you are doing a quicksort, but descending into only one part of the division at any time. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time will still be O(N).
Clarification: let's say n=20 and k=10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, say, 7 times. Find the median element, say 1111. If S < 1111, then the index of the leftmost occurrence of 1111 will not be a multiple of k=10; if S >= 1111, it will be.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3, and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3, and 6 is a multiple of k=3, so each element there must occur 3 times. So S >= 3. Recurse on the right side. And so on.
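A Python sketch of this idea (the names are mine; it uses a random pivot, giving expected rather than worst-case linear time, and builds new lists instead of partitioning in place):

import random

def find_rare(a, k):
    # Every value occurs exactly k times except one value S,
    # which occurs b times with 0 < b < k. Returns S.
    while True:
        pivot = random.choice(a)
        smaller = [x for x in a if x < pivot]
        equal_count = sum(x == pivot for x in a)
        if len(smaller) % k:
            a = smaller                       # S < pivot
        elif equal_count % k:
            return pivot                      # pivot is S itself
        else:
            a = [x for x in a if x > pivot]   # S > pivot

print(find_rare([2, 2, 2, 3, 3, 3, 1, 1], 3))  # 1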
An idea using cyclic groups.
To guess the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set; otherwise it is clear. (Each value occurring exactly k times contributes a multiple of k to cnt, so cnt ≡ b · (i-th bit of the answer) mod k, and 0 < b < k.)
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, it is linear in the array size.
No hashing; memory usage is O(log k * log max N).
Example implementation (ported to Python 3; the final call now passes the same k used to generate the data):

from random import randint, shuffle
from functools import reduce

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for i in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep:", k_rep)
    print("b_rep:", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10          # 10 bits suffice for values up to 1000
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the per-bit counts
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

print("Answer:", solve(generate_test_data(10, 15, 13), 15))
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h) = n^(1/h).
For example, with n = 1000000 and h = 10, z = 3.98, i.e. z = 4.
The time to reach a node is then O(h·log(z)). Assuming h and z to be "constant" (since N = n·k, we have log(z) = log(n^(1/h)) = log((N/k)^(1/h)) = constant by properly choosing h based on k), you can then say that O(h·log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates):

result = (sum(set(arr)) * k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
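That observation in code (a sketch assuming k even and b odd):

from functools import reduce
from operator import xor

s = reduce(xor, arr)  # values occurring an even number of times cancel; s is the b-times element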
