Finding first duplicated element in linear time [duplicate]

There is an array of size n whose elements are between 1 and n-1, such that each element occurs exactly once except for one element, which occurs more than once. We need to find this element.
Though this is a frequently asked question, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large: it will overflow. There have also been suggestions regarding the use of the XOR operator, as in dup = dup ^ arr[i] ^ i, which are not clear to me.
I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!
sum = 0;
for i = 0 to n-1
begin
    diff = A[i] - i;
    sum = sum + diff;
end
sum then contains the duplicate element, but using this method I am unable to find the index of the duplicate element. For that I would need to traverse the array once more, which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method and works in O(n)?

There are many ways that you can think about this problem, depending on the constraints of your problem description.
If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:
XOR is associative, so (x ^ y) ^ z = x ^ (y ^ z)
XOR is commutative: x ^ y = y ^ x
XOR is its own inverse: x ^ y = 0 iff x = y
XOR has zero as an identity: x ^ 0 = x
Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR a value with itself, you get back zero, and property (4) means that if you XOR anything with 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and the combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.
To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which is the XOR of all the numbers from 1 to (n-1) except the duplicate (the duplicate appears twice, so it cancels itself out). Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This cancels every number in the range 1 to (n-1) that had not already canceled out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and uses only O(1) space, since the XOR of all the values fits into a single integer.
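For concreteness, here is a minimal C sketch of the XOR approach (the function name and test values are mine, not from the original post):

#include <stdio.h>
/* A has n entries drawn from 1..n-1, with exactly one value twice.
   XOR-ing every array element with every value in 1..n-1 cancels
   all the singletons and leaves only the duplicate. */
unsigned find_duplicate_xor(const unsigned *A, unsigned n) {
    unsigned acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc ^= A[i];              /* XOR of all array elements */
    for (unsigned v = 1; v <= n - 1; v++)
        acc ^= v;                 /* cancel each value that appears once */
    return acc;                   /* only the duplicate survives */
}
int main(void) {
    unsigned A[] = {1, 4, 2, 3, 2};             /* n = 5, duplicate is 2 */
    printf("%u\n", find_duplicate_xor(A, 5));   /* prints 2 */
    return 0;
}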
In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that an overflow would occur, but it isn't actually a problem, because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 2^32. More generally, though, let's say that integers are stored modulo k for some fixed k.
Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡ y (mod k)) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:
If x ≡ y (mod k) and w ≡ z (mod k), then x + w ≡ y + z (mod k).
If x ≡ y (mod k) and w ≡ z (mod k), then xw ≡ yz (mod k).
This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)
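As a sketch of the summation approach in C: with unsigned arithmetic the wraparound isn't even hardware-dependent, since C itself defines unsigned overflow to wrap (modulo UINT_MAX + 1, i.e. 2^32 for a 32-bit unsigned). The function name is mine:

/* actual = expected + duplicate (mod 2^32), and the duplicate is
   less than n, so the wrapped difference is exactly the duplicate. */
unsigned find_duplicate_sum(const unsigned *A, unsigned n) {
    unsigned actual = 0, expected = 0;
    for (unsigned i = 0; i < n; i++)
        actual += A[i];           /* may wrap; that's fine */
    for (unsigned v = 1; v <= n - 1; v++)
        expected += v;            /* sum of 1..n-1; may also wrap */
    return actual - expected;     /* wraps back to the true duplicate */
}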
If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.
If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle finding by treating the array as a function from the numbers 1 to n onto 1 to (n-1) and then looking for two inputs that this function maps to the same output. The resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.
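Here is a short C sketch of that reduction (assuming, as above, exactly one duplicated value in 1 to n-1; the function name is mine):

/* View i -> A[i] as a function on {0, ..., n-1}. Every A[i] lies in
   1..n-1, so index 0 is never a successor: the walk starting at 0
   enters a cycle, and the cycle's entry point is the duplicate,
   because two different indices point at it. */
unsigned find_duplicate_floyd(const unsigned *A) {
    unsigned slow = 0, fast = 0;
    do {                          /* phase 1: meet inside the cycle */
        slow = A[slow];
        fast = A[A[fast]];
    } while (slow != fast);
    fast = 0;                     /* phase 2: locate the cycle entry */
    while (slow != fast) {
        slow = A[slow];
        fast = A[fast];
    }
    return slow;                  /* cycle entry = duplicated value */
}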
For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.
Hope this helps!

Adding the elements is perfectly fine; you just have to take the mod (%) of the intermediate aggregate when calculating both the sum of the elements and the expected sum. For the modulus you can use something like 2n. You also have to fix up the value after the subtraction: if the difference comes out negative, add the modulus back.
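A quick C sketch of this (the parameter m is the modulus; anything larger than n-1 works, e.g. 2n, and the name is illustrative):

unsigned find_duplicate_mod(const unsigned *A, unsigned n, unsigned m) {
    unsigned actual = 0, expected = 0;
    for (unsigned i = 0; i < n; i++)
        actual = (actual + A[i]) % m;       /* running sum, kept small */
    for (unsigned v = 1; v <= n - 1; v++)
        expected = (expected + v) % m;      /* expected sum of 1..n-1 */
    return (actual + m - expected) % m;     /* fix-up: keep it non-negative */
}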

Related

Extracting base K random “bits” from a pre-filled random buffer

Say I need N cryptographically-secure pseudorandom integers within the range [0, K). The most obvious way of achieving this would be N calls to arc4random_uniform(3), and this is exactly what I’m doing.
However, the profiler tells me that the numerous calls to arc4random_uniform(3) are taking 2/3 of the whole execution time, and I really need to make my code faster. This is why I'm planning to generate some random bytes in advance (probably with arc4random_buf(3)) and subsequently extract from that buffer bit by bit.
For K = 2, I can simply mask the desired bit out, but when K is not a power of 2, things are getting hairy. Surely I can use a bunch of %= and /=, but then I would have modulo bias. Another problem is when N grows too large, I can no longer interpret the whole buffer as an integer and perform arithmetic operations on it.
In case it’s relevant, K would be less than 20, whereas N can be really large, like millions.
You can use the modulus operator and division; you just need to do a bit of extra preprocessing. Generate your array of values as normal. Take P to be the largest power of K less than or equal to 2^32 (where ^ denotes exponentiation), and iterate over your array making sure all random values are strictly less than P. Any which aren't, replace with a new random number which is less than P. This will remove the bias.
Now, to handle large N, you'll need two loops. The first loop iterates over the elements in the array; the second extracts multiple random numbers from each element. If P = K^e, then you can extract e random numbers in [0, K) from each element in the array. Each time you extract a random number from an element, do a floored division by K on that element.
Of course, this doesn't necessarily need to be actual loops. You can store two variables (array index, sub-element index) and extract from the array_index element as a function gets called. If sub_element_index == e, then reset it to zero and increase the array_index. Extract a random number from this array element and return it instead.
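Here is a rough C sketch of the whole scheme (this assumes the BSD/macOS arc4random() API from the question; K, next_digit, and the static state are illustrative names, and a real version would batch the refills with arc4random_buf(3) as you planned):

#include <stdint.h>
#include <stdlib.h>              /* arc4random() on BSD/macOS */
#define K 7                      /* any K < 20, per the question */
static uint32_t pool;            /* current word; holds `remaining` digits */
static unsigned remaining;       /* base-K digits left in `pool` */
unsigned next_digit(void) {
    static uint32_t P;           /* largest power of K fitting in 32 bits */
    static unsigned e;           /* base-K digits per accepted word */
    if (P == 0)
        for (P = 1, e = 0; P <= UINT32_MAX / K; P *= K, e++)
            ;
    if (remaining == 0) {
        do {
            pool = arc4random();     /* refill with one uniform word */
        } while (pool >= P);         /* rejection removes the modulo bias */
        remaining = e;
    }
    unsigned digit = pool % K;       /* peel off the low base-K digit */
    pool /= K;
    remaining--;
    return digit;
}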

How do you find the largest subset of an array of integers that xor to zero

To clarify, the largest subset of the array: [0,1,4,5,6,8] that xors to 0 would be [0,1,4,5] since 0^1^4^5=0 (where ^ is xor). I know this can be done in exponential time by brute force, but I'd like to know what the lower bound is, and what algorithm solves it in that time.
I'm trying to implement the rational sieve algorithm. Beyond the Wikipedia article, resources on the algorithm are fairly scarce. To complete the rational sieve you attempt to find a subset of a group of arrays such that, when adding up corresponding elements, the resulting array has only even numbers. For example:
[2,3,4,5] + [4,3,4,3] = [6,6,8,8]. This would be a valid solution, provided that those arrays exist in the larger set.
According to that wikipedia article, this can be solved using linear algebra, but I don't know enough linear algebra to solve it.
For the purpose of the algorithm, an empty subset isn't useful.
I simplified the problem by saying that the arrays can only have 0s and 1s, and by packing each array into a single number so that the sum can be computed with a single operator, but otherwise it is the same problem.
Yes, it can be formulated as a linear optimization problem. Assuming the integers are k bits and there are n of them, you can represent them as a k × n matrix A, where columns represent the integers, and row r of column i is the r-th bit of integer i.
Then the selection and xor-ing of integers can be represented as A * x, where x is a vector of size n that has 1-s at positions of selected integers. This has to be over GF(2), so multiplication is the standard one and addition is XOR. So you are solving maximize(|x|) subject to Ax = 0.
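Here is a C sketch of the elimination itself (illustrative names; it only exhibits one nonempty subset that XORs to zero, or reports independence; it does not do the harder maximization of |x|):

#include <stdint.h>
/* Incremental Gaussian elimination over GF(2), treating each integer
   as a column of bits. owner[b] records which original indices were
   XOR-ed together to produce the basis row with leading bit b. */
uint64_t find_zero_xor_subset(const uint64_t *a, int n) {
    uint64_t basis[64] = {0};
    uint64_t owner[64] = {0};
    for (int i = 0; i < n && i < 64; i++) {
        uint64_t v = a[i];
        uint64_t combo = 1ULL << i;      /* start with a[i] alone */
        for (int b = 63; b >= 0; b--) {
            if (!((v >> b) & 1))
                continue;
            if (basis[b]) {              /* cancel leading bit b */
                v ^= basis[b];
                combo ^= owner[b];
            } else {                     /* new pivot row */
                basis[b] = v;
                owner[b] = combo;
                combo = 0;
                break;
            }
        }
        if (combo)           /* a[i] reduced to zero: dependency found */
            return combo;    /* bit j set => a[j] belongs to the subset */
    }
    return 0;                /* the inputs are linearly independent */
}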

Is it correct to use a table of interpolated prime-counting function `pi(x)` values as an upper bound for an array of primes?

Suppose I want to allocate an array of integers to store all the prime numbers less than some N. I would then need an estimate for the array size, E(N). There is a mathematical function that gives the exact number of primes below N: the prime-counting function pi(n). However, it looks impossible to define the function in terms of elementary functions.
There exist some approximations to the function, but all of them are asymptotic approximations, so they can be either above or below the true number of primes and cannot in general be used as the estimate E(N).
I've tried to use tabulated values of pi(n) for certain n, like powers of two, and interpolate between them. However, I noticed that the function pi(n) is concave, so the interpolation between sparse table points may accidentally yield values of E(n) below the true pi(n), which may result in a buffer overflow.
I then decided to exploit the monotonic nature of pi(n) and use the table value of pi(2^(n+1)) as a far upper estimate for E(2^n), and interpolate between these shifted values this time.
I am still not completely sure that for some 2^n < X < 2^(n+1), an interpolation between pi(2^(n+1)) and pi(2^(n+2)) would be a safe upper estimate. Is it correct? How do I prove it?
You are overthinking this. In C, you just use malloc and realloc. I'd 100 times prefer an algorithm that just obviously works instead of one that requires a deep mathematical proof.
Use an upper bound. There are a number to choose from, each more complicated but tighter. I call this prime_count_upper(n) since you want a value guaranteed to be greater than or equal to the number of primes under n. See Chebyshev, Rosser and Schoenfeld, Dusart 1999, Dusart 2010, Axler 2014, and Büthe 2015. R&S is simple and not terrible: π(x) <= x/(log(x)-3/2) for x >= 67 but Dusart gives better ones for larger values. Either way, no tables or original research needed.
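As a sketch, the R&S bound as an allocation helper in C (the name and the small-x fallback are mine; pi(66) = 18 covers everything below the bound's range of validity):

#include <math.h>
#include <stddef.h>
size_t prime_count_upper(double n) {
    if (n < 67.0)
        return 18;                               /* pi(x) <= 18 for x < 67 */
    return (size_t)(n / (log(n) - 1.5)) + 1;     /* R&S; +1 absorbs truncation */
}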
The prime number theorem guarantees the nth prime P(n) is in the range n log n < P(n) < n log n + n log log n for n > 5. As DanaJ suggests, tighter bounds can be computed.
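In C, the upper end of that range might look like this (illustrative name; only valid for n > 5):

#include <math.h>
double nth_prime_upper(double n) {
    return n * log(n) + n * log(log(n));   /* P(n) < n log n + n log log n */
}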
If you want to store the primes in an array, you can't be talking about anything too big. As you suggest, there is no direct computation of pi(n) in terms of elementary arithmetic functions, but there are several methods for computing pi(n) exactly that aren't too hard, as long as n isn't too big. See this, for instance.

How does C perform the % operation internally

I am curious to understand the logic behind the mod operation, since I know that bit-shifting operations can be used to do different things, such as shifting to multiply.
One way I can see it being done is by a recursive algorithm that keeps dividing until you cannot divide anymore, but this does not seem efficient.
Any ideas will be helpful. Thanks in advance!
The quick version is: it depends on the hardware, the optimizer, whether it's division by a constant or not (pdf), whether there are exceptions to be checked for (e.g. modulo by 0), and if and how negative numbers are handled (this is a scary question for C++), etc...
R gave a nice, concise answer for unsigned integers, but it's difficult to understand unless you're well versed in C.
The crux of the technique illuminated by R is to strip away multiples of q until there are no more multiples of q left. We could naively do this with a simple loop:
while (p >= q) p -= q; // One liner, woohoo!
The code may be short, but for large values of p and small values of q this might take a very long time.
Better than stripping away one q at a time would be to strip away many q's at a time. Note that we actually want to strip away as many q's as possible -- that is, floor(p/q) many q's... And indeed, that's a valid technique. For unsigned integers, one would expect that p % q == p - (p / q) * q. (Note that unsigned integer division rounds down.)
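Stated as code, for unsigned values (where / rounds down):

unsigned mod_via_div(unsigned p, unsigned q) {
    return p - (p / q) * q;   /* strip away floor(p/q) copies of q at once */
}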
But this almost feels like cheating because division and remainder operations are so intimately related. (In fact, often if hardware natively supports division, it supports a divide-and-compute-remainder operation because they're so strongly related.)
Assuming we've no access to division, how shall we find a multiple of q greater than 1 to strip away? In hardware, fixed shift operations are cheap (if not practically free) and conceptually represent multiplication by a non-negative power of two. For example, shifting a bit string left by 3 is equivalent to multiplying by 8 (that is, 2^3), e.g. 5 decimal is equivalent to '101' binary. Shift '101' in binary by adding three zeroes on the right (giving '101000') and the result is 40 in decimal -- five times eight.
Likewise, shift operations are very cheap as software operations, and you'll struggle to find a controller that doesn't support them quickly. (Some architectures, such as ARM, can even combine shifts with other instructions to make them 'free' a good deal of the time.)
ARMed (couldn't resist) with these shift operations, we can proceed as follows:
Find out the largest power of two we can multiply q by and still be less than p.
Working from the largest power of two to the smallest, multiply q by each power of two and if it's less than what's left of p subtract it from what's left of p.
Whatever you've got left is the remainder.
Why does this work? Because in the end you'll find that all the subtracted powers of two actually sum to floor(p / q)! Don't take my word for it; this kind of binary long division has been known for a very long time.
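If you want to check that claim, here is a sketch that runs the same loop but also accumulates the powers of two it strips (illustrative name; assumes q != 0):

#define HI (-1U - (-1U / 2))    /* unsigned int with only the top bit set */
void divmod_shift(unsigned p, unsigned q, unsigned *quot, unsigned *rem) {
    unsigned i, acc = 0;
    for (i = 0; !(HI & (q << i)); i++)
        ;                                /* largest shift that can't overflow */
    do {
        if (p >= (q << i)) {
            p -= (q << i);               /* strip away 2^i copies of q... */
            acc += 1U << i;              /* ...and record them */
        }
    } while (i--);
    *quot = acc;                         /* acc == floor(p_original / q) */
    *rem = p;                            /* what's left == p_original % q */
}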
Breaking apart R's answer:
#define HI (-1U-(-1U/2))
This effectively gives you an unsigned integer with only the highest value bit set.
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
This line finds the largest i such that q can be shifted left by i without overflowing an unsigned integer. Starting from this largest shift isn't strictly necessary (any starting shift with q<<i above p would do), but it doesn't change the result; it only adds some execution time.
In case you're not familiar with the C-isms in this line:
(q<<i) is a left bit shift by i. Recall this is equivalent to multiplying by 2^i.
HI & (q<<i) performs a bitwise-AND. Since HI only has its top bit populated this will only result in a non-zero value when (q<<i) is large enough to cause the top bit to be non-zero. One more shift over to the left and there'd be an integer overflow.
!(HI & (q<<i)) is 'true' when (HI & (q<<i)) is zero and 'false' otherwise.
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
This is a simple decreasing loop do { .... } while (i--);. Note that i-- is a post-decrement: after the body executes, the old value of i is tested to decide whether to continue, and only then is i reduced by one. This has the property that the loop body executes one final time when i is 0, which is important because we may need to strip away an unmultiplied copy of q.
if (p >= (q<<i)) checks whether 2^i * q is less than or equal to p. If it is, p -= (q<<i) strips it away.
The remainder is left.
While most C implementations run on hardware that has a division instruction, the remainder operation can be performed roughly like this, for computing p%q, assuming unsigned values:
#define HI (-1U-(-1U/2))
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
The resulting remainder is in p.
In addition to a hardware instruction and implementation using shifts, as R.. suggests, there's also reciprocal multiplication.
This technique can be used when the right-hand side of % is a constant, known at compile time.
Reciprocal multiplication is used to implement division, but using it for % is easy, based on the formula a%b == a-(a/b)*b.
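For instance, here is a sketch for the fixed divisor 3, using the well-known magic constant 0xAAAAAAAB (roughly 2^33 / 3); compilers typically emit a sequence like this for n % 3 (the function name is mine):

#include <stdint.h>
uint32_t mod3(uint32_t n) {
    uint32_t q = (uint32_t)(((uint64_t)n * 0xAAAAAAABULL) >> 33);  /* n / 3 */
    return n - q * 3;                                              /* n % 3 */
}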
Depending on the smarts of the optimizer, there is a shortcut when the modulus is a power of 2. For example, a % 32 can be implemented as a & 31. In general, a % (2^N) == a & (2^N - 1). This is lightning fast compared to division: most dividers (even in hardware) require at least one cycle for each bit of the result, while a logical AND takes just a few cycles in the pipeline.
EDIT: this only works if a is unsigned!
