What is the answer for this computational analysis problem?

Two algorithms compute the same function, while algorithm A has computational complexity O(2^N) and algorithm B has computational complexity O(N^10). Suppose a real computer can run continuously for 10^7 seconds, performing 10^3 basic operations per second.
In this computer environment, please answer the following questions.
A) What is the approximate range of N for algorithms A and B, respectively?
B) Which algorithm is more suitable in the environment? Why?

The question is defective.
The fact that A has complexity O(2^N) means (presumably modeling each basic operation as taking the same amount of time) that A takes at most some constant times 2^N steps for N at least some threshold N0. Similarly, the fact that B has complexity O(N^10) means B takes at most some constant times N^10 steps for N at least some threshold N1. However, they may be different constants; the number of steps for A is at most C0·2^N and the number of steps for B is at most C1·N^10, and they may have different thresholds N0 and N1.
In asking about a computer that can perform 10^3 basic operations per second for 10^7 seconds, the question asks for which N the number of steps of A or B is known to be at most 10^10. In other words, it asks us to solve for N in C0·2^N ≤ 10^10 and in C1·N^10 ≤ 10^10.
These are clearly unsolvable without knowing C0 and C1, about which the question gives no information.
Further, we do not know the thresholds N0 and N1 where these bounds are known to apply. So even if we knew C0 and C1, we would not know any bound on how many steps the algorithms take for any particular N.
The question is also defective in that it neglects that the O notation puts only an upper bound on the algorithm. The algorithm may run in fewer steps than the values of the formulae. So it may be that, even for N with C0·2^N ≤ C1·N^10, algorithm B is better, or vice-versa.
Possibly some simplifying assumptions are intended, such as C0 = C1 = 1, N0 = N1 = 0, and that each algorithm takes exactly the number of steps given by its formula. Then it is easy to solve 2^N ≤ 10^10 (N is at most about 33.22) and N^10 ≤ 10^10 (N ≤ 10). However, if these assumptions are intended, then the author has missed the point of O notation; it characterizes a fundamental nature of an algorithm; it does not quantify its actual number of steps.
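As a quick numeric check of those figures, here is a small C sketch under the same simplifying assumptions (constants of 1, thresholds of 0, formulas taken as exact step counts) that computes the largest N each algorithm could handle within the 10^10-operation budget:

#include <math.h>
#include <stdio.h>

int main(void) {
    double budget = 1e7 * 1e3;            /* 10^7 seconds * 10^3 ops/second = 10^10 operations */
    double n_a = log2(budget);            /* A: 2^N <= 10^10  =>  N <= log2(10^10) ~= 33.22    */
    double n_b = pow(budget, 1.0 / 10.0); /* B: N^10 <= 10^10 =>  N <= 10^(10/10) = 10         */
    printf("A: N <= %.2f, B: N <= %.0f\n", n_a, n_b);
    return 0;
}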

Related

Optimal frequency of modulo operation in finite field arithmetic implementation

I'm trying to implement finite field arithmetic to use it in Elliptic Curve calculations. Since all that's ever used are arithmetic operations that commute with the modulo operator, I don't see a reason not to delay that operation till the very end. One thing that may happen is that the numbers involved might become (way) too big and impractical/inefficient to work with, but I was wondering if there was a way to determine the optimal conditions/frequency which should trigger a modulo operation in the calculations.
I'm coding in C.
To avoid the complexity of elliptic curve crypto (as I'm unfamiliar with its algorithm); let's assume you're doing temp = (a * b) % M; result = (temp * c) % M, and you're thinking about just doing result = (a * b * c) % M instead.
Let's also assume that you're doing this a lot with the same modulo M; so you've precomputed "multiples of M" lookup tables, so that your modulo code can use the table to find the highest multiple of "M shifted left by N" that is not greater than the dividend and subtract it from dividend, and repeat that with decreasing values of N until you're left with the quotient.
If your lookup table has 256 entries, the dividend is 4096 bits, and the divisor is 2048 bits, then you'd reduce the size of the dividend by 8 bits per iteration, so the dividend would become smaller than the divisor (and you'd find the quotient) after no more than 256 "search and subtract" operations.
For multiplication; it's almost purely "multiply and add digits" for each pair of digits. E.g. using uint64_t as a digit, multiplying 2048 bit numbers is multiplying 32 digit numbers and involves 32 * 32 = 1024 of those "multiply and add digits" operations.
Now we can make comparisons. Specifically, assuming a, b, c, M are 2048-bit numbers:
a) the original temp = (a * b) % M; result = (temp * c) % M would be 1024 "multiply and add", then 256 "search and subtract", then 1024 "multiply and add", then 256 "search and subtract". For totals it'd be 2048 "multiply and add" and 512 "search and subtract".
b) the proposed result = (a * b * c) % M would be 1024 "multiply and add", then 2048 "multiply and add" (as the result of a*b will be a "twice as big" 4096-bit number), then 512 "search and subtract" (as a*b*c will be 6144 bits, so there are twice as many excess bits to remove). For totals it'd be 3072 "multiply and add" and 512 "search and subtract".
In other words; (assuming lots of assumptions) the proposed result = (a * b * c) % M would be worse, with 50% more "multiply and add" and the exact same "search and subtract".
Of course none of this (the operations you need for elliptic curve crypto, the sizes of your variables, etc) can be assumed to apply for your specific case.
I was wondering if there was a way to determine the optimal conditions/frequency which should trigger a modulo operation in the calculations.
Yes; the way to determine the optimal conditions/frequency is to do similar to what I did above - determine the true costs (in terms of lower level operations, like my "search and subtract" and "multiply and add") and compare them.
In general (regardless of how modulo is implemented, etc.) I'd expect you'll find that doing modulo as often as possible is the fastest option (as it reduces the cost of multiplications and also reduces the cost of later/final modulo) for all cases that don't involve addition or subtraction and that don't fit in simple integers.
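As a machine-word-scale illustration of the same point (not the multi-precision case counted above), here is a hedged C sketch: reducing after every multiply keeps each operand within 64 bits, so a 128-bit intermediate suffices, whereas computing a * b * c before reducing would already need a 192-bit intermediate. The helper name mulmod and the sample constants are just for illustration.

#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension used as the double-width type */

/* Multiply modulo m, reducing immediately so results stay one 64-bit word wide. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    return (uint64_t)(((u128)a * b) % m);
}

int main(void) {
    uint64_t a = 0x123456789ABCDEF1ull, b = 0xFEDCBA9876543211ull,
             c = 0x0F1E2D3C4B5A6978ull, M = 0xFFFFFFFFFFFFFFC5ull;
    /* Reduce-early: (((a*b) % M) * c) % M -- every intermediate fits in 128 bits. */
    uint64_t early = mulmod(mulmod(a, b, M), c, M);
    /* Reduce-late, (a*b*c) % M, would need 192-bit arithmetic here plus a larger
       final reduction, mirroring the extra work counted above. */
    printf("%016llX\n", (unsigned long long)early);
    return 0;
}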
If M is a constant, then an alternative for modulo is to multiply by the logical inverse of M. Looking at Polk's comment about 256 bits being a common case, and assuming M is a polynomial of degree 256 with 1-bit coefficients, define the inverse of M to be x^512 / M, which results in a 256-bit "inverse". Call this inverse I. Then for a multiply modulo M:
C = A * B ; 512 bit product
Q = (upper 256 bits of C * I)>>256 ; Q = C / M = 256 bit quotient
P = M * Q ; 512 bit product
R = lower 256 bits of (C xor P) ; (A * B)% M
So this requires 3 extended-precision multiplies and one xor.
If the processor for this code has a carryless multiply, such as X86 PCLMULQDQ, which multiplies two 64 bit operands to produce a 128 bit result, then that could be used as the basis for an extended precision multiply. A basic implementation would need 16 multiplies for a 256 bit by 256 bit multiply to produce a 512 bit product. This could be improved using something like Karatsuba:
https://en.wikipedia.org/wiki/Karatsuba_algorithm
but on current X86, PCLMULQDQ is fast, taking 1 to 3 cycles, so the main issue would be loading the data into the XMM registers, and I'm not sure Karatsuba would save much time.
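To make the shape of that carry-less (GF(2)) reduction concrete, here is a toy-scale C sketch using an 8-bit modulus and a plain loop in place of PCLMULQDQ; the steps mirror the C/Q/P/R sequence above, with x^16 / M playing the role of the x^512 / M "inverse". The function names (clmul, gf2_div) are illustrative, and a direct bitwise reduction is included only as a cross-check.

#include <stdint.h>
#include <stdio.h>

/* Carry-less multiply: XOR together shifted copies of a, one per set bit of b. */
static uint32_t clmul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if (b >> i & 1)
            r ^= a << i;
    return r;
}

/* GF(2) polynomial division of c by m (deg = degree of m); returns the quotient
   and, if rem is non-NULL, stores the remainder. */
static uint32_t gf2_div(uint32_t c, uint32_t m, int deg, uint32_t *rem) {
    uint32_t q = 0;
    for (int i = 31; i >= deg; i--)
        if (c >> i & 1) {
            c ^= m << (i - deg);
            q |= 1u << (i - deg);
        }
    if (rem) *rem = c;
    return q;
}

int main(void) {
    uint32_t M = 0x11B;                          /* degree-8 modulus x^8+x^4+x^3+x+1  */
    uint32_t I = gf2_div(1u << 16, M, 8, NULL);  /* "inverse" I = x^16 / M            */
    uint32_t A = 0x57, B = 0x83;

    uint32_t C = clmul(A, B);                    /* C = A * B, up to 15 bits          */
    uint32_t Q = clmul(C >> 8, I) >> 8;          /* Q = (upper half of C * I) >> 8    */
    uint32_t P = clmul(M, Q);                    /* P = M * Q                         */
    uint32_t R = (C ^ P) & 0xFF;                 /* R = lower 8 bits of (C xor P)     */

    uint32_t check;
    gf2_div(C, M, 8, &check);                    /* straightforward reduction to compare */
    printf("barrett-style: %02X, direct: %02X\n", (unsigned)R, (unsigned)check);
    return 0;
}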
optimal conditions/frequency which should trigger a modulo operation in the calculations
Standard practice is to replace all actual modulo operations with something else. So the frequency is never. There are different ways to accomplish that:
Choose the modulus to be a Mersenne prime or pseudo-Mersenne prime. There is a large repertoire of mathematical tricks to implement arithmetic modulo a (pseudo-)Mersenne prime efficiently, without doing any actual modulo operations. In the context of elliptic curves, the prime-modulus NIST curves are chosen this way and for this reason.
Use Barrett reduction. This has the same effect as a real modulo operation, but relies on some precomputation and a precondition on the range of the input to be able to reduce the cost of a modulo-like operation to the cost of a couple of multiplications (plus some supporting operations). Also applicable to polynomial fields.
Do arithmetic in Montgomery form.
Additionally, and perhaps more in the spirit of your question, a common technique is to do various additions without reducing every time (addition does not significantly change the size of a number). It takes a lot of additions before you need an extra limb in your integers, so a lot of them can be done before it starts to make sense to reduce. For multiplications, unless it's by a small constant it almost always makes sense to reduce immediately afterwards to prevent the numbers from getting much physically larger than they need to be (which would be especially bad if the result was fed into another multiplication).
Another technique especially associated with Barrett reductions is to work, most of the time, in a slightly larger range than [0 .. N), e.g. [0 .. 2N). This enables skipping the conditional subtraction that Barrett reduction needs in order to fully reduce to the range [0 .. N), while still using the most important part, the reduction from the range [0 .. N²) to the range [0 .. 2N).
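A minimal C sketch of the lazy-addition idea described above, assuming a modulus small enough (32-bit here) that a 64-bit accumulator leaves plenty of headroom; the names N and mul_reduce are just illustrative:

#include <stdint.h>
#include <stdio.h>

#define N 0xFFFFFFFBu   /* example 32-bit modulus (2^32 - 5, a prime) */

/* Multiplication: reduce right away so operands stay one 32-bit limb wide. */
static uint32_t mul_reduce(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) % N);
}

int main(void) {
    uint32_t xs[4] = { 0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0xFFFFFFFA };
    uint64_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += xs[i];                        /* additions: no reduction per step;  */
                                             /* ~2^32 of them fit before overflow  */
    uint32_t sum = (uint32_t)(acc % N);      /* one reduction at the end           */
    uint32_t prod = mul_reduce(sum, xs[0]);  /* multiplication: reduce immediately */
    printf("%08X %08X\n", (unsigned)sum, (unsigned)prod);
    return 0;
}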

Finding first duplicated element in linear time [duplicate]

There is an array of size n and the elements contained in the array are between 1 and n-1 such that each element occurs once and just one element occurs more than once. We need to find this element.
Though this is a very FAQ, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large. It will overflow. There have also been suggestions regarding the use of XOR gate dup = dup ^ arr[i] ^ i, which are not clear to me.
I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!
sum = 0
for i = 0 to n-1
begin
    diff = A[i] - i
    sum = sum + diff
end
After the loop, sum contains the duplicate element, but using this method I am unable to find out the index of the duplicate element. For that I need to traverse the array once more, which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method and works in O(n)?
There are many ways that you can think about this problem, depending on the constraints of your problem description.
If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:
XOR is associative, so (x ^ y) ^ z = x ^ (y ^ z)
XOR is commutative: x ^ y = y ^ x
XOR is its own inverse: x ^ y = 0 iff x = y
XOR has zero as an identity: x ^ 0 = x
Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR a value with itself you get back zero, and property (4) means that if you XOR anything with 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and the combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.
To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which ends up being all the numbers from 1 to (n-1) except the duplicate. Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This then makes all numbers in the range 1 to (n-1) that were not previously canceled out cancel out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and only uses O(1) space, since the XOR of all the values fits into a single integer.
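A short C sketch of exactly that XOR procedure (the array layout matches the problem statement: length n, values 1..n-1, one value repeated):

#include <stddef.h>
#include <stdio.h>

static int find_duplicate_xor(const int *a, size_t n) {
    int x = 0;
    for (size_t i = 0; i < n; i++)
        x ^= a[i];              /* XOR of the list: all odd-multiplicity values */
    for (int v = 1; v < (int)n; v++)
        x ^= v;                 /* XOR in 1..n-1: everything cancels except the duplicate */
    return x;
}

int main(void) {
    int a[] = { 3, 1, 4, 2, 3 };                                    /* n = 5, values 1..4, 3 repeated */
    printf("%d\n", find_duplicate_xor(a, sizeof a / sizeof a[0]));  /* prints 3 */
    return 0;
}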
In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that this would cause an overflow, but (on most machines) this is not a problem because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 2^32. More generally, though, let's say that integers are stored modulo k for some fixed k.
Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡ y (mod k)) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:
If x ≡ y (mod k) and w ≡ z (mod k), then x + w ≡ y + z (mod k).
If x ≡ y (mod k) and w ≡ z (mod k), then xw ≡ yz (mod k).
This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)
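And a matching sketch of the sum-based variant, leaning on the modular-arithmetic point above: unsigned overflow in C is defined to wrap, so the running sums may overflow freely and the final difference is still correct modulo 2^64 (and the duplicate, being smaller than n, is recovered exactly):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t find_duplicate_sum(const int *a, size_t n) {
    uint64_t sum = 0, expected = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint64_t)a[i];  /* may wrap; wrapping is just arithmetic modulo 2^64 */
    for (uint64_t v = 1; v < n; v++)
        expected += v;          /* sum of 1..n-1, also free to wrap */
    return sum - expected;      /* difference modulo 2^64 equals the duplicate */
}

int main(void) {
    int a[] = { 3, 1, 4, 2, 3 };
    printf("%llu\n", (unsigned long long)find_duplicate_sum(a, 5));  /* prints 3 */
    return 0;
}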
If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.
If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle-finding by treating the array as a function from the numbers 1-n onto 1-(n-1) and then looking for two inputs to that function that produce the same output. However, the resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.
For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.
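For completeness, here is a compact hedged C sketch of that cycle-finding approach for this exact setup (length n, values 1..n-1); since no value is 0, index 0 lies outside the cycle, which is what makes the cycle entry the duplicate:

#include <stdio.h>

static int find_duplicate_floyd(const int *a) {
    int slow = 0, fast = 0;
    do {                        /* phase 1: find a meeting point inside the cycle */
        slow = a[slow];
        fast = a[a[fast]];
    } while (slow != fast);
    fast = 0;                   /* phase 2: advance both one step at a time;    */
    while (slow != fast) {      /* they meet at the cycle entry = the duplicate */
        slow = a[slow];
        fast = a[fast];
    }
    return slow;
}

int main(void) {
    int a[] = { 3, 1, 4, 2, 3 };
    printf("%d\n", find_duplicate_floyd(a));  /* prints 3 */
    return 0;
}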
Hope this helps!
Adding the elements is perfectly fine; you just have to take the mod (%) of the intermediate aggregate when calculating both the sum of the elements and the expected sum. For the mod operation you can use something like 2n. You also have to fix the value after the subtraction (add the modulus back if the difference comes out negative).

How do you find the largest subset of an array of integers that xor to zero

To clarify, the largest subset of the array: [0,1,4,5,6,8] that xors to 0 would be [0,1,4,5] since 0^1^4^5=0 (where ^ is xor). I know this can be done in exponential time by brute force, but I'd like to know what the lower bound is, and what algorithm solves it in that time.
I'm trying to implement the rational sieve algorithm. Beyond the Wikipedia article, resources on the algorithm are fairly scarce. To complete the rational sieve you attempt to find a subset of a group of arrays, such that when adding up corresponding elements, the resulting array has only even numbers. For example:
[2,3,4,5]+[4,3,4,3]=[6,6,8,8] This would be a valid solution, provided that these arrays exist in the larger set.
According to that wikipedia article, this can be solved using linear algebra, but I don't know enough linear algebra to solve it.
For the purpose of the algorithm, an empty subset isn't useful.
I simplified the problem by saying that the arrays can only have 0s, and 1s, and by putting the array into a single number so that the sum can be computed with a single operator, but otherwise they are the same problem.
Yes, it can be formulated as a linear optimization problem. Assuming the integers are k bits and there are n of them, you can represent them as a k * n matrix A, where columns represent the integers, and row r of column i is the r-th bit of integer i.
Then the selection and xor-ing of integers can be represented as A * x, where x is a vector of size n that has 1s at the positions of the selected integers. This has to be over GF(2), so multiplication is the standard one and addition is XOR. So you are solving: maximize |x| subject to Ax = 0 (with x ≠ 0, since the empty subset is excluded).
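For a concrete (if toy-scale) picture of that GF(2) view, here is a C sketch that runs Gaussian elimination incrementally with 32-bit masks: every element that turns out to be linearly dependent yields a subset that XORs to zero, with a bookkeeping mask recording which inputs it uses. This only surfaces some zero-XOR subsets, one per dependent element, not the largest; for the example it finds {0} and {1,4,5}, which combine into the {0,1,4,5} from the question, and in general picking the largest combination requires a further search. All names here are illustrative.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int vals[] = { 0, 1, 4, 5, 6, 8 };
    int n = 6;
    uint32_t basis[32] = { 0 };       /* basis[b]: stored value whose leading bit is b   */
    uint32_t basis_set[32] = { 0 };   /* mask of original indices that XOR to that value */

    for (int i = 0; i < n; i++) {
        uint32_t v = (uint32_t)vals[i];
        uint32_t set = 1u << i;       /* subset currently XOR-ing to v */
        int inserted = 0;
        for (int b = 31; b >= 0 && v; b--) {
            if (!(v >> b & 1))
                continue;
            if (!basis[b]) {          /* new pivot: v is independent of what we have */
                basis[b] = v;
                basis_set[b] = set;
                inserted = 1;
                break;
            }
            v ^= basis[b];            /* cancel the leading bit using the pivot */
            set ^= basis_set[b];
        }
        if (!inserted) {              /* v reduced to zero: 'set' XORs to zero */
            printf("zero-XOR subset:");
            for (int j = 0; j < n; j++)
                if (set >> j & 1)
                    printf(" %d", vals[j]);
            printf("\n");
        }
    }
    return 0;
}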

Is it correct to use a table of interpolated prime-counting function `pi(x)` values as an upper bound for an array of primes?

Suppose I want to allocate an array of integers to store all the prime numbers less than some N. I would then need an estimate for the array size, E(N). There is a mathematical function that gives the exact number of primes below N: the prime-counting function pi(n). However, it looks impossible to define the function in terms of elementary functions.
There exist some approximations to the function, but all of them are asymptotic approximations, so they can be either above or below the true number of primes and cannot in general be used as the estimate E(N).
I've tried to use tabulated values of pi(n) for certain n, like powers of two, and interpolate between them. However, I noticed that the function pi(n) is concave (its growth keeps slowing down), so linear interpolation between sparse table points may yield values of E(n) below the true pi(n), which may result in a buffer overflow.
I then decided to exploit the monotonic nature of pi(n) and use the table value of pi(2^(n+1)) as a far upper estimate for E(2^n), and interpolate between these shifted values instead.
I'm still not completely sure that for some 2^n < X < 2^(n+1) an interpolation between pi(2^(n+1)) and pi(2^(n+2)) would be a safe upper estimate. Is it correct? How do I prove it?
You are overthinking this. In C, you just use malloc and realloc. I'd 100 times prefer an algorithm that just obviously works instead of one that requires a deep mathematical proof.
Use an upper bound. There are a number of them to choose from, each more complicated but tighter. I call this prime_count_upper(n) since you want a value guaranteed to be greater than or equal to the number of primes under n. See Chebyshev, Rosser and Schoenfeld, Dusart 1999, Dusart 2010, Axler 2014, and Büthe 2015. R&S is simple and not terrible: π(x) <= x/(log(x)-3/2) for x >= 67, but Dusart gives better ones for larger values. Either way, no tables or original research needed.
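A minimal C sketch of using that kind of bound as the allocation size; prime_count_upper is a hypothetical helper name, and the small-x branch simply falls back to pi(67) = 19 because the quoted inequality is only stated for x >= 67:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Upper bound on pi(n) via pi(x) <= x / (log(x) - 3/2) for x >= 67. */
static size_t prime_count_upper(double n) {
    if (n < 67.0)
        return 19;                               /* pi(67) = 19 covers every smaller n */
    return (size_t)ceil(n / (log(n) - 1.5));
}

int main(void) {
    double n = 1e8;
    size_t cap = prime_count_upper(n);           /* ~5.9e6, vs. pi(10^8) = 5761455 */
    int *primes = malloc(cap * sizeof *primes);  /* guaranteed large enough        */
    printf("room for %zu primes below %.0f\n", cap, n);
    free(primes);
    return 0;
}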
The prime number theorem guarantees the nth prime P(n) is in the range n log n < P(n) < n log n + n log log n for n > 5. As DanaJ suggests, tighter bounds can be computed.
If you want to store the primes in an array, you can't be talking about anything too big. As you suggest, there is no direct computation of pi(n) in terms of elementary arithmetic functions, but there are several methods for computing pi(n) exactly that aren't too hard, as long as n isn't too big. See this, for instance.

