Complexity of bin packing with defined function of bin weight - theory

I'm struggling with the following problem:
Given n integers, place them into m bins so that the total sum over all bins is minimized. The trick is that once numbers are placed in a bin, the total weight/cost/sum of that bin is computed in a non-standard way:
weight_of_bin = Sigma - k * X
where Sigma is the sum of the integers in the bin,
k is the number of integers in the bin,
and X is the number of prime divisors that the integers in the bin have in common.
In other words, by grouping together the numbers that have many prime divisors in common, and by placing different quantities of numbers in different bins, we can achieve some "savings" in the total sum.
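For example, placing 12, 18 and 30 in one bin gives Sigma = 60, k = 3 and X = 2 (the common prime divisors are 2 and 3), so that bin's weight is 60 - 3*2 = 54 instead of 60.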
I use a bin-packing formulation because I suspect the problem is NP-hard, but I have trouble finding a proof. I am not a number-theory person, and I am confused by the fact that the weight of a bin depends on the items that are in it.
Are there hardness results for this type of problem?
P.S. I only know that the numbers are integers. There is no explicit limit on the largest integer involved in the problem.
Thanks for any pointers you can give.

This is not a complete answer, but hopefully it gives you some things to think about.
First, by way of clarification: what do you know about the prime divisors of the integers? Finding all the prime divisors of the integers in the input to the problem is difficult enough as it is. Factorization isn't known to be NP-complete, but it also isn't known to be in P. If you don't already know the factorization of the inputs, that might be enough to make this problem "hard".
In general, I expect this problem to be at least as hard as bin packing. A simple argument to show this: it is possible that none of the given integers have any prime divisors in common (for example, if you are given a set of distinct primes). In that case the problem reduces to standard bin packing, since the weight of each bin is just the ordinary sum. If you have a guarantee about how many common divisors there may be, you may possibly do better, but probably not in general.
There is a variant of bin packing, called VM packing (named after the problem of packing virtual machines by their memory requirements), where objects are allowed to share space (such as shared virtual memory pages). Your objective function, where you subtract a term based on "shared" prime divisors, reminds me of that. Even in the case of VM packing, the problem is NP-hard. If the sharing has a nice hierarchy, good approximation algorithms exist, but they are still only approximations.

Related

Determine if a given integer number is an element of the Fibonacci sequence in C without using float

I recently had an interview, which I failed; in the end I was told I did not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers or operations on them. Therefore double and float cannot be used.
The task was to develop a C routine for this architecture that takes one integer and returns whether or not it is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means that even if I simulate very large integers, I can't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
(5n ^ 2) + 4
or
(5n ^ 2) − 4
is a perfect square. So I answered the question: it is simple, since the routine only has to determine whether or not that is the case.
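(For example, for n = 8, (5 * 8 ^ 2) + 4 = 324 = 18 ^ 2, so 8 passes the test; 8 is indeed a Fibonacci number.)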
They then responded: on the target architecture no floating-point-like operations are supported, so no square roots can be obtained via the standard library's sqrt function. They also mentioned that basic operations like division and modulus may not work either, because of the architecture's limitations.
Then I said, okay, we could build an array of the square numbers up to 256, iterate through it, and compare the entries to the numbers given by the formulas above. They said this was a bad approach, even if it would work, so they did not accept that answer either.
Finally I gave up, since I had no other ideas. I asked what the solution would be; they said they would not tell me, but advised me to look for it myself. My first approach (the two formulas) should be the key, but the square root has to be computed some alternative way.
I googled a lot at home, but never found any "alternative" square-root algorithms; everywhere, floating-point numbers were permitted.
For operations like division and modulus, so-called integer division may be used. But what can be used for the square root?
Even though I failed the interview test, this is a very interesting topic for me: working on architectures where no floating-point operations are available.
Therefore my questions:
How can floating-point numbers be simulated if only integers may be used?
What would be a possible solution in C for the problem above? Code examples are welcome.
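One well-known float-free building block (not necessarily what the interviewers had in mind) is the classic bit-by-bit integer square root; a minimal sketch, assuming the operand fits in an unsigned 64-bit type:

#include <stdint.h>

/* floor(sqrt(x)) using only shifts, additions and comparisons --
   no floating point, no division, no multiplication. */
uint32_t isqrt64(uint64_t x)
{
    uint64_t res = 0;
    uint64_t bit = 1ULL << 62;   /* highest power of 4 representable in 64 bits */
    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

A perfect-square test is then uint64_t r = isqrt64(v); followed by checking r * r == v.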
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 52 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimation is close enough, we could safely do the verification modulo 2³¹.
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you come up with even as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible you are when presented with new information or new requirements.
So what other ways are there to know if n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involves numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 loops are necessary. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1k bytes you are allowed.
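A minimal sketch of that loop, assuming the input is an unsigned 32-bit value (the 64-bit accumulators simply keep the sum from wrapping on the final step):

#include <stdint.h>

/* Returns 1 if n is a Fibonacci number, 0 otherwise. */
int is_fibonacci(uint32_t n)
{
    uint64_t a = 0, b = 1;    /* Fib(0), Fib(1) */
    while (a < n) {           /* at most ~47 iterations for any 32-bit n */
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a == n;            /* true only if we landed exactly on n */
}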
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate’s approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion to iterate through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using a find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.
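A sketch of that idea, with the table built once at start-up (the names and the 48-entry size, covering Fib(0) through Fib(47), the largest Fibonacci number that fits in an unsigned 32-bit word, are my own choices):

#include <stdint.h>

#define FIB_COUNT 48   /* Fib(0)..Fib(47) are exactly the Fibonacci numbers that fit in 32 bits */

static uint32_t fib_table[FIB_COUNT];

static void init_fib_table(void)
{
    uint64_t a = 0, b = 1;
    for (int i = 0; i < FIB_COUNT; i++) {
        fib_table[i] = (uint32_t)a;
        uint64_t next = a + b;
        a = b;
        b = next;
    }
}

/* Ordinary binary search over the sorted table. */
int is_fibonacci(uint32_t n)
{
    int lo = 0, hi = FIB_COUNT - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (fib_table[mid] == n) return 1;
        if (fib_table[mid] < n)  lo = mid + 1;
        else                     hi = mid - 1;
    }
    return 0;
}

init_fib_table() must be called once before the first query; the table itself occupies only 192 bytes, far below the 1K budget.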

c++ random number generator : std::rand() vs RANMAR : smallest double

I recently discovered something that bugs me...
I use RANMAR algorithm to generate random number. It is said that it is the best algorithm currently available (if you know a better one, please let me know).
I was really surprised to notice that the smallest double it can generate is roughly 1e-8.
So I tried std::rand() with the common
(double)rand() / RAND_MAX;
way of generating a double, and I noticed that the smallest number is roughly 1e-9. I kind of understand why in this case, because 1.0/RAND_MAX is roughly 1.0/2^31 ~ 1e-9 (on my computer; I know that RAND_MAX can have different values).
I was therefore wondering if it is possible to generate a random double in [0,1] with the smallest possible value being near machine precision.
[edit]
I just want to be more precise: when I said that the smallest number generated was of the order of 1e-9, I should also have said that the next one down is 0. So there is a huge gap (infinitely many representable values) between 1e-9 and 0 that will all be produced as 0. What I mean is that if you do the following test
double x = /* a value computed somehow in your code that is small, ~1e-12 */;
if (x > rand()) { /* will be true for all random numbers generated that are below 1e-9 */ }
So the condition will be true for too many numbers...
[/edit]
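One common workaround is to build the double from more random bits, so the grid spacing becomes 2^-53 rather than 1/RAND_MAX. A rough sketch, assuming rand() yields at least 15 usable bits per call (the C standard only guarantees RAND_MAX >= 32767):

#include <stdlib.h>

/* Uniform double in [0,1) with 2^-53 granularity: gather 60 random bits
   from four 15-bit draws, keep the top 53, and scale by 2^-53. */
double uniform53(void)
{
    unsigned long long bits = 0;
    for (int i = 0; i < 4; i++)
        bits = (bits << 15) | (unsigned long long)(rand() & 0x7FFF);
    bits >>= 7;                                /* 60 bits -> 53 bits */
    return (double)bits / 9007199254740992.0;  /* divide by 2^53 */
}

The smallest nonzero value is then 2^-53 (about 1.1e-16), so the gap below it shrinks to roughly machine epsilon; anything smaller still comes out as 0.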

Bin packing - exact np-hard exponential algorithm

I wrote a heuristic algorithm for the bin packing problem using a best-fit approach, with items S = (i1, ..., in) and bins of size T, and now I want to create a true exact exponential algorithm that computes the optimal solution (the minimum number of bins needed to pack all the items). But I have no idea how to check every possible packing. I'm writing it in C.
Can somebody give me ideas about which structs I should use? How can I test all the combinations of items? Does it have to be a recursive algorithm? Is there a book or article that might help me?
Sorry for my bad English.
The algorithm given will find one packing, usually one that is quite good, but not necessarily optimal, so it does not solve the problem.
For NP-complete problems, algorithms that solve them are usually easiest to describe recursively (iterative descriptions mostly end up making explicit all the book-keeping that is hidden by recursion). For bin packing, you may start with a minimum number of bins (the ceiling of the sum of the object sizes divided by the bin size, though you can even start with 1), try all combinations of assignments of objects to bins, check for each such assignment that it is legal (sum of bin content sizes <= bin size for each bin), return accepting (or output the found assignment) if it is, and increase the number of bins if no legal assignment was found.
You asked for structures; here is one idea: each bin should somehow describe the objects it contains (a list or array), and you need a list (or array) of all your bins. With these fairly simple structures, a recursive algorithm looks like this: to try out all possible assignments, you run a loop for each object that tries assigning it to each available bin. Either you wait for all objects to be assigned before checking legality, or (as a minor optimization) you only assign an object to the bins it fits in before going on to the next object. The recursion ends when the last object has been assigned; you go back to the previous object if no suitable bin is found, or (for the first object) increase the number of bins before trying again.
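A rough C sketch of that recursion (the function names and the MAX_ITEMS limit are my own choices; it returns the minimum number of bins by trying 1, 2, ... bins in turn):

#define MAX_ITEMS 32   /* assumes no more than this many items */

/* Try to place items[idx..n-1] into nbins bins of capacity cap,
   given the current loads in bin_load[].  Returns 1 on success. */
static int place(const int items[], int n, int idx,
                 int bin_load[], int nbins, int cap)
{
    if (idx == n)
        return 1;                              /* every item assigned legally */
    for (int b = 0; b < nbins; b++) {
        if (bin_load[b] + items[idx] <= cap) { /* only try bins the item fits in */
            bin_load[b] += items[idx];
            if (place(items, n, idx + 1, bin_load, nbins, cap))
                return 1;
            bin_load[b] -= items[idx];         /* backtrack */
        }
    }
    return 0;
}

int min_bins(const int items[], int n, int cap)
{
    for (int nbins = 1; nbins <= n; nbins++) { /* n bins always suffice if each item fits */
        int bin_load[MAX_ITEMS] = {0};
        if (place(items, n, 0, bin_load, nbins, cap))
            return nbins;
    }
    return -1;                                 /* some item is larger than cap */
}

For example, min_bins((int[]){4, 8, 1, 4, 2, 1}, 6, 10) returns 2.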
Hope this helps.

How to get pseudo-random uniformly distributed integers in C good enough for statistical simulation?

I'm writing a Monte Carlo simulation and am going to need a lot of random bits for generating integers uniformly distributed over {1,2,...,N} where N<40. The problem with using the C rand function is that I'd waste a lot of perfectly good bits using the standard rand % N technique. What's a better way for generating the integers?
I don't need cryptographically secure random numbers, but I don't want them to skew my results. Also, I don't consider downloading a batch of bits from random.org a solution.
rand % N does not work; it skews your results unless RAND_MAX + 1 is a multiple of N.
A correct approach is to figure out the largest multiple of N that is no greater than RAND_MAX + 1, then generate random numbers until you get one below that multiple. Only then should you apply the modulo operation. This gives you a worst-case rejection ratio of about 50%.
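A sketch of that approach, adapted to the {1, ..., N} range in the question (it may reject slightly more often than strictly necessary, but it is never biased):

#include <stdlib.h>

/* Uniform draw from {1, ..., N}, assuming 1 <= N <= RAND_MAX. */
int uniform_1_to_N(int N)
{
    int limit = RAND_MAX - (RAND_MAX % N);  /* a multiple of N */
    int r;
    do {
        r = rand();
    } while (r >= limit);                   /* reject the biased tail */
    return r % N + 1;
}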
in addition to oli's answer:
if you're desperately concerned about bits then you can manage a queue of bits by hand, only retrieving as many as are necessary for the next number (i.e. ceil(log2(N))).
but you should make sure that your generator is good enough. simple linear congruential generators are better in the higher bits than the lower (see comments), so your current modular division approach makes more sense there.
numerical recipes has a really good section on all this and is very easy to read (not sure it mentions saving bits, but as a general ref).
update if you're unsure whether it's needed or not, i would not worry about this for now (unless you have better advice from someone who understands your particular context).
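a rough sketch of the bit-queue idea (names are mine; it assumes rand() supplies at least 15 usable bits per call, and for n < 40 it pulls 6 bits per draw, rejecting values >= n):

#include <stdlib.h>

static unsigned long bitbuf = 0;   /* the low `bitcount` bits are valid random bits */
static int bitcount = 0;

static unsigned next_bits(int k)
{
    while (bitcount < k) {                    /* refill from rand(), 15 bits at a time */
        bitbuf = (bitbuf << 15) | (unsigned long)(rand() & 0x7FFF);
        bitcount += 15;
    }
    unsigned out = (unsigned)(bitbuf >> (bitcount - k)) & ((1u << k) - 1u);
    bitcount -= k;
    return out;
}

/* uniform draw from {1, ..., n} for n < 40: six bits per attempt */
int uniform_small(int n)
{
    unsigned v;
    do {
        v = next_bits(6);
    } while (v >= (unsigned)n);
    return (int)v + 1;
}

each attempt costs only six bits instead of a whole rand() call.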
Represent rand() in base 40 and take the digits as numbers. Drop any incomplete digits; that is, drop the first digit if it doesn't have the full range [0..39], and drop the whole random number if the first digit takes its highest possible value (e.g. if RAND_MAX in base 40 is 21 23 05 06, drop all numbers whose highest base-40 digit is 21).

Further speeding up of Sieve method of Eratosthenes to find primes

I saw this C code using the Sieve of Eratosthenes to find primes, but I cannot extend it to even larger integers (for example, to 1000000000 and beyond) because of the memory consumed by allocating such a large char array.
What would be the strategies to extend the code to larger numbers? Any references are also welcome.
Thanks.
The standard improvement to apply would be to treat each i-th bit as representing the number 2*i+1, thus representing odds only, cutting the size of the array in half. This would also entail, for each new prime p, starting the marking-off from p*p and incrementing by 2*p, to skip over evens. 2 itself is a special case. See also this question with a lot of answers.
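A sketch of that improvement, where bit i stands for the odd number 2*i+1 (names are mine; limit is assumed to be no more than about 10^9 so the index arithmetic stays within unsigned range):

#include <stdio.h>
#include <stdlib.h>

void sieve_odds(unsigned limit)
{
    unsigned nbits = limit / 2 + 1;               /* flags for 1, 3, 5, ..., <= limit */
    unsigned char *comp = calloc(nbits / 8 + 1, 1);
    if (!comp) return;

    if (limit >= 2)
        printf("2\n");                            /* 2 handled as a special case */
    for (unsigned i = 1; (2*i + 1) * (2*i + 1) <= limit; i++) {
        if (comp[i >> 3] & (1u << (i & 7)))
            continue;                             /* 2*i+1 already marked composite */
        unsigned p = 2*i + 1;
        for (unsigned j = (p * p) / 2; j < nbits; j += p)  /* start at p*p, step 2*p */
            comp[j >> 3] |= 1u << (j & 7);
    }
    for (unsigned i = 1; 2*i + 1 <= limit; i++)
        if (!(comp[i >> 3] & (1u << (i & 7))))
            printf("%u\n", 2*i + 1);
    free(comp);
}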
Another strategy is to switch to a segmented sieve. That way you only need about pi(sqrt(m)) ~ 2*sqrt(m)/log(m) memory (m being your upper limit) set aside for the initial sequence of primes, with which you then sieve smaller fixed-size arrays that sequentially represent segments of the number range. If you only need primes in some narrow far-away range [m-d, m], you'd skip directly to sieving that range after all the needed primes have been gathered, as shown e.g. in this answer.
Per your specifics, to get primes up to 10^9 in value, working with one contiguous array is still possible. Using a bitarray for odds only, you'd need 10^9/16 bytes, i.e. about 60 MB of memory. Easier to work by segments; we only need 3402 primes, below 31627, to sieve each segment array below 10^9.
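A rough sketch of such a segmented sieve (the segment size and all names are my own choices; it prints every prime up to limit):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define SEG 32768u   /* one segment: 32 KB of flags, re-used for every window */

void segmented_sieve(unsigned long long limit)
{
    unsigned root = (unsigned)sqrt((double)limit) + 1;

    /* 1. ordinary sieve up to sqrt(limit) to collect the base primes */
    unsigned char *is_comp = calloc(root + 1, 1);
    unsigned *primes = malloc((root + 1) * sizeof *primes);
    unsigned nprimes = 0;
    for (unsigned p = 2; p <= root; p++) {
        if (is_comp[p]) continue;
        primes[nprimes++] = p;
        for (unsigned long long q = (unsigned long long)p * p; q <= root; q += p)
            is_comp[q] = 1;
    }

    /* 2. sieve each window [low, low + SEG) with the base primes */
    unsigned char seg[SEG];
    for (unsigned long long low = 2; low <= limit; low += SEG) {
        unsigned long long high = low + SEG - 1;
        if (high > limit) high = limit;
        memset(seg, 0, sizeof seg);
        for (unsigned i = 0; i < nprimes; i++) {
            unsigned long long p = primes[i];
            unsigned long long start = p * p;
            if (start < low)                       /* first multiple of p in the window */
                start = (low + p - 1) / p * p;
            for (unsigned long long q = start; q <= high; q += p)
                seg[q - low] = 1;
        }
        for (unsigned long long n = low; n <= high; n++)
            if (!seg[n - low]) printf("%llu\n", n);
    }
    free(is_comp);
    free(primes);
}

For limit = 10^9 this keeps only the 32 KB segment plus a few thousand base primes in memory, in line with the figures above.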
Exactly because of the size of the array required, the Sieve of Eratosthenes becomes impractical at some point. A modified sieve is commonly used to find larger primes (as explained on Wikipedia).
You could use the GMP library. See "Speed up bitstring/bit operations in Python?" for a fast implementation of the Sieve of Eratosthenes. It should be relatively easy to translate the provided solutions to C.
