Say I have some array of length n where arr[k] represents how much of object k I want. I also have some arbitrary number of arrays which I can sum integer multiples of in any combination - my goal being to minimise the sum of the absolute differences across each element.
So as a dumb example, if my target was [2,1] and my options were A = [2,3] and B = [0,1], then I could take A - 2B and have a cost of 0.
I'm wondering if there is an efficient algorithm for approximating something like this? It has a weird knapsack-y flavour to it. Is it maybe just intractable for large n? It doesn't seem very amenable to DP methods.
This is the (NP-hard) closest vector problem. There's an algorithm due to Fincke and Pohst ("Improved methods for calculating vectors of short length in a lattice, including a complexity analysis"), but I haven't personally worked with it.
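To make the objective concrete, here is a minimal brute-force sketch of the problem from the question. It is not the Fincke-Pohst algorithm and it will not scale; the function name best_combination and the coefficient bound max_coeff are made up for illustration, and restricting the search to a small box of integer coefficients is an assumption.

```python
from itertools import product


def best_combination(target, options, max_coeff=3):
    """Try every integer coefficient vector in [-max_coeff, max_coeff]^m
    and return (cost, coeffs) with the smallest sum of absolute differences.
    Exponential in the number of options, so only usable on toy inputs."""
    best_cost, best_coeffs = None, None
    coeff_range = range(-max_coeff, max_coeff + 1)
    for coeffs in product(coeff_range, repeat=len(options)):
        # Integer combination of the option arrays, element by element.
        combo = [sum(c * opt[k] for c, opt in zip(coeffs, options))
                 for k in range(len(target))]
        cost = sum(abs(t - v) for t, v in zip(target, combo))
        if best_cost is None or cost < best_cost:
            best_cost, best_coeffs = cost, coeffs
    return best_cost, best_coeffs


# The example from the question: A - 2B hits the target exactly.
print(best_combination([2, 1], [[2, 3], [0, 1]]))  # (0, (1, -2))
```

The search space grows as (2 * max_coeff + 1)^m for m option arrays, which is one way to see why exact methods get expensive quickly.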
I'm studying the Ising model, and I'm trying to efficiently compute a function H(σ) where σ is the current state of an LxL lattice (that is, σ_ij ∈ {+1, -1} for i,j ∈ {1,2,...,L}). To compute H for a particular σ, I need to perform the following calculation:
where ⟨i j⟩ indicates that sites σ_i and σ_j are nearest neighbors and (suppose) J is a constant.
A couple of questions:
Should I store my state σ as an LxL matrix or as a list of length L^2? Is one better than the other for memory access in RAM (which I guess depends on the way I'm accessing elements...)?
In either case, how can I best compute H?
Really I think this boils down to how can I access (and manipulate) the neighbors of every state most efficiently.
Some thoughts:
I see that if I loop through each element in the list or matrix that I'll be double counting, so is there a "best" way to return the unique neighbors?
Is there a better data structure that I'm not thinking of?
Your question is a bit broad and a bit confusing for me, so excuse me if my answer is not the one you are looking for, but I hope it will help (a bit).
An array is faster than a list when it comes to indexing. A matrix is a 2D array, like this for example (where N and M are both L for you):
That means that you first access a[i] and then a[i][j].
However, you can avoid this double access by emulating a 2D array with a 1D array. In that case, if you want to access element a[i][j] of your matrix, you would now write a[i * L + j].
That way you load only once, at the cost of one multiplication and one addition to compute the index, and this may still be faster in some cases.
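A small sketch of that index mapping, assuming row-major order. The helper name at and the sample values are only for illustration; in Python itself the speed difference is unlikely to match what you would see in C or C++.

```python
L = 4

# Nested layout: a list of rows, accessed as grid[i][j].
grid = [[10 * i + j for j in range(L)] for i in range(L)]

# Flat row-major layout: row i occupies positions i*L .. i*L + L - 1.
flat = [grid[i][j] for i in range(L) for j in range(L)]


def at(flat, L, i, j):
    """Element (i, j) of an L x L matrix stored as a flat list."""
    return flat[i * L + j]


print(grid[2][3], at(flat, L, 2, 3))  # 23 23
```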
Now as for the Nearest Neighbor question, it seems that you are using a square-lattice Ising model, which means that you are working in 2 dimensions.
A very efficient data structure for Nearest Neighbor Search in low dimensions is the kd-tree. Constructing the tree takes O(n log n) time, where n is the size of your dataset.
Now you should think about whether it's worth building such a data structure.
PS: There is a plethora of libraries implementing the kd-tree, such as CGAL.
I encountered this problem during one of my school assignments and I think the solution depends on which programming language you are using.
In terms of efficiency, there is no better way than to write a for loop that sums the neighbours, which are the 4 points (i±1, j) and (i, j±1) for a given (i, j). However, when SIMD instructions (SSE etc.) are available, you can re-express this as a convolution with the 2D kernel {0 1 0; 1 0 1; 0 1 0}, so if you use a numerical library that exploits SIMD you can obtain a significant performance increase. You can see an example implementation of this here (https://github.com/zawlin/cs5340/blob/master/a1_code/denoiseIsingGibbs.py).
Note that in this case the performance improvement is huge because, to evaluate it in plain Python, I would need to write an expensive for loop.
In terms of work there is in fact some waste, namely the unnecessary multiplications and additions with the zeros at the corners and centre of the kernel. So whether you see a performance improvement depends quite a bit on your programming environment (if you are already in C/C++ it can be difficult, and you need to use MKL etc. to obtain a good improvement).
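On the double counting raised in the question, here is a rough pure-Python sketch. It assumes the usual form H(σ) = -J Σ_⟨i j⟩ σ_i σ_j and periodic boundaries, neither of which is spelled out in the thread, and the function names are made up. The first version visits only the right and down neighbour of each site, so every bond is counted once; the second sums all 4 neighbours (what the {0 1 0; 1 0 1; 0 1 0} kernel computes) and halves the result.

```python
def ising_energy(spins, J=1.0):
    """Energy with each nearest-neighbour bond counted exactly once
    (assumed: H = -J * sum of s_i * s_j, periodic boundaries, L >= 3)."""
    L = len(spins)
    bond_sum = 0
    for i in range(L):
        for j in range(L):
            right = spins[i][(j + 1) % L]
            down = spins[(i + 1) % L][j]
            bond_sum += spins[i][j] * (right + down)
    return -J * bond_sum


def ising_energy_all_neighbours(spins, J=1.0):
    """Same energy via the 4-neighbour sum; every bond is seen twice,
    so the total is divided by 2."""
    L = len(spins)
    total = 0
    for i in range(L):
        for j in range(L):
            neighbours = (spins[(i - 1) % L][j] + spins[(i + 1) % L][j]
                          + spins[i][(j - 1) % L] + spins[i][(j + 1) % L])
            total += spins[i][j] * neighbours
    return -J * total / 2


all_up = [[1] * 3 for _ in range(3)]
print(ising_energy(all_up))  # -18.0: 2 * L^2 bonds, all aligned
```

With a numerical library, the second form is the one that maps onto a convolution or onto shifted copies of the array.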
I was trying to sharpen my skills by solving the Codility problems. I reached this one: https://codility.com/programmers/lessons/9-maximum_slice_problem/max_double_slice_sum/
I actually theoretically understand the solution:
Use Kadane's Algorithm on the array and store the sum at every index.
Reverse the array and do the same.
Find a point where the sum of both is max by looping over both result sets one at a time.
The max is the max double slice.
My question is not so much about how to solve the problem. My question is about how one would come to imagine that this is the way the problem can be solved. There are at least 3 different concepts that need to be used:
The understanding that if all elements in the array are positive, or negative it is a different case than when there are some positive and negative elements in the array.
Kadane's Algorithm
Going over the array forward and reversed.
Despite all of this, Codility has tagged this problem as "Painless".
My question is: am I missing something? It seems unlikely that I would be able to solve this problem without knowing some of these concepts.
Is there a technique where I can start from scratch with very basic concepts and work my way up to the concepts required to solve this problem? Or is it that I am expected to know these concepts before even starting the problem?
How can I prepare myself to solve such problems in the future, where I don't know the required concepts?
I think you are overthinking the problem, that's why you find it more difficult than it is:
The understanding that if all elements in the array are positive, or negative it is a different case than when there are some positive and negative elements in the array.
It doesn't have to be a different case. You might be able to come up with an algorithm that doesn't care about this distinction and works anyway.
You don't need to start by understanding this distinction, so don't think about it until or even if you have to.
Kadane's Algorithm
Don't think of an algorithm, think of what the problem requires. Usually that 10+ paragraph problem statement can be expressed in much less.
So let's see how we can simplify the problem statement.
It first defines a slice as a triplet (x, y, z). Its sum is defined as the sum of the elements starting at x+1, ending at z-1 and not containing y.
Then it asks for the maximum sum slice. If we need the maximum slice, do we need x and z in the definition? We might as well let it start and end anywhere as long as it gets us the maximum sum, no?
So redefine a slice as a subset of the array that starts anywhere, goes up to some y-1, continues from y+1 and ends anywhere. Much simpler, isn't it?
Now you need the maximum such slice.
Now you might be thinking that you need, for each y, the maximum sum subarray that starts at y+1 and the maximum sum subarray that ends at y-1. If you can find these, you can update a global max for each y.
So how do you do this? This should now point you towards Kadane's algorithm, which does half of what you want: it computes the maximum sum subarray ending at some x. So if you compute it from both sides, for each y, you just have to find:
kadane(y - 1) + kadane_reverse(y + 1)
And compare with a global max.
No special cases for negatives and positives. No thinking "Kadane's!" as soon as you see the problem.
The idea is to simplify the requirement as much as possible without changing its meaning. Then you use your algorithmic and deductive skills to reach a solution. These skills are honed with time and experience.
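For reference, the kadane(y - 1) + kadane_reverse(y + 1) idea can be sketched like this. It is one possible reading, not the only way to write it: the function name is made up, and the running sums are clamped at 0 so that either half of the slice may be empty, which is what the problem's definition allows.

```python
def max_double_slice(a):
    """Maximum over y of (best sum ending at y-1) + (best sum starting at y+1),
    where the first and last elements can never be part of a slice."""
    n = len(a)
    ending = [0] * n    # ending[i]: best sum of a possibly empty run ending at i
    starting = [0] * n  # starting[i]: best sum of a possibly empty run starting at i
    for i in range(1, n - 1):
        ending[i] = max(0, ending[i - 1] + a[i])
    for i in range(n - 2, 0, -1):
        starting[i] = max(0, starting[i + 1] + a[i])
    best = 0
    for y in range(1, n - 1):
        best = max(best, ending[y - 1] + starting[y + 1])
    return best


print(max_double_slice([3, 2, 6, -1, 4, 5, -1, 2]))  # 17
```

Note that there is no branch for all-positive or all-negative input; the clamp at 0 handles both.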
I've been learning about different algorithms in my spare time recently, and one that I came across which appears to be very interesting is called the HyperLogLog algorithm - which estimates how many unique items are in a list.
This was particularly interesting to me because it brought me back to my MySQL days when I saw that "Cardinality" value (which I had always assumed, until recently, was calculated rather than estimated).
So I know how to write an algorithm in O(n) that will calculate how many unique items are in an array. I wrote this in JavaScript:
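function countUniqueAlgo1(arr) {
    // Values seen so far. A Set avoids the false hits a plain object
    // gives for inherited keys such as "toString" or "constructor".
    var Table = new Set();
    var numUnique = 0;
    var numDataPoints = arr.length;
    for (var j = 0; j < numDataPoints; j++) {
        var val = arr[j];
        if (Table.has(val)) {
            continue; // already counted
        }
        Table.add(val);
        numUnique++;
    }
    return numUnique;
}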
But the problem is that my algorithm, while O(n), uses a lot of memory (storing values in Table).
I've been reading this paper about how to count duplicates in a list in O(n) time and using minimal memory.
It explains that by hashing and counting bits or something one can estimate within a certain probability (assuming the list is evenly distributed) the number of unique items in a list.
I've read the paper, but I can't seem to understand it. Can someone give a more layperson's explanation? I know what hashes are, but I don't understand how they are used in this HyperLogLog algorithm.
The main trick behind this algorithm is that if you, observing a stream of random integers, see an integer whose binary representation starts with some known prefix, there is a higher chance that the cardinality of the stream is 2^(size of the prefix).
That is, in a random stream of integers, ~50% of the numbers (in binary) start with "1", 25% start with "01", and 12.5% start with "001". This means that if you observe a random stream and see a "001", there is a higher chance that this stream has a cardinality of 8.
(The prefix "00..1" has no special meaning. It's there just because it's easy to find the most significant bit in a binary number on most processors.)
Of course, if you observe just one integer, the chance that this value is wrong is high. That's why the algorithm divides the stream into "m" independent substreams and keeps the maximum length of a seen "00...1" prefix for each substream. It then estimates the final value by taking the mean of the substream estimates.
That's the main idea of this algorithm. There are some missing details (the correction for low estimate values, for example), but it's all well written in the paper. Sorry for the terrible English.
A HyperLogLog is a probabilistic data structure. It counts the number of distinct elements in a list. But in comparison to a straightforward way of doing it (having a set and adding elements to the set) it does this in an approximate way.
Before looking at how the HyperLogLog algorithm does this, one has to understand why you need it. The problem with the straightforward way is that it consumes O(distinct elements) of space. Why is there big O notation here instead of just distinct elements? This is because elements can be of different sizes. One element can be 1, another element "is this big string". So if you have a huge list (or a huge stream of elements) it will take a lot of memory.
Probabilistic Counting
How can one get a reasonable estimate of a number of unique elements? Assume that you have a string of length m which consists of {0, 1} with equal probability. What is the probability that it will start with 0, with 2 zeros, with k zeros? It is 1/2, 1/4 and 1/2^k. This means that if you have encountered a string starting with k zeros, you have approximately looked through 2^k elements. So this is a good starting point. Having a list of elements that are evenly distributed between 0 and 2^k - 1, you can track the longest prefix of zeros seen in the binary representations, and this will give you a reasonable estimate.
The problem is that the assumption of having evenly distributed numbers from 0 to 2^k - 1 is too hard to achieve (the data we encounter is mostly not numbers, almost never evenly distributed, and can lie between any values). But with a good hash function you can assume that the output bits are evenly distributed, and most hash functions have outputs between 0 and 2^k - 1 (SHA1 gives you values between 0 and 2^160 - 1). So what we have achieved so far is that we can estimate the number of unique elements, up to a maximum cardinality of 2^k, by storing only one number of about log(k) bits. The downside is that we have a huge variance in our estimate. A cool thing is that we have almost recreated 1984's probabilistic counting paper (it is a little bit smarter with the estimate, but still we are close).
LogLog
Before moving further, we have to understand why our first estimate is not that great. The reason is that one random occurrence of an element with a long 0-prefix can spoil everything. One way to improve it is to use many hash functions, track the max for each hash function, and in the end average them out. This is an excellent idea, which will improve the estimate, but the LogLog paper used a slightly different approach (probably because hashing is kind of expensive).
They used one hash but divided it into two parts. One is called a bucket (the total number of buckets is 2^x) and the other is basically the same as our hash. It was hard for me to get what was going on, so I will give an example. Assume you have two elements and your hash function, which gives values from 0 to 2^10, produced 2 values: 344 and 387. You decided to have 16 buckets. So you have:
0101 011000 bucket 5 will store 1
0110 000011 bucket 6 will store 4
By having more buckets you decrease the variance (you use slightly more space, but it is still tiny). Using math skills they were able to quantify the error (which is 1.3/sqrt(number of buckets)).
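The bucket example above can be reproduced with a few bit operations. This is only a sketch of the splitting step; the function name and the default widths (10-bit hash, 4 bucket bits for 16 buckets) are taken from the example, not from the paper.

```python
def bucket_and_zeros(h, total_bits=10, bucket_bits=4):
    """Split a hash value into (bucket index, leading zeros of the rest).
    The top bucket_bits bits pick the bucket; the remaining bits are
    scanned for their run of leading zeros."""
    rest_bits = total_bits - bucket_bits
    bucket = h >> rest_bits
    rest = h & ((1 << rest_bits) - 1)
    leading_zeros = rest_bits - rest.bit_length()
    return bucket, leading_zeros


print(bucket_and_zeros(344))  # (5, 1): 0101 | 011000
print(bucket_and_zeros(387))  # (6, 4): 0110 | 000011
```

Each bucket then keeps only the largest value it has seen, which is why the memory use stays tiny.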
HyperLogLog
HyperLogLog does not introduce any new ideas, but mostly uses a lot of math to improve the previous estimate. The LogLog researchers had found that discarding the largest 30% of the bucket values significantly improves the estimate (the SuperLogLog variant); HyperLogLog instead changes how the bucket values are averaged, using a harmonic mean. The paper is math-heavy.
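Putting the pieces together, here is a stripped-down sketch of a HyperLogLog-style estimate. Several things are assumptions made for brevity: SHA1 truncated to 64 bits as the hash, 1024 buckets, and no small-range or large-range corrections, so it is only reasonable when the true count is well above the number of buckets. The constant 0.7213 / (1 + 1.079 / m) is the bias correction the paper gives for m >= 128 buckets.

```python
import hashlib


def hll_estimate(items, bucket_bits=10):
    """Raw HyperLogLog estimate: harmonic mean of 2^register over all buckets."""
    m = 1 << bucket_bits              # number of buckets
    rest_bits = 64 - bucket_bits      # bits left after choosing the bucket
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # first 64 bits of the hash
        bucket = h >> rest_bits
        rest = h & ((1 << rest_bits) - 1)
        # Position of the first 1 bit in the rest = leading zeros + 1.
        rank = rest_bits - rest.bit_length() + 1
        registers[bucket] = max(registers[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in registers)


# 50,000 distinct values, each seen twice; duplicates do not change the registers.
stream = list(range(50000)) * 2
print(round(hll_estimate(stream)))
```

The whole state is the registers list, 1024 small integers here, regardless of how long the stream is.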
And I want to finish with a recent paper, which shows an improved version of the HyperLogLog algorithm (I haven't yet had time to fully understand it, but maybe later I will improve this answer).
The intuition is that if your input is a large set of random numbers (e.g. hashed values), they should be distributed evenly over a range. Let's say the range is 10 bits, representing values up to 1024. Then observe the minimum value. Let's say it is 10. Then the cardinality will be estimated to be about 100 (10 × 100 ≈ 1024).
Read the paper for the real logic of course.
Another good explanation with sample code can be found here:
Damn Cool Algorithms: Cardinality Estimation - Nick's Blog
I have a simple machine learning question:
I have n (~110) elements, and a matrix of all the pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to
Maximize:
Choose 10 different elements.
Return min distance over (all pairings within the 10).
My distance metric is symmetric and respects the triangle inequality.
What kind of algorithm can I use? My first instinct is to do the following:
Cluster the n elements into 20 clusters.
Replace each cluster with just the element of that cluster that is furthest from the mean element of the original n.
Use brute force to solve the problem on the remaining 20 candidates. Luckily, 20 choose 10 is only 184,756.
Edit: thanks to etarion's insightful comment, changed "Return sum of (distances)" to "Return min distance" in the optimization problem statement.
Here's how you might approach this combinatorial optimization problem by taking the convex relaxation.
Let D be an upper triangular matrix with your distances on the upper triangle. I.e. where i < j, D_i,j is the distance between elements i and j. (Presumably, you'll have zeros on the diagonal, as well.)
Then your objective is to maximize x'*D*x, where x is binary valued with 10 elements set to 1 and the rest to 0. (Setting the ith entry in x to 1 is analogous to selecting the ith element as one of your 10 elements.)
The "standard" convex optimization thing to do with a combinatorial problem like this is to relax the constraints such that x need not be discrete valued. Doing so gives us the following problem:
maximize y'*D*y
subject to: 0 <= y_i <= 1 for all i, 1'*y = 10
This is (morally) a quadratic program. (If we replace D with D + D', it'll become a bona fide quadratic program and the y you get out should be no different.) You can use an off-the-shelf QP solver, or just plug it in to the convex optimization solver of your choice (e.g. cvx).
The y you get out need not be (and probably won't be) a binary vector, but you can convert the scalar values to discrete ones in a bunch of ways. (The simplest is probably to let x be 1 in the 10 entries where y_i is highest, but you might need to do something a little more complicated.) In any case, y'*D*y with the y you get out does give you an upper bound for the optimal value of x'*D*x, so if the x you construct from y has x'*D*x very close to y'*D*y, you can be pretty happy with your approximation.
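The rounding step can be sketched without committing to any particular solver. Here y is assumed to come from whatever QP solver you used, the helper names are made up, and the numbers are invented purely to show the comparison between x'*D*x and y'*D*y.

```python
def round_top_k(y, k):
    """Binary vector with 1 in the k positions where y is largest."""
    top = sorted(range(len(y)), key=lambda i: y[i], reverse=True)[:k]
    x = [0] * len(y)
    for i in top:
        x[i] = 1
    return x


def quad_form(D, v):
    """v' * D * v for a square matrix D given as a list of rows."""
    n = len(v)
    return sum(v[i] * D[i][j] * v[j] for i in range(n) for j in range(n))


# Toy upper-triangular distance matrix for 3 elements, choosing k = 2.
D = [[0, 1, 5],
     [0, 0, 2],
     [0, 0, 0]]
y = [0.9, 0.2, 0.9]          # hypothetical relaxed solution, sums to k
x = round_top_k(y, 2)
print(x, quad_form(D, x))    # [1, 0, 1] 5
```

Comparing quad_form(D, x) against quad_form(D, y) for the y your solver returns is the closeness check described above.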
Let me know if any of this is unclear, notation or otherwise.
Nice question.
I'm not sure if it can be solved exactly in an efficient manner, and your clustering-based solution seems reasonable. Another direction to look at would be local search methods such as simulated annealing and hill climbing.
Here's an obvious baseline I would compare any other solution against:
Repeat 100 times:
Greedily select the datapoint whose removal decreases the objective function the least, and remove it.
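A naive sketch of that baseline, taking the objective to be the minimum pairwise distance among the points still kept (per the edited question). The function names are made up, and recomputing the objective from scratch for every candidate is wasteful; it is written for clarity, not speed.

```python
from itertools import combinations


def min_pairwise(dist, idx):
    """Smallest pairwise distance among the selected indices (needs >= 2)."""
    return min(dist[i][j] for i, j in combinations(idx, 2))


def greedy_removal(dist, k):
    """Repeatedly drop the point whose removal leaves the best objective
    until only k points remain. Ties go to the lowest index."""
    remaining = list(range(len(dist)))
    while len(remaining) > k:
        drop = max(
            remaining,
            key=lambda r: min_pairwise(dist, [i for i in remaining if i != r]),
        )
        remaining.remove(drop)
    return remaining


# Toy data: points on a line, distance = absolute difference.
points = [0, 1, 10, 11, 20]
dist = [[abs(p - q) for q in points] for p in points]
chosen = greedy_removal(dist, 3)
print(chosen, min_pairwise(dist, chosen))  # [1, 3, 4] 9
```

On this toy input the greedy result (minimum distance 9) is slightly worse than the optimum {0, 10, 20} (minimum distance 10), which is the kind of gap a better method should close.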
I have an array (arr) of elements, and a function (f) that takes 2 elements and returns a number.
I need a permutation of the array such that f(arr[i], arr[i+1]) is as small as possible for each i in arr (and it should loop, i.e. it should also minimize f(arr[arr.length - 1], arr[0])).
Also, f works sort of like a distance, so f(a,b) == f(b,a).
I don't need the optimum solution if it's too inefficient, but one that works reasonably well and is fast, since I need to calculate them pretty much in real time (I don't know what the length of arr is, but I think it could be something around 30).
What does "such that f(arr[i], arr[i+1]) is as little as possible for each i in arr" mean? Do you want to minimize the sum? Do you want to minimize the largest of those? Do you want to minimize f(arr[0],arr[1]) first, then among all solutions that minimize this, pick the one that minimizes f(arr[1],arr[2]), and so on?
If you want to minimize the sum, this is exactly the Traveling Salesman Problem in its full generality (well, "metric TSP", maybe, if your f's indeed form a metric). There are clever optimizations to the naive solution that will give you the exact optimum and run in reasonable time for about n=30; you could use one of those, or one of the heuristics that give you approximations.
If you want to minimize the maximum, it is a simpler problem although still NP-hard: you can do binary search on the answer; for a particular value d, draw edges for pairs which have f(x,y)
If you want to minimize it lexicographically, it's trivial: pick the pair with the shortest distance and put it as arr[0],arr[1], then pick arr[2] that is closest to arr[1], and so on.
Depending on where your f(,)s are coming from, this might be a much easier problem than TSP; it would be useful for you to mention that as well.
You're not entirely clear what you're optimizing - the sum of the f(a[i],a[i+1]) values, the max of them, or something else?
In any event, with your speed limitations, greedy is probably your best bet - pick an element to make a[0] (it doesn't matter which due to the wraparound), then choose each successive element a[i+1] to be the one that minimizes f(a[i],a[i+1]).
That's going to be O(n^2), but with 30 items, unless this is in an inner loop or something that will be fine. If your f() really is associative and commutative, then you might be able to do it in O(n log n). Clearly no faster by reduction to sorting.
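A minimal sketch of that greedy ordering. The function name is made up, f is whatever pairwise function you already have, and the example uses plain numbers with absolute difference only to have something to run.

```python
def greedy_cycle(items, f):
    """Start from the first item, then repeatedly append the unused item
    with the smallest f to the last one placed. O(n^2) calls to f."""
    if not items:
        return []
    order = [items[0]]
    remaining = list(items[1:])
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda x: f(last, x))
        remaining.remove(nxt)
        order.append(nxt)
    return order


arr = [0, 10, 1, 11, 2]
print(greedy_cycle(arr, lambda a, b: abs(a - b)))  # [0, 1, 2, 10, 11]
```

Nothing here controls the closing step f(order[-1], order[0]), so the wraparound pair can be expensive; that is the usual weakness of this heuristic.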
I don't think the problem is well-defined in this form:
Let's instead define n fcns g_i : Perms -> Reals
g_i(p) = f(a^p[i], a^p[i+1]), and wrap around when i+1 > n
To say you want to minimize f over all permutations really implies you can pick a value of i and minimize g_i over all permutations, but for any p which minimizes g_i, a related but different permutation minimizes g_j (just conjugate the permutation). So it makes no sense to speak of minimizing f over permutations for each i.
Unless we know something more about the structure of f(x,y) this is an NP-hard problem. Given a graph G and any vertices x,y let f(x,y) be 1 if there is no edge and 0 if there is an edge. What the problem asks is an ordering of the vertices so that the maximum f(arr[i],arr[i+1]) value is minimized. Since for this function it can only be 0 or 1, returning a 0 is equivalent to finding a Hamiltonian path in G and 1 is saying that no such path exists.
The function would have to have some sort of structure that disallows this example for it to be tractable.