Array Partitioning - arrays

I have a set of objects numbered from 1 to n, and there is only one instance of each object. Now say I have received k requests, each of which asks for some subset of the objects.
How should I process these requests so that the maximum number of requests is fulfilled? All the requests are known in advance.
Ex: say we have 10 apples numbered from 1 to 10 and three requests:
request 1: {2,6}
request 2: {1,2,3,4,5}
request 3: {6,7,8,9,10}
Here, if we process request 1 we can fulfill only 1 request, but if we process requests 2 and 3 we can fulfill 2 requests.
Please suggest an optimized algorithm.

The problem you describe is called maximum set packing: given a set (your array) and a collection of subsets (your requests), find the maximum number of subsets (requests) that are pairwise disjoint, i.e. no two of them have an element in common.
As shown in the Wikipedia article, the problem can be formulated as an integer linear program, which can be solved by a standard solver. Since those are highly optimized, that is probably as good as you can get. Like many packing problems, the problem is NP-hard, so no polynomial-time exact algorithm is known, and a hand-written exact solution will not do much better than brute force in the worst case.
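For small inputs, the brute-force approach can be sketched directly. The following is only a minimal illustration in Python, not an optimized solver; the function name max_set_packing is made up for this example, and the running time is exponential in the number of requests.

```python
from itertools import combinations

def max_set_packing(requests):
    """Return the 1-based numbers of a largest group of pairwise disjoint requests."""
    sets = [frozenset(r) for r in requests]
    # Try the largest group sizes first, so the first hit is optimal.
    for size in range(len(sets), 0, -1):
        for combo in combinations(range(len(sets)), size):
            used = set()
            disjoint = True
            for i in combo:
                if used & sets[i]:      # shares an object with an earlier pick
                    disjoint = False
                    break
                used |= sets[i]
            if disjoint:
                return [i + 1 for i in combo]
    return []

# The example from the question.
requests = [{2, 6}, {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
print(max_set_packing(requests))  # [2, 3]
```

For more than a few dozen requests this becomes impractical, which is where the integer linear program formulation comes in.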

Related

What exactly does concurrency mean in Apache Bench? [duplicate]

The benchmark documentation says concurrency is how many requests are done simultaneously, while number of requests is the total number of requests. What I'm wondering is, if I run 100 requests at a concurrency level of 20, does that mean 5 tests of 20 requests at the same time, or 100 tests of 20 requests at the same time each? I'm assuming the second option, because of the example numbers quoted below.
I'm wondering because I frequently see results such as this one on some testing blogs:
Complete requests: 1000000
Failed requests: 2617614
This seems implausible, since the number of failed requests is higher than the number of total requests.
Edit: the site that displays the aforementioned numbers: http://zgadzaj.com/benchmarking-nodejs-basic-performance-tests-against-apache-php
OR could it be that it keeps trying until it reaches one million successes? Hm...
It means a single test with a total of 100 requests, keeping 20 requests open at all times. I think the misconception you have is that requests all take the same amount of time, which is virtually never the case. Instead of issuing requests in batches of 20, ab simply starts with 20 requests and issues a new one each time an existing request finishes.
For example, testing with ab -n 10 -c 3 would start with 3 concurrent requests:
[1, 2, 3]
Let's say #2 finishes first; ab replaces it with a fourth:
[1, 4, 3]
... then #1 may finish, replaced by a fifth:
[5, 4, 3]
... Then #3 finishes:
[5, 4, 6]
... and so on, until a total of 10 requests have been made. (As requests 8, 9, and 10 complete, the concurrency tapers off to 0, of course.)
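The scheduling described above can be mimicked with a toy simulation. This is only a sketch in Python of the "keep N requests open" idea, not ab's actual code; the function name simulate and the request durations are invented for illustration.

```python
import heapq

def simulate(total, concurrency, durations):
    """Return (start_time, request_no) pairs for a keep-N-open scheduler."""
    starts = []
    in_flight = []          # min-heap of (finish_time, request_no)
    next_req = 1
    now = 0
    # Open the initial batch of `concurrency` requests at time 0.
    while next_req <= min(concurrency, total):
        heapq.heappush(in_flight, (now + durations[next_req - 1], next_req))
        starts.append((now, next_req))
        next_req += 1
    # Each time a request finishes, start the next one until `total` are issued.
    while in_flight:
        now, _finished = heapq.heappop(in_flight)
        if next_req <= total:
            heapq.heappush(in_flight, (now + durations[next_req - 1], next_req))
            starts.append((now, next_req))
            next_req += 1
    return starts

# Hypothetical durations: #2 finishes first, then #1, then #3.
for start, req in simulate(10, 3, [2, 1, 3, 5, 5, 5, 1, 1, 1, 1]):
    print(f"t={start}: start request #{req}")
```

With these made-up durations, request #4 starts as soon as #2 finishes, #5 when #1 finishes, and #6 when #3 finishes, matching the walkthrough.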
Make sense?
As to your question about why you see results with more failures than total requests... I don't know the answer to that. I can't say I've seen that. Can you post links or test cases that show this?
Update: In looking at the source, ab tracks four types of errors which are detailed below the "Failed requests: ..." line:
Connect - (err_conn in source) Incremented when ab fails to set up the HTTP connection
Receive - (err_recv in source) Incremented when a read of the connection fails
Length - (err_length in source) Incremented when the response length is different from the length of the first good response received.
Exceptions - (err_except in source) Incremented when ab sees an error while polling the connection socket (e.g. the connection is killed by the server?)
The logic around when these occur and how they are counted (and how the total bad count is tracked) is, of necessity, a bit complex. It looks like the current version of ab should only count a failure once per request, but perhaps the author of that article was using a prior version that was somehow counting more than one? That's my best guess.
If you're able to reproduce the behavior, definitely file a bug.
I see nothing wrong. Failed requests can increment more than one error each. That's how ab works.
There are various statically declared buffers of fixed length.
Combined with the lazy parsing of the command line arguments,
the response headers from the server and other external inputs,
this might bite you.
You might notice for example that the previous node results have a similar count for 3 of the error counters. Most probably, from the 100 000 requests made only 8409 failed and not 25227.
Receive: 8409, Length: 8409, Exceptions: 8409

Generating a unique id with 6 characters - handling the case when too many ids are already used

In my program you can book an item. This item has an id with 6 characters from 32 possible characters.
So there are 32^6 possible ids. Every id must be unique.
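func tryToAddItem() {
    let id = generateId()    // generate once, so the id that is checked is the id that is stored
    if !db.contains(id) {
        addItem(id)          // assumes addItem takes the new id
    } else {
        tryToAddItem()       // collision: try again with a fresh id
    }
}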
For example, say 90% of my ids are used. Then the probability that I call tryToAddItem 5 times is 0.9^5 * 100 = 59%, isn't it?
That is quite high. These are 5 database queries on a lot of data.
When the probability is that high I want to introduce a prefix, "A-xxxxxx".
What is a good condition for that? At what point will I need a prefix?
In my example 90% of the ids were used. What about the rest? Do I throw them away?
What about database performance when I call tryToAddItem 5 times? I could imagine that this is not best practice.
"For example, say 90% of my ids are used. Then the probability that I call tryToAddItem 5 times is 0.9^5 * 100 = 59%, isn't it?"
Not quite. Let's represent the number of calls you make with the random variable X, and let's call the probability of an id collision p. You want the probability that you make the call at most five times, or in general at most k times:
P(X≤k) = P(X=1) + P(X=2) + ... + P(X=k)
= (1-p) + (1-p)*p + (1-p)*p^2 +... + (1-p)*p^(k-1)
= (1-p)*(1 + p + p^2 + .. + p^(k-1))
If we expand this out all but two terms cancel and we get:
= 1- p^k
Which we want to be greater than some probability, x:
1 - p^k > x
Or with p in terms of k and x:
p < (1-x)^(1/k)
where you can adjust x and k for your specific needs.
If you want less than a 50% probability of needing more than 5 calls, then no more than (1-0.5)^(1/5) ≈ 87% of your ids should be taken.
First of all, make sure there is an index on the id column you are looking up. Then I would recommend thinking more in terms of setting a very low probability of a very bad event occurring. For example, maybe making more than 20 calls slows down the database for too long, so we'd like to set the probability of this occurring to <0.1%. Using the formula above with x = 0.999 and k = 20, we find that no more than about 70% of ids should be taken.
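The two numbers above can be checked with a few lines of Python. This is just the formula p < (1-x)^(1/k) evaluated numerically; the function names are made up for this sketch.

```python
def max_fill_fraction(x, k):
    """Largest collision probability p for which P(X <= k) = 1 - p**k exceeds x."""
    return (1.0 - x) ** (1.0 / k)

def prob_at_most(p, k):
    """Probability of needing at most k calls when a fraction p of ids is taken."""
    return 1.0 - p ** k

print(round(max_fill_fraction(0.5, 5), 3))     # 0.871
print(round(max_fill_fraction(0.999, 20), 3))  # 0.708
print(round(prob_at_most(0.9, 5), 3))          # 0.41
```

The last line covers the case from the question: with 90% of the ids taken, 0.9^5 ≈ 59% is the chance that the first five attempts all collide, so the chance of succeeding within five calls is about 41%.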
But you should also consider alternative solutions. Is remapping all ids to a larger space one time only a possibility?
Or if adding ids with prefixes is not a big deal then you could generate longer ids with prefixes for all new items going forward and not have to worry about collisions.
Thanks for the response. I searched for alternatives and want to show three possibilities.
First possibility: Create an UpcomingItemIdTable with 200 (more or less) valid itemIds. A task in the background can calculate them every minute (or what you need). So the action tryToAddItem will always get a valid itemId.
Second possibility:
"Is remapping all ids to a larger space one time only a possibility?"
In my case yes. I think for other problems the answer will be: it depends.
Third possibility: Try to generate an itemId, and when there is a collision try again.
Possible collision handling: Run some tests beforehand. Measure the time to generate itemIds when there are already 1,000, 10,000, 100,000, 1,000,000 etc. entries in the table. When the tryToAddItem method needs more than 100ms (or whatever you prefer), increase the length from 6 to 7, 8, 9 characters.
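The third possibility could look roughly like the sketch below (Python). Note that it swaps the time-based trigger for a simpler one: the id length grows after a fixed number of collisions in a row. The alphabet, the names generate_id and add_item, and the in-memory set standing in for the database table are all assumptions for illustration; in a real database the check and the insert would have to happen atomically, e.g. through a unique index.

```python
import secrets

# Hypothetical 32-character alphabet (letters without I and O, digits 2-9).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

def generate_id(length=6):
    """Return a random id of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def add_item(used_ids, length=6, max_tries=5):
    """Try up to max_tries random ids per length; grow the length on failure."""
    while True:
        for _ in range(max_tries):
            candidate = generate_id(length)
            if candidate not in used_ids:   # stand-in for the database lookup
                used_ids.add(candidate)
                return candidate
        length += 1                         # too many collisions: use longer ids

used = set()
print(add_item(used))   # a random 6-character id
```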
Some thoughts
every request must be atomic
create an index on itemId
Disadvantages for long UUIDs in API: See https://zalando.github.io/restful-api-guidelines/#144
less usable, because...
-cannot be memorized and easily communicated by humans
-harder to use in debugging and logging analysis
-less convenient for consumer facing usage
-quite long: readable representation requires 36 characters and comes with higher memory and bandwidth consumption
-not ordered along their creation history and no indication of used id volume
-may be in conflict with additional backward compatibility support of legacy ids
[...]
TL;DR: In my case every possibility works. As so often, it depends on the problem. Thanks for the input.

Which chunk size will yield the best performance using master-worker with MPI?

I'm using MPI to parallelize a program that solves the metric TSP problem. I have P processors and N cities to visit.
Each worker asks the master for work and receives a chunk, which is a range of permutations that it should check, and computes the minimum among them. I am optimizing this by pruning bad routes in advance.
There are (N-1)! routes in total to evaluate. Each worker gets a chunk with a number that represents the first route it has to check and also the last. In addition, the master sends it the most recent best result known, so it can easily prune bad routes in advance using a lower bound on their remaining part.
Each time a worker finds a result that is better than the global best, it asynchronously sends it to all other workers and to the master.
I'm not looking for a better solution; I'm just trying to determine which chunk size is best.
The best chunk size I've found so far is (n!)/(n/2)!, but it doesn't yield very good results.
Please help me understand which chunk size is best here. I'm trying to balance the amount of computation against the amount of communication.
thanks
This depends heavily on factors beyond your control: MPI implementation, total load on the machine, etc. However, I'd hazard a guess that it also heavily depends on how many worker processes there are. On that note, understand that MPI spawns processes, not threads.
Ultimately, as is often the case with most optimization questions, the answer is simply "test a lot of different settings and see which one is best". You may want to do this manually, or write a tester app that implements some sort of heuristic (e.g. a genetic algorithm).

Is a neural network's response guaranteed on training data?

I'm trying to train an ANN (I use this library: http://leenissen.dk/fann/ ) and the results are somewhat puzzling: basically, if I run the trained network on the same data used for training, the output is not what is specified in the training set, but some random number.
For example, the first entry in the training file is something like
88.757004 88.757004 104.487999 138.156006 100.556000 86.309998 86.788002
1
with the first line being the input values and the second line being the desired output neuron's value. But when I feed the exact same data to the trained network, I get different results on each training attempt, and they are quite different from 1, e.g.:
Max epochs 500000. Desired error: 0.0010000000.
Epochs 1. Current error: 0.0686412785. Bit fail 24.
Epochs 842. Current error: 0.0008697828. Bit fail 0.
my test result -4052122560819626000.000000
and then on another attempt:
Max epochs 500000. Desired error: 0.0010000000.
Epochs 1. Current error: 0.0610717005. Bit fail 24.
Epochs 472. Current error: 0.0009952184. Bit fail 0.
my test result -0.001642
I realize that the training set size may be inadequate (I only have about 100 input/output pairs so far), but shouldn't at least the training data trigger the right output value? The same code works fine for the "getting started" XOR function described on the FANN website (I've already used up my 1 link limit).
Short answer: No
Longer answer (but possibly not as correct):
1st: a training run only moves the weights of the neurons towards a position where they make the output match the training data. After some/many iterations the output should be close to the expected output. Iff the neural network is up to the task, which brings me to
2nd: Not every neural network works for every problem. For a single neuron it is pretty easy to come up with a simple function that cannot be approximated by a single neuron. Though not as easy to see, the same limit applies to every neural network. In such cases your results will very likely look like random numbers. Edit after comment: In many cases this can be fixed by adding neurons to the network.
3rd: actually the first point is a strength of a neural network, because it allows the network to handle outliers nicely.
4th: I blame 3 for my lacking understanding of music. It just doesn't fit my brain ;-)
No, if you get your ANN to work perfectly on the training data, you either have a really easy problem or you're overfitting.

Search image pattern

I need to write a program that does this: given an image (5*5 pixels), I have to count how many images like that exist in another image, composed of many other images. That is, I need to search for a given pattern in an image.
The language to use is C. I have to use parallel computing to search in the 4 angles (0°, 90°, 180° and 270°).
What is the best way to do that?
Seems straightforward.
Create 4 versions of the image rotated by 0°, 90°, 180°, and 270°.
Start four threads each with one version of the image.
For all positions from (0,0) to (width - 5, height - 5)
Compare the 25 pixels of the reference image with the 25 pixels at the current position
If they are equal enough using some metric, report the finding.
Use normalized correlation to determine a match of templates.
Daniel's solution is good for leveraging your multiple CPUs. He doesn't mention a quality metric, though, so I would like to suggest one that is very common in image processing.
I suggest using normalized correlation[1] as a comparison metric because it outputs a number from -1 to +1: 0 means no correlation, 1 is output if the two templates are identical, and -1 if the two templates are exactly opposite.
Once you compute the normalized correlation you can test to see if you have found the template by doing either a threshold test or a peak-to-average test[2].
[1 - footnote] How do you implement normalized correlation? It is pretty simple and only has two for loops. Once you have an implementation that is good enough you can verify your implementation by checking to see if the identical image gets you a 1.
[2 - footnote] You do the ratio of the max(array) / average(array_without_peak). Then threshold to make sure you have a good peak to average ratio.
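A minimal sketch of normalized correlation for two equally sized patches follows. It is written in Python for brevity, although the question asks for C, and it uses the common zero-mean variant; the function name and the choice to return 0 for a completely flat patch are assumptions made for this example.

```python
import math

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized 2D patches."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in flat_a) *
                    sum((y - mean_b) ** 2 for y in flat_b))
    if den == 0:
        return 0.0   # flat patch: correlation is undefined, treated as "no match" here
    return num / den

patch = [[1, 2], [3, 4]]
print(normalized_correlation(patch, patch))             # identical patches
print(normalized_correlation(patch, [[4, 3], [2, 1]]))  # exactly opposite patches
```

As the footnote suggests, comparing a patch with itself is a quick sanity check: the result should be 1, and an inverted copy should give -1.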
There's no need to create the additional three versions of the image, just address them differently or use something like the class I created here. Better still, just duplicate the 5x5 matrix and rotate those instead. You can then linearly scan the image for all rotations (which is a good thing).
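Rotating the small pattern instead of the image, and then scanning the image once for all rotations, could look like the following sketch. It is written in Python for brevity (the question asks for C), it is single-threaded, and it uses exact equality as the match test; the names rotate90 and count_matches are invented for this example.

```python
def rotate90(p):
    """Rotate a square matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]

def count_matches(image, pattern):
    """Count exact occurrences of the pattern in the image, per rotation angle."""
    n = len(pattern)
    rotations = {}
    current = [list(row) for row in pattern]
    for angle in (0, 90, 180, 270):
        rotations[angle] = current
        current = rotate90(current)
    counts = {angle: 0 for angle in rotations}
    # One linear scan over all n x n windows, testing every rotation per window.
    for y in range(len(image) - n + 1):
        for x in range(len(image[0]) - n + 1):
            window = [list(row[x:x + n]) for row in image[y:y + n]]
            for angle, rotated in rotations.items():
                if window == rotated:
                    counts[angle] += 1
    return counts

# Tiny 2x2 pattern for readability; the same code works for 5x5.
image = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 1]]
print(count_matches(image, [[1, 0], [0, 0]]))  # {0: 1, 90: 0, 180: 1, 270: 0}
```

A pattern that looks the same under rotation is counted once per matching angle, so the per-angle counts may need deduplicating depending on what "how many" should mean.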
This problem will not scale well for parallel processing since the bottleneck is certainly accessing the image data. Having multiple threads accessing the same data will slow it down, especially if the threads get 'out of sync', i.e. one thread gets further through the image than the other threads so that the other threads end up reloading the data the first thread has discarded.
So, the solution I think will be most efficient is to create four threads that scan 5 lines of the image, one thread per rotation. A fifth thread loads the image data one line at a time and passes the line to each of the four scanning threads, waiting for all four threads to complete, i.e. load one line of image, append to five line buffer, start the four scanning threads, wait for threads to end and repeat until all image lines are read.
5 * 5 = 25
25 bits fit in an integer.
Each image can be encoded as an array of 4 integers.
Iterate over your larger image (hopefully it is not too big),
pulling out all 5 * 5 sub-images, converting each to an integer and comparing.
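One way to read this suggestion is sketched below in Python. It assumes the pixels are 1-bit (0 or 1), which is what makes 25 pixels fit into 25 bits, and it assumes the 4 integers are the codes of the pattern's four rotations; neither assumption is stated in the answer, and all function names are made up.

```python
def encode(patch):
    """Pack a square patch of 0/1 pixels into one integer, row by row."""
    code = 0
    for row in patch:
        for bit in row:
            code = (code << 1) | bit
    return code

def rotate90(p):
    """Rotate a square matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]

def rotation_codes(pattern):
    """Return the integer codes of the pattern at 0, 90, 180 and 270 degrees."""
    codes = []
    current = [list(row) for row in pattern]
    for _ in range(4):
        codes.append(encode(current))
        current = rotate90(current)
    return codes

def count_encoded_matches(image, pattern):
    """Count windows whose code equals the code of any rotation of the pattern."""
    codes = set(rotation_codes(pattern))
    n = len(pattern)
    count = 0
    for y in range(len(image) - n + 1):
        for x in range(len(image[0]) - n + 1):
            window = [row[x:x + n] for row in image[y:y + n]]
            if encode(window) in codes:
                count += 1
    return count

# A 5x5 pattern with a single set pixel in the top-left corner.
pattern = [[1, 0, 0, 0, 0]] + [[0, 0, 0, 0, 0] for _ in range(4)]
print(rotation_codes(pattern))  # [16777216, 1048576, 1, 16]
```

Comparing a window against the pattern then costs one integer comparison per rotation instead of 25 pixel comparisons.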
