Can CUDA do argmax? - c

Question says it all;
Assuming each threads are doing something like
value=blockDim.x*blockIdx.x+threadIdx.x;
result=f(value);
where f is a device function, its easy enough to find the max result by adding an atomicMax() call, but how could you find out what the value was?

Does this make sense? Just add an if statement comparing the max result to the thread's result. If it matches, save the thread's value.
value=blockDim.x*blockIdx.x+threadIdx.x;
result=f(value);
atomicMax(max,result);
if result==*max:
max_value = value;
Or, perhaps you need to specify behavior if multiple threads have the max result... for example taking the lowest thread:
value=blockDim.x*blockIdx.x+threadIdx.x;
result=f(value);
atomicMax(max,result);
if result==*max:
atomicMin(max_value,value);
That said, if you are finding the max result out of every thread, you will want to use a reduction instead of atomicMax. If I understand correctly, the atomicMax function is basically going to execute serially, whereas a reduction will be largely in parallel. When you use a reduction, you can manually track the value along with the result - that's what I do. (Although perhaps the above if statement approach will work at the end of the reduction, too. I may have to try it in my code...)

Related

multiplication of all integers between two given variables using recursion

Yes this is homework. I am not asking for any easy answers, just help moving in the right direction. here is the assignment: "Create a function that receives two numbers: a and b. The function calculates and returns the multiplication of all the numbers between a and b. Create three versions of this function."
I created the function using a for loop and a while loop, but I am at a loss how to use recursion- the final part of the assignment.
Kudos for admitting this is a homework question. As such, while I won't give you the answer, I will give you a few pointers towards it.
When writing a recursive function, there are two key things to consider:
What stops the recursion, and
What happens until the recursion stops
In your case, where you have to calculate the product of a list of numbers, this works out as:
What should the function do when there is only 1 item in the list? (ie: when a and b are the same)
How can I multiply one element by the product of the rest of the list?
For extra credit, look up tail recursion and understand why it can help keep your memory usage down.
Does that give you enough of a start?
It's a simple instance of dynamic programming — you start with one problem and attempt to resolve it by breaking it into problems that are easier to solve and combining the results.
You can then usually attack these problems by working backwards: what's the most trivial case, that you could answer immediately? What would you do if the problem were a notch harder than that?
As you've explicitly been told to find a recursive solution, you can assume that you're looking for a method that can either directly return a result or else must call itself with modified parameters, and do something with that result to get its own.
Failing that, given that the question is slightly artificial, consider looking up how you could literally just implement a for loop using a recursive structure, then directly adapt your existing for loop. No great thought about the nature of breaking problems down, just looking at how to express your existing solution in a different way.
function recursiveMultiplication(num1, num2) {
if (num2 == num1) {
return num2;
}
return num2 * recursiveMultiplication(num1, num2 - 1);
}
console.log(recursiveMultiplication(5, 8));

Matlab: Return input value with the highest output from a custom function

I have a vector of numbers like this:
myVec= [ 1 2 3 4 5 6 7 8 ...]
and I have a custom function which takes the input of one number, performs an algorithm and returns another number.
cust(1)= 55, cust(2)= 497, cust(3)= 14, etc.
I want to be able to return the number in the first vector which yielded the highest outcome.
My current thought is to generate a second vector, outcomeVec, which contains the output from the custom function, and then find the index of that vector that has max(outcomeVec), then match that index to myVec. I am wondering, is there a more efficient way of doing this?
What you described is a good way to do it.
outcomeVec = myfunc(myVec);
[~,ndx] = max(outcomeVec);
myVec(ndx) % input that produces max output
Another option is to do it with a loop. This saves a little memory, but may be slower.
maxOutputValue = -Inf;
maxOutputNdx = NaN;
for ndx = 1:length(myVec)
output = myfunc(myVec(ndx));
if output > maxOutputValue
maxOutputValue = output;
maxOutputNdx = ndx;
end
end
myVec(maxOutputNdx) % input that produces max output
Those are pretty much your only options.
You could make it fancy by writing a general purpose function that takes in a function handle and an input array. That method would implement one of the techniques above and return the input value that produces the largest output.
Depending on the size of the range of discrete numbers you are searching over, you may find a solution with a golden section algorithm works more efficiently. I tried for instance to minimize the following:
bf = -21;
f =#(x) round(x-bf).^2;
within the range [-100 100] with a routine based on a script from the Mathworks file exchange. This specific file exchange script does not appear to implement the golden section correctly as it makes two function calls per iteration. After fixing this the number of calls required is reduced to 12, which certainly beats evaluating the function 200 times prior to a "dumb" call to min. The gains can quickly become dramatic. For instance, if the search region is [-100000 100000], golden finds the minimum in 25 function calls as opposed to 200000 - the dependence of the number of calls in golden section on the range is logarithmic, not linear.
So if the range is sufficiently large, other methods can definitely beat min by requiring less function calls. Minimization search routines sometimes incorporate such a search in early steps. However you will have a problem with convergence (termination) criteria, which you will have to modify so that the routine knows when to stop. The best option is probably to narrow the search region for application of min by starting out with a few iterations of golden section.
An important caveat is that golden section is guaranteed to work only with unimodal regions, that is, displaying a single minimum. In a region containing multiple minima it's likely to get stuck in one and may miss the global minimum. In that sense min is a sure bet.
Note also that the function in the example here rounds input x, whereas your function takes an integer input. This means you would have to place a wrapper around your function which rounds the input passed by the calling golden routine.
Others appear to have used genetic algorithms to perform such a search, although I did not research this.

Crossover function for genetic

I am writing a Time table generator in java, using AI approaches to satisfy the hard constraints and help find an optimal solution. So far I have implemented and Iterative construction (a most-constrained first heuristic) and Simulated Annealing, and I'm in the process of implementing a genetic algorithm.
Some info on the problem, and how I represent it then :
I have a set of events, rooms , features (that events require and rooms satisfy), students and slots
The problem consists in assigning to each event a slot and a room, such that no student is required to attend two events in one slot, all the rooms assigned fulfill the necessary requirements.
I have a grading function that for each set if assignments grades the soft constraint violations, thus the point is to minimize this.
The way I am implementing the GA is I start with a population generated by the iterative construction (which can leave events unassigned) and then do the normal steps: evaluate, select, cross, mutate and keep the best. Rinse and repeat.
My problem is that my solution appears to improve too little. No matter what I do, the populations tends to a random fitness and is stuck there. Note that this fitness always differ, but nevertheless a lower limit will appear.
I suspect that the problem is in my crossover function, and here is the logic behind it:
Two assignments are randomly chosen to be crossed. Lets call them assignments A and B. For all of B's events do the following procedure (the order B's events are selected is random):
Get the corresponding event in A and compare the assignment. 3 different situations might happen.
If only one of them is unassigned and if it is possible to replicate
the other assignment on the child, this assignment is chosen.
If both of them are assigned, but only one of them creates no
conflicts when assigning to the child, that one is chosen.
If both of them are assigned and none create conflict, on of
them is randomly chosen.
In any other case, the event is left unassigned.
This creates a child with some of the parent's assignments, some of the mother's, so it seems to me it is a valid function. Moreover, it does not break any hard constraints.
As for mutation, I am using the neighboring function of my SA to give me another assignment based on on of the children, and then replacing that child.
So again. With this setup, initial population of 100, the GA runs and always tends to stabilize at some random (high) fitness value. Can someone give me a pointer as to what could I possibly be doing wrong?
Thanks
Edit: Formatting and clear some things
I think GA only makes sense if part of the solution (part of the vector) has a significance as a stand alone part of the solution, so that the crossover function integrates valid parts of a solution between two solution vectors. Much like a certain part of a DNA sequence controls or affects a specific aspect of the individual - eye color is one gene for example. In this problem however the different parts of the solution vector affect each other making the crossover almost meaningless. This results (my guess) in the algorithm converging on a single solution rather quickly with the different crossovers and mutations having only a negative affect on the fitness.
I dont believe GA is the right tool for this problem.
If you could please provide the original problem statement, I will be able to give you a better solution. Here is my answer for the present moment.
A genetic algorithm is not the best tool to satisfy hard constraints. This is an assigment problem that can be solved using integer program, a special case of a linear program.
Linear programs allow users to minimize or maximize some goal modeled by an objective function (grading function). The objective function is defined by the sum of individual decisions (or decision variables) and the value or contribution to the objective function. Linear programs allow for your decision variables to be decimal values, but integer programs force the decision variables to be integer values.
So, what are your decisions? Your decisions are to assign students to slots. And these slots have features which events require and rooms satisfy.
In your case, you want to maximize the number of students that are assigned to a slot.
You also have constraints. In your case, a student may only attend at most one event.
The website below provides a good tutorial on how to model integer programs.
http://people.brunel.ac.uk/~mastjjb/jeb/or/moreip.html
For a java specific implementation, use the link below.
http://javailp.sourceforge.net/
SolverFactory factory = new SolverFactoryLpSolve(); // use lp_solve
factory.setParameter(Solver.VERBOSE, 0);
factory.setParameter(Solver.TIMEOUT, 100); // set timeout to 100 seconds
/**
* Constructing a Problem:
* Maximize: 143x+60y
* Subject to:
* 120x+210y <= 15000
* 110x+30y <= 4000
* x+y <= 75
*
* With x,y being integers
*
*/
Problem problem = new Problem();
Linear linear = new Linear();
linear.add(143, "x");
linear.add(60, "y");
problem.setObjective(linear, OptType.MAX);
linear = new Linear();
linear.add(120, "x");
linear.add(210, "y");
problem.add(linear, "<=", 15000);
linear = new Linear();
linear.add(110, "x");
linear.add(30, "y");
problem.add(linear, "<=", 4000);
linear = new Linear();
linear.add(1, "x");
linear.add(1, "y");
problem.add(linear, "<=", 75);
problem.setVarType("x", Integer.class);
problem.setVarType("y", Integer.class);
Solver solver = factory.get(); // you should use this solver only once for one problem
Result result = solver.solve(problem);
System.out.println(result);
/**
* Extend the problem with x <= 16 and solve it again
*/
problem.setVarUpperBound("x", 16);
solver = factory.get();
result = solver.solve(problem);
System.out.println(result);
// Results in the following output:
// Objective: 6266.0 {y=52, x=22}
// Objective: 5828.0 {y=59, x=16}
I would start by measuring what's going on directly. For example, what fraction of the assignments are falling under your "any other case" catch-all and therefore doing nothing?
Also, while we can't really tell from the information given, it doesn't seem any of your moves can do a "swap", which may be a problem. If a schedule is tightly constrained, then once you find something feasible, it's likely that you won't be able to just move a class from room A to room B, as room B will be in use. You'd need to consider ways of moving a class from A to B along with moving a class from B to A.
You can also sometimes improve things by allowing constraints to be violated. Instead of forbidding crossover from ever violating a constraint, you can allow it, but penalize the fitness in proportion to the "badness" of the violation.
Finally, it's possible that your other operators are the problem as well. If your selection and replacement operators are too aggressive, you can converge very quickly to something that's only slightly better than where you started. Once you converge, it's very difficult for mutations alone to kick you back out into a productive search.
I think there is nothing wrong with GA for this problem, some people just hate Genetic Algorithms no matter what.
Here is what I would check:
First you mention that your GA stabilizes at a random "High" fitness value, but isn't this a good thing? Does "high" fitness correspond to good or bad in your case? It is possible you are favoring "High" fitness in one part of your code and "Low" fitness in another thus causing the seemingly random result.
I think you want to be a bit more careful about the logic behind your crossover operation. Basically there are many situations for all 3 cases where making any of those choices would not cause an increase in fitness at all of the crossed-over individual, but you are still using a "resource" (an assignment that could potentially be used for another class/student/etc.) I realize that a GA traditionally will make assignments via crossover that cause worse behavior, but you are already performing a bit of computation in the crossover phase anyway, why not choose one that actually will improve fitness or maybe don't cross at all?
Optional Comment to Consider : Although your iterative construction approach is quite interesting, this may cause you to have an overly complex Gene representation that could be causing problems with your crossover. Is it possible to model a single individual solution as an array (or 2D array) of bits or integers? Even if the array turns out to be very long, it may be worth it use a more simple crossover procedure. I recommend Googling "ga gene representation time tabling" you may find an approach that you like more and can more easily scale to many individuals (100 is a rather small population size for a GA, but I understand you are still testing, also how many generations?).
One final note, I am not sure what language you are working in but if it is Java and you don't NEED to code the GA by hand I would recommend taking a look at ECJ. Maybe even if you have to code by hand, it could help you develop your representation or breeding pipeline.
Newcomers to GA can make any of a number of standard mistakes:
In general, when doing crossover, make sure that the child has some chance of inheriting that which made the parent or parents winner(s) in the first place. In other words, choose a genome representation where the "gene" fragments of the genome have meaningful mappings to the problem statement. A common mistake is to encode everything as a bitvector and then, in crossover, to split the bitvector at random places, splitting up the good thing the bitvector represented and thereby destroying the thing that made the individual float to the top as a good candidate. A vector of (limited) integers is likely to be a better choice, where integers can be replaced by mutation but not by crossover. Not preserving something (doesn't have to be 100%, but it has to be some aspect) of what made parents winners means you are essentially doing random search, which will perform no better than linear search.
In general, use much less mutation than you might think. Mutation is there mainly to keep some diversity in the population. If your initial population doesn't contain anything with a fractional advantage, then your population is too small for the problem at hand and a high mutation rate will, in general, not help.
In this specific case, your crossover function is too complicated. Do not ever put constraints aimed at keeping all solutions valid into the crossover. Instead the crossover function should be free to generate invalid solutions and it is the job of the goal function to somewhat (not totally) penalize the invalid solutions. If your GA works, then the final answers will not contain any invalid assignments, provided 100% valid assignments are at all possible. Insisting on validity in the crossover prevents valid solutions from taking shortcuts through invalid solutions to other and better valid solutions.
I would recommend anyone who thinks they have written a poorly performing GA to conduct the following test: Run the GA a few times, and note the number of generations it took to reach an acceptable result. Then replace the winner selection step and goal function (whatever you use - tournament, ranking, etc) with a random choice, and run it again. If you still converge roughly at the same speed as with the real evaluator/goal function then you didn't actually have a functioning GA. Many people who say GAs don't work have made some mistake in their code which means the GA converges as slowly as random search which is enough to turn anyone off from the technique.

Recursion within a thread

Is it a good idea to call a recursive function inside a thread ?
I am creating 10 threads, the thread function in turn call a recursive function . The bad part is
ThreadFunc( )
{
for( ;condn ; )
recursiveFunc(objectId);
}
bool recursiveFunc(objectId)
{
//Get a instance to the database connection
// Query for attibutes of this objectId
if ( attibutes satisfy some condition)
return true;
else
recursiveFunc(objectId) // thats the next level of objectId
}
The recursive function has some calls to the database
My guess is that a call to recursive function inside a loop is causing a performance degradation. Can anyone confirm
Calling a function recursively inside a thread is not a bad idea per se. The only thing you have to be aware of is to limit the recursion depth, or you may produce a (wait for it...) stack overflow. This is not specific to multithreading but applies in any case where you use recursion.
In this case, I would recommend against recursion because it's not necessary. Your code is an example of tail recursion, which can always be replaced with a loop. This eliminates the stack overflow concern:
bool recursiveFunc(objectId)
{
do
{
// Get an instance to the database connection
// Query for attributes of this objectId
// Update objectId if necessary (not sure what the "next level of objectId" is)
}
while(! attributes satisfy some condition);
return true;
}
There's no technical reason why this wouldn't work - it's perfectly legal.
Why is this code the "bad part"?
You'll need to debug/profile this and recursiveFunc to see where the performance degradation is.
Going by the code you've posted have you checked that condn is ever satisfied so that your loop terminates. If not it will loop for ever.
Also what does recursiveFunc actually do?
UPDATE
Based on your comment that each thread performs 15,000 iterations the first thing I'd do would be to move the Get an instance to the database connection code outside recursiveFunc so that you are only getting it once per thread.
Even if you rewrite into a loop (as per Martin B's answer) you would still want to do this.
It depends on how the recursive function talks to the database. If each (or many) level of recursion reopens the database that can be the reason for degradation. If they all share the same "connection" to the database the problem is not in recursion but in the number of threads concurrently accessing the database.
The only potential problem I see with the posted code is that it can represent an infinite loop, and that's usually not what you want (so you'd have to force break somewhere on known reachable conditions to avoid having to abend the application in order to break out of the thread (and subsequently the thread).
Performance degradation can happen with both threading, recursion, and database access for a variety of reasons.
Whether any or all of them are at fault for your problems is impossible to ascertain from the little you're showing us.

Most simple and fast method for audio activity detection?

Given is an array of 320 elements (int16), which represent an audio signal (16-bit LPCM) of 20 ms duration. I am looking for a most simple and very fast method which should decide whether this array contains active audio (like speech or music), but not noise or silence. I don't need a very high quality of the decision, but it must be very fast.
It occurred to me first to add all squares or absolute values of the elements and compare their sum with a threshold, but such a method is very slow on my system, even if it is O(n).
You're not going to get much faster than a sum-of-squares approach.
One optimization that you may not be doing so far is to use a running total. That is, in each time step, instead of summing the squares of the last n samples, keep a running total and update that with the square of the most recent sample. To avoid your running total from growing and growing over time, add an exponential decay. In pseudocode:
decay_constant=0.999; // Some suitable value smaller than 1
total=0;
for t=1,...
// Exponential decay
total=total*decay_constant;
// Add in latest sample
total+=current_sample;
if total>threshold
// do something
end
end
Of course, you'll have to tune the decay constant and threshold to suit your application. If this isn't fast enough to run in real time, you have a seriously underpowered DSP...
You might try calculating two simple "statistics" - first would be spread (max-min). Silence will have very low spread. Second would be variety - divide the range of possible values into say 16 brackets (= value range) and as you go through the elements, determine in which bracket that element goes. Noise will have similar numbers for all brackets, whereas music or speech should prefer some of them while neglecting others.
This should be possible to do in just one pass through the array and you do not need complicated arithmetics, just some addition and comparison of values.
Also consider some approximation, for example take only each fourth value, thus reducing the number of checked elements to 80. For audio signal, this should be okay.
I did something like this a while back. After some experimentation I arrived at a solution that worked sufficiently well in my case.
I used the rate of change in the cube of the running average over about 120ms. When there is silence (only noise that is) the expression should be hovering around zero. As soon as the rate starts increasing over a couple of runs, you probably have some action going on.
rate = cur_avg^3 - prev_avg^3
I used a cube because the square just wasn't agressive enough. If the cube is to slow for you, try using the square and a bitshift instead. Hope this helps.
It is clearly that the complexity should be at least O(n). Probably some simple algorithms that calculate some value range are good for the moment but I would look for Voice Activity Detection on web and for related code samples.

Resources