I'm trying to time a loop by using either gettimeofday or cudaEventRecord. However, they report very different results. Here's the pseudo code:
// get time here (start)
while (..)
{
...
}
// get time here (stop)
// calculate time
// time = (stop.tv_usec-start.tv_usec)*1.0e-3 + (stop.tv_sec - start.tv_sec); or
// cudaEventElapsedTime(&time,start,stop);
I don't use both of them at the same time; I use each separately, and the results are not the same. I also call cudaEventSynchronize(stop) when using CUDA events. Thanks.
I see a problem in the measurement units. I am not much of a CUDA programmer, but I can tell you about the gettimeofday function: it expresses time in seconds and microseconds, so the correct pseudocode line would be:
// time = (stop.tv_usec-start.tv_usec)*1.0e-6 + (stop.tv_sec - start.tv_sec);
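For completeness, a rough, untested sketch of timing the same region both ways, with both results converted to milliseconds, might look like this (the cudaEventSynchronize call also lets the host-side stop timestamp cover the finished GPU work):

#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

int main()
{
    cudaEvent_t evStart, evStop;
    cudaEventCreate(&evStart);
    cudaEventCreate(&evStop);

    timeval hostStart, hostStop;
    gettimeofday(&hostStart, NULL);
    cudaEventRecord(evStart, 0);

    // ... the loop being timed goes here ...

    cudaEventRecord(evStop, 0);
    cudaEventSynchronize(evStop);      // wait until the GPU has reached the stop event
    gettimeofday(&hostStop, NULL);

    float gpuMs = 0.0f;
    cudaEventElapsedTime(&gpuMs, evStart, evStop);   // already reported in milliseconds

    double hostMs = (hostStop.tv_sec  - hostStart.tv_sec)  * 1.0e3
                  + (hostStop.tv_usec - hostStart.tv_usec) * 1.0e-3;

    std::printf("event time: %f ms, gettimeofday time: %f ms\n", gpuMs, hostMs);

    cudaEventDestroy(evStart);
    cudaEventDestroy(evStop);
    return 0;
}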
There are CUDA-specific solutions given here: Timing CUDA operations.
I hope this helped.
I'm trying to detect tones in a phone audio signal (Busy and Ring, to be exact).
I used a Goertzel algorithm to detect one frequency in the signal.
I don't need to search for multiple frequencies; I only need to know whether the one I want is present or not (1/0) (this is before the call starts).
Separately, I wrote a pattern detector (on for 300 ms, off for 100 ms, on for 300 ms, off for 100 ms, for example). I get a percentage of similarity to my pattern and then decide whether I found it or not.
I worked with samples from a tone database website, but it seems to provide generated signals: too clean compared to the real sound you get from a phone.
My Goertzel filter gives something like this in reality:
When I run it on one sample, I get something like this:
https://i.stack.imgur.com/rZdgZ.png
How can I convert these results so that I get 1 when the frequency is detected and 0 when it is not?
So far, I tried this:
clean signal = (goertzel > 20000): this works, but I'm afraid the threshold value can change with different signals or different hardware.
I computed 2 Goertzel filters: g1 = goertzel(frq) and g2 = goertzel(frq-100), then result = (g1 > g2).
This does not always work: very often g1 = g2, and the offset of 100 may not always be appropriate.
g1 = goertzel(frq), g2 = goertzel(frq/2) and result = (g1 > g2). This is fine for detecting the frequency but not the silence.
In addition, I would prefer to avoid running the filter twice.
What do you suggest?
Thanks
Edit
I think I managed to get what I want. In real time:
I compute the average of the last 20 Goertzel magnitudes.
I update the max of this average.
The signal is considered found if avg > (max/2); a rough sketch of this logic is below.
On the screenshot below, the result is in gray:
https://i.stack.imgur.com/L432s.jpg
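For reference, a minimal sketch of that moving-average / running-max gate (my own illustration in C++; the real implementation is in the repository linked below):

#include <deque>

// Adaptive threshold on the Goertzel magnitude stream:
// average the last 20 magnitudes, track the running max of that
// average, and report a detection when avg > max/2.
class ToneGate {
public:
    bool update(double magnitude) {
        window.push_back(magnitude);
        if (window.size() > 20)
            window.pop_front();

        double sum = 0.0;
        for (double m : window)
            sum += m;
        double avg = sum / window.size();

        if (avg > maxAvg)
            maxAvg = avg;              // running maximum of the average

        return avg > maxAvg / 2.0;     // 1 = tone present, 0 = silence
    }

private:
    std::deque<double> window;
    double maxAvg = 0.0;
};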
Edit 2
source code:
https://github.com/nonprenom/tones_detector
According to many sources on the Internet, it's possible to get GPU usage (load) using D3DKMTQueryStatistics.
How to query GPU Usage in DirectX?
I've succeeded in getting memory information using the code from here with slight modifications:
http://processhacker.sourceforge.net/forums/viewtopic.php?t=325#p1338
However, I couldn't find a member of the D3DKMT_QUERYSTATISTICS structure that carries information about GPU usage.
Look at the EtpUpdateNodeInformation function in gpumon.c. It queries process statistics per GPU node; there can be several processing nodes per graphics card:
queryStatistics.Type = D3DKMT_QUERYSTATISTICS_PROCESS_NODE
...
totalRunningTime += queryStatistics.QueryResult.ProcessNodeInformation.RunningTime.QuadPart
...
PhUpdateDelta(&Block->GpuRunningTimeDelta, totalRunningTime);
...
block->GpuNodeUsage = (FLOAT)(block->GpuRunningTimeDelta.Delta / (elapsedTime * EtGpuNodeBitMapBitsSet));
It gathers the process running time across the nodes and divides it by the actual elapsed time span.
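To make the arithmetic explicit, here is a rough sketch (my own illustration, not Process Hacker code; GpuUsageFromSamples and its parameter names are hypothetical) of how the usage fraction falls out of two RunningTime samples:

#include <cstdint>

// Same idea as Process Hacker: the delta of the summed per-node
// QueryResult.ProcessNodeInformation.RunningTime.QuadPart (100 ns units)
// divided by the elapsed wall-clock time over the same interval,
// spread across the number of GPU nodes.
double GpuUsageFromSamples(uint64_t runningTimePrev, uint64_t runningTimeNow,
                           uint64_t elapsed100ns, unsigned nodeCount)
{
    if (elapsed100ns == 0 || nodeCount == 0)
        return 0.0;
    double busy = (double)(runningTimeNow - runningTimePrev);  // GPU time consumed
    return busy / ((double)elapsed100ns * nodeCount);          // fraction 0.0 .. 1.0
}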
I am using the "tick" event's delta property in EaselJS in order to create a simple timer in milliseconds. My ticker is set to 60 FPS. When the game is running I am getting roughly 16/17 ms between each tick (1000/60 = 16.6667) - so I am happy with this. However, when I append this value onto my text value (starting from 0) it is going up considerably quicker than it should be. I was expecting that on average it would be displaying a time of 1000 for each second elapsed. My code (in chunks) is below (game.js and gameInit.js are separate files). I am hoping that I am just overlooking something really simple...
//gameInit.js
createjs.Ticker.setFPS(60);
createjs.Ticker.on("tick", this.onTick, this);
...
//game.js
p.run = function (tickerEvent) {
if (this.gameStarted == true ) {
console.log("TICK ms since last tick = " + Math.floor(tickerEvent.delta)); // returns around 16/17
this.timerTextValue += Math.floor(tickerEvent.delta); //FIXME seems too fast!
this.timerText.text = this.timerTextValue;
}
};
Kind Regards,
Rich
Solved it. What a silly mistake! I had another place where I was initialising the ticker, meaning the handler was being invoked twice, hence the reason my timer was counting up twice as fast.
I'm currently testing different algorithms that determine whether an integer is a perfect square or not. During my research I found this question on SO:
Fastest way to determine if an integer's square root is an integer
I'm comparatively new to programming. When testing the different algorithms presented in the question, I found that this one
bool istQuadratSimple(int64 x)
{
int32 tst = (int32)sqrt(x);
return tst*tst == x;
}
actually runs faster than the one provided by A. Rex in the question I posted. I used an NSTimer object for this testing, printing my results with NSLog.
My question now is: How is speed-testing done in a professional way? How can I achieve equivalent results to the ones provided in the question I posted above?
The problem with calling just this function in a loop is that everything will be in the cache (both the data and the instructions). You wouldn't measure anything sensible; I wouldn't do that.
Given how small this function is, I would try to look at the generated assembly code of this function and the other one and I would try to reason based on the assembly code (number of instructions and the cost of the individual instructions, for example).
Unfortunately, it only works in trivial / near trivial cases. For example, if the assembly codes are identical then you know there is no difference, you don't need to measure anything. Or if one code is like the other plus additional instructions; in that case you know that the longer one takes longer to execute. And then there are the not so clear cases... :(
(See the update below.)
You can get the assembly with the -S flag from both clang and gcc (clang's -S -emit-llvm gives LLVM IR rather than machine assembly).
Hope this helps.
UPDATE: Response to Prateek's question in the comment "is there any way to determine the speed of one particular algorithm?"
Yes, it is possible, but it gets horribly complicated really quickly. Long story short: ignoring the complexity of modern processors and simply accumulating some predefined cost per instruction can lead to very inaccurate results (the estimate can be off by a factor of 100, due to the cache and the pipeline, among other things). If you try to take into consideration the complexity of modern processors, the hierarchical cache, the pipeline, etc., things get very difficult. See for example Worst Case Execution Time Prediction.
Unless you are in a clear situation (trivial / near trivial case), for example the generated assembly codes are identical or one is like the other plus a few instructions, it is also hard to compare algorithms based on their generated assembly.
However, here a simple function of two lines is shown, and for that, looking at the assembly could help. Hence my answer.
I am not sure if there is any professional way of checking the speed (if there is, let me know as well). For the method that you pointed to in your question, I would probably do something like this in Java.
package Programs;

import java.math.BigDecimal;
import java.math.RoundingMode;

public class SquareRootInteger {

    public static boolean isPerfectSquare(long n) {
        if (n < 0)
            return false;
        long tst = (long) (Math.sqrt(n) + 0.5);
        return tst * tst == n;
    }

    public static void main(String[] args) {
        long iterator = 1;
        int precision = 10;
        long startTime = System.nanoTime(); // Getting system time before calling the isPerfectSquare method repeatedly
        while (iterator < 1000000000) {
            isPerfectSquare(iterator);
            iterator++;
        }
        long endTime = System.nanoTime(); // Getting system time after the 1000000000 executions of isPerfectSquare method
        long duration = endTime - startTime;
        BigDecimal dur = new BigDecimal(duration);
        BigDecimal iter = new BigDecimal(iterator);
        System.out.println("Speed "
                + dur.divide(iter, precision, RoundingMode.HALF_UP).toString()
                + " nano secs"); // Getting average time taken for 1 execution of method.
    }
}
You can check your method in a similar fashion and see which one outperforms the other.
Record the time value before your massive calculation and the value after it. The difference is the execution time.
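For example, a rough C++ sketch of that idea (illustration only, using std::chrono rather than NSTimer):

#include <chrono>
#include <cstdio>

int main()
{
    auto start = std::chrono::steady_clock::now();

    // ... the massive calculation you want to measure ...

    auto stop = std::chrono::steady_clock::now();
    auto elapsedMs =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::printf("elapsed: %lld ms\n", (long long)elapsedMs);
    return 0;
}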
Write a shell script that runs the program, and run 'time ./xxx.sh' to get its running time.
I couldn't find any explanation for the following problem. I hope you can help me find the solution...
Let's make a new Windows application (using any version of VS) and add a button, a timer (with its Interval set to 10), and a label (with initial text = "0").
Write the following code in the timer's Tick handler:
label1.Text = (Convert.ToInt32(label1.Text) + 1).ToString();
Write the following code in the button's Click handler:
timer1.Enabled = true;
The label should show an incremental counter starting from 0.
Logically, every 100 counts should take 1 second, but that is not what happens.
What happens is that every 100 counts take a little more than 1 second.
What is the cause of this behavior?
Thank you very much for reading; I'm waiting for your reply because I really searched for an explanation but couldn't find anything.
If you are using System.Windows.Forms.Timer, it is limited to an accuracy of 55 ms.
The Windows Forms Timer component is single-threaded, and is limited to an accuracy of 55 milliseconds. If you require a multithreaded timer with greater accuracy, use the Timer class in the System.Timers namespace.
See the Remarks section: System.Windows.Forms.Timer