In a function, I want to check the absolute bound of a number as a condition for an operation. I want to test abs(r1) > 15, and right now I have an unoptimized way of doing it:
CMP r1, #15
ADDGT //operation
CMP r1, #-15
ADDLT //operation
Does anyone think there could be a faster way? I was thinking of right-shifting by 4, so that a value within +/-15 ends up as all 1s or all 0s, but I couldn't find a good way of doing it.
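One common way to fold a two-sided range test into a single comparison is to bias the value and compare it as unsigned. The sketch below shows the idea in C rather than ARM assembly; the function name abs_gt_15 is made up for illustration, and this is one possible approach, not the only one.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when |x| > 15, using a single unsigned comparison.
   Adding 15 maps the in-range values [-15, 15] onto [0, 30]; every other
   32-bit value lands above 30 once the sum is treated as unsigned. */
static bool abs_gt_15(int32_t x)
{
    return ((uint32_t)x + 15u) > 30u;
}
```

In ARM terms this should correspond to an ADD of #15 into a scratch register, a CMP against #30, and the unsigned condition HI on the conditional instruction, though that sequence is untested here.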
I'm computing the incremental mean of my input data (which is an array of 6 elements, so I'll end up with 6 means).
This is the code I run every time a new input array is available (obviously I also update the number of samples, etc.):
computing_mean:for(int i=0;i<6;i++){
temp_mean[i]=temp_mean[i] + (input[i]-temp_mean[i])/number_of_samples;
//Possible optimization?
//temp_mean[i]=temp_mean[i] + divide(input[i]-temp_mean[i],number_of_samples);
}
All the data in the code are arrays or single numbers of the following type:
typedef ap_fixed <36,24,AP_RND_CONV,AP_SAT> decimalNumber;
According to my synthesis report, this loop has a latency of 324 and an iteration latency of 54, caused mainly by the division operation.
Are there any ways I can improve the speed of the division? I tried using hls_math and the divide function, but it doesn't seem to work with my type of data.
EDIT 1: I'm including my performance profiler inside vivado HLS. I'll add a self-contained reproducible code later with another edit.
As you can see, the majority of the time is spent in SDIV
Other than trigonometric functions like sin() (FSIN = ~50-170 cycles) and cos() (FCOS = ~50-120 cycles), or things like sqrt() (FSQRT = ~22 cycles), division will always be the most painful.
FDIV is 15 cycles. FADD and FMUL are both 5.
There are occasions where you can skip division and do bit-shifting instead, if you're working with integer data and the number you're dividing by is a power of 2, but that's about it.
You can look up the approximate CPU cycle cost of any given instruction in tables like this. FDIV is an example of an expensive one.
That being said, one thing you could try is to compute the division factor in advance, then apply it using multiplication instead:
double inverse_n = 1.0 / number_of_samples; // 1.0, not 1: avoids integer division when number_of_samples is an integer
temp_mean[i]=temp_mean[i] + (input[i]-temp_mean[i]) * inverse_n;
I'm not sure that's saving a whole lot, but if you really do need to shave off cycles, it's worth a shot.
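Put together, the reciprocal idea might look like the sketch below. It uses plain double rather than the ap_fixed type from the question, and the names update_means and CHANNELS are invented for illustration. With a fixed-point type that has only 12 fractional bits, the stored reciprocal would be coarse for large sample counts, so treat this as a starting point rather than a drop-in replacement.

```c
#include <stddef.h>

#define CHANNELS 6 /* six means, as in the question */

/* Folds one new input array into the running means.
   number_of_samples includes the new sample, so it must be >= 1. */
void update_means(double temp_mean[CHANNELS], const double input[CHANNELS],
                  unsigned number_of_samples)
{
    /* One division per update instead of one per element. */
    const double inverse_n = 1.0 / (double)number_of_samples;

    for (size_t i = 0; i < CHANNELS; i++) {
        temp_mean[i] += (input[i] - temp_mean[i]) * inverse_n;
    }
}
```

The division is hoisted out of the loop, so the six updates share a single reciprocal and each element costs one subtraction, one multiplication, and one addition.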
Can someone explain to me how the distance for the quadratic split is being calculated in this example? If you could suggest more examples, that would be really helpful. Thank you.
I think the quadratic split distances shown treat the squares as 1x1, not 10x5. The idea is to find how much space would be wasted in a bounding box that covers the two rectangles. For example, a bounding box covering R1 and R2 would be 4x2, area 8. Subtract 2 for the combined area of R1 and R2, and the wasted space is 6. Choose as seeds the two rectangles you would least want together, i.e. the pair with the greatest wasted space.
This is explained in the original paper: http://pages.cs.wisc.edu/~cs764-1/rtree.pdf.
You can make up your own split algorithm. How good it is will affect the efficiency of the R-tree, but not the correctness.
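The wasted-space measure used to pick the seeds can be sketched in C as follows. The Rect layout, the function names, and any coordinates fed to it are assumptions for illustration; the figure from the question is not reproduced here.

```c
/* Axis-aligned rectangle with x0 <= x1 and y0 <= y1 (assumed layout). */
typedef struct { double x0, y0, x1, y1; } Rect;

static double rect_area(Rect r)
{
    return (r.x1 - r.x0) * (r.y1 - r.y0);
}

/* Smallest rectangle that covers both a and b. */
static Rect bounding_box(Rect a, Rect b)
{
    Rect j;
    j.x0 = a.x0 < b.x0 ? a.x0 : b.x0;
    j.y0 = a.y0 < b.y0 ? a.y0 : b.y0;
    j.x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    j.y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    return j;
}

/* Area of the bounding box not accounted for by the two rectangles. */
double wasted_area(Rect a, Rect b)
{
    return rect_area(bounding_box(a, b)) - rect_area(a) - rect_area(b);
}

/* Tries every pair and keeps the one with the greatest wasted area. */
void pick_seeds(const Rect *rects, int n, int *seed_a, int *seed_b)
{
    double worst = 0.0;
    int found = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            double d = wasted_area(rects[i], rects[j]);
            if (!found || d > worst) {
                worst = d;
                *seed_a = i;
                *seed_b = j;
                found = 1;
            }
        }
    }
}
```

For two unit squares whose bounding box is 4x2, such as (0,0)-(1,1) and (3,1)-(4,2), wasted_area gives 8 - 1 - 1 = 6, matching the arithmetic above.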
If I find it hard to remember XOR and inclusive OR, what is the easiest way to remember the logic and the truth tables?
XOR: One or the other, but not both
OR: One, or the other, or both
One way to think about it is that XOR (eXclusive-OR) is exclusively OR and not AND. Another way is that XOR is exclusive in that you can only pick one of the options: i.e. "You can't have your cake and eat it too."
Inclusive OR: "false if both values are false"
XOR: "false if both values are the same"
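If it helps to see both rules side by side, here is a small C sketch that prints the two truth tables. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* OR: one, or the other, or both. */
static bool inclusive_or(bool a, bool b) { return a || b; }

/* XOR: one or the other, but not both (true when the inputs differ). */
static bool exclusive_or(bool a, bool b) { return a != b; }

/* Prints one row per input combination. */
void print_truth_table(void)
{
    printf("a b | OR XOR\n");
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            printf("%d %d |  %d   %d\n", a, b,
                   inclusive_or(a, b), exclusive_or(a, b));
        }
    }
}
```

The two columns differ only in the last row, where both inputs are true: OR stays true and XOR becomes false.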
In C I have:
double balance;
void deposit(double amount)
{
    balance = balance + amount;
}
machine language:
load R1, balance
load R2, amount
add R1, R2
store R1, balance
If the variable balance contains 500 and two threads concurrently run the procedure to deposit 300 and 200 respectively, how can this be problematic? And how do I use a concurrency mechanism to make this procedure thread-safe?
Concurrency 101
Thread 1                Thread 2
load R1, balance
load R2, amount         load R1, balance
add R1, R2              load R2, amount
store R1, balance       add R1, R2
                        store R1, balance
The write by Thread 1 is lost. (There are many sequences that achieve approximately the same result.)
You fix it by locking balance so that only one thread or the other has access to it between the load and the store. Acquire a mutex on balance at the start of the sequence and release it at the end. Consider loading amount before loading balance to reduce the scope of the mutex to the minimum.
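A minimal sketch of the mutex approach, assuming POSIX threads are available. The names balance_lock, deposit_thread, and run_two_deposits are invented for this example; only deposit and balance come from the question.

```c
#include <pthread.h>
#include <stddef.h>

static double balance = 500.0;
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

void deposit(double amount)
{
    /* amount is a private copy, so only the read-modify-write of
       balance needs to sit inside the critical section. */
    pthread_mutex_lock(&balance_lock);
    balance = balance + amount;
    pthread_mutex_unlock(&balance_lock);
}

/* Thread entry point: arg points to the amount to deposit. */
static void *deposit_thread(void *arg)
{
    deposit(*(const double *)arg);
    return NULL;
}

/* Runs two deposits concurrently and returns the resulting balance. */
double run_two_deposits(double a, double b)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit_thread, &a);
    pthread_create(&t2, NULL, deposit_thread, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;
}
```

Starting from 500, depositing 300 and 200 this way should always leave 1000, whichever thread takes the lock first. Error checking on the pthread calls is omitted to keep the sketch short.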
I'm doing my project for a course, and my goal is to implement Proportional-Integral (PI) control on a robot so that it tracks a line using 12 simple phototransistors. I've been reading many PID tutorials but I'm still confused. Can someone help me get started, based on what I have been thinking?
I should assign each state of the sensors a binary value and then use that in the PI equation for the error. Can someone shed some light on this?
Assuming the photo transistors are all in a line parallel to the front edge of your 'car', perpendicular to the edge of the track, and individually numbered from 0 - 11...
You want your car's center to follow the line. Sensors #5 and #6 should straddle the line, and therefore be used as a fine-tuning adjustment. The sensors at the extreme ends (#0 and #11) should have the highest impact on your steering.
With those two bits of info, you should be able to set appropriate weights (multiplication factors) for your PI control to instruct your car to turn left a little when sensors #7 and #8 see the line, or turn left a lot when sensors #9, #10, and #11 see the line. The extreme sensors may also affect the speed of your car.
Some things to consider: when implementing a front-wheel steering vehicle, it is often better to mount your sensor strip behind the front wheels. Also, rear-wheel steering vehicles can adjust to sharp corners more quickly, but are less stable at high speeds.
I'd convert the 12 sensors into a number from 1 to 12, then try to target a value of 6 in my PID, and use the output to drive the wheels. Maybe normalize it so that a positive number means more right and a negative number means more left.
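As a rough sketch of how the sensor weighting and the PI step could fit together, the C code below averages the indices of the sensors that see the line and measures the offset from the strip's center. It assumes the 0 to 11 numbering from the first answer, so the center falls at 5.5; the names PiController, line_error, and pi_update are made up, and the gains would need tuning on the actual robot.

```c
#define NUM_SENSORS 12
#define CENTER 5.5 /* midpoint between sensors #5 and #6 (0-based numbering) */

typedef struct {
    double kp;       /* proportional gain */
    double ki;       /* integral gain */
    double integral; /* accumulated error */
} PiController;

/* Averages the indices of the sensors that see the line and writes the
   offset from the center to *error. Returns 0 if no sensor sees the line. */
int line_error(const int sensors[NUM_SENSORS], double *error)
{
    int sum = 0, count = 0;
    for (int i = 0; i < NUM_SENSORS; i++) {
        if (sensors[i]) {
            sum += i;
            count++;
        }
    }
    if (count == 0) {
        return 0;
    }
    *error = (double)sum / count - CENTER;
    return 1;
}

/* One PI step: dt is the time since the previous step, in seconds. */
double pi_update(PiController *pi, double error, double dt)
{
    pi->integral += error * dt;
    return pi->kp * error + pi->ki * pi->integral;
}
```

The error is zero when sensors #5 and #6 both see the line and grows toward +/-5.5 at the ends of the strip, so the outer sensors naturally produce the strongest correction. Which sign means "steer left" depends on how the strip is mounted.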