Dot product unit in Mali Midgard GPUs [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Hello, I am using a Mali-T624 GPU (Midgard family).
Could you tell me whether these GPUs support a dot product? I cannot find any information about this.
Also, which OpenCL kernel would give the best execution time for a dot product?

Yes. The ARM Mali-T624 MP4 GPU supports OpenCL 1.1, and the OpenCL 1.1 specification includes the built-in geometric function dot for 32-bit floating-point: float dot(floatn p0, floatn p1), where n is 2, 3 or 4. Use this built-in for the best execution time.
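The original answer does not show a kernel. As a hedged sketch, the plain C function below (the name `dot_reference` is hypothetical) computes a dot product the way a float4-based kernel typically groups the work: four elements at a time, as one `dot()` call on two `float4` values would, followed by any leftover elements. It is intended as a host-side reference for checking the GPU result, not as the OpenCL kernel itself.

```c
#include <stddef.h>

/* Hypothetical host-side reference for a float4-style dot product.
 * Each group of four products is summed the way dot(float4, float4)
 * combines its lanes; the group results are then accumulated. */
float dot_reference(const float *a, const float *b, size_t n)
{
    float total = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per step, mirroring one float4 dot(). */
    for (; i + 4 <= n; i += 4) {
        float chunk = a[i] * b[i] + a[i + 1] * b[i + 1]
                    + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
        total += chunk;
    }

    /* Tail: leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        total += a[i] * b[i];
    }
    return total;
}
```

Because the GPU may add the partial sums in a different order, expect small floating-point differences rather than bit-identical results when comparing against this reference.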


How to increase the performance of sin and cos using NEON instructions? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
How can I use the arm_neon.h header file to improve the performance of code that uses the sin and cos functions?
The board is a Xilinx T1 accelerator card with ARM architecture armv8-a and a Cortex-A53.
The language is C.
arm_neon.h contains SIMD intrinsics, which offer a C API for invoking individual low-level instructions.
Thus, if you intend to speed up sin/cos with arm_neon.h, the approach is to rewrite those trigonometric functions using vector arithmetic that computes 4 values at a time (four 32-bit floats per 128-bit register).
Things you need to consider:
the code needs to be branchless
you need to decide how accurate the result must be
you need to define the input range (perhaps arguments beyond one period of 2*pi never occur and need no handling)
you need to define the input unit (radians, degrees, or fractions of 2^n)
All of this determines which kind of approximation to use (polynomial, piecewise linear, or rational polynomial) and which steps or corner cases can be omitted.
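A minimal sketch of the polynomial approach, under stated assumptions: inputs are in radians and already within [-pi/2, pi/2] (so no range reduction is done), and a truncated Taylor series up to x^9 is accurate enough (roughly 4e-6 worst case in that range). The helper name `sin4` is hypothetical. The NEON path uses `vld1q_f32`, `vmulq_f32`, `vmlaq_f32` and `vst1q_f32` from arm_neon.h and has no branches; the `#else` path repeats the same arithmetic in scalar C so the code also builds on non-Arm hosts. A minimax polynomial would give better accuracy for the same degree.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Taylor coefficients of sin(x) up to x^9 (odd terms only). */
#define SIN_C3 (-1.0f / 6.0f)
#define SIN_C5 (1.0f / 120.0f)
#define SIN_C7 (-1.0f / 5040.0f)
#define SIN_C9 (1.0f / 362880.0f)

/* Hypothetical helper: sin() of four floats at once.
 * ASSUMPTION: every input is in [-pi/2, pi/2] radians; there is no
 * range reduction. Evaluates x * (1 + x2*(C3 + x2*(C5 + x2*(C7 + x2*C9)))). */
void sin4(const float in[4], float out[4])
{
#if defined(__ARM_NEON)
    float32x4_t x = vld1q_f32(in);
    float32x4_t x2 = vmulq_f32(x, x);
    /* Horner's scheme; vmlaq_f32(a, b, c) computes a + b * c per lane. */
    float32x4_t p = vdupq_n_f32(SIN_C9);
    p = vmlaq_f32(vdupq_n_f32(SIN_C7), p, x2);
    p = vmlaq_f32(vdupq_n_f32(SIN_C5), p, x2);
    p = vmlaq_f32(vdupq_n_f32(SIN_C3), p, x2);
    p = vmlaq_f32(vdupq_n_f32(1.0f), p, x2);
    vst1q_f32(out, vmulq_f32(p, x));
#else
    /* Portable fallback: the same arithmetic, one lane at a time. */
    for (size_t i = 0; i < 4; ++i) {
        float x = in[i];
        float x2 = x * x;
        float p = SIN_C9;
        p = SIN_C7 + p * x2;
        p = SIN_C5 + p * x2;
        p = SIN_C3 + p * x2;
        p = 1.0f + p * x2;
        out[i] = p * x;
    }
#endif
}
```

To process a large array, call `sin4` on consecutive groups of four elements; a wider input range would first need a branchless reduction step, which is exactly the "input range" decision listed above.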

How to implement the printf function in risc-v? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
Here's the C representation of what I'm trying to do in RISC-V assembly:
printf ("x=%d\n", x);
https://godbolt.org/ is a useful site: if you paste in C code, it can compile it for other targets, such as RISC-V assembly. Sample C code for a small printf is available at menie.org/georges/embedded/small_printf_source_code.html. It does work. Good luck.
Here is a very simple printf (it handles only integers and strings, with no advanced formatting):
https://godbolt.org/z/sgMVs7
It is not my code; it is the tiny printf from Atollic TrueSTUDIO. But it is a good base for implementing something simple but more complete.
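To see what such a routine looks like in RISC-V assembly, you can paste a small formatter into https://godbolt.org/ and select a RISC-V compiler. The sketch below is neither the Atollic code nor the menie.org code; it is a hypothetical `tiny_sprintf` that handles only %d, %s and %%, and writes into a caller-supplied buffer so it does not depend on any output device. It assumes a 32-bit int, as on RV32 and RV64.

```c
#include <stdarg.h>

/* Hypothetical minimal formatter supporting only %d, %s and %%.
 * Writes into buf (the caller must make it large enough) and returns
 * the number of characters written, excluding the terminating '\0'. */
int tiny_sprintf(char *buf, const char *fmt, ...)
{
    va_list args;
    char *out = buf;

    va_start(args, fmt);
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt != '%') {
            *out++ = *fmt;          /* ordinary character: copy it */
            continue;
        }
        ++fmt;                      /* look at the conversion character */
        if (*fmt == 'd') {
            int value = va_arg(args, int);
            /* Unsigned arithmetic so that INT_MIN negates without overflow. */
            unsigned int mag = value < 0 ? 0u - (unsigned int)value
                                         : (unsigned int)value;
            char digits[12];        /* enough for a 32-bit int */
            int len = 0;
            if (value < 0) {
                *out++ = '-';
            }
            do {                    /* digits come out least significant first */
                digits[len++] = (char)('0' + mag % 10u);
                mag /= 10u;
            } while (mag != 0u);
            while (len > 0) {       /* emit them in reverse order */
                *out++ = digits[--len];
            }
        } else if (*fmt == 's') {
            const char *s = va_arg(args, const char *);
            while (*s != '\0') {
                *out++ = *s++;
            }
        } else if (*fmt == '%') {
            *out++ = '%';
        } else {
            break;                  /* unsupported or truncated format: stop */
        }
    }
    va_end(args);
    *out = '\0';
    return (int)(out - buf);
}
```

On a bare-metal target you would replace the buffer writes with whatever routine sends one character to your console, then call it as `tiny_sprintf(buf, "x=%d\n", x)`.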

Different execution time or the same with different options? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
If I compile a C program with different options like -O, -O2 or -O3, will there be any difference in execution time or memory utilization?
Maybe.
It depends. You're telling the compiler to spend additional time looking for places where it can optimize the code beyond a straightforward translation. It might find such places, or it might not. On all but the most trivial programs, however, there is a high probability that the compiler will find something to optimize ("Hello World" doesn't optimize very well, though).
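One way to see the difference yourself is to compile a small loop at different levels and compare. The function below is a hypothetical example, not from the original answer. At -O0 GCC and Clang emit the loop essentially as written; at -O2 or -O3 the compiler may unroll or vectorize it, and Clang can sometimes replace a loop like this with a closed-form expression. Pasting it into https://godbolt.org/ with different -O flags shows the generated code side by side.

```c
#include <stdint.h>

/* Hypothetical benchmark kernel: sum of i*i for i in [0, n).
 * The arithmetic is done in 64 bits so the result does not overflow
 * for moderately large n. Compare the code generated at -O0 and -O2. */
uint64_t sum_of_squares(uint32_t n)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += (uint64_t)i * i;
    }
    return total;
}
```

For timing, call it from a `main()` with a large `n`, build once with `-O0` and once with `-O2`, and run both binaries under the shell's `time` command.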

Fastest C conditions execution [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Today I stumbled upon a simple piece of code, and I would like to know other people's opinions.
What would be the fastest code to evaluate this graph?
There is no such thing as the fastest code to evaluate this graph. It depends on the processor architecture. What is faster on one architecture may be slower on another, or not even possible.
Nowadays, compilers excel at block optimizations, so you should write the code as naturally as you can and let the compiler decide what "the fastest" means. If the compiler cannot optimize it, the best way to handle this kind of condition is to use conditional move instructions, because they avoid the pipeline stalls caused by mispredicted branches; however, this is very specific to certain architectures.
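The graph from the question is not available, so here is a hypothetical two-way condition to illustrate the point. The first function is the "natural" form the answer recommends; optimizing compilers often turn it into a conditional move (for example `cmov` on x86-64), but that choice is up to the compiler and target. The second is an explicit branchless version using a bit mask, which does not rely on the compiler picking a conditional move.

```c
#include <stdint.h>

/* Natural form: write the condition plainly and let the compiler
 * decide between a branch and a conditional move. */
int32_t select_natural(int32_t x, int32_t y, int32_t a, int32_t b)
{
    return (x > y) ? a : b;
}

/* Explicit branchless form. (x > y) is 0 or 1, so mask is either
 * all zero bits or all one bits, and exactly one of a, b survives. */
int32_t select_masked(int32_t x, int32_t y, int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(x > y);
    return (a & mask) | (b & ~mask);
}
```

Comparing the assembly of both functions at -O2 on your target shows whether the hand-written version gains anything; often it does not.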

What is the fastest semi-arbitrary precision math library? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm using long double in a C program to compute 2D images of the Mandelbrot set, but I need more precision to zoom deeper.
Are there any performance gains to be had from an arbitrary precision maths library that can restrict the amount of precision as required, rather than leaping from long double precision straight into arbitrary precision?
Which is the fastest of the arbitrary precision maths libraries?
'Fastest' will depend somewhat on your platform and intended use.
The MPFR Library
GMP
This wiki article contains links to several libraries.
If you need more precision, see qd (double-double and quad-double arithmetic) at http://crd.lbl.gov/~dhbailey/mpdist/.
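The qd library implements double-double and quad-double types, which sit between long double and full arbitrary precision. As a hedged illustration of the idea (this is not qd's API; the names `dd`, `dd_add` and `dd_mul` are hypothetical), here is a minimal double-double sketch. Each number is stored as an unevaluated sum `hi + lo` of two doubles, giving roughly 106 bits of mantissa (about 32 decimal digits), compared with 64 bits for x86 long double, using only ordinary hardware double operations. It relies on Knuth's two-sum and Dekker's two-product error-free transformations.

```c
/* A double-double value is hi + lo, where lo holds the rounding error
 * of hi. Requires IEEE double arithmetic with round-to-nearest: do not
 * compile with -ffast-math, and FP contraction into FMA may disturb
 * the error terms. */
typedef struct {
    double hi;
    double lo;
} dd;

/* Knuth's two-sum: s + err == a + b exactly. */
static void two_sum(double a, double b, double *s, double *err)
{
    double sum = a + b;
    double bb = sum - a;
    *err = (a - (sum - bb)) + (b - bb);
    *s = sum;
}

/* Fast two-sum, valid when |a| >= |b|. */
static void quick_two_sum(double a, double b, double *s, double *err)
{
    double sum = a + b;
    *err = b - (sum - a);
    *s = sum;
}

/* Dekker's split into two halves of at most 26 significant bits. */
static void split(double a, double *hi, double *lo)
{
    double t = 134217729.0 * a; /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's two-product: p + err == a * b exactly (barring overflow). */
static void two_prod(double a, double b, double *p, double *err)
{
    double ahi, alo, bhi, blo;
    double prod = a * b;
    split(a, &ahi, &alo);
    split(b, &bhi, &blo);
    *err = ((ahi * bhi - prod) + ahi * blo + alo * bhi) + alo * blo;
    *p = prod;
}

/* Simple double-double addition. */
dd dd_add(dd a, dd b)
{
    double s, e;
    dd r;
    two_sum(a.hi, b.hi, &s, &e);
    e += a.lo + b.lo;
    quick_two_sum(s, e, &r.hi, &r.lo);
    return r;
}

/* Double-double multiplication (the tiny lo * lo term is dropped). */
dd dd_mul(dd a, dd b)
{
    double p, e;
    dd r;
    two_prod(a.hi, b.hi, &p, &e);
    e += a.hi * b.lo + a.lo * b.hi;
    quick_two_sum(p, e, &r.hi, &r.lo);
    return r;
}
```

For a Mandelbrot iteration you would keep the real and imaginary parts of z and c as `dd` values and build z^2 + c from `dd_mul` and `dd_add`. A real library such as qd also covers subtraction, division, comparisons and edge cases that this sketch omits.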
