In my OpenCL application I have a controlling application part, a graphics application part and a serial application part, as shown below:
All these applications are running in parallel.
So far I have written applications that run simultaneously on the CPU and GPU. Is there a way to use an ARM processor together with the CPU (Intel) and GPU (ATI) in parallel, as shown in the picture above?
Related
The setup is as follows: a WinForms app built with Visual Studio 2019 creates 16 VideoView/MediaPlayer instances, each playing a 960 × 540, 30 fps stream from a multicasting camera.
CPU: Intel Core i7 at 2.67 GHz; GPU: NVIDIA GTX 1650.
GPU load reaches about 44% for decode and about the same for 3D. The application uses an astonishing 75 to 90% of the CPU, and this figure jumps around a lot from one test run to another. The GPU load is very stable.
Here is some other interesting information. If I run a single copy of this application with one video stream, CPU use is about 0.5%. If I run 16 instances of the application, each instance uses about 0.4 to 0.8% of the CPU. So with 16 videos streaming this way, the GPU load is the same as above (44%) but the CPU load is nominal.
Within a single instance, CPU usage does not increase linearly as cameras are added; it takes a big jump after the ninth camera.
From the diagnostic image below you can see that the usage is almost entirely in native code. Other diagrams show about 2/3 in the kernel and 1/3 in system I/O. The CPU load is spread pretty evenly across all cores.
The code is posted on a gist.
I have tried many variations, but no matter what I try, the CPU usage stays pretty constant once I get up to 16 channels. I also tried running each instance in its own thread; that made no difference. I would really like to understand this and find a way to reduce CPU usage: I have an application that uses this technology and a customer who needs even more than 16 channels.
It may be a bug, which would need to be reported on trac.videolan.org with a minimal C/C++ reproduction sample for the VLC developers.
Note that comparing 16 VLC app instances (16 processes) with 1 LibVLC-based app instance playing 16 streams (1 process) is not exactly a fair comparison.
CPU usage should still scale linearly rather than exponentially, though, so there may well be a bug.
I'm trying to run an x86 C application on a Raspberry Pi using ExaGear.
On my laptop, the C application's CPU consumption while running is about 50-60%. When I run the same C application on the Raspberry Pi, CPU consumption is about 300%. I don't understand why the CPU consumption differs so much between my laptop and the Raspberry Pi running ExaGear.
My Raspberry Pi has a quad-core Cortex-A53 processor @ 1.2 GHz, a VideoCore IV GPU and 1 GB of LPDDR2 RAM, while my laptop's virtual machine has two processors and 4 GB of RAM.
I suspect there may be some kind of problem running my C application under ExaGear.
What else could I check to figure out the cause of this high CPU consumption?
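Assuming the 300% figure comes from a tool like `top`, which reports per-process CPU usage relative to one core, 300% means the process keeps roughly three of the Pi's four cores busy. One thing worth checking is the ratio of the program's CPU time to elapsed wall-clock time over the same piece of work, on both machines. The sketch below does that with POSIX `clock_gettime`; `cpu_percent`, `measure_cpu_percent` and `example_work` are hypothetical helpers, and `example_work` merely stands in for the real application's work.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Current reading of the given clock, in seconds. */
static double clock_seconds(clockid_t id) {
    struct timespec t;
    clock_gettime(id, &t);
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/* CPU usage in top's convention: 100% means one core fully busy,
   so a reading of 300% means roughly three cores' worth of work. */
static double cpu_percent(double cpu_seconds, double wall_seconds) {
    assert(cpu_seconds >= 0.0);
    return wall_seconds > 0.0 ? 100.0 * cpu_seconds / wall_seconds : 0.0;
}

/* Hypothetical stand-in for one unit of the application's real work. */
static volatile double work_sink;
static void example_work(void) {
    double acc = 0.0;
    for (long i = 0; i < 10000000L; ++i)
        acc += (double)(i % 3);
    work_sink = acc;  /* volatile store keeps the loop from being optimized away */
}

/* Run fn once and return the whole process's CPU usage while it ran. */
static double measure_cpu_percent(void (*fn)(void)) {
    double cpu0 = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    double wall0 = clock_seconds(CLOCK_MONOTONIC);
    fn();
    double cpu = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    double wall = clock_seconds(CLOCK_MONOTONIC) - wall0;
    return cpu_percent(cpu, wall);
}
```

Wrapping the same workload with `measure_cpu_percent` on the laptop and on the Pi would show whether the ExaGear run really spreads work over more cores, or whether the same amount of work simply takes much longer on the Cortex-A53.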
I have the open-source Linphone application, which uses the x264 encoder. By default it runs on a single thread:
x264_param_t *params= .....
params->i_threads=1;
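I added the ability to use all processors:
#include <windows.h>                      /* GetSystemInfo, SYSTEM_INFO */
long num_cpu = 1;                         /* fallback: a single thread */
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );                /* fills in the logical processor count */
num_cpu = sysinfo.dwNumberOfProcessors;
params->i_threads = num_cpu;              /* one x264 encoding thread per processor */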
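Linphone is cross-platform, so the `GetSystemInfo` call above only covers the Windows build. A minimal sketch of a portable helper, assuming the non-Windows target provides `sysconf(_SC_NPROCESSORS_ONLN)` (Linux and macOS do); the name `cpu_count` is hypothetical. Separately, x264.h defines `X264_THREADS_AUTO` (0) for `i_threads`, which lets x264 choose a thread count itself; it is worth checking the header of the x264 version Linphone bundles.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: number of online logical processors, never less than 1. */
static long cpu_count(void) {
#ifdef _WIN32
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    long n = (long)sysinfo.dwNumberOfProcessors;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* returns -1 on error */
#endif
    return n > 0 ? n : 1;
}
```

With this helper, the assignment becomes `params->i_threads = (int)cpu_count();` on every platform.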
How can I verify that, during video streaming, x264 actually runs on (in my case) 4 processors?
The Task Manager -> Performance -> CPU Usage History view doesn't make it clear.
I use Windows 7.
Thanks,
There are three easy-to-see indications that encoding leverages multiple cores:
Encoding runs faster
Per-core CPU load shows simultaneous load on several cores/processors
Per-thread CPU load of your application shows significant load on multiple threads
You can also use the processor affinity mask (programmatically or via Task Manager) to limit the application to a single CPU. If x264 is using multiple processors, setting the mask will significantly reduce application performance.
In Windows Task Manager, be sure to select View -> CPU History -> One Graph Per CPU. If the cores still do not all appear to run at full load, some resource may be starving the encoding threads, i.e. there is a bottleneck feeding data into the encoder.
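The third indication, per-thread CPU load, can also be read from code. On Windows, `GetThreadTimes` reports a thread's CPU time; the sketch below swaps in the POSIX equivalent, `clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...)`, so it assumes a Linux-like system with pthreads (compile with `-pthread`). It measures worker threads it starts itself, with a hypothetical busy loop standing in for encoding work, and counts how many of them accumulated real CPU time. For an existing process such as Linphone, `top -H` on Linux or the Threads tab of Sysinternals Process Explorer on Windows gives the same per-thread view without code changes.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_WORKERS 16

/* Per-worker record: each thread reports its own CPU time when done. */
struct worker {
    pthread_t thread;
    long iterations;        /* amount of hypothetical work to do */
    double cpu_seconds;     /* filled in by the worker itself */
    volatile double result; /* keeps the loop from being optimized away */
};

static void *worker_main(void *arg) {
    struct worker *w = arg;
    double acc = 0.0;
    for (long i = 0; i < w->iterations; ++i)
        acc += (double)(i % 7) * 0.25;   /* stand-in for encoding work */
    w->result = acc;

    struct timespec t;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);  /* CPU time of this thread only */
    w->cpu_seconds = (double)t.tv_sec + (double)t.tv_nsec / 1e9;
    return NULL;
}

/* Start n workers, wait for them, and return how many consumed at least
   min_cpu_seconds of CPU time, i.e. how many threads carried real load. */
static int count_loaded_threads(struct worker *ws, int n, double min_cpu_seconds) {
    assert(n > 0 && n <= MAX_WORKERS);
    for (int i = 0; i < n; ++i) {
        int rc = pthread_create(&ws[i].thread, NULL, worker_main, &ws[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; ++i)
        pthread_join(ws[i].thread, NULL);

    int loaded = 0;
    for (int i = 0; i < n; ++i)
        if (ws[i].cpu_seconds >= min_cpu_seconds)
            ++loaded;
    return loaded;
}
```

Per-thread CPU time alone shows that work is split across threads; whether those threads actually run at the same time on different cores is what the per-core graphs and the faster encoding time confirm.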
Does anybody have experience maintaining a single codebase for both CPU and GPU?
I want to create an application that uses the GPU for some long-running calculations when possible, but falls back to a regular CPU version if no compatible GPU is present on the target machine. It would be really helpful if I could write a portion of the code using conditional compilation directives so that it compiles to both a CPU version and a GPU version. Of course some parts will differ between CPU and GPU, but I would like to keep the essence of the algorithm in one place. Is this possible at all?
OpenCL is a C-based language. OpenCL platforms exist that run on GPUs (from NVIDIA and AMD) and CPUs (from Intel and AMD).
While it is possible to execute the same OpenCL code on both GPUs and CPUs, it really needs to be optimized for the target device: getting the best performance requires different code for different GPUs and CPUs. However, a CPU OpenCL platform can serve as a low-performance fallback even for GPU-optimized code.
If you are happy to write conditional directives that select code depending on the target device (CPU or GPU), that can help the performance of OpenCL code across multiple devices.
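One way to keep the essence of an algorithm in one place, as the question asks, is to write the per-element computation once as a macro and use it both in a plain C loop (the CPU fallback) and, via stringification, in OpenCL C kernel source. This is a sketch under assumptions: the SAXPY computation and all names are hypothetical, and the host code that would detect a GPU with `clGetDeviceIDs(..., CL_DEVICE_TYPE_GPU, ...)` and build the kernel with `clCreateProgramWithSource` is omitted, so the block compiles without OpenCL installed.

```c
#include <assert.h>
#include <string.h>

/* The per-element computation, written once. It uses only syntax that is
   valid in both C99 and OpenCL C, so the same text serves both targets. */
#define SAXPY_BODY(i, a, x, y) ((y)[(i)] = (a) * (x)[(i)] + (y)[(i)])

/* Two-level stringification so SAXPY_BODY is expanded before # is applied. */
#define STR_(s) #s
#define STR(s) STR_(s)

/* CPU fallback: an ordinary loop around the shared body. */
static void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i)
        SAXPY_BODY(i, a, x, y);
}

/* GPU path: OpenCL C source generated from the same body. A host program
   would pass this string to clCreateProgramWithSource when a GPU is found,
   and call saxpy_cpu otherwise. */
static const char saxpy_cl_source[] =
    "__kernel void saxpy(float a, __global const float *x, __global float *y)\n"
    "{\n"
    "    int i = get_global_id(0);\n"
    "    " STR(SAXPY_BODY(i, a, x, y)) ";\n"
    "}\n";
```

The shared body is limited to syntax both languages accept, which is the main constraint of this approach; device-specific tuning (work-group sizes, vector types) still has to live outside the shared part, as the answer notes.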
I am running a CUDA program on a machine with a four-core CPU. How can I change the CUDA C program to use all four cores and all available GPUs?
I mean that my program also does work on the host side before computing on the GPUs...
thanks!
CUDA is not intended to do this. The purpose of CUDA is to provide access to the GPU for parallel processing. It will not use your CPU cores.
From the "What is CUDA?" page:
CUDA is NVIDIA’s parallel computing architecture that enables dramatic increases in computing performance by harnessing the power of the GPU (graphics processing unit).
That should be handled via more traditional multi-threading techniques.
CUDA kernels run only on the GPU.
So if you want parallelism across your CPU cores, you need to use threads, for example via Pthreads or OpenMP.
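As a minimal sketch of that advice, the host-side preparation mentioned in the question can be spread over the CPU cores with an OpenMP `parallel for`, while the GPU work stays in CUDA. The function `normalize_on_host` and its scaling step are hypothetical. For several GPUs, a common pattern is to give each GPU its own host thread that calls `cudaSetDevice` before issuing work; that part is not shown, so the block is plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side preprocessing run before data is copied to the GPU:
   scale every sample by 1/max_value. Built with -fopenmp, the loop is split
   across all CPU cores; without that flag the pragma is ignored and the loop
   runs serially with identical results. */
static void normalize_on_host(float *data, size_t n, float max_value) {
    assert(max_value > 0.0f);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)   /* signed index for older OpenMP versions */
        data[i] /= max_value;
}
```

Each iteration touches a different element, so no locking is needed; that independence is what makes the loop safe to parallelize.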
Convert your program to OpenCL :-)