I'm trying to allocate several cloudlets to a single VM and run them one after another in a specific sequence, but as I add them, I get the following error:
Broker: Postponing execution of cloudlet 0: bount VM not available
Broker: Postponing execution of cloudlet 0: bount VM not available
Broker: Postponing execution of cloudlet 0: bount VM not available
...
I used these calls:
cloudId.setVmId(0);
cloudId.setVmId(1);
...
Is there any way to do this?
Yes, I found the problem. Because the host lacked resources such as MIPS, BW, RAM and so on, the VMs couldn't run. I increased the host's resources and now all VMs are up and running the tasks.
So there was no problem with the code, and you can queue several tasks to a single VM.
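For reference, here is a minimal sketch of binding several cloudlets to the same VM in CloudSim; the broker variable, the VM id and all cloudlet parameters below are placeholder values from an assumed setup, not from the original code. With a CloudletSchedulerSpaceShared on the VM, the cloudlets run one after another:
import java.util.ArrayList;
import java.util.List;
import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.UtilizationModel;
import org.cloudbus.cloudsim.UtilizationModelFull;

// assumes an existing DatacenterBroker "broker" and a Vm with id 0 that the host can actually fit
int vmId = 0;
UtilizationModel full = new UtilizationModelFull();
List<Cloudlet> cloudlets = new ArrayList<Cloudlet>();
for (int id = 0; id < 3; id++) {
    // length, pes, fileSize and outputSize are illustrative values
    Cloudlet cl = new Cloudlet(id, 10000, 1, 300, 300, full, full, full);
    cl.setUserId(broker.getId());
    cl.setVmId(vmId); // bind every cloudlet to the same VM
    cloudlets.add(cl);
}
broker.submitCloudletList(cloudlets);
If the host cannot allocate the VM (not enough MIPS, RAM or bandwidth), the broker keeps printing the "bount VM not available" message shown above.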
Related
I'm working in SageMaker Studio, and I have a single instance running one computationally intensive task:
It appears that the kernel running my task is maxed out, but the actual instance is only using a small amount of its resources. Is there some sort of throttling occurring? Can I configure this so that more of the instance is utilized?
Your ml.c5.xlarge instance comes with 4 vCPUs. However, Python only uses a single CPU by default. (Source: Can I apply multithreading for computationally intensive task in python?)
As a result, the overall CPU utilization of your ml.c5.xlarge instance is low. To utilize all the vCPUs, you can try multiprocessing.
The examples below are performed using a 2 vCPU + 4 GiB instance.
In the first picture, multiprocessing is not set up. The instance CPU utilization peaks at around 50%.
single processing:
In the second picture, I created 50 processes to be run simultaneously. The instance CPU utilization rises to 100% immediately.
multiprocessing:
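For reference, a driver along these lines would reproduce that pattern; busy_work, its argument and the bare Process loop are placeholders, not the exact code used for the graphs:
import multiprocessing as mp

def busy_work(n):
    # placeholder CPU-bound loop
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # 50 worker processes, as in the second picture; the OS spreads them over all vCPUs
    procs = [mp.Process(target=busy_work, args=(10_000_000,)) for _ in range(50)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()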
Something might be off with the stats you're seeing, or they may be showing different time spans, or the kernel may have a certain resource allocation out of the total instance.
I suggest opening a terminal and running top to see what's actually going on and which UI stat it matches (note that you should open the instance's terminal, not the Jupyter UI instance's terminal).
I work on a Control Flow Integrity mechanism that monitors a process. I have been trying to restrict the kernel workqueue workers to certain cores while running the monitored program on an isolated core (pinned using taskset and isolcpus). So far I have only been able to do that with an unbound workqueue, but I would like to maintain the locality of the workers by using a bound workqueue while making sure that none of the workers run on the isolated core.
I couldn't find a way to specify the CPU mask when using a bound workqueue. For now, when I use a bound workqueue and pin the monitored process to the isolated core, only the workers that belong to the worker pool of the isolated core are active (I monitored which core the workers run on using the top command).
Is there a way I can restrict workqueue workers from running on certain cores while using a bound workqueue in the Linux kernel?
I have implemented a source which opens a fixed UDP port and listens on it. I want to run exactly one source per TaskManager (in my case I run one TaskManager per node), because otherwise a java.net.BindException: Address already in use is thrown.
I noticed this problem when testing the HA of Apache Flink. When I shut down one TaskManager, Apache Flink started trying to run two sources with the same port on one node.
So, how can I run exactly one source per TaskManager (or per cluster node)?
It is currently not possible to dynamically enforce that exactly one task of a kind runs on each TaskManager. You can avoid multiple source tasks being scheduled to the same machine by setting the number of slots per TaskManager to 1. However, if you then lose a machine and don't have a spare TaskManager, you won't have enough slots to restart the job.
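If you go that route, the setting lives in flink-conf.yaml (assuming the default configuration file is used):
taskmanager.numberOfTaskSlots: 1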
Alternatively, you could write your sources such that they are more resilient. For example, you could simply stop a source if it cannot bind to the specified port. Given that no other program binds to that port, you then know that another source task is already consuming data from it.
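A rough sketch of such a resilient source, assuming a RichParallelSourceFunction whose subtask simply finishes when the port is already taken (the class name, buffer size and String payload are placeholders):
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class UdpSource extends RichParallelSourceFunction<String> {
    private final int port;
    private volatile boolean running = true;

    public UdpSource(int port) { this.port = port; }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        DatagramSocket socket;
        try {
            socket = new DatagramSocket(port);
        } catch (java.net.BindException e) {
            // another subtask on this machine already owns the port, so this subtask just finishes
            return;
        }
        byte[] buf = new byte[65535];
        while (running) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            ctx.collect(new String(packet.getData(), 0, packet.getLength()));
        }
        socket.close();
    }

    @Override
    public void cancel() { running = false; }
}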
I have the Linphone open-source application, which uses the x264 encoder. By default it runs on one thread:
x264_param_t *params = .....
params->i_threads = 1;
I added the ability to use all processors:
// GetSystemInfo comes from the Win32 API (<windows.h>)
long num_cpu = 1;
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
num_cpu = sysinfo.dwNumberOfProcessors;
params->i_threads = num_cpu;
The question is: how do I know that, during video streaming, x264 actually runs on (in my case) 4 processors?
The Task Manager -> Performance -> CPU usage history graph doesn't make this clear.
I use Windows 7.
Thanks,
There are three easy-to-see indications that encoding leverages multiple cores:
Encoding runs faster
Per core CPU load indicates simultaneous load on several cores/processors
Per thread CPU load of your application shows relevant load on multiple threads
Also, you can use the processor affinity mask (programmatically, or via Task Manager) to limit the application to a single CPU. If x264 is using multiple processors, setting the mask will seriously affect application performance.
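For the programmatic route, a small sketch using the Win32 API (the choice of CPU 0 is just an example):
#include <windows.h>

/* Restrict the current process, and thus all x264 worker threads, to logical CPU 0.
   If encoding really was spread over several cores, the frame rate should drop noticeably. */
void pin_to_single_cpu(void)
{
    DWORD_PTR mask = 1; /* bit 0 => CPU 0 only */
    SetProcessAffinityMask(GetCurrentProcess(), mask);
}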
In Windows Task Manager, be sure to select View -> CPU History -> One Graph Per CPU. If it still does not look like all processor cores are running at full speed, then possibly some resource is starving the encoding threads and there's a bottleneck feeding data into the encoder.
What would happen if four concurrent CUDA applications compete for resources in one single GPU so they can offload their work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous:
Kernel launches
Device ↔ device memory copies
Host ↔ device memory copies of a memory block of 64 KB or less
Memory copies performed by functions that are suffixed with Async
Memory set function calls
As well it mentions that devices with compute capability 2.0 are able to execute multiple kernels concurrently as long as the kernels belong to the same context.
Does this type of concurrency only apply to streams within a single CUDA application, or is it also possible when completely different applications are requesting GPU resources?
Does that mean that concurrency support is only available within one application (context?), and that the four applications will only run "concurrently" in the sense that their calls may be overlapped by context switching on the CPU, while each application has to wait until the GPU is freed by the others? (i.e., a kernel launch from app4 waits until a kernel launch from app1 finishes.)
If that is the case, how can these four applications access GPU resources without suffering long waiting times?
As you said, only one "context" can occupy each of the engines at any given time. This means that one of the copy engines can be serving a memcpy for application A, the other a memcpy for application B, and the compute engine can be executing a kernel for application C (for example).
An application can actually have multiple contexts, but no two applications can share the same context (although threads within an application can share a context).
Any application that schedules work to run on the GPU (i.e. a memcpy or a kernel launch) can schedule the work asynchronously, so that the application is free to go ahead and do some other work on the CPU, and it can schedule any number of tasks to run on the GPU.
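For example, a single application can queue copies and a kernel on a stream and get control back immediately; the kernel, buffer names and sizes below are placeholders, and the host buffer would need to be pinned with cudaMallocHost for the copies to be truly asynchronous:
#include <cuda_runtime.h>

// toy kernel: double every element
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

void submit(float *h_buf, float *d_buf, int n, cudaStream_t stream) {
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
    // control returns to the caller here; cudaStreamSynchronize(stream) waits when the result is needed
}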
Note that it is also possible to put the GPUs in exclusive mode whereby only one context can operate on the GPU at any time (i.e. all the resources are reserved for the context until the context is destroyed). The default is shared mode.