Run batch file on specific processor - batch-file

I have a server with dual processors, that is, two physical Xeon processors with multiple cores each.
Each process will only run on one processor, which is fine. If you start a multi-threaded app, it can only use the cores of one physical processor, not both (a Windows 10 limitation?). I would like to start two instances of the same program so that I can use all the cores on both processors.
How do I start a process from a batch file so that it runs on a specified processor group? I.e. Cores 0-16 of processor 1, or Cores 0-16 of processor 2?
I've tried:
start /affinity FF file.exe
But that only runs it on cores from one particular processor. I believe I need to set the processor group, but how do I do that using the 'start' command?
I can see that you can use hexadecimal masks for the affinity with 'start', but that only seems to work on the cores of the first processor; I can't seem to access the cores of the second processor.
Since there is much confusion over my question, please see the screenshot below. It's from Task Manager when you try to set an affinity; notice how I have multiple processor groups? That's what I am trying to configure using the 'start' command. '/affinity' only uses cores from group 0.

Judging by your "Processor group" combo, it appears that you have the system set to present NUMA nodes with each physical CPU being assigned to a single node. This question talks about how to check the config, so assuming that that is how you are set up, the command line flag /node <NUMA index> would allow you to select which node, so we get:
start /node 1 file.exe
This should start the application on the second NUMA node. Note that you might be able to combine this with the /affinity flag, so to run on just two cores of the first node, the following might work:
start /node 0 /affinity 3 file.exe
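If you control the application's source, the group can also be chosen from inside the program rather than from the batch file. Below is a minimal C sketch of that idea (my own illustration, not part of the answer above): it lists the processor groups Windows reports and then pins the calling thread to group 1, which on this kind of setup corresponds to the second physical CPU. The group index and mask are assumptions to adjust for your machine; it needs Windows 7 or later and an x64 build.

/* Minimal sketch: enumerate processor groups and pin the calling thread to
 * group 1 (assumed to be the second physical CPU). Windows 7+, x64. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; ++g)
        printf("Group %u: %lu logical processors\n",
               (unsigned)g, (unsigned long)GetActiveProcessorCount(g));

    if (groups > 1) {
        GROUP_AFFINITY ga = {0};
        ga.Group = 1;                                  /* second group */
        DWORD n = GetActiveProcessorCount(1);
        ga.Mask = (n >= 64) ? ~(KAFFINITY)0
                            : (((KAFFINITY)1 << n) - 1);
        if (!SetThreadGroupAffinity(GetCurrentThread(), &ga, NULL))
            fprintf(stderr, "SetThreadGroupAffinity failed: %lu\n",
                    GetLastError());
    }
    return 0;
}

When the groups line up with NUMA nodes, start /node stays the simpler route; the programmatic route is the fallback when they do not, or when each instance should pick its own group.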

Related

Programmatically getting the per-core CPU load for a Windows system

I am trying to programmatically determine what the idle/running break-down is for the CPU cores.
In Linux, I use /proc/stat for this, which gives me cycle counts for cycles spent in:
user: normal processes executing in user mode
nice: niced processes executing in user mode
system: processes executing in kernel mode
idle: twiddling thumbs
iowait: waiting for I/O to complete
irq: servicing interrupts
softirq: servicing softirqs
Please note: I am getting system-wide numbers, not the cpu-usage for a specific process!
Now, I want to do the same, but in C, for a Windows system.
UPDATE
I've been able to find an aggregate statistic, so not per core: GetSystemTimes(), which returns idle, kernel and user times. What really confused me at first is that the kernel time includes the idle time.
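For the per-core breakdown, one route that is often used (an assumption on my part, not something stated in the question) is the semi-documented NtQuerySystemInformation call with the SystemProcessorPerformanceInformation class, which returns idle/kernel/user times for each logical processor. A minimal sketch, resolving the function from ntdll.dll at run time:

#include <windows.h>
#include <winternl.h>
#include <stdio.h>

typedef NTSTATUS (NTAPI *QSI)(SYSTEM_INFORMATION_CLASS, PVOID, ULONG, PULONG);

int main(void)
{
    QSI query = (QSI)GetProcAddress(GetModuleHandleW(L"ntdll.dll"),
                                    "NtQuerySystemInformation");
    if (!query) return 1;

    SYSTEM_PROCESSOR_PERFORMANCE_INFORMATION info[64];   /* one entry per core */
    ULONG len = 0;
    if (query(SystemProcessorPerformanceInformation, info, sizeof(info), &len) != 0)
        return 1;

    ULONG cores = len / sizeof(info[0]);
    for (ULONG i = 0; i < cores; ++i)
        /* Times are in 100 ns units; KernelTime includes IdleTime, matching
         * what the update above observed for GetSystemTimes(). */
        printf("core %lu: idle=%lld kernel=%lld user=%lld\n", i,
               info[i].IdleTime.QuadPart, info[i].KernelTime.QuadPart,
               info[i].UserTime.QuadPart);
    return 0;
}

Sampling this twice and diffing the counters gives the per-core busy/idle breakdown, much like diffing /proc/stat readings on Linux.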

windows 7 processor interrogation

I have a PC with 24 cores. I have an application that needs to dedicate a thread to one of those cores, and the process itself to a few of those cores. The affinity and priorities are currently hard-coded; I would like to programmatically determine what set of cores my application should set its affinity to.
I have read that one should stay away from core 0. I am currently using the last 8 cores of the first CPU for the process and the 12th core for the thread I want to run. Here is sample code, which may not be 100% accurate with the parameters:
SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
SetProcessAffinityMask(GetCurrentProcess(), 0xFF0);                 // cores 4-11
HANDLE myThread = CreateThread(NULL, 0, entryPoint, NULL, 0, NULL); // defaults apart from the entry point
SetThreadPriority(myThread, THREAD_PRIORITY_TIME_CRITICAL);
SetThreadAffinityMask(myThread, 0x1 << 11);                         // 12th core
I know that even with elevated priorities (even with base priority 31) there is no way to dedicate a core to an application (please correct me if I am wrong here, since this is exactly what I want to do; non-programmatic solutions would be fine too). That said, the OS itself runs "mostly" on one core or a couple of cores. Is that randomly determined at boot? Can I interrogate the available cores to programmatically determine which set of cores my process and TIME_CRITICAL thread should be running on?
Is there any way to prevent the kernel threads from stealing time slices of my TIME_CRITICAL thread?
I understand Windows is not a real-time OS, but I'm doing the best with what I have. The solution needs to apply to Windows 7, but if it is also supported under XP that would be great.
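For the "interrogate available cores" part, the documented starting point is GetProcessAffinityMask. Below is a minimal sketch of the query (my own illustration); deciding which of those cores are quiet enough to dedicate remains a policy decision.

/* Sketch: list the logical processors visible to the system and to this
 * process. Which of them to dedicate is still up to the caller. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR processMask = 0, systemMask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask)) {
        fprintf(stderr, "GetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    for (unsigned core = 0; core < 8 * sizeof(DWORD_PTR); ++core)
        if (systemMask & ((DWORD_PTR)1 << core))
            printf("core %u is available%s\n", core,
                   (processMask & ((DWORD_PTR)1 << core)) ? " to this process" : "");
    return 0;
}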

Regarding CPU utilization

Considering the below piece of C code, I expected the CPU utilization to go up to 100% as the processor would try to complete the job (endless in this case) given to it. On running the executable for 5 mins, I found the CPU to go up to a max. of 48%. I am running Mac OS X 10.5.8; processor: Intel Core 2 Duo; Compiler: GCC 4.1.
int i = 10;
while(1) {
    i = i * 5;
}
Could someone please explain why the CPU usage does not go up to 100%? Does the OS limit the CPU from reaching 100%?
Please note that if I add a "printf()" inside the loop, the CPU hits 88%. I understand that in this case the processor also has to write to the standard output stream, hence the sharp rise in usage.
Has this got something to do with the amount of job assigned to the processor per unit time?
Regards,
Ven.
You have a multicore processor and you are in a single-thread scenario, so you will use only one core at full throttle... Why do you expect the overall processor usage to go to 100% in such a context?
Run two copies of your program at the same time. They will use both cores of your "Core 2 Duo" CPU, and overall CPU usage will go to 100%.
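Equivalently, the same effect can be shown from a single program by running the busy loop in two threads, one per core. This is a sketch of my own for illustration, not the answerer's code:

/* Sketch: two busy-loop threads, one per core of a dual-core CPU.
 * Build with something like: gcc busy.c -o busy -lpthread */
#include <pthread.h>

static void *spin(void *arg)
{
    volatile int i = 10;           /* volatile so the loop is not optimised away */
    while (1)
        i = i * 5;
    return arg;                    /* never reached */
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, spin, NULL);
    pthread_create(&t2, NULL, spin, NULL);
    pthread_join(t1, NULL);        /* never returns; both cores stay busy */
    return 0;
}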
Edit
if I added a "printf()" inside the loop the CPU hits 88%.
The printf sends some characters to the terminal/screen. Sending, displaying and updating that information is handled by code outside your exe, which is likely to be executed on another thread. But displaying a few characters does not need 100% of such a thread. That is why you see 100% for core 1 and 76% for core 2, which results in the overall CPU usage of 88% that you see.

network performance tuning in Linux

I have an application with two threads. Thread1 receives multicast packets from network card eth1. Suppose I use sched_setaffinity to set the CPU affinity for thread1 to CPU core 1, and then I have thread2 use these packets (received by thread1, located in heap-area global variables) to do some operations, and I set the CPU affinity for thread2 to core 7. Suppose core 1 and core 7 are on the same physical core with hyper-threading; I think the performance would be good, since core 1 and core 7 can share the L1 cache.
I have watched /proc/interrupts, and I see that eth1's interrupts land on several CPU cores. So in my case, I set the CPU affinity of thread1 to core 1, but interrupts happen on many cores; would that affect performance? Do the packets received from eth1 go directly to main memory no matter which core handles the interrupt?
I don't know much about networking in the Linux kernel; could anyone suggest books or websites that cover this topic? Thanks for any comments.
Edit: according to "What Every Programmer Should Know About Memory", section 6.3.5 "Direct Cache Access", I think DCA is what I wanted to know about...
The interrupt will (quite likely) happen on a different core than the one receiving the packet. Depending on how the driver deals with packets, that may or may not matter. If the driver reads the packet (e.g. to make a copy), then it's not ideal, as the cache gets filled on a different CPU. But if the packet is just loaded into memory somewhere using DMA, and left there for the software to pick up later, then it doesn't matter [in fact, it's better to have it happen on a different CPU, as "your" CPU gets more time to do other things].
As to using hyperthreading, my experience (and that of many others) is that hyperthreading SOMETIMES gives a benefit, but often ends up being similar to not having hyperthreading, because the two threads use the same execution units of the same core. You may want to compare the throughput with both threads set to affinity on the same core as well, to see if that makes it "better" or "worse". Like most things, it's often the details that make a difference, so your code may be slightly different from someone else's, meaning it works better in one case or the other.
Edit: if your system has multiple sockets, you may also want to ensure that you use the CPU on the socket "nearest" (in terms of QPI/PCI bridge hops) to the network card.
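To run the same-core vs. sibling-hyperthread comparison suggested above, the pinning the question describes can be done with pthread_setaffinity_np. The sketch below is my own illustration; receive_loop and process_loop are hypothetical placeholders for the real thread bodies, and changing the two core numbers is all it takes to test the other layouts:

/* Sketch: pin two worker threads to specific cores on Linux. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *receive_loop(void *arg) { return arg; }   /* placeholder */
static void *process_loop(void *arg) { return arg; }   /* placeholder */

static void pin(pthread_t t, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (pthread_setaffinity_np(t, sizeof(set), &set) != 0)
        fprintf(stderr, "failed to pin thread to core %d\n", core);
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, receive_loop, NULL);
    pthread_create(&t2, NULL, process_loop, NULL);
    pin(t1, 1);    /* receiving thread  -> core 1 */
    pin(t2, 7);    /* processing thread -> core 7 (sibling hyperthread in the question) */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}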

Why would one CPU core run slower than the others?

I was benchmarking a large scientific application and found it would sometimes run 10% slower given the same inputs. After much searching, I found that the slowdown only occurred when it was running on core #2 of my quad-core CPU (specifically, an Intel Q6600 running at 2.4 GHz). The application is single-threaded and spends most of its time in CPU-intensive matrix math routines.
Now that I know one core is slower than the others, I can get accurate benchmark results by setting the processor affinity to the same core for all runs. However, I still want to know why one core is slower.
I tried several simple test cases to determine the slow part of the CPU, but the test cases ran with identical times, even on slow core #2. Only the complex application showed the slowdown. Here are the test cases that I tried:
Floating point multiplication and addition:
accumulator = accumulator*1.000001 + 0.0001;
Trigonometric functions:
accumulator = sin(accumulator);
accumulator = cos(accumulator);
Integer addition:
accumulator = accumulator + 1;
Memory copy while trying to make the L2 cache miss:
int stride = 4*1024*1024 + 37; // L2 cache size + small prime number
for(long iter = 0; iter < iterations; ++iter) {
    for(int offset = 0; offset < stride; ++offset) {
        for(long i = offset; i < array_size; i += stride) {
            array1[i] = array2[i];
        }
    }
}
The Question: Why would one CPU core be slower than the others, and what part of the CPU is causing that slowdown?
EDIT: More testing showed some Heisenbug behavior. When I explicitly set the processor affinity, then my application does not slow down on core #2. However, if it chooses to run on core #2 without an explicitly set processor affinity, then the application runs about 10% slower. That explains why my simple test cases did not show the same slowdown, as they all explicitly set the processor affinity. So, it looks like there is some process that likes to live on core #2, but it gets out of the way if the processor affinity is set.
Bottom Line: If you need to have an accurate benchmark of a single-threaded program on a multicore machine, then make sure to set the processor affinity.
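As an illustration of that advice (mine, not the original poster's), pinning the whole benchmark process to one core on Linux looks like the sketch below; on Windows, SetProcessAffinityMask plays the same role.

/* Sketch: pin the benchmark process to core 0 before the timed work starts. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                 /* pick one core and stick to it */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {   /* 0 = this process */
        perror("sched_setaffinity");
        return 1;
    }
    /* ... run the benchmark here; every run now uses the same core ... */
    return 0;
}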
You may have other applications that have opted to be attached to that same processor (CPU affinity).
The operating system often likes to run on the same processor, because it can keep its data in that core's L1 cache. If you happen to run your process on the same core the OS is doing a lot of its work on, you can see a slowdown in your CPU performance.
It sounds like some process wants to stick to the same CPU. I doubt it's a hardware issue.
It doesn't necessarily have to be your OS doing the work; some other background daemon could be doing it.
Most modern CPUs throttle each core separately due to overheating or power-saving features. You may try turning off power saving or improving cooling. Or maybe your CPU is bad. On my i7 I get about 2-3 degrees of difference between the core temperatures of the 8 reported cores in "sensors". At full load there is still variation.
Another possibility is that the process is being migrated from one core to another while running. I'd suggest setting the CPU affinity to the 'slow' core and seeing if it's just as fast that way.
Years ago, before the days of multicore, I bought myself a dual-socket Athlon MP for 'web development'. Suddenly my Plone/Zope/Python web servers slowed to a crawl. A Google search turned up that the CPython interpreter has a global interpreter lock, but Python threads are backed by OS threads. The OS threads were evenly distributed among the CPUs, but only one thread can hold the lock at a time, so all the others had to wait.
Setting Zope's CPU affinity to a single CPU fixed the problem.
I've observed something similar on my Haswell laptop. The system was quiet, no X running, just the terminal. Executing the same code with different numactl --physcpubind options gave exactly the same results on all cores except one. I changed the frequency of the cores to Turbo and to other values; nothing helped. All cores were running at the expected speed except one, which was always running slower than the others. That effect survived a reboot.
I rebooted the computer and turned off HyperThreading in the BIOS. When it came back online it was fine again. I then turned HyperThreading back on and it has been fine ever since.
Bizarre. No idea what that could be.
