GET Process Cpu Usage In c - c

How can i Get process Cpu usage in c??
I need Cpu usage of evrey process and threads.
please give me an example.
Thanks!

In plain C, this is not possible, but since the question is also tagged "Windows":
CPU usage is CPU time divided by real time. The GetThreadTimes and GetProcessTimes functions give you that information (among other features such as performance counters, which Joachim Pileborg mentioned above, but I think this one is probably easier).
You probably also want to use CreateToolhelp32Snapshot first to know what processes and threads exist at all. You'll need to translate thread/process IDs to handles, but I guess that won't be a big hurdle (i.e. OpenProcess).

In C, total CPU usage can be determined using Performance Counters (there is a small typo in the example code: sleep has to be changed to Sleep).
In C++, C#, Delphi etc., I would recommend using WMI.
== EDIT ==
I found an approach to get the per-process CPU usage. For example, in order to get the CPU load of Microsoft Outlook, change the counter path in the above example to this:
PdhAddCounter(query, TEXT("\\Process(OUTLOOK)\\% Processor Time"), 0, &counter);
If you have multiple instances of the same executable running, you may use indexes. This MSDN example is also very useful.

Related

How to choose for multithreading - c

I have to do a program client-server in c where server can use n-threads that can work simultaneously for manage the request of clients.
For do it I use a socket that use a listener that put the new FD (of new connection request) in a list and then the threads can take it when they are able to do.
I know that I can use pipe too for communication between thread.
Is the socket the best way ? And why or why not?
Sorry for my bad English
To communicate between threads you can use socket as well as shared memory.
To do multithreading there are many libraries available on github, one of them I used is the below one.
https://github.com/snikulov/prog_posix_threads/blob/master/workq.c
I tried and tested the same way what you want. it works perfect!
There's one very nice resource related to socket multiplexing which I think you should stop and read after reading this answer. That resource is entitled The C10K problem, and it details numerous solutions to the problem people faced in the year 2000, of handling 10000 clients.
Of those solutions, multithreading is not the primary one. Indeed, multithreading as an optimisation should be one of your last resorts, as that optimisation will interfere with the instruments you use to diagnose other optimisations.
In general, here is how you should perform optimisations, in order to provide guaranteed justifications:
Use a profiler to determine the most significant bottlenecks (in your single-threaded program).
Perform your optimisation upon one of the more significant bottlenecks.
Use the profiler again, with the same set of data, to verify that your optimisation worked correctly.
You can repeat these steps ad infinitum until you decide the improvements are no longer tangible (meaning, good luck observing the differences between before and after). Following these steps will provide you with data you can show your employer, if he/she asks you what you've been doing for the last hour, so make sure you save the output of your profiler at each iteration.
Optimisations are per-machine; what this means is that an optimisation for your machine might actually be slower on another machine. For example, you may use a buffer of 4096 bytes for your machine, while the cache lines for another machine might indicate that 512 bytes is a better idea.
Hence, ideally, we should design programs and modules in such a way that their resources are minimal and can be easily be scaled up, substituted and/or otherwise adjusted for other machines. This can be difficult, as it means in the buffer example above you might start off with a buffer of one byte; you'd most likely need to study finite state machines to achieve that, and using buffers of one byte might not always be technically feasable (i.e. when dealing with fields that are guaranteed to be a certain width; you should use that width as your minimum limit, and scale up from there). The reward is ultra-portable and ultra-optimisable in all situations.
Keep in mind that extra threads use extra resources; we tend to assume that the stack space reserved for a thread can grow to 1MB, so 10000 sockets occupying 10000 threads (in a thread-per-socket model) would occupy about 10GB of memory! Yikes! The minimal resources method suggests that we should start off with one thread, and scale up from there, using a multithreading profiler to measure performance like in the three steps above.
I think you'll find, though, that for anything purely socket-driven, you likely won't need more than one thread, even for 10000 clients, if you study the C10K problem or use some library which has been engineered based on those findings (see your comments for one such suggestion). We're not talking about masses of number crunching, here; we're talking about socket operations, which the kernel likely processes using a single core, and so you can likely match that single core with a single thread, and avoid any context switching or thread synchronisation troubles/overheads incurred by multithreading.

Is there a way to suspend OS scheduling for the duration of a program?

I have an assignment where I am analyzing the runtime of various sorting algorithms. I have written the code but I think it's an unfair comparison.
My code basically grabs the the clock time before and after the sorting is finished to compute the elapsed time. However, what if the OS decides to interrupt more frequently during the runtime of a specific sorting algorithm, or if it rather decides that some other background application should be given more of the time domain when it's thread comes back up?
I am not a CS major so I may not be entirely correct here, but from what I've read previously I was concerned this might have an impact on the results.
I also realize that if OS scheduling is suspended and the program hangs then there might be a serious problem; I am just wondering if it possible.
Normally, there's no real reason for it. The scheduler will slightly increase the execution time, but if the code runs for a few seconds, the change will be tiny.
So unless you're running heavy applications on the same computer, the amount of noise this will add to your tests is negligible.
In Linux, you can use isolcpus parameter to mark CPUs that won't be used by the scheduler. You can find information here. I'm not sure what's the minimal kernel version.
If you use it, you'll need to use sched_setaffinity, to put your theread on an isolated CPU, because the scheduler won't put it there.
It is not possible, not in user space code. Otherwise, any malicious process could steal the CPU from others.
If you want precise time counting for your process only, I suggest using time command. You can read about it here: What do 'real', 'user' and 'sys' mean in the output of time(1)?
Quick answer: you are most likely interested in user time, assuming your code doesn't make a heavy use of syscalls (which would be rather strange for a sorting algorithm)
On an up-to-date POSIX system (basically Linux) you can use clock_gettime with CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID if you make sure the process doesn't migrate between CPUs (you can set its affinity for example).
The difference in times returned by clock_gettime with those arguments results in exact time the process/thread spent executing. Only pitfall as I mentioned is process migration as the man page says:
The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU.
This means that you don't really need to suspend all other processes just to measure the execution time of your program.

time() and context switching

I am more or less wondering how time() is implemented in the C standard library and what would happen in the situation described below. Although this time is most-likely negligible, consider a situation where you have a hard-limit on time and no control over the CPU scheduler (assume that it is a "good" scheduler for a general-purpose CPU).
Now, if I use time() to calculate my execution time of a particular section of code and use this time subtracted from some maximum bound to determine some other time-dependent variable, how would this variable be skewed based on context-switches? I know we could use nice and other tools (i.e. custom scheduler, etc.) to be certain we get full CPU usage when we need it, however, I am wondering how this works in general for similar situations as this and what side-effects exist due to the system's choices.
time is supposed to measure wall-time. I.e., it gives the current time, regardless of how much or little your process has run.
If you want to measure cpu time, you should use clock instead (though some vendors such as MS implement it wrong, so it does wall time also).
Of course, there are also other tools to retrieve CPU usage, such as times on Unix-like systems or GetProcessTimes on Windows. Most people find these more useful despite the reduced portability.

How to retrieve the current processor time in Linux?

I am using C language and Linux as my programming platform in embedded device.
My question is, how to correctly retrieve the current processor time(tick). I am using clock() function in time.h and it seems I am getting inconsistent value.
Thanks.
The clock() function measures the CPU time consumed by your process. It doesn't increment while your process is sleeping or blocked.
If you want a high resolution clock that advances continually, use clock_gettime(CLOCK_MONOTONIC, ..).
I am not real clear on what, specifically, you are asking. If you want another method to get the time your process is using, I often use getitimer() / setitimer() with ITIMER_PROF versus ITIMER_REAL. I find that can be a bit quirky, however.
You may be interested in the LWN article "The trouble with TSC", and the attached comments. While gettimeofday and clock_gettime seem to be the correct thing to go to, there's a lot to consider: performance may vary, there may be consistency issues between different CPUs in multithreaded or multiprocess programs, and the presence of e.g. NTP can mutate the clock value (CLOCK_MONOTONIC will not be affected by NTP, but others may).
Be careful, and make sure you read up on whatever you pursue to make sure it fits your requirements. If you're lucky you're on a fixed hardware and library platform, or you can afford some kinds of inaccuracy or imprecision.

Measuring CPU clocks consumed by a process

I have written a program in C. Its a program created as result of a research. I want to compute exact CPU cycles which program consumes. Exact number of cycles.
Any idea how can I find that?
The valgrind tool cachegrind (valgrind --tool=cachegrind) will give you a detailed output including the number of instructions executed, cache misses and branch prediction misses. These can be accounted down to individual lines of assembler, so in principle (with knowledge of your exact architecture) you could derive precise cycle counts from this output.
Know that it will change from execution to execution, due to cache effects.
The documentation for the cachegrind tool is here.
No you can't. The concept of a 'CPU cycle' is not well defined. Modern chips can run at multiple clock rates, and different parts of them can be doing different things at different times.
The question of 'how many total pipeline steps' might in some cases be meaningful, but there is not likely to be a way to get it.
Try OProfile. It use various hardware counters on the CPU to measure the number of instructions executed and how many cycles have passed. You can see an example of it's use in the article, Memory part 7: Memory performance tools.
I am not entirely sure that I know exactly what you're trying to do, but what can be done on modern x86 processors is to read the time stamp counter (TSC) before and after the block of code you're interested in. On the assembly level, this is done using the RDTSC instruction, which gives you the value of the TSC in the edx:eax register pair.
Note however that there are certain caveats to this approach, e.g. if your process starts out on CPU0 and ends up on CPU1, the result you get from RDTSC will refer to the specific processor core that executed the instruction and hence may not be comparable. (There's also the lack of instruction serialisation with RDTSC, but in this context here, I don't think that's so much of an issue.)
Sorry, but no, at least not for most practical purposes -- it's simply not possible with most normal OSes. Just for example, quite a few OSes don't do a full context switch to handle an interrupt, so the time spent servicing a interrupt can and often will appear to be time spent in whatever process was executing when the interrupt occurred.
The "not for practical purposes" would indicate the possibility of running your program under a cycle accurate simulator. These are available, but mostly for CPUs used primarily in real-time embedded systems, NOT for anything like a full-blown PC. Worse, they (generally) aren't for running anything like a full-blown OS, but for code that runs on the "bare metal."
In theory, you might be able to do something with a virtual machine running something like Windows or Linux -- but I don't know of any existing virtual machine that attempts to, and it would be decidedly non-trivial and probably have pretty serious consequences in performance as well (to put it mildly).

Resources