Auto-generate a w3wp.exe process dump file when a CPU threshold is reached, even when the PID changes

I'm trying to troubleshoot an issue with one of our websites which causes the CPU to spike intermittently. The site sits on a farm of web servers, and the spike happens intermittently on all servers at different times. The process that causes the spike is w3wp.exe. I have checked all the obvious things and now want to analyse multiple sets of dump files for the w3wp.exe process that causes the spike.
I'm trying to automatically generate a dump file of the w3wp.exe process when it reaches a specified CPU threshold for a specified time.
I can use ProcDump.exe to do this and it works a treat IF it's fired before the PID (Process ID) changes.
For example:
procdump -ma -c 80 -s 10 -n 2 5844 (where 5844 is the PID)
-ma Write a dump file with all process memory. The default dump format includes thread and handle information.
-c CPU threshold at which to create a dump of the process.
-s Consecutive seconds CPU threshold must be hit before dump written (default is 10).
-n Number of dumps to write before exiting.
The above command monitors the w3wp.exe process until CPU usage stays above 80% for 10 consecutive seconds, and it writes a full dump on two such occurrences before exiting.
The problem:
I have multiple instances of w3wp.exe running, so I cannot use the process name; I need to specify the PID. The PID changes each time the app pool is recycled, so it often changes before I can capture multiple dump files, and I then need to start procdump again on each web server.
My question:
How can I keep automatically generating dump files even after the PID changes?

Use the Debug Diagnostic Tool (DebugDiag) 2.0 from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=49924
It handles multiple w3wp.exe processes. If you need a generic solution, you will have to write a script - such as https://gallery.technet.microsoft.com/scriptcenter/Getting-SysInternals-027bef71
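For example, here is a rough sketch of such a script in Python (assuming procdump.exe is on the PATH and the EULA has already been accepted): it polls for w3wp.exe PIDs and attaches a fresh procdump to any PID that is not already being monitored, so monitoring resumes automatically after an app pool recycle changes the PID.

import subprocess
import time

monitored = {}  # pid -> procdump Popen handle

def current_w3wp_pids():
    # Use tasklist's CSV output to find every running w3wp.exe instance.
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq w3wp.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True).stdout
    pids = set()
    for line in out.splitlines():
        parts = [p.strip('"') for p in line.split('","')]
        if len(parts) >= 2 and parts[0].lower() == "w3wp.exe":
            pids.add(int(parts[1]))
    return pids

while True:
    for pid in current_w3wp_pids():
        # (Re)attach if this PID is new or its previous procdump has exited.
        if pid not in monitored or monitored[pid].poll() is not None:
            # Same switches as in the question: full dump, 80% CPU for 10 s, 2 dumps.
            monitored[pid] = subprocess.Popen(
                ["procdump", "-accepteula", "-ma", "-c", "80", "-s", "10",
                 "-n", "2", str(pid)])
    time.sleep(5)

Running one copy of this watcher per web server keeps a procdump attached to every worker process without having to restart anything by hand after a recycle.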

Related

Debugging Linux process hangs, which code is it running?

I have a process running on a very weak embedded Linux device which cannot run gdb / gdbserver itself. I have it calling a function X from a shared library repeatedly (there are also some other processes calling it at the same time, with much lower frequency), and it usually hangs somewhere inside the shared library after half a day to a day. How do I debug:
In case it is blocked somewhere: what is the last line of code it ran?
In case it is stuck in an infinite loop: which lines of code is it running?
What I tried:
I dug into the shared library and added a lot of syslog calls to check. However, with syslog being called constantly at a very high rate, my process now hangs every 2-5 minutes. I suspect syslog is blocking on its UNIX socket.
gdb comes with a program called gcore, which will generate a core file from the running process.
Many systems nowadays disable core files by default (ulimit -c in a shell will show 0). Use the ulimit -c unlimited shell command, then run your process from the same shell: these limits are inherited from the parent process. If you start your process some other way than directly from a shell, you will need to find out how to set the limit there (e.g. LimitCORE= in a systemd unit file).
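If the process is started from some wrapper rather than an interactive shell, the limit can also be raised programmatically before handing control to the real binary, since resource limits survive exec. A minimal sketch in Python ("/path/to/your_program" is only a placeholder):

import os
import resource

# Raise the core-file soft limit as far as the current hard limit allows,
# then exec the real program; the raised limit stays in effect across exec.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
os.execv("/path/to/your_program", ["/path/to/your_program"])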
Once your process gets into the bad state, run gcore on its process ID. You can then copy it to your workstation and load it into gdb (gdb <executable> <core-file>). You can then view the stack trace and other state as of the moment the core dump was taken.

Sustes in Solr running and using lots of CPU power

I am trying to figure out what this process is:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2398 solr 20 0 709920 16412 1364 S 771.5 0.0 19:39.02 sustes
I thought maybe it was from a script doing an optimization of the database - but I've disabled that and it's still there, even after a reboot. That CPU usage seems excessively high!
Solr doesn't usually launch any external processes, and the Internet seems to indicate your server has been compromised and someone is running a cryptominer binary on your server.
It's time to kill any access to the server, recreate the server and firewall it off from the world, find out how they got in and make sure it doesn't happen again.
It shows that some external process is running - possibly cryptojacking, which is said to have become bigger than ransomware; it's the new money maker.
Please check the following for any unwanted log entries and any connections to external URLs (a rough sketch of a Linux-side check follows this list):
1. /var/log/syslog, and
2. depending on whether you are using Windows or Linux machines, Windows services/processes (Windows) or cron jobs (Linux).
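As a starting point on the Linux side, here is a rough sketch in Python (the syslog path is the usual Debian/Ubuntu location and is only an assumption) that prints URL-looking entries from /var/log/syslog and lists the current user's cron jobs:

import re
import subprocess

# Print syslog lines that mention an external URL (possible miner pool traffic).
url_pattern = re.compile(r"https?://\S+")
with open("/var/log/syslog", errors="replace") as log:
    for line in log:
        if url_pattern.search(line):
            print(line.rstrip())

# List the current user's cron jobs (use `crontab -l -u <user>` for other users).
print(subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout)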

Fork: concurrency or parallelism

I was recently doing some experiments with the fork function, and I came up with a "simple" (short) question:
Does fork use concurrency or parallelism mechanisms (if more than one core is available)?
Or, is it the OS which makes the best choice?
Thanks for your answer.
nb: Damn I fork bombed myself again!
Edit:
Concurrency: each operation runs on one core; interrupts are used to switch from one process to another (sequential computation).
Parallelism: each operation runs on two (or more) cores.
fork() duplicates the current process, creating another independent process. End of story.
How the kernel chooses to schedule these processes is a different, very broad question. In general the kernel will try to use all available resources (cores) to run as many tasks as possible. When there are more runnable tasks than cores, it has to start making decisions about who gets to run, and for how long.
The fork function creates a separate process. It is up to the operating system how it handles different processes.
Of course, if only one core is available, the OS has no choice but to run all processes interleaved.
If more cores are available, every sane OS will distribute the processes to the different cores, so every core runs at least one process.
However, even then, more processes can be active than there are cores. So even then, it is up to the OS to decide which processes can be run parallel (by distributing to cores) and which have to be run interleaved (on a single core).
In fact, fork() is a system call (a.k.a. system service) which creates a new process from the current process (read the return value to see which one you are, the parent or the child).
In the UNIX world, processes share the CPU computing time. It works like this:
1. A process is running.
2. The clock generates an interrupt, invoking the kernel and pausing the process.
3. The kernel takes the list of runnable processes and decides which one to resume (this is called scheduling).
4. Go back to step 1.
When there are multiple processor cores, the kernel is able to dispatch processes across them.
Well, you can try something. Write a computationally heavy program, say O(n^3), so that it takes a good amount of time to compute. fork() four times (if you have a quad-core machine). Now open any graphical CPU monitor. Anything cool?
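As a rough illustration of that experiment (POSIX only; the busy loop below is just a stand-in for an O(n^3) computation):

import os

def burn_cpu():
    total = 0
    for i in range(10**8):       # enough work to keep one core busy for a while
        total += i * i
    return total

children = []
for _ in range(4):               # four workers, e.g. one per core on a quad-core box
    pid = os.fork()
    if pid == 0:                 # child: do the work, then exit
        burn_cpu()
        os._exit(0)
    children.append(pid)         # parent: remember the child's PID

for pid in children:             # parent waits for all children to finish
    os.waitpid(pid, 0)

With four cores you should see all of them pegged while the children run; with one core you should see the same total work take roughly four times as long.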

Performance changes significantly every time I rerun the program

I have Raspberry Pi-like hardware which runs a basic Linux operating system and only one program, called "Main_Prog".
Every time I run a performance test against Main_Prog I get less than 1% performance fluctuation. This is perfectly acceptable.
When I kill Main_Prog using the kill command and restart it, the performance changes by up to 8%. Further performance tests then vary by less than 1% around this new level.
Example
So, for example, Main_Prog at first ran at 100 calls/sec and varied between 99-101 calls/sec.
I then ran a kill command against Main_Prog and restarted it using "./Main_Prog &". I then ran a performance test and now Main_Prog runs at 105 calls/sec, fluctuating between 104-106 calls/sec. It will continue to run at 104-106 calls/sec until I kill Main_Prog and start it again.
Any idea how to prevent the fluctuation, or what is happening? Remember, it is VERY consistent, and no other programs are running on the operating system.
Your temporary fluctuation might be related to the page cache. I would not bother (the change is insignificant). See also http://www.linuxatemyram.com/
You might prefill the page cache, e.g. by running something like wc Main_Prog before running ./Main_Prog.
And you probably still do have some other executable programs & processes on your Raspberry Pi (check with top or ps auxw). I guess that /sbin/init is still running at pid 1. And probably your shell is running too.
It is quite unusual to have a Linux system with only one process. To get that, you should replace /sbin/init with your program, and I really don't recommend that, especially if you don't know Linux very well.
Since there are several processes running in your box, and because the kernel scheduler preempts tasks at arbitrary moments, its behavior is not entirely reproducible, and that explains the observed fluctuation.
Read also more about real-time scheduling, setpriority(2), sched_setscheduler(2), pthread_setschedparam(3), readahead(2), mlock(2), madvise(2), posix_fadvise(2)
If you are mostly interested in benchmarking, the sensible way is to repeat the same benchmark several times (e.g. 4 to 15 times) and either take the minimum, or the maximum, or the average.
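A small sketch of that approach, assuming the benchmark can be driven from the command line (the ./Main_Prog --bench invocation below is only a placeholder):

import subprocess
import time

timings = []
for _ in range(10):                       # repeat the same benchmark several times
    start = time.perf_counter()
    subprocess.run(["./Main_Prog", "--bench"], check=True)
    timings.append(time.perf_counter() - start)

print("min %.3fs  max %.3fs  avg %.3fs"
      % (min(timings), max(timings), sum(timings) / len(timings)))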

Linux: automatically restarting an application on crash - daemons

I have a system running embedded Linux and it is critical that it runs continuously. Basically it is a process for communicating with sensors and relaying that data to a database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling (e.g. sockets & UART communications). How do I ensure none of the threads gets hung up or exits unexpectedly? Is there an easy-to-use watchdog that is threading-friendly?
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but an even easier solution would be to call the alarm syscall repeatedly and have the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.
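A hedged sketch of both ideas combined, in Python but using the same POSIX calls (fork, waitpid, alarm); do_one_unit_of_work and the 30-second timeout are placeholders for your real work loop:

import os
import signal
import time

WATCHDOG_SECONDS = 30  # assumed timeout; pick something larger than one work cycle

def do_one_unit_of_work():
    time.sleep(1)  # stand-in for polling sensors, talking to the database, etc.

def worker_main():
    # The default SIGALRM action terminates the process, which is exactly what
    # we want if the loop below stops running and stops re-arming the alarm.
    while True:
        signal.alarm(WATCHDOG_SECONDS)  # re-arm the watchdog on every iteration
        do_one_unit_of_work()

while True:                    # supervisor loop in the parent process
    pid = os.fork()
    if pid == 0:               # child: run the real application
        worker_main()
        os._exit(0)
    os.waitpid(pid, 0)         # parent: block until the child exits or is killed
    time.sleep(1)              # brief pause before restarting it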
The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file at a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked (a short sketch of this appears at the end of this answer).
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few approaches I've seen are pretty ugly and not 100% bug-free. However, ThreadSanitizer (tsan) can help detect potential deadlock scenarios and other threading issues by instrumenting the program at runtime.
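For illustration, here is a minimal sketch of the "touch a file" heartbeat from the second approach above; the path and the staleness threshold are only examples:

import os
import time

HEARTBEAT_FILE = "/tmp/myapp.heartbeat"  # example path agreed on by both sides
STALE_AFTER = 60                         # seconds without a heartbeat before declaring it hung

def heartbeat():
    # Called regularly from the monitored application's main loop.
    with open(HEARTBEAT_FILE, "w") as f:
        f.write(str(time.time()))

def looks_hung():
    # Called by the external watchdog process.
    try:
        age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
    except OSError:
        return True          # no heartbeat file yet: treat the application as dead
    return age > STALE_AFTER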
You could create a cron job that checks from time to time whether the process is running, using start-stop-daemon.
Use this script to run your application:
#!/bin/bash
while ! /path/to/program    # Rerun the program until it exits successfully (status 0).
do
    echo "restarting"       # A non-zero exit status means it died, so it is restarted.
done
You can also put this script in /etc/init.d/ in order to start it as a daemon.
