Why does an Apache mod_perl process become a zombie? - apache2

Occasionally a mod_perl Apache process is marked "defunct" in the "top" utility, that is, it becomes a zombie process.
Is this correct behavior?
Do I have to worry about it?
Our Perl script is very simple; it does not spawn any child processes.
The zombie process disappears pretty quickly.
Apache2, Ubuntu.
Our apache config is here: apache_config.txt
Here is a snapshot of top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19525 www-data 20 0 55972 25m 4684 S 10.3 2.4 0:00.32 apache2
19486 www-data 20 0 52792 21m 4120 S 1.7 2.1 0:00.05 apache2
19538 www-data 20 0 52792 21m 4120 S 1.3 2.1 0:00.04 apache2
19539 www-data 20 0 0 0 0 Z 0.7 0.0 0:00.03 apache2 <defunct>
19481 www-data 20 0 52860 21m 4016 S 0.3 2.1 0:00.05 apache2
19521 www-data 20 0 52804 21m 3824 S 0.3 2.1 0:00.08 apache2
These are the CPAN modules I use:
CGI();
XML::LibXML();
DateTime;
DateTime::TimeZone;
Benchmark();
Data::Dump();
Devel::StackTrace();
DBD::mysql();
DBI();
LWP();
LWP::UserAgent();
HTTP::Request();
HTTP::Response();
URI::Heuristic();
MD5();
IO::String();
DateTime::Format::HTTP();
Math::BigInt();
Digest::SHA1();
top:
26252 www-data 20 0 0 0 0 Z 0.3 0.0 0:00.22 apache2 <defunct>
access.log with the PID logged as the first field:
26252 85.124.207.173 - - [26/Dec/2009:22:16:42 +0300] "GET /cgi-bin/wimo/server/index.pl?location=gn:2761369&request=forecast&client_part=app&ver=2_0b191&client=desktop&license_type=free&auto_id=125CC6B6DAA HTTP/1.1" 200 826 0 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; GTB6.3; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
Three different zombie processes logged by server-status:
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
32-0 1300 0/0/45 _ 0.00 0 0 0.0 0.00 2.29 127.0.0.1 weather_server OPTIONS * HTTP/1.0
100-0 1254 1/7/41 C 0.22 0 0 0.0 0.00 1.51 127.0.0.1 weather_server OPTIONS * HTTP/1.0
29-0 1299 0/12/78 _ 0.31 0 2 0.0 0.78 2.37 [my ip was here] weather_server GET /server-status HTTP/1.1

My first suspicion is that you really are forking but maybe you don't realize it. Is it possible for you to include your code? Remember that any system() or backtick (``) calls fork. This could easily be happening inside a CPAN module without you realizing it. There is some useful information about mod_perl and forking (including how zombies are created and how to avoid them) here.
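For illustration, here is a minimal, hypothetical mod_perl 2 response handler (not your code; the module name and response are made up) showing how a fork creates a zombie and how reaping avoids it:
package My::ForkDemo;

use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;

    # A plain fork() produces a zombie: the child stays <defunct>
    # in the process table until the parent reaps it.
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do some work, then exit. CORE::exit() bypasses
        # mod_perl's exit() override inside the child.
        CORE::exit(0);
    }

    # Parent: reap the child explicitly...
    waitpid($pid, 0);

    # ...or, if you don't care about the exit status, auto-reap by
    # setting this *before* the fork():
    # local $SIG{CHLD} = 'IGNORE';

    $r->content_type('text/plain');
    $r->print("done\n");
    return Apache2::Const::OK;
}

1;
If a CPAN module forks behind your back, the same <defunct> entries appear even though there is no fork() anywhere in your own code.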
Update: try adding this to your config:
# Monitor apache server status
ExtendedStatus On
<VirtualHost 127.0.0.1:80>
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</VirtualHost>
Then change the Allow from line to your IP; after that you can visit http://yourdomain.com/server-status and get a page of summary information about Apache. Try doing this when you see one of the zombies and look at what Apache thinks that process is doing.
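For example, if your workstation's address were 203.0.113.45 (a placeholder from the documentation range), the rule would become:
Allow from 127.0.0.1 203.0.113.45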

I see that too with my very simple mod_perl 2 module. I do not fork anything; I just write a string to the client socket and then return OK. And still a defunct process appears on my CentOS 5.5 Linux VM and then goes away. Here is my source code; you can test it with "telnet yourhost 843" and pressing ENTER:
package SocketPolicy;

# Run: semanage port -a -t http_port_t -p tcp 843
# And add the following lines to httpd.conf:
# Listen 843
# <VirtualHost _default_:843>
#     PerlModule SocketPolicy
#     PerlProcessConnectionHandler SocketPolicy
# </VirtualHost>

use strict;
use warnings FATAL => 'all';

use APR::Const (-compile => 'SO_NONBLOCK');
use APR::Socket ();
use Apache2::ServerRec ();
use Apache2::Connection ();
use Apache2::Const (-compile => qw(OK DECLINED));

use constant POLICY => qq{<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM
"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
<allow-access-from domain="*" to-ports="8080"/>
</cross-domain-policy>
\0};

sub handler {
    my $conn   = shift;
    my $socket = $conn->client_socket();
    my $offset = 0;

    # set the socket to blocking mode
    $socket->opt_set(APR::Const::SO_NONBLOCK => 0);

    do {
        my $nbytes = $socket->send(substr(POLICY, $offset),
                                   length(POLICY) - $offset);
        # client connection closed or interrupted
        return Apache2::Const::DECLINED unless $nbytes;
        $offset += $nbytes;
    } while ($offset < length(POLICY));

    my $slog = $conn->base_server()->log();
    $slog->warn('served socket policy to: ', $conn->remote_ip());
    return Apache2::Const::OK;
}

1;
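For reference, a test session should look roughly like this (the host name and telnet chatter are illustrative; the policy XML is written as soon as the connection is accepted, so pressing ENTER is not strictly necessary):
$ telnet yourhost 843
Trying 192.0.2.10...
Connected to yourhost.
Escape character is '^]'.
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM
"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
<allow-access-from domain="*" to-ports="8080"/>
</cross-domain-policy>
Connection closed by foreign host.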

Related

Lando db-import is taking a very long time to import a database

I am using lando (v3.0.0.rc-14) along with Docker CE (latest version) for my Drupal 7 site. I am trying to import a database that is ~4 GB uncompressed (~900 MB compressed) using the following commands:
lando db-import db-name db-filename.sql
lando db-import db-name db-filename.sql.gz
But this is not helping: the db-import usually takes more than 24 hours to complete. Is this a problem with my version or with my settings in the .lando.yml file?
My CPU and memory usage stats are below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28334 1001 20 0 1435088 568912 15892 S 2.3 3.5 4:27.81 mysqld
3131 usersf 20 0 306928 9320 7172 S 1.7 0.1 0:38.15 gvfs-udisks2-vo
2095 root 20 0 2434900 96804 39620 S 1.3 0.6 0:34.19 dockerd
Try these instructions.
Disable write barriers for the ext4 filesystem:
$ sudo gedit /etc/fstab
Here you’ll see something like this:
UUID=700a2404-f687-4ae2-a2d5-54291553551e / ext4 errors=remount-ro 0 1
So you just need to add barrier=0:
UUID=700a2404-f687-4ae2-a2d5-54291553551e / ext4 errors=remount-ro,barrier=0 0 1
Reboot your system.
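After the reboot, you can verify that the option took effect by checking the active mount options for the root filesystem (depending on the kernel, barrier=0 may also show up as nobarrier):
grep ' / ' /proc/mounts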
Or try any of the other suggestions.

snmptrapd logging error - couldn't open udp:162 -- errno 98 ("Address already in use")

I am trying to receive a trap generated by a Cisco router on my VM (Ubuntu 14.04). I can do an snmpwalk, so I guess SNMP is working fine, but I am not able to receive the traps generated by the router on my VM.
a@ubuntu:~$ sudo /etc/init.d/snmpd restart
* Restarting network management services:
a@ubuntu:~$ sudo /etc/init.d/snmpd status
* snmpd is running
* snmptrapd is running
Here is what I have inside the files:
/etc/default/snmpd:
export MIBS=
SNMPDRUN=yes
SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid -c /etc/snmp/snmpd.conf'
TRAPDRUN=yes
# snmptrapd options (use syslog).
TRAPDOPTS='-n -On -t -Lsd -p /var/run/snmptrapd.pid'
In /etc/snmp/:
snmpd.conf:
rocommunity public
snmptrapd.conf:
disableAuthorization yes
snmp.conf:
mibs:
The command I am running for viewing the traps on the VM:
a@ubuntu:/etc/snmp$ sudo snmptrapd -f -Lo -c snmptrapd.conf
couldn't open udp:162 -- errno 98 ("Address already in use")
I am confused, since the port is being used by snmptrapd itself:
a@ubuntu:~$ cat /etc/services | grep 162
snmp-trap 162/tcp snmptrap # Traps for SNMP
snmp-trap 162/udp snmptrap
a@ubuntu:~$ sudo netstat -lnp | grep 162
udp 0 0 0.0.0.0:162 0.0.0.0:* 6216/snmptrapd
a@ubuntu:~$ ps -ef | grep snmptrapd
root 6216 2076 0 10:43 ? 00:00:00 /usr/sbin/snmptrapd -Lsd -p /var/run/snmptrapd.pid
a 6493 2667 0 11:47 pts/8 00:00:00 grep --color=auto snmptrapd
Generating a trap from windows using SnmpTrapGen.exe leads to the same error.
Is there any way of solving this issue? I have googled a lot and have been stuck on this for days; any help will be very much appreciated.
Thanks a lot in advance!!
Only one application can listen on port 162 at a time. If you get this error, you already have an app running that listens on port 162; that can be the snmptrapd service or your own SNMP trap application. You should close one of the applications.
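A minimal way to free the port, assuming it is the init-script-managed snmptrapd (TRAPDRUN=yes above) that is holding it:
# Stop the daemonized instance that is bound to udp:162
# (the snmpd init script manages snmptrapd as well):
sudo /etc/init.d/snmpd stop

# ...then run snmptrapd in the foreground as before:
sudo snmptrapd -f -Lo -c /etc/snmp/snmptrapd.conf
Alternatively, set TRAPDRUN=no in /etc/default/snmpd so the init script stops launching its own snmptrapd.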

Vagrant VMs can talk to each other, but I can't reach HTTP from the host

I have 5 VMs on 192.168.56.*:
.19 - Zookeeper
.20 - Solr1
.21 - Solr2
.22 - Solr3
.23 - Solr4
This is my Vagrantfile:
Vagrant.configure(2) do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://atlas.hashicorp.com/search.
  # config.vm.box = "base"

  (1..4).each do |x|
    ip = 20
    config.vm.define "solr#{x}" do |solr|
      solr.vm.box = 'ubuntu/wily64'
      solr.vm.network "private_network", ip: "192.168.56.#{ip}", bridge: "Intel(R) Centrino(R) Advanced-N 6205"
      ip = ip + 1
      solr.vm.provider "virtualbox" do |v|
        v.memory = 2048
        #v.cpus = 1
      end
    end
  end
end
I have Apache HTTP on port 80 and Solr on port 8983. I can do wget 192.168.56.20:8983 from the ZooKeeper VM and it downloads the main page. When I try to hit 192.168.56.20:8983 from the host OS, it just hangs. Firewall rules are in place that open those ports, so I have no idea why .19 can access Solr but the host cannot.
Any ideas?

Problems regarding nagios.cfg with defined services and hosts (generated by Nagiosql)

NagiosQL-generated files cause problems during the preflight check, but everything seems to be okay.
/etc/nagios/nagios.cfg
....
## Hosts
cfg_dir=/etc/nagiosql/hosts/
cfg_file=/etc/nagiosql/hosttemplates.cfg
cfg_file=/etc/nagiosql/hostgroups.cfg
cfg_file=/etc/nagiosql/hostextinfo.cfg
cfg_file=/etc/nagiosql/hostescalations.cfg
cfg_file=/etc/nagiosql/hostdependencies.cfg
## Services
cfg_dir=/etc/nagiosql/services/
cfg_file=/etc/nagiosql/servicetemplates.cfg
cfg_file=/etc/nagiosql/servicegroups.cfg
cfg_file=/etc/nagiosql/serviceextinfo.cfg
cfg_file=/etc/nagiosql/serviceescalations.cfg
cfg_file=/etc/nagiosql/servicedependencies.cfg
...
nagios -v /etc/nagios/nagios.cfg
....
Running pre-flight check on configuration data...
Checking services...
Error: There are no services defined!
Checked 0 services.
Checking hosts...
Error: There are no hosts defined!
Checked 0 hosts.
The content seems okay to me:
[root@xxx services]# cd /etc/nagiosql/services/
[root@xxx services]# ls -alh
total 20K
drwsr-sr-x 2 apache nagios 4.0K Aug 7 10:46 .
drwsr-sr-x 5 apache nagios 4.0K Aug 7 12:17 ..
-rw-r--r-- 1 apache nagios 2.3K Aug 7 10:46 localhost.cfg
-rw-r--r-- 1 apache nagios 2.2K Aug 7 10:46 www.google.com.cfg
-rw-r--r-- 1 apache nagios 1.1K Aug 7 10:46 www.yahoo.com.cfg
[root@xxx hosts]# ls -alh
total 16K
drwsr-sr-x 2 apache nagios 4.0K Aug 11 07:12 .
drwsr-sr-x 5 apache nagios 4.0K Aug 7 12:17 ..
-rw-r--r-- 1 apache nagios 800 Aug 11 07:12 GIT.cfg
-rw-r--r-- 1 apache nagios 948 Aug 11 07:12 psm01.cfg
The content also seems to be fine (generated by NagiosQL):
[root@xxx hosts]# vi GIT.cfg
###############################################################################
#
# Host configuration file
#
# Created by: Nagios QL Version 3.2.0
# Date: 2015-08-11 07:12:54
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios QL will overwite all manual settings during the next update
#
###############################################################################
define host {
    host_name    GIT
    alias        GIT Server
    address      172.25.10.80
    register     0
}
###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Can somebody tell me where the solution for this problem is? I have already wasted 2 hours...
Try removing the final slash from the directory names in your cfg_dir definitions and see if that doesn't get it to recognize the cfg files in those directories.
For example,
Change:
cfg_dir=/etc/nagiosql/hosts/
...
cfg_dir=/etc/nagiosql/services/
To:
cfg_dir=/etc/nagiosql/hosts
...
cfg_dir=/etc/nagiosql/services
EDIT:
Okay, I think directory permissions may be causing the cfg_dir evaluations to fail. According to the ls -alh output you listed, your /etc/nagiosql/hosts/, /etc/nagiosql/services/, and /etc/nagiosql/ directories do not grant write permissions to the nagios group. Nagios will need to get a directory listing for those directories and will need group write permissions to do it.
To remedy:
chmod g+w /etc/nagiosql/hosts/
chmod g+w /etc/nagiosql/services/
Restart nagios service.
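For example, including the parent directory mentioned above (the service name nagios is an assumption; adjust for your distribution):
# also grant group write on the parent directory, then restart:
chmod g+w /etc/nagiosql/
service nagios restart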
Also, you don't need to remove the slashes from the directory paths in the nagios cfg_dir configurations. Nagios will strip the trailing slash (/) for you, according to the code:
https://github.com/NagiosEnterprises/nagioscore/blob/eb8e83d5d05e572eb8c0d4d4764885c5427b4b69/xdata/xodtemplate.c#L327
/* process all files in a config directory */
else if(!strcmp(var, "xodtemplate_config_dir") || !strcmp(var, "cfg_dir")) {
    if(config_base_dir != NULL && val[0] != '/') {
        asprintf(&cfgfile, "%s/%s", config_base_dir, val);
    } else
        cfgfile = strdup(val);

    /* strip trailing / if necessary */
    if(cfgfile != NULL && cfgfile[strlen(cfgfile) - 1] == '/')
        cfgfile[strlen(cfgfile) - 1] = '\x0';

    /* process the config directory... */
    result = xodtemplate_process_config_dir(cfgfile, options);
    my_free(cfgfile);

    /* if there was an error processing the config file, break out of loop */
    if(result == ERROR)
        break;
    }
}
EDIT #2: In the host definition you posted, your register value is set to 0. Try setting it to 1 instead. register 0 is used for templates that will be inherited from, but will not actually show up in the Nagios UI.
Change:
define host {
    host_name    GIT
    alias        GIT Server
    address      172.25.10.80
    register     0
}
To:
define host {
    host_name    GIT
    alias        GIT Server
    address      172.25.10.80
    register     1
}
Please set register 1 for your service definitions as well.
Try adding executable permissions to your directories. Some programs require execute (+x) permission on a directory in order to actually open and read its contents.
If that doesn't work, temporarily set everything to 0777 permissions to see whether the issue is permissions-related at all.
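For example (a temporary test only; don't leave world-writable permissions in place):
# open up the NagiosQL config tree, then re-run the preflight check:
chmod -R 0777 /etc/nagiosql/
nagios -v /etc/nagios/nagios.cfg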
You also have config problems even if you get that part working. Your host and service configs don't have a use directive in them, which points to a template that would supply most of the default values. The register directive is implied as 1 unless you specifically set it to 0 for a template. See the object definition docs if you need a reference: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/objectdefinitions.html
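A minimal sketch of what that could look like, assuming the stock generic-host template from the Nagios sample configs (substitute whatever template your installation actually defines):
define host {
    use          generic-host    ; hypothetical template name
    host_name    GIT
    alias        GIT Server
    address      172.25.10.80
}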

open() returns with "No such device" error, but there is such a device (linux)

I'm trying to use a somewhat old DAQ card, and had to jump through a few hoops to get an old (circa 2004) device driver for it (DTI-DT340 Linux-DAQ-PCI) to compile.
I've gotten to the point where it compiles, I can load the kernel module, it finds the card, and I can create the character devices using mknod.
But I can't seem to open these devices, and I keep getting errno 19 (ENODEV, 'No such device') when I try
open("/dev/dt340/0",O_RDWR);
but mknod had no complaints about making them, and they're there:
# ls -l /dev/dt340/
total 0
crw-rw-r-- 1 root staff 250, 0 2009-04-23 11:02 0
crw-rw-r-- 1 root staff 250, 1 2009-04-23 11:02 1
crw-rw-r-- 1 root staff 250, 2 2009-04-23 11:02 2
crw-rw-r-- 1 root staff 250, 3 2009-04-23 11:02 3
Is there something I'm neglecting to do? What might be a reason open() fails?
Here's the script I use to load the driver and make the devices:
#!/bin/bash
module="dt340"
device="dt340"
mode="664"
# invoke modprobe with all arguments we were passed
#/sbin/modprobe -t misc -lroot -f -s $module.o $* || exit 1
insmod $module.ko
# remove stale nodes
rm -f /dev/${device}/[0-3]
major=`awk "\\$2==\"$module\" {print \\$1}" /proc/devices`
mkdir -p /dev/${device}
mknod /dev/${device}/0 c $major 0
mknod /dev/${device}/1 c $major 1
mknod /dev/${device}/2 c $major 2
mknod /dev/${device}/3 c $major 3
# give appropriate group/permissions, and change the group
# not all distributions have staff; some have "users" instead
group="staff"
grep '^staff:' /etc/group > /dev/null || group="users"
chgrp $group /dev/${device}/[0-3]
chmod $mode /dev/${device}/[0-3]
Some additional info:
# grep dt340 /proc/devices
250 dt340
# lsmod | grep dt340
dt340 21516 0
# tail /var/log/messages
Apr 23 11:59:26 ve kernel: [ 412.862139] dt340 0000:03:01.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
Apr 23 11:59:26 ve kernel: [ 412.862362] dt340: In function dt340_init_one:
Apr 23 11:59:26 ve kernel: [ 412.862363] Device DT340 Rev 0x0 detected at address 0xfebf0000
# lspci | grep 340
03:01.0 Multimedia controller: Data Translation DT340
ANSWER: A printk confirmed that the -ENODEV was thrown from inside the driver's open(). Following an old-style
while ((pdev = pci_find_device(PCI_VENDOR_ID_DTI, PCI_ANY_ID, pdev)))
loop (pci_find_device is deprecated), if (!pdev) ends up true, and the driver returns -ENODEV.
I'm inching closer - I guess I have to work through and update the PCI code to use more modern mechanisms...
If the device shows up in /proc/devices, and you're sure you've got the numbers right in mknod, then the driver itself is refusing the open. The driver can return any error code from open() - including "no such device", which it might do if it discovered a problem initialising the hardware.
I'd guess it is a problem in the driver; check its open function.
It shows up in /proc/devices, so all the generic device stuff seems to be ok.
mknod doesn't care whether there is a device corresponding to the given major/minor numbers. Are you sure insmod is installing your module? What does lsmod tell you?
I'm unfamiliar with having to add the ".ko" extension. Is that something specific to your device driver?
Check through lspci and make sure the hardware is properly initialized. If your system supports hotplug, pci_find_device won't work; the problem with it is reference counting. The best way to deal with this and learn is to dissect the API. Best of luck!
