I noticed today that when making requests from our web server, things were rather slow.
I started looking into it and found a lot of root-owned Apache processes.
I don't know for sure that this is actually what's causing the slowness, but nonetheless, it doesn't look good.
The problem is, I don't know what to do from here.
How do I find out why there are so many root processes?
Could someone recommend a set of tests? I've tried stracing a few of them, and they appear to be doing something, but the strace output is beyond me.
root 30918 1.8 1.3 84284 52296 ? Ss 14:11 0:01 /usr/sbin/apache2 -k restart
root 30919 0.0 1.1 84420 45612 ? S 14:11 0:00 /usr/sbin/apache2 -k restart
root 30920 0.0 1.1 84420 45604 ? S 14:11 0:00 /usr/sbin/apache2 -k restart
root 30921 0.0 1.1 84420 45612 ? S 14:11 0:00 /usr/sbin/apache2 -k restart
root 30922 0.1 1.1 84420 45612 ? S 14:11 0:00 /usr/sbin/apache2 -k restart
root 30923 0.0 1.1 84420 45612 ? S 14:11 0:00 /usr/sbin/apache2 -k restart
www-data 30926 6.6 1.5 104964 61336 ? S 14:12 0:03 /usr/sbin/apache2 -k restart
root 30930 0.1 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30933 0.0 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30935 0.0 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30936 0.0 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30937 0.0 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30938 0.0 1.1 84420 45616 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30961 0.0 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30989 0.0 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 30990 0.0 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 31011 0.1 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 31013 0.1 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 31014 0.0 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31175 2.5 1.5 104168 60524 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31189 2.3 1.4 102360 58920 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31190 1.5 1.4 101904 58356 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31191 0.3 1.1 84556 46760 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31192 1.4 1.4 101916 58384 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
www-data 31193 1.5 1.4 101916 58376 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
root 31240 0.1 1.1 84420 45612 ? S 14:12 0:00 /usr/sbin/apache2 -k restart
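One simple test is to count the apache2 processes per owner, so you can see whether the root/www-data split changes between restarts or config changes. A minimal Python sketch, assuming `ps aux`-style input (USER first, COMMAND in the eleventh column); the function name is my own:

```python
from collections import Counter


def count_by_user(ps_output: str, command: str = "apache2") -> Counter:
    """Count processes whose executable name contains `command`, grouped by owner.

    Expects `ps aux`-style lines:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND [ARGS...]
    """
    counts = Counter()
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND and its arguments stay together.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or truncated line
        user, cmdline = fields[0], fields[10]
        executable = cmdline.split()[0]
        if command in executable:  # skips the header line and e.g. "grep apache2"
            counts[user] += 1
    return counts


# Usage: pass in the text output of `ps aux`, e.g.
#   import subprocess
#   out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
#   print(count_by_user(out))
```

On the listing above it would count 19 root-owned and 7 www-data-owned apache2 processes.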
Here is an example of strace output from one of the processes:
--- SIGCHLD (Child exited) # 0 (0) ---
read(6, 0xff87f6ef, 1) = -1 EAGAIN (Resource temporarily unavailable)
getuid32() = 0
close(17) = 0
gettimeofday({1354109303, 670988}, NULL) = 0
semop(5668864, {{0, -1, SEM_UNDO}}, 1) = 0
accept(4, {sa_family=AF_INET, sin_port=htons(48107), sin_addr=inet_addr("192.168.16.12")}, [16]) = 17
fcntl64(17, F_GETFD) = 0
fcntl64(17, F_SETFD, FD_CLOEXEC) = 0
semop(5668864, {{0, 1, SEM_UNDO}}, 1) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf74a2768) = 1949
waitpid(1949, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 1949
--- SIGCHLD (Child exited) # 0 (0) ---
read(6, 0xff87f6ef, 1) = -1 EAGAIN (Resource temporarily unavailable)
getuid32() = 0
close(17) = 0
gettimeofday({1354109305, 724358}, NULL) = 0
semop(5668864, {{0, -1, SEM_UNDO}}, 1) = 0
accept(4, {sa_family=AF_INET, sin_port=htons(48132), sin_addr=inet_addr("192.168.16.12")}, [16]) = 17
fcntl64(17, F_GETFD) = 0
fcntl64(17, F_SETFD, FD_CLOEXEC) = 0
semop(5668864, {{0, 1, SEM_UNDO}}, 1) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf74a2768) = 1974
waitpid(1974, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 1974
--- SIGCHLD (Child exited) # 0 (0) ---
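Reading the trace: getuid32() returns 0, so the process is running as root. Each iteration takes the accept semaphore (semop with -1), accept()s a connection from 192.168.16.12, releases the semaphore (semop with +1), then clone()s a child and waitpid()s until it exits before looping. In other words, this root process appears to fork a separate child per connection and wait for it. For longer traces, `strace -c` prints a per-syscall summary directly; alternatively, here is a small Python sketch that tallies syscalls and signals, assuming one event per line as in the excerpt (the function name is my own):

```python
import re
from collections import Counter

# "accept(4, {...}, [16]) = 17"  -> syscall name "accept"
SYSCALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")
# "--- SIGCHLD (Child exited) # 0 (0) ---"  -> signal name "SIGCHLD"
SIGNAL_RE = re.compile(r"^--- (SIG[A-Z0-9]+)")


def tally_strace(text: str) -> Counter:
    """Count syscalls and delivered signals in plain strace output (one event per line)."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        m = SYSCALL_RE.match(line) or SIGNAL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Equal counts of accept, clone and waitpid are consistent with the fork-per-connection reading above.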
I've disabled all of the modules in mods-enabled except for essential ones like auth, env, siteenv and alias, and started the server. Even then I still get 6 root-owned apache processes and 1 www-data-owned apache process.
I've made sure all the modules are up to date.
There are no obvious errors in the logs.
Config follows:
ServerRoot "/etc/apache2"
LockFile /var/lock/apache2/accept.lock
PidFile ${APACHE_PID_FILE}
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule mpm_worker_module>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}
AccessFileName .htaccess
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
DefaultType text/plain
HostnameLookups Off
ErrorLog /var/log/apache2/error.log
LogLevel warn
Include /etc/apache2/mods-enabled/*.load
Include /etc/apache2/mods-enabled/*.conf
Include /etc/apache2/httpd.conf
Include /etc/apache2/ports.conf
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog /var/log/apache2/other_vhosts_access.log vhost_combined
Include /etc/apache2/conf.d/
Include /etc/apache2/sites-enabled/
The compiled in modules are:
Compiled in modules:
core.c
mod_log_config.c
mod_logio.c
itk.c
http_core.c
mod_so.c
So I'm only running the mpm_worker config now.
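Note that the compiled-in list above contains itk.c, the source file of the mpm-itk MPM (the AcceptMutex debug line further down is also logged from itk.c). An Apache 2.2 binary has exactly one MPM compiled in, so an `<IfModule mpm_worker_module>` block would presumably be ignored. As far as I know, mpm-itk keeps its processes running as root so that it can switch to the per-vhost user for each request, which would fit both the root processes and the fork-per-connection pattern in the strace. A minimal Python sketch for picking the MPM out of `apache2 -l` output; the set of MPM file names is an assumption covering the common ones:

```python
from typing import Optional

# MPM source files as listed by `apache2 -l` (assumed set: the stock MPMs plus itk).
KNOWN_MPMS = {"prefork.c", "worker.c", "event.c", "itk.c"}


def compiled_mpm(listing: str) -> Optional[str]:
    """Return the MPM file name found in `apache2 -l` output, or None if none is recognised."""
    for line in listing.splitlines():
        name = line.strip()
        if name in KNOWN_MPMS:
            return name
    return None
```

On the listing above it would return "itk.c".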
DEBUG UPDATE
When I restart Apache and run ps, I get something like this:
root 26921 0.5 1.3 80008 52452 ? Ss 21:27 0:02 /usr/sbin/apache2 -k start
root 27114 0.0 1.1 80144 44804 ? S 21:34 0:00 /usr/sbin/apache2 -k start
root 27115 0.0 1.1 80144 44820 ? S 21:34 0:00 /usr/sbin/apache2 -k start
root 27116 0.0 1.1 80144 44804 ? S 21:34 0:00 /usr/sbin/apache2 -k start
root 27117 0.0 1.1 80144 44804 ? S 21:34 0:00 /usr/sbin/apache2 -k start
root 27119 0.0 1.1 80144 44804 ? S 21:34 0:00 /usr/sbin/apache2 -k start
If I set LogLevel to debug and restart, I see these messages from mod_proxy:
[Thu Nov 29 21:34:01 2012] [info] Server built: Sep 9 2012 21:17:36
[Thu Nov 29 21:34:01 2012] [debug] itk.c(1100): AcceptMutex: sysvsem (default: sysvsem)
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1818): proxy: grabbed scoreboard slot 0 in child 27115 for worker proxy:reverse
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1818): proxy: grabbed scoreboard slot 0 in child 27114 for worker proxy:reverse
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1934): proxy: initialized single connection worker 0 in child 27115 for (*)
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1837): proxy: worker proxy:reverse already initialized
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1934): proxy: initialized single connection worker 0 in child 27114 for (*)
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1818): proxy: grabbed scoreboard slot 0 in child 27117 for worker proxy:reverse
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1837): proxy: worker proxy:reverse already initialized
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1934): proxy: initialized single connection worker 0 in child 27117 for (*)
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1818): proxy: grabbed scoreboard slot 0 in child 27119 for worker proxy:reverse
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1837): proxy: worker proxy:reverse already initialized
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1934): proxy: initialized single connection worker 0 in child 27119 for (*)
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1818): proxy: grabbed scoreboard slot 0 in child 27116 for worker proxy:reverse
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1837): proxy: worker proxy:reverse already initialized
[Thu Nov 29 21:34:01 2012] [debug] proxy_util.c(1934): proxy: initialized single connection worker 0 in child 27116 for (*)
[Thu Nov 29 21:36:20 2012] [notice] SIGHUP received. Attempting to restart
Notice that the PIDs match. However, if I disable mod_proxy, these messages disappear, but I still get the same number of root processes starting, so I believe this is a symptom, not a cause.
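The PID comparison can be checked mechanically: collect the "in child N" PIDs from the debug log and the PIDs of root-owned apache2 processes from ps, then test whether the first set is contained in the second. A small Python sketch, assuming the log and ps formats shown above; both function names are hypothetical:

```python
import re

# "... proxy: grabbed scoreboard slot 0 in child 27115 for worker proxy:reverse"
CHILD_RE = re.compile(r"\bin child (\d+)\b")


def proxy_child_pids(log_text: str) -> set:
    """PIDs mentioned as 'in child N' in Apache debug log lines."""
    return {int(m.group(1)) for m in CHILD_RE.finditer(log_text)}


def root_apache_pids(ps_output: str) -> set:
    """PIDs of root-owned apache2 processes in `ps aux`-style output."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep COMMAND and its arguments together
        if len(fields) == 11 and fields[0] == "root" and "apache2" in fields[10].split()[0]:
            pids.add(int(fields[1]))
    return pids


# Usage: an empty difference means every proxy child is one of the root processes.
#   unexplained = proxy_child_pids(log_text) - root_apache_pids(ps_output)
```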
This is perfectly normal for Apache. Each process handles one request at a time, so if there were only one process (called a worker), the server would be very slow with many users.
The issue I see is that these should not be root-owned processes. Depending on your platform, Apache should run as its own user; on Debian, for example, that user is www-data. Then only one process would be owned by root and the rest would be owned by that user.
However, speed depends on several factors: hardware, web server, and web application.
Make sure the hardware you are running on meets the requirements (enough RAM and CPU).
Lower the number of workers if the hardware is weak, or increase it if the hardware is very capable.
Make sure the web application (if there is one; often it is a PHP app) is not a performance bottleneck.
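As a rough illustration of sizing the number of workers to the hardware, a common rule of thumb (an assumption here, not something stated above) is to divide the RAM you can spare for Apache by the average resident size of one process. ps reports RSS in KB, and the listing in the question shows roughly 45,000 to 61,000 KB per process; the RAM figures in the example below are hypothetical. Because RSS counts shared pages in every process, the result tends to be conservative.

```python
def max_workers(ram_mb: float, reserved_mb: float, avg_rss_kb: float) -> int:
    """Rule-of-thumb ceiling on the number of Apache processes.

    ram_mb:      total RAM on the box
    reserved_mb: RAM to keep for the OS, database, caches, etc.
    avg_rss_kb:  average RSS of one apache2 process, in KB as printed by ps
    """
    if avg_rss_kb <= 0:
        raise ValueError("avg_rss_kb must be positive")
    usable_kb = max(0.0, (ram_mb - reserved_mb) * 1024)
    return int(usable_kb // avg_rss_kb)
```

For example, on a hypothetical 4 GB box with 1 GB reserved and about 60 MB (61440 KB) per process, `max_workers(4096, 1024, 61440)` returns 51.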
PS: sorry for the poor formatting, typed clumsily from a phone.
I know I'm a bit late to the game, but I ran into the same issue and was going nuts trying to figure out what was going on. I'm on Apache 2.4.7, so a bit newer than you, but the general premise is the same.
I had to look in /etc/apache2/mods-enabled/mpm_prefork.conf to find my mpm configuration but you have it right here:
<IfModule mpm_worker_module>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
Looks like a valid config, which it is. However, your MaxRequestsPerChild, like mine was, is set to 0. I've adjusted it to approximately 10 (can probably go higher but am just testing now) and I think that's solved my problem. Hope this helps!
Related
What is the .stt file used for in the TDengine database?
Under the dataDir, some of the files have the .stt extension, like this:
ssn#TDengine:/var/lib/taos/vnode/vnode14$ cd ..
ssn#TDengine:/var/lib/taos/vnode$ ls -ltR | grep -i stt
-rwxrwxrwx 1 root root 4096 Jan 11 10:19 v18f1736ver22.stt
-rwxrwxrwx 1 root root 4096 Jan 11 10:19 v19f1736ver16.stt
-rwxrwxrwx 1 root root 4096 Jan 10 20:00 v16f1736ver18.stt
-rwxrwxrwx 1 root root 4096 Jan 10 20:01 v17f1736ver27.stt
May I know what it is for?
I'd like a specific description of this file: what it is used for, whether it impacts database performance, etc.
The .stt file is equivalent to the .last file in TDengine 2.0.
It is used to store data fragments that are smaller than the minrows configuration.
root 64994 14.0 0.0 8099944 85472 ? Sl 16:30 0:03 /bin/python3 -m pyflink.fn_execution.beam.beam_boot --id=5-1 --provision_endpoint=localhost:10514
root 64998 0.0 0.0 108060 684 ? S 16:30 0:00 tee /tmp/python-dist-6b89369d-ba23-4e5d-83d8-7a54dd9a3497/flink-python-udf-boot.log
This python process keeps running after cancelling its flink job. What can I do instead of killing it manually?
My requirement is to do dynamic CPU shielding in a C program based on the run-queue length, i.e. the number of runnable threads that are waiting for a CPU (not the threads that are currently running), in real-time operating system scenarios (say, Ubuntu with the RT Linux patch). For example, we can assume the system is configured with the SCHED_FIFO policy.
I am not able to find any command that retrieves the number of processes in the waiting, running, or runnable state.
Any help is much appreciated.
The command 'ps -T au' shows both runnable and running threads with the state 'R'.
ps -T au
Below is the output I get from the above command. Threads 16841, 16842 and 16843 were created by the main process 16840. All of them show state R, which denotes runnable or running.
Instead, I would like a Linux command or C API that retrieves the number of processes that are runnable but not running.
USER PID SPID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 914 914 0.1 1.3 428324 105804 tty7 Rsl+ Oct23 1:27 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
root 914 925 0.0 1.3 428324 105804 tty7 Ssl+ Oct23 0:04 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
root 1170 1170 0.0 0.0 23004 1772 tty1 Ss+ Oct23 0:00 /sbin/agetty --noclear tty1 linux
senthil 1979 1979 0.0 0.0 29532 5056 pts/11 Ss Oct23 0:00 bash
senthil 2032 2032 0.0 0.0 29552 5212 pts/2 Ss Oct23 0:00 bash
root 16837 16837 0.0 0.0 62092 4132 pts/2 S+ 09:37 0:00 sudo ./sigmain
root 16840 16840 0.0 0.0 31108 796 pts/2 Sl+ 09:37 0:00 ./sigmain
root 16840 16841 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
root 16840 16842 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
root 16840 16843 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
senthil 17225 17225 0.0 0.0 44432 3364 pts/11 R+ 09:46 0:00 ps -T au
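As far as I know there is no single system-wide counter for "runnable but not running", but /proc/stat has a procs_running line counting tasks in the R state (running plus runnable). Since at most one task per online CPU can actually be on a CPU, procs_running minus the number of online CPUs gives a lower bound on the number waiting. It is only a lower bound: with CPU affinity or shielding, tasks can queue on one CPU while another sits idle, and the reading process itself is counted as running. A C program can parse /proc/stat the same way with fopen/fgets; for consistency this sketch is Python, and the function names are my own:

```python
import os


def procs_running(stat_text: str) -> int:
    """Value of the procs_running line in /proc/stat (tasks in state R: running + runnable)."""
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "procs_running":
            return int(parts[1])
    raise ValueError("no procs_running line found")


def waiting_lower_bound(running: int, online_cpus: int) -> int:
    """At most one task per CPU can be on-CPU, so any excess over the CPU count must be waiting."""
    return max(0, running - online_cpus)


def read_waiting_lower_bound(path: str = "/proc/stat") -> int:
    """Linux only: sample the live counters once (the calling process itself counts as running)."""
    with open(path) as f:
        running = procs_running(f.read())
    return waiting_lower_bound(running, os.sysconf("SC_NPROCESSORS_ONLN"))
```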
I started saving a Redis snapshot by calling the BGSAVE command in redis-cli.
It started running, but I keep getting these errors in the logs:
[30853] 27 Jan 07:18:41.129 # Background saving error
[30853] 27 Jan 07:18:47.043 * 1 changes in 900 seconds. Saving...
[30853] 27 Jan 07:18:47.058 * Background saving started by pid 13204
[13204] 27 Jan 07:18:47.058 # Failed opening .rdb for saving: Permission denied
[30853] 27 Jan 07:18:47.158 # Background saving error
[30853] 27 Jan 07:18:53.070 * 1 changes in 900 seconds. Saving...
[30853] 27 Jan 07:18:53.085 * Background saving started by pid 13207
[13207] 27 Jan 07:18:53.085 # Failed opening .rdb for saving: Permission denied
[30853] 27 Jan 07:18:53.186 # Background saving error
[30853] 27 Jan 07:18:59.098 * 1 changes in 900 seconds. Saving...
[30853] 27 Jan 07:18:59.113 * Background saving started by pid 13210
[13210] 27 Jan 07:18:59.114 # Failed opening .rdb for saving: Permission denied
[30853] 27 Jan 07:18:59.213 # Background saving error
It looks like the BGSAVE command is running indefinitely. How do I stop it?
I also tried finding the process PID with ps -aux | grep redis:
13196 pts/11 S+ 0:00 grep --color=auto redis
30853 ? Ssl 1292:57 /usr/bin/redis-server *:6379
There is no process to kill.
EDIT: These are the permissions to redis folder and dump.rdb file
f: /var/lib/redis
drwxr-xr-x root root /
drwxr-xr-x root root var
drwxr-xr-x root root lib
drwxr-xr-x redis redis redis
f: /var/lib/redis/dump.rdb
drwxr-xr-x root root /
drwxr-xr-x root root var
drwxr-xr-x root root lib
drwxr-xr-x redis redis redis
-rw-rw-rw- redis redis dump.rdb
EDIT2: Got the answer. Somehow the config parameters had been changed: the dbfilename and dir values were different.
Setting them back to the original values with the CONFIG SET command fixed it. Adding this in case somebody has the same problem.
But the question is, how did they change? Has this happened to anybody else?
Help me
Thanks
You can either try and fix the file permissions error (does the default save location exist and does redis have permission to write to it?) or you can disable saving with:
config set save ""
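For the first option, a quick check is whether the directory Redis saves into exists and is writable by the user Redis runs as. Redis writes a temporary file in that directory and then renames it, so the directory itself needs write permission, not only dump.rdb. A minimal Python sketch, assuming you pass in the values reported by CONFIG GET dir and CONFIG GET dbfilename and run it as the redis user (e.g. via sudo -u redis); the function name and the file-name sanity check are my own:

```python
import os


def rdb_target_problems(directory: str, dbfilename: str) -> list:
    """List reasons why Redis (running as the current user) may fail to save into directory/dbfilename.

    directory and dbfilename are the values reported by CONFIG GET dir / CONFIG GET dbfilename.
    """
    problems = []
    # Sanity check (an assumption of this sketch): expect a plain file name, not a path.
    if not dbfilename or os.path.basename(dbfilename) != dbfilename:
        problems.append(f"suspicious dbfilename: {dbfilename!r}")
    if not os.path.isdir(directory):
        problems.append(f"directory does not exist: {directory}")
    # Creating the temp file needs write permission (and search permission) on the directory.
    elif not os.access(directory, os.W_OK | os.X_OK):
        problems.append(f"directory not writable by this user: {directory}")
    return problems
```

An empty list means nothing obvious is wrong with the save location.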
I have enabled WAL archiving with the following settings:
wal_keep_segments = 32
archive_mode = on
archive_command = 'gzip < %p > /mnt/nfs/archive/%f'
and on the slave I have this restore command:
restore_command = 'gunzip < /mnt/nfs/archive/%f > %p'
archive_cleanup_command = '/opt/PostgreSQL/9.4/bin/pg_archivecleanup -d /mnt/nfs/archive %r'
On the master I can see that many files are stuck: around 327 files are yet to be archived, when ideally there should be only 32.
The ps x command shows:
-bash-4.1$ ps x
PID TTY STAT TIME COMMAND
3302 ? S 0:00 /opt/PostgreSQL/9.4/bin/postgres -D /opt/PostgreSQL/9.4/data
3304 ? Ss 0:00 postgres: logger process
3306 ? Ss 0:09 postgres: checkpointer process
3307 ? Ss 0:00 postgres: writer process
3308 ? Ss 0:06 postgres: wal writer process
3309 ? Ss 0:00 postgres: autovacuum launcher process
3311 ? Ss 0:00 postgres: stats collector process
3582 ? S 0:00 sshd: postgres#pts/1
3583 pts/1 Ss 0:00 -bash
3628 ? Ss 0:00 postgres: archiver process archiving 000000010000002D000000CB
3673 ? S 0:00 sh -c gzip < pg_xlog/000000010000002D000000CB > /mnt/nfs/archive/000000010000002D000000CB
3674 ? D 0:00 gzip
3682 ? S 0:00 sshd: postgres#pts/0
3683 pts/0 Ss 0:00 -bash
4070 ? Ss 0:00 postgres: postgres postgres ::1(34561) idle
4074 ? Ss 0:00 postgres: postgres sorriso ::1(34562) idle
4172 pts/0 S+ 0:00 vi postgresql.conf
4192 pts/1 R+ 0:00 ps x
-bash-4.1$ ls | wc -l
327
-bash-4.1$
gzip and gunzip without flags expect to work with files, compressing or uncompressing them in-place. You're trying to use them as stream processors. That's not going to work.
You want to use gzip -c and zcat (or gunzip -c) to tell them to use stdio.
Additionally, though, you should probably use a simple script as the archive_command that:
Writes with gzip -c to a temp file
Moves the temp file to the final location with mv
This ensures that the file is not read by the replica until it's fully written by the master.
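Those steps could be wrapped in one small script used for both archive_command and restore_command. Here is a sketch in Python, swapped in for the shell script the answer describes; the script path /usr/local/bin/wal_gz.py and its archive/restore arguments are hypothetical. It compresses to a temporary file in the archive directory, fsyncs it, and renames it into place, refusing to overwrite an existing archive file; restore simply decompresses. Any exception gives a non-zero exit status, which PostgreSQL treats as failure.

```python
"""Hypothetical helper for gzip-compressed WAL archiving (archive and restore side).

archive_command = 'python3 /usr/local/bin/wal_gz.py archive %p /mnt/nfs/archive %f'
restore_command = 'python3 /usr/local/bin/wal_gz.py restore /mnt/nfs/archive %f %p'
"""
import gzip
import os
import shutil
import sys
import tempfile


def archive(wal_path: str, archive_dir: str, wal_name: str) -> None:
    """Compress wal_path into archive_dir/wal_name via a temp file, then rename into place."""
    final = os.path.join(archive_dir, wal_name)
    if os.path.exists(final):
        # Refuse to overwrite an existing archive file.
        raise FileExistsError(final)
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-" + wal_name + "-")
    try:
        with os.fdopen(fd, "wb") as raw:
            with open(wal_path, "rb") as src, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                shutil.copyfileobj(src, gz)
            raw.flush()
            os.fsync(raw.fileno())  # data is on disk before it becomes visible
        os.replace(tmp, final)  # the replica only ever sees a complete file under the final name
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def restore(archive_dir: str, wal_name: str, dest_path: str) -> None:
    """Decompress archive_dir/wal_name to dest_path (a missing file raises, i.e. non-zero exit)."""
    with gzip.open(os.path.join(archive_dir, wal_name), "rb") as gz, open(dest_path, "wb") as out:
        shutil.copyfileobj(gz, out)


if __name__ == "__main__" and len(sys.argv) == 5:
    mode, a, b, c = sys.argv[1:]
    if mode == "archive":
        archive(a, b, c)
    elif mode == "restore":
        restore(a, b, c)
    else:
        sys.exit(f"unknown mode: {mode}")
```

The archived files keep the plain %f name, as in the original commands, so the existing pg_archivecleanup line can stay as it is.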
Also, unless the master and replica are sharing the same network file system (or are both on the same host), you might actually need to use scp or similar to transfer the archive files. The restore_command uses paths on the replica, not on the master, so unless the replica server can access the WAL archive via NFS/CIFS/etc, you're going to need to copy the files.