WAL Archive hangs in postgres when gzip is used - database

I have enabled WAL archiving with the following settings:
wal_keep_segments = 32
archive_mode = on
archive_command = 'gzip < %p > /mnt/nfs/archive/%f'
On the slave, the restore settings are:
restore_command = 'gunzip < /mnt/nfs/archive/%f > %p'
archive_cleanup_command = '/opt/PostgreSQL/9.4/bin/pg_archivecleanup -d /mnt/nfs/archive %r'
On the master, many files are stuck: around 327 files are yet to be archived, where I would expect only 32.
The ps x command shows:
-bash-4.1$ ps x
PID TTY STAT TIME COMMAND
3302 ? S 0:00 /opt/PostgreSQL/9.4/bin/postgres -D /opt/PostgreSQL/9.4/data
3304 ? Ss 0:00 postgres: logger process
3306 ? Ss 0:09 postgres: checkpointer process
3307 ? Ss 0:00 postgres: writer process
3308 ? Ss 0:06 postgres: wal writer process
3309 ? Ss 0:00 postgres: autovacuum launcher process
3311 ? Ss 0:00 postgres: stats collector process
3582 ? S 0:00 sshd: postgres@pts/1
3583 pts/1 Ss 0:00 -bash
3628 ? Ss 0:00 postgres: archiver process archiving 000000010000002D000000CB
3673 ? S 0:00 sh -c gzip < pg_xlog/000000010000002D000000CB > /mnt/nfs/archive/000000010000002D000000CB
3674 ? D 0:00 gzip
3682 ? S 0:00 sshd: postgres@pts/0
3683 pts/0 Ss 0:00 -bash
4070 ? Ss 0:00 postgres: postgres postgres ::1(34561) idle
4074 ? Ss 0:00 postgres: postgres sorriso ::1(34562) idle
4172 pts/0 S+ 0:00 vi postgresql.conf
4192 pts/1 R+ 0:00 ps x
-bash-4.1$ ls | wc -l
327
-bash-4.1$

gzip and gunzip without flags expect to work with files, compressing or uncompressing them in-place. You're trying to use them as stream processors. That's not going to work.
You want to use gzip -c and zcat (or gunzip -c) to tell them to use stdio.
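As a minimal sketch of the stdio usage suggested above, the following round trip compresses a file to stdout with gzip -c and restores it with gunzip -c. The temporary directory and the sample file contents are made up for illustration; only the two flags come from the answer.

```shell
#!/bin/sh
# Scratch directory and sample data are hypothetical, used only for this demo.
workdir=$(mktemp -d)
printf 'sample WAL bytes\n' > "$workdir/segment"

# -c writes the compressed stream to stdout and leaves the input file untouched.
gzip -c < "$workdir/segment" > "$workdir/segment.gz"

# gunzip -c (equivalent to zcat) writes the decompressed stream to stdout.
gunzip -c < "$workdir/segment.gz" > "$workdir/restored"

# Keep the result in a variable so the scratch directory can be removed.
restored=$(cat "$workdir/restored")
rm -rf "$workdir"
echo "restored contents: $restored"
```

The same pattern would carry over to archive_command and restore_command, with %p and %f taking the place of the file names used here.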
Additionally, though, you should probably use a simple script as the archive_command that:
Writes with gzip -c to a temp file
Moves the temp file to the final location with mv
This ensures that the file is not read by the replica until it's fully written by the master.
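The two steps above could be sketched as a small shell function. The name archive_wal, its argument order and the temporary file naming are assumptions made for this example, not something PostgreSQL prescribes. The temporary file is created in the destination directory so that mv is a rename within one file system rather than a copy.

```shell
#!/bin/sh
# archive_wal SRC DEST_DIR NAME  (hypothetical helper, not part of PostgreSQL)
# Compresses SRC into DEST_DIR/NAME via a temporary file in the same directory.
archive_wal() {
    src=$1
    dest_dir=$2
    name=$3

    # Never overwrite a segment that is already archived.
    [ ! -e "$dest_dir/$name" ] || return 1

    # Hidden temp name in the destination directory; $$ keeps it unique per process.
    tmp="$dest_dir/.$name.tmp.$$"

    # Step 1: write the compressed data to the temp file.
    if ! gzip -c < "$src" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Step 2: publish the finished file under its final name.
    mv "$tmp" "$dest_dir/$name"
}
```

If the function were saved in a script that ends with archive_wal "$1" /mnt/nfs/archive "$2", the setting could look like archive_command = '/path/to/archive_wal.sh %p %f', where the script path is a placeholder. A non-zero exit status tells PostgreSQL that archiving failed, so it retries the segment later.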
Also, unless the master and replica are sharing the same network file system (or are both on the same host), you might actually need to use scp or similar to transfer the archive files. The restore_command uses paths on the replica, not on the master, so unless the replica server can access the WAL archive via NFS/CIFS/etc, you're going to need to copy the files.

Related

pyflink.fn_execution.beam.beam_boot doesn't close after its job is cancelled

root 64994 14.0 0.0 8099944 85472 ? Sl 16:30 0:03 /bin/python3 -m pyflink.fn_execution.beam.beam_boot --id=5-1 --provision_endpoint=localhost:10514
root 64998 0.0 0.0 108060 684 ? S 16:30 0:00 tee /tmp/python-dist-6b89369d-ba23-4e5d-83d8-7a54dd9a3497/flink-python-udf-boot.log
This python process keeps running after cancelling its flink job. What can I do instead of killing it manually?

Running fork, dup2 and exec from a C program in background

I'm developing an embedded C application (gather.out) for a controller that runs Debian 8 (Jessie). This application calls another program called gather. The gather program prints some data on screen when I run it alone on the terminal. I decided to use fork and dup2 to redirect the output to an output file that contains the gathered data, as follows:
/* Requires <fcntl.h>, <stdio.h>, <sys/types.h> and <unistd.h> */
pid_t pid = fork();
if (pid == -1)
{
    perror("fork");
    return 2;
}
/* Child process */
if (pid == 0)
{
    // Open the gather output file; O_TRUNC discards data left over from a previous run
    int gather_file_fd = open(actual_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (gather_file_fd == -1)
    {
        _exit(2);
    }
    // Redirect the child's stdout to the gather output file
    if (dup2(gather_file_fd, STDOUT_FILENO) == -1)
    {
        _exit(2);
    }
    close(gather_file_fd); // stdout now refers to the file; the extra descriptor is not needed
    // Run the gather program; execl returns only if it fails
    execl("/usr/bin/gather", "gather", "-w", (char *)NULL);
    _exit(2);
}
Everything goes fine when I run it manually from the terminal. When I set the controller to run it at startup (I don't know exactly how the controller starts the program on boot, but it does), my output file gets corrupted.
I don't know how to debug this problem but I was trying to see the TTY of the processes to have some idea about what's happening using ps ux | grep gather. When running manually:
# ps ux | grep gather
root 18014 49.4 0.3 219380 6240 pts/0 RLl+ 14:50 0:51 ./gather.out
root 18022 39.2 0.2 219304 5988 pts/0 RLl+ 14:52 0:01 gather -w
root 18026 0.0 0.0 2076 568 pts/22 S+ 14:52 0:00 grep gather
And running at startup:
# ps ux | grep gather
root 17859 80.5 0.3 219380 6240 ? RLl 14:42 0:33 /var/ftp/usrflash/Project/C Language/Background Programs/gather.out
root 17978 45.0 0.2 219304 5984 ? RLl 14:43 0:01 gather -w
root 17982 0.0 0.0 2076 532 pts/0 S+ 14:43 0:00 grep gather
From there I noticed that there is no TTY attached to the processes running the program at startup. Does it affect the way I redirect the output with fork, dup2 and execl?
Edit 1
The correct output file should look like this (produced by running gather -w > /var/ftp/gather/test.txt on the terminal and starting the data acquisition in the controller's IDE):
Waiting to gather
0 0 0
0 0 0
0 0 0
...
Each line 0 0 0 holds the sampled signals from the controller and could contain any values; the ... indicates that the file is very large. When I run the gather program in the background on the terminal with gather -w > /var/ftp/gather/test.txt & and start data acquisition in the controller's IDE, the gather program suddenly stops when I hit Enter on the terminal:
# gather -w > /var/ftp/gather/test.txt &
[2] 22232
#
[2]+ Stopped gather -w > /var/ftp/gather/test.txt
#
And the output file prints just the header:
Waiting to gather
Starting the gather program in the background using gather -w < /dev/null >& /var/ftp/gather/test.txt &, as one of the comments advised, and starting the data acquisition in the controller's IDE, I got the following output file:
Waiting to gather
Waiting to gather
Waiting to gather
Waiting to gather
...
The number of lines in the file is much greater than the number of samples shown by the sample counter in the IDE. My guess is that there is a bug in the gather program. Unfortunately, I don't have gather's source code, because it is one of the controller's native programs.

Is there any way to retrieve the list of processes and threads which are in the runnable state (not the running state) in Ubuntu?

My requirement is to do dynamic CPU shielding in a C program based on the queue length of runnable threads (threads that are waiting for a CPU to become available, not threads that are currently running) in a real-time operating system scenario (say, Ubuntu with the RT Linux patch). For example, we can assume the system is configured for the SCHED_FIFO policy.
I am not able to find any command to retrieve the number of processes which are in the wait state, running state, runnable state, etc.
Any help is much appreciated.
The command 'ps -T au' shows the state of all 'runnable' as well as 'running' threads as 'R'.
ps -T au
Below is the result I get from the above command. Here, thread IDs 16841, 16842 and 16843 are threads created by the main process 16840. All of these threads show the R state, which denotes runnable or running.
Instead, I would like a Linux command or C API to retrieve the number of processes that are runnable but not running.
USER PID SPID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 914 914 0.1 1.3 428324 105804 tty7 Rsl+ Oct23 1:27 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
root 914 925 0.0 1.3 428324 105804 tty7 Ssl+ Oct23 0:04 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten
root 1170 1170 0.0 0.0 23004 1772 tty1 Ss+ Oct23 0:00 /sbin/agetty --noclear tty1 linux
senthil 1979 1979 0.0 0.0 29532 5056 pts/11 Ss Oct23 0:00 bash
senthil 2032 2032 0.0 0.0 29552 5212 pts/2 Ss Oct23 0:00 bash
root 16837 16837 0.0 0.0 62092 4132 pts/2 S+ 09:37 0:00 sudo ./sigmain
root 16840 16840 0.0 0.0 31108 796 pts/2 Sl+ 09:37 0:00 ./sigmain
root 16840 16841 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
root 16840 16842 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
root 16840 16843 95.9 0.0 31108 796 pts/2 Rl+ 09:37 9:01 ./sigmain
senthil 17225 17225 0.0 0.0 44432 3364 pts/11 R+ 09:46 0:00 ps -T au

LD_PRELOAD on every login to the server

I need to log all terminal commands in Linux.
I have found a correctly working library written in C, but it works only when I run LD_PRELOAD=/usr/local/bin/bashpreload.so /bin/bash:
# ldd /bin/bash
linux-vdso.so.1 => (0x00007ffef59f8000)
/usr/local/bin/bashpreload.so (0x00007fe691323000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fe691102000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe690efe000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe690b6a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe691524000)
If I log in to the system again after this, ldd no longer shows the library:
[root@XXX ~]# LD_PRELOAD=/usr/local/bin/bashpreload.so /bin/bash
[root@XXX ~]# ldd /bin/bash
linux-vdso.so.1 => (0x00007ffe481f6000)
/usr/local/bin/bashpreload.so (0x00007f3f1b808000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f3f1b5e7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f3f1b3e3000)
libc.so.6 => /lib64/libc.so.6 (0x00007f3f1b04f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3f1ba09000)
[root@XXX ~]# exit
[root@XXX ~]# logout
Connection to XXX closed.
[sahaquiel@sahaquiel-PC ~]$ ssh root@XXX
root@XXX's password:
Last login: Tue Dec 19 11:28:22 2017 from YYY
[root@XXX ~]# ldd /bin/bash
linux-vdso.so.1 => (0x00007ffca2f98000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f19a13ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f19a11fb000)
libc.so.6 => /lib64/libc.so.6 (0x00007f19a0e67000)
/lib64/ld-linux-x86-64.so.2 (0x00007f19a1620000)
One more problem: if I use this library, my current PID changes:
Last login: Tue Dec 19 11:28:54 2017 from YYY
[root@XXX ~]# echo "Library is not uploaded"
Library is not uploaded
[root@XXX ~]# echo $$
4639
[root@XXX ~]# LD_PRELOAD=/usr/local/bin/bashpreload.so /bin/bash
[root@XXX ~]# echo $$
4654
[root@212-24-57-104 ~]# ps awwufx | grep -B5 [4]654
root 1706 0.0 0.0 66256 1192 ? Ss 10:54 0:00 /usr/sbin/sshd
root 4517 0.0 0.0 104636 4644 ? Ss 11:27 0:00 \_ sshd: root@pts/1
root 4519 0.0 0.0 108320 1872 pts/1 Ss+ 11:27 0:00 | \_ -bash
root 4637 0.0 0.0 104636 4624 ? Ss 11:30 0:00 \_ sshd: root@pts/0
root 4639 0.0 0.0 108320 1872 pts/0 Ss 11:30 0:00 | \_ -bash
root 4654 0.0 0.0 110376 1956 pts/0 S 11:31 0:00 | \_ /bin/bash
So, I need two things:
Find the way to do LD_PRELOAD quietly for each logging in user;
Know why after this I'm working in the child /bin/bash process.
Thanks!
This is a classic XY problem. You need to log user actions, have decided on a solution, and are asking questions about that solution.
Even though the solution won't work.
Because using an LD_PRELOAD library is not a reliable way to log user commands.
The user can just unset the LD_PRELOAD environment variable. And no, marking it readonly doesn't work. Because it's just a variable in the memory of a process the user controls.
You're setting LD_PRELOAD to a 64-bit shared object. Every 32-bit program will now fail to run.
However your preloaded library logs data, it does so with the user's permissions/access rights. Thus the user can spoof the data recorded.
If you need to log user's actions, use a system designed to do that securely: auditing.
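The point about unsetting the variable can be illustrated with a short sketch. It assumes an env that supports -u (GNU coreutils and the BSDs do) and reuses the library path from the question purely as a placeholder; the file does not need to exist, and the dynamic loader's complaints about it go to stderr, which is discarded here.

```shell
#!/bin/sh
# Placeholder path taken from the question; the file need not exist for this demo.
PRELOAD_LIB=/usr/local/bin/bashpreload.so

# Print what a child process sees in LD_PRELOAD once the user has stripped it.
child_preload_after_unset() {
    (
        LD_PRELOAD="$PRELOAD_LIB"
        export LD_PRELOAD
        # readonly only blocks assignments inside this one shell process.
        readonly LD_PRELOAD
        # env -u drops the variable from the environment handed to the child.
        env -u LD_PRELOAD sh -c 'echo "${LD_PRELOAD:-<not set>}"'
    ) 2>/dev/null
}

child_preload_after_unset
```

The child shell reports that the variable is not set, even though the parent marked it readonly, so commands started this way would bypass the preloaded library.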
Find the way to do LD_PRELOAD quietly for each logging in user
You need to set it somewhere common to all users, such as /etc/profile or /etc/environment.
See How to set environment variable for everyone under my linux system? for more options/details.
Know why after this I'm working in the child /bin/bash process.
That's straightforward: whenever you create a process, its PID is different from its parent's :) When you run /bin/bash, you create another shell, and that's why $$ is different. This has nothing to do with LD_PRELOAD. If you run /bin/bash without LD_PRELOAD, you'll observe exactly the same behaviour.
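A quick way to see this without LD_PRELOAD involved is the sketch below; the variable names are chosen only for this example.

```shell
#!/bin/sh
# $$ expands to the PID of the shell that is running this script.
parent_pid=$$

# sh -c starts a new process; the single quotes make $$ expand in the child.
child_pid=$(sh -c 'echo $$')

echo "parent shell PID: $parent_pid"
echo "child shell PID:  $child_pid"
```

The two numbers differ on every run, just as 4639 and 4654 do in the question.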
In case someone needs the same thing:
You can set an environment variable before the user logs in via SSH in /etc/security/pam_env.conf, with syntax like:
LD_PRELOAD DEFAULT= OVERRIDE="/usr/local/bin/bashpreload.so"

I want to get the server name from the process list at OS level, without connecting to the DB

sybase 1215 30224 0 20:44 pts/3 00:00:00 grep dataserver
sybase 6138 6137 0 Feb04 ? 00:28:10 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/aashish1_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/aashish1.log -c/u01/sybase/ASE15_0/ASE-15_0/aashish1.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**aashish1**
sybase 7671 1 0 Jan27 ? 00:55:50 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -s**chaitu** -d/u01/sybase/ASE15_0/data/chaitu_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/chaitu.log -c/u01/sybase/ASE15_0/ASE-15_0/chaitu.cfg -M/u01/sybase/ASE15_0/ASE-15_0
sybase 29479 29478 0 17:28 ? 00:00:33 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/asdfg_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/asdfg.log -c/u01/sybase/ASE15_0/ASE-15_0/asdfg.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**asdfg** -psa
sybase 29617 29616 0 17:48 ? 00:00:33 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/parbat.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/parbat.log -c/u01/sybase/ASE15_0/ASE-15_0/parbat.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**parbat**
sybase 29789 29788 0 17:57 ? 00:00:28 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/ab123_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/ab123.log -c/u01/sybase/ASE15_0/ASE-15_0/ab123.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**ab123** -psa
[sybase@linuxerp scripts]$
I want to get the dataserver name from OS level itself without connecting to the Database.
ps -ef | grep dataserver
shows whether the server is running or not.
I tried saving the output to a file and using grep -v on the file.
Since the server name is not always in the same position, it is difficult to extract it.
There are a couple of ways you can grab that information. One would be to pipe the grep output and use a regular expression:
ps -ef | grep dataserver | grep -oh '\-s[[:alnum:]]*'
which should output something like this:
-saashish1
-schaitu
-sasdfg
-sparbat
-sab123
Another would be to use the showserver utility that comes installed with ASE, whose output is very similar to that of ps -ef but includes CPU and memory information, and which also lists other servers such as the backup server, XP server, etc.
%> showserver
USER PID %CPU %MEM SZ RSS TT STAT START TIME COMMAND
user114276 0.0 1.7 712 1000 ? S Apr 5514:05 dataserver -d greensrv.dat -sgreensrv -einstall/greensrv+_errorlog
sybase 1071 0.0 1.4 408 820 ? S Mar 28895:38 /usr/local/sybase/bin/dataserver -d/dev/rsd1f -e/install/errorlog
user128493 0.0 0.0 3692 0 ? IW Apr 1 0:10 backupserver -SSYB_BACKUP -e/install/backup.log -Iinterfaces -Mbin/sybmultbuf -Lus_english -Jiso_1
And then pipe that into the same grep to get the information you are trying to find.
If you want to cut the -s off the front, to just get the servername itself, then you can pipe that into tr or cut.
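Using sed you can strip the leading -s from each line (tr -d is not suitable here, because it deletes every - and s character in the input, so aashish1 would become aahih1):
| sed 's/^-s//'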
Using cut you can tell it to print everything from the 3rd character to the end of the word:
| cut -c3-
Both of these will output your server names like this:
aashish1
chaitu
asdfg
parbat
ab123
Check this Question for information on using grep to grab single words.
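Putting the pieces together, a sketch of the whole extraction might look like this. The function name extract_server_names is made up for the example, and requiring a space before -s is an added assumption meant to avoid matching -s inside another option's value. The sample lines are shortened versions of the ps output in the question.

```shell
#!/bin/sh
# Hypothetical helper: read ps-style lines on stdin and print the value of each
# dataserver -s option, one name per line.
extract_server_names() {
    # Keep dataserver lines, isolate " -sNAME", then drop the leading " -s".
    grep dataserver | grep -o -e ' -s[[:alnum:]]*' | cut -c4-
}

# Sample input, shortened from the question; -s may appear anywhere on the line.
printf '%s\n' \
    'sybase 7671 1 0 Jan27 ? 00:55:50 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -schaitu -d/u01/sybase/ASE15_0/data/chaitu_master.dat' \
    'sybase 29479 29478 0 17:28 ? 00:00:33 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/asdfg_master.dat -sasdfg -psa' |
    extract_server_names
```

Against the live process list the call would be ps -ef | extract_server_names. The grep process itself also matches dataserver, but it carries no -s option and so contributes no output.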
