makefile loop var pipe - loops

I'm trying to do a loop in a make to exec a remote ssh command to optain the pid of a process to kill it.
Like this:
target:
for node in 23 ; do \
echo $$node ; \
ssh user#pc$$node "~/jdk1.6.0_31/bin/jps | grep CassandraDaemon | awk '{print \$$1}'" > $(PID); \
ssh user#pc$$node "kill -9 $(PID); \
done
But I get:
/bin/sh: 3: Syntax error: ";" unexpected
The issue I think is to store the pid that the remote ssh command returns (it woks well without the > $(PID) )

> redirects into files, not into variables. $() captures in a way you can assign to variables... but is also make syntax, so you need to escape it. You also need to escape it when you use it so that you don't get the make variable instead (no, you can't store it in a make variable).
for node in 23 ; do \
echo $$node ; \
PID=$$(ssh user#pc$$node "~/jdk1.6.0_31/bin/jps | grep CassandraDaemon | awk '{print \$$1}'"); \
ssh user#pc$$node "kill -9 $$PID; \
done
(assuming one of your many, many edits hasn't changed things too much from when I copied and pasted that to fix it...)

Related

assign two bash arrays with a single command

youtube-dl can take some time parsing remote sites when called multiple times.
EDIT0 : I want to fetch multiple properties (here fileNames and remoteFileSizes) output by youtube-dl without having to run it multiple times.
I use those 2 properties to compare the local file size and ${remoteFileSizes[$i]} to tell if the file is finished downloading.
$ youtube-dl --restrict-filenames -o "%(title)s__%(format_id)s__%(id)s.%(ext)s" -f m4a,18,webm,251 -s -j https://www.youtube.com/watch?v=UnZbjvyzteo 2>errors_youtube-dl.log | jq -r ._filename,.filesize | paste - - > input_data.txt
$ cat input_data.txt
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__18__UnZbjvyzteo__youtube_com.mp4 8419513
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__250__UnZbjvyzteo__youtube_com.webm 1528955
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__140__UnZbjvyzteo__youtube_com.m4a 2797366
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__244__UnZbjvyzteo__youtube_com.webm 8171725
I want the first column in the fileNames array and the second column in the remoteFileSizes.
For the time being, I use a while read loop, but when this loop is finished my two arrays are lost :
$ fileNames=()
$ remoteFileSizes=()
$ cat input_data.txt | while read fileName remoteFileSize; do \
fileNames+=($fileName); \
remoteFileSizes+=($remoteFileSize); \
done
$ for fileNames in "${fileNames[#]}"; do \
echo PROCESSING....; \
done
$ echo "=> fileNames[0] = ${fileNames[0]}"
=> fileNames[0] =
$ echo "=> remoteFileSizes[0] = ${remoteFileSizes[0]}"
=> remoteFileSizes[0] =
$
Is it possible to assign two bash arrays with a single command ?
You assign variables in a subshell, so they are not visible in the parent shell. Read https://mywiki.wooledge.org/BashFAQ/024 . Remove the cat and do a redirection to solve your problem.
while IFS=$'\t' read -r fileName remoteFileSize; do
fileNames+=("$fileName")
remoteFileSizes+=("$remoteFileSize")
done < input_data.txt
You might also interest yourself in https://mywiki.wooledge.org/BashFAQ/001.
For what it's worth, if you're looking for specific/bespoke functionality from youtube-dl, I recommend creating your own python scripts using the 'embedded' approach: https://github.com/ytdl-org/youtube-dl/blob/master/README.md#embedding-youtube-dl
You can set your own signal for when a download is finished (text/chime/mail/whatever) and track downloads without having to compare file sizes.

process file line by line does not work in ksh

This looks simple but i am not sure why it wont read beyond first line if I try to do any processing on data i read from file.
code:
while IFS="" read -r line
do
u=`echo $line | awk '{print $4}'`
h=`echo $line | awk '{print $2}'|cut -f1 -d'.'`
echo "$h : $u"
ssh $h grep $u /etc/shadow
done < "/var/tmp/user_data"
user_data is a file with one line for each user/system like:
xxx unixhost01 xxx admin69 xxx... ....
xxx host xxx uid xxx... ...
...
...
when i run this code it only works on first line then exit. debug in shell shows no issues. when i run it without the ssh operation/command it processes whole data file.
shell is ksh:
# ps
PID TTY TIME CMD
1424 pts/138 0:00 ps
18521 pts/138 0:02 ksh
on executing it only prints (the first line):
unixhost01 : admin69
admin69:$1$gFfcEQETGZAo6W0:17599:0:90:10:::
any idea?
The problem lies in ssh itself since it consumes standard input so the next time you loop around to get another line, there aren't any left. You can verify this by changing your command to:
ssh $h cat
and seeing that it outputs the rest of your file during the first ssh session.
You can fix this in at least two ways, the first by simply disconnecting your ssh from standard input:
ssh $h grep $u /etc/shadow </dev/null
The second (needed if you actually want to interact with the target box) is to use a different file handle for the input file stream:
while IFS="" read -u3 -r line ; do # reads file #3
blah blah blah
done 3</var/tmp/user_data # sends data via file #3

Find tmux session that a PID belongs to

I am using htop so see what processes are taking up a lot of memory so I can kill them. I have a lot of tmux sessions and lots of similar processes. How can I check which tmux pane a PID is in so I can be sure I am killing stuff I want to kill?
Given that PID in the below line is the target pid number:
$ tmux list-panes -a -F "#{pane_pid} #{pane_id}" | grep ^PID
The above will identify the pane where the PID is running. The output will be two strings. The first number should be the same as PID and the second one (with a percent sign) is "tmux pane id". Example output:
2345 %30
Now, you can use "tmux pane id" to kill the pane without "manually" searching for it:
$ tmux kill-pane -t %30
To answer your question completely, in order to find *tmux session* that a PID belongs to, this command can be used:
$ tmux list-panes -a -F "#{pane_pid} #{session_name}" | grep ^PID
# example output: 2345 development
Here's another possibly useful "line":
$ tmux list-panes -a -F "#{pane_pid} #{session_name}:#{window_index}:#{pane_index}" | grep ^PID
# example output: 2345 development:2:0
The descriptions for all of the interpolation strings (example #{pane_pid}) can be looked up in tmux man page in the FORMATS section.
The answers above give you the pids of the shells running in the panes, you'll be out of luck if you want to find something running in the shells.
try:
https://gist.github.com/nkh/0dfa8bf165a53832a4b5b17ee0d7ab12
This scrip gives you all the pids as well as the files the processes have opened. I never know in which session, window, pane, attached or not, I have a file open, this helps.
I haven't tried it on another machine, tell me if you encounter any problem.
lsof needs to be installed.
if you just want pids, pstree is useful, you can modity the script to use it (it's already there commented)
The following script displays the tree of processes in each window (or pane). It takes list of PIDs as one parameter (one PID per line). Specified processes are underlined. It automatically pipes to less unless is a part of some other pipe. Example:
$ ./tmux-processes.sh "$(pgrep ruby)"
-- session-name-1 window-index-1 window-name-1
7184 7170 bash bash --rcfile /dev/fd/63 -i
7204 7184 vim vim ...
-- session-name-2 window-index-2 window-name-2
7186 7170 bash bash --rcfile /dev/fd/63 -i
10771 7186 bash bash ./manage.sh runserver
10775 10771 django-admi /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
5761 10775 python /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
...
tmux-processes.sh:
#!/usr/bin/env bash
set -eu
pids=$1
my_pid=$$
subtree_pids() {
local pid=$1 level=${2:-0}
if [ "$pid" = "$my_pid" ]; then
return
fi
echo "$pid"
ps --ppid "$pid" -o pid= | while read -r pid; do
subtree_pids "$pid" $((level + 1))
done
}
# server_pid=$(tmux display-message -p '#{pid}')
underline=$(tput smul)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=screen-256color (under tmux)
reset=$(echo -e '\033[m')
re=$(echo "$pids" | paste -sd'|')
tmux list-panes -aF '#{session_name} #{window_index} #{window_name} #{pane_pid}' \
| while read -r session_name window_index window_name pane_pid; do
echo "-- $session_name $window_index $window_name"
ps -p "$(subtree_pids "$pane_pid" | paste -sd,)" -Ho pid=,ppid=,comm=,args= \
| sed -E 's/^/ /' \
| awk \
-v re="$re" -v underline="$underline" -v reset="$reset" '
$1 ~ re {print underline $0 reset}
$1 !~ re {print $0}
'
done | {
[ -t 1 ] && less -S || cat
}
Details regarding listing tmux processes you can find here.
To underline lines I use ANSI escape sequences. To show the idea separately, here's a script that displays list of processes and underlines some of them (having PIDs passed as an argument):
#!/usr/bin/env bash
set -eu
pids=$1
bold=$(tput bold)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=xterm-256color
reset=$(echo -e '\033[m')
underline=$(tput smul)
re=$(echo "$pids" | paste -sd'|')
ps -eHo pid,ppid,comm,args | awk \
-v re="$re" -v bold="$bold" -v reset="$reset" -v underline="$underline" '
$1 ~ re {print underline $0 reset}
$1 !~ re {print $0}
'
Usage:
$ ./ps.sh "$(pgrep ruby)"
Details regarding less and $(tput sgr0) can be found here.

Execute bash command stored in associative array over SSH, store result

For a larger project that's not relevant, I need to collect system stats from the local system or a remote system. Since I'm collecting the same stats either way, I'm preventing code duplication by storing the stats-collecting commands in a Bash associative array.
declare -A stats_cmds
# Actually contains many more key:value pairs, similar style
stats_cmds=([total_ram]="$(free -m | awk '/^Mem:/{print $2}')")
I can collect local system stats like this:
get_local_system_stats()
{
# Collect stats about local system
complex_data_structure_that_doesnt_matter=${stats_cmds[total_ram]}
# Many more similar calls here
}
A precondition of my script is that ~/.ssh/config is setup such that ssh $SSH_HOSTNAME works without any user input. I would like something like this:
get_remote_system_stats()
{
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=`ssh $SSH_HOSTNAME ${stats_cmds[total_ram]}`
}
I've tried every combination of single quotes, double quotes, backticks and such that I can imagine. Some combinations result in the stats command getting executed too early (bash: 7986: command not found), others cause syntax errors, others return null (single quotes around the stats command) but none store the proper result in my data structure.
How can I evaluate a command, stored in an associative array, on a remote system via SSH and store the result in a data structure in my local script?
Make sure that the commands you store in your array don't get expanded when you assign your array!
Also note that the complex-looking quoting style is necessary when nesting single quotes. See this SO post for an explanation.
stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"'')
And then just launch your ssh as:
sh "$ssh_hostname" "${stats_cmds[total_ram]}"
(yeah, I lowercased your variable name because uppercase variable names in Bash are really sick). Then:
get_local_system_stats() {
# Collect stats about local system
complex_data_structure_that_doesnt_matter=$( ${stats_cmds[total_ram]} )
# Many more similar calls here
}
and
get_remote_system_stats() {
# Collect stats about remote system
complex_data_structure_that_doesnt_matter=$(ssh "$ssh_hostname" "${stats_cmds[total_ram]}")
}
First, I'm going to suggest an approach that makes minimal changes to your existing implementation. Then, I'm going to demonstrate something closer to best practices.
Smallest Modification
Given your existing code:
declare -A remote_stats_cmds
remote_stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"''
[used_ram]='free -m | awk '"'"'/^Mem:/{print $3}'"'"''
[free_ram]='free -m | awk '"'"'/^Mem:/{print $4}'"'"''
[cpus]='nproc'
[one_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $1}'"'"' | tr -d " "'
[five_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $2}'"'"' | tr -d " "'
[fifteen_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $3}'"'"' | tr -d " "'
[iowait]='cat /proc/stat | awk '"'"'NR==1 {print $6}'"'"''
[steal_time]='cat /proc/stat | awk '"'"'NR==1 {print $9}'"'"'')
...one can evaluate these locally as follows:
result=$(eval "${remote_stat_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
...or remotely as follows:
result=$(ssh "$hostname" bash <<<"${remote_stat_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
No separate form is required.
The Right Thing
Now, let's talk about an entirely different way to do this:
# no awful nested quoting by hand!
collect_total_ram() { free -m | awk '/^Mem:/ {print $2}'; }
collect_used_ram() { free -m | awk '/^Mem:/ {print $3}'; }
collect_cpus() { nproc; }
...and then, to evaluate locally:
result=$(collect_cpus)
...or, to evaluate remotely:
result=$(ssh "$hostname" bash <<<"$(declare -f collect_cpus); collect_cpus")
...or, to iterate through defined functions with the collect_ prefix and do both of these things:
declare -A local_results
declare -A remote_results
while IFS= read -r funcname; do
local_results["${funcname#collect_}"]=$("$funcname")
remote_results["${funcname#collect_}"]=$(ssh "$hostname" bash <<<"$(declare -f "$funcname"); $funcname")
done < <(compgen -A function collect_)
...or, to collect all the items into a single remote array in one pass, avoiding extra SSH round-trips and not eval'ing or otherwise taking security risks with results received from the remote system:
remote_cmd=""
while IFS= read -r funcname; do
remote_cmd+="$(declare -f "$funcname"); printf '%s\0' \"$funcname\" \"\$(\"$funcname\")\";"
done < <(compgen -A function collect_)
declare -A remote_results=( )
while IFS= read -r -d '' funcname && IFS= read -r -d '' result; do
remote_results["${funcname#collect_}"]=$result
done < <(ssh "$hostname" bash <<<"$remote_cmd")

bash array not working

I am rather new to bioinformatic but trying my best to learn. I am running into an issue and I was hoping someone would know what to do, and explain me how bash tool for multiple file is actually working.
I have a folder with 160 RNAseq libraries unzip just look like name.fastq.
I want to run cutadapt (a software which will remove all the adapters sequence from my libraries) on all of them at the same time; so, for one library, the command looks just like this:
python2.6 /imports/home/w/workshop/oibc2013/oibc1/Apps/cutadapt-1.2.1/bin/cutadapt -a name_adapter input_file.fastq > out
So I tried to make a bash array loop to be able to do it on all 160 files I have, but it still does not work.
!/bin/bash
. $HOME/.bashrc
my_array=(*.fastq)
echo ${myarray["SGE_TASK_ID"-1]}
python2.6 \
/imports/home/w/workshop/oibc2013/oibc1/Apps/cutadapt-1.2.1/bin/cutadapt \
-a CTGTCTCTTATACACATCT \
-b AATTGCAGTGGTATCAACGCAGAGCGGCCGC \
-b GCGGCCGCTCTGCGTTGATACCACTGCAATT \
-b AAGCAGTGGTATCAACGCAGAGTACATGGG \
-b CCCATGTACTCTGCGTTGATACCACTGCTT \
inputs.$SGE_TASK_ID \
results.$SGE_TASK_ID]}
Rather than an array, you just want a loop. In this case, since you're matching a glob pattern (*.fastq), a for ... in loop would make sense.
The general syntax is for variable_name in list_of_words; do something_with $variable_name; done;. In your case:
#!/bin/bash
. $HOME/.bashrc
path=/imports/home/w/workshop/oibc2013/oibc1/Apps/cutadapt-1.2.1/bin
for file in *.fastq
do
python2.6 "$path"/cutadapt -a name_adapter "$file" > "$file.out"
done

Resources