I am trying to make a shell script which reads a configuration file and executes commands for each line in parallel.
For example, I have IPs.cfg, which can contain a variable number of IPs. It could be one or several:
IPs.cfg
145.x.x.x
176.x.x.x
192.x.x.x
I want to read the file and then execute a command for each line at the same time... for instance:
scp test.iso root@$IP1:/tmp &
scp test.iso root@$IP2:/tmp &
scp test.iso root@$IP3:/tmp &
wait
The way I'm thinking this is that I store the IPs into an array
IFS=$'\n' read -d '' -r -a array < IPs.cfg
Then I extract the number of lines from the file and decrease it by 1 since the array starts at 0.
NUMLINES=`cat IPs.cfg | wc -l`
NUMLINES=$((NUMLINES-1))
Now I want to execute the commands all at the same time. It's a variable number of parameters, so I can't just manually write scp test.iso root@${array[0]}:/tmp & scp test.iso root@${array[1]}:/tmp & and so on. I could use a while loop, but that would mean running the commands one at a time. I'm also thinking about using recursion, but I have never done that in a bash script.
It might be a silly question, but what are my options here?
You could make use of GNU parallel
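For instance, a minimal sketch assuming IPs.cfg holds one address per line (the :::: syntax makes parallel read its arguments from the file, one job per line):
parallel scp test.iso root@{}:/tmp :::: IPs.cfg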
The loop should look like this:
while read -r ip ; do
scp test.iso "root#$ip:/tmp" &
done < IPs.conf
wait
With this trick, you can control the number of simultaneous processes running at the same time:
cat IPs.cfg | xargs -n1 -I{} -P10 scp test.iso "root@{}:/tmp"
Note the -P10, which means up to 10 processes are used at the same time.
youtube-dl can take some time parsing remote sites when called multiple times.
EDIT0 : I want to fetch multiple properties (here fileNames and remoteFileSizes) output by youtube-dl without having to run it multiple times.
I use those 2 properties to compare the local file size and ${remoteFileSizes[$i]} to tell if the file is finished downloading.
$ youtube-dl --restrict-filenames -o "%(title)s__%(format_id)s__%(id)s.%(ext)s" -f m4a,18,webm,251 -s -j https://www.youtube.com/watch?v=UnZbjvyzteo 2>errors_youtube-dl.log | jq -r ._filename,.filesize | paste - - > input_data.txt
$ cat input_data.txt
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__18__UnZbjvyzteo__youtube_com.mp4 8419513
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__250__UnZbjvyzteo__youtube_com.webm 1528955
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__140__UnZbjvyzteo__youtube_com.m4a 2797366
Alan_Jackson_-_I_Want_To_Stroll_Over_Heaven_With_You_Live__244__UnZbjvyzteo__youtube_com.webm 8171725
I want the first column in the fileNames array and the second column in the remoteFileSizes.
For the time being, I use a while read loop, but when this loop is finished my two arrays are lost:
$ fileNames=()
$ remoteFileSizes=()
$ cat input_data.txt | while read fileName remoteFileSize; do \
fileNames+=($fileName); \
remoteFileSizes+=($remoteFileSize); \
done
$ for fileNames in "${fileNames[@]}"; do \
echo PROCESSING....; \
done
$ echo "=> fileNames[0] = ${fileNames[0]}"
=> fileNames[0] =
$ echo "=> remoteFileSizes[0] = ${remoteFileSizes[0]}"
=> remoteFileSizes[0] =
$
Is it possible to assign two bash arrays with a single command?
Because of the pipe, the while loop runs in a subshell, so the variables you assign there are not visible in the parent shell. Read https://mywiki.wooledge.org/BashFAQ/024 . Remove the cat and use a redirection to solve your problem.
while IFS=$'\t' read -r fileName remoteFileSize; do
fileNames+=("$fileName")
remoteFileSizes+=("$remoteFileSize")
done < input_data.txt
You might also be interested in https://mywiki.wooledge.org/BashFAQ/001.
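If you would rather avoid the loop entirely, one possible alternative (a sketch, assuming input_data.txt stays tab-separated as produced by paste) is to fill each array with mapfile and cut:
mapfile -t fileNames < <(cut -f1 input_data.txt)
mapfile -t remoteFileSizes < <(cut -f2 input_data.txt)
It is two commands rather than one, but both run in the current shell, so the arrays survive.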
For what it's worth, if you're looking for specific/bespoke functionality from youtube-dl, I recommend creating your own python scripts using the 'embedded' approach: https://github.com/ytdl-org/youtube-dl/blob/master/README.md#embedding-youtube-dl
You can set your own signal for when a download is finished (text/chime/mail/whatever) and track downloads without having to compare file sizes.
Just some background: I have a file with 1000 servers in it, newline-delimited. I have to read them into an array, then run about 5 commands over SSH for each. I have been using heredoc notation, but that seems to fail. Currently I get an error saying the host isn't recognized.
my_arr=()
IFS=$'\n' read -d '' -r -a my_arr < file
for i in "${my_arr[@]}"; do
ssh "$1" bash -s << "EOF"
echo "making back up of some file"
cp /path/to/file /path/to/file.bak
exit
EOF
done
I get output that lists the first server but then all the ones in the array as well. I know that I am missing a redirect for STDIN that causes this.
Thanks for the help.
Do you need an array? What is wrong with:
while read -r host
do
ssh "$host" bash -s << "EOF"
echo "making back up of some file"
cp /path/to/file /path/to/file.bak
EOF
done < file
To be clear -- the problem here, and the only problem present in the code actually included in your question, is that you're using $1 inside your loop, whereas you specified $i as the variable that contains the entry being iterated over on each invocation of the loop.
That is to say: ssh "$1" needs to instead be ssh "$i".
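For reference, a minimal sketch of the corrected loop, using the same file layout as the question:
my_arr=()
IFS=$'\n' read -d '' -r -a my_arr < file
for i in "${my_arr[@]}"; do
    ssh "$i" bash -s << "EOF"
echo "making back up of some file"
cp /path/to/file /path/to/file.bak
EOF
done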
I have a C program, fextract, which takes a wav file as input and produces output in an fcc format. The syntax is 'fextract file.wav file.fcc'. Now I have 75,000 wav files which need to be converted to the fcc format. To speed up the procedure I am planning to use all the cores of my i7 machine. First I have saved all the input and output paths in a file, which I call an scp file,
e.g.: /mnt/disk1/file1.wav /mnt/disk2/file1.fcc
/mnt/disk1/file2.wav /mnt/disk2/file2.fcc
and so on
Now, using the following shell script, I have divided the scp file into 8 files and stored them in a temp directory:
mkdir $tmpDir
cd $tmpDir
nCores=`cat /proc/cpuinfo | grep processor | wc -l`
nLines=`cat $scpFile|wc -l`
split -l $((nLines/nCores + 1)) $scpFile
Now my temp directory has eight subfiles. How can I use them to run the program fextract on multiple cores?
for i in `ls`
do
fextract &i
done
I need something of this kind. Please help me solve this; it's urgent.
Use GNU Parallel:
parallel -j $nCores fextract -- `ls`
Or you could use xargs with the -P option (useful in combination with find).
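For instance, a rough xargs sketch, assuming each line of the scp file holds the input and output path separated by a space and no path contains spaces or quotes:
xargs -n 2 -P "$nCores" fextract < "$scpFile"
Here -n 2 hands fextract two arguments (the wav and fcc paths) per invocation, and -P runs that many invocations in parallel.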
Those commands will launch your code in multiple processes, which allows them to be executed on multiple cores.
Using GNU Parallel:
cat filenames | parallel fextract {} {.}.fcc
As some time is spent on disk I/O, it may be faster to run a little more than one job per CPU core:
cat filenames | parallel -j150% fextract {} {.}.fcc
If you just want all files in current dir:
parallel -j150% fextract {} {.}.fcc ::: *.wav
If you want to give both input and output filenames on a single line separated by a space, you can use:
cat filenames_2_per_line | parallel --colsep ' ' -j150% fextract {1} {2}
If the filenames are not on the same line but follow each other, you need to read 2 lines at a time:
cat filenames_interleaved | parallel -N2 -j150% fextract {1} {2}
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
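If you prefer to stick with the split-into-chunks approach from the question and avoid extra tools, a rough sketch (assuming each split file under $tmpDir keeps the two-column wav/fcc layout and $tmpDir is an absolute path) is one background worker per chunk:
for chunk in "$tmpDir"/x*; do
    while read -r wav fcc; do
        fextract "$wav" "$fcc"
    done < "$chunk" &
done
wait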
I want to create a file in /usr/share/applications/ and put a string on it.
What I have so far:
sudo touch /usr/share/applications/test.desktop
dentry="testing"
sudo echo $dentry >> /usr/share/applications/test.desktop
But this raises a Permission Denied error. What should I do to make it work?
You should create the file using your own permissions, then sudo cp it into place.
The reason the second command doesn't work is that the redirection is set up by your shell, before sudo even runs. You could work around this by running sudo sh -c 'echo stuff >>file' but this is vastly more risk-prone than a simple sudo cp, and additionally has a race condition (if you run two concurrent instances of this script, they could end up writing the information twice to the file).
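A minimal sketch of the recommended approach (the file name and contents are just the ones from the question):
dentry="testing"
tmpfile=$(mktemp)
echo "$dentry" > "$tmpfile"
sudo cp "$tmpfile" /usr/share/applications/test.desktop
# mktemp creates the file with mode 600, so you may also want:
# sudo chmod 644 /usr/share/applications/test.desktop
rm -f "$tmpfile"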
*/10 * * * * /usr/bin/flock -x -w 10 /tmp/craigslist.lock /usr/bin/lynx -width=120 -dump "http://sfbay.craigslist.org/search/roo/sfc?query=&srchType=A&minAsk=&maxAsk=1100&nh=6&nh=8&nh=16&nh=24&nh=17&nh=21&nh=22&nh=23&nh=27" | grep "sort by most recent" -A 53 > /home/winchell/apartments.txt
*/10 * * * * /usr/bin/flock -x -w 10 /tmp/craigslist.lock /usr/bin/php /home/winchell/apartments.php
This is a cron job. The second line php command seems to be executing even while lynx is writing to apartments.txt, and I don't see the reason. Is this correct usage assuming I'm trying to prevent read from apartments.txt while lynx/grep are writing to it? Thanks!
Your usage is not correct. Notice how your first cron job is a pipeline consisting of two commands:
/usr/bin/flock -x -w 10 /tmp/craigslist.lock /usr/bin/lynx -width=120 -dump
"http://sfbay.craigslist.org/search/roo/sfc?query=&srchType=A&minAsk=&maxAsk=1100&nh=6&nh=8&nh=16&nh=24&nh=17&nh=21&nh=22&nh=23&nh=27"
which is then piped to:
grep "sort by most recent" -A 53 > /home/winchell/apartments.txt
So the first command is locking a file but it's the second command that's writing to that file! The second command will happily execute without waiting for the lock.
One way to fix this would be to write the file while holding the lock:
lynx etc... | grep etc.. |
flock -x -w 10 /tmp/craigslist.lock tee /home/winchell/apartments.txt
The disadvantage of this approach is that lynx and grep run even if the file is locked. To prevent this, you will have to run the whole thing under the lock:
flock -x -w 10 /tmp/craigslock.lock sh -c "lynx etc... | grep etc... >thefile"
With this approach you will have to pay careful attention to quoting, as the URL argument of lynx will need its own double quotes inside the sh -c string.
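For illustration, the locked pipeline in the crontab might end up looking something like this (the whole sh -c body in single quotes, the URL and grep pattern in double quotes inside it):
*/10 * * * * /usr/bin/flock -x -w 10 /tmp/craigslist.lock /bin/sh -c '/usr/bin/lynx -width=120 -dump "http://sfbay.craigslist.org/search/roo/sfc?query=&srchType=A&minAsk=&maxAsk=1100&nh=6&nh=8&nh=16&nh=24&nh=17&nh=21&nh=22&nh=23&nh=27" | grep "sort by most recent" -A 53 > /home/winchell/apartments.txt'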
Finally: consider using curl or wget instead of lynx. lynx is meant for interactive usage!