cat/xargs/command vs. for/bash/command

Page 38 of the book Linux 101 Hacks suggests:
cat url-list.txt | xargs wget -c
I usually do:
for i in `cat url-list.txt`
do
wget -c $i
done
Is there something, other than length, where the xargs technique is superior to the good old for-loop technique in bash?
Added:
The C source code seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.

From the Rationale section of a UNIX man page for xargs. (Interestingly, this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)
The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.
In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
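That is easy to check:
wc -l < url-list.txt
Every one of those lines costs the loop one fork of wget.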
There are some other considerations, too. xargs requires extra care for filenames with spaces or other troublesome characters, and find -exec has an option (+) that groups processing into batches, as sketched below. So not everyone prefers xargs, and perhaps it's not best for all situations.
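To sketch that batching (assuming GNU find and xargs, whose -print0/-0 pair keeps filenames with spaces or newlines safe), both of these run grep in large batches instead of once per file:
find . -name '*.log' -exec grep -l ERROR {} +
find . -name '*.log' -print0 | xargs -0 grep -l ERROR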
See these links:
http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html
http://fahdshariff.blogspot.com/2009/05/find-exec-vs-xargs.html

Also consider:
xargs -I'{}' wget -c '{}' < url-list.txt
but wget provides an even better means for the same:
wget -c -i url-list.txt
With respect to the xargs-versus-loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.

xargs will also allow you to handle a huge list, which is not possible with the "for" version, because the command lines the shell can pass to a program are limited in length.
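You can query the limit this refers to on your own system; on Linux, getconf reports the maximum byte length of the argument list plus environment that exec accepts:
getconf ARG_MAX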

xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance advantage.
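To get a feel for that overhead (an illustrative sketch; absolute numbers will vary by machine), time one batched xargs run against ten thousand individual forks. /bin/echo is spelled out so the loop really forks a process per input instead of using the shell builtin:
time seq 1 10000 | xargs /bin/echo > /dev/null
time for i in $(seq 1 10000); do /bin/echo "$i" > /dev/null; done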

Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many forks to perform in parallel. As in:
seq 1 10 | xargs -n 1 -P 3 echo
would use 3 forks on 3 different cores for computation. This is supported by modern GNU xargs. You will have to verify for yourself if using BSD or Solaris.
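Applied to the question's download list (a sketch, assuming GNU xargs), the same flag parallelizes the downloads: -n 1 hands one URL to each wget invocation, and -P 3 keeps up to three running at once.
xargs -n 1 -P 3 wget -c < url-list.txt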

Depending on your internet connection, you may want to use GNU Parallel (http://www.gnu.org/software/parallel/) to run the downloads in parallel.
cat url-list.txt | parallel wget -c

One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
I'm not really a bash expert though, so there could be other reasons it's better (or worse).


Linux - Displaying Memory Usage Live

I'm running GNU Screen (4.03.01) so I can have multiple terminals in one, and I'm looking for a good way to display live stats of my memory, so that as I do things like compiling and testing programs, I can see how many resources I have left.
I know there is top, the performance monitor, and other similar programs, but I'm not looking for the entire active process list and so on; I just want a snapshot of my memory stats that updates, for example, every 3-5 seconds.
I really appreciate anyone taking the time to help me with this, so thank you!
You can use the combination of watch, which repeatedly runs the specified program and displays its output, and free, which shows current memory usage:
watch free -m
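Since the question asks for updates every 3-5 seconds: watch refreshes every 2 seconds by default, and the -n option changes that interval:
watch -n 3 free -m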
free --help
Usage:
free [options]
Options:
-b, --bytes show output in bytes
-k, --kilo show output in kilobytes
-m, --mega show output in megabytes
-g, --giga show output in gigabytes
--tera show output in terabytes
-h, --human show human-readable output
--si use powers of 1000 not 1024
-l, --lohi show detailed low and high memory statistics
-o, --old use old format (without -/+buffers/cache line)
-t, --total show total for RAM + swap
-s N, --seconds N repeat printing every N seconds
-c N, --count N repeat printing N times, then exit
--help display this help and exit
-V, --version output version information and exit
For more details see free(1).
watch --help
Usage:
watch [options] command
Options:
-b, --beep beep if command has a non-zero exit
-c, --color interpret ANSI color sequences
-d, --differences[=<permanent>]
highlight changes between updates
-e, --errexit exit if command has a non-zero exit
-g, --chgexit exit when output from command changes
-n, --interval <secs> seconds to wait between updates
-p, --precise attempt run command in precise intervals
-t, --no-title turn off header
-x, --exec pass command to exec instead of "sh -c"
-h, --help display this help and exit
-v, --version output version information and exit
You could use the Valgrind tool Massif. I haven't tried it, but it seems to be what you are looking for.
To use Massif, install Valgrind and then run:
valgrind --tool=massif program argument1 argument2 ...
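Massif writes its profile to a file named massif.out.<pid>, and the ms_print tool that ships with Valgrind renders it as a graph plus table (the PID below is only illustrative):
ms_print massif.out.12345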
Another quick solution is a script like this:
while true; do
    free -m
    # any command for CPU stats can go here; I didn't understand what you
    # really want to see, so please clarify. If it's just the % of CPU
    # usage, I think this command should help you:
    ps -A -o pcpu | tail -n +2 | paste -sd+ | bc
    # pause so the stats update every 3 seconds, as requested
    sleep 3
done
The other thing you can do is use htop. It displays memory usage and CPU usage per core, and shows the resources used by each process. Really neat, but maybe not as detailed as the rest of the answers.

In C, what's the typical way to handle multiple arguments that are "list"-like?

Suppose I have some program called "combine" that takes input of "red", "green" and "blue"-type files to produce an output file (let's say "color.jpg")... BUT the number of each type is arbitrary. Let's also suppose that there's no way to determine what type the file is except through how the user classifies them. What do people usually do in this case?
For instance, on the command line, some of the approaches might be:
command red1,red2,red3 green1,green2 blue1 color.jpg
This comma approach breaks down if commas can appear in the filenames. It's the approach I like the most, though. Another idea would be:
command "red1 red2 red3" "green1 green2" "blue1" color.jpg
but this approach also has trouble with spaces in names.
I could also require ASCII files containing lists giving the files of each type:
command redlist greenlist bluelist color.jpg
but this requires lugging around extra files.
Further ideas? Is there a standard LINUX way of doing this?
The standard way would be this:
command --red red1.jpg --red red2.jpg --blue blue1.jpg
With short options:
command -r red1.jpg -r red2.jpg -b blue1.jpg
With bash brace-expansion shorthand:
command -r={red1,red2}.jpg -b blue1.jpg
(The braces are expanded by the shell before your program sees them, as shown below.)
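For instance, you can preview exactly what the program will receive by prefixing the line with echo (using the example filenames from above):
echo command -r={red1,red2}.jpg -b blue1.jpg
This prints: command -r=red1.jpg -r=red2.jpg -b blue1.jpg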
Doing things this way avoids arbitrary limitations like "no commas in filenames" and also makes your program more interoperable with standard *nix utilities like xargs and so on.
Another way is to accept:
command -r redfile1 redfile2 -b bluefile1 bluefile2 -g greenfile1
so that:
command -r red* -b blue* -g green*
is possible.

Is there a way to ping faster in Busy Box or Tiny Core Linux?

Solution at end of this post.
By default the interval between pings is one second, and under the usual iputils version of ping there is an option, the -i switch, to reduce this interval. I need to ping faster, as I have 120 pings in a certain test that needs to be run many times.
I tried modifying the source of ping.c from the BusyBox sources, but I don't know much about compiling: I get an error that libbb.h "could not be found", and I couldn't find anyone else with a similar error on BusyBox.
Does anyone know of a way for me to ping faster than once per second? I am hoping to go down to 0.1 or 0.05 seconds if at all possible.
Thanks in advance
Solution
In case anyone comes looking for an answer, the solution I came up with works much better. If you write a script that pings with the -c 1 flag and count the failures yourself, you can ping much faster.
Example:
fails=0
for i in $(seq 1 20)
do
    # field 4 of the "packets transmitted ... received" line is the
    # number of replies received
    x=$(ping -c 1 192.168.1.1 | grep received | cut -d' ' -f4)
    if [ "${x:-0}" -eq 0 ]
    then
        fails=$(($fails + 1))
    fi
done
echo $fails fails
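A variant of the same idea, as a sketch: rely on ping's exit status rather than parsing its output. Here -W 1 caps the wait for a reply at one second (supported by iputils ping and by many BusyBox builds):
fails=0
for i in $(seq 1 20)
do
    ping -c 1 -W 1 192.168.1.1 > /dev/null 2>&1 || fails=$(($fails + 1))
done
echo $fails fails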
You are correct in that you have to modify the ping.c file. As you have determined, BusyBox ping does not support the -i switch.
What platform are you building this for? A PC, an embedded system?
Option 1:
Modify ping.c from BusyBox and recompile BusyBox. To do this, you would use 'make' in the root of the BusyBox project.
user@linux:~/busybox-1.19.2$ make
Option 2:
It might be easier and simpler to leave BusyBox alone and get ping.c from another archive such as iputils. That version supports the -i switch and goes as low as 0.2 seconds. To compile ping.c:
user@linux:~/iputils-s20101006$ make ping

Moving things in terminal based on their name

Edit: I think this has been answered successfully, but I can't check until later. I've reformatted it as suggested, though.
The question: I have a series of files, each with a name of the form XXXXNAME, where XXXX is some number. I want to move them all to separate folders called XXXX and have them called NAME. I can do this manually, but I was hoping that by naming them XXXXNAME there'd be some way I could tell Terminal (I think that's the right name, but not really sure) to move them there. Something like
mv *NAME */NAME
but where it takes whatever * matched in the first case and reuses it in the destination path.
This is on some form of Linux, with a bash shell.
In the real life case, the files are 0000GNUmakefile, with sequential numbering. I'm having to make lots of similar-but-slightly-altered versions of a program to compile and run on a cluster as part of my research. It would probably have been quicker to write a program to edit all the files and put in the right place in the first place, but I didn't.
This is probably extremely simple, and I should be able to find an answer myself, if I knew the right words. Thing is, I have no formal training in programming, so I don't know what to call things to search for them. So hopefully this will result in me getting an answer, and maybe knowing how to find out the answer for similar things myself next time. With the basic programming I've picked up, I'm sure I could write a program to do this for me, but I'm hoping there's a simple way to do it just using functionality already in Terminal. I probably shouldn't be allowed to play with these things.
Thanks for any help! I can actually program in C and Python a fair amount, but that's through trial and error largely, and I still don't know what I can do and can't do in Terminal.
SO many ways to achieve this.
I find that the old standbys sed and awk are often the most powerful.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p'
If you're satisfied that the commands look right, pipe the command line through a shell:
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p' | sh
I put NAME in parentheses and used \2 so that, if it varies more than your example indicates, you can adjust the regular expression to handle your filenames better.
To do the same thing in gawk (GNU awk, the variant found in most GNU/Linux distros):
ls | gawk '/^[0-9]{4}NAME$/ {printf("mv -iv %s %s/%s\n", $1, substr($0,1,4), substr($0,5))}'
As with the first sample, this produces commands which, if they make sense to you, can be piped through a shell by appending | sh to the end of the line.
Note that with all these mv commands, I've added the -i and -v options. This is for your protection. Read the man page for mv (by typing man mv in your Linux terminal) to see if you should be comfortable leaving them out.
Also, I'm assuming with these lines that all your directories already exist. You didn't mention if they do. If they don't, here's a one-liner to create the directories.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mkdir -p \1:p' | sort -u
As with the others, append | sh to run the commands.
I should mention that it is generally recommended to use constructs like for (in Tim's answer) or find instead of parsing the output of ls. That said, when your filename format is as simple as /[0-9]{4}word/, I find the quick sed one-liner to be the way to go.
Lastly, if by NAME you actually mean "any string of characters" rather than the literal string "NAME", then in all my examples above, replace NAME with .*.
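For example, the fully generalized one-liner (same GNU sed assumption, and the same caveat about parsing ls) becomes:
ls | sed -rne 's:^([0-9]{4})(.*)$:mv -iv & \1/\2:p' | sh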
The following script will do this for you. Copy the script into a file on the remote machine (we'll call it sortfiles.sh).
#!/bin/bash
# Get all files in the current directory having names XXXXsomename, where each X is a digit
# (-maxdepth 1 is a GNU find option that keeps the search out of subdirectories)
files=$(find . -maxdepth 1 -name '[0-9][0-9][0-9][0-9]*')
# Build a list of the XXXX prefixes found in the list of files
# (characters 3-6, because find prefixes each name with "./")
dirs=
for name in ${files}; do
    dirs="${dirs} $(echo ${name} | cut -c 3-6)"
done
# Remove redundant entries from the list of XXXX prefixes
# (uniq alone would not work here, since the list is on a single line)
dirs=$(echo ${dirs} | tr ' ' '\n' | sort -u)
# Create any XXXX directories that are not already present
for name in ${dirs}; do
    if [[ ! -d ${name} ]]; then
        mkdir ${name}
    fi
done
# Move each XXXXsomename file into its directory, dropping the XXXX prefix from the name
for name in ${files}; do
    mv ${name} $(echo ${name} | cut -c 3-6)/$(echo ${name} | cut -c 7-)
done
# Return from script with normal status
exit 0
From the command line, do chmod +x sortfiles.sh
Execute the script with ./sortfiles.sh
Just open the Terminal application, cd into the directory that contains the files you want moved/renamed, and copy and paste these commands into the command line.
shopt -s extglob   # the *( ) patterns below need bash's extended globbing
for file in [0-9][0-9][0-9][0-9]*; do
    dirName="${file%%*([^0-9])}"
    mkdir -p "$dirName"
    mv "$file" "$dirName/${file##*([0-9])}"
done
This assumes all the files that you want to rename and move are in the same directory. The file globbing also assumes that there are at least four digits at the start of the filename. A name starting with more than four digits will still be caught, but not one starting with fewer than four. If there are fewer than four, take the appropriate number of [0-9]s off the first line.
It does not handle the case where "NAME" (i.e. the name of the new file you want) starts with a number.
See this site for more information about string manipulation in bash.

parsing the output of the 'w' command?

I'm writing a program which requires knowledge of the current load on the system, and the activity of any users (it's a load balancer).
This is a university assignment, and I am required to use the w command. I'm having a hard time parsing this command's output because it is very verbose. Any suggestions on what I can do would be appreciated. This is a small part of the program, and I am free to use whatever method I like.
The most condensed version of w which still has the information I require is w -u -s -f, which produces this:
10:13:43 up 9:57, 2 users, load average: 0.00, 0.00, 0.00
USER TTY IDLE WHAT
fsm tty7 22:44m x-session-manager
fsm pts/0 0.00s w -u -s -f
So out of that, I am interested in the first number after "load average" and the smallest idle time (so I will need to parse them all).
My background process will call w, so the fact that w itself has the lowest idle time will not matter (all I will see is the tty time).
Do you have any ideas?
Thanks
(I am allowed to use alternative unix commands, like grep, if that helps).
Are you allowed to use other Unix commands? You could use grep, sed or head/tail to get the lines you need, and cut to split them up as needed.
Check out: http://www.gnu.org/s/libc/manual/html_node/Regexp-Subexpressions.html#Regexp-Subexpressions
Use regular expressions to match [0-9]+\.[0-9]{2} on the first line. You may have to fiddle with which characters are escaped. That will give you 3 load averages.
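For example, GNU grep's -o flag prints only the matched text, one match per line; against the sample output above, this yields exactly the three load averages:
w -u -s -f | head -n 1 | grep -oE '[0-9]+\.[0-9]{2}'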
The remaining output is column-based, so if you count off the string positions from w, you'll be able to just strncpy the interesting bits.
Another possible approach (which sounds like it goes against the assignment, but I'd keep it in mind) is to grab the source code of w and hack it up to just give you the information via function calls. If you're feeling really hardcore, you can learn all the library API calls and do it directly that way.
I found I can use a combination of commands, like so:
w -u -s -f | grep load | cut -d " " -f 11
and
w -u -s -f | grep tty | cut -d " " -f 13
The first takes the output of w, uses grep to select only the line containing "load", and then cuts away everything except the 11th chunk of data (with a space as the delimiter), which is the first load number, trailing comma included.
The second does something similar, only for the per-user idle times; if there are multiple users, it's a list.
This is easy enough to parse, unless someone has an objection or a suggestion to improve it.
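If the fixed field numbers prove fragile (they depend on the exact spacing of w's output), an awk variant (a sketch) keys on the "load average:" label instead and prints the first average:
w -u -s -f | awk -F'load average: ' 'NR == 1 { split($2, a, ", "); print a[1] }'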
