"System" command in c giving wrong output with pipes of bash - c

I am trying to understand the internal working of pipes in C. I noticed if I run
#include <stdlib.h>

int main(void) {
    system("ls | grep d | wc");
    return 0;
}
Output:
3 3 53
But on running the same command with bash I get
3 3 104
Output of ls | grep d
question_1.pdf
question_2.pdf
question_2_dataset.txt
Can someone explain the cause of this discrepancy?
The same thing occurs if I use a pipe via the pipe() call in C.

Actually I figured out the problem wasn't with ls but with grep --color=always d, which is what grep is aliased to in my bash. The ANSI color escape sequences wrapped around each match add extra characters, which increases the byte count of the output.
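You can see the extra bytes directly (a minimal sketch; the exact counts depend on the color sequences your grep emits):
printf 'question_1.pdf\n' | grep d | wc -c                  # plain match: 15 bytes
printf 'question_1.pdf\n' | grep --color=always d | wc -c   # larger: the ANSI escapes are counted too
With GNU grep's default colors each match gains 17 bytes of escape sequences, which would account exactly for the difference above: 3 matched lines x 17 = 51 = 104 - 53.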

Check what your ls command in bash actually is. Try:
type ls
You will probably find that ls is an alias of some kind.
Then check your bash test again with
/bin/ls | grep d | wc
If you then get the same result as in your C code, you will know what went wrong.

ls is often an alias in an interactive shell.
For example, in my bash session if I do type ls I get
ls is aliased to `ls -t --group-directories-first -I .pyc -I __pycache__ -I .git --color=auto -xF'
(The alias usually comes from $HOME/.bashrc or /etc/bash.bashrc).
Now if you do:
sh -c 'ls | grep d | wc'
(or command ls | command grep d | command wc) you should get exactly the same result as with compiling
int main() { system("ls | grep d | wc"); }
and running it in the same directory.
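Conversely, if you ever want the alias-expanded behavior outside an interactive session, you can force an interactive bash, which reads your .bashrc and therefore applies your aliases (a sketch; expect the result to depend entirely on your personal configuration):
bash -ic 'ls | grep d | wc'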

Related

Bash Add elements to an array does not work [duplicate]

Why isn't this bash array populating? I believe I've done them like this in the past. Echoing ${#XECOMMAND[@]} shows no data.
DIR=$1
TEMPFILE=/tmp/dir.tmp
ls -l $DIR | tail -n +2 | sed 's/\s\+/ /g' | cut -d" " -f5,9 > $TEMPFILE
i=0
cat $TEMPFILE | while read line; do
    if [[ $(echo $line | cut -d" " -f1) == 0 ]]; then
        XECOMMAND[$i]="$(echo "$line" | cut -d" " -f2)"
        (( i++ ))
    fi
done
When you run the while loop like
somecommand | while read ...
the while loop is executed in a subshell, i.e. a different process than the main script. Thus, any variable assignments that happen in the loop will not be reflected in the main process. The workaround is to use input redirection and/or command substitution, so that the loop executes in the current process. For example, if you want to read from a file you do
while read ....
do
# do stuff
done < "$filename"
or if you want the output of a process you can do
while read ....
do
# do stuff
done < <(some command)
Finally, in bash 4.2 and above, you can set shopt -s lastpipe, which causes the last command in the pipeline to be executed in the current process.
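A minimal sketch of the lastpipe behavior (in a script, where job control is off, as lastpipe requires):
#!/usr/bin/env bash
shopt -s lastpipe
count=0
printf 'a\nb\nc\n' | while read -r _; do (( count++ )); done
echo "$count"   # prints 3 instead of 0: the loop ran in the current shell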
I think you're trying to construct an array consisting of the names of all zero-length files and directories in $DIR. If so, you can do it like this:
mapfile -t ZERO_LENGTH < <(find "$DIR" -maxdepth 1 -size 0)
(Add -type f to the find command if you're only interested in regular files.)
This sort of solution is almost always better than trying to parse ls output.
The use of process substitution (< <(...)) rather than piping (... |) is important, because it means that the shell variable will be set in the current shell, not in an ephemeral subshell.
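You can watch the difference directly (a quick illustration):
n=0
printf 'a\nb\n' | while read -r _; do (( n++ )); done
echo "$n"   # 0: the loop ran in a subshell and its n is gone
while read -r _; do (( n++ )); done < <(printf 'a\nb\n')
echo "$n"   # 2: the loop ran in the current shell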

pipe awk output into c program

Hi, I have written a C program that takes 3 integers as input:
./myprogram 1 2 3
and I am aiming to pipe data from a csv file into the input of the C program. I grab each line from the csv file using:
for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done;
and then would like to pipe the output of this into my c program. I have tried doing:
for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done; | ./myprogram
however I get:
Line
bash: syntax error near unexpected token `|'
how do I pipe the output into my c program?
Thanks
It helps when you really try to understand error messages the shell gives you:
Line
bash: syntax error near unexpected token `|'
If you think about it, when you chain commands together in a pipeline, there is never a ; before a |, for example:
ls | wc -l
# and not: ls; | wc -l
Whatever comes after a ; is like an independent new command, as if you typed it on a completely new, clear command line. If you type | hello on a clear command line, you'll get the exact same error, because that's the exact same situation as ; | ... in your script, for example:
$ | hello
-bash: syntax error near unexpected token `|'
Others already answered this, but I also wanted to urge you to make other improvements in your script:
Always use $() instead of backticks, for example:
for i in $(seq 1 $(wc -l "test.csv" | awk '{print $1}')); ...
You didn't need the awk there; when wc reads from stdin it prints only the number, so this works just as well:
for i in $(seq 1 $(wc -l < "test.csv")); ...
You could reduce your entire script to simply this, for the same effect:
./myprogram < test.csv
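And if you do need per-line processing before feeding the program, a plain while loop avoids re-scanning the whole file with sed for every single line (a sketch; myprogram and test.csv as above):
while IFS= read -r line; do
    printf '%s\n' "$line"   # transform the line here if needed
done < test.csv | ./myprogram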
The shell doesn't accept an explicit command terminator (;) immediately followed by a pipe (|); the pipe already delimits the commands. So you want:
for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done | ./myprogram

How do I find the largest file, by size, then copy it to another directory

How do I search for files in a directory, sort them by size, and then copy the largest file into another directory?
I have seen bits and pieces, but have yet to solve it.
I have tried the below code. However, it does not work.
find sourceDirectory -type f -exec ls -s {} \; | sort -n -r | head -1 | cp {} targetdirectory
The curly brace notation ({}) is used in the arguments of the -exec option for find; it has no meaning to cp in this context. You need to split this up into two separate steps: 1) find the file, and 2) copy the file.
If you are using GNU find I would suggest something like this:
read size filepath < <(find . -type f -printf '%k %p\n' | sort -nr)
cp "$filepath" target/path/
Here is an alternative that avoids temporary variables:
cp "$(find . -type f -printf '%k %p\n' | sort -nr | head -n1 | cut -d' ' -f2-)" target/path/
You can replace -printf '%k %p\n' with -exec ls -s {} \;, but -printf is much more efficient.
Note that special precautions may be needed if the file names contain whitespace, newlines, or other special characters.
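A sketch that tolerates even newlines in file names (GNU find and sort assumed; %s is the size in bytes and records are NUL-terminated):
IFS=$'\t' read -r -d '' size filepath < <(find . -type f -printf '%s\t%p\0' | sort -z -nr)
cp "$filepath" target/path/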
You were almost there; you just needed the extra support of awk and xargs.
I would prefer using du in place of ls -s, although they both work fine in this case.
find <sourceDirectory> -type f -exec du {} \; | sort -nr | head -1 | awk '{print $2}' | xargs -I file cp file <targetdirectory>

How to find the largest file in a directory and its subdirectories?

We're just starting a UNIX class and are learning a variety of Bash commands. Our assignment involves performing various commands on a directory that has a number of folders under it as well.
I know how to list and count all the regular files from the root folder using:
find . -type f | wc -l
But I'd like to know where to go from there in order to find the largest file in the whole directory. I've seen something regarding a du command, but we haven't learned that, so in the repertoire of things we've learned I assume we need to somehow connect it to the ls -t command.
And pardon me if my 'lingo' isn't correct, I'm still getting used to it!
A quote from this link:
If you want to find and print the top 10 largest file names (not directories) in a particular directory and its sub-directories:
$ find . -type f -printf '%s %p\n'|sort -nr|head
To restrict the search to the present directory use "-maxdepth 1" with find:
$ find . -maxdepth 1 -printf '%s %p\n'|sort -nr|head
And to print the top 10 largest "files and directories":
$ du -a . | sort -nr | head
Use "head -n X" instead of plain "head" above to print the top X largest files (in all of the above examples).
To find the top 25 files in the current directory and its subdirectories:
find . -type f -exec ls -al {} \; | sort -nr -k5 | head -n 25
This will output the top 25 files by sorting based on the size of the files via the "sort -nr -k5" piped command.
Same but with human-readable file sizes:
find . -type f -exec ls -alh {} \; | sort -hr -k5 | head -n 25
find . -type f | xargs ls -lS | head -n 1
outputs
-rw-r--r-- 1 nneonneo staff 9274991 Apr 11 02:29 ./devel/misc/test.out
If you just want the filename:
find . -type f | xargs ls -1S | head -n 1
This avoids using awk and allows you to use whatever flags you want in ls.
Caveat. Because xargs tries to avoid building overlong command lines, this might fail if you run it on a directory with a lot of files because ls ends up executing more than once. It's not an insurmountable problem (you can collect the head -n 1 output from each ls invocation, and run ls -S again, looping until you have a single file), but it does mar this approach somewhat.
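Here is a sketch of that looping idea (GNU xargs assumed; file names containing newlines would still confuse the final step). Each sh -c batch prints its own largest file, and one more ls -S pass picks the overall winner from those candidates:
find . -type f -print0 |
    xargs -0 sh -c 'ls -1S "$@" | head -n 1' sh |
    xargs -d '\n' ls -1S | head -n 1
If there are so many candidates that the second xargs also splits them into batches, repeat the last step once more.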
There is no single simple command available to find the largest files/directories on a Linux/UNIX/BSD filesystem. However, by combining the following three commands (using pipes) you can easily get a list of the largest files:
# du -a /var | sort -n -r | head -n 10
If you want more human readable output try:
$ cd /path/to/some/var
$ du -hsx * | sort -rh | head -10
Where:
var is the directory you want to search.
du -h : display sizes in human-readable format (e.g., 1K, 234M, 2G).
du -s : show only a total for each argument (summary).
du -x : skip directories on different file systems.
sort -r : reverse the result of comparisons.
sort -h : compare human-readable numbers (a GNU sort specific option).
head -10 (or head -n 10) : show the first 10 lines.
This lists files recursively if they're normal files, sorts by the 7th field (which is size in my find output; check yours), and shows just the first file.
find . -type f -ls | sort +7 | head -1
The first option to find is the start path for the recursive search. A -type of f searches for normal files. Note that if you try to parse this as a filename, you may fail if the filename contains spaces, newlines or other special characters. The options to sort also vary by operating system. I'm using FreeBSD.
A "better" but more complex and heavier solution would be to have find traverse the directories, but perhaps use stat to get the details about the file, then perhaps use awk to find the largest size. Note that the output of stat also depends on your operating system.
This will find the largest file or folder in a given directory:
ls -S /path/to/folder | head -1
To find the largest file in all sub-directories:
find /path/to/folder -type f -exec ls -s {} \; | sort -nr | awk 'NR==1 { $1=""; sub(/^ /, ""); print }'
On Solaris I use:
find . -type f -ls|sort -nr -k7|awk 'NR==1{print $7,$11}' #formatted
or
find . -type f -ls | sort -nrk7 | head -1 #unformatted
because nothing else posted here worked.
This will find the largest file in $PWD and subdirectories.
Try the following one-liner (display top-20 biggest files):
ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20
or (human readable sizes):
ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20
Works fine under Linux/BSD/OSX in comparison to other answers, as find's -printf option doesn't exist on OSX/BSD and stat has different parameters depending on the OS. However, for the second command to work properly on OSX/BSD (where sort doesn't have -h), install sort from coreutils, or remove -h from ls and use sort -nr instead.
So these aliases are useful to have in your rc files:
alias big='du -ah . | sort -rh | head -20'
alias big-files='ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20'
Try the following command:
find /your/path -printf "%k %p\n" | sort -g -k 1,1 | awk '{if($1 > 500000) print $1/1024 "MB" " " $2 }' |tail -n 1
This will print the name and size of the largest file bigger than roughly 500 MB. You can drop the if($1 > 500000) condition, and it will print the largest file in the directory.
du -aS /PATH/TO/folder | sort -rn | head -2 | tail -1
or
du -aS /PATH/TO/folder | sort -rn | awk 'NR==2'
To list the largest file in a folder:
ls -sh /pathFolder | sort -rh | head -n 1
The output of ls -sh shows the size (-s) in a human-readable (-h) form next to each file name.
You could also use ls -shS /pathFolder | head -n 2. The capital S makes ls order the list from the largest to the smallest file, but the first line of output is the total of all files in that folder. So if you want just the single biggest file, use head -n 2 and read the second line, or use the first example with ls, sort and head.
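For example (a quick illustration):
ls -shS /pathFolder | head -n 2 | tail -n 1   # skip the 'total' line and show only the biggest file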
This command works for me:
find /path/to/dir -type f -exec du -h '{}' + | sort -hr | head -10
Lists Top 10 files ordered by size in human-readable mode.
This script simplifies finding largest files for further action.
I keep it in my ~/bin directory, and put ~/bin in my $PATH.
#!/usr/bin/env bash
# scriptname: above
# author: Jonathan D. Lettvin, 201401220235
# This finds files of size >= $1 (format ${count}[K|M|G|T], default 10G)
# using a reliable version-independent bash hash to relax find's -size syntax.
# Specifying size using 'T' for Terabytes is supported.
# Output size has units (K|M|G|T) in the left hand output column.
# Example:
# ubuntu12.04$ above 1T
# 128T /proc/core
# http://stackoverflow.com/questions/1494178/how-to-define-hash-tables-in-bash
# Inspiration for hasch: thanks Adam Katz, Oct 18 2012 00:39
function hasch() { local hasch=`echo "$1" | cksum`; echo "${hasch//[!0-9]}"; }
function usage() { echo "Usage: $0 [{count}{k|K|m|M|g|G|t|T}]"; exit 1; }
function arg1() {
    # Translate single arg (if present) into format usable by find.
    count=10; units=G;   # Default find -size argument to 10G.
    size=${count}${units}
    if [ -n "$1" ]; then
        for P in TT tT GG gG MM mM Kk kk; do xlat[`hasch ${P:0:1}`]="${P:1:1}"; done
        units=${xlat[`hasch ${1:(-1)}`]}; count=${1:0:(-1)}
        test -n "$units" || usage
        test -x $(echo "$count" | sed s/[0-9]//g) || usage
        if [ "$units" == "T" ]; then units="G"; let count=$count*1024; fi
        size=${count}${units}
    fi
}
function main() {
    sudo \
    find / -type f -size +$size -exec ls -lh {} \; 2>/dev/null | \
    awk '{ N=$5; fn=$9; for(i=10;i<=NF;i++){fn=fn" "$i}; print N " " fn }'
}
arg1 $1
main $size
Here is a quite simple way to do it:
ls -l | tr -s " " " " | cut -d " " -f 5,9 | sort -n -r | head -n 1
And you'll get something like this: 8445 examples.desktop
Linux solution: for example, if you want to see the list of all files/folders in your root (/) directory according to file/folder size (in descending order):
sudo du -xm / | sort -rn | more
ls -alR|awk '{ if ($5 > max) {max=$5;ff=$9}} END {print max "\t" ff;}'
Kindly run the one-liner below with your required path; here it is running against the /var/log/ location:
(sudo du -a /var/log/ |sort -nr|head -n20 |awk '{print $NF}'|while read l ;do du -csh $l|grep -vi total;done ) 2> /dev/null

Enumerate the number of running processes with a given name - assign to variable

I need to know how many processes are running for a specific task (e.g. number of Apache tomcats) and if it's 1, then print the PID. Otherwise print out a message.
I need this in a BASH script, now when I perform something like:
result=`ps aux | grep tomcat | awk '{print $2}' | wc -l`
The number of items is assigned to result. Hurrah! But I don't have the PID(s). However when I attempt to perform this as an intermediary step (without the wc), I encounter problems. So if I do this:
result=`ps aux | grep tomcat | awk '{print $2}'`
Any attempts I make to modify the variable result just don't seem to work. I've tried set and tr (replace blanks with line-breaks), but I just cannot get the right result. Ideally I'd like the variable result to be an array with the PIDs as individual elements. Then I can see size, elements, easily.
Can anyone suggest what I am doing wrong?
Thanks,
Phil
Update:
I ended up using the following syntax:
pids=(`ps aux | grep "${searchStr}"| grep -v grep | awk '{print $2}'`)
number=${#pids[@]}
The key was putting the brackets around the back-ticked commands. Now the variable pids is an array and can be asked for length and elements.
Thanks to both choroba and Dimitre for their suggestions and help.
pids=($(
ps -eo pid,command |
sed -n '/[t]omcat/{s/^ *\([0-9]\+\).*/\1/;p}'
))
number=${#pids[@]}
pids=( ... ) creates an array.
$( ... ) returns its output as a string (similar to backquote).
Then, sed is called on the list of all the processes: for lines containing tomcat (the [t] prevents the sed command itself from being matched), only the pid is preserved and printed.
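The same bracket trick is handy with plain grep (a quick illustration):
ps aux | grep tomcat      # also matches the grep process itself
ps aux | grep '[t]omcat'  # no longer matches the literal text in grep's own command line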
You may need to adjust the pgrep command (you may or may not need the -f option).
_pids=(
$( pgrep -f tomcat )
)
(( ${#_pids[@]} == 1 )) &&
echo ${_pids[0]} ||
echo message
If you want to print the number of pids (with a message):
_pids=(
$( pgrep -f tomcat )
)
(( ${#_pids[@]} == 1 )) &&
echo ${_pids[0]} ||
echo "${#_pids[#]} running"
It should be noted that the pgrep utility and the syntax used are not standard.
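If portability matters, a rough equivalent without pgrep or bash arrays might look like this (a sketch; it assumes a ps that supports -eo and a process whose command name is exactly tomcat):
pids=$(ps -eo pid,comm | awk '$2 == "tomcat" { print $1 }')
set -- $pids
if [ "$#" -eq 1 ]; then
    echo "$1"
else
    echo "$# matching processes running"
fi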
