I am trying to replicate the "ls" command in C. It should take between 0 and 2 arguments: a file path and a set of flags. When one argument is passed, I need to distinguish between a file and a set of flags. I would have thought the obvious way was to assume that no file name begins with a "-" character: if the first character of the argument is "-", treat it as a set of flags; otherwise treat it as a file path.
How should I actually distinguish between the two?
Well, the rule with ls, supposing a file named -a exists, is:
ls -a : -a is treated as an option;
ls -- -a : -a is treated as a file argument.
The -- argument acts as a separator: every argument after it is a file, not an option.
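The same convention can be tried out from the shell. The following is only a sketch: classify_args is a hypothetical helper, and the option letters a and l are picked for illustration. It relies on the POSIX getopts builtin, which stops parsing at -- just as getopt(3) does in C.

```sh
# Hypothetical helper: split arguments into flags and paths.
# getopts stops at the first non-option argument or at "--".
classify_args() {
    flags=
    OPTIND=1
    while getopts al opt; do
        case $opt in
            a|l) flags="$flags$opt" ;;
            *)   return 2 ;;
        esac
    done
    # Drop everything getopts consumed, including a "--" separator.
    shift $((OPTIND - 1))
    printf 'flags=%s\n' "$flags"
    for path in "$@"; do
        printf 'path=%s\n' "$path"
    done
}

classify_args -a        # -a is parsed as an option
classify_args -- -a     # after "--", -a is reported as a path
```

In a C program the equivalent would be to let getopt(3) consume the options and treat argv[optind] onward as file paths, rather than inspecting the first character by hand.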
Typically, programs don't, and leave it to the user to deal with the resulting problems.
For example, create a file called -l, and at least one other file, and then run ls *:
me@localhost:~$ mkdir temp
me@localhost:~$ cd temp
me@localhost:~/temp$ touch ./-l
me@localhost:~/temp$ touch testfile
me@localhost:~/temp$ ls *
-rw-rw-r-- 1 acampbell acampbell 0 Apr 4 11:00 testfile
me@localhost:~/temp$
ls * expanded to ls -l testfile.
Most Unix utilities can take the argument --, and every argument after -- will be treated as a filename:
me@localhost:~/temp$ ls -l -- testfile
-rw-rw-r-- 1 acampbell acampbell 0 Apr 4 11:00 testfile
me@localhost:~/temp$ ls -- -l testfile
-l testfile
me@localhost:~/temp$
Users can also pass a path that doesn't start with -, for example by adding a redundant ./ prefix:
me@localhost:~/temp$ ls ./*
./-l ./testfile
me@localhost:~/temp$
Suppose we have a directory with three files in it: file_1, file_2, and the very inconveniently named file 3. If my understanding of filename expansion is correct, the way bash interprets the string
echo *
is that it sees the (unquoted) *, and modifies the string so that it now reads
echo file_1 file_2 file 3
Then, since there are no more expansions to be performed, bash attempts to evaluate the string. In this case, it runs the command echo, passing to it four arguments: file, 3, file_1, and file_2. In any case, the outputs are identical:
$ echo *
> file 3 file_1 file_2
$ echo file 3 file_1 file_2
> file 3 file_1 file_2
However, in other contexts, this doesn't seem to be what happens. For instance
$ arr1=( * )
$ arr2=( file 3 file_1 file_2 )
$ echo ${#arr1[@]}
> 3
$ echo ${#arr2[@]}
> 4
And yet, if shell expansion works the way it's described in the bash documentation, these ought to be identical.
Something similar happens in a for loop:
$ for f in *; do echo $f; done
> file 3
> file_1
> file_2
$ for f in file 3 file_1 file_2; do echo $f; done
> file
> 3
> file_1
> file_2
What am I missing? Does globbing not happen in these cases?
Use case
I'm putting together a GitHub repo to centralize my dotfiles, following this suggestion from MIT's Hacker Tools. The script I'm writing has two usages:
./install.sh DOTFILE [DOTFILE [DOTFILE ...]]
./install.sh -a
In the first case, each of the named dotfiles in src/config is symlinked to a corresponding dotfile in my home directory; in the second, the -a flag prompts the script to run as if I had entered every dotfile as an argument.
The solution I came up with was to run ln -sih in a for loop using one of two arrays: $@ and * (see note 1). So, simply assign FILES=( $@ ) or FILES=( * ), and then run for f in $FILES. Except, it seems to me, * should break in this assignment if there's a filename with a space in it. Clearly bash is smarter than me, since it doesn't, but I don't understand why.
1: Obviously, you don't want the script itself to run through the loop, but that's easy enough to exclude with an if [[ "$f" != "$0" ]] clause.
From the bash documentation you linked to:
The order of expansions is: brace expansion; tilde expansion,
parameter and variable expansion, arithmetic expansion, and command
substitution (done in a left-to-right fashion); word splitting; and
filename expansion.
Filename expansion happens after word splitting, and therefore the expanded filenames are not themselves subject to further word splitting.
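The ordering can be checked in a scratch directory. This sketch uses the positional parameters instead of bash arrays so that it runs in any POSIX shell. The names demo_dir, count_glob, count_unquoted and list_targets are made up for the illustration, and list_targets is only a guess at the shape of the install loop from the question.

```sh
# Scratch directory holding the three file names from the question.
demo_dir=$(mktemp -d)
touch "$demo_dir/file_1" "$demo_dir/file_2" "$demo_dir/file 3"

# Count the words a glob produces. The pattern is expanded after word
# splitting, so "file 3" stays a single word.
count_glob() (
    cd "$1" || exit 1
    set -- *
    echo "$#"
)

# Count the words produced by an unquoted variable, which IS split.
count_unquoted() {
    names='file 3 file_1 file_2'
    set -- $names
    echo "$#"
}

# Hypothetical install loop: with -a, use every name in the current
# directory; quoting "$@" keeps each name intact, spaces included.
list_targets() {
    if [ "$1" = "-a" ]; then
        set -- *
    fi
    for f in "$@"; do
        printf '%s\n' "$f"
    done
}

count_glob "$demo_dir"    # prints 3
count_unquoted            # prints 4
```

In bash the array forms would be FILES=( "$@" ) or FILES=( * ), iterated with for f in "${FILES[@]}". An unquoted $FILES expands only the first element, and an unquoted $@ is split again on whitespace.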
I want to run my mpi program with 3 command line inputs.
mpirun -np 4 ./exe file_1 file_2 Size
For each file_1 there is an associated file_2, and Size is the same for each pair of files. I want to execute my program with different numbers of processes, say -np 2, 4, 6, 8 and 10.
I have more than a hundred files. I want to execute my code once from command line that reads these files one by one with the specified number of processes.
For serial code, I have tried the following command and it works by taking all .txt files one by one.
find . -name "*.txt" | awk -F"/" '{system ("./a.out " $2)}'
I am not sure how to execute it with three command-line inputs (file1, file2, size) at the same time and with different numbers of processes.
As far as I understand your question, you might for example use:
find . -name "*.txt" -print0 | xargs -0 -n2 -J% -- mpirun -np 4 ./exe % $SIZE
If your working directory contained, for example, the files a.txt, b.txt, c.txt, d.txt, this would launch the commands:
mpirun -np 4 ./exe a.txt b.txt $SIZE
mpirun -np 4 ./exe c.txt d.txt $SIZE
Here, find first finds all *.txt files in the current directory and below, outputs their names as null-delimited strings, and pipes this list to xargs. The -0 option of xargs specifies that it should expect null-delimited input, -n2 says that it should take two files at a time, and -J% sets the replacement string %, i.e., the placeholder in the supplied command that is replaced with the file names taken from the input (-J is a BSD xargs option; GNU xargs does not have it). Finally, -- merely denotes the end of options and is followed by the command to be executed (containing the replacement string).
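To cover the different process counts as well, one possible variation wraps the launch in a small loop. This is a dry-run sketch, not a tested MPI workflow: print_runs is a hypothetical helper, it assumes that sorting the names puts each file_1 directly before its file_2, and it uses sort -z, which is not in POSIX but is available in GNU and recent BSD sort. It avoids -J, so it does not depend on BSD xargs.

```sh
# Hypothetical helper: for every consecutive pair of *.txt files under DIR,
# print one mpirun command per process count. Remove "echo" to run them.
# Usage: print_runs DIR SIZE
print_runs() {
    dir=$1 size=$2
    find "$dir" -name '*.txt' -print0 | sort -z |
        xargs -0 -n 2 sh -c '
            size=$1; shift            # first argument is SIZE
            for np in 2 4 6 8 10; do
                echo mpirun -np "$np" ./exe "$1" "$2" "$size"
            done
        ' sh "$size"
}
```

Calling print_runs . "$SIZE" lists the commands that would be run. With an odd number of files, the last group has no second file, so the pairing should be checked before dropping the echo.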
When I do ls -l I get
-rw-r--r-- 1 jboss admin **26644936** Sep 1 21:23 MyBig.war
How do I print it as below
-rw-r--r-- 1 jboss admin **26,644,936** Sep 1 21:23 MyBig.war
The proper way to format ls output is to specify BLOCK_SIZE.
Saying:
BLOCK_SIZE="'1" ls -l
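would achieve your desired result with GNU ls, provided the current locale defines a thousands separator (the default C locale does not).
Quoting from the GNU coreutils manual: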
Some GNU programs (at least df, du, and ls) display sizes in “blocks”.
You can adjust the block size and method of display to make sizes
easier to read.
A block size specification preceded by ‘'’ causes output sizes to be
displayed with thousands separators.
Using sed:
$ ls_output='-rw-r--r-- 1 jboss admin 26644936 Sep 1 21:23 MyBig.war'
$ echo $ls_output | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
-rw-r--r-- 1 jboss admin 26,644,936 Sep 1 21:23 MyBig.war
The sed command above repeatedly inserts a comma before the last three digits of any run of four or more digits, turning #### into #,###.
-e :a: Make a label named a for t command.
ta: Jump to a if substitution was successful.
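Wrapped in a function, the same loop can be reused as a filter. The name add_commas is made up for this sketch. The second call shows a side effect worth knowing about: the expression groups every run of four or more digits on the line, not just the size column.

```sh
# Filter stdin through the sed loop from the answer.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

echo '-rw-r--r-- 1 jboss admin 26644936 Sep 1 21:23 MyBig.war' | add_commas
# Caveat: a four-digit year is grouped too, so 2006 becomes 2,006.
echo 'Nov 29 2006 msheehan' | add_commas
```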
My program gets executed like:
$ ./sort 1 < test.txt
sort is the program name
1 is the argument (argv[1])
and test.txt is the file I am inputting from
Is it possible to extract the file name from this? If so, how?
The problem is I already wrote my whole program as if I could extract the name from the input line, so I need to be able to pass it into arguments.
Any help is appreciated,
Thanks!
You can't. The shell opens (open(2)) that file and sets up the redirect (most likely using dup2).
The only possible way would be for the shell to explicitly export the information in an environment variable that you could read via getenv.
But it doesn't always make sense. For example, what file name would you expect from
$ echo "This is the end" | ./sort 1
Though this can't be done portably, it's possible on Linux by calling readlink on /proc/self/fd/0 (or /proc/some_pid/fd/0).
For example, running:
echo $(readlink /proc/self/fd/0 < /dev/null)
outputs:
/dev/null
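A slightly fuller sketch of the same Linux-only trick follows. stdin_source is a hypothetical name, and the labels it prints are arbitrary. It separates the case where standard input has a path from the case where it is a pipe or another unnamed object, which is where the idea of "the input file name" stops making sense.

```sh
# Linux-only: report what standard input is connected to.
# readlink inherits the caller's stdin, so /proc/self/fd/0 points at it.
stdin_source() {
    target=$(readlink /proc/self/fd/0) || return 1
    case $target in
        /*) printf 'path: %s\n' "$target" ;;
        *)  printf 'not a named file: %s\n' "$target" ;;
    esac
}

stdin_source < /dev/null                 # prints: path: /dev/null
echo "This is the end" | stdin_source    # prints a pipe:[...] identifier
```

A C program on Linux could do the same with readlink(2) on /proc/self/fd/0, but passing the file name as an ordinary argument remains the portable choice.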
No, you can't: the shell sends the content of test.txt to the standard input of your program.
Look at this:
sort << _EOF
3
1
2
_EOF
The <, > and | operators are processed by the shell; they alter the standard input, output and error of the programs on the command line.
If you happen to run Solaris, you could parse pfiles output to get the file associated, if any, with stdin.
$ /usr/bin/sleep 3600 < /tmp/foo &
[1] 8430
$ pfiles 8430
8430: /usr/bin/sleep 3600
Current rlimit: 65536 file descriptors
0: S_IFREG mode:0600 dev:299,2 ino:36867886 uid:12345 gid:67890 size=123
O_RDONLY|O_LARGEFILE
/tmp/foo
1: S_IFCHR mode:0600 dev:295,0 ino:12569206 uid:12345 gid:67890 rdev:24,2
...
On most Unix platforms, you will also get the same information from lsof -p if this freeware is installed.
OK, I have been working with Solaris for 10+ years, and have never seen this...
I have a directory listing which includes both a file and subdirectory with the same name:
-rw-r--r-- 1 root other 15922214 Nov 29 2006 msheehan
drwxrwxrwx 12 msheehan sysadmin 2048 Mar 25 15:39 msheehan
I use file to discover contents of the file, and I get:
bash-2.03# file msheehan
msheehan: directory
bash-2.03# file msh*
msheehan: ascii text
msheehan: directory
I am not worried about the file, but I want to keep the directory, so I try rm:
bash-2.03# rm msheehan
rm: msheehan is a directory
So here is my two part question:
What's up with this?
How do I carefully delete the file?
Jonathan
Edit:
Thanks for the answers guys, both (so far) were helpful, but piping the listing to an editor did the trick, ala:
bash-2.03# ls -l > jb.txt
bash-2.03# vi jb.txt
Which contained:
-rw-r--r-- 1 root other 15922214 Nov 29 2006 msheehab^?n
drwxrwxrwx 12 msheehan sysadmin 2048 Mar 25 15:39 msheehan
Always be careful with the backspace key!
I would guess that these are in fact two different filenames that "look" the same, as the command file was able to distinguish them when the shell passed the expanded versions of the name in. Try piping ls into od or another hex/octal dump utility to see if they really have the same name, or if there are non-printing characters involved.
I'm wondering what could cause this. Aside from filesystem bugs, it could be caused by a non-ASCII character that got through somehow. In that case, use another language with easier string semantics to do the operation.
It would be interesting to see what would be the output of this ruby snippet:
ruby -e 'puts Dir["msheehan*"].inspect'
You can delete the file by its inode number.
The -i option of ls shows the inode number of each entry:
$ ls -li
total 1
20801 -rw-r--r-- 1 root root 0 2010-11-08 01:55 a?
20802 -rw-r--r-- 1 root root 0 2010-11-08 01:55 a\?
$ find . -inum 20802 -exec rm {} \;
$ ls -li
total 1
20801 -rw-r--r-- 1 root root 0 2010-11-08 01:55 a?
I have an example (in Spanish) of how to delete a file by its inode on Solaris:
http://sparcki.blogspot.com/2010/03/como-eliminar-archivos-utilizando-su.html
Urko,
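For the case in the question, where a directory must survive and only the look-alike regular file should go, the find call can be narrowed with -type f. The sketch below rebuilds the situation in a scratch directory. rm_file_by_inode is a hypothetical helper, and the DEL character in the file name mirrors the ^? shown in the edited question.

```sh
# Hypothetical helper: remove the regular file with inode number INUM found
# under DIR (same filesystem only); directories are never touched.
rm_file_by_inode() {
    search_dir=$1 wanted_inum=$2
    find "$search_dir" -xdev -type f -inum "$wanted_inum" -exec rm -- {} \;
}

# Rebuild the situation: a directory, plus a file whose name hides a DEL.
scratch=$(mktemp -d)
mkdir "$scratch/msheehan"
touch "$scratch/msheehab$(printf '\177')n"

# The two ? wildcards match the two unexpected characters in the file name.
inum=$(ls -i "$scratch"/msheeha??n | awk '{ print $1 }')
rm_file_by_inode "$scratch" "$inum"
ls "$scratch" | cat -v      # only the directory msheehan is left
```

Using rm -i in place of rm inside the helper would add a confirmation prompt before anything is deleted.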
And a quick answer to part 2 of my own question...
I would imagine I could rename the directory, delete the file, and rename the directory back to its original name again.
... I would still be interested to see what other people come up with.
JB
I suspect that one of them has a strange character in the name. You could try using the shell wildcard expansion to see that: type
cat msh*
and press the wildcard expansion key (in my shell it's Ctrl-X *). You should get two names listed, perhaps one of which has an escape character in it.
To see if there are special characters in your file name, try the -b or -q options to ls,
assuming Solaris 8 has those options.
As another solution to deleting the file you can bring up the graphical file browser
(gasp!) and drag and drop the unwanted file to the trash.
Another solution might be to move the entry whose name has no unknown special character (here, the directory) to a different name, delete the entry with the special character using a wildcard, then move the first one back.
mv msheehan temp
rm mshee*
mv temp msheehan
Of course, you want to be sure that only the file you want to delete matches the wildcard.
And, for your particular case, since one was a directory and the other a file, this command might have solved it all:
rmdir msheeha*
One quick-and-easy way to see non-printing characters and whitespace is to pipe the output through cat -vet, e.g.:
# ls -l | cat -vet
Nice and easy to remember!
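To pick out only the suspicious entries instead of scanning a whole listing, find can match names that contain a character outside the printable class. This is a sketch under assumptions: odd_names is a made-up helper, and it assumes a find whose -name patterns accept bracket expressions such as [![:print:]], as POSIX pattern matching does.

```sh
# Hypothetical helper: list entries under DIR whose names contain a
# non-printable character, with cat -v making those characters visible.
odd_names() {
    find "$1" -name '*[![:print:]]*' -print | cat -v
}
```

Run as odd_names . in the affected directory, a name containing a DEL character would show up with ^? in it, while ordinary names are not listed at all.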
For part 2, since one name contains two extra characters, you can use:
mv msheehan abc
mv msheeha??n xyz
Once you've done that, you've got sane file names again, that you can fix up as you need.