Command line arguments with datafiles - c

If I want to pass data files to a program, how can I indicate that they are data files and not just strings containing the file names? Basically I want file redirection, but using command line arguments so I can be sure the input is correct.
I have been using:
./theapp < datafile1 < datafile2 arg1 arg2 arg3 > outputfile
but I am wondering whether it is possible for it to look like this:
./theapp datafile1 datafile2 arg1 arg2 arg3 > outputfile
Allowing the use of command line arguments.

It's a little hard to combine two files into standard input like that. Better would be:
cat datafile1 datafile2 | ./theapp arg1 arg2 arg3 >outputfile
With bash (at least), the second input redirection overrides the first; it does not augment it. You can see that with the two commands:
cat <realfile.txt </dev/null # no output.
cat </dev/null <realfile.txt # outputs realfile.txt.
When you use redirection, your application never even sees >outputfile (for example). It is evaluated by the shell which opens it up and connects it to the standard output of the process you're trying to run. All your program will generally see will be:
./theapp arg1 arg2 arg3
Same with standard input, it's taken care of by the shell.
The only possible problem with that first command above is that it combines the two files into one stream so that your program doesn't know where the first ends and second begins (unless it can somehow deduce this from the content of the files).
If you want to process multiple files and know which they are, there's a time-honoured tradition of doing something like:
./theapp arg1 arg2 arg3 @datafile1 @datafile2 >outputfile
and then having your application open and process the files itself. This is more work than letting the shell do it though.
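A minimal sketch of the "open and process the files itself" approach, assuming the data file names reach the program as ordinary arguments. The helpers count_lines and count_lines_in_file are hypothetical names chosen for illustration, and line counting merely stands in for whatever processing the program does. Because each file is opened separately, the program always knows where one ends and the next begins.

```c
#include <stdio.h>

/* Count newline characters in an already-open stream. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}

/* Open one named data file, process it, and close it again.
 * Returns -1 if the file cannot be opened. */
long count_lines_in_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    long lines;

    if (fp == NULL)
        return -1;
    lines = count_lines(fp);
    fclose(fp);
    return lines;
}
```

From main, one would call count_lines_in_file once for each argument that names a data file.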

From the perspective of your program, all command line arguments are strings, and you have to decide whether they represent file names or not yourself. There are only two bytes that cannot appear in a file name on Unix: 0x00 and 0x2F (NUL and /). [I really mean bytes. Except for HFS+, Unix file systems are completely oblivious to character encoding, although sensible people use UTF-8, of course.]
Shell redirections don't appear in argv at all.
There is a convention, though: treat each element of argv (except argv[0] of course) that does not begin with a dash as the name of a file to process, in the order that they appear. You do NOT have to do any unquoting operations; just pass them to fopen (or open) as is. If the string "-" appears as an element of argv, process standard input at that point until exhausted, then continue looping over argv. And if the string "--" appears in argv, treat everything after that point as a file name, whether or not it begins with a dash. (Including subsequent appearances of "-" or "--").
There may be a handy library module or even a language primitive to deal with this stuff for you, depending on what language you're using. For instance, in Perl, you just write
for (<>) {
... do stuff with $_ ...
}
and you get everything I said in the "There is a convention..." paragraph for free. (But you said C, so, um, you gotta do most of it yourself. I'm not aware of an argument-processing library for plain C that's worth the space it takes on disk. :-( )
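For C, the convention described above (non-dash arguments are file names, "-" means standard input, "--" ends option processing) could be sketched roughly as follows. The enum and the function classify_args are hypothetical names, not part of any library, and a real program would act on each argument rather than just label it.

```c
#include <string.h>

enum arg_kind { ARG_OPTION, ARG_STDIN, ARG_FILE, ARG_END_OF_OPTIONS };

/* Classify argv[1..argc-1] following the convention:
 *   "--"          ends option processing; everything after it is a file name
 *   "-"           means "read standard input at this point"
 *   "-something"  is an option
 *   anything else is a file name
 * kinds must have room for argc entries; kinds[0] is left untouched. */
void classify_args(int argc, char **argv, enum arg_kind *kinds)
{
    int options_done = 0;

    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];

        if (options_done) {
            kinds[i] = ARG_FILE;
        } else if (strcmp(a, "--") == 0) {
            kinds[i] = ARG_END_OF_OPTIONS;
            options_done = 1;
        } else if (strcmp(a, "-") == 0) {
            kinds[i] = ARG_STDIN;
        } else if (a[0] == '-') {
            kinds[i] = ARG_OPTION;
        } else {
            kinds[i] = ARG_FILE;
        }
    }
}
```

Every string labelled ARG_FILE can be passed to fopen unchanged, as noted above.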

Related

Strange behavior of argv when passing string containing "!!!!"

I have written a small program that takes some input parameters from *argv[] and prints them. In almost all use cases my code works perfectly fine. A problem only arises when I use more than one exclamation mark at the end of the string I want to pass as an argument ...
This works:
./program -m "Hello, world!"
This does NOT work:
./program -m "Hello, world!!!!"
^^ If I do this, the program output is either twice that string, or the command I entered previous to ./program.
However, what I absolutely don't understand: The following, oddly enough, DOES work:
./program -m 'Hello, world!!!!'
^^ The output is exactly ...
Hello, world!!!!
... just as desired.
So, my questions are:
Why does this strange behavior occur when using multiple exclamation marks in a string?
As far as I know, in C you use "" for strings and '' for single chars. So why do I get the desired result when using '', but not when using "" as I should (in my understanding)?
Is there a mistake in my code or what do I need to change to be able to enter any string (no matter if, what, and how many punctuation marks are used) and get exactly that string printed?
The relevant parts of my code:
// this is a simplified example that, in essence, does the same
// as my (significantly longer) code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[]) {
    char *msg = (char *)calloc(1024, sizeof(char));
    printf("%s", strcat(msg, argv[2])); // argv[1] is "-m"
    free(msg);
    return 0;
}
I already tried copying the content of argv[2] into a char* buffer first and appending a '\0' to it, which didn't change anything.
This is not related to your code but to the shell that starts it.
In most shells, !! is shorthand for the last command that was run. When you use double quotes, the shell allows for history expansion (along with variable substitution, etc.) within the string, so when you put !! inside of a double-quoted string it substitutes the last command run.
What this means for your program is that all this happens before your program is executed, so there's not much the program can do except check if the string that is passed in is valid.
In contrast, when you use single quotes the shell does not do any substitutions and the string is passed to the program unmodified.
So you need to use single quotes to pass this string. Your users would need to know this if they don't want any substitution to happen. The alternative is to create a wrapper shell script that prompts the user for the string to pass in, then the script would subsequently call your program with the proper arguments.
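As a variation on the wrapper-script idea, the program itself could read the message from standard input instead of argv. Text read this way never passes through the shell's command-line parsing, so no history expansion is applied to it. This is only a sketch; read_message is a hypothetical helper, not something from the question's code.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf (at most size-1 characters) and
 * strip the trailing newline, if any.
 * Returns buf, or NULL on end of file or read error. */
char *read_message(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A caller would typically print a prompt first and then call read_message(stdin, buf, sizeof buf).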
The shell performs expansion in double-quoted strings. If you read the History Expansion section of the Bash manual page (assuming you use Bash, which is the default on most Linux distributions), you will see that !! means
Refer to the previous command.
So !!!! in your double-quoted string will expand to the previous command, twice.
Such expansion is not made for single-quoted strings.
So the problem is not within your program, it's due to the environment (the shell) calling your program.
In addition to the supplied answers, you should remember that echo is your friend in the shell. If you prefix your command with "echo ", you will see what the shell is actually sending to your program.
echo ./program -m "Hello, world!!!!"
This would have shown you some strangeness and might have helped steer you in the right direction.

How to pass a filename when executing a C program

I am trying to not hardcode the name of the input file in my C program. I have all of the other components working when I hardcode the filename. But would like to be able to pass it a string filename.
I am trying to compile a file called Matrix.c and name its executable matrix.
So, in the terminal, from my working directory, I compile with:
gcc -g Matrix.c -o matrix
then when I run
./matrix
It doesn't have a filename passed to it so I am gonna check for that and have the user input a filename to load.
However, when someone passes the filename, should it be passed as:
./matrix filename.txt
or
./matrix < filename.txt
With the latter option, I can't seem to get the name of the argument passed to the function from argv[1] — it's just "(Null)".
I know this is very simplistic question. But am I just completely off my rocker? Is it something to do with me running on OS X El Capitan. I know I've used the '<' convention before.
The issue is how the shell works, mainly. When you use:
./matrix filename.txt
then the program is given two arguments — the program name and the file name. When you use:
./matrix < filename.txt
then the program is given just one argument — the program name — and the shell arranges for its standard input to come from the file (and the file name is not passed to your program).
Either can be made to work; you just have to decide which you want to support. What should happen if the user types ./matrix file1.txt file2.txt file3.txt? One version of conventional behaviour would be to process each file in turn, writing each set of results to standard output. There are plenty of alternative behaviours — most of them have been used by someone at some time or another. Reading from standard input when there is no file name specified is a common mode of operation (think cat and grep and …).
Arguments to a command are in argv[1 .. argc-1].
The redirect from '<' sends the contents of the file to the program's stdin.
A third way to get the filename would be to print "Enter filename: " and then read the string typed by the user.
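A minimal sketch that supports both invocation styles, assuming the program takes at most one file name. The helper open_input is a hypothetical name. With ./matrix filename.txt it opens the named file; with ./matrix < filename.txt (or no redirection at all) it falls back to standard input.

```c
#include <stdio.h>

/* Return the stream to read from: the file named by argv[1] if one was
 * given, otherwise standard input (which the shell may have redirected
 * with '<'). Returns NULL if the named file cannot be opened. */
FILE *open_input(int argc, char **argv)
{
    if (argc > 1)
        return fopen(argv[1], "r");
    return stdin;
}
```

The caller should check for NULL, and should call fclose on the result only when it is not stdin.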

Trying to get an asterisk * as input to main from command line

I'm trying to send input from the command line to my main function. The input is then sent to the functions checkNum etc.
int main(int argc, char *argv[])
{
int x = checkNum(argv[1]);
int y = checkNum(argv[3]);
int o = checkOP(argv[2]);
…
}
It is supposed to be a calculator so for example in the command line when I write:
program.exe 4 + 2
and it will give me the answer 6 (code for this is not included).
The problem is when I want to multiply and I type for example
program.exe 3 * 4
It seems like it creates a pointer (or something, not quite sure) instead of giving me the char pointer to the char '*'.
The question is can I get the input '*' to behave the same way as when I type '+'?
Edit: Writing "*" in the command line works. Is there a way where I only need to type *?
The code is running on Windows, which seems to be part of the problem.
As @JohnBollinger wrote in the comments, you should use
/path/to/program 3 '*' 4
the way it's written at the moment.
But some explanation is clearly required. This is because the shell will parse the command line before passing it to your program. * will expand to any file in the directory (UNIX) or something similar (windows), space separated. This is not what you need. You cannot fix it within your program as it will be too late. (On UNIX you can ensure you are in an empty directory but that probably doesn't help).
Another way around this is to quote the entire argument (and rewrite your program appropriately), i.e.
/path/to/program '3 * 4'
in which case you would need to use strtok_r or strsep to step through the (single) argument passed, separating it on the space(s).
How the shell handles the command-line arguments is outside the scope and control of your program. There is nothing you can put in the program to tell the shell to avoid performing any of its normal command-handling behavior.
I suggest, however, that instead of relying on the shell for word splitting, you make your program expect the whole expression as a single argument, and for it to parse the expression. That will not relieve you of the need for quotes, but it will make the resulting commands look more natural:
program.exe 3+4
program.exe "3 + 4"
program.exe "4*5"
That will also help if you expand your program to handle more complex expressions, such as those containing parentheses (which are also significant to the shell).
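One way to parse such a single-argument expression is sketched below, using sscanf rather than strtok_r or strsep. The function eval_expr is a hypothetical name, and the sketch handles only one binary operator between two integers.

```c
#include <stdio.h>

/* Parse "<integer> <operator> <integer>" from a single argument and
 * evaluate it. Whitespace around the operator is optional, so "3+4"
 * and "3 + 4" are both accepted. Trailing characters after the second
 * integer are ignored in this sketch.
 * Returns 0 on success, -1 on a malformed expression, an unknown
 * operator, or division by zero. */
int eval_expr(const char *expr, long *result)
{
    long lhs, rhs;
    char op;

    if (sscanf(expr, "%ld %c %ld", &lhs, &op, &rhs) != 3)
        return -1;

    switch (op) {
    case '+': *result = lhs + rhs; return 0;
    case '-': *result = lhs - rhs; return 0;
    case '*': *result = lhs * rhs; return 0;
    case '/':
        if (rhs == 0)
            return -1;
        *result = lhs / rhs;
        return 0;
    default:
        return -1;
    }
}
```

In main, this would be called as eval_expr(argv[1], &answer) after checking that argc is at least 2.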
You can turn off shell globbing if you don't want to use single quotes (') or double quotes (").
Run either
set -o noglob
or
set -f
(the two are equivalent).
After that, the shell won't expand any globs, including *.

ls piped into the command line

I have been trying to pipe in the results from ls into the command line for a C program I am writing (in Unix). I want to be able to have an index of the files and so I was planning on using argv. This is how I thought it should work:
./foo &(ls ~/path)
It doesn't work — what's the correct way to pass the output of ls as arguments to the command?
Your syntax is a bit off...
./foo $(ls ~/path)
Do note that this will choke on files with certain characters in them. Use an array instead to fix this.
pushd ~/path
files=(*)
popd
./foo "${files[@]}"
The notation you specified does two things:
./foo &
runs the program foo in the background (with no arguments other than its command name). Then:
(ls ~/path)
runs the ls command in a sub-shell (which, in this context, is the same as running it in the main shell). The problem is you intended (or need) to use $ in place of &.
./foo $(ls ~/path)
This runs the command ls ~/path and captures the output, which is split into words (using the separators listed in the $IFS variable). Each word is then supplied as an argument to the command ./foo, as you required.
We can then debate the wisdom of using the output of ls like that, but unless you have file names containing spaces (or tabs, newlines, etc.), you will be OK.
You know how Unix tools accept glob patterns, so you can do cat *.txt or rm ~/Pictures/Vacation*.jpg, without having to pipe/expand ls?
That's an ability your shell gives your program for free!
Just use ./foo ~/path/* and argv[1] will contain /home/you/path/fileone, argv[2] will contain /home/you/path/filetwo, and so forth.
These filenames may be relative or absolute, but can always be passed directly to open/fopen/execve or whichever function you want to use.
Using ls as you describe will only give you the last part of the filename with no directory, so you won't know where the files are to do anything with them (though if that's what you want, just use basename(argv[1])).
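If only the last part of each path is wanted, a small helper along these lines could be used instead of basename, which POSIX allows to modify its argument. The name last_component is hypothetical, and the sketch does not treat trailing slashes specially.

```c
#include <string.h>

/* Return a pointer to the part of path after the final '/', or path
 * itself if it contains no '/'. The argument is never modified. */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

The full path in argv[i] is still what should be passed to open or fopen; the shortened name is only useful for display or indexing.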

How does grep work?

I am trying to understand how grep works.
When I say grep "hello" *.*, does grep get 2 arguments — (1) string to be searched i.e. "hello" and (2) path *.*? Or does the shell convert *.* into something that grep can understand?
Where can I get the source code of grep? I came across this GNU grep link. One of the README files says it's different from Unix grep. How so?
I want to look at source of FreeBSD version of grep and also Linux version of it (if they are different).
The power of grep is the magic of automata theory. GREP is an abbreviation for Global Regular Expression Print. And it works by constructing an automaton (a very simple "virtual machine": not Turing Complete); it then "executes" the automaton against the input stream.
The automaton is a graph or network of nodes or states. The transition between states is determined by the input character under scrutiny. Special automatons like + and * work by having transitions that loop back to themselves. Character classes like [a-z] are represented by a fan: one start node with branches for each character out to the "spokes"; and usually the spokes have a special "epsilon transition" to a single final state so it can be linked up with the next automaton to be built from the regular expression (the search string). The epsilon transitions allow a change of state without moving forward in the string being searched.
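To make the idea of states and looping transitions concrete, here is a hand-built automaton for the single pattern ab*c, matched against a whole string. This is only an illustration under that assumption; it is not how any real grep is implemented, since real implementations build the automaton from the pattern at run time. The function name is hypothetical.

```c
#include <stdbool.h>

/* Hand-built automaton for the regular expression ab*c, matched
 * against the whole string:
 *   state 0 --a--> state 1
 *   state 1 --b--> state 1   (the loop that implements '*')
 *   state 1 --c--> state 2   (accepting)
 * Any other input character rejects. */
bool matches_ab_star_c(const char *s)
{
    int state = 0;

    for (; *s != '\0'; s++) {
        switch (state) {
        case 0:
            if (*s == 'a')
                state = 1;
            else
                return false;
            break;
        case 1:
            if (*s == 'b')
                state = 1;
            else if (*s == 'c')
                state = 2;
            else
                return false;
            break;
        default:            /* state 2 has no outgoing transitions */
            return false;
        }
    }
    return state == 2;
}
```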
Edit: It appears I didn't read the question very closely.
When you type a command-line, it is first pre-processed by the shell. The shell performs alias substitutions and filename globbing. After substituting aliases (they're like macros), the shell chops up the command-line into a list of arguments (space-delimited). This argument list is passed to the main() function of the executable command program as an integer count (often called argc) and a pointer to a NULL-terminated ((void *)0) array of nul-terminated ('\0') char arrays.
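The argument layout described above can be illustrated with a short sketch. The function count_args is a hypothetical helper that walks a NULL-terminated argument vector; for the argv that main receives, the result equals argc.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated argument vector, the same
 * layout main() receives. Each entry is itself a nul-terminated string. */
int count_args(char **argv)
{
    int n = 0;

    while (argv[n] != NULL)
        n++;
    return n;
}
```

For example, if the shell expanded a glob to two hypothetical files a.txt and b.txt, grep would receive the four strings "grep", "hello", "a.txt", "b.txt" followed by the terminating NULL pointer.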
Individual commands make use of their arguments however they wish. But most Unix programs will print a friendly help message if given the -h argument (since it begins with a minus-sign, it's called an option). GNU software will also accept a "long-form" option --help.
Since there are a great many differences between different versions of Unix programs the most reliable way to discover the exact syntax that a program requires is to ask the program itself. If that doesn't tell you what you need (or it's too cryptic to understand), you should next check the local manpage (man grep). And for gnu software you can often get even more info from info grep.
The shell does the globbing (conversion from * form to filenames). You can see this if you have a simple C program:
#include <stdio.h>
int main(int argc, char **argv) {
    for(int i=1; i<argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
And then run it like this:
./print_args *
You'll see it prints out what matched, not * literally. If you invoke it like this:
./print_args '*'
You'll see it gets a literal *.
The shell expands the '*.*' into a list of file names and passes the expanded list of file names to the program such as grep. The grep program itself does not do expansion of file names.
So, in answer to your question: grep does not get 2 arguments; the shell converts '*.*' into something grep can understand.
GNU grep is different from Unix grep in supporting extra options, such as -w and -B and -A.
It looks to me like FreeBSD uses the GNU version of grep:
http://svnweb.freebsd.org/base/stable/8/gnu/usr.bin/grep/
How grep sees the wildcard argument depends on your shell. (Standard) Bourne shell has a switch (-f) to disable file name globbing (see man pages).
You may activate this switch in a script with
set -f

Resources