I am writing an option parser for a bash-like shell that I am developing.
However, to be compatible with bash options, I must read some options that begin with a '+', like this:
./42sh +O autocd [...]
(The manual page says these options are passed to the shopt builtin, which sets the values of settings.)
The problem is that getopt_long() only returns options that begin with - or --, and only when they are not alone. When they are alone, bash treats them as an alias for standard input and as an end-of-options marker, respectively.
How can I get this type of option with getopt_long()? Do I have to parse these options myself?
EDIT: following jxh's answer and the man 3 getopt page, I discovered that getopt() and getopt_long() (by default, in the GNU implementation) rearrange the argv array so that all arguments that don't look like valid options, from their point of view, are moved to the end. So I wrote the following code after the usual code that gets the normal options (all suggestions and remarks are greatly appreciated):
EDIT 2: fixed a memory leak caused by calling strdup() on each iteration of the loop.
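for (; optind < argc; ++optind)
{
    /* char *, not const char *: the copy is passed to free() */
    char *argv_copy = strdup(argv[optind]);
    if (argv_copy == NULL)
        break; /* out of memory */
    if (argv_copy[0] == '+' && argv_copy[1] == 'O')
    {
        /* A deactivation parameter has just been found! */
        /* Check that a name follows "+O" before reading argv[optind + 1] */
        if (optind + 1 >= argc
            || handle_shopt_options(shell_options,
                                    argv[optind + 1],
                                    DISABLE) == EXIT_FAILURE)
        {
            usage(argv[optind]);
            free_shell_options(shell_options);
            shell_options = NULL;
            free(argv_copy);
            break; /* shell_options is gone: stop parsing */
        }
        ++optind; /* skip the option name we just handled */
    }
    free(argv_copy);
    argv_copy = NULL;
}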
Some explanations:
optind is the argv index that tells us which argument will be parsed on the next pass. Since we have already parsed all the arguments that getopt() considers valid options, and since getopt() moves all the non-options to the end, we now parse all the remaining arguments, including the ones that interest us.
argv[argc] == NULL: the C standard guarantees a null pointer after the last argument, which marks the end of the argument list, so there is nothing left to parse once optind == argc.
I am not comfortable working directly with the current argv value, so I preferred to copy it into a new string, but I am probably wrong; this remains to be verified.
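For what it's worth, reading argv directly is fine: the loop only inspects the strings and never modifies them. A minimal sketch of the same scan without strdup() follows. To stay self-contained, it does not call the question's handle_shopt_options(); it just collects the names found after each "+O" into a caller-supplied array. The function name collect_plus_O and its parameters are hypothetical, and in the real loop optind would be passed as first.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: collect the option names that follow each "+O" among
   argv[first] .. argv[argc - 1]. argv is only read, never modified,
   so no strdup() copy is needed. Exact "+O" matching (strcmp) is an
   assumption; the loop in the question also accepts "+Oxyz".
   Returns the number of names found (at most out_cap are stored),
   or -1 if "+O" is the last argument and has no name after it. */
static int collect_plus_O(int argc, char **argv, int first,
                          const char **out, size_t out_cap)
{
    size_t found = 0;

    for (int i = first; i < argc; ++i) {
        if (strcmp(argv[i], "+O") != 0)
            continue;
        if (i + 1 >= argc)        /* argv[i + 1] would be the NULL at argv[argc] */
            return -1;
        if (found < out_cap)
            out[found] = argv[i + 1];
        ++found;
        ++i;                      /* skip the name we just consumed */
    }
    return (int)found;
}
```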
Some remarks:
getopt_long() is available only if the _GNU_SOURCE macro is defined.
strdup() is available only if _XOPEN_SOURCE >= 500 or _POSIX_C_SOURCE >= 200809L is defined.
As you have noted in your research, you cannot use getopt_long to parse options that begin with +.
As a workaround, you can scan argv[] yourself, and then create a new argument vector that substitutes --plus- in front of every argument you think is a + style option in the original argv[]. This new array should be parseable by getopt_long.
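The rewriting step could look roughly like the following sketch. It assumes glibc (for getopt_long() and strdup()), maps "+O name" to a hypothetical long option "plus-O", and uses made-up helpers (rewrite_plus_args, free_argv, count_disabled) only to show getopt_long() consuming the rewritten vector; a real shell would call its own shopt handling instead of counting.

```c
#define _GNU_SOURCE             /* getopt_long() and strdup() on glibc */
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* "--plus-O name" is the rewritten form of bash's "+O name". */
static const struct option long_opts[] = {
    {"plus-O", required_argument, NULL, 'P'},
    {NULL, 0, NULL, 0}
};

static void free_argv(char **v)
{
    for (char **p = v; *p != NULL; ++p)
        free(*p);
    free(v);
}

/* Copy argv, turning every "+X..." argument (after argv[0]) into
   "--plus-X...". The copy is NULL-terminated; NULL means out of memory. */
static char **rewrite_plus_args(int argc, char **argv)
{
    static const char prefix[] = "--plus-";
    char **out = calloc((size_t)argc + 1, sizeof *out);  /* zeroed entries */

    if (out == NULL)
        return NULL;
    for (int i = 0; i < argc; ++i) {
        if (i > 0 && argv[i][0] == '+' && argv[i][1] != '\0') {
            /* sizeof prefix already counts the terminating '\0' */
            out[i] = malloc(sizeof prefix + strlen(argv[i] + 1));
            if (out[i] != NULL) {
                strcpy(out[i], prefix);
                strcat(out[i], argv[i] + 1);
            }
        } else {
            out[i] = strdup(argv[i]);
        }
        if (out[i] == NULL) {       /* free what was copied so far */
            free_argv(out);
            return NULL;
        }
    }
    return out;
}

/* Count the "+O name" pairs and copy the first name into first_name.
   "-O name" (enable) is accepted too but not counted. Returns -1 on
   allocation failure. */
static int count_disabled(int argc, char **argv, char *first_name, size_t cap)
{
    char **v = rewrite_plus_args(argc, argv);
    int count = 0;
    int c;

    if (v == NULL)
        return -1;
    optind = 1;                     /* scan v from v[1] */
    while ((c = getopt_long(argc, v, "O:", long_opts, NULL)) != -1) {
        if (c == 'P') {             /* matched "--plus-O name" */
            if (count == 0 && cap > 0) {
                strncpy(first_name, optarg, cap - 1);
                first_name[cap - 1] = '\0';
            }
            ++count;
        }
    }
    free_argv(v);                   /* getopt_long only reordered v's pointers */
    return count;
}
```

Building a separate vector keeps the original argv untouched, and because getopt_long() only permutes the pointers inside the copy, free_argv() can still release every string afterwards.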
Related
I want to have a program with specific arguments of different types. For example I start my program like this:
./program picture.jpg image.jpg
./program picture.ppm image.jpg
Now I want my program to recognize the different extensions, jpg vs ppm. How can I easily do it? I tried:
if (argv[1][-3] == j)
{
//do something
}
...
But this is not working, and I can't find an answer to my question on the internet (probably because I am phrasing it badly...).
By the way, I am a Python programmer learning C...
You should
Check the argument count in argc.
Loop over each command-line argument using argv[i], where i goes from 1 to argc-1.
Make use of strstr() to check whether .jpg or .ppm is present in any of the strings pointed to by argv[i].
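Those three steps could be sketched as follows. find_arg_containing is a hypothetical helper name, and the caveat in the comment is why a check on the end of the name is usually preferable for file extensions.

```c
#include <string.h>

/* Sketch of the strstr() approach: return the index of the first
   argument in argv[1] .. argv[argc - 1] that contains ext, or -1.
   Caveat: strstr() matches anywhere in the string, so "photo.jpg.txt"
   also matches ".jpg". */
static int find_arg_containing(int argc, char **argv, const char *ext)
{
    for (int i = 1; i < argc; ++i)      /* argv[0] is the program name */
        if (strstr(argv[i], ext) != NULL)
            return i;
    return -1;
}
```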
The simplest way to find the (last) extension in a filename is to use the standard library function strrchr.
The following checks whether the extension of the first command-line argument is .jpg or .jpeg or some upper-case version (like .JPG or .Jpeg). It assumes that you have already ensured that argv[1] is valid by checking the value of argc.
char* extension = strrchr(argv[1], '.');
if (extension &&
(0 == strcasecmp(extension, ".jpg")
|| 0 == strcasecmp(extension, ".jpeg"))) {
/* Handle jpegs */
}
else {
/* Try something else */
}
In practice, you would probably wrap that into a function which returned some kind of enum representing the known extension types, so that you could use it with various arguments, not just the first one. You would also want to have some way to let the user manually specify a type, in case they wanted to call your program with a badly-named file.
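A minimal sketch of such a wrapper, assuming a POSIX system (strcasecmp() is declared in <strings.h>). The enum, its values, and image_type_from_name() are hypothetical names, and .ppm is included only to match the question.

```c
#include <string.h>    /* strrchr() */
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical set of known types. */
enum image_type { IMAGE_UNKNOWN, IMAGE_JPEG, IMAGE_PPM };

static enum image_type image_type_from_name(const char *filename)
{
    /* Call strrchr() once and keep the result. */
    const char *extension = strrchr(filename, '.');

    if (extension == NULL)             /* no '.' at all */
        return IMAGE_UNKNOWN;
    if (strcasecmp(extension, ".jpg") == 0 || strcasecmp(extension, ".jpeg") == 0)
        return IMAGE_JPEG;
    if (strcasecmp(extension, ".ppm") == 0)
        return IMAGE_PPM;
    return IMAGE_UNKNOWN;
}
```

Because only the last '.' is considered, a name like archive.jpg.gz is reported as unknown rather than as a JPEG.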
A couple of usage notes:
Like strlen(), strrchr() must scan every byte in its argument. So avoid calling it more than once on the same string. If you'll need the result more than once, save it in a temporary.
If the filename doesn't contain the delimiter (in this case, .), strrchr returns NULL. If you attempt to use the return value without checking for this case, your program is likely to crash (and is certain to invoke Undefined Behaviour). So always check the return value.
To test the 3rd character counting from the end you can use:
if(argv[1][strlen(argv[1])-3] == 'j').
Note that this method causes undefined behaviour if the argument is shorter than 3 characters.
You can guard against this with if(strlen(argv[1]) >= 3 && argv[1][strlen(argv[1])-3] == 'j')
Unlike Python, where a negative index counts from the end, a negative array index in C refers to memory before the start of the array (undefined behaviour here), so that's not what you want.
You can use the strstr function to search for ".jpg" or ".ppm" in the string, or use strlen to get the length of the string and start looking at that index -3.
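The strlen() variant can be packaged as a small suffix test. This is a sketch with a hypothetical helper name, ends_with(); checking the length first avoids the out-of-bounds index mentioned above, since C has no negative indexing from the end.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: true if name ends with suffix (case-sensitive). */
static bool ends_with(const char *name, const char *suffix)
{
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(suffix);

    /* Check the length first: name + name_len - suffix_len must not
       point before the start of name. */
    return name_len >= suffix_len
        && strcmp(name + name_len - suffix_len, suffix) == 0;
}
```

Usage: ends_with(argv[1], ".jpg") after checking that argc is at least 2.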
I've looked through every sscanf post here and I can't find a solution that fits my problem. I am implementing my own shell, and one of its features is that if I find the dollar sign $, I have to replace what immediately follows it with the value of the environment variable:
cd $HOME should actually be replaced by cd /home/user before I even execute the cd.
My question is: what code should I use with sscanf to strip the dollar sign and get just HOME out of the same string? I've been struggling with some null pointers trying this:
char * change;
if (strcmp(argv[1][0],'$')==0){
change = malloc(strlen(argv[y]));
sscanf(argv2[y]+1,"%[_a-zA-Z0-9]",change);
argv2[y]=getenv(change);
}
But this seems to be failing: I get a segmentation fault (core dumped). (If needed I can post more code; my question is specifically focused on sscanf.)
Quick explanation: argv is an array of pointers to the lines entered and parsed, so the content of argv[0] is "cd" and argv[1] is "$HOME". I also know that the variable name I'm going to receive after the $ has the format %[_a-zA-Z0-9].
Please ignore the non failure treatment.
You asked "is malloc() necessary?" in your code snippet, and the answer is no; you could use a simple array. In reality, if you are simply using the return value of getenv() without modification, in the same scope, without any other calls to getenv(), all you need is a pointer. getenv() returns a pointer to the value part of the name=value pair within the program environment. However, that pointer may point to a statically allocated array, so any other call to getenv() before you use the pointer can cause the text to change. Also, do not modify the string returned by getenv(), or you will be modifying the environment of the process.
That said, for your simple case, you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
    char *envp = NULL,      /* pointer for return of getenv() */
         buf[MAXC];         /* buffer to parse argv[2] w/sscanf */
    const char *result;     /* text printed in place of argv[2] */
    const char *prog;
    if (argc < 3) { /* validate at least 2 program arguments given */
        prog = strrchr (argv[0], '/');  /* NULL if argv[0] has no '/' */
        printf ("usage: %s cmd path\n", prog ? prog + 1 : argv[0]);
        return 1;
    }
    result = argv[2];       /* used as-is unless it starts with '$' */
    if (*argv[2] == '$') {  /* check 1st char of argv[2] == '$' */
        if (sscanf (argv[2] + 1, "%1023[_a-zA-Z0-9]", buf) != 1) {
            fputs ("error: invalid format following '$'.\n", stderr);
            return 1;
        }
        if (!(envp = getenv (buf))) { /* get environment var from name in buf */
            fprintf (stderr, "'%s' not found in environment.\n", buf);
            return 1;
        }
        result = envp;
    }
    printf ("%s %s\n", argv[1], result); /* output resulting command line */
    return 0;
}
Right now the program just outputs what the resulting command line would be after retrieving the environment variable. You can adjust and build the array of pointers for execv as needed.
Example Use/Output
$ ./bin/getenvhome "cd" '$HOME'
cd /home/david
Look things over and let me know if you have any further questions.
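For the execv() step, one possible sketch is to build a new NULL-terminated vector in which each "$NAME" argument is replaced. The helper names expand_env_args() and free_expanded() are hypothetical; each string is copied with strdup() because, as noted above, a later getenv() call may overwrite the text it returned, and this sketch does not validate the variable name.

```c
#define _POSIX_C_SOURCE 200809L  /* strdup(), setenv() */
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated vector and the strings it points to. */
static void free_expanded(char **v)
{
    for (char **p = v; *p != NULL; ++p)
        free(*p);
    free(v);
}

/* Sketch: copy argv[0..argc-1] into a new NULL-terminated vector,
   replacing every "$NAME" argument with a copy of getenv("NAME").
   Returns NULL if a variable is unset or memory runs out. */
static char **expand_env_args(int argc, char **argv)
{
    /* calloc zeroes the entries, so out[argc] (and any entry not yet
       filled) is a null pointer on common platforms. */
    char **out = calloc((size_t)argc + 1, sizeof *out);

    if (out == NULL)
        return NULL;
    for (int i = 0; i < argc; ++i) {
        const char *text = argv[i];

        /* The variable name starts right after the '$'. */
        if (text[0] == '$' && (text = getenv(text + 1)) == NULL) {
            free_expanded(out);
            return NULL;
        }
        out[i] = strdup(text);   /* copy: getenv() text may be overwritten */
        if (out[i] == NULL) {
            free_expanded(out);
            return NULL;
        }
    }
    return out;
}
```

The result has the NULL-terminated shape execv() expects; note that cd itself is a shell builtin, so a real shell would handle it internally rather than exec it.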
You don't need sscanf here, you can just slide the pointer.
If argv[1] points to the string "$HOME", then argv[1] + 1 points to "HOME", so your example code would just become:
char * change;
if (argv[y][0] == '$')
{
change = argv[y] + 1;
}
(But in this case your variable should not be named change. Call your variables what they represent, for example in this case variable_name, because it contains the name of the shell variable you will be expanding - remember your code is for communicating to other humans your intent and other helpful information about the code.)
To be clear, whether you do sscanf or this trick, you still have to do error checking to make sure the variable name is actually the right characters.
Remember that sscanf won't tell you if there are wrong characters, it'll just stop - if the user writes a variable like $FO)O (because they made a typo while trying to type $FOO) sscanf will just scan out the first valid characters and ignore the invalid ones, and return FO instead.
In this case ignoring bad data at the end would be bad because user interfaces (that includes shells) should minimize the chances that a user mistake silently does an unintended wrong thing.
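One way to do that error checking, sketched under the assumption that the allowed set is exactly the question's %[_a-zA-Z0-9] scanset: compare strspn() against strlen(), so any stray character rejects the whole name. The helper name is hypothetical, and real shells additionally forbid a leading digit, which this sketch does not check.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: accept a variable name only if every character is in the
   question's %[_a-zA-Z0-9] set. sscanf would silently stop at the first
   bad character; comparing strspn() with strlen() catches it instead. */
static bool is_valid_var_name(const char *name)
{
    static const char allowed[] =
        "_abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    return name[0] != '\0' && strspn(name, allowed) == strlen(name);
}
```

For "FO)O", strspn() returns 2 while strlen() returns 4, so the name is rejected and the shell can report the typo instead of expanding $FO.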
I need a hand processing command-line arguments (on Windows) in C.
Suppose I have the following situation
C:\Users\USER\Desktop> my_executable arg1 ' "A>200 && B<300 (just some conditions" '
In this case argc = 5
and
C:\Users\USER\Desktop> my_executable arg1 '"A>200 && B<300 (just some conditions"'
In this case argc = 3
Depending on how the user quotes the condition, argv and argc will be different.
How can I write the code such that the condition and arg1 can be stored correctly :)
Required:
arg1 is stored into a char pointer
condition is also stored into a char pointer
Thanks
Don't use single quotes as argument quotes on Windows unless you want to implement your own argument parser. ^ can be used to escape " and itself and a few other things. To embed " in arguments use "".
If you really need to, call GetCommandLineW and parse yourself. GetCommandLineW returns a string that consists of the executable image name possibly enclosed in double quotes, followed by an optional space and the arguments exactly as given to CreateProcess (which means that ^ processing has already taken place).
I am reading the book Windows System Programming. In the second chapter, there is a program, Cat.c, that implements the Linux cat command. The code is http://pastebin.com/wwQFp599
On the 20th line, a function is called
iFirstFile = Options (argc, argv, _T("s"), &dashS, NULL);
Code for Option is http://pastebin.com/QegxxFpn
Now, the parameters of Options are
(int argc, LPCTSTR argv [], LPCTSTR OptStr, ...)
1) What is this "..."? Does it mean we can supply an unlimited number of arguments of type LPCTSTR?
2) If I execute the program as cat -s a.txt
a) What will be the argc and why?
b) What will be the argv and why?
c) What is _T("s")? Why _T is used here?
d) Why &dashS is used? It is address of a boolean most probably. But I can't understand the logic behind use of this.
e) Why they have passed NULL as last parameter?
I have basic knowledge of C programming and these things are really confusing, so kindly explain.
You have two different kinds of "variable" lists of arguments here.
First, you have the arguments passed to the program on the command line; clearly, a person could invoke the program with many arguments
cat file1 file2 file3
and so on. Since the early days of C, the main() of a C program has had access to the command-line arguments through the variables argc and argv. argc is the count of arguments (3 plus the name of the program itself in my example above), and argv is the array of the arguments (actually an array of pointers to strings). So in this case we can access argv[0], argv[1], argv[2] and argv[3], knowing to stop there because argc tells us there are four arguments.
So in your example argc will be 3, argv[0] will point to "cat", argv[1] to "-s" and argv[2] to "a.txt".
Next, the function you are looking at itself takes an indefinite number of arguments, as indicated by the ellipsis (the ...).
You need to read about variable arguments. This is a language feature that was not in the earliest C language and is considered a little advanced, so some books may not cover it, or may leave it until late. The key point here is that when interpreting the variable list we need to know when we have reached its end, and we don't have an "argc" equivalent. So we put a "this is the last one, stop here" value in the function call: that's the NULL you ask about.
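To see how a NULL sentinel ends a variable argument list, here is a hypothetical sketch loosely modeled on the call Options(argc, argv, _T("s"), &dashS, NULL). It is not the book's code (that is only linked in the question): parse_flags() pairs each letter of opt_str with one bool * from the list, sets it when that letter appears in a leading "-" argument, stops at the NULL, and returns the index of the first non-option argument.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the book's Options(): after opt_str comes one
   bool * per option letter, and the list must end with a NULL pointer. */
static int parse_flags(int argc, char *argv[], const char *opt_str, ...)
{
    va_list ap;
    bool *flag;
    int first_non_option = 1;

    /* Options are the leading arguments that start with '-'. */
    while (first_non_option < argc && argv[first_non_option][0] == '-')
        ++first_non_option;

    va_start(ap, opt_str);
    /* Read bool * values until the NULL sentinel: there is no count. */
    for (size_t i = 0; (flag = va_arg(ap, bool *)) != NULL; ++i) {
        if (opt_str[i] == '\0')      /* more flags than letters: stop */
            break;
        *flag = false;
        for (int a = 1; a < first_non_option; ++a)
            if (strchr(argv[a] + 1, opt_str[i]) != NULL)
                *flag = true;
    }
    va_end(ap);
    return first_non_option;
}
```

Called as parse_flags(argc, argv, "s", &dashS, (bool *)NULL) for cat -s a.txt, it sets dashS to true and returns 2, the index of a.txt. The sentinel is cast because a plain NULL passed through ... is not guaranteed to have pointer type.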
1) "..." is a variable argument list, as username Cornstalks pointed out. It allows functions like printf() to take a variable number of arguments, but the function has to learn their number and types some other way, for example from one of the fixed arguments (like the format string of printf()) or from a terminating sentinel value. See the stdarg.h header (va_list, va_start, va_arg, va_end).
2) a) argc is the number of arguments specified on the command line, including the program name.
b) argv is the argument array: an array of pointers to strings.
c) _T() is a macro; I know it as TEXT(). Basically, it lets programmers build with either narrow (ANSI) strings or Unicode strings without modifying the entire code. If Unicode is enabled, the string passed to the _T() macro becomes L"string"; otherwise it stays "string" (strictly, _T() checks _UNICODE and TEXT() checks UNICODE; projects usually define both). That's why some functions have an A or a W as their last letter. For example, OutputDebugString maps to OutputDebugStringW if UNICODE is defined and to OutputDebugStringA if it is not. The functions whose names end in A accept narrow strings in the current ANSI code page, while those ending in W accept Unicode (UTF-16) strings. There's also a type defined for this purpose, TCHAR, which becomes either CHAR or WCHAR, and there's also a matching entry point, _tmain().
d) &variable means the address of a variable. It is used to pass along to a function the location in the memory of the variable contents so that if the function modifies the value of the variable, the variable is modified everywhere else where it is used.
e) You'll have to look at the function prototype.
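The _T() mechanism from c) can be imitated portably with token pasting. This is a sketch with made-up names (MY_T, my_tchar) rather than the real Windows headers; in those headers, _T() from <tchar.h> switches on _UNICODE and TEXT() switches on UNICODE.

```c
#include <stddef.h>   /* wchar_t */

/* Made-up imitation of <tchar.h>: pick the character type and the
   literal prefix at compile time. */
#ifdef _UNICODE
typedef wchar_t my_tchar;
#define MY_T(s) L##s          /* MY_T("s") becomes L"s" */
#else
typedef char my_tchar;
#define MY_T(s) s             /* MY_T("s") stays "s" */
#endif

/* The same source line works in both builds. */
static const my_tchar option_string[] = MY_T("s");
```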
It seems to me that you have been misled into believing that starting with Windows programming is the way to go if you want to learn to program. The C and C++ programming languages are OS-independent, and you should learn the independent part first. I recommend "C Programming: A Modern Approach".
I'm going through an OpenCV tutorial and found this line at the beginning (here is the link; the code is under the CalcHist section: http://opencv.willowgarage.com/documentation/c/histograms.html)
if (argc == 2 && (src = cvLoadImage(argv[1], 1)) != 0)
I've never seen this before and really don't understand it. I checked some Q&A regarding this subject but still don't understand it.
Could someone explain to me what is the meaning of this line?
Thanks!
The line does the following, in order:
Tests if argc == 2 - that is, if there was exactly 1 command line argument (the first "argument" is the executable name)
If so (because if argc is not 2, the short-circuiting && will abort the test without evaluating the right-hand-side), sets src to the result of cvLoadImage called on that command-line argument
Tests whether that result (and hence src) is not zero
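The short-circuit behaviour can be checked with a small sketch. load_image() is a hypothetical stand-in for cvLoadImage() that only counts how often it is called, which makes it easy to see that the right-hand side never runs when argc is not 2.

```c
#include <stddef.h>

/* Hypothetical stand-in for cvLoadImage(): counts its calls and
   "succeeds" (returns a non-NULL pointer) for any non-empty path. */
static int load_calls = 0;
static int fake_image;

static void *load_image(const char *path)
{
    ++load_calls;
    return path[0] != '\0' ? &fake_image : NULL;
}

/* Same shape as the tutorial's test. If argc != 2, && stops after the
   first operand: load_image() is never called and argv[1] is never read. */
static void *load_if_one_arg(int argc, char **argv)
{
    void *src = NULL;

    if (argc == 2 && (src = load_image(argv[1])) != NULL)
        return src;
    return NULL;
}
```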
argc and argv are the names (almost always) given to the two parameters of the main function in C. argc is an integer equal to the number of command-line arguments present when the executable was called. argv is an array of char* (pointers to null-terminated strings) containing the actual values of those command-line arguments. Logically, it contains argc entries, followed by a null pointer at argv[argc].
Note that argv normally holds the executable's name as its first entry, and argc counts it, so the following command invocation:
$> my_program -i input.txt -o output.log
...will put 5 in argc, and argv will contain the five strings my_program, -i, input.txt, -o, output.log.
So your quoted if-test first checks whether there was exactly 1 command-line argument apart from the executable name (argc == 2). It then goes on to use that argument (cvLoadImage(argv[1], 1)).
Checking argc and then using argv[n] is a common idiom, because it is unsafe to access beyond the end of the argv array.