How to use sscanf to parse a program argument in C - c

I've been looking through every sscanf post here and I can't find a solution that fits my problem. I am implementing my own shell, and one of its features is that when I find a dollar sign $, I have to replace what comes right after it with the value of the environment variable:
cd $HOME should actually be replaced by cd /home/user before I even execute the cd.
My question is: how can I use sscanf to strip the dollar sign and get just HOME from that string? I've been struggling with null pointers while trying this:
char * change;
if (strcmp(argv[1][0],'$')==0){
change = malloc(strlen(argv[y]));
sscanf(argv2[y]+1,"%[_a-zA-Z0-9]",change);
argv2[y]=getenv(change);
}
But this seems to be failing; I'm getting a segmentation fault (core dumped). (I can post more code if needed; my question is specifically about the sscanf.)
Quick explanation: argv is an array of pointers to the lines entered and parsed, so the content is actually argv[0] = "cd" and argv[1] = "$HOME". I also know that the variable name I'm going to receive after the $ has the format %[_a-zA-Z0-9].
Please ignore the missing error handling.

You asked "is malloc() necessary" about your code snippet, and the answer is "no": you could use a simple array. In reality, if you are simply making use of the return of getenv() without modification, in the same scope and without any other calls to getenv(), all you need is a pointer. getenv() returns a pointer to the value part of the name=value pair within the program environment. However, the pointer may point to a statically allocated array, so any other call to getenv() before you make use of the pointer can cause the text to change. Also, do not modify the string returned by getenv(), or you will be modifying the environment of the process.
That said, for your simple case, you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
    char *envp = NULL,      /* pointer for return of getenv() */
          buf[MAXC];        /* buffer to parse argv[2] w/sscanf */
    if (argc < 3) {         /* validate at least 2 program arguments given */
        fprintf (stderr, "usage: %s cmd path\n", argv[0]);
        return 1;
    }
    if (*argv[2] != '$') {  /* no leading '$', so nothing to expand */
        printf ("%s %s\n", argv[1], argv[2]);
        return 0;
    }
    /* parse the variable name following '$' into buf */
    if (sscanf (argv[2] + 1, "%1023[_a-zA-Z0-9]", buf) != 1) {
        fputs ("error: invalid format following '$'.\n", stderr);
        return 1;
    }
    if (!(envp = getenv (buf))) {   /* get environment var from name in buf */
        fprintf (stderr, "'%s' not found in environment.\n", buf);
        return 1;
    }
    printf ("%s %s\n", argv[1], envp);  /* output resulting command line */
    return 0;
}
Right now the program just outputs what the resulting command line would be after retrieving the environment variable. You can adjust and build the array of pointers for execv as needed.
Example Use/Output
$ ./bin/getenvhome "cd" '$HOME'
cd /home/david
Look things over and let me know if you have any further questions.

You don't need sscanf here, you can just slide the pointer.
If argv[1] points to the string "$HOME", then argv[1] + 1 points to "HOME", so your example code would just become:
char * change;
if (argv[y][0] == '$')
{
change = argv[y] + 1;
}
(But in this case your variable should not be named change. Call your variables what they represent, for example in this case variable_name, because it contains the name of the shell variable you will be expanding - remember your code is for communicating to other humans your intent and other helpful information about the code.)
To be clear, whether you do sscanf or this trick, you still have to do error checking to make sure the variable name is actually the right characters.
Remember that sscanf won't tell you if there are wrong characters, it'll just stop - if the user writes a variable like $FO)O (because they made a typo while trying to type $FOO) sscanf will just scan out the first valid characters and ignore the invalid ones, and return FO instead.
In this case ignoring bad data at the end would be bad because user interfaces (that includes shells) should minimize the chances that a user mistake silently does an unintended wrong thing.

Related

Use scanf to get commands and arguments in C

I need to create a method that gets commands from users using scanf and runs a function. The command can be as simple as help or list, but it can also be a command that has an argument, like look DIRECTION or take ITEM. What is the best way to go about this? I could just loop through the characters of a single given string and check it manually, but I was wondering if there was a better way of doing this.
scanf("%s %s", command, argument);
This won't work if there's no argument. Is there a way around this?
There is a 'method' that may work. In fact, two come to mind.
Both rely on whitespace characters (in plain English, '\n', ' ' and '\t') separating the arguments, and I assume this is good enough.
1
First, the relatively easy one - using main(int argc,char *argv[]) as most CLI programs do.
Then, running a long string of if()s/else if()s which check if the input string matched valid arguments , by testing if strcmp(argv[x],expected_command) returns 0.
You may not yet have been taught how to use this, and it may appear scary, but it's quite easy if you are already familiar with string.h, arrays and pointers.
Google searches and YouTube videos may be of help, and it won't take more than 20 or so minutes.
2
Second, if your program has a real CLI 'UI' and runs in a loop rather than terminating once output is generated (unlike, say, cat or ls), then you take input of 'command' strings within the program.
This means you will have to, apart from and before the if-ed strcmp()s , ensure that you take input with scanf() safely, and that you are able to take multiple strings as input, since you talk of sub-arguments like look DIRECTION.
The way I have done this myself (in the past) is as follows :
1. Declare a command string, say char cmd[21] = ""; and (optionally) initialise it to be empty , since reading an uninitialised string is UB (and the user may enter EOF).
2. Declare a function (for convenience) to check scanf() say like so:
int handle_scanf(int returned,int expected){
if(returned==expected)
return 0;
if(returned==EOF){
puts("\n Error : Input Terminated Prematurely.");
/* you may alternatively do perror() but then
will have to deal with resetting errno=0 and
including errno.h */
return -1;
}
else{
puts("\n Error : Insufficient Input.");
return -2;
}
}
Which can be used as : if(handle_scanf(scanf(xyz,&xyz),1)==0) {...}
As scanf() returns number of items 'taken' (items that matched with expected format-string and were hence saved) and here there is only 1 expected argument.
3. Declare a function (for convenience) to clear/flush stdin so that if and when unnecessary input is left in the input stream , (which if not dealt with, will be passed to the next place where input is taken) it can be 'eaten'.
I do it like so :
void eat()
{
int eat; while ((eat = getchar()) != '\n' && eat != EOF);
}
Essentially clears input till a newline or EOF is read. Since '\n' and EOF represent End Of Line and End Of File , and modern I/O is line buffered and performed through the stdin file , it makes sense to stop upon reading them.
EDIT : You may alternatively use a macro, for slightly better performance.
4. Print a prompt and take input, like so :
fputs("\n >>> ",stdout);
int check = handle_scanf(scanf("%20s",cmd),1);
Notice what I did here ?
"%20s" does two things - stops buffer overflow (because more than 20 chars won't be scanned into cmd) and also stops scanning when a whitespace char is encountered. So, your main command must be one-word.
5. Check if the command is valid.
This is to be done with the aforementioned list of checking if strcmp(cmd,"expected_cmd")==0 , for all possible expected commands.
If there is no match, with an else , display an error message and call eat();(arguments to invalid command can be ignored) but only if(check != -1).
If check==-1 , this may mean that the user has sent an EOF signal to the program, in which case, calling eat() within a loop will result in an infinite loop displaying the error message, something which you don't want.
6. If there is a match, absorb the whitespace separating char and then scanf() into a char array ( if the user entered, look DIRECTION, DIRECTION is still in the input stream and will only now be saved to said char array ). This can be done like so :
#define SOME_SIZE 100 // use an appropriate size
if(strcmp(cmd,"look")==0 && check==0){ // do if(check==0) before these ifs, done here just for my convenience)
getchar(); // absorb whitespace separator
char strbuff[SOME_SIZE] = ""; // string buffer of appropriate size
if(handle_scanf(scanf("%99[^\n]",strbuff),1)==0){
eat();
/* look at DIRECTION :) */
}
// handle_scanf() generated appropriate error msg if it doesn't return 0
}
Result
All in all, this code handles scanf mostly safely and can indeed be used in a way that the user will only type , say :
$ ./myprogram
>>> look DIRECTION
# output
>>> | #cursor
If it is all done within a big loop inside main() .
Conclusion
In reality, you may end up needing to use both together if your program is complex enough :)
I hope my slightly delayed answer is of help :)
In case of any inaccuracies , or missing details, please comment and I will get back to you ASAP
Here's a good way to parse an inputted string using strtok and scanf with a limit of 99 characters:
#include <stdio.h>
#include <string.h>
char command[100] = ""; //99 characters plus the terminating '\0'
scanf("%99[^\n]%*c", command); //This gets the entire line, spaces included
char *token;
token = strtok(command, " "); //token = the first string separated by a " "
if (token == NULL){
//empty line, nothing to do
}
else if (strcmp(token, "help") == 0){
//do function
}
else if (strcmp(token, "go") == 0){ //if the command has an argument, you have to get the next string
token = strtok(NULL, " "); //this gets the next string separated by a space
if (token != NULL && strcmp(token, "north") == 0){ //token is NULL if the argument is missing
//do function
}
}
You can keep calling token = strtok(NULL, " "); until token == NULL, which signifies the end of the string.

**argv contains more characters than expected

First, I need to execute commands with system(). For example, I receive a string and open it with a text editor, like this:
$ ./myprogram string1
And the output should be a command like this:
$ vim string1
But, I cannot find a way to do this like this pseudo code:
system("vim %s",argv[1]); //Error:
test.c:23:3: error: too many arguments to function 'system'
system("vim %s",argv[1]);
Therefore, my solution is to store argv[1] in a char array whose first four characters are already initialized, like this:
char command[strlen(argv[1])+4];
command[0] = 'v'; command [1] = 'i'; command[2] = 'm'; command[3] = ' ';
And assign the argv[1] to my new char array:
for(int i = 0; i < strlen(argv[1]) ; i++)
command[i+4] = argv[1][i];
And finally:
system(command);
If the argument given to my program has fewer than 3 characters, it works fine; otherwise, some weird characters that I do not expect appear in the output, like this:
$ ./myprogram 1234
And the output is:
$ vim 12348�M�
How can I solve this bug and why does this happen?
The full code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (int argc,char **argv) {
char command[strlen(argv[1])+4];
command[0] = 'v'; command [1] = 'i'; command[2] = 'm'; command[3] = ' ';
for(int i = 0; i < strlen(argv[1]) ; i++)
command[i+4] = argv[1][i];
system(command);
return 0;
}
You need to NUL terminate your C-style strings, and that includes allocating enough memory to hold the NUL.
Your array is a byte short (must be char command[strlen(argv[1])+4+1]; to leave space for NUL), and you should probably just use something like sprintf to fill it in, e.g.:
sprintf(command, "vim %s", argv[1]);
That's simpler than manual loops, and it also fills in the NUL for you.
The garbage you see is caused by the search for the NUL byte (which terminates the string) wandering off into unrelated (and undefined for that matter) memory that happens to occur after the buffer.
The reason you're running into problems is that you aren't terminating your command string with a NUL ('\0') byte. But you really want to use sprintf (or even better, snprintf) for something like this. It works like printf but writes to memory instead of stdout and adds the terminating NUL for you. E.g.:
char cmd[80];
snprintf(cmd, sizeof cmd, "vim %s", argv[1]);
system(cmd);
As @VTT points out, this simplified code assumes that the value in argv[1] is at most 75 characters long (80 minus 4 for "vim " minus 1 for the NUL character); anything longer is silently truncated. One safer option would be to verify this assumption first and report an error if it doesn't hold. To be more flexible you could dynamically allocate the cmd buffer:
char *cmd = "vim ";
char *buf = malloc(strlen(argv[1]) + strlen(cmd) + 1);
sprintf(buf, "%s%s", cmd, argv[1]);
system(buf);
free(buf);
Of course you should also check to be sure argc > 1.
I know that there are already good answers here, but I'd like to expand them a little bit.
I often see this kind of code
system("vim %s",argv[1]); //Error:
and beginners often wonder, why that is not working.
The reason for that is that "%s", some_string is not a feature of the C
language, the sequence of characters %s has no special meaning, in fact it is
as meaningful as the sequence mickey mouse.
The reason why that works with printf (and the other members of the
printf-family) is because printf was designed to replace sequences like
%s with a value passed as an argument. It's printf which makes %s special,
not the C language.
As you may have noticed, doing "hallo" + " world" doesn't do string
concatenation. C doesn't have a native string type that behaves like C++'s
std::string or Python's str. In C a string is just a sequence of
characters that happens to have a byte with value 0 at the end (also called
the '\0'-terminating byte).
That's why you pass printf a format as the first argument. It tells
printf to print character by character until it finds a %,
which tells printf that the next character(s) [1] is/are special and
must be substituted with the value passed as a subsequent argument to printf.
The %x are called conversion specifiers and the documentation of printf
will list all of them and how to use them.
Other functions like the scanf family use a similar strategy, but that doesn't
mean that all functions in C that expect strings, will work the same way. In
fact the vast majority of C functions that expect strings, do not work in that
way.
man system
#include <stdlib.h>
int system(const char *command);
Here you see that system is a function that expects one argument only.
That's why your compiler complains with a line like this: system("vim %s",argv[1]);.
That's where functions like sprintf or snprintf come in handy.
[1] If you take a look at the printf documentation you will see that
the conversion specifier together with length modifiers can be longer than 1
character.

How does strace read the file name of system call sys_open?

I am writing a program which uses ptrace and does the following:
It reads the current eax and checks if the system call is sys_open.
If it is, then I need to know what arguments were passed.
int sys_open(const char * filename, const int mode, const int mask)
So eax = 5 implies it is an open system call.
I came to know from this Question that ebx holds the address of the file name.
But how do I know the length of the file name so I can read the contents at that location?
I came across the following questions which address the same
Question 1
Question 2 (This one is mine only!)
But I still didn't get a solution to my problem :( as neither answer was clear.
I am still getting a segmentation fault when I try the approach in Question 1.
You can check my code here
So now I am really wondering how strace extracts these values so beautifully :(
As you know, sys_open() doesn't receive the size of the filename as a parameter. However, C strings end with a '\0' (NUL) character by convention, and the kernel expects the filename in that form. This is good news, because now we can loop over the characters of the string, and when we find the '\0' we know we've reached the end of it.
That's the standard procedure, that's how strlen() does it, and also how strace does it!
C example:
#include <stdio.h>
int main()
{
const char* filename = "/etc/somefile";
int fname_length = 0;
for (int i = 0; filename[i] != '\0'; i++)
{
fname_length++;
}
printf("Found %d chars in: %s\n", fname_length, filename);
return 0;
}
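Back to your task at hand: you must read the memory at the address of filename and perform the procedure I just described. Keep in mind that the address in ebx is valid in the traced process, not in your tracer, so dereferencing it directly in your own program will likely crash. Fetch the data one word at a time with ptrace(PTRACE_PEEKDATA, ...) and stop at the first word that contains the '\0'. This is something you will have to do yourself.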

Looping until a specific string is found

I have a simple question. I want to write a program in C that scans the lines of a specific file, and if the only phrase on the line is "Atoms", I want it to stop scanning and report which line it was on. This is what I have, and it does not compile because apparently I'm comparing an integer to a pointer (of course, "string.h" is included):
char dm;
int test;
test = fscanf(inp,"%s", &dm);
while (test != EOF) {
if (dm=="Amit") {
printf("Found \"Atoms\" on line %d", j);
break;
}
j++;
}
the file was already opened with:
inp = fopen( .. )
And checked to make sure it opens correctly...
I would like to use a different approach though, and was wondering if it could work. Instead of scanning individual strings, could I scan entire lines as such:
// char tt[200];
//
// fgets(tt, 200, inp);
and do something like:
if (tt[] == "Atoms") break;
Thanks!
Amit
Without paying too much attention to your actual code here, the most important mistake you're making is that the == operator will NOT compare two strings.
In C, a string is an array of characters, and in an expression like this the array is converted to a pointer to its first character. So if ("abcde" == some_string) compares addresses and will never be true unless both point to the same string!
You want to use a function like strcmp(const char *a, const char *b), which returns 0 if the two strings are equal and something else if they're not. strncmp(const char *a, const char *b, size_t n) compares at most the first n characters of a and b and returns 0 if they're equal, which is good for looking at the beginning of strings (to see if a string starts with a certain set of characters).
You also should NOT be passing the address of a single char as the pointer for %s in your fscanf! fscanf will write many characters into dm, which only has space for one, and that will corrupt your stack. As James says, you want to do something like char dm[BUFSIZE], where BUFSIZE is 1 larger than the longest word you ever expect, then do fscanf(inp, "%s", dm);
Hope that helps!
Please be aware that dm is a single char, while you need a char array (or a char * pointing to allocated memory).
Also, if (dm=="Amit") is wrong; change it to:
if (strcmp(dm, "Amit") == 0)
In the line using fscanf, you are passing the address of a single char where %s needs a pointer to a buffer large enough for the whole string:
char *dm; /* must point to allocated memory before the call */
test = fscanf(inp,"%s", dm);
The * declares dm as a pointer to char. fscanf does not allocate anything; it writes the characters matched by %s, followed by a terminating '\0', into the memory that dm points to.
What kit said is correct too, the strcmp command should be used, not the == compare, as == will just compare the addresses of the strings.
Edit: What kit says below is correct. All pointers must point to valid memory before they are used, either freshly allocated or an existing buffer. You can allocate memory like this:
dm = (char*)malloc(sizeof(char) * STRING_LENGTH);
where STRING_LENGTH is a maximum length of a possible string. This memory allocation only has to be done once.
The problem is you've declared 'dm' as a char, not a malloc'd char* or char[BUFSIZE]
http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
You'll also probably report incorrect line numbers; you'll need to scan the read-in buffer for '\n' occurrences and handle the case where your desired string lies across buffer boundaries.

Read in text file - 1 character at a time. using C

I'm trying to read in a text file line by line and process each character individually.
For example, one line in my text file might look like this:
ABC XXXX XXXXXXXX ABC
There will always be a different amount of spaces in the line. But the same number of characters (including spaces).
This is what I have so far...
char currentLine[100];
fgets(currentLine, 22, inputFile);
I'm then trying to iterate through the currentLine Array and work with each character...
for (j = 0; j<22; j++) {
if (&currentLine[j] == 'x') {
// character is an x... do something
}
}
Can anyone help me with how I should be doing this?
As you can probably tell - I've just started using C.
Something like the following is the canonical way to process a file character by character:
#include <stdio.h>
#include <stdlib.h> /* for exit() */
int main(int argc, char **argv)
{
FILE *fp;
int c;
if (argc != 2) {
fprintf(stderr, "Usage: %s file.txt\n", argv[0]);
exit(1);
}
if (!(fp = fopen(argv[1], "r"))) { /* "r" opens in text mode by default */
perror(argv[1]);
exit(1);
}
while ((c = fgetc(fp)) != EOF) {
// now do something with each character, c.
}
fclose(fp);
return 0;
}
Note that c is declared int, not char because EOF has a value that is distinct from all characters that can be stored in a char.
For more complex parsing, then reading the file a line at a time is generally the right approach. You will, however, want to be much more defensive against input data that is not formatted correctly. Essentially, write the code to assume that the outside world is hostile. Never assume that the file is intact, even if it is a file that you just wrote.
For example, you are using a 100 character buffer to read lines, but limiting the amount read to 22 characters (probably because you know that 22 is the "correct" line length). The extra buffer space is fine, but you should allow for the possibility that the file might contain a line that is the wrong length. Even if that is an error, you have to decide how to handle that error and either resynchronize your process or abandon it.
Edit: I've added a skeleton of an assumed rest of the program for the canonical simple case. There are a couple of things to point out there for new users of C. First, I've assumed a simple command line interface to get the name of the file to process, and verified using argc that an argument is really present. If not, I print a brief usage message taking advantage of the content of argv[0], which by convention names the current program in some useful way, and exit with a non-zero status.
I open the file for reading in text mode. The distinction between text and binary modes is unimportant on Unix platforms, but can be important on others, especially Windows. Since the discussion is of processing the file a character at a time, I'm assuming that the file is text and not binary. If fopen() fails, then it returns NULL and sets the global variable errno to a descriptive code for why it failed. The call to perror() translates errno to something human-readable and prints it along with a provided string. Here I've provided the name of the file we attempted to open. The result will look something like "foo.txt: No such file or directory". We also exit with non-zero status in this case. I haven't bothered, but it is often sensible to exit with distinct non-zero status codes for distinct reasons, which can help shell scripts make better sense of errors.
Finally, I close the file. In principle, I should also test the fclose() for failure. For a process that just reads a file, most error conditions will already have been detected as some kind of content error, and there will be no useful status added at the close. For file writing, however, you might not discover certain I/O errors until the call to fclose(). When writing a file it is good practice to check return codes and expect to handle I/O errors at any call that touches the file.
You don't need the address operator (&). You're trying to compare the value of the variable currentLine[j] to 'x', not its address.
ABC XXXX XXXXXXXX ABC has 21 characters. There's also the line break (22 chars) and the terminating null byte (23 chars).
You need to fgets(currentLine, 23, inputFile); to read the full line.
But you declared currentLine as an array of 100. Why not use all of it?
fgets(currentLine, sizeof currentLine, inputFile);
When using all of it, it doesn't mean that the system will put more than a line each time fgets is called. fgets always stops after reading a '\n'.
Try
while( fgets(currentLine, sizeof currentLine, inputFile) ) {
    for (j = 0; currentLine[j] != '\0'; j++) { /* stop at the end of what was read */
        if (/*&*/currentLine[j] == 'x') { /* <--- without & */
            // character is an x... do something
        }
    }
}

Resources