Tokenizing user input in C (store in **arg)?

I'm attempting to write a simple shell-like interface that takes in a user's input (character by character) and stores it via a pointer to a pointer (exactly how argv works). Here's my code:
char input[100];
char **argvInput;
char ch;
int charLoop = 0;
int wordCount = 0;
argvInput = malloc(25 * sizeof(char *));
while((ch = getc(stdin))) {
if ((ch == ' ' || ch == '\n') && charLoop != 0) {
input[charLoop] = '\0';
argvInput[wordCount] = malloc((charLoop + 1) * sizeof(char));
argvInput[wordCount] = input;
charLoop = 0;
wordCount++;
if (ch == '\n') {
break;
}
} else if (ch != ' ' && ch != '\n') {
input[charLoop] = ch;
charLoop++;
} else {
break;
}
}
If I loop through argvInput via:
int i = 0;
for (i = 0; i < wordCount; i++)
printf("Word %i: %s\n", i, argvInput[i]);
All of the values of argvInput[i] are whatever the last input assignment was. So if I type:
"happy days are coming soon", the output of the loop is:
Word 0: soon
Word 1: soon
Word 2: soon
Word 3: soon
Word 4: soon
I'm at a loss. Clearly each loop is overwriting the previous value, but I'm staring at the screen, unable to figure out why...

This line is your bane:
argvInput[wordCount] = input;
Doesn't matter that you allocate new space, if you're going to replace the pointer to it with another one (i.e. input).
Rather, use strncpy to extract parts of the input into argvInput[wordCount].

argvInput[wordCount] = input; only makes argvInput[wordCount] point to the memory of input; it does not copy the contents of input into the newly allocated memory. You should use memcpy or strcpy to correct your program.
After the pointer assignment, the memory allocated by malloc((charLoop + 1) * sizeof(char)); can no longer be reached by your program, which is a memory leak. Please take care of that.
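The difference between assigning the pointer and copying the characters can be sketched as follows. This is only an illustration: store_alias and store_copy are hypothetical helper names, not part of the question's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores the address of buf itself (the bug).
 * Every slot filled this way refers to the same array. */
void store_alias(char **slots, size_t i, char *buf)
{
    slots[i] = buf;
}

/* Hypothetical helper: allocates a fresh block and copies the
 * characters of buf into it. Returns 0 on success, -1 on failure. */
int store_copy(char **slots, size_t i, const char *buf)
{
    assert(slots != NULL && buf != NULL);
    slots[i] = malloc(strlen(buf) + 1);   /* +1 for the terminator */
    if (slots[i] == NULL)
        return -1;
    strcpy(slots[i], buf);
    return 0;
}
```

With store_alias, overwriting buf for the next word changes what every earlier slot appears to hold. With store_copy, each slot keeps its own word and must be released with free later.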

I suggest printing your argvInput pointers with %p, instead of %s, to identify this problem: printf("Word %i: %p\n", i, (void *) argvInput[i]);
What do you notice about the values it prints? How does this differ from the behaviour of argv? Try printing the pointers of argv: for (int x = 0; x < argc; x++) { printf("Word %d: %p\n", x, (void *) argv[x]); }
Now that you've observed the problem, explaining it might become easier.
This code allocates memory, and stores a pointer to that memory in argvInput[wordCount]: argvInput[wordCount] = malloc((charLoop + 1) * sizeof(char)); (by the way, sizeof (char) is always 1 in C, so you're multiplying by 1 unnecessarily).
This code replaces that pointer to allocated memory with a pointer to input: argvInput[wordCount] = input; ... Hence, all of your items contain a pointer to the same array, input, and your allocated memory leaks because you lose every reference to it. Clearly, this is the problematic line; it doesn't do what you initially thought it does.
It has been suggested that you replace your malloc call with a strdup call, and remove the problematic line. I don't like this suggestion, because strdup isn't in the C standard, and so it isn't required to exist.
strncpy will work, but it's unnecessarily complex. strcpy is guaranteed to work just as well because the destination array is allocated to be large enough to store the string. Hence, I recommend replacing the problematic line with strcpy(argvInput[wordCount], input);.
Another option that hasn't been explained in detail is strtok. It seems this is best left unexplored for now, because it would require too much modification to your code.
I have a bone to pick with this code: char ch; ch = getc(stdin); is wrong. getc returns an int for a reason: Any successful character read will be returned in the form of an unsigned char value, which can't possibly be negative. If getc encounters EOF or an error, it'll return a negative value. Once you assign the return value to ch, how do you differentiate between an error and a success?
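As a small sketch of that point, the loop below keeps the result of getc in an int, so a byte with the value 0xFF is counted as data instead of being mistaken for EOF. The function name count_bytes is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the bytes remaining in a stream. The result of getc is kept
 * in an int so that EOF stays distinguishable from every valid byte,
 * including 0xFF. */
size_t count_bytes(FILE *in)
{
    size_t n = 0;
    int c;                       /* int, not char */

    assert(in != NULL);
    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

If c were declared as char, the comparison with EOF could either succeed too early (where char is signed) or never succeed (where char is unsigned).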
Have you given any thought as to what happens if the first character is ' '? Currently, your code would break out of the loop. This seems like a bug, if your code is to mimic common argv parsing behaviours. Adapting this code to solve your problem might be a good idea:
for (int c = getc(stdin); c >= 0; c = getc(stdin)) {
if (c == '\n') {
/* Terminate your argv array and break out of the loop */
}
else if (c != ' ') {
/* Copy c into input */
}
else if (charLoop != 0) {
/* Allocate argvInput[wordCount] and copy input into it,
* reset charLoop and increment wordCount */
}
}
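One way the outline above might be filled in is sketched below. It is not the only possible design: the function name read_args, the MAX_WORD limit and the choice to silently drop overlong input are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 100   /* same buffer size as in the question */

/* Reads one line from in and splits it on spaces. Returns the number
 * of words stored in args (at most max_args - 1; further words are
 * dropped), or -1 if an allocation fails. args is NULL-terminated,
 * like argv, in both cases. */
int read_args(FILE *in, char **args, int max_args)
{
    char word[MAX_WORD];
    int len = 0, count = 0, c;

    assert(in != NULL && args != NULL && max_args > 0);
    for (;;) {
        c = getc(in);
        if (c != EOF && c != ' ' && c != '\n') {
            if (len < MAX_WORD - 1)       /* truncate overlong words */
                word[len++] = (char)c;
            continue;
        }
        if (len > 0 && count < max_args - 1) {   /* a word just ended */
            word[len] = '\0';
            args[count] = malloc((size_t)len + 1);
            if (args[count] == NULL)
                return -1;    /* the NULL just stored terminates args */
            strcpy(args[count], word);    /* copy, do not alias word */
            count++;
        }
        len = 0;
        if (c == EOF || c == '\n')
            break;
    }
    args[count] = NULL;
    return count;
}
```

A caller would pass stdin and an array such as char *args[25], then free each args[i] once the words are no longer needed.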

Related

I have a 'Segmentation Problem' while printing parsed parts of a String

I am writing a simple shell for a school assignment and I'm stuck with a segmentation fault. Initially, my shell parses the user input to remove whitespace and the end-of-line character, and separates the words in the input line to store them in a char **args array. I can separate the words and print them without any problem, but when storing the words into a char **args array, if the number of arguments is greater than 1 and odd, I get a segmentation fault.
I know the problem is absurd, but I'm stuck with it. Please help me.
This is my parser code and the problem occurs in it:
char **parseInput(char *input){
int idx = 0;
char **parsed = NULL;
int parsed_idx = 0;
while(input[idx]){
if(input[idx] == '\n'){
break;
}
else if(input[idx] == ' '){
idx++;
}
else{
char *word = (char*) malloc(sizeof(char*));
int widx = 0; // Word index
word[widx] = input[idx];
idx++;
widx++;
while(input[idx] && input[idx] != '\n' && input[idx] != ' '){
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = input[idx];
idx++;
widx++;
}
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = '\0';
printf("Word[%d] --> %s\n", parsed_idx, word);
if(parsed == NULL){
parsed = (char**) malloc(sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}else{
parsed = (char**) realloc(parsed, (parsed_idx+1)*sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}
}
}
int i = 0;
while(parsed[i] != NULL){
printf("Parsed[%d] --> %s\n", i, parsed[i]);
i++;
}
return parsed;
}

In your code you have the loop
while(parsed[i] != NULL) { ... }
The problem is that the code never sets any elements of parsed to be a NULL pointer.
That means the loop will go out of bounds, and you will have undefined behavior.
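You need to explicitly set the last element of parsed to a NULL pointer after you have parsed the input. The array must also have room for that extra element, so each (re)allocation needs to hold parsed_idx + 2 pointers rather than parsed_idx + 1, and parsed must not still be NULL (which happens when the input contains no words):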
while(input[idx]){
// ...
}
parsed[parsed_idx] = NULL;
On another couple of notes:
Don't assign back to the same pointer you pass to realloc. If realloc fails it will return a NULL pointer, but not free the old memory. If you assign back to the pointer you will lose it and have a memory leak. You also need to handle the case where realloc fails.
A loop like
int i = 0;
while (parsed[i] != NULL)
{
// ...
i++;
}
is almost exactly the same as
for (int i = 0; parsed[i] != NULL; i++)
{
// ...
}
Please use a for loop instead, it's usually easier to read and follow. Also, with a for loop the "index" variable (i in your code) will be in a separate scope, and not available outside of the loop. Tighter scope for variables leads to fewer possible problems.
In C you shouldn't really cast the result of malloc (or realloc) (or really any function returning void *). If you forget to #include <stdlib.h> it could lead to hard to diagnose problems.
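The realloc advice and the NULL terminator can be combined in one small helper. The sketch below is only an illustration under assumed names (append_word is hypothetical); it uses a temporary pointer so the old array survives a failed realloc, and it always leaves a NULL sentinel after the last word.

```c
#include <assert.h>
#include <stdlib.h>

/* Appends word to a NULL-terminated array of *count strings.
 * Returns 0 on success. On failure returns -1 and leaves *list and
 * *count untouched, so the caller can still free the old array. */
int append_word(char ***list, size_t *count, char *word)
{
    char **tmp;

    assert(list != NULL && count != NULL);
    /* room for the existing words, the new one and the NULL sentinel */
    tmp = realloc(*list, (*count + 2) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    tmp[*count] = word;
    tmp[*count + 1] = NULL;
    *list = tmp;
    (*count)++;
    return 0;
}
```

Starting from char **parsed = NULL and size_t n = 0, each call to append_word(&parsed, &n, word) keeps the array valid for a loop such as for (size_t i = 0; parsed[i] != NULL; i++).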

Also, a beginner might find the -pedantic switch helpful when calling the compiler. That switch would have pointed out most of the other suggestions made here. I personally am also a fan of -Wall, though many find it annoying instead of helpful.

Calling free() after malloc causes unexpected behaviour

Hi, I read that I should call free() as soon as I could do that to free the memory, but when I call free in this way my code stops working correctly. What's the problem?
I want to call free() in every iteration and when an error occurs.
int read_words(char *words[], int size, int max_str_len) {
int i, j;
char *ExtendedWord = NULL;
for (i = 0; i < size && size != -1; ++i) {
char tmp[1], ch, *word = tmp;
for (j = 0; j < max_str_len; ++j) {
if (scanf("%c", &ch) == EOF || ch == 'R') {
size = -1;
break;
}
if (ch == ' ')
break;
word[j] = ch;
ExtendedWord = malloc((i + 2) * sizeof(char));
if (ExtendedWord == NULL)
return -1;
strcpy(ExtendedWord, word);
word = ExtendedWord;
free(ExtendedWord);
}
word[j] = '\0';
words[i] = word;
}
return i;
}

strcpy(ExtendedWord,word);
strcpy() expects as 2nd parameter the address of the 1st character of a "C"-string, which in fact is a char-array with at least one element being equal to '\0'.
The memory word points to does not meet such requirements.
Due to this the infamous undefined behaviour is invoked, probably messing up the program's memory management, which in turn causes free() to fail.

There are multiple problems in your code:
you free the newly allocated block instead of the previous one.
you do not null terminate the string before passing it to strcpy
word should be initialized as NULL or to a block of allocated memory, not to point to a local array which you cannot pass to free().
you should reallocate the array before copying the new character at its end.
Here is a modified version:
int read_words(char *words[], int size, int max_str_len) {
int i, j;
for (i = 0; i < size; i++) {
char *word = malloc(1);
if (word == NULL)
return -1;
for (j = 0; j < max_str_len; ++j) {
int ch;
char *ExtendedWord;
if ((ch = getchar()) == EOF || ch == 'R') {
size = -1;
break;
}
if (ch == ' ' || ch == '\n')
break;
/* reallocate array for one more character and a null terminator */
ExtendedWord = malloc(j + 2);
if (ExtendedWord == NULL) {
free(word);
return -1;
}
memcpy(ExtendedWord, word, j);
free(word);
word = ExtendedWord;
word[j] = ch;
}
if (size == -1) {
free(word);
break;
}
word[j] = '\0';
words[i] = word;
}
return i;
}

I read that I should call free() as soon as I could do that to free the memory
That description is a little ambiguous. It is reasonable only if you interpret "as soon as I could do that" to mean the same as "as soon as I no longer need the allocated memory".
but when I call free in this way my code stops working correctly. what's the problem?
The problem with respect to free is that you free the memory before you are done with it. Subsequently attempting to access that memory produces undefined behavior.
There are other problems with the code, too, discussed in other answers, but this is how the free fits into the picture.
I want to call free in every iteration and when an error occurs.
Inasmuch as it appears that your function intends to provide pointers to the allocated memory to its caller via the words array, you must not free that memory anywhere within the scope of the function, because the caller (it must be presumed) intends to use it. Therefore the caller must assume the responsibility for freeing it. The function's documentation should clearly describe that responsibility.
Perhaps the confusion arises here:
word=ExtendedWord;
It is essential to understand that the assignment copies the pointer, not the space to which it points. Afterward, word points to the same (dynamically allocated) space that ExtendedWord does, so that freeing ExtendedWord invalidates both copies of the pointer.
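The caller's side of that responsibility could look like the sketch below. The helper name free_words is hypothetical; it assumes words[0] through words[count - 1] were filled with pointers obtained from malloc, as read_words intends.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cleanup helper for the caller of read_words: releases
 * the first count strings and clears the slots so no dangling pointer
 * is left behind. */
void free_words(char *words[], int count)
{
    assert(words != NULL);
    for (int i = 0; i < count; i++) {
        free(words[i]);
        words[i] = NULL;
    }
}
```

The caller would invoke it only after it has finished using the words, which is the moment the memory is genuinely no longer needed.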

Return a string made with a line read from input

I am trying to code a C function which returns a line read from the input as a char*. I am on Windows and I test my program on the command line by giving files as input and output of my program like this:
cl program.c
program < test_in.txt > test_out.txt
This is my (not working) function:
char* getLine(void)
{
char* result = "";
int i, c;
i = 1;
while((c = getchar()) != EOF)
{
*result++ = c;
i++;
if(c == '\n')
return result - i;
}
return result - i;
}
I was expecting it to work because I previously wrote:
char* getString(char* string)
{
//char* result = string; // the following code achieve this.
char* result = "";
int i;
for(i = 1; *result++ = *string++; i++);
return result - i;
}
And these lines of code have a correct behaviour.
Every answer will be appreciated, but I would be really thankful
if any of you could explain to me why my getString() function works while my getLine() function doesn't.

Your function does not allocate enough space for the string being read. The variable char* result = "" defines a char pointer to a string literal ("", empty string), and you store some arbitrary number of characters into the location pointed to by result.
char* getLine(void)
{
char* result = ""; //you need space to store input
int i, c;
i = 1;
while((c = getchar()) != EOF)
{
*result++ = c; //you should check space
i++;
if(c == '\n')
return result - i; //you should null-terminate
}
return result - i; //you should null-terminate
}
You need to allocate space for your string, which is challenging because you don't know a priori how much space you are going to need. So you need to decide whether to limit how much you read (à la fgets), or dynamically reallocate space as you read more. Also, how do you indicate that you have finished input (reached EOF)?
The following alternative assumes dynamic reallocation is your chosen strategy.
#include <stdio.h>
#include <stdlib.h>
char* getLine(void)
{
int ch; size_t size=100; size_t pos=0;
char* result = malloc(size);
if(!result) exit(1); //malloc failed
while( (ch=getchar()) != EOF )
{
result[pos++] = ch;
if( pos >= size ) { //always keep room for the terminator
char* tmp = realloc(result,size+=100);
//or,realloc(result,size*=2);
if(!tmp) { free(result); exit(1); } //realloc failed
result = tmp; //the block may have moved
}
if( ch=='\n' ) break;
}
result[pos] = '\0'; //null-terminate
return result;
}
When you are done with the string returned from the above function, please remember to free() the allocated space.
This alternative assumes you provide a buffer to store the string (and specifies the size of the buffer).
char* getLine(char* buffer, size_t size)
{
int ch;
char* result = buffer;
size_t pos=0;
if( size == 0 ) return buffer; //no room, not even for the terminator
while( pos+1 < size && (ch=getchar()) != EOF ) //stop when full
{
*result++ = ch;
pos++;
if( ch=='\n' ) break;
}
*result = '\0'; //null-terminate
return buffer;
}
Both avoid the subtle interaction between detecting EOF, and having enough space to store a character read. The solution is to buffer a character if you read and there is not enough room, and then inject that on a subsequent read. You will also need to null-ter

Both functions have undefined behaviour since you are modifying string literals. It just seems to work in one case. Basically, result needs to point to memory that can be legally accessed, which is not the case in either of the snippets.
On the same subject, you might find this useful: What Every C Programmer Should Know About Undefined Behavior.

Think of it this way.
When you say
char* result = "";
you are setting up a pointer 'result' to point to a 1-byte null-terminated string (just the null). The pointer itself is a local variable, but the string literal it points to is not: it lives in static storage, which is often read-only.
Then when you say
*result++ = c;
you are storing the value 'c' at that address and then advancing the pointer by one.
So, where are you putting it?
Well, the first write lands on the literal itself, which you are not allowed to modify, and every later write lands past its end, on whatever happens to be stored next to it. Depending on the platform, that either crashes or silently corrupts unrelated data.
That is why you have to be very careful with pointers.

When you expect to return a string from a function, you have two options: (1) provide a string to the function with adequate space to hold the string (including the null-terminating character), or (2) dynamically allocate memory for the string within the function and return a pointer. Within your function you must also have a way to ensure you are not writing beyond the end of the space available and that you are leaving room for the null-terminating character. That requires passing a maximum size if you are providing the array to the function, and keeping count of the characters read.
Putting that together, you could do something similar to:
#include <stdio.h>
#define MAXC 256
char* getLine (char *s, int max)
{
int i = 0, c = 0;
char *p = s;
while (i + 1 < max && (c = getchar()) != '\n' && c != EOF) {
*p++ = c;
i++;
}
*p = 0;
return s;
}
int main (void) {
char buf[MAXC] = {0};
printf ("\ninput : ");
getLine (buf, MAXC);
printf ("output: %s\n\n", buf);
return 0;
}
Example/Output
$ ./bin/getLine
input : A quick brown fox jumps over the lazy dog.
output: A quick brown fox jumps over the lazy dog.

how to put char * into array so that I can use it in qsort, and then move on to the next line

I have a line-reading function (readline in the code below) that returns a char * (it detects '\n') and NULL on EOF.
In main() I'm trying to recognize particular words from that line.
I used strtok:
int main(int argc, char **argv)
{
char *line, *ptr;
FILE *infile;
FILE *outfile;
char **helper = NULL;
int strtoks = 0;
void *temp;
infile=fopen(argv[1],"r");
outfile=fopen(argv[2],"w");
while(((line=readline(infile))!=NULL))
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if(temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
} else {
helper=temp;
}
while (ptr != NULL) {
strtoks++;
fputs(ptr, outfile);
fputc(' ', outfile);
ptr = strtok(NULL, " ");
helper[strtoks-1] = ptr;
}
/*fputs(line, outfile);*/
free(line);
}
fclose(infile);
fclose(outfile);
return 0;
}
Now I have no idea how to put each of the tokenized words into an array (I created char **helper for that purpose), so that it can be used in qsort like qsort(helper, strtoks, sizeof(char*), compare_string);.
Second, even if that worked, I don't know how to clear that line and proceed to sorting the next one. How do I do that?
I even crashed valgrind (with the code presented above) -> "valgrind: the 'impossible' happened:
Killed by fatal signal"
Where is the mistake ?

The most obvious problem (there may be others) is that you're reallocating helper to the value of strtoks at the beginning of the line, but then incrementing strtoks and adding to the array at higher values of strtoks. For instance, on the first line, strtoks is 0, so temp = realloc(helper, (strtoks)*sizeof(char *)); leaves helper as NULL, but then you try to add every word on that line to the helper array.
I'd suggest an entirely different approach which is conceptually simpler:
char buf[1000]; // or big enough to be bigger than any word you'll encounter
char **helper = NULL, **temp;
int i, ch, numwords = 0;
do {
for(i = 0; (ch = fgetc(infile)) != EOF && ch != ' ' && ch != '\n'; ) {
// get chars one at a time until we hit a space, a newline or end of file
if(i < (int)sizeof buf - 1)
buf[i++] = ch; // add char to buffer, dropping any that would not fit
}
if(i == 0)
continue; // no word between two delimiters
buf[i] = '\0'; // terminate with null byte
temp = realloc(helper, (numwords + 1) * sizeof(char *)); // expand helper to fit one more word
if(temp == NULL)
break; // out of memory; helper still holds the earlier words
helper = temp;
helper[numwords] = strdup(buf); // copy buffer into the just-created element (strdup is POSIX, not ISO C before C23)
if(helper[numwords] == NULL)
break;
numwords++;
} while(ch != EOF); // test the value fgetc returned rather than calling feof() up front
I haven't tested this so let me know if it's not correct or there's anything you don't understand. I've left out the opening and closing of files and the freeing at the end (remember you have to free every element of helper before you free helper itself).

As you can see in strtok's prototype:
char * strtok ( char * str, const char * delimiters );
...str is not const. What strtok actually does is replace found delimiters by null bytes (\0) into your str and return a pointer to the beginning of the token.
For example:
char in[] = "foo bar baz";
char *toks[3];
toks[0] = strtok(in, " ");
toks[1] = strtok(NULL, " ");
toks[2] = strtok(NULL, " ");
printf("%p %s\n%p %s\n%p %s\n", (void *)toks[0], toks[0], (void *)toks[1], toks[1],
(void *)toks[2], toks[2]);
printf("%p %s\n%p %s\n%p %s\n", (void *)&in[0], &in[0], (void *)&in[4], &in[4],
(void *)&in[8], &in[8]);
Now look at the results:
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
As you can see, toks[1] and &in[4] point to the same location: the original str has been modified, and in reality all tokens in toks point to somewhere in str.
In your case your problem is that you free line:
free(line);
...invalidating all your pointers in helper. If you (or qsort) try to access helper[0] after freeing line, you end up accessing freed memory.
You should copy the tokens instead, e.g.:
ptr = strtok(NULL, " ");
if (ptr != NULL) { /* strtok returns NULL once the tokens run out */
helper[strtoks-1] = malloc(strlen(ptr) + 1);
strcpy(helper[strtoks-1], ptr);
}
Obviously, you will need to free each element of helper afterwards (in addition to helper itself).
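Putting the pieces together for one line, a sketch might look like this. The function name sorted_tokens is made up for the example, and compare_string is only one plausible implementation of the callback named in the question; neither is taken from the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort over an array of char *. qsort passes
 * pointers to the elements, i.e. pointers to char *. */
int compare_string(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Splits line on spaces (modifying it, as strtok does), stores a heap
 * copy of each token and sorts the copies. Returns the array and sets
 * *count. Returns NULL with *count == 0 if no token was found or an
 * allocation failed. */
char **sorted_tokens(char *line, size_t *count)
{
    char **helper = NULL;
    size_t n = 0;

    assert(line != NULL && count != NULL);
    for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
        /* grow first, then store: slot n exists before it is written */
        char **tmp = realloc(helper, (n + 1) * sizeof *tmp);
        if (tmp == NULL)
            goto fail;
        helper = tmp;
        helper[n] = malloc(strlen(tok) + 1);
        if (helper[n] == NULL)
            goto fail;
        strcpy(helper[n], tok);
        n++;
    }
    if (n > 0)
        qsort(helper, n, sizeof *helper, compare_string);
    *count = n;
    return helper;

fail:
    while (n > 0)
        free(helper[--n]);
    free(helper);
    *count = 0;
    return NULL;
}
```

Because the tokens are copies, the caller can free(line) right away, process the sorted array, free each element and the array itself, and then move on to the next line with a fresh call.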

You should be getting a 'Bad alloc' error because:
char **helper = NULL;
int strtoks = 0;
...
while ((line = readline(infile)) != NULL) /* Fewer, but sufficient, parentheses */
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if (temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
}
This is because the value of strtoks is zero, so you are asking realloc() to free the memory pointed at by helper (which was itself a null pointer). One outside chance is that your library crashes on realloc(0, 0), which it shouldn't but it is a curious edge case that might have been overlooked. The other possibility is that realloc(0, 0) returns a non-null pointer to 0 bytes of data which you are not allowed to dereference. When your code dereferences it, it crashes. Both returning NULL and returning non-NULL are allowed by the C standard; don't write code that crashes regardless of which behaviour realloc() shows. (If your implementation of realloc() does not return a non-NULL pointer for realloc(0, 0), then I'm suspicious that you aren't showing us exactly the code that managed to crash valgrind (which is a fair achievement — congratulations) because you aren't seeing the program terminate under control as it should if realloc(0, 0) returns NULL.)
You should be able to avoid that problem if you use:
temp = realloc(helper, (strtoks+1) * sizeof(char *));
Don't forget to increment strtoks itself at some point.

Reading from stdin (file of variable length)

So I've been trying to get this assignment to work in various ways, but each time I get different errors. Basically what we have is a program that needs to read, byte by byte, the contents of a file that will be piped in (the file could be huge, so we can't just call malloc and allocate one large chunk of space). We are required to use realloc to expand the amount of allocated memory until we reach the end of the file. The final result should be one long C string (array) containing each byte (and we can't disregard null bytes either if they are part of the file). What I have at the moment is:
char *buff;
int n = 0;
char c;
int count;
if (ferror (stdin))
{
fprintf(stderr, "error reading file\n");
exit (1);
}
else
{
do {
buff = (char*) realloc (buff, n+1);
c = fgetc (stdin);
buff[n] = c;
if (c != EOF)
n++;
}
while (c != EOF);
}
printf("characters entered: ");
for (count = 0; count < n; count++)
printf("%s ", buff[count]);
free (buff);
It should keep reading until the end of the file, expanding the memory each time but when I try to run it by piping in a simple text file, it tells me I have a segmentation fault. I'm not quite sure what I'm doing wrong.
Note that we're allowed to use malloc and whatnot, but I couldn't see how to make that work since we have no idea how much memory is needed.

You are using an uninitialized pointer buff in your first call to realloc. Change to
char *buff = malloc(100);
to avoid this problem.
Once you get it working, you'll notice that your program is rather inefficient, with a realloc per character. Consider realloc-ing in larger chunks to reduce the number of reallocations.

char* buff;
...
buff = (char*) realloc (buff, n+1);
You're trying to reallocate an uninitialized pointer, which leads to undefined behaviour. Change to
char* buff = 0;
...
buff = (char*) realloc (buff, n+1);
But as has been pointed out, this is very inefficient.

It seems the answers by @dasblinkenlight and @smocking explain the current crash, but to avoid the next ones:
Change char c; to int c;, because fgetc returns an int: EOF is a negative value that must stay distinguishable from every valid character.
It is a bad idea to call realloc for one char at a time. Instead, increase the size by X bytes (let's say 100) each time; this will be MUCH more efficient.
You need to add the null terminator ('\0') at the end of the buffer before treating it as a string; otherwise printing it with printf() is undefined behavior.
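The three suggestions above can be sketched together as one function. The name read_all and the CHUNK size are assumptions chosen for this example. The byte count is returned separately because the data may contain '\0' bytes, so strlen would not give the real length.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096   /* growth step; an arbitrary choice */

/* Reads the rest of in into a heap buffer. Stores the byte count in
 * *len and appends a terminator after the data. Returns NULL on
 * allocation or read failure. */
char *read_all(FILE *in, size_t *len)
{
    char *buf = NULL;
    size_t used = 0, cap = 0;
    int c;                                  /* int, so EOF is detectable */

    assert(in != NULL && len != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (used + 1 >= cap) {              /* keep one byte spare for '\0' */
            char *tmp = realloc(buf, cap + CHUNK);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap += CHUNK;
        }
        buf[used++] = (char)c;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    if (buf == NULL) {                      /* empty input */
        buf = malloc(1);
        if (buf == NULL)
            return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

A caller would write something like char *data = read_all(stdin, &n); and print it with fwrite(data, 1, n, stdout) so that embedded null bytes are preserved, then free(data).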

Here's what I came up with for reading stdin into a char[] or char* (when having embedded NULLs in stdin):
char* content = NULL;
int c;
int contentSize = 0;
while ((c = fgetc(stdin)) != EOF){
contentSize++;
content = (char*)(realloc(content, contentSize+1));
if (content == NULL) {
perror("Realloc failed.");
exit(2);
}
content[contentSize-1] = c;
}
for (int i = 0; i < contentSize; ++i) {
printf("%c",content[i]);
}
