Fill char*[] from strtok - c

I have problems getting the following code to work. It parses a user's input into a char*[] and returns it. However, the char* commands[] array does not accept any values and stays filled with NULL. What's going on here?
void* setCommands(int length){
char copy[strlen(commandline)]; //commandline is a char* read with gets();
strcpy(copy, commandline);
char* commands[length];
for (int x=0; x<length; x++)
commands[x] = "\0";
int i = 0;
char* temp;
temp = strtok (copy, " \t");
while (temp != NULL){
commands[i] = temp; //doesnt work here.. commands still filled with NULL afterwards
i++;
printf("word:%s\n", temp);
temp = strtok (NULL, " \t");
}
commands[i] = NULL;
for (int u=0; u<length; u++)
printf("%s ", commands[i]);
printf("\n");
return *commands;
}
You may assume that commandline != NULL and length != 0.

commands[i] = NULL;
for (int u=0; u<length; u++)
printf("%s ", commands[i]);
Take a very good look at that code. It uses u as the loop control variable but prints out the element based on i.
Hence, due to the fact you've set commands[i] to NULL in the line before the loop, you'll just get a series of NULLs.
Use commands[u] in the loop rather than commands[i].
In addition to that:
void* setCommands(int length){
char* commands[length];
:
return *commands;
}
will only return one pointer, the one to the first token, not the one to the array of token pointers. You cannot return addresses of local variables that are going out of scope (well, you can, but it may not work).
And, in any case, since that one pointer most likely points to yet another local variable (somewhere inside copy), it's also invalid.
If you want to pass back blocks of memory from functions, you'll need to look into using malloc, in this case both for the array of pointers and the strings themselves.
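As a minimal sketch of that suggestion, the array of pointers and each token can both live on the heap, so nothing returned refers to a local variable. The names split_tokens, free_tokens and max_tokens are hypothetical and not part of the original code, and the input is passed as a parameter here rather than read from the global commandline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `input` on spaces and tabs into at most
 * `max_tokens` heap-allocated strings. Returns a NULL-terminated array
 * that the caller releases with free_tokens(), or NULL if the initial
 * allocations fail. */
char **split_tokens(const char *input, size_t max_tokens)
{
    /* One extra slot so the array can always end with NULL. */
    char **tokens = calloc(max_tokens + 1, sizeof *tokens);
    /* strtok modifies its argument, so work on a private copy. */
    char *copy = malloc(strlen(input) + 1);
    if (tokens == NULL || copy == NULL) {
        free(tokens);
        free(copy);
        return NULL;
    }
    strcpy(copy, input);

    size_t count = 0;
    for (char *tok = strtok(copy, " \t");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " \t")) {
        /* Give each token its own storage so it outlives `copy`. */
        tokens[count] = malloc(strlen(tok) + 1);
        if (tokens[count] == NULL)
            break;              /* out of memory: return what we have */
        strcpy(tokens[count], tok);
        count++;
    }
    tokens[count] = NULL;       /* explicit terminator */
    free(copy);
    return tokens;
}

/* Release every token and then the array itself. */
void free_tokens(char **tokens)
{
    if (tokens == NULL)
        return;
    for (size_t i = 0; tokens[i] != NULL; i++)
        free(tokens[i]);
    free(tokens);
}
```

The caller owns the result and must call free_tokens on it exactly once. The cost of this layout is one allocation per token.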

You have a number of issues... Your program will be exhibiting undefined behaviour currently, so until you address the issues you cannot hope to predict what's going on. Let's begin.
The following string is one character too short. You forgot to include a character for the string terminator ('\0'). This will lead to a buffer overrun during tokenising, which might be partly responsible for the behaviour you are seeing.
char copy[strlen(commandline)]; // You need to add 1
strcpy(copy, commandline);
The next part is your return value, but it's a temporary (local array). You are not allowed to return this. You should allocate it instead.
// Don't do this:
char* commands[length];
for (int x=0; x<length; x++)
commands[x] = "\0"; // And this is not the right way to zero a pointer
// Do this instead (calloc will zero your elements):
char ** commands = calloc( length, sizeof(char*) );
It's possible for the tokenising loop to overrun your buffer because you never check for length, so you should add in a test:
while( temp != NULL && i < length )
And because of the above, you can't just blindly set commands[i] to NULL after the loop. Either test i < length or just don't set it (you zeroed the array beforehand anyway).
Now let's deal with the return value. Currently you have this:
return *commands;
That returns a pointer to the first token in your temporary string (copy). Firstly, it looks like you actually intended to return an array of tokens, not just the first token. Secondly, you can't return a temporary string. So, I think you meant this:
return commands;
Now, to deal with those strings... There's an easy way, and a clever way. The easy way has already been suggested: you call strdup on each token before shoving them in memory. The annoying part of this is that when you clean up that memory, you have to go through the array and free each individual token.
Instead, let's do it all in one hit, by allocating the array AND the string storage in one call:
char **commands = malloc( length * sizeof(char*) + strlen(commandline) + 1 );
char *copy = (char*)(commands + length);
strcpy( copy, commandline );
The only thing I didn't do above is zero the array. You can do this after the tokenising loop, by just zeroing the remaining values:
while( i < length ) commands[i++] = NULL;
Now, when you return commands, you return an array of tokens which also contains its own token storage. To free the array and all strings it contains, you just do this:
free( commands );
Putting it all together:
void* setCommands(int length)
{
// Create array and string storage in one memory block.
char **commands = malloc( length * sizeof(char*) + strlen(commandline) + 1 );
if( commands == NULL ) return NULL;
char *copy = (char*)(commands + length);
strcpy( copy, commandline );
// Tokenise commands
int i = 0;
char *temp = strtok(copy, " \t");
while( temp != NULL && i < length )
{
commands[i++] = temp;
temp = strtok(NULL, " \t");
}
// Zero any unused tokens
while( i < length ) commands[i++] = NULL;
return commands;
}

Related

Freeing array of char pointers - only works when I have an even number of strings...?

I'm having a super strange problem in C that I think is related to string memory management, but I've pored over this code and can't tell what I'm doing wrong. I'm setting up an array of pointers to strings, where each string in the array is a token broken up by strtok(), which operates on a line read from stdin with getline.
I've been able to break apart the tokens, store them, and print the array with no problem. I am just having trouble freeing each individual pointer afterwards. I get an error message saying "invalid pointer", but oddly enough, only when I input an odd number of words in my original string. Even then, it still breaks them apart into tokens and stores them in the array; it's just the memory freeing that fails.
here's a snippet of what I have:
char *line = NULL;
size_t len = 0;
getline(&line, &len, stdin);
char *token = strtok(line, " \n");
char **parsed = malloc(sizeof(char*));
if(parsed == NULL){
printf("ruh roh scoob\n");
}
int numList = 0;
while(token!=NULL){
numList++;
parsed = realloc(parsed, sizeof(char*)*numList);
if(parsed == NULL){
printf("realloc failed :(\n");
}
int tokenLen = strlen(token);
parsed[numList-1] = malloc(sizeof(char)*tokenLen);
if(parsed[numList-1] == NULL){
printf("ruh roh raggy\n");
}
strcpy(parsed[numList-1], token);
token = strtok(NULL, " \n");
}
parsed[numList] = NULL;
for(int i = 0; i <= numList; i++){
printf("%d-%s", i, parsed[i]);
}
printf("\n");
for(int j = 0; j < numList; j++){
free(parsed[j]);
}
I thought I was allocating memory correctly and storing the data from my token pointers into the new pointers correctly but perhaps I was wrong. Any advice would be appreciated!
Inside the loop you call:
strcpy(parsed[numList-1], token);
This needs at least strlen(token) + 1 bytes, but you're allocating one too few:
int tokenLen = strlen(token);
parsed[numList-1] = malloc(sizeof(char)*tokenLen);
So the strcpy writes one byte past the end of the buffer, which can corrupt malloc's bookkeeping data. Whether that happens depends on the token length: you may get away with a write past the end of the buffer, because malloc rounds up the allocation amount, both to avoid memory fragmentation and to ensure that dynamically allocated buffers keep proper alignment (for better performance).
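A sketch of the loop with the extra byte allocated might look like the following. The function name copy_tokens and its parameters are hypothetical. It also reserves a slot for the terminating NULL pointer, since the question's code writes parsed[numList] without having allocated that element.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tokenise `line` in place and return a
 * NULL-terminated array of heap copies. *count receives the number of
 * tokens. Returns NULL if any allocation fails. */
char **copy_tokens(char *line, size_t *count)
{
    char **parsed = NULL;
    size_t n = 0;
    char *token = strtok(line, " \n");

    for (;;) {
        /* Grow by one slot; on the last pass this slot holds the NULL. */
        char **grown = realloc(parsed, (n + 1) * sizeof *parsed);
        if (grown == NULL)
            goto fail;
        parsed = grown;

        if (token == NULL) {
            parsed[n] = NULL;
            break;
        }
        /* strlen does not count the terminator, so add 1 for it. */
        parsed[n] = malloc(strlen(token) + 1);
        if (parsed[n] == NULL)
            goto fail;
        strcpy(parsed[n], token);
        n++;
        token = strtok(NULL, " \n");
    }
    *count = n;
    return parsed;

fail:
    /* Entries 0..n-1 are valid copies; free them and the array. */
    for (size_t i = 0; i < n; i++)
        free(parsed[i]);
    free(parsed);
    return NULL;
}
```

The caller frees each of the *count strings and then the array, as in the question's final loop.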

I have a 'Segmentation Problem' while printing parsed parts of a String

I am writing a simple shell for a school assignment and am stuck on a segmentation fault. Initially, my shell parses the user input to remove whitespace and the end-of-line character, and separates the words in the input line to store them in a char **args array. I can separate the words and print them without any problem, but when storing the words in a char **args array, if the number of arguments is greater than 1 and odd, I get a segmentation fault.
I know the problem sounds absurd, but I am stuck on it. Please help me.
This is my parser code and the problem occurs in it:
char **parseInput(char *input){
int idx = 0;
char **parsed = NULL;
int parsed_idx = 0;
while(input[idx]){
if(input[idx] == '\n'){
break;
}
else if(input[idx] == ' '){
idx++;
}
else{
char *word = (char*) malloc(sizeof(char*));
int widx = 0; // Word index
word[widx] = input[idx];
idx++;
widx++;
while(input[idx] && input[idx] != '\n' && input[idx] != ' '){
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = input[idx];
idx++;
widx++;
}
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = '\0';
printf("Word[%d] --> %s\n", parsed_idx, word);
if(parsed == NULL){
parsed = (char**) malloc(sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}else{
parsed = (char**) realloc(parsed, (parsed_idx+1)*sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}
}
}
int i = 0;
while(parsed[i] != NULL){
printf("Parsed[%d] --> %s\n", i, parsed[i]);
i++;
}
return parsed;
}
In your code you have the loop
while(parsed[i] != NULL) { ... }
The problem is that the code never sets any elements of parsed to be a NULL pointer.
That means the loop will go out of bounds, and you will have undefined behavior.
You need to explicitly set the last element of parsed to be a NULL pointer after you parsed the input:
while(input[idx]){
// ...
}
parsed[parsed_idx] = NULL;
On another couple of notes:
Don't assign back to the same pointer you pass to realloc. If realloc fails it will return a NULL pointer, but not free the old memory. If you assign back to the pointer you will lose the original address and have a memory leak. You also need to be able to handle the case where realloc fails.
A loop like
int i = 0;
while (parsed[i] != NULL)
{
// ...
i++;
}
is almost exactly the same as
for (int i = 0; parsed[i] != NULL; i++)
{
// ...
}
Please use a for loop instead, it's usually easier to read and follow. Also for a for loop the "index" variable (i in your code) will be in a separate scope, and not available outside of the loop. Tighter scope for variables leads to less possible problems.
In C you shouldn't really cast the result of malloc (or realloc) (or really any function returning void *). If you forget to #include <stdlib.h> it could lead to hard to diagnose problems.
Also, a beginner might find the -pedantic switch helpful on your call to the compiler. That switch would have pointed up most of the other suggestions made here. I personally am also a fan of -Wall, though many find it annoying instead of helpful.
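The realloc advice and the NULL terminator from the notes above can be combined in one small helper. This is only a sketch: append_word and its parameters are hypothetical names, not part of the original parser.

```c
#include <stdlib.h>

/* Hypothetical helper: append `word` to a NULL-terminated pointer array
 * that currently holds `*count` entries. Returns 0 on success, or -1 if
 * realloc fails, in which case the original array is left intact and is
 * still owned by the caller. */
int append_word(char ***list, size_t *count, char *word)
{
    /* Room for the existing entries, the new one, and the terminator. */
    char **grown = realloc(*list, (*count + 2) * sizeof **list);
    if (grown == NULL)
        return -1;              /* *list is still valid, nothing leaked */
    grown[*count] = word;
    grown[*count + 1] = NULL;   /* keep the array NULL-terminated */
    *list = grown;
    (*count)++;
    return 0;
}
```

Starting from a NULL list and a count of 0, every successful call leaves the array in a state that a loop of the form for (int i = 0; parsed[i] != NULL; i++) can walk safely.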

append a character from an array to a char pointer

Ok, so I'm a person who usually writes Java/C++, and I've just started getting into writing C. I'm currently writing a lexical analyser, and I can't stand how strings work in C, since I can't perform string arithmetic. So here's my question:
char* buffer = "";
char* line = "hello, world";
int i;
for (i = 0; i < strlen(line); i++) {
buffer += line[i];
}
How can I do that in C? Since the code above isn't valid C, how can I do something like that?
Basically I'm looping though a string line, and I'm trying to append each character to the buffer string.
String literals are immutable in C. Modifying one causes undefined behavior.
If you use a char array (your buffer) big enough to hold your characters, you can still modify its contents:
#include <stdio.h>
#include <string.h>
int main(void) {
char * line = "hello, world";
char buffer[32]; // ok, this array is big enough for our operation
int i;
for (i = 0; i < strlen(line) + 1; i++)
{
buffer[i] = line[i];
}
printf("buffer : %s", buffer);
return 0;
}
First off the buffer needs to have or exceed the length of the data being copied to it.
char a[length], b[] = "string";
Then the characters are copied to the buffer.
int i = 0;
while (i < length - 1 && b[i] != '\0') { a[i] = b[i]; i++; } // leave room for the terminator
a[i] = '\0';
You can reverse the order if you need to: start i at the smaller of the two lengths and decrement it instead of incrementing. You can also use the heap, as others have suggested, when length is arbitrary or changes at run time. Furthermore, you can write the snippet with pointers, which gives a better idea of what is happening:
char *j = a, *k = b;
while (j - a < length - 1 && *k) { *(j++) = *(k++); }
*j = '\0';
Make sure to look up memcpy, and don't forget null terminators.
#include <stdlib.h>
#include <string.h>
//...
char *line = "hello, world";
char *buffer = ( char * ) malloc( strlen( line ) + 1 );
strcpy( buffer, line );
Although string literals in C have non-const array types, it is better to declare pointers initialized by string literals with the const qualifier:
const char *line = "hello, world";
String literals in C/C++ are immutable.
If you want to append characters, then the code can look the following way (each character of line is appended to buffer in a loop):
#include <stdlib.h>
#include <string.h>
//...
char *line = "hello, world";
char *buffer = ( char * ) malloc( strlen( line ) + 1 );
buffer[0] = '\0';
char *p = buffer;
for ( size_t i = 0; i < strlen( line ); i++ )
{
*p++ = line[i];
*p = '\0';
}
The general approach is to find the pointer to the terminating zero, overwrite it with the new character, advance the pointer, and append a new terminating zero. The destination buffer must be large enough to accommodate one more character.
If you want to append a single character to a string allocated on the heap, here's one way to do it:
size_t length = strlen(buffer);
char *newbuffer = realloc(buffer, length + 2);
if (newbuffer) { // realloc succeeded
buffer = newbuffer;
buffer[length] = newcharacter;
buffer[length + 1] = '\0';
}
else { // realloc failed
// TODO handle error...
free(buffer); // for example
}
However, this is inefficient to do repeatedly in a loop, because you'll be repeatedly calling strlen() on (essentially) the same string, and reallocating the buffer to fit one more character each time.
If you want to be smarter about your reallocations, keep track of the buffer's current allocated capacity separately from the length of the string within it — if you know C++, think of the difference between a std::string object's "size" and its "capacity" — and when it's necessary to reallocate, multiply the buffer's size by a scaling factor (e.g. double it) instead of adding 1, so that the number of reallocations is O(log n) instead of O(n).
This is the sort of thing that a good string class would do in C++. In C, you'll probably want to move this buffer-management stuff into its own module.
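A minimal sketch of that size/capacity bookkeeping is shown below. The struct strbuf type, the strbuf_push function and the initial capacity of 16 are all hypothetical choices made for illustration.

```c
#include <stdlib.h>

/* Hypothetical growable string: `len` counts characters in use,
 * `cap` counts bytes allocated (including room for the '\0'). */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append one character, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 if realloc fails (the buffer is unchanged). */
int strbuf_push(struct strbuf *sb, char c)
{
    if (sb->len + 2 > sb->cap) {            /* need room for c and '\0' */
        size_t new_cap = sb->cap ? sb->cap * 2 : 16;
        char *grown = realloc(sb->data, new_cap);
        if (grown == NULL)
            return -1;
        sb->data = grown;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';               /* keep it a valid C string */
    return 0;
}
```

A buffer starts out as struct strbuf sb = {NULL, 0, 0}; and is released with free(sb.data). Because the length is stored, no strlen call is needed per append, and doubling keeps the number of reallocations logarithmic in the final length.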
The simplest solution, lacking any context, is to do:
char buffer[ strlen(line) + 1 ];
strcpy(buffer, line);
You may be used to using pointers for everything in Java (since non-primitive types in Java are actually more like shared pointers than anything else). However you don't necessarily have to do this in C and it can be a pain if you do.
Maybe a good idea given your background would be to use a counted string object in C, where the string object owns its data. Write struct my_string { char *data; size_t length; } . Write functions for creating, destroying, duplicating, and any other operation you need such as appending a character, or checking the length. (Separate interface from implementation!) A useful addition to this would be to make it allocate 1 more byte than length, so that you can have a function which null-terminates and allows it to be passed to a function that expects a read-only C-style string.
The only real pitfall here is to remember to call a function when you are doing a copy operation, instead of allowing structure assignment to happen. (You can use structure assignment for a move operation of course!)
The asprintf function is very useful for building strings, and is available on GNU-based systems (Linux), or most *BSD based systems. You can do things like:
char *buffer;
if (asprintf(&buffer, "%s: adding some stuff %d - %s", str1, number, str2) < 0) {
fprintf(stderr, "Oops -- out of memory\n");
exit(1);
}
printf("created the string \"%s\"\n", buffer);
free(buffer); /* done with it */
Appending can also be done with snprintf.
Include the stdio.h and string.h headers
#include <stdio.h>
#include <string.h>
then
char line[] = "hello, world";
char buffer[sizeof line]; // room for every character plus the terminator
size_t len = 0;
// Initialise the buffer to an empty string
buffer[0] = '\0';
for (size_t i = 0; i < strlen(line); ++i) {
// write one character at the current end; snprintf adds the terminator
len += snprintf(buffer + len, sizeof buffer - len, "%c", line[i]);
}
The code you started with is different from the question you are asking.
You could also have split the line with strtok.
I hope my answer clarifies it.

how to put char * into array so that I can use it in qsort, and then move on to the next line

I have a lineget function that returns a char * (it detects '\n') and NULL on EOF.
In main() I'm trying to recognize particular words from that line.
I used strtok:
int main(int argc, char **argv)
{
char *line, *ptr;
FILE *infile;
FILE *outfile;
char **helper = NULL;
int strtoks = 0;
void *temp;
infile=fopen(argv[1],"r");
outfile=fopen(argv[2],"w");
while(((line=readline(infile))!=NULL))
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if(temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
} else {
helper=temp;
}
while (ptr != NULL) {
strtoks++;
fputs(ptr, outfile);
fputc(' ', outfile);
ptr = strtok(NULL, " ");
helper[strtoks-1] = ptr;
}
/*fputs(line, outfile);*/
free(line);
}
fclose(infile);
fclose(outfile);
return 0;
}
Now I have no idea how to put each of the tokenized words into an array (I created char **helper for that purpose), so that it can be used in qsort like qsort(helper, strtoks, sizeof(char*), compare_string);.
Second, even if that worked, I don't know how to clear that line and proceed to sorting the next one. How do I do that?
I even crashed valgrind (with the code presented above) -> "valgrind: the 'impossible' happened:
Killed by fatal signal"
Where is the mistake ?
The most obvious problem (there may be others) is that you're reallocating helper to the value of strtoks at the beginning of the line, but then incrementing strtoks and adding to the array at higher values of strtoks. For instance, on the first line, strtoks is 0, so temp = realloc(helper, (strtoks)*sizeof(char *)); leaves helper as NULL, but then you try to add every word on that line to the helper array.
I'd suggest an entirely different approach which is conceptually simpler:
char buf[1000]; // big enough to hold any word you'll encounter
char **helper = NULL;
int ch, i, numwords = 0;
do { // loop until fgetc reports EOF; testing feof() before reading would run one extra pass
// get chars one at a time until we hit a space, a newline, or end of file
for (i = 0; i < (int)sizeof buf - 1 && (ch = fgetc(infile)) != EOF && ch != ' ' && ch != '\n'; i++) {
buf[i] = ch; // add char to buffer
}
buf[i] = '\0'; // terminate with null byte
if (i > 0) { // skip empty words produced by consecutive delimiters
char **tmp = realloc(helper, (numwords + 1) * sizeof(char *)); // expand helper to fit one more word
if (tmp == NULL) break; // out of memory: keep the words collected so far
helper = tmp;
helper[numwords++] = strdup(buf); // copy current contents of buffer to the just-created element of helper
}
} while (ch != EOF);
I haven't tested this so let me know if it's not correct or there's anything you don't understand. I've left out the opening and closing of files and the freeing at the end (remember you have to free every element of helper before you free helper itself).
As you can see in strtok's prototype:
char * strtok ( char * str, const char * delimiters );
...str is not const. What strtok actually does is overwrite each delimiter it finds in str with a null byte (\0) and return a pointer to the beginning of the token.
For example:
char in[] = "foo bar baz";
char *toks[3];
toks[0] = strtok(in, " ");
toks[1] = strtok(NULL, " ");
toks[2] = strtok(NULL, " ");
printf("%p %s\n%p %s\n%p %s\n", toks[0], toks[0], toks[1], toks[1],
toks[2], toks[2]);
printf("%p %s\n%p %s\n%p %s\n", &in[0], &in[0], &in[4], &in[4],
&in[8], &in[8]);
Now look at the results:
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
As you can see, toks[1] and &in[4] point to the same location: the original str has been modified, and in reality all tokens in toks point to somewhere in str.
In your case your problem is that you free line:
free(line);
...invalidating all your pointers in helper. If you (or qsort) try to access helper[0] after freeing line, you end up accessing freed memory.
You should copy the tokens instead, e.g.:
helper[strtoks-1] = malloc(strlen(ptr) + 1); /* copy the current token first */
strcpy(helper[strtoks-1], ptr);
ptr = strtok(NULL, " "); /* then advance; ptr may now be NULL */
Obviously, you will need to free each element of helper afterwards (in addition to helper itself).
You should be getting a 'Bad alloc' error because:
char **helper = NULL;
int strtoks = 0;
...
while ((line = readline(infile)) != NULL) /* Fewer, but sufficient, parentheses */
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if (temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
}
This is because the value of strtoks is zero, so you are asking realloc() to free the memory pointed at by helper (which was itself a null pointer). One outside chance is that your library crashes on realloc(0, 0), which it shouldn't but it is a curious edge case that might have been overlooked. The other possibility is that realloc(0, 0) returns a non-null pointer to 0 bytes of data which you are not allowed to dereference. When your code dereferences it, it crashes. Both returning NULL and returning non-NULL are allowed by the C standard; don't write code that crashes regardless of which behaviour realloc() shows. (If your implementation of realloc() does not return a non-NULL pointer for realloc(0, 0), then I'm suspicious that you aren't showing us exactly the code that managed to crash valgrind (which is a fair achievement — congratulations) because you aren't seeing the program terminate under control as it should if realloc(0, 0) returns NULL.)
You should be able to avoid that problem if you use:
temp = realloc(helper, (strtoks+1) * sizeof(char *));
Don't forget to increment strtoks itself at some point.
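Putting the pieces together for one line of input, a sketch could look like this. It assumes the sort can finish before the line is freed, so the tokens may simply point into the line instead of being copied. The helper name sorted_words is hypothetical, and compare_string is only a guess at the comparison function the question mentions but does not show.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: each argument points at a char *. */
static int compare_string(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: split `line` on spaces in place, sort the words,
 * and return a malloc'd array of pointers into `line`. Returns NULL if
 * the line has no words or an allocation fails. The caller frees the
 * array first and `line` afterwards. */
char **sorted_words(char *line, size_t *count)
{
    char **words = NULL;
    size_t n = 0;

    for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
        /* Always ask for one more slot than is currently in use. */
        char **grown = realloc(words, (n + 1) * sizeof *words);
        if (grown == NULL) {
            free(words);
            *count = 0;
            return NULL;
        }
        words = grown;
        words[n++] = tok;       /* store the current token, then advance */
    }
    if (n > 0)
        qsort(words, n, sizeof *words, compare_string);
    *count = n;
    return words;
}
```

Calling this once per line, writing out the sorted words, and then freeing the returned array and the line gives each line a fresh array, so nothing needs to be cleared between lines.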

Double pointer to char[]

Alright, so I have the following code:
char** args = (char**)malloc(10*sizeof(char*));
memset(args, 0, sizeof(char*)*10);
char* curToken = strtok(string, ";");
for (int z = 0; curToken != NULL; z++) {
args[z] = strdup(curToken);
curToken = strtok(NULL, ";");
}
I want every arg[z] cast into an array of chars -- char string[100] -- and then processed by the algorithms that follow. Every arg[z] needs to be copied into the variable string at some point. I am confused by pointers, but I am slowly getting better at them.
EDIT:
char string[100] = "ls ; date ; ls";
arg[0] will be ls, arg[1] will be date, and arg[2] will be ls after the above code.
I want to put each argument back into char string[100] and process it through algorithms.
One easy way is to keep a backup of the original string in a temporary variable.
char string[100] = "ls ; date ; ls";
char temp_str[100] = {0};
strcpy (temp_str, string);
Another way is to do it with strcat. z holds the number of arguments.
memset(string, '\0', 100);
for (i = 0; i < z; i++)
{
strcat(string, args[i]);
if (i != (z - 1))
{
//append a semicolon after every string except the last
strcat(string, ";");
}
}
Note: take care of the boundary checks, because strcat does not check whether string has room for the result.
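One way to make that boundary check explicit is to build the string with snprintf and track how many bytes have been used. This is a sketch under assumptions: join_args and its parameters are hypothetical names, and the return convention (0 or -1) is chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join `count` strings with ';' into `dst`, which
 * holds `dst_size` bytes. Returns 0 on success, -1 if the result does
 * not fit (dst then holds a truncated but terminated string). */
int join_args(char *dst, size_t dst_size, char *const *args, size_t count)
{
    size_t used = 0;
    if (dst_size == 0)
        return -1;
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* Every argument except the last is followed by a semicolon. */
        int n = snprintf(dst + used, dst_size - used, "%s%s",
                         args[i], i + 1 < count ? ";" : "");
        if (n < 0 || (size_t)n >= dst_size - used)
            return -1;          /* output was truncated */
        used += (size_t)n;
    }
    return 0;
}
```

With the question's data, join_args(string, sizeof string, args, z) would rebuild the semicolon-separated string without writing past the 100-byte array.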
If you want the parts of string copied into a fixed length string[100] then you need to malloc 100 chars for each args[] inside the loop and strncpy() the result of strtok into it. strdup will only allocate enough memory for the actual length of the supplied string (plus \0)
This:
char** args = (char**)malloc(10*sizeof(char*));
memset(args, 0, sizeof(char*)*10);
is broken code. First, you shouldn't cast malloc()'s return value. Second, args is a pointer to ten pointers to char. You can't set them to NULL using memset(), there's no guarantee that "all bytes zero" is the same as NULL. You need to use a loop.

Resources