Parsing a string in C with strsep (alternative methods) - c

I want to parse a string using the strsep function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char str[] = "Marco:Q:2F7PKC";
char *token1, *token2, *token3;
char *r = malloc(30);
strcpy(r, str);
token1 = strsep(&r, ":");
token2 = strsep(&r, ":");
token3 = strsep(&r, ":");
printf("tok1 = %s\n", token1);
printf("tok2 = %s\n", token2);
printf("tok3 = %s\n", token3);
free(r);
return 0;
}
The function does its job well, but when I run the program under valgrind, the string allocated for char *r is not freed correctly (definitely lost: 30 bytes in 1 blocks).
I'd like to know why, and whether there is an alternative way to do the same thing, perhaps without calling strsep.
I call valgrind with valgrind --tool=memcheck --leak-check=full --show-reachable=yes ./a.out

strsep overwrites the target of its first (pointer-to-pointer) argument, so you lose the pointer to the malloc'd buffer's base. In fact, if you were to put a printf("%p\n", r); just before the free, you'd find that you're freeing a null pointer, which has no effect.
The easy solution is to introduce an additional variable to keep that pointer around and free it when you're done. Idiomatic usage would be
char *r = strdup("Marco:Q:3F7PKC");
// check for errors
char *tok = r, *end = r;
while (tok != NULL) {
strsep(&end, ":");
puts(tok);
tok = end;
}
free(r);

I would like to simplify Fred Foo's good answer a bit:
char *end, *r, *tok;
r = end = strdup("Marco:Q:3F7PKC");
assert(end != NULL);
while ((tok = strsep(&end, ":")) != NULL) {
printf("%s\n", tok);
}
free(r);
It gives the same result. It is worth noting that strsep(3) stores a pointer to the text following the delimiter in end and returns the current token (assigned here to tok).
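To make that behaviour concrete, here is a minimal sketch that collects the tokens strsep returns. The helper name split_fields and its signature are made up for illustration, and the feature-test macro assumes glibc; on BSD-derived systems strsep is declared by default.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes strsep() under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split buf in place on any character in delims.
 * Stores at most max token pointers in out and returns how many were stored.
 * The tokens point into buf, so buf must outlive them. */
size_t split_fields(char *buf, const char *delims, char **out, size_t max)
{
    assert(buf != NULL && delims != NULL && out != NULL);

    char *cursor = buf;   /* strsep() advances this copy, never buf itself */
    char *tok;
    size_t n = 0;

    /* strsep() returns the current token and moves cursor just past the
     * delimiter it overwrote; cursor becomes NULL after the last token. */
    while (n < max && (tok = strsep(&cursor, delims)) != NULL)
        out[n++] = tok;

    return n;
}
```

Because only the local cursor is advanced, the caller's own pointer to the buffer stays valid for a later free(). Unlike strtok, strsep reports an empty string for the gap between two adjacent delimiters.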

The strsep function updates its first argument (so it points right after the token it found). You need to store the value returned by malloc in a separate variable and free this variable.
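The question also asks for a way to do this without strsep. One possibility, sketched here with a hypothetical helper named copy_field, is to measure each field with strcspn and copy it out, so the input is never modified and nothing has to be malloc'd or freed. It assumes a single delimiter character and a caller-supplied output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy field number `index` (0-based) of the
 * delimiter-separated string s into dst, which holds dstsize bytes.
 * Returns 0 on success, -1 if the field is missing or does not fit.
 * s is never modified, so no copy and no free() are needed. */
int copy_field(const char *s, char delim, size_t index, char *dst, size_t dstsize)
{
    assert(s != NULL && dst != NULL);

    const char set[2] = { delim, '\0' };

    for (;;) {
        size_t len = strcspn(s, set);   /* length of the current field */
        if (index == 0) {
            if (len >= dstsize)
                return -1;              /* would not fit with the terminator */
            memcpy(dst, s, len);
            dst[len] = '\0';
            return 0;
        }
        if (s[len] == '\0')
            return -1;                  /* ran out of fields */
        s += len + 1;                   /* skip the field and its delimiter */
        index--;
    }
}
```

Calling copy_field("Marco:Q:2F7PKC", ':', 2, out, sizeof out) would leave "2F7PKC" in out. The trade-off is that every lookup rescans the string from the start, which is fine for short inputs like this one.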

Related

Segfault in C program, malloc call

I am writing a program that takes the name of an environment variable holding a list of paths, splits the paths and prints them. When I run it I get a segfault. The following is my output in GDB:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400eb0 in dest (name=0x7fffffffbce0 "PATH") at executables.c:100
100 dest[i] = malloc(srclen+1);
On valgrind:
==21574== 1 errors in context 2 of 3:
==21574== Use of uninitialised value of size 8
==21574== at 0x400EB0: dest (executables.c:100)
==21574== by 0x400B5B: main (main.c:9)
This is my function:
char** dest(char *name){
int i=0;
char *vp;
const char s[2]=":";
char *token;
char **dest;
name[strlen(name)-1]='\0';
vp=getenv(name);
if(vp == NULL){
exit(1);
}
token =strtok(vp,s);
while( token != NULL ){
size_t srclen = strlen(token);
dest[i] = malloc(srclen+1);
strcpy(dest[i], token);
token = strtok(NULL, s);
i++;
}
dest[i]=NULL;
return dest;
}
And this is my main:
#include "executables.h"
int main(int argc, char **argv){
char *path;
char name[BUFSIZ];
printf("enter name of environment variable:\n");
fgets(name,BUFSIZ,stdin);
char **p=dest(name);
int j=0;
while(p[j]!=NULL){
printf("%s\n",p[j]);
j++;
}
return(0);
}
Use strdup(). It saves steps (and accounts for the '\0' too). With the approach you're using, you have to allocate memory for the pointer array beforehand. Otherwise you might want a linked list and allocate nodes one at a time instead of using the array pattern. When you write dest[i] = <ptr value>, you're indexing off an uninitialized pointer and storing something there, hence the segfault.
#include <stdlib.h>
#include <string.h>
#define MAXTOKENS 10000
char **get_dest(char *name) {
    // Since dest has to persist beyond this function, we need to
    // allocate it dynamically (malloc()) rather than on the stack
    // in the form of: char *dest[MAXTOKENS].
    char **dest = malloc(MAXTOKENS * sizeof (char *)); // <--- need to allocate storage for the pointers
    char *env = getenv(name);
    // tokenize a private copy: the string returned by getenv() must not be modified
    char *vp = (env != NULL) ? strdup(env) : NULL;
    if (dest == NULL || vp == NULL)
        exit(EXIT_FAILURE); // a nonzero exit status signals an error, 0 is success
    int i = 0;
    char *token = strtok(vp, ":");
    while (token != NULL && i < MAXTOKENS - 1) { // keep one slot for the terminator
        dest[i] = strdup(token); // <=== strdup()
        token = strtok(NULL, ":");
        i++;
    }
    dest[i] = NULL; // the caller loops until it sees this terminator
    free(vp);
    return dest;
}
It's better if main() takes care of passing a clean string to the get_dest() function, because main is where the finicky fgets() is handled. Generally you want to do things locally, where it makes the most sense and is most relevant. If you ever took your get_dest() function and used it somewhere the strings were not read by fgets(), chopping off the last character there would be wrong. Note that fgets() always null-terminates what it reads but keeps the trailing newline, so main() has to strip the newline before the lookup; otherwise getenv() is asked for a name that ends in '\n'.
And finally, it is probably not a good idea to give your function the same name, dest, as the variable it returns. In some situations, having multiple symbols in your program with the same name can get you into trouble.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "executables.h"
int main(int argc, char **argv) {
    char name[BUFSIZ] = { 0 }; // You could initialize it to zero this way
    printf("enter name of environment variable:\n");
    // bzero(name, BUFSIZ); //... or you could initialize it to zero this way then
    if (fgets(name, BUFSIZ, stdin) == NULL)
        return 1;
    name[strcspn(name, "\n")] = '\0'; // fgets() keeps the newline; strip it before the lookup
    char **p = get_dest(name);
    int j = 0;
    while (p[j] != NULL) {
        printf("%s\n", p[j]);
        free(p[j]); // like malloc(), strdup'd() strings must be free'd when done
        j++; // advance only after freeing the element just printed
    }
    free(p);
    return 0;
}
dest[i] = malloc(srclen + 1);
You need to allocate memory for the pointer to char pointers (dest) as well as each char pointer stored in dest. In the code you provided, neither step is taken.
From the manpage of getenv:
Notes
...
As typically implemented, getenv() returns a pointer to a string
within the environment list. The caller must take care not to modify
this string, since that would change the environment of the process.
Your code violates that rule:
vp=getenv(name);
...
token =strtok(vp,s);
This is an illegal memory write operation.
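Putting the points from these answers together, one way to sketch the function is to tokenize a private copy of the value and grow the pointer array as tokens arrive. The names split_env and free_list are hypothetical, and strdup assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: POSIX system, for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split the value of environment variable `name`
 * on ':' and return a NULL-terminated array of heap-allocated strings,
 * or NULL if the variable is unset or memory runs out. */
char **split_env(const char *name)
{
    assert(name != NULL);

    const char *value = getenv(name);
    if (value == NULL)
        return NULL;

    char *copy = strdup(value);         /* tokenize a private copy, not the environment */
    if (copy == NULL)
        return NULL;

    char **list = NULL;
    size_t n = 0;

    for (char *tok = strtok(copy, ":"); ; tok = strtok(NULL, ":")) {
        char **grown = realloc(list, (n + 1) * sizeof *grown);
        if (grown == NULL)
            goto fail;
        list = grown;
        if (tok == NULL) {
            list[n] = NULL;             /* terminator that callers loop on */
            break;
        }
        if ((list[n] = strdup(tok)) == NULL)
            goto fail;
        n++;
    }
    free(copy);
    return list;

fail:
    for (size_t i = 0; i < n; i++)
        free(list[i]);
    free(list);
    free(copy);
    return NULL;
}

/* Release an array returned by split_env(). */
void free_list(char **list)
{
    if (list == NULL)
        return;
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}
```

A caller would loop over the result until it reaches the NULL entry and then hand the array to free_list. Since strtok skips empty fields, an empty PATH component (which some shells treat as the current directory) would be dropped by this sketch.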

Getting string with C function

I need to get strings dynamically but as I need to get more than one string, I need to use functions. So far I wrote this
(I put //**** at places i think might be wrong)
char* getstring(char *str);
int main() {
char *str;
strcpy(str,getstring(str));//*****
printf("\nString: %s", str);
return 0;
}
char* getstring(char str[]){//*****
//this part is copy paste from my teacher lol
char c;
int i = 0, j = 1;
str = (char*) malloc (sizeof(char));
printf("Input String:\n ");
while (c != '\n') {//as long as c is not "enter" copy to str
c = getc(stdin);
str = (char*)realloc(str, j * sizeof(char));
str[i] = c;
i++;
j++;
}
str[i] = '\0';//null at the end
printf("\nString: %s", str);
return str;//******
}
printf in the function is working but not back in main function.
I tried returning void, removing or adding *s, making another str2 and trying to strcpy there, or not using strcpy at all. Nothing seems to work. Am I missing something? Or maybe this is not possible at all?
//Thank you so much for your answers
The part that reads the string can be taken from this answer. Just pass '\n' as the argument to the getline function defined there:
char * p = getline('\n');
Three things: don't cast the result of malloc, check whether malloc/realloc succeeded, and remember that sizeof is an operator, not a function.
The problem is not with the function that you are using, but with the way you try copying its result into an uninitialized pointer.
Good news is that you don't have to copy - your function already allocates a string in dynamic memory, so you can copy the pointer directly:
char *str = getstring(str);
This should fix the crash. A few points to consider to make your function better:
main needs to free(str) when it is done in order to avoid memory leak
Store realloc result in a temporary pointer, and do a NULL check to handle out-of-memory situations properly
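The last point can be sketched as follows. The helper name read_line, the FILE * parameter and the doubling growth policy are illustrative choices, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from fp into a heap buffer that
 * grows as needed. Returns NULL on EOF with nothing read or on
 * allocation failure; the caller frees the result. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);

    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                              /* int, so EOF is distinguishable from a char */
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {           /* keep one byte free for the terminator */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);              /* the old block is still ours to free */
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

In main this would be used as char *str = read_line(stdin); followed by a NULL check and, once the string is no longer needed, free(str). Assigning the result of realloc straight back to buf would lose the only pointer to the old block if the call failed.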
There are three things to take away from the lesson as it stands now:
(1) You should have one way of returning the reference to the new string, either as an argument passed by reference to the function OR as a return value; you should not be implementing both.
(2) Because the subroutine your teacher gave you allocates memory on the heap, it will be available to any part of your program and you do not have to allocate any memory yourself. You should study the difference between heap memory, global memory, and automatic (stack) memory so you understand the differences between them and know how to work with each type.
(3) Because the memory is already allocated on the heap there is no need to copy the string.
Given these facts, your code can be simplified to something like the following:
int main() {
char *str = getstring();
printf( "\nString: %s", str );
return 0;
}
char* getstring(){
.... etc
Going forward, you want to think about how you de-allocate memory in your programs. For example, in this code the string is never de-allocated. It is a good habit to think about your strategy for de-allocating any memory that you allocate.
Let's simplify the code a bit:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* getstring()
{
    int c; /* int, so that EOF can be told apart from a real character */
    int i = 0, j = 2;
    char *str = NULL, *tmp;
    if ((str = malloc(sizeof(char))) == NULL)
        return NULL;
    printf("Input String: ");
    while ((c = getc(stdin)) != EOF) {
        if (c == '\n') break;
        /* keep the old pointer until realloc() is known to have succeeded */
        if ((tmp = realloc(str, j * sizeof(char))) == NULL) {
            free(str);
            return NULL;
        }
        str = tmp;
        str[i++] = c;
        j++;
    }
    str[i] = '\0';
    printf("getstring() String: %s\n", str);
    return str;
}
int main()
{
    char *str = getstring();
    if (str == NULL)
        return 1;
    printf("main() String: %s\n", str);
    free(str);
    return 0;
}
Then execute:
$ make teststring && ./teststring
cc teststring.c -o teststring
Input String: asdfasfasdf
getstring() String: asdfasfasdf
main() String: asdfasfasdf

Segmentation error in C while allocation memory

I am a total beginner at C programming and am trying to write a program that reads values from the "stat" file in /proc/. It works for the first few entries, but then it fails with "Segmentation fault (core dumped)".
So far I have found out that the error has to do with memory allocation, but I can't seem to find a way to fix it.
My code so far is:
char* readFile(char* filename)
{
FILE *fp;
struct stat buf;
fp=fopen(filename,"r");
stat(filename,&buf);
char *string = malloc(buf.st_size);
char *s;
while(!feof(fp))
{
s=malloc(1024);
fgets(s,1024,fp);
s[strlen(s)-1]='\0';
strcat(string,s);
}
return string;
}
char* readStat(char* path, int statNumber)
{
char* str = malloc(sizeof(readFile(path)));
str = readFile(path);
char * pch = malloc(sizeof(str));
char * vals;
pch = strtok (str," ");
int i = 1;
while (pch != NULL)
{
if(i == statNumber)
vals = pch;
pch = strtok(NULL, " ");
i++;
}
return vals;
}
1) the
s=malloc(1024);
should not be inside the while loop; it should be outside, before the loop starts.
And free it before leaving the function:
free(s);
2) add
string[0] = '\0';
just after
char *string = malloc(buf.st_size);
Otherwise the strcat will not work properly
3) You do not need to allocate memory for str pointer because the readFile function already did
char* str = malloc(sizeof(readFile(path)));
Just replace it with
char* str;
4) And also replace
char * pch = malloc(sizeof(str));
by
char * pch = str;
To start with, you don't allocate space for the terminator for the string variable. You also need to terminate it before you can use it as a destination for strcat.
To continue, when you do sizeof on a pointer, you get the size of the pointer and not what it points to. You have this problem in readStat.
You also have memory leaks: the block allocated for str in readStat is lost as soon as str is overwritten with the result of readFile (sizeof does not evaluate its operand, so readFile itself runs only once), and the memory allocated inside readFile is never freed. Oh, and one of the memory allocations in readFile is not needed at all.
And there's another memory leak in that you allocate memory for pch, but you lose that pointer when you assign the result of the strtok call. strtok returns a pointer into the string passed to it, so there is no need to allocate memory for pch (which you didn't attempt to free anyway).
s=malloc(1024); should not be in the loop. Allocate the memory once before the loop and reuse the same buffer on each iteration.
Also, you should make a habit of freeing memory once you are done with it.
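Taken together, the answers suggest a shape like the sketch below. The helper names read_all and nth_field are hypothetical. The sketch also avoids sizing the buffer from st_size, on the assumption that files under /proc commonly report a size of 0, and grows the buffer while reading instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the whole file into a NUL-terminated heap
 * buffer without trusting st_size. Returns NULL on any failure. */
char *read_all(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t cap = 1024, len = 0, got;
    char *buf = malloc(cap);

    /* Always leave one spare byte for the terminator. */
    while (buf != NULL && (got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (len + 1 == cap) {           /* buffer is full: double it */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                buf = NULL;             /* ends the loop and reports failure */
            } else {
                buf = grown;
                cap *= 2;
            }
        }
    }
    fclose(fp);
    if (buf != NULL)
        buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: return the n-th (1-based) whitespace-separated
 * field of text, or NULL if there are fewer than n fields.
 * text is modified in place and the result points into it. */
char *nth_field(char *text, int n)
{
    assert(text != NULL);

    int i = 1;
    for (char *tok = strtok(text, " \n"); tok != NULL; tok = strtok(NULL, " \n"), i++) {
        if (i == n)
            return tok;
    }
    return NULL;
}
```

The caller keeps the pointer returned by read_all, uses the field, and frees the buffer afterwards; the field pointer must not be used after that free. Splitting on spaces is a simplification carried over from the question and would mis-count fields if a field itself contained a space.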

how to put char * into array so that I can use it in qsort, and then move on to the next line

I have a lineget function that returns a char * (it detects '\n'), or NULL on EOF.
In main() I'm trying to recognize particular words from that line.
I used strtok:
int main(int argc, char **argv)
{
char *line, *ptr;
FILE *infile;
FILE *outfile;
char **helper = NULL;
int strtoks = 0;
void *temp;
infile=fopen(argv[1],"r");
outfile=fopen(argv[2],"w");
while(((line=readline(infile))!=NULL))
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if(temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
} else {
helper=temp;
}
while (ptr != NULL) {
strtoks++;
fputs(ptr, outfile);
fputc(' ', outfile);
ptr = strtok(NULL, " ");
helper[strtoks-1] = ptr;
}
/*fputs(line, outfile);*/
free(line);
}
fclose(infile);
fclose(outfile);
return 0;
}
Now I have no idea how to put each of the tokenized words into an array (I created char ** helper for that purpose), so that it can be used in qsort like qsort(helper, strtoks, sizeof(char*), compare_string);.
Secondly, even if that worked, I don't know how to clear the array for that line and move on to sorting the next one. How do I do that?
I even crashed valgrind (with the code presented above) -> "valgrind: the 'impossible' happened:
Killed by fatal signal"
Where is the mistake ?
The most obvious problem (there may be others) is that you're reallocating helper to the value of strtoks at the beginning of the line, but then incrementing strtoks and adding to the array at higher values of strtoks. For instance, on the first line, strtoks is 0, so temp = realloc(helper, (strtoks)*sizeof(char *)); leaves helper as NULL, but then you try to add every word on that line to the helper array.
I'd suggest an entirely different approach which is conceptually simpler:
char buf[1000]; // assumed to be bigger than any word you'll encounter
char **helper = NULL;
char **temp;
int ch, i = 0, numwords = 0;
do {
    ch = fgetc(infile); // ch is an int so that EOF can be told apart from any char
    if (ch != ' ' && ch != '\n' && ch != EOF) {
        // collect chars one at a time until we hit a space, a newline or EOF
        if (i < (int)sizeof buf - 1)
            buf[i++] = ch; // add char to buffer, dropping anything that would overflow
    } else if (i > 0) { // a separator ends the current word
        buf[i] = '\0'; // terminate with null byte
        temp = realloc(helper, (numwords + 1) * sizeof(char *)); // expand helper to fit one more word
        if (temp == NULL)
            break; // out of memory: helper is still valid and holds numwords words
        helper = temp;
        helper[numwords++] = strdup(buf); // copy current contents of buf to the just-created element
        i = 0; // start the next word
    }
} while (ch != EOF);
I haven't tested this so let me know if it's not correct or there's anything you don't understand. I've left out the opening and closing of files and the freeing at the end (remember you have to free every element of helper before you free helper itself).
As you can see in strtok's prototype:
char * strtok ( char * str, const char * delimiters );
...str is not const. What strtok actually does is replace each delimiter it finds in str with a null byte (\0) and return a pointer to the beginning of the token.
For example:
char in[] = "foo bar baz";
char *toks[3];
toks[0] = strtok(in, " ");
toks[1] = strtok(NULL, " ");
toks[2] = strtok(NULL, " ");
printf("%p %s\n%p %s\n%p %s\n", toks[0], toks[0], toks[1], toks[1],
toks[2], toks[2]);
printf("%p %s\n%p %s\n%p %s\n", &in[0], &in[0], &in[4], &in[4],
&in[8], &in[8]);
Now look at the results:
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
As you can see, toks[1] and &in[4] point to the same location: the original str has been modified, and in reality all tokens in toks point to somewhere in str.
In your case your problem is that you free line:
free(line);
...invalidating all your pointers in helper. If you (or qsort) try to access helper[0] after freeing line, you end up accessing freed memory.
You should copy the tokens instead, e.g.:
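helper[strtoks-1] = malloc(strlen(ptr) + 1); /* copy the current token first */
if (helper[strtoks-1] != NULL)
    strcpy(helper[strtoks-1], ptr);
ptr = strtok(NULL, " "); /* only then advance; ptr may become NULL here */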
Obviously, you will need to free each element of helper afterwards (in addition to helper itself).
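As a rough illustration of that advice, the sketch below duplicates each token, sorts the copies with qsort and frees them afterwards. The function names sorted_words and free_words are hypothetical, and compare_string is only a guess at what the question's comparator looks like; strdup assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: POSIX system, for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements,
 * i.e. pointers to char *, not the strings themselves. */
static int compare_string(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Free every copied word, then the array itself. */
void free_words(char **words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(words[i]);
    free(words);
}

/* Hypothetical helper: split line on spaces, duplicate each word and
 * return the copies sorted. *count receives the number of words.
 * Returns NULL if there are no words or memory runs out. line is
 * modified, but the caller may free it as soon as this returns. */
char **sorted_words(char *line, size_t *count)
{
    assert(line != NULL && count != NULL);

    char **words = NULL;
    size_t n = 0;

    for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
        char **grown = realloc(words, (n + 1) * sizeof *grown);
        char *copy = (grown != NULL) ? strdup(tok) : NULL;
        if (grown != NULL)
            words = grown;
        if (copy == NULL) {             /* either allocation failed */
            free_words(words, n);
            *count = 0;
            return NULL;
        }
        words[n++] = copy;
    }
    *count = n;
    if (n > 0)
        qsort(words, n, sizeof *words, compare_string);
    return words;
}
```

In the question's loop this would be called once per line, the sorted words written out, and free_words called before reading the next line, which also answers how to "clear" the array between lines.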
You should be getting a 'Bad alloc' error because:
char **helper = NULL;
int strtoks = 0;
...
while ((line = readline(infile)) != NULL) /* Fewer, but sufficient, parentheses */
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if (temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
}
This is because the value of strtoks is zero, so you are asking realloc() to free the memory pointed at by helper (which was itself a null pointer). One outside chance is that your library crashes on realloc(0, 0), which it shouldn't but it is a curious edge case that might have been overlooked. The other possibility is that realloc(0, 0) returns a non-null pointer to 0 bytes of data which you are not allowed to dereference. When your code dereferences it, it crashes. Both returning NULL and returning non-NULL are allowed by the C standard; don't write code that crashes regardless of which behaviour realloc() shows. (If your implementation of realloc() does not return a non-NULL pointer for realloc(0, 0), then I'm suspicious that you aren't showing us exactly the code that managed to crash valgrind (which is a fair achievement — congratulations) because you aren't seeing the program terminate under control as it should if realloc(0, 0) returns NULL.)
You should be able to avoid that problem if you use:
temp = realloc(helper, (strtoks+1) * sizeof(char *));
Don't forget to increment strtoks itself at some point.

Strtok and Strcat conflict

I am trying to work with strtok and strcat but the second printf never shows up. Here is the code:
int i = 0;
char *token[128];
token[i] = strtok(tmp, "/");
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(token[0], token[i]);
printf("%s", token[i]);
i++;
}
If my input is 1/2/3/4/5/6 for tmp then the console output would be 13456. The 2 is always missing. Does anyone know how to fix this?
The two is always missing because on the first iteration of your loop you overwrite it with the call to strcat.
On entry to the loop your buffer contains "1\02\03/4/5/6", the internal strtok pointer is pointing to "3", and token[1] points to "2".
You then call strcat, which turns the buffer into "12\0\03/4/5/6", so your token[i] pointer is now pointing to "\0". The first printf in the loop therefore prints nothing.
Subsequent calls are OK because the null characters do not overwrite the input data.
To fix it you should build up your output string into a second buffer, not the one you are parsing.
A working(?) version:
#include <stdio.h>
#include <string.h>
int main(void)
{
int i = 0;
char *token[128];
char tmp[128];
char removed[128] = {0};
strcpy(tmp, "1/2/3/4/5/6");
token[i] = strtok(tmp, "/");
strcat(removed, token[i]);
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(removed, token[i]);
printf("%s", token[i]);
i++;
}
return (0);
}
strtok modifies the input string in place and returns pointers to that string. You then take one of those pointers (token[0]) and pass it to another operation (strcat) that writes to that pointer. The writes are clobbering each other.
If you want to concatenate all the tokens, you should allocate a separate buffer and copy them into that instead.
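A minimal sketch of that idea, using a hypothetical helper named join_tokens that sizes the output buffer from the input length:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding every
 * token of input (split on any character in delims) joined together.
 * input is modified by strtok(); the caller frees the result. */
char *join_tokens(char *input, const char *delims)
{
    assert(input != NULL && delims != NULL);

    /* The result can never be longer than the input itself. */
    char *out = malloc(strlen(input) + 1);
    if (out == NULL)
        return NULL;

    size_t len = 0;
    for (char *tok = strtok(input, delims); tok != NULL; tok = strtok(NULL, delims)) {
        size_t n = strlen(tok);
        memcpy(out + len, tok, n);      /* append to out, never to input */
        len += n;
    }
    out[len] = '\0';
    return out;
}
```

Tracking the output length in len avoids rescanning the destination on every append, which is what repeated strcat calls would do.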

Resources