Parser generating segfaults in C

I've been trying to get this to work for the past 2 weeks to no avail.
I have a project to create a shell that implements parsing and built-in commands. The issue I'm having is that when I pass a char* to my parse function and it returns, I get a segfault as soon as I try to access any part of the result. I've tried different methods, including a struct holding a char**, all with the same problem, so I'm guessing it's an issue with my parser. I would appreciate any help.
Code for parser.c:
#define BUFSIZE 1024
#define TOK_BUFSIZE 64
#define TOK_DELIM " \t\r\n\a"
char*** Parse(char *line0){
char* null_ptr = 0;
char*** cmd = malloc(MAX_SIZE * sizeof(char**));
/*
char arg[] = argument
char* argv[] = argument array
char** cmd[] = array of argument arrays
*/
int bufsize = MAX_SIZE, cmdp = 0, argp = 0, com = FALSE, redir = FALSE;
char *token;
char* line = malloc(100*sizeof(char));
strcpy(line,line0);
token = strtok(line, TOK_DELIM);
while (token){
if (*token == ';'){ // new command string
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[0]=tmp1;
cmd[cmdp] = tmpa;
argp = 0;
cmdp++;
com = FALSE;
redir = FALSE;
}
else if (*token == '>' || *token == '<' || token == ">>"){ // redirects
argp = 0;
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[argp]=tmp1;
argp++;
printf("Redirect: %s\n",tmp1);
com = FALSE;
redir = TRUE;
}
else if (*token == '|'){ // pipe
printf("PIPE\n");
cmdp++;
argp = 0;
com = FALSE;
}
else if (redir){ // redirect file name
// redirect token stored in arg[]
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[argp]=tmp1;
cmd[cmdp]=tmpa;
argp = 0;
cmdp++;
redir = FALSE;
com = FALSE;
printf("File: %s\n", token);
}
else if (token == "&") // background
{
cmdp++;
argp = 0;
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[0]=tmp1;
cmd[cmdp]=tmpa;
printf("Background");
}
else if (!com && !redir){ // command entered
argp = 0;
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[argp] = tmp1;
argp++;
printf("Command %s\n", token);
com = TRUE;
}
else if (com){ // argument to command, all other redirects and pipes taken care of
char* tmp1 = malloc(BUFSIZE * sizeof(char));
char** tmpa = malloc(BUFSIZE * sizeof(char*));
strcpy(tmp1, token);
tmp1[sizeof(token)] = null_ptr;
tmpa[argp] = tmp1;
argp++;
printf("Argument: %s\n", token);
//cmd[cmdp] = argv; // save current working argument array
//cmdp++;
}
// end of if else statements
token = strtok(NULL, TOK_DELIM);
} // end of while
cmdp++;
cmd[cmdp] = NULL;
return &cmd;
}

When I compiled your code on the command-line by typing in:
gcc /path/to/yourcodefilename.c -Wall -Wextra
(replacing /path/to/yourcodefilename.c with the actual name of the file containing the main function that eventually calls your function; mine is test2.c), I received several errors and warnings. The first was:
./test2.c:21: error: 'aaa' undeclared (first use in this function)
./test2.c:21: error: (Each undeclared identifier is reported only once
./test2.c:21: error: for each function it appears in.)
I received several of those. Here "aaa" stands for the name of something you used inside your function that has not been defined, which includes TRUE and FALSE. To correct this, you can put the following at the top of your program:
#define FALSE n
#define TRUE y
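where n and y are numbers representing false and true respectively (conventionally 0 and 1). Another way to correct it is to include a header that defines them; the standard <stdbool.h> provides lowercase true and false (C99 and later), so the identifiers in the code would need to change to match.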
The second thing I noticed on a few lines is:
warning: assignment makes integer from pointer without a cast
Make sure you don't convert data over from one type to another. For example, don't set a character variable to a pointer value.
For example, change:
tmp1[sizeof(token)] = null_ptr;
to:
tmp1[sizeof(token)] = '\0';
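This is because indexing a char* yields a char, while null_ptr is of type char*, and char* and char are not the same type. What I did was assign a null character, which is a char. Note also that sizeof(token) is the size of the pointer (typically 4 or 8 bytes), not the length of the string; strcpy already copies the terminating null character, so the line can simply be removed.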
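As a small illustration of the difference between the size of a pointer and the length of the string it points to, here is a minimal sketch. The two function names are made up for this example and are not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* sizeof applied to a char* gives the size of the pointer itself,
   no matter how long the string it points to is. */
size_t size_of_pointer(const char *token)
{
    return sizeof(token);
}

/* strlen counts the characters before the terminating '\0'. */
size_t length_of_string(const char *token)
{
    return strlen(token);
}
```

So an index computed with sizeof(token) lands at the same fixed offset for every token, whereas strlen(token) gives the position of the terminator.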
I hope that helps you with some troubleshooting

There are several issues here:
You allocate cmd and its subarrays. You return an address to that array at the end of the function. The address has type char ****, which is not the correct return type. What's worse: That address is the address of a local variable, which goes out of scope immediately after returning. Return the handle you got from malloc instead:
char ***Parse(char *line0)
{
char ***cmd = malloc(MAX_SIZE * sizeof(*cmd));
// fill cmd
return cmd;
}
Your code is needlessly long, mostly because you code the steps to allocate memory, copy a string and null-terminate it explicitly. (Others have pointed out that you don't do the null termination properly. You also allocate a fixed size of 1024 bytes regardless of the actual string length, which is quite wasteful.) You could write a function to duplicate strings or use the non-standard, but widely available strdup; this would make your code easier to read.
All the temporary allocations are hard to follow. For example, in the branch if (!com && !redir), you allocate to tmpa, but you never store that value in cmd. The same goes for the redirection branch.
It's also not clear when you start a new command. There should be a new command just before parsing the first token, after encountering a pipe or after encountering a semicolon. You also start new commands for redirections and the background ampersand.
The comparison token == ">>" will always be false: token is an address in line and ">>" is a string literal stored in static memory, so the two pointers can never be equal. You should use strcmp to compare two strings.
In general, you want to allocate a new list when cmdp increases. In that case, argp is reset to zero. Otherwise, you just append to the current command.
I think that you complicate things by treating everything as special. I recommend to simplify the code and leave redirection and the background for the moment. They can easily be resolved when the command is called. (Your code sets the state with redir and com, but it never enforces file names after redirection, for example. You can do that easily when all the tokens are in place.)
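Regarding the comparison with ">>" mentioned above, a minimal sketch of a content comparison could look like this. The helper name token_is is an assumption made for illustration only.

```c
#include <string.h>

/* Returns 1 if the characters of token match text exactly, 0 otherwise.
   Comparing with == would compare the two addresses instead. */
int token_is(const char *token, const char *text)
{
    return strcmp(token, text) == 0;
}
```

With such a helper, the redirect test could be written as token_is(token, ">") || token_is(token, "<") || token_is(token, ">>").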
The code below treats only pipes and semicolons as command separators. When the command is a pipe, the pipe token is prepended to the following command:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MAX_SIZE 32
#define TOK_DELIM " \t\r\n\a"
char *sdup(const char *str)
{
size_t len = strlen(str);
char *dup = malloc(len + 1);
if (dup) {
memcpy(dup, str, len);
dup[len] = '\0';
}
return dup;
}
char ***parse(char *line0)
{
char *token;
char *line = sdup(line0);
token = strtok(line, TOK_DELIM);
if (token == NULL) {
free(line); // nothing to parse: release the scratch copy
return NULL;
}
char ***cmd = malloc(MAX_SIZE * sizeof(char **));
int cmdp = 0;
int argp = 0;
cmd[0] = malloc(MAX_SIZE * sizeof(*cmd[0]));
while (token) {
if (strcmp(token, ";") == 0 || strcmp(token, "|") == 0) {
// begin new command
// stop if there is no room for another command plus the NULL terminator
if (cmdp + 2 >= MAX_SIZE) break;
cmd[cmdp][argp] = NULL;
cmdp++;
argp = 0;
cmd[cmdp] = malloc(MAX_SIZE * sizeof(*cmd[0]));
// prepend pipe token
if (*token == '|') {
cmd[cmdp][argp++] = sdup(token);
}
} else {
// append to current command
if (argp + 1 < MAX_SIZE) {
cmd[cmdp][argp++] = sdup(token);
}
}
token = strtok(NULL, TOK_DELIM);
}
// null-terminate arg and cmd lists
cmd[cmdp][argp] = NULL;
cmdp++;
cmd[cmdp] = NULL;
free(line); // every token was copied with sdup, so the scratch copy can be released
return cmd;
}
int main()
{
char ***cmd = parse("echo start ; ls -l | wc > output ; echo stop");
char ***p = cmd;
while (*p) {
char **q = *p;
while (*q) {
printf("'%s' ", *q);
free(*q);
q++;
}
puts("");
free(*p);
p++;
}
free(cmd);
return 0;
}
Further remarks:
I'm not sure whether the current format is suited to the task. It might be better to have a tree structure that takes care of pipes, semicolons and maybe also && and ||, and then have leaf nodes with the commands where the arguments are linked lists.
Tokenisation with strtok requires white-space between all tokens, but punctuation can usually be written without explicit space, e.g.: "./a.out>kk&". So you will need a better method of parsing.
At the moment, you allocate space for each string, which you must free later. If you create a token struct that describes a token as a read-only view into the original string, you can do without the allocations. The views are not null-terminated, though, so you will need comparison methods that work on a starting pointer plus a length, for example.
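The last two remarks can be sketched together: a tokenizer that does not rely on whitespace around punctuation and that returns read-only views instead of copies. This is only a sketch under assumptions: the names token_view, next_token and view_equals are invented here, ">>" is the only two-character operator recognised, and quoting is not handled at all.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A read-only view into the input line: start pointer plus length. */
struct token_view {
    const char *start;
    size_t len;
};

/* Characters that form a token on their own, even without spaces around them. */
static int is_punct_token(char c)
{
    return c == ';' || c == '|' || c == '&' || c == '<' || c == '>';
}

/* Reads the next token starting at *cursor and advances *cursor past it.
   Returns 1 if a token was found, 0 at the end of the input. */
int next_token(const char **cursor, struct token_view *out)
{
    const char *p = *cursor;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }

    out->start = p;
    if (is_punct_token(*p)) {
        /* ">>" is treated as one token, every other operator as one character */
        p += (p[0] == '>' && p[1] == '>') ? 2 : 1;
    } else {
        while (*p != '\0' && !isspace((unsigned char)*p) && !is_punct_token(*p))
            p++;
    }
    out->len = (size_t)(p - out->start);
    *cursor = p;
    return 1;
}

/* Compares a view (not null-terminated) with an ordinary C string. */
int view_equals(struct token_view v, const char *text)
{
    return strlen(text) == v.len && memcmp(v.start, text, v.len) == 0;
}
```

Because the views point into the caller's line, nothing has to be freed, but the line must stay alive and unchanged for as long as the views are in use.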

Related

Strange behaviour in assigning char**[i]

While debugging a simple shell application, I came across a strange bug when a command has >2 parameters. I tracked it down to this function (string[] is the command line, e.g. echo one two and sep is the character to split the string by, set to ' ' where it is called):
char **split(char string[], char *sep) {
char *token = strtok(string, sep);
char **argv = calloc(1, sizeof(char*));
int i = 0;
while (token != NULL) {
argv = realloc(argv, sizeof(argv) + sizeof(char*));
argv[i] = calloc(strlen(token), sizeof(char));
strcpy(argv[i++], token);
token = strtok(NULL, sep);
}
argv[i] = NULL; // sets argv[0] even if i == 4
return argv;
}
It runs fine with <=2 parameters, splitting a string into a char** and null-terminating it. However, with 3 parameters, argv[0] ends up getting set to NULL at the end instead of the last element.
Am I doing something stupid or is there undefined behaviour?
Edit: >=4 parameters crashes the program:
realloc(): invalid next size
signal: aborted (core dumped)
You have three bugs in your code.
strlen() gives the string length without the null terminator. So you have to write
calloc(strlen(token) + 1, sizeof(char)); to make space for it.
sizeof(argv) returns the size of the pointer (not the number of elements) and is constant at compile time.
argv[i] = NULL; is out of bounds
Working code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **split(char string[], char *sep)
{
char *token = strtok(string, sep);
char **argv = calloc(1, sizeof(char *));
int i = 0;
while(token != NULL)
{
argv = realloc(argv, (i + 1) * sizeof(char *));
argv[i] = calloc(strlen(token) + 1, sizeof(char));
strcpy(argv[i++], token);
token = strtok(NULL, sep);
}
argv = realloc(argv, (i + 1) * sizeof(char *));
argv[i] = NULL;
return argv;
}
int main(void)
{
char s[] = "hello world test word";
char **result = split(s, " ");
printf("[0] = %s\n", result[0]);
printf("[1] = %s\n", result[1]);
printf("[2] = %s\n", result[2]);
printf("[3] = %s\n", result[3]);
return 0;
}
I recommend using gdb and valgrind for debugging such problems.
The problem is on these two lines:
argv = realloc(argv, sizeof(argv) + sizeof(char*));
argv[i] = calloc(strlen(token), sizeof(char));
sizeof(argv) gives you the size of a pointer (usually 4 or 8), not the number of items it points to. So when you call realloc you're always asking for the same number of bytes, which happens to be enough for two array items. Then when you have more than 2 you write past the bounds of allocated memory triggering undefined behavior.
Since i is keeping track of the length of the array, use that when calling realloc, and you need to multiply by the element size instead of add.
argv = realloc(argv, (i + 1) * sizeof(char*));
On the next line, you allocate enough space for the characters in the string but not enough for the terminating null byte, so add 1 for that:
argv[i] = calloc(strlen(token) + 1, sizeof(char));
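Neither version above checks the result of realloc. A possible variant that grows the array through a temporary pointer and cleans up on failure is sketched below; the names split_checked and free_split are assumptions for this example, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated array produced by split_checked. */
void free_split(char **argv)
{
    if (argv == NULL)
        return;
    for (size_t i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}

/* Splits string in place with strtok and returns a NULL-terminated array
   of copies, or NULL if any allocation fails. */
char **split_checked(char *string, const char *sep)
{
    size_t count = 0;
    char **argv = malloc(sizeof *argv);   /* room for the terminator only */
    if (argv == NULL)
        return NULL;
    argv[0] = NULL;

    for (char *token = strtok(string, sep); token != NULL; token = strtok(NULL, sep)) {
        /* existing entries + the new entry + the terminator */
        char **grown = realloc(argv, (count + 2) * sizeof *argv);
        if (grown == NULL) {
            free_split(argv);             /* the old block is still valid */
            return NULL;
        }
        argv = grown;

        argv[count] = malloc(strlen(token) + 1);
        if (argv[count] == NULL) {
            free_split(argv);             /* argv[count] == NULL ends the list */
            return NULL;
        }
        strcpy(argv[count], token);
        argv[++count] = NULL;             /* keep the array terminated at all times */
    }
    return argv;
}
```

Assigning the result of realloc to a temporary first means the original block is not lost when the call fails.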

What is the proper way to use strtok()?

Just to clarify, I'm a complete novice in C programming.
I have a tokenize function and it isn't behaving the way I expect. I'm trying to read from the FIFO (named pipe) that is passed by the client side; this is the server side. The client side reads a file and passes it to the FIFO. The problem is that tokenize doesn't return a format that execvp can process: running gdb tells me that it fails when calling the execute function in main(). (The append function appends a char to the string.)
One bug is that tokens is neither initialized nor allocated any memory.
Here is an example on how to initialize and allocate memory for tokens:
char **tokenize(char *line){
line = append(line,'\0');
int i = 0, tlen = 0;
char **tokens = NULL, *line2, *token, *delimiter;
delimiter = " \t";
token = strtok(line,delimiter);
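while (token != NULL) {
if (i + 1 >= tlen) {
// Allocate more space, always keeping one slot free for the terminating NULL
tlen += 10;
tokens = realloc(tokens, tlen * sizeof *tokens);
if (tokens == NULL) {
exit(1);
}
}
tokens[i] = token;
token = strtok(NULL, delimiter);
i += 1;
}
if (tokens == NULL) {
// No tokens at all: still return a valid, empty NULL-terminated array
tokens = malloc(sizeof *tokens);
if (tokens == NULL) {
exit(1);
}
}
tokens[i] = NULL;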
return tokens;
}
This code will allocate memory for 10 tokens at a time. If the memory allocation fails, it will end the program with a non-zero return value to indicate failure.

C string not being filled in function call

I'm attempting to fill a few strings using a function, but the strings don't seem to be getting filled properly. The print statement is just 4 empty lines. BUT if I un-comment the char ** pls line, it prints all three strings properly even if I never use the variable pls anywhere. It also runs properly in debug mode without the pls variable existing. I'm not entirely sure what I did that it isn't happy about.
char * dataFile = (char *) calloc(64, sizeof(char));
//char ** pls = &dataFile;
char * queryFile = (char *) calloc(64, sizeof(char));
char * outFile = (char *) calloc(64, sizeof(char));
for(i = 1; i <argc; ++i)
{
char command[3];
char * iterator = argv[i];
command[0] = *iterator;
++iterator;
command[1] = *iterator;
++iterator;
command[2] = *iterator;
if(strcmp(command, "df=") == 0)
determineFileString(iterator, &dataFile);
else if(strcmp(command, "if=") == 0)
determineFileString(iterator, &queryFile);
else if(strcmp(command, "of=") == 0)
determineFileString(iterator, &outFile);
}
printf("%s\n%s\n%s\n", dataFile, queryFile, outFile);
void determineFileString(char * iterator, char ** file)
{
char * p = *file;
++iterator;
while(*iterator != '\0')
{
*p = *iterator;
++p;
++iterator;
}
*p = '\0';
}
You are calling strcmp but the first operand does not point to a string. A string is defined as some characters followed by a null terminator.
Your code will also cause undefined behaviour if an argv[i] string is shorter than 2 characters, because you always copy 3 characters out of it.
To fix this, either make command bigger and put a null terminator on the end, or use memcmp instead of strcmp. (But be careful with memcmp as it also causes UB if both objects are not at least as big as the size).
Here is a possible fix:
for(i = 1; i <argc; ++i)
{
if ( strlen(argv[i]) < 3 )
continue;
if ( memcmp(argv[i], "df=", 3) == 0 )
determineFileString(argv[i] + 3, &dataFile);
else if // etc.
}
BTW, the determineFileString function does not do any buffer size checking (it could buffer overflow). I'd suggest redesigning this function; perhaps it could do a length check and call realloc inside the function.
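One way to combine the prefix check and the length check is sketched below. It assumes the caller passes the size of the destination buffer; the function name copy_option_value is hypothetical and not from the original code.

```c
#include <stddef.h>
#include <string.h>

/* Copies the part of arg that follows prefix into out.
   Returns 0 on success, or -1 if arg does not start with prefix
   or the value (including its terminator) does not fit in out_size bytes. */
int copy_option_value(const char *arg, const char *prefix, char *out, size_t out_size)
{
    size_t prefix_len = strlen(prefix);

    /* strncmp stops at the end of arg, so short arguments are safe */
    if (strncmp(arg, prefix, prefix_len) != 0)
        return -1;

    const char *value = arg + prefix_len;
    size_t value_len = strlen(value);
    if (value_len + 1 > out_size)
        return -1;

    memcpy(out, value, value_len + 1);   /* includes the terminating '\0' */
    return 0;
}
```

With the 64-byte buffers from the question, a call might look like copy_option_value(argv[i], "df=", dataFile, 64).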

Split and Join strings in C Language

I learnt C in uni but haven't used it for quite a few years. Recently I started working on a tool which uses C as the programming language. Now I'm stuck with some really basic functions. Among them are how to split and join strings using a delimiter? (I miss Python so much, even Java or C#!)
Below is the function I created to split a string, but it does not seem to work properly. Also, even if this function works, the delimiter can only be a single character. How can I use a string as a delimiter?
Can someone please provide some help?
Ideally, I would like to have 2 functions:
// Split a string into a string array
char** fSplitStr(char *str, const char *delimiter);
// Join the elements of a string array to a single string
char* fJoinStr(char **str, const char *delimiter);
Thank you,
Allen
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
char** fSplitStr(char *str, const char *delimiters)
{
char * token;
char **tokenArray;
int count=0;
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
if (!token) {
return tokenArray;
}
while (token != NULL ) { // While valid tokens are returned
tokenArray[count] = (char*)malloc(sizeof(token));
tokenArray[count] = token;
printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * count);
token = (char *)strtok(NULL, delimiters); // Get the next token
}
return tokenArray;
}
int main (void)
{
char str[] = "Split_The_String";
char ** splitArray = fSplitStr(str,"_");
printf ("%s", splitArray[0]);
printf ("%s", splitArray[1]);
printf ("%s", splitArray[2]);
return 0;
}
Answers: (Thanks to Moshbear, Joachim and sarnold):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char** fStrSplit(char *str, const char *delimiters)
{
char * token;
char **tokenArray;
int count=0;
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
tokenArray[0] = NULL;
if (!token) {
return tokenArray;
}
while (token != NULL) { // While valid tokens are returned
tokenArray[count] = (char*)strdup(token);
//printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * (count + 1));
token = (char *)strtok(NULL, delimiters); // Get the next token
}
tokenArray[count] = NULL; /* Terminate the array */
return tokenArray;
}
char* fStrJoin(char **str, const char *delimiters)
{
char *joinedStr;
int i = 1;
if (str[0] == NULL){
// empty array: return an empty string that the caller can still free
joinedStr = calloc(1, 1);
return joinedStr;
}
joinedStr = realloc(NULL, strlen(str[0])+1);
strcpy(joinedStr, str[0]);
while (str[i] !=NULL){
joinedStr = (char*)realloc(joinedStr, strlen(joinedStr) + strlen(str[i]) + strlen(delimiters) + 1);
strcat(joinedStr, delimiters);
strcat(joinedStr, str[i]);
i++;
}
return joinedStr;
}
int main (void)
{
char str[] = "Split_The_String";
char ** splitArray = (char **)fStrSplit(str,"_");
char * joinedStr;
int i=0;
while (splitArray[i]!=NULL) {
printf ("%s", splitArray[i]);
i++;
}
joinedStr = fStrJoin(splitArray, "-");
printf ("%s", joinedStr);
return 0;
}
Use strpbrk instead of strtok, because strtok suffers from two weaknesses:
it's not re-entrant (it keeps hidden state between calls, so it is not thread-safe either)
it modifies the string
For joining, use strncat to append and realloc to resize.
The order of operations is very important.
Before entering the realloc/strncat loop, set the 0th element of the target string to '\0' so that strncat won't cause undefined behavior.
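The strpbrk suggestion could be sketched roughly as follows. The helper next_field is an invented name, and its behaviour differs from strtok in one respect: two adjacent delimiters produce an empty field instead of being skipped.

```c
#include <stddef.h>
#include <string.h>

/* Returns the start of the next field and stores its length in *len,
   or returns NULL when the input is exhausted. The input string is
   never modified; *cursor carries the position between calls. */
const char *next_field(const char **cursor, const char *delims, size_t *len)
{
    const char *start = *cursor;
    if (start == NULL)
        return NULL;

    const char *end = strpbrk(start, delims);   /* first delimiter, if any */
    if (end != NULL) {
        *len = (size_t)(end - start);
        *cursor = end + 1;                       /* continue after the delimiter */
    } else {
        *len = strlen(start);
        *cursor = NULL;                          /* this was the last field */
    }
    return start;
}
```

Since the state lives in the caller's cursor variable rather than in a hidden static, several strings can be scanned at the same time.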
For starters, don't use sizeof to get the length of a string. strlen is the function to use. In this case strdup is better.
And you don't actually copy the string returned by strtok, you copy the pointer. Change your loop to this:
while (token != NULL) { // While valid tokens are returned
tokenArray[count] = strdup(token);
printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * (count + 1)); // one extra slot for the next token or the final NULL
token = (char *)strtok(NULL, delimiters); // Get the next token
}
tokenArray[count] = NULL; /* Terminate the array */
Also, don't forget to free the entries in the array, and the array itself when you're done with it.
Edit At the beginning of fSplitStr, wait with allocating the tokenArray until after you check that token is not NULL, and if token is NULL why not return NULL?
I'm not sure what the best solution is for you, but I do have a few notes:
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
if (!token) {
return tokenArray;
}
At this point, if you weren't able to find any tokens in the string, you return a pointer to an "array" that is large enough to hold a single character pointer. It is un-initialized, so it would not be a good idea to use the contents of this array in any way. C almost never initializes memory to 0x00 for you. (calloc(3) would do that for you, but since you need to overwrite every element anyway, it doesn't seem worth switching to calloc(3).)
Also, the (char **) cast before the malloc(3) call indicates to me that you've probably forgotten the #include <stdlib.h> that would properly prototype malloc(3). (The cast was necessary before about 1989.)
Do note that your while() { } loop is setting pointers to the parts of the original input string to your tokenArray elements. (This is one of the cons that moshbear mentioned in his answer -- though it isn't always a weakness.) If you change tokenArray[1][1]='H', then your original input string also changes. (In addition to having each of the delimiter characters replaced with an ASCII NUL character.)

strtok and memory leaks

I wrote a simple url parser using strtok(). here's the code
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char *protocol;
char *host;
int port;
char *path;
} aUrl;
void parse_url(char *url, aUrl *ret) {
printf("Parsing %s\n", url);
char *tmp = (char *)_strdup(url);
//char *protocol, *host, *port, *path;
int len = 0;
// protocol is now, for example, http: or https:
ret->protocol = (char *) strtok(tmp, "/");
len = strlen(ret->protocol) + 2;
ret->host = (char *) strtok(NULL, "/");
len += strlen(ret->host);
//printf("char at %d => %c", len, url[len]);
ret->path = (char *)_strdup(&url[len]);
ret->path = (char *) strtok(ret->path, "#");
ret->protocol = (char *) strtok(ret->protocol, ":");
// host is now, for example, address.com:8080
//tmp = (char *)_strdup(host);
//strtok(tmp, ":");
ret->host = (char *) strtok(ret->host, ":");
tmp = (char *) strtok(NULL, ":");
if(tmp == NULL) {
if(strcmp(ret->protocol, "http") == 0) {
ret->port = 80;
} else if(strcmp(ret->protocol, "https") == 0) {
ret->port = 443;
}
} else {
ret->port = atoi(tmp);
}
//host = (char *) strtok(NULL, "/");
}
/*
*
*/
int main(int argc, char** argv) {
printf("hello moto\n");
aUrl myUrl;
parse_url("http://teste.com/Teste/asdf#coisa", &myUrl);
printf("protocol is %s\nhost is %s\nport is %d\npath is %s\n", myUrl.protocol, myUrl.host, myUrl.port, myUrl.path);
return (EXIT_SUCCESS);
}
As you can see, I use strtok() a lot so I can "slice" the url. I don't need to support urls different than http or https so the way it's done solves all of my problems.
My concern is (this is running on an embedded device) - Am I wasting memory ?
When I write something like
ret->protocol = (char *) strtok(tmp, "/");
And then later call
ret->protocol = (char *) strtok(ret->protocol, ":");
Does the first pointer that ret->protocol held remain in memory? I thought that maybe I should assign the result of the first call to a tmp pointer, call strtok again to point ret->protocol at the right portion of the string (the second call), and then free(tmp).
What should be the best way to use strtok ?
To answer your question directly: strtok only returns a pointer to a location inside the string you give it as input. It doesn't allocate new memory for you, so you shouldn't need to call free on any of the pointers it gives you back.
For what it's worth, you could also look into "strchr" and "strstr", which are nondestructive ways of searching for single characters or sequences within strings.
Also note that your memory allocation is problematic here-- you're using strdup() to allocate a new string inside your parse function, and then you're assigning fragments of that memory block to fields of "ret". Your caller will thus be responsible for free'ing the strdup'd string, but since you're only passing that string back implicitly inside ret, the caller needs to know magically what pointer to pass to free. (Probably ret->protocol, but maybe not, depending on how the input looks.)
strtok modifies the string in place, replacing each delimiter it finds with a NUL character ('\0'). Since strings in C are NUL-terminated, it now appears that your original pointer is pointing to a shorter string, even though the original buffer is still there and still occupies the same amount of memory (but with delimiters replaced by NUL). strtok does not add anything at the end of the buffer; the original terminator stays where it was.
The short answer is this: Keep a pointer to the beginning of your string buffer, and have another pointer that is your "current" pointer into the string as you parse it. When you use strtok or iterate over the string in other ways you update the "current" pointer but leave the beginning pointer alone. When you're finished, free() the beginning pointer. No memory leaked.
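A minimal sketch of that base/current pointer pattern, using a hypothetical helper port_from_host that is not part of the original code:

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the port from "host:port". Returns default_port if no port is
   given, or -1 if the scratch copy cannot be allocated. */
int port_from_host(const char *host_and_port, int default_port)
{
    size_t size = strlen(host_and_port) + 1;
    char *base = malloc(size);            /* beginning pointer: only this is freed */
    if (base == NULL)
        return -1;
    memcpy(base, host_and_port, size);

    int port = default_port;
    char *current = strtok(base, ":");    /* current pointer: the host part */
    if (current != NULL) {
        current = strtok(NULL, ":");      /* now the port part, if present */
        if (current != NULL)
            port = atoi(current);
    }

    free(base);                           /* one allocation, one free */
    return port;
}
```

The pointers returned by strtok are never passed to free; they only point into the block that base owns.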
Do you know you can continue parsing the string using NULL as first parameter of strtok?
First call:
char* token = strtok(string, delimiters);
Then:
token = strtok(NULL, other_delimiters);
This allows you to simplify your code:
int parse_url(char *url, aUrl *ret)
{
// url must be writable: strtok overwrites delimiters in place,
// and the fields of ret end up pointing into url
//get protocol
char* token = strtok(url, "/");
if( token == NULL )
return -1;
ret->protocol = token; // e.g. "http:"
// strtok skips the whole run of '/' characters, so the next token is the host
token = strtok(NULL, "/");
if( token == NULL )
return -1;
ret->host = token;
// get path (everything up to the fragment)
token = strtok(NULL, "#");
if( token == NULL )
return -1;
ret->path = token;
// ...
return 0;
}
As you can see, I added a return value so the caller knows whether parsing succeeded.
Thanks for sharing your code! I ran it inside valgrind and fixed two memory leaks generated by strdup functions.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
char *protocol;
char *host;
int port;
char *path;
} URL;
void parse_url(char *url, URL *ret) {
char *tmp = (char *) strdup(url);
int len = 0;
ret->protocol = (char *) strtok(tmp, "/");
len = strlen(ret->protocol) + 2;
ret->host = (char *) strtok(NULL, "/");
len += strlen(ret->host);
ret->path = (char *) strdup(&url[len]);
ret->path = (char *) strtok(ret->path, "#");
ret->protocol = (char *) strtok(ret->protocol, ":");
ret->host = (char *) strtok(ret->host, ":");
tmp = (char *) strtok(NULL, ":");
if (tmp == NULL) {
if (strcmp(ret->protocol, "http") == 0) {
ret->port = 80;
} else if (strcmp(ret->protocol, "https") == 0) {
ret->port = 443;
}
} else {
ret->port = atoi(tmp);
}
}
void free_url(URL *url) {
free(url->path);
free(url->protocol);
}
int main(int argc, char** argv) {
URL url;
parse_url("http://example.com:3000/Teste/asdf#coisa", &url);
printf("protocol: %s\nhost: %s\nport: %d\npath: %s\n", url.protocol, url.host, url.port, url.path);
free_url(&url);
return (EXIT_SUCCESS);
}
