My program crashes with a segmentation fault inside strsep(). Running it under GDB gives the following message:
Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaad64550 in strsep () from /lib64/libc.so.6
My code is as follows:
int split(char *string, char *commands, char *character) {
char **sp = &string;
char *temp;
temp = strdup(string);
sp = &temp;
for (int i = 0; i < 100; i++) {
commands[i] = strsep(sp, character);
if (commands[i] == '\0') {
return 0;
}
if (strcasecmp(commands[i], "") == 0) {
i--;
}
printf("%d", i);
}
return 0;
}
Any help would be greatly appreciated as I have spent hours trying to solve this problem
The arguments for the function are ("Hello World", "#", "&")
EDIT
So I have managed to get rid of the segment fault by changing the code to
int split(char* string, char* commands, char* character) {
for(int i = 0; i < 100; i++) {
commands[i] = strsep(&string, character);
if(commands[i] == '\0') {
return 0;
}
if(strcasecmp(&commands[i], "") == 0) {
i--;
}
}
return 0;
}
However, now I have a new problem: commands comes back as a null array in which every index is out of bounds.
EDIT 2
I should also clarify what I am trying to do. Essentially, commands is of type char* commands[100], and I want to pass it into the function, which then modifies the original pointer array and stores, say, "Hello World" into commands[0]. I then want to modify this value outside the function.
Your usage of commands is inconsistent with the function prototype: the caller passes an array of 100 char*, so commands should be a pointer to the first element of that array, hence of type char **commands or char *commands[]. For the caller to determine the number of tokens stored into the array, you should either store a NULL pointer at the end, or return this number, or both.
Storing commands[i] = strsep(...) is incorrect as commands is defined as a char *, not a char **.
It is surprising you get a segmentation fault in strsep() because the arguments seem correct, unless character happens to be an invalid pointer.
Conversely you have undefined behavior most likely resulting in a segmentation fault in strcasecmp(commands[i], "") as commands[i] is a char value, not a valid pointer.
Here is a modified version:
// commands is assumed to point to an array of at least 100 pointers
// return the number of tokens or -1 in case of allocation failure
int split(const char *string, char *commands[], const char *separators) {
// skip leading separators so that the first token starts at dup
char *dup = strdup(string + strspn(string, separators));
if (dup == NULL)
return -1;
char *temp = dup;
char **sp = &temp;
int i = 0;
while (i < 99) {
char *token = strsep(sp, separators);
if (token == NULL) // no more tokens
break;
if (*token == '\0') // ignore empty tokens
continue;
commands[i++] = token;
}
commands[i] = NULL;
if (i == 0) {
free(dup);
}
return i;
}
The memory allocated for the tokens can be freed by freeing the first pointer in the commands array, because the first token starts at the beginning of the duplicated string. It might be simpler to duplicate these tokens so they can be freed in a more generic way.
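The last suggestion, duplicating each token, could look roughly like the sketch below. The names split_dup, free_tokens and copy_token are hypothetical and not part of the answer above, and the extra max parameter (assumed to be at least 1) is an addition of this sketch. Only standard C functions are used, with strspn() and strcspn() standing in for strsep().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the n bytes starting at s into a new string. */
static char *copy_token(const char *s, size_t n) {
    char *t = malloc(n + 1);
    if (t != NULL) {
        memcpy(t, s, n);
        t[n] = '\0';
    }
    return t;
}

/* Release the first count tokens stored by split_dup() and reset the slots. */
void free_tokens(char *commands[], int count) {
    for (int i = 0; i < count; i++) {
        free(commands[i]);
        commands[i] = NULL;
    }
}

/* Store up to max - 1 heap-allocated tokens in commands, NULL-terminated.
   Assumes max >= 1. Returns the token count, or -1 on allocation failure. */
int split_dup(const char *string, char *commands[], int max, const char *separators) {
    int count = 0;
    const char *p = string;
    while (count < max - 1) {
        p += strspn(p, separators);          /* skip a run of separators */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, separators); /* length of the next token */
        commands[count] = copy_token(p, len);
        if (commands[count] == NULL) {
            free_tokens(commands, count);
            return -1;
        }
        count++;
        p += len;
    }
    commands[count] = NULL;
    return count;
}
```

With this layout every token is an independent allocation, so the caller can replace or free commands[0] without affecting the other entries, at the cost of one malloc per token.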
My goal is to parse some input into words, disregarding all spaces but using them to mark the boundary between words. The logic is that whenever the code encounters a space, it loops until the current character is no longer a space; when it encounters a word, it loops until it reaches a space or a '\0', meanwhile storing each character at successive indices of one of the inner arrays of the 2D array. Then, before the outer while loop continues, it moves on to the next inner array.
I'm almost certain the logic is implemented well enough for it to work, but I get the weird output listed below. I've had the same problem before when messing with pointers, and I just can't get this to work no matter what I do. Any ideas as to why? I'm genuinely curious about the reason behind it.
#include <stdio.h>
#include <stdlib.h>
void print_mat(char **arry, int y, int x){
for(int i=0;i<y;i++){
for(int j=0;j<x;j++){
printf("%c",arry[i][j]);
}
printf("\n");
}
}
char **parse(char *str)
{
char **parsed=(char**)malloc(sizeof(10*sizeof(char*)));
for(int i=0;i<10;i++){
parsed[i]=(char*)malloc(200*sizeof(char));
}
char **pointer = parsed;
while(*str!='\0'){
if(*str==32)
{
while(*str==32 && *str!='\0'){
str++;
}
}
while(*str!=32 && *str!='\0'){
(*pointer) = (str);
(*pointer)++;
str++;
}
pointer++;
}
return parsed;
}
int main(){
char str[] = "command -par1 -par2 thething";
char**point=parse(str);
print_mat(point,10,200);
return 0;
}
-par1 -par2 thethingUP%�W���U�6o� X%��U�v;,���UP%���cNjW��]A�aW�Ӹto�8so�z�
-par2 thethingUP%�W���U�6o� X%��U�v;,���UP%���cNjW��]A�aW�Ӹto�8so�z�
thethingUP%�W���U�6o� X%��U�v;,���UP%���cNjW��]A�aW�Ӹto�8so�z�
UP%�W���U�6o� X%��U�v;,���UP%���cNjW��]A�aW�Ӹto�8so�z�
I also tried to simply index the 2d array but to no avail
char **parse(char *str)
{
int i, j;
i=0;
j=0;
char **parsed=(char**)malloc(sizeof(10*sizeof(char*)));
for(int i=0;i<10;i++){
parsed[i]=(char*)malloc(200*sizeof(char));
}
while(*str!='\0'){
i=0;
if(*str==32)
{
while(*str==32 && *str!='\0'){
str++;
}
}
while(*str!=32 && *str!='\0'){
parsed[j][i] = (*str);
i++;
str++;
}
j++;
}
return parsed;
}
Output:
command�&�v�U`'�v�U0(�v�U)�v�U�)�v�U
-par1
-par2
thething
makefile:5: recipe for target 'build' failed
make: *** [build] Segmentation fault (core dumped)
A couple of problems in your code:
Your program is leaking memory.
Your program is accessing memory which it does not own, which is undefined behavior (UB).
Let's discuss them one by one:
First problem - Memory leak:
Check this part of parse() function:
while(*str!=32 && *str!='\0'){
(*pointer) = (str);
In the first iteration of the outer while loop, *pointer gives you the first member of the parsed array, i.e. parsed[0], which is a pointer to char. Note that you dynamically allocate memory for the pointers parsed[0], parsed[1] ... parsed[9] in parse() before the outer while loop. In the inner while loop you make them point to str instead. Hence, they lose the reference to the dynamically allocated memory, which leads to a memory leak.
Second problem - Accessing memory which it does not own:
As stated above, the pointers parsed[0], parsed[1] etc. will point to whatever the current value of str was in the inner while loop of parse(). That means they point to some element of the array str (defined in main()). In print_mat() you pass 200 and access indices 0 to 199 through every pointer of arry. Since those pointers point into the str array, whose size is 29, your program accesses memory beyond the end of that array, which is UB.
Let's fix these problems in your code without changing too much:
For memory leak:
Instead of pointing the pointers to str, assign characters of str to the allocated memory, like this:
int i = 0;
while(*str!=32 && *str!='\0'){
(*pointer)[i++] = (*str);
str++;
}
For accessing memory which it does not own:
A point that you should remember:
In C, a string is a one-dimensional array of characters terminated by a null character '\0'.
First of all, empty the strings after dynamically allocating memory to them so that you can identify the unused pointers while printing them:
for(int i=0;i<10;i++){
parsed[i]=(char*)malloc(200*sizeof(char));
parsed[i][0] = '\0';
}
Terminate every string with a null terminator character after writing a word through the parsed array pointers:
int i = 0;
while(*str!=32 && *str!='\0'){
(*pointer)[i++] = (*str);
str++;
}
// Add null terminator
(*pointer)[i] = '\0';
In the print_mat(), make sure once you hit the null terminator character, don't read beyond it. Modify the condition of inner for loop:
for(int j = 0; (j < x) && (arry[i][j] != '\0'); j++){
printf("%c",arry[i][j]);
}
You don't need to print the strings character by character, you can simply use %s format specifier to print a string, like this -
for (int i = 0;i < y; i++) {
if (arry[i][0] != '\0') {
printf ("%s\n", arry[i]);
}
}
With the changes suggested above (the minimal changes your program needs in order to work properly), your code will look like this:
#include <stdio.h>
#include <stdlib.h>
void print_mat (char **arry, int y) {
for (int i = 0; i < y; i++) {
if (arry[i][0] != '\0') {
printf ("%s\n", arry[i]);
}
}
}
char **parse(char *str) {
// room for 10 pointers; sizeof(10*sizeof(char*)) would only yield the size of a size_t
char **parsed = (char**)malloc(10*sizeof(char*));
// check malloc return
for(int i = 0; i < 10; i++){
parsed[i] = (char*)malloc(200*sizeof(char));
// check malloc return
parsed[i][0] = '\0';
}
char **pointer = parsed;
while (*str != '\0') {
if(*str == 32) {
while(*str==32 && *str!='\0') {
str++;
}
}
int i = 0;
while (*str != 32 && *str != '\0') {
(*pointer)[i++] = (*str);
str++;
}
(*pointer)[i] = '\0';
pointer++;
}
return parsed;
}
int main (void) {
char str[] = "command -par1 -par2 thething";
char **point = parse(str);
print_mat (point, 10);
// free the dynamically allocate memory
return 0;
}
Output:
command
-par1
-par2
thething
There are a lot of improvements that can be made to your implementation, for example:
As shown above, you can use the %s format specifier instead of printing a string character by character. I am leaving it up to you to identify further changes of this kind.
Allocate memory for a parsed array pointer only when there is a word in str.
Instead of allocating a fixed size (i.e. 200) for each parsed array pointer, allocate only the size of the word plus one byte for the null terminator.
A few suggestions:
Always check the return value of functions like malloc.
Make sure to free the dynamically allocated memory once your program is done with it.
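The suggestions above (allocate only when there is a word, allocate only the size of the word, check malloc, free everything) can be combined in a sketch like the following. The names parse_words and free_words are hypothetical, and the NULL-terminated return array is a design choice of this sketch rather than something from the code above.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array produced by parse_words(). */
void free_words(char **words) {
    if (words == NULL)
        return;
    for (char **w = words; *w != NULL; w++)
        free(*w);
    free(words);
}

/* Split str on spaces into a NULL-terminated array of exact-size copies.
   Returns NULL on allocation failure. */
char **parse_words(const char *str) {
    /* First pass: count the words so the pointer array has the right size. */
    size_t count = 0;
    const char *p = str + strspn(str, " ");
    while (*p != '\0') {
        count++;
        p += strcspn(p, " ");   /* skip the word */
        p += strspn(p, " ");    /* skip the spaces after it */
    }

    char **words = malloc((count + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Second pass: copy each word into a buffer of exactly its length + 1. */
    size_t i = 0;
    p = str + strspn(str, " ");
    while (*p != '\0') {
        size_t len = strcspn(p, " ");
        words[i] = malloc(len + 1);
        if (words[i] == NULL) {
            free_words(words);  /* words[i] is NULL, so entries 0..i-1 are freed */
            return NULL;
        }
        memcpy(words[i], p, len);
        words[i][len] = '\0';
        i++;
        p += len;
        p += strspn(p, " ");
    }
    words[i] = NULL;
    return words;
}
```

Because the array ends with a NULL pointer, the caller can print or free it without knowing the word count in advance, and there is no fixed limit of 10 words or 200 characters.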
You can achieve what you want in a simpler way.
First, define a function that checks if a character (separator) is present in a list of characters (separators):
// Returns true if c is found in a list of separators, false otherwise.
// (bool, true and false require #include <stdbool.h>)
bool belongs(const char c, const char *list)
{
for (const char *p = list; *p; ++p)
if (*p == c) return true;
return false;
}
Then, define a function that splits a given string into tokens, separated by one or more separators:
// Splits a string into tokens, separated by one of the separators in sep
bool split(const char *s, const char *sep, char **tokens, size_t *ntokens, const size_t maxtokens)
{
// Start with zero tokens.
*ntokens = 0;
const char *start = s, *end = s;
for (const char *p = s; /*no condition*/; ++p) {
// Can no longer hold more tokens? Exit.
if (*ntokens == maxtokens)
return false;
// Not a token? Continue looping.
if (*p && !belongs(*p, sep))
continue;
// Found a token: calculate its length.
size_t tlength = p - start;
// Empty token?
if (tlength == 0) {
// And reached the end of string? Break.
if (!*p) break;
// Not the end of string? Skip it.
++start;
continue;
}
// Attempt to allocate memory.
char *token = malloc(sizeof(*token) * (tlength + 1));
// Failed? Exit.
if (!token)
return false;
// Copy the token.
strncpy(token, start, tlength+1);
token[tlength] = '\0';
// Put it in tokens array.
tokens[*ntokens] = token;
// Update the number of tokens.
*ntokens += 1;
// Reached the end of string? Break.
if (!*p) break;
// There is more to parse. Set the start to the next char.
start = p + 1;
}
return true;
}
Call it like this:
int main(void)
{
char command[] = "command -par1 -par2 thing";
const size_t maxtokens = 10;
char **tokens = malloc(sizeof *tokens * maxtokens);
if (!tokens) return 1;
size_t ntokens = 0;
split(command, " ", tokens, &ntokens, maxtokens);
// Print all tokens.
printf("Number of tokens = %zu\n", ntokens);
for (size_t i = 0; i < ntokens; ++i)
printf("%s\n", tokens[i]);
// Release memory when done.
for (size_t i = 0; i < ntokens; ++i)
free(tokens[i]);
free(tokens);
}
Output:
Number of tokens = 4
command
-par1
-par2
thing
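As a side note, the standard library already offers the building blocks used above: strchr() can play the role of belongs(), and strspn()/strcspn() can locate token boundaries without a hand-written loop. The sketch below is only an illustration; belongs_strchr and next_token are hypothetical names, not part of the answer's code.

```c
#include <stdbool.h>
#include <string.h>

/* Same result as belongs(). strchr() would also match the terminating
   null character, so that case is excluded explicitly. */
bool belongs_strchr(char c, const char *list) {
    return c != '\0' && strchr(list, c) != NULL;
}

/* Return the start of the next token at or after *cursor, store its length
   in *len and advance *cursor past the token. Returns NULL when no token
   is left. Nothing is copied or modified. */
const char *next_token(const char **cursor, const char *sep, size_t *len) {
    const char *start = *cursor + strspn(*cursor, sep);  /* skip separators */
    if (*start == '\0')
        return NULL;
    *len = strcspn(start, sep);                          /* token length */
    *cursor = start + *len;
    return start;
}
```

A caller would loop on next_token() and copy len bytes from each returned pointer, much as split() does with start and tlength.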
My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.
The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.
Although the algorithm may be iffy, the logic is definitely sound, as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main and set it equal to my function's return value. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant by the term Undefined Behavior.
Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.
When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.
Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.
Edit: should mention that with the following code,
ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEBUG
#ifdef DEBUG
#define _D if (1)
#else
#define _D if (0)
#endif
char **str_split(char[], char, int*);
int count_char(char[], char);
int main(void) {
int num_strings = 0;
char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);
if (result == NULL) {
printf("result is NULL\n");
return 0;
}
if (num_strings > 0) {
for (int i = 0; i < num_strings; i++) {
printf("\"%s\" \n", result[i]);
}
}
free(result);
return 0;
}
char **str_split(char string[], char delim, int *num_strings) {
int num_delim = count_char(string, delim);
*num_strings = num_delim + 1;
if (*num_strings < 2) {
return NULL;
}
//return value
char **ret = malloc((*num_strings) * sizeof(char*));
if (ret == NULL) {
_D printf("ret is null.\n");
return NULL;
}
int slen = strlen(string);
char buffer[slen];
/* b is the buffer index, c is the index for **ret */
int b = 0, c = 0;
for (int i = 0; i < slen + 1; i++) {
char cur = string[i];
if (cur == delim || cur == '\0') {
_D printf("Copying content of buffer to ret[%i]\n", c);
//char *tmp = malloc(sizeof(char) * slen + 1);
//strcpy(tmp, buffer);
//ret[c] = tmp;
ret[c] = buffer;
_D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
//free(tmp);
c++;
b = 0;
continue;
}
//otherwise
_D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);
buffer[b] = cur;
buffer[b+1] = '\0'; /* extend the null char */
b++;
_D printf("Buffer is now equal to: \"%s\"\n", buffer);
}
return ret;
}
int count_char(char base[], char c) {
int count = 0;
int i = 0;
while (base[i] != '\0') {
if (base[i++] == c) {
count++;
}
}
_D printf("Found %i occurence(s) of '%c'\n", count, c);
return count;
}
You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.
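This also explains why the debug output inside the function looked correct: ret[c] = buffer stores the address of buffer, not a copy of its text, so every slot refers to the same array and each print simply showed the buffer's current content. The small sketch below (slots_share_buffer is a hypothetical name) demonstrates the aliasing while the buffer is still alive, so it has defined behavior.

```c
#include <string.h>

/* Returns 1 when both slots hold the same address, which is always the
   case here: assigning buffer stores a pointer, it does not copy text. */
int slots_share_buffer(void) {
    char buffer[16];
    char *slots[2];

    strcpy(buffer, "Helo");
    slots[0] = buffer;          /* stores the address of buffer */
    strcpy(buffer, "World");    /* overwrites the text slots[0] points to */
    slots[1] = buffer;

    /* Both slots now read "World"; "Helo" is gone. */
    return slots[0] == slots[1] && strcmp(slots[0], "World") == 0;
}
```

Once the function that owns buffer returns, even that shared content is no longer valid to read.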
To get around this requires one of the following:
Allow the function to modify the input string (i.e. replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that modifying a string literal, which is what would happen with the call you have here, is undefined behavior in C, so you would instead need to do:
char my_string[] = "Helo_World_poopy_pants";
char **result = str_split(my_string, '_', &num_strings);
In this case, the function should also make it clear that a string literal is not acceptable input, and keep its first parameter non-const (char *string or char string[]), since it writes to the string.
Allow the function to make a copy of the string and then modify the copy. The first parameter can then be declared as const char *string, and a string literal becomes acceptable input. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.
It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.
Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:
When you allocate the pointer array, also make it large enough to hold a copy of the string:
// Allocate storage for `num_strings` pointers, plus a copy of the original string,
// then copy the string into memory immediately following the pointer storage.
char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
if (ret == NULL)
return NULL;
char *buffer = (char*)&ret[*num_strings];
strcpy(buffer, string);
Now, do all your string operations on buffer. For example:
// Extract all delimited substrings. Here, buffer will always point at the
// current substring, and p will search for the delimiter. Once found,
// the substring is terminated, its pointer appended to the substring array,
// and then buffer is pointed at the next substring, if any.
int c = 0;
for(char *p = buffer; *buffer; ++p)
{
if (*p == delim || !*p) {
char *next = p;
if (*p) {
*p = '\0';
++next;
}
ret[c++] = buffer;
buffer = next;
}
}
When you need to clean up, it's just a single call to free, because everything was stored together.
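Put together, the single-allocation strategy might look like the sketch below. The name str_split_single is hypothetical, and the sketch makes two choices of its own: it accepts input without any delimiter (returning one string), and it keeps empty fields, so every slot is filled even when the input ends with a delimiter.

```c
#include <stdlib.h>
#include <string.h>

/* Returns an array of *num_strings pointers, or NULL on allocation failure.
   Empty fields are kept, so "a__b" yields "a", "" and "b". */
char **str_split_single(const char *string, char delim, int *num_strings) {
    int n = 1;
    for (const char *p = string; *p != '\0'; p++)
        n += (*p == delim);

    size_t len = strlen(string);
    char **ret = malloc((size_t)n * sizeof *ret + len + 1);
    if (ret == NULL) {
        *num_strings = 0;
        return NULL;
    }

    /* The text copy lives right after the n pointers. */
    char *buffer = (char *)&ret[n];
    memcpy(buffer, string, len + 1);

    int c = 0;
    ret[c++] = buffer;              /* the first field starts at the copy */
    for (char *p = buffer; *p != '\0'; p++) {
        if (*p == delim) {
            *p = '\0';              /* terminate the current field */
            ret[c++] = p + 1;       /* the next field starts after it */
        }
    }
    *num_strings = n;
    return ret;
}
```

The caller releases everything with a single free(result); the individual strings must not be freed separately because they are part of the same block.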
The string pointers you store into the ret array with ret[c] = buffer; point to an automatic array that goes out of scope when the function returns. Using them afterwards has undefined behavior. You should allocate these strings with strdup().
Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?
Here is a simpler implementation:
#include <stdlib.h>
#include <string.h>
char **str_split(const char *string, char delim, int *num_strings) {
int i, n, from, to;
char **res;
for (n = 1, i = 0; string[i]; i++)
n += (string[i] == delim);
*num_strings = 0;
res = malloc(sizeof(*res) * n);
if (res == NULL)
return NULL;
for (i = from = to = 0;; from = to + 1) {
for (to = from; string[to] != delim && string[to] != '\0'; to++)
continue;
res[i] = malloc(to - from + 1);
if (res[i] == NULL) {
/* allocation failure: free memory allocated so far */
while (i > 0)
free(res[--i]);
free(res);
return NULL;
}
memcpy(res[i], string + from, to - from);
res[i][to - from] = '\0';
i++;
if (string[to] == '\0')
break;
}
*num_strings = n;
return res;
}
Sorry if this is a very noob question. I am a beginner in C and am having significant trouble understanding pointers and other concepts, which makes this very difficult. I am getting a segmentation fault and I don't know why. I think it may come from the way I use arrays for storage. Also, if you could recommend a debugger, that would be very helpful. Thanks in advance.
#include <stdio.h>
#include <string.h>
char *lineaccept(char *buf, size_t sz){ //Getting inputs using fgets() and storing it in buf.
if(fgets(buf,sz,stdin)==NULL){
printf("ERROR\n");
return NULL;
}
if(strlen(buf) == 1) {
printf("ERROR\n");
return NULL;
}
return buf;
}
void delimitLine(char *buf, char *delimited[], size_t max){ //Taking the string from buf and getting each individual word to store in split[]
if(buf != NULL){
const char s[2] = " ";
char *token;
token = strtok(buf, s);
int counter = 0;
while( token != NULL && counter <= max){
split[counter] = token;
token = strtok(NULL, s);
counter ++;
}
for(int y = 0; y < counter; y++){
if(split[y]==NULL){
break;
}else{
printf("%s\n",split[y]);
}
}
}
}
int main(void) {
const int maxWords = 10;
char maxLenInput[11];
char *arrOfWords[100];
char inputFromLine[100];
while(strcmp((strcpy(inputFromLine,lineaccept(maxLenInput, maxWords))), "")>0) {
delimitline(inputFromLine, arrOfWords, maxWords);
}
return 0;
}
The following part of your code returns NULL if you press only Enter in the console (without typing any other character first). This is because fgets stores the newline as the only character in buf, so strlen(buf) is 1:
char *lineaccept(char *buf, size_t sz){
....
if(strlen(buf) == 1) {
printf("ERROR\n");
return NULL;
...
When you then pass the result of a call to lineaccept to strcpy, as you do with
strcpy(inputFromLine,lineaccept(maxLenInput, maxWords))
you actually pass NULL to strcpy, which then reads through a null pointer; this is undefined behaviour, most likely a segfault. The same happens when fgets itself returns NULL at end of input.
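One way to avoid this is to test the returned pointer before copying. The sketch below is an illustration under a few assumptions: read_nonempty_line and copy_next_line are hypothetical names, the input stream is passed as a FILE * parameter (stdin in the question), and the 128-byte line buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from in into buf and strip the trailing newline.
   Returns buf, or NULL on end of input, read error, or an empty line. */
char *read_nonempty_line(char *buf, int sz, FILE *in) {
    if (fgets(buf, sz, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if present */
    return buf[0] == '\0' ? NULL : buf;
}

/* Copy the next non-empty line into dst (capacity dst_size).
   Returns 1 on success, 0 when there is nothing to copy or it would not fit. */
int copy_next_line(char *dst, size_t dst_size, FILE *in) {
    char line[128];
    const char *src = read_nonempty_line(line, (int)sizeof line, in);
    if (src == NULL)                  /* check before handing it to strcpy */
        return 0;
    if (strlen(src) >= dst_size)      /* would overflow dst */
        return 0;
    strcpy(dst, src);
    return 1;
}
```

The loop in main could then become while (copy_next_line(inputFromLine, sizeof inputFromLine, stdin)) { ... }, which stops cleanly on an empty line or at end of input.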
I'm trying to learn C, and one of the things I'm finding tricky is strings and manipulating them. I think I understand the basics of it, but I've taken for granted a lot of what might go into strings in JS or PHP (where I'm coming from).
I'm trying now to write a function that explodes a string into an array, based on a delimiter, using strtok. Similar to PHP's implementation of explode().
Here's the code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char **explode(char *input, char delimiter) {
char **output;
char *token;
char *string = malloc(sizeof(char) * strlen(input));
char delimiter_str[2] = {delimiter, '\0'};
int i;
int delim_count = 0;
for (i = 0; i < strlen(input); i++) {
string[i] = input[i];
if (input[i] == delimiter) {
delim_count++;
}
}
string[strlen(input)] = '\0';
output = malloc(sizeof(char *) * (delim_count + 1));
token = strtok(string, delimiter_str);
i = 0;
while (token != NULL) {
output[i] = token;
token = strtok(NULL, delimiter_str);
i++;
}
// if i uncomment this line, output gets all messed up
// free(string);
return output;
}
int main() {
char **row = explode("id,username,password", ',');
int i;
for (i = 0; i < 3; i++) {
printf("%s\n", row[i]);
}
free(row);
return 0;
}
The question I have is why if I try to free(string) in the function, the output gets messed up, and if I'm doing this incorrectly in the first place. I believe I'm just not mapping out the memory properly in my head and that's why I'm not understanding the issue.
You misunderstand what strtok does. It does not make new strings; it simply returns pointers to different parts of the original string (after overwriting the delimiters with '\0'). If you then free that string, all the pointers you stored become invalid. I think you need
while (token != NULL) {
output[i] = strdup(token);
token = strtok(NULL, delimiter_str);
i++;
}
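strdup will allocate a new string and copy the token into it for you. Each copy must later be released with free().
In output you save pointers that point into string, so when you free string, you free the memory that the output pointers are pointing to.
It's not enough to save the pointers. You'll have to copy the actual strings. To do that you need to allocate memory for each string in output separately. Note also that string is allocated one byte too small: it needs strlen(input) + 1 bytes to hold the terminating '\0'.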
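A version that copies every field could be sketched as follows. The names explode_copy and explode_free and the count out-parameter are hypothetical additions. Unlike strtok(), this sketch keeps empty fields (so "a,,b" yields three strings), which is closer to PHP's explode(); it uses only standard C functions instead of strdup().

```c
#include <stdlib.h>
#include <string.h>

/* Free the first count strings of an array from explode_copy(), then the array. */
void explode_free(char **parts, size_t count) {
    if (parts == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(parts[i]);
    free(parts);
}

/* Split input on delimiter into independent heap-allocated strings.
   Stores the number of fields in *count; returns NULL on allocation failure. */
char **explode_copy(const char *input, char delimiter, size_t *count) {
    size_t fields = 1;
    for (const char *p = input; *p != '\0'; p++)
        fields += (*p == delimiter);

    *count = 0;
    char **output = malloc(fields * sizeof *output);
    if (output == NULL)
        return NULL;

    const char *start = input;
    for (size_t i = 0; i < fields; i++) {
        const char *end = strchr(start, delimiter);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        output[i] = malloc(len + 1);    /* + 1 for the terminating '\0' */
        if (output[i] == NULL) {
            explode_free(output, i);    /* release what was built so far */
            return NULL;
        }
        memcpy(output[i], start, len);
        output[i][len] = '\0';
        start = end ? end + 1 : start + len;  /* step over the delimiter */
    }
    *count = fields;
    return output;
}
```

Since no pointer refers to the input any more, the caller owns every string and releases them all with explode_free(row, count).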
How can I compare the first letter of the first element of a char**?
I have tried:
int main()
{
char** command = NULL;
while (true)
{
fgets(line, MAX_COMMAND_LEN, stdin);
parse_command(line, command);
exec_command(command);
}
}
void parse_command(char* line, char** command)
{
int n_args = 0, i = 0;
while (line[i] != '\n')
{
if (isspace(line[i++]))
n_args++;
}
for (i = 0; i < n_args+1; i++)
command = (char**) malloc (n_args * sizeof(char*));
i = 0;
line = strtok(line," \n");
while (line != NULL)
{
command[i++] = (char *) malloc ( (strlen(line)+1) * sizeof(char) );
strcpy(command[i++], line);
line = strtok(NULL, " \n");
}
command[i] = NULL;
}
void exec_command(char** command)
{
if (command[0][0] == '/')
// other stuff
}
but that gives a segmentation fault. What am I doing wrong?
Thanks.
Could you paste more code? Have you allocated memory both for your char* array and for the elements of your char* array?
The problem is, you do allocate a char* array inside parse_command, but the pointer to that array never gets out of the function. So exec_command gets a garbage pointer value. The reason is, by calling parse_command(line, command) you pass a copy of the current value of the pointer command, which is then overwritten inside the function - but the original value is not affected by this!
To make the allocated array visible to the caller, either you need to pass a pointer to the pointer you want to update, or you need to return the pointer to the allocated array from parse_command. Apart from char*** looking ugly (at least to me), the latter approach is simpler and easier to read:
int main()
{
char** command = NULL;
while (true)
{
fgets(line, MAX_COMMAND_LEN, stdin);
command = parse_command(line);
exec_command(command);
}
}
char** parse_command(char* line)
{
char** command = NULL;
int n_args = 0, i = 0;
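while (line[i] != '\n' && line[i] != '\0')
{
if (isspace((unsigned char) line[i++]))
n_args++;
}
// at most n_args + 1 tokens, plus one slot for the terminating NULL
command = (char**) malloc ((n_args + 2) * sizeof(char*));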
i = 0;
line = strtok(line," \n");
while (line != NULL)
{
command[i] = (char *) malloc ( (strlen(line)+1) * sizeof(char) );
strcpy(command[i++], line);
line = strtok(NULL, " \n");
}
command[i] = NULL;
return command;
}
Notes:
In your original parse_command, you allocate memory to command in a loop, which is unnecessary and just creates memory leaks. It is enough to allocate memory once, with room for every token plus the terminating NULL pointer; I modified the code accordingly.
In the last while loop of parse_command, you incorrectly increment i twice, which also leads to undefined behaviour, i.e. a possible segmentation fault. I fixed it here.
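For completeness, the other option mentioned above, passing a pointer to the pointer, might look like the sketch below. The name parse_command_out is hypothetical, and the sketch differs from the code above in one respect that is its own choice: the tokens are not copied but point into line, so only the array itself has to be freed and line must stay alive while the array is in use.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Splits line in place on whitespace, stores a NULL-terminated array of
   pointers into line in *command, and returns the number of arguments,
   or -1 on allocation failure. */
int parse_command_out(char *line, char ***command) {
    size_t spaces = 0;
    for (const char *p = line; *p != '\0'; p++)
        if (isspace((unsigned char)*p))
            spaces++;

    /* At most spaces + 1 tokens, plus one slot for the terminating NULL. */
    char **args = malloc((spaces + 2) * sizeof *args);
    if (args == NULL)
        return -1;

    int n = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n"))
        args[n++] = tok;
    args[n] = NULL;

    *command = args;    /* this write is what the caller sees */
    return n;
}
```

The call site becomes parse_command_out(line, &command), followed by free(command) once exec_command is done with it.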