I am really stuck with one very simple piece of code.
This program takes an argument like ./a.out -t=1,32,45,2 and prints the number of commas to stdout. Sometimes it runs correctly, but more often it crashes with a segmentation fault.
I narrowed the problem down to this line in the function substr_cnt (I also marked it with comments in the code below):
target_counting = (char *)malloc(sizeof(char)*(strlen(target)));
In fact, malloc returns NULL. If I replace sizeof(char) with sizeof(char *), everything starts working like a charm, but I can't understand why. Furthermore, in the main function I also use malloc, with an almost identical line,
arg_parameter = (char *) malloc(sizeof(char)*(strlen(argv[1] - 3)));
and everything works just fine.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define strindex(target, source) ((size_t) strstr(target, source) - (size_t) target)
int substr_cnt( char *target, char *source ) {
int i=0;
int cnt=0;
char *target_counting;
//this is NOT working
target_counting = (char *)malloc(sizeof(char)*(strlen(target)));
//this is working
//target_counting = (char *)malloc(sizeof(char *)*(strlen(target)));
if (target_counting == NULL) {
printf("malloc failed\n");
return -1;
}
strcpy(target_counting, target);
while ((i=strindex(target_counting, source)) > 0) {
strncpy(target_counting, target_counting + i + 1, strlen(target_counting));
cnt++;
}
free(target_counting);
return cnt;
}
int main( int argc, char *argv[] )
{
int i;
int default_behavior = 0;
int arg_parametr_cnt;
char *arg_parameter;
if (argc == 1) {
default_behavior = 1;
} else if (argv[1][0] == '-' && argv[1][1] == 't' && argv[1][2] == '=') {
//this is working
arg_parameter = (char *) malloc(sizeof(char)*(strlen(argv[1] - 3)));
strncpy(arg_parameter, argv[1]+3, strlen(argv[1]));
printf("%s\n", arg_parameter);
arg_parametr_cnt = substr_cnt(arg_parameter, ",");
printf("commas: %d\n", arg_parametr_cnt);
}
else {
printf("wrong command line");
return 1;
}
return 0;
}
You have several issues here. The main point is that you don't need to allocate memory at all. You can search for a given substring without modifying the string, and therefore work directly on the argv parameters, e.g.
int substr_cnt(const char *haystack, const char *needle)
{
int cnt = 0;
const char *found = haystack;
while ((found = strstr(found, needle)) != NULL) {
++found;
++cnt;
}
return cnt;
}
The same goes for the call in main; just pass argv directly:
arg_parametr_cnt = substr_cnt(argv[1] + 3, ",");
Now to answer your question, unless you really see the output of
printf("malloc failed\n");
I don't believe malloc returns NULL, because when you allocate an even larger amount of memory (sizeof(char *) instead of sizeof(char)), it works.
The reasons why your program crashes are already covered in the other answers. To summarize:
target_counting = (char *)malloc(sizeof(char)*(strlen(target))); allocates one char less than it should: there is no room for the terminating '\0', so strcpy writes past the end of the block.
while ((i=strindex(target_counting, source)) > 0) I'm not sure what happens when the result of strstr is NULL. strindex might return a negative number, depending on your memory layout, but it might just as well return a positive one. (Note also that a match at index 0 ends the loop without being counted.)
strncpy(target_counting, target_counting + i + 1, strlen(target_counting)); Source and destination overlap here, which is undefined behaviour for strncpy (and would be for strcpy too). Use memmove(target_counting, target_counting + i + 1, strlen(target_counting + i + 1) + 1) instead.
arg_parameter = (char *) malloc(sizeof(char)*(strlen(argv[1] - 3))); this should be malloc(sizeof(char) * (strlen(argv[1]) - 3 + 1))
strncpy(arg_parameter, argv[1]+3, strlen(argv[1])); again strcpy(arg_parameter, argv[1]+3) would be sufficient
Update:
In this version
int strindex(char *target, char *source)
{
char *idx;
if ((idx = strstr(target, source)) != NULL) {
return idx - target;
} else {
return -1;
}
}
you have an explicit test for NULL and act accordingly.
In the macro version
#define strindex(target, source) ((size_t) strstr(target, source) - (size_t) target)
there is no such test. You determine the index by calculating the difference between strstr() and the base address target. This is fine so far, but what happens when strstr() returns NULL?
Pointer subtraction is only defined for two pointers into the same array. As soon as the two pointers point into different arrays, or one points into an array and the other somewhere else, the behaviour is undefined. The macro's casts to size_t turn this into integer arithmetic instead, but converting a pointer to an integer is implementation-defined, so the difference still has no meaningful value.
Technically, when you calculate NULL - target, it might yield a negative value, but it also might not. If target points to the address 0x0f0a3a90, you could have 0x0 - 0x0f0a3a90 and get a negative value. If target points to 0xfe830780, however, that address might be interpreted as a negative number, and then 0x0 - 0xfe830780 could result in a positive number.
But the main point is that you have undefined behaviour: as soon as the bogus index comes out positive, target_counting + i + 1 points outside the buffer and strncpy reads and writes memory you don't own. For further reading, look up pointer arithmetic, e.g. C++: Pointer Arithmetic.
Your malloc is not allocating space for the null terminator; you need malloc(strlen(string) + 1).
The malloc with sizeof(char *) only appears to work because a pointer is normally 4 bytes long (8 on 64-bit systems), so you are allocating 4 or 8 times more memory than required, which more than covers the 1 byte needed for the null terminator.
The problem may lie here: malloc(sizeof(char)*(strlen(argv[1] - 3)) in main. You are subtracting 3 from argv[1].
I think you intended to use:
malloc(sizeof(char)*(strlen(argv[1]) - 2)); // Allocate one more space for '\0' character
Subtracting 3 from the pointer makes strlen start reading 3 bytes before argv[1], in memory you don't own.
Your program may not fail here but later, because this is simply undefined behavior.
There are several buffer overruns, but I think the bug that makes your program crash is the following:
strncpy(target_counting, target_counting + i + 1, strlen(target_counting));
Note that the source and destination of strncpy must not overlap!
I suggest that you do a memmove instead, because memmove can handle overlapping buffers:
memmove(target_counting, target_counting + i + 1, strlen(target_counting + i + 1) + 1);
I think your main issue is here:
arg_parameter = (char *) malloc(sizeof(char)*(strlen(argv[1] - 3)));
especially here
strlen(argv[1] - 3)
you pass strlen the address argv[1] - 3, which is not a valid address.
What you actually meant is strlen(argv[1]) - 3. As others said, you should also add one char for the '\0', so strlen(argv[1]) - 2.
I'm using an array of strings in C to hold arguments given to a custom shell. I initialize the array of buffers using:
char *args[MAX_CHAR];
Once I parse the arguments, I send them to the following function to determine the type of IO redirection if there are any (this is just the first of 3 functions to check for redirection and it only checks for STDIN redirection).
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
Once I compile and run the shell, it crashes with a SIGSEGV. Using GDB I determined that the shell is crashing on the following line:
if (strlen(args[i]) == 0) {
This is because the address in args[i] (the first empty string after the parsed commands) is inaccessible. Here is the error from GDB and all relevant variables:
(gdb) next
359 if (strlen(args[i]) == 0) {
(gdb) p args[0]
$1 = 0x7fffffffe570 "echo"
(gdb) p args[1]
$2 = 0x7fffffffe575 "test"
(gdb) p args[2]
$3 = 0x0
(gdb) p i
$4 = 2
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
parseInputFile (args=0x7fffffffd570, inputFilePath=0x7fffffffd240 "") at shell.c:359
359 if (strlen(args[i]) == 0) {
I believe that p args[2] returning $3 = 0x0 means that, because that element has not been written to yet, it points to address 0x0, which is outside the accessible memory. However, I can't figure out why, since it was declared as a buffer. Any suggestions on how to solve this problem?
EDIT: Per Kaylum's comment, here is a minimal reproducible example
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include <sys/stat.h>
#include<readline/readline.h>
#include<readline/history.h>
#include <fcntl.h>
// Defined values
#define MAX_CHAR 256
#define MAX_ARG 64
#define clear() printf("\033[H\033[J") // Clear window
#define DEFAULT_PROMPT_SUFFIX "> "
char PROMPT[MAX_CHAR], SPATH[1024];
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
int ioRedirectHandler(char **args) {
char inputFilePath[MAX_CHAR] = "";
// Check if any redirects exist
if (parseInputFile(args, inputFilePath)) {
return 1;
} else {
return 0;
}
}
void parseArgs(char *cmd, char **cmdArgs) {
int na;
// Separate each argument of a command to a separate string
for (na = 0; na < MAX_ARG; na++) {
cmdArgs[na] = strsep(&cmd, " ");
if (cmdArgs[na] == NULL) {
break;
}
if (strlen(cmdArgs[na]) == 0) {
na--;
}
}
}
int processInput(char* input, char **args, char **pipedArgs) {
// Parse the single command and args
parseArgs(input, args);
return 0;
}
int getInput(char *input) {
char *buf, loc_prompt[MAX_CHAR] = "\n";
strcat(loc_prompt, PROMPT);
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
add_history(buf);
strcpy(input, buf);
return 0;
} else {
return 1;
}
}
void init() {
char *uname;
clear();
uname = getenv("USER");
printf("\n\n \t\tWelcome to Student Shell, %s! \n\n", uname);
// Initialize the prompt
snprintf(PROMPT, MAX_CHAR, "%s%s", uname, DEFAULT_PROMPT_SUFFIX);
}
int main() {
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
int isPiped = 0, isIORedir = 0;
init();
while(1) {
// Get the user input
if (getInput(input)) {
continue;
}
isPiped = processInput(input, args, pipedArgs);
isIORedir = ioRedirectHandler(args);
}
return 0;
}
Note: If I forgot to include any important information, please let me know and I can get it updated.
When you write
char *args[MAX_CHAR];
you allocate room for MAX_CHAR pointers to char, but you do not initialise the array. If it were a global variable, all the pointers would be initialised to NULL, but you declare it inside a function, so the elements of the array can point anywhere. You should not dereference them before you have set them to point at something you are allowed to access.
You do set them, though, in parseArgs(), where you do this:
cmdArgs[na] = strsep(&cmd, " ");
There are two potential issues here, but let's deal with the one you hit first. When strsep() runs out of tokens, it returns NULL. You test for that to get out of parseArgs(), so you already know this. However, where your program crashes, you seem to have forgotten it again. You call strlen() on a NULL pointer, and that is a no-no.
There is a difference between NULL and the empty string. An empty string is a pointer to a buffer that has the zero char first; the string "" is a pointer to a location that holds the character '\0'. The NULL pointer is a special value for pointers, often address zero, that means that the pointer doesn't point anywhere. Obviously, the NULL pointer cannot point to an empty string. You need to check if an argument is NULL, not if it is the empty string.
If you want to check both for NULL and the empty string, you could do something like
if (!args[i] || strlen(args[i]) == 0) {
If args[i] is NULL then !args[i] is true, so you will enter the if body if you have NULL or if you have a pointer to an empty string.
(You could also check the empty string with !(*args[i]); *args[i] is the first character that args[i] points at. So *args[i] is zero if you have the empty string; zero is interpreted as false, so !(*args[i]) is true if and only if args[i] is the empty string. Not that this is more readable, but it shows again the difference between empty strings and NULL).
I mentioned another issue with the parsed arguments. Whether it is a problem or not depends on the application. When you parse a string with strsep(), you get pointers into the parsed string, so you have to be careful not to free that string (it is input in your main() function) or to modify it after you have parsed it. If you change the string, you change what all the parsed arguments point at. You do not do this in your program, so it isn't a problem here, but it is worth keeping in mind. If you want your parsed arguments to outlive the next command, you need to copy them; as it is now, reading the next command overwrites input and therefore changes them.
In main
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
are all uninitialized. They contain indeterminate values. This could be a potential source of bugs, but is not the reason here, as
getInput modifies the contents of input to be a valid string before any reads occur.
pipedArgs is unused, so raises no issues (yet).
args is modified by parseArgs to (possibly!) contain a NULL sentinel value, without any indeterminate pointers being read first.
Firstly, in parseArgs it is possible to completely fill args without setting the NULL sentinel value that other parts of the program should rely on.
Looking deeper, in parseInputFile the following
if (strlen(args[i]) == 0)
contradicts the limits imposed by parseArgs, which disallows empty strings in the array. More importantly, args[i] may be the sentinel NULL value, and strlen expects a non-NULL pointer to a valid string.
This termination condition should simply check if args[i] is NULL.
With
strcpy(inputFilePath, args[i+1]);
args[i+1] might also be the NULL sentinel value, and strcpy also expects non-NULL pointers to valid strings. You can see this in action when inputSymbol is a match for the final token in the array.
args[i+1] may also evaluate as args[MAX_ARG], one past the last slot parseArgs can write, which would be out of bounds for an array of MAX_ARG pointers.
Additionally, inputFilePath has a string length limit of MAX_CHAR - 1, and args[i+1] is (possibly!) a dynamically allocated string whose length might exceed this.
Some edge cases found in getInput:
Both arguments to
strcat(loc_prompt, PROMPT);
are of size MAX_CHAR, and loc_prompt already holds a string of length 1. If PROMPT has length MAX_CHAR - 1, the resulting string will have length MAX_CHAR, leaving no room for the terminating NUL byte.
readline can return NULL in some situations, so
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
can again pass the NULL pointer to strlen.
A similar issue as before: on success, readline returns a string of dynamic length, and
strcpy(input, buf);
can cause a buffer overflow by attempting to copy a string greater in length than MAX_CHAR - 1.
buf is a pointer to data allocated by malloc. It's unclear what add_history does, but this pointer must eventually be passed to free.
Some considerations.
Firstly, it is a good habit to initialize your data, even if it might not matter.
Secondly, using constants (#define MAX_CHAR 256) helps to reduce magic numbers, but sizing and checking every buffer against the same global maximum can lead you to design your program too rigidly.
Consider building your functions to accept a limit as an argument, and return a length. This allows you to more strictly track the sizes of your data, and prevents you from always designing around the maximum potential case.
Here is a slightly contrived example of designing like this. Note that find does not have to concern itself with possibly checking MAX_ARGS elements, as it is told precisely how long the list of valid elements is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_ARGS 100
char *get_input(char *dest, size_t sz, const char *display) {
char *res;
if (display)
printf("%s", display);
if ((res = fgets(dest, sz, stdin)))
dest[strcspn(dest, "\n")] = '\0';
return res;
}
size_t find(char **list, size_t length, const char *str) {
for (size_t i = 0; i < length; i++)
if (strcmp(list[i], str) == 0)
return i;
return length;
}
size_t split(char **list, size_t limit, char *source, const char *delim) {
size_t length = 0;
char *token;
while (length < limit && (token = strsep(&source, delim)))
if (*token)
list[length++] = token;
return length;
}
int main(void) {
char input[512] = { 0 };
char *args[MAX_ARGS] = { 0 };
puts("Welcome to the shell.");
while (1) {
if (get_input(input, sizeof input, "$ ")) {
size_t argl = split(args, MAX_ARGS, input, " ");
size_t redirection = find(args, argl, "<");
puts("Command parts:");
for (size_t i = 0; i < redirection; i++)
printf("%zu: %s\n", i, args[i]);
puts("Input files:");
if (redirection == argl)
puts("[[NONE]]");
else for (size_t i = redirection + 1; i < argl; i++)
printf("%zu: %s\n", i, args[i]);
} else {
break; /* EOF or read error: stop instead of spinning forever */
}
}
return 0;
}
I have a C program where I copy one string to another, but for some reason, if I remove a print statement I once used for debugging from my loop, the program crashes before it reaches the print statement after the while loop.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char * cats2 = malloc(sizeof(char));
cats2[0] = '\0';
char * cats = "this string is a cool cat";
getCopyFrom(cats, cats2);
free(cats2);
return 0;
}
void getCopyFrom(char* original, char* translation){
int index = 0;
char * current = malloc(sizeof(char));
current[0] = '\0';
while(index < (strlen(original))){
printf("%d\n", index);
current = realloc(current, sizeof(char) * 2);
current[index] = original[index];
current[index + 1] = '\0';
index++;
}
printf("%s\n", current);
free(current);
}
If I remove the printf("%d\n", index); from the while loop, the program crashes before the while loop ends. If I keep it, the program runs fine until the end, where it reports an access violation error.
I'm not sure why either happens, am I missing something obvious or am I just not understanding malloc and realloc correctly?
Edit:
My previous question was answered, but I have a new problem. I added translation = realloc(translation, (strlen(current) + 1) * sizeof(char)); to the code to set the size of translation to the size of current but I get another access violation. Are you not able to realloc parameters or something?
current = realloc(current, sizeof(char) * 2);
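but why is the second argument to realloc() in a loop a constant? With a size of 2, every write with index + 1 >= 2 lands past the end of the block, which is why the crash comes and goes with unrelated changes such as the printf. It looks like you want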
current = realloc(current, sizeof(char) * (index + 1));
but once you do that you'll discover your code is really slow, because it calls realloc for every character. The * 2 hints at the right idea: keep the used length and the allocated size separately, only call realloc when the string is about to outgrow the allocation, and double the allocated size each time.
if (index + 1 >= alloc) {
size_t new_alloc = (alloc == 0) ? 4 : alloc * 2;
char *new = realloc(current, sizeof(char) * new_alloc);
if (!new) {
/* todo handle error: current is still valid, free it and bail out */
}
current = new;
alloc = new_alloc;
}
I made a program in C that can find two similar or different strings and extract the string between them. This type of program has so many uses, and generally when you use such a program, you have a lot of info, so it needs to be fast. I would like tips on how to make this program as fast and efficient as possible.
I am looking for suggestions that won't make me resort to heavy libraries (such as regex).
The code must:
be able to extract a string between two similar or different strings
find the 1st occurrence of string1
find the 1st occurrence of string2 which occurs AFTER string1
extract the string between string1 and string2
be able to use string arguments of any size
be robust against human error and return NULL if such an error occurs (for example, if string1 is longer than the entire text: don't crash with an out-of-bounds access, but gracefully return NULL)
focus on speed and efficiency
Below is my code. I am quite new to C, coming from C++, so I could probably use a few suggestions, especially regarding efficient and proper use of malloc:
fast_strbetween.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include <stdio.h> // printf
#include <stdlib.h> // malloc
// inline function if it pleases the compiler gods
inline size_t fast_strlen(char *str)
{
int i; // Cannot return 'i' if inside for loop
for(i = 0; str[i] != '\0'; ++i);
return i;
}
char *fast_strbetween(char *str, char *str1, char *str2)
{
// size_t segfaults when incorrect length strings are entered (due to going below 0), so use int instead for increased robustness
int str0len = fast_strlen(str);
int str1len = fast_strlen(str1);
int str1pos = 0;
int charsfound = 0;
// Find str1
do {
charsfound = 0;
while (str1[charsfound] == str[str1pos + charsfound])
++charsfound;
} while (++str1pos < str0len - str1len && charsfound < str1len);
// '++str1pos' increments past by 1: needs to be set back by one
--str1pos;
// Whole string not found or logical impossibilty
if (charsfound < str1len)
return NULL;
/* Start searching 2 characters after last character found in str1. This will ensure that there will be space, and logical possibility, for the extracted text to exist or not, and allow immediate bail if the latter case; str1 cannot possibly have anything between it if str2 is right next to it!
Example:
str = 'aa'
str1 = 'a'
str2 = 'a'
returned = '' (should be NULL)
Without such preventative, str1 and str2 would would be found and '' would be returned, not NULL. This also saves 1 do/while loop, one check pertaining to returning null, and two additional calculations:
Example, if you didn't add +1 str2pos, you would need to change the code to:
if (charsfound < str2len || str2pos - str1pos - str1len < 1)
return NULL;
It also allows for text to be found between three similar strings—what??? I can feel my brain going fuzzy!
Let this example explain:
str = 'aaa'
str1 = 'a'
str2 = 'a'
result = '' (should be 'a')
Without the aforementioned preventative, the returned string is '', not 'a'; the program takes the first 'a' for str1 and the second 'a' for str2, and tries to return what is between them (nothing).
*/
int str2pos = str1pos + str1len + 1; // the '1' added to str2pos
int str2len = fast_strlen(str2);
// Find str2
do {
charsfound = 0;
while (str2[charsfound] == str[str2pos + charsfound])
++charsfound;
} while (++str2pos < str0len - str2len + 1 && charsfound < str2len);
// Deincrement due to '++str2pos' over-increment
--str2pos;
if (charsfound < str2len)
return NULL;
// Only allocate what is needed
char *strbetween = (char *)malloc(sizeof(char) * str2pos - str1pos - str1len);
unsigned int tmp = 0;
for (unsigned int i = str1pos + str1len; i < str2pos; i++)
strbetween[tmp++] = str[i];
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
//Result should be: 'abba'
printf("The string between is: \'%s\'\n", fast_strbetween(str, str1, str2));
// free malloc as we go
for (int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
In order to have some way of measuring progress, I have already timed the code above (extracting a small string 10000000 times):
$ time fast_strbetween
The string between is: 'abba'
0m11.09s real 0m11.09s user 0m00.00s system
Process used 99.3 - 100% CPU according to 'top' command (Linux).
Memory used while running: 3.7Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4GHz, Arm 6)
If anyone would like to offer code, tips, pointers... I would appreciate it. I will also implement the changes and give a timed result for your troubles.
Oh, and one thing I learned is to always free memory obtained from malloc: I ran the code above (with extra loops) just before posting this, my computer's RAM filled up, and the computer froze. Luckily, Stack Overflow saved a backup draft! Lesson learned!
* EDIT *
Here is the revised code, following chqrlie's advice as best I could. I added extra checks for the end of the string, which ended up costing about a second with the tested phrase, but the function can now bail out very fast if the first string is not found. Using null or illogical strings should not result in an error, hopefully. There are lots of notes in the code, where they are easier to understand. If I've left anything out or done something incorrectly, please let me know; it is not intentional.
fast_strbetween2.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween2.c -o fast_strbetween2
Corrections and additions courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
// Strings now set to 'const'
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
// string size will now be calculated by the characters picked up
size_t str1pos = 0;
size_t str1chars;
// Find str1
do{
str1chars = 0;
// Will the do/while str1 check for '\0' suffice?
// I haven't seen any issues yet, but not sure.
while(str1[str1chars] == str[str1pos + str1chars] && str1[str1chars] != '\0')
{
//printf("Found str1 char: %i num: %i pos: %i\n", str1[str1chars], str1chars + 1, str1pos);
++str1chars;
}
// Incrementing whilst not in conditional expression tested faster
++str1pos;
/* There are two checks for "str1[str1chars] != '\0'". Trying to find
another efficient way to do it in one. */
}while(str[str1pos] != '\0' && str1[str1chars] != '\0');
--str1pos;
//For testing:
//printf("str1pos: %i str1chars: %i\n", str1pos, str1chars);
// exit if no chars were found or if didn't reach end of str1
if(!str1chars || str1[str1chars] != '\0')
{
//printf("Bailing from str1 result\n");
return '\0';
}
/* Got rid of the '+1' code which didn't allow for '' returns.
I agree with your logic of <tag></tag> returning ''. */
size_t str2pos = str1pos + str1chars;
size_t str2chars;
//printf("Starting pos for str2: %i\n", str1pos + str1chars);
// Find str2
do{
str2chars = 0;
while(str2[str2chars] == str[str2pos + str2chars] && str2[str2chars] != '\0')
{
//printf("Found str2 char: %i num: %i pos: %i \n", str2[str2chars], str2chars + 1, str2pos);
++str2chars;
}
++str2pos;
}while(str[str2pos] != '\0' && str2[str2chars] != '\0');
--str2pos;
//For testing:
//printf("str2pos: %i str2chars: %i\n", str2pos, str2chars);
if(!str2chars || str2[str2chars] != '\0')
{
//printf("Bailing from str2 result!\n");
return '\0';
}
/* Trying to allocate strbetween with malloc. Is this correct? */
char * strbetween = malloc(2);
// Check if malloc succeeded:
if (strbetween == '\0') return '\0';
size_t tmp = 0;
// Grab and store the string between!
for(size_t i = str1pos + str1chars; i < str2pos; ++i)
{
strbetween[tmp] = str[i];
++tmp;
}
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", fast_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
** Results **
$ time fast_strbetween2
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m10.93s real 0m10.93s user 0m00.00s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4GHz, Arm 6)
chqrlie's answer
I understand that this is just some example code that shows proper programming practices. Nonetheless, it can make for a decent control in testing.
Please note that I do not know how to free the memory allocated in your code, so it is NOT a fair test. As a result, RAM usage builds up, taking 130 MB+ for the process alone. I was still able to run the test for the full 10000000 loops. I tried freeing it the way I did in my code (by bringing the function 'simple_strbetween' into main and freeing with 'free(strndup(p, q - p));'), and the results weren't much different from not freeing.
** simple_strbetween.c **
/*
Compile with:
gcc -Wall -O3 simple_strbetween.c -o simple_strbetween
Courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<string.h>
#include<stdio.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", simple_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
simple_strbetween(str, str1, str2);
return 0;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m19.68s real 0m19.34s user 0m00.32s system
Process used 100% CPU according to 'top' command (Linux).
Memory used while running: 130Mb (leak due to my lack of knowledge)
Executable size: 8380 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4GHz, Arm 6)
Results for the above code run with this alternate strndup:
char *alt_strndup(const char *s, size_t n)
{
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m20.99s real 0m20.54s user 0m00.44s system
I kindly ask that nobody make judgements on the results until the code is run properly. I will revise the results as soon as that is figured out.
* Edit *
I was able to decrease the time by over 25% (11.93s vs 8.7s). This was done by using pointers to advance through the strings, as opposed to size_t indexes. Collecting the return string while checking for the last string was likely what caused the biggest change. I feel there is still lots of room for improvement. A big loss comes from the malloc/free calls. If there is a better way, I'd like to know.
fast_strbetween3.c:
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
return NULL;
char *strbetween = malloc(1);
if (!strbetween)
return '\0';
str = spos;
int i = 0;
//char *p = &strbetween[0]; // Alt. for advancing strbetween (slower)
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
//*p = *str;
//++p;
strbetween[i] = *str;
++str;
++i;
}while(*str2 && *spos);
if (spos == str)
return NULL;
//*--p = '\0';
strbetween[i - 1] = '\0';
return strbetween;
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
printf("\nString: \'%s\'\n", fast_strbetween(s, s1, s2));
for(int i = 10000000; --i; )
free(fast_strbetween(s, s1, s2));
return 0;
}
String: 'abba'
0m08.70s real 0m08.67s user 0m00.01s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4GHz, Arm 6)
* Edit *
This doesn't really count, as it does not 'return' a value and therefore breaks my own rules; instead, it writes the result into a buffer passed in by main. It needs only one header (stdio.h) and takes 3.6s. Getting rid of malloc was the key.
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
unsigned int fast_strbetween(const char *str, const char *str1, const char *str2, char *strbetween)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
str = spos;
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
*strbetween = *str;
++strbetween;
++str;
}while(*str2 && *spos);
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
*--strbetween = '\0';
return 1; // Successful (found text)
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
char sret[100];
fast_strbetween(s, s1, s2, sret);
printf("String: %s\n", sret);
for(int i = 10000000; --i; )
fast_strbetween(s, s1, s2, sret);
return 0;
}
Your code has multiple problems and is probably not as efficient as it should be:
you use types int and unsigned int for indexes into the strings. These types may not be able to represent every value of size_t. You should revise your code to use size_t and avoid mixing signed and unsigned types in comparisons.
your functions' string arguments should be declared as const char * as you do not modify the strings and should be able to pass const strings without a warning.
redefining strlen is a bad idea: your version will be slower than the system's optimized, assembly coded and very likely inlined version.
computing the length of str is unnecessary and potentially costly: both str1 and str2 may appear close to the beginning of str, scanning for the end of str will be wasteful.
the while loop inside the first do / while loop is incorrect: while(str1[charsfound] == str[str1pos + charsfound]) charsfound++; may access characters beyond the end of str and str1 as the loop does not stop at the null terminator. If str1 only appears at the end of str, you have undefined behavior.
if str1 is an empty string, you will find it at the end of str instead of at the beginning.
why do you initialize str2pos as int str2pos = str1pos + str1len + 1;? If str2 immediately follows str1 inside str, an empty string should be allocated and returned. Your comment regarding this case is unreadable, you should break such long lines to fit within a typical screen width such as 80 columns. It is debatable whether strbetween("aa", "a", "a") should return "" or NULL. IMHO it should return an allocated empty string, which would be consistent with the expected behavior on strbetween("<name></name>", "<name>", "</name>") or strbetween("''", "'", "'"). Your specification preventing strbetween from returning an empty string produces a counter-intuitive border case.
the second scanning loop has the same problems as the first.
the line char *strbetween = (char *) malloc(sizeof(char) * str2pos - str1pos - str1len); has multiple problems: no cast is necessary in C, if you insist on specifying the element size sizeof(char), which is 1 by definition, you should parenthesize the number of elements, and last but not least, you must allocate one extra element for the null terminator.
You do not test if malloc() succeeded. If it returns NULL, you will have undefined behavior, whereas you should just return NULL.
the copying loop uses a mix of signed and unsigned types, causing potentially counterintuitive behavior on overflow.
you forget to set the null terminator, which is consistent with the allocation size error, but incorrect.
Before you try and optimize code, you must ensure correctness! Your code is too complicated and has multiple flaws. Optimisation is a moot point.
You should first try a very simple implementation using standard C string functions: searching a string inside another one is performed efficiently by strstr.
Here is a simple implementation using strstr and strndup(), which should be available on your system:
#include <string.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
strndup() is defined in POSIX, was added to standard C in C23, and is part of the Extensions to the C Library Part II: Dynamic Allocation Functions, ISO/IEC TR 24731-2:2010. If it is not available on your system, it can be defined as:
#include <stdlib.h>
#include <string.h>
char *strndup(const char *s, size_t n) {
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
To ensure correctness, write a number of test cases, with border cases such as all combinations of empty strings and identical strings.
Once you have thoroughly tested your strbetween function, you can write a benchmarking framework to test performance. It is not easy to get reliable performance figures, as you will find if you try. Remember to configure your compiler to select the appropriate optimisations, -O3 for example.
Only then can you move to the next step: if you are really restricted from using standard C library functions, you may first recode your versions of strstr and strlen and still use the same method. Test this new version both for correctness and for performance.
The redundant parts are the computation of strlen(str1), which strstr must already have determined when it found a match, and the scan in strndup(), which is unnecessary since no null byte is present between p and q. If you have time to waste, you can try to remove these redundancies at the expense of readability and at the risk of deviating from the specified behavior. I would be surprised if you got any improvement at all on average over a wide variety of test cases. 20% would be remarkable.
I am trying to write a simple program that will read words from a file and print the number of occurrences of a particular word passed to it as argument.
For that, I use fscanf to read the words and copy them into an array of strings that is dynamically allocated.
For some reason, I get an error message.
Here is the code for the readFile function:
void readFile(char** buffer, char** argv){
unsigned int i=0;
FILE* file;
file = fopen(argv[1], "r");
do{
buffer = realloc(buffer, sizeof(char*));
buffer[i] = malloc(46);
}while(fscanf(file, "%s", buffer[i++]));
fclose(file);
}
And here is the main function :
int main(int argc, char** argv){
char** buffer = NULL;
readFile(buffer, argv);
printf("%s\n", buffer[0]);
return 0;
}
I get the following error message :
realloc(): invalid next size
Aborted (core dumped)
I have looked at other threads on this topic but none of them seem to be of help. I could not apply whatever I learned there to my problem.
I used a debugger (VS Code with gdb). Data is written successfully into indices 0, 1, 2 and 3 of the buffer array, but for index 4 the debugger reports "Cannot access memory at address 0xfbad2488" and pauses on an exception.
Another thread on this topic suggests there might be a wild pointer somewhere. But I don't see one anywhere.
I have spent days trying to figure this out. Any help will be greatly appreciated.
Thanks.
Your algorithm is wrong on many fronts, including:
buffer is passed by value. Any assignment of the form buffer = ... inside readFile is invisible to the caller. In C, arguments are always passed by value (arrays included, but the "value" of an array argument is a temporary pointer to its first element, so you get a by-reference synonym there whether you want it or not).
Your realloc usage is wrong. It should be expanding based on the iteration of the loop as a count multiplied by the size of a char *. You only have the latter, with no count multiplier. Therefore, you never allocate more than a single char * with that realloc call.
Your loop termination condition is wrong. Your fscanf call should check for the expected number of successful conversions, which in your case is 1. Instead, you're looking for any non-zero value, and EOF (a negative value) is non-zero when you hit it. Therefore, the loop never terminates.
Your fscanf call is not protected from buffer overflow : you're allocating a fixed-size buffer for each string read, but not limiting the %s format to that size (e.g. %45s for a 46-byte buffer). This is a recipe for buffer overflow.
No I/O or allocation function is ever checked for success/failure : the following APIs could fail, yet you never check that possibility: fopen, fscanf, realloc, malloc. In failing to do so, you're violating Henry Spencer's 6th Commandment for C Programmers : "If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest ``it cannot happen to me'', the gods shall surely punish thee for thy arrogance."
No mechanism for communicating the allocated string count to the caller : The caller of this function is expecting a resulting char**. Assuming you fix the first item in this list, you still have not provided the caller any means of knowing how long that pointer sequence is when readFile returns. An out-parameter and/or a formal structure is a possible solution to this. Or perhaps a terminating NULL pointer to indicate the list is finished.
(Moderate) You never check argc : Instead, you just send argv directly to readFile, and assume the file name will be at argv[1] and always be valid. Don't do that. readFile should take either a FILE* or a single const char * file name, and act accordingly. It would be considerably more robust.
(Minor) : Extra allocation : Even after fixing the above items, you'll still leave one extra buffer allocated in your sequence: the one that failed to read. Not that it matters much in this case, as the caller has no idea how many strings were allocated in the first place (see the item about the string count above).
Shoring up all of the above would require a basic rewrite of nearly everything you have posted. In the end, the code would look so different, it's almost not worth trying to salvage what is here. Instead, look at what you have done, look at this list, and see where things went wrong. There's plenty to choose from.
Sample
#include <stdio.h>
#include <stdlib.h>
#define STR_MAX_LEN 46
char ** readFile(const char *fname)
{
char **strs = NULL;
int len = 0;
FILE *fp = fopen(fname, "r");
if (fp != NULL)
{
do
{
// array expansion
void *tmp = realloc(strs, (len+1) * sizeof *strs);
if (tmp == NULL)
{
// failed. cleanup prior success
perror("Failed to expand pointer array");
for (int i=0; i<len; ++i)
free(strs[i]);
free(strs);
strs = NULL;
break;
}
// allocation was good; save off new pointer
strs = tmp;
strs[len] = malloc( STR_MAX_LEN );
if (strs[len] == NULL)
{
// failed. cleanup prior success
perror("Failed to allocate string buffer");
for (int i=0; i<len; ++i)
free(strs[i]);
free(strs);
strs = NULL;
break;
}
if (fscanf(fp, "%45s", strs[len]) == 1)
{
++len;
}
else
{
// read failed. we're leaving regardless. the last
// allocation is thrown out, but we terminate the list
// with a NULL to indicate end-of-list to the caller
free(strs[len]);
strs[len] = NULL;
break;
}
} while (1);
fclose(fp);
}
return strs;
}
int main(int argc, char *argv[])
{
if (argc < 2)
exit(EXIT_FAILURE);
char **strs = readFile(argv[1]);
if (strs)
{
// enumerate and free in the same loop
for (char **pp = strs; *pp; ++pp)
{
puts(*pp);
free(*pp);
}
// free the now-defunct pointer array
free(strs);
}
return EXIT_SUCCESS;
}
Output (run against /usr/share/dict/words)
A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron
Aaronic
Aaronical
Aaronite
Aaronitic
Aaru
Ab
aba
Ababdeh
Ababua
abac
abaca
......
zymotechny
zymotic
zymotically
zymotize
zymotoxic
zymurgy
Zyrenian
Zyrian
Zyryan
zythem
Zythia
zythum
Zyzomys
Zyzzogeton
Improvements
The secondary malloc in this code is completely pointless. You're using a fixed maximum word length, so you could easily retool your array as a pointer to arrays of STR_MAX_LEN chars:
char (*strs)[STR_MAX_LEN]
and simply eliminate the per-string malloc code entirely. That still leaves the problem of how to tell the caller how many strings were read. In the prior version we used a NULL pointer to indicate end-of-list. In this version we can simply use a zero-length string. Doing this makes the declaration of readFile rather odd looking, but for returning a pointer-to-array-of-size-N, it's correct. See below:
#include <stdio.h>
#include <stdlib.h>
#define STR_MAX_LEN 46
char (*readFile(const char *fname))[STR_MAX_LEN]
{
char (*strs)[STR_MAX_LEN] = NULL;
int len = 0;
FILE *fp = fopen(fname, "r");
if (fp != NULL)
{
do
{
// array expansion
void *tmp = realloc(strs, (len+1) * sizeof *strs);
if (tmp == NULL)
{
// failed. cleanup prior success
perror("Failed to expand pointer array");
free(strs);
strs = NULL;
break;
}
// allocation was good; save off new pointer
strs = tmp;
if (fscanf(fp, "%45s", strs[len]) == 1)
{
++len;
}
else
{
// read failed. make the final string zero-length
strs[len][0] = 0;
break;
}
} while (1);
fclose(fp);
}
return strs;
}
int main(int argc, char *argv[])
{
if (argc < 2)
exit(EXIT_FAILURE);
char (*strs)[STR_MAX_LEN] = readFile(argv[1]);
if (strs)
{
// enumerate the strings; a single free() releases the whole array
for (char (*s)[STR_MAX_LEN] = strs; (*s)[0]; ++s)
puts(*s);
free(strs);
}
return EXIT_SUCCESS;
}
The output is the same as before.
Another Improvement: Geometric Growth
With a few simple changes we can significantly cut down on the realloc calls (we're currently making one per string added) by growing the array geometrically. If each reallocation roughly doubles the size of the prior allocation, more and more space becomes available for reading strings before the next allocation is needed:
#include <stdio.h>
#include <stdlib.h>
#define STR_MAX_LEN 46
char (*readFile(const char *fname))[STR_MAX_LEN]
{
char (*strs)[STR_MAX_LEN] = NULL;
int len = 0;
int capacity = 0;
FILE *fp = fopen(fname, "r");
if (fp != NULL)
{
do
{
if (len == capacity)
{
printf("Expanding capacity to %d\n", (2 * capacity + 1));
void *tmp = realloc(strs, (2 * capacity + 1) * sizeof *strs);
if (tmp == NULL)
{
// failed. cleanup prior success
perror("Failed to expand string array");
free(strs);
strs = NULL;
break;
}
// save the new string pointer and capacity
strs = tmp;
capacity = 2 * capacity + 1;
}
if (fscanf(fp, "%45s", strs[len]) == 1)
{
++len;
}
else
{
// read failed. make the final string zero-length
strs[len][0] = 0;
break;
}
} while (1);
// shrink if needed. remember to retain the final empty string
if (strs && (len+1) < capacity)
{
printf("Shrinking capacity to %d\n", len);
void *tmp = realloc(strs, (len+1) * sizeof *strs);
if (tmp)
strs = tmp;
}
fclose(fp);
}
return strs;
}
int main(int argc, char *argv[])
{
if (argc < 2)
exit(EXIT_FAILURE);
char (*strs)[STR_MAX_LEN] = readFile(argv[1]);
if (strs)
{
// enumerate the strings; a single free() releases the whole array
for (char (*s)[STR_MAX_LEN] = strs; (*s)[0]; ++s)
puts(*s);
// free the now-defunct pointer array
free(strs);
}
return EXIT_SUCCESS;
}
Output
The output is the same as before, but I added instrumentation to show when expansion happens to illustrate the expansions and final shrinking. I'll leave out the rest of the output (which is over 200k lines of words)
Expanding capacity to 1
Expanding capacity to 3
Expanding capacity to 7
Expanding capacity to 15
Expanding capacity to 31
Expanding capacity to 63
Expanding capacity to 127
Expanding capacity to 255
Expanding capacity to 511
Expanding capacity to 1023
Expanding capacity to 2047
Expanding capacity to 4095
Expanding capacity to 8191
Expanding capacity to 16383
Expanding capacity to 32767
Expanding capacity to 65535
Expanding capacity to 131071
Expanding capacity to 262143
Shrinking capacity to 235886
My function is being passed a struct containing, among other things, a NULL terminated array of pointers to words making up a command with arguments.
I'm performing a glob match on the list of arguments, to expand them into a full list of files, then I want to replace the passed argument array with the new expanded one.
The globbing is working fine, that is, g.gl_pathv is populated with the list of expected files. However, I am having trouble copying this array into the struct I was given.
#include <glob.h>
#include <stdlib.h>
#include <string.h>
struct command {
char **argv;
// other fields...
};
void myFunction( struct command * cmd )
{
char **p = cmd->argv;
char* program = *p++; // save the program name (e.g. 'ls') and advance to the first argument
glob_t g;
memset(&g, 0, sizeof(g));
g.gl_offs = 1;
int res = glob(*p++, GLOB_DOOFFS, NULL, &g);
glob_handle_res(res);
while (*p)
{
res = glob(*p++, GLOB_DOOFFS | GLOB_APPEND, NULL, &g); // p must advance, or this loops forever
glob_handle_res(res);
}
if( g.gl_pathc <= 0 )
{
globfree(&g);
return; // nothing matched: keep the original argv
}
// + 1 leaves room for the terminating NULL pointer
cmd->argv = malloc((g.gl_pathc + g.gl_offs + 1) * sizeof *cmd->argv);
if (cmd->argv == NULL) { sys_fatal_error("pattern_expand: malloc failed\n");}
// copy over the arguments
size_t i = g.gl_offs;
for (; i < g.gl_pathc + g.gl_offs; ++i)
cmd->argv[i] = strdup(g.gl_pathv[i]);
// insert the original program name
cmd->argv[0] = strdup(program);
cmd->argv[g.gl_pathc + g.gl_offs] = NULL;
globfree(&g);
}
void
command_free(struct command * cmd)
{
char ** p = cmd->argv;
while (*p) {
free(*p++); // Segfaults here, was it already freed?
}
free(cmd->argv);
free(cmd);
}
Edit 1: Also, I realized I need to stick program back in there as cmd->argv[0]
Edit 2: Added call to calloc
Edit 3: Edit mem management with tips from Alok
Edit 4: More tips from alok
Edit 5: Almost working: the app segfaults when freeing the command struct
Finally: it seems I was missing the terminating NULL, so adding the line:
cmd->argv[g.gl_pathc + g.gl_offs] = 0;
seemed to make it work.
argv is an array of char * pointers. This means that argv has space for argc char * values (plus the terminating NULL pointer). If you try to copy more than that many char * values into it, you will end up with an overflow.
Most likely your glob call results in more than argc elements in the gl_pathv field (i.e., gl_pathc > argc). Writing past the end of argv is undefined behavior.
It is similar to the code below:
/* Wrong code */
#include <string.h>
int a[] = { 1, 2, 3 };
int b[] = { 1, 2, 3, 4 };
memcpy(a, b, sizeof b);
Solution: you should either work with the glob_t struct directly, or allocate new space to copy gl_pathv to a new char **:
char **paths = malloc(g.gl_pathc * sizeof *paths);
if (paths == NULL) { /* handle error */ }
for (size_t i=0; i < g.gl_pathc; ++i) {
/* The following just copies the pointer; with GLOB_DOOFFS
the matches start at gl_pathv[gl_offs] */
paths[i] = g.gl_pathv[g.gl_offs + i];
/* If you actually want to copy the string, then
you need to malloc again here.
Something like:
paths[i] = malloc(strlen(g.gl_pathv[g.gl_offs + i]) + 1);
followed by strcpy.
*/
}
/* free all the allocated data when done */
Edit: after your edit:
cmd->argv = calloc(g.gl_pathc, sizeof(char *) *g.gl_pathc);
it should work, but each of argv[1] to argv[g.gl_pathc + g.gl_offs - 1] is a char * that is "owned" by the struct glob. Your memcpy call is only copying the pointers. When you later call globfree(), those pointers no longer point to valid memory. So you need to copy the strings for your own use:
size_t i;
/* one extra slot for the terminating NULL pointer */
cmd->argv = malloc((g.gl_pathc+g.gl_offs+1) * sizeof *cmd->argv);
for (i=g.gl_offs; i < g.gl_pathc + g.gl_offs; ++i)
cmd->argv[i] = strdup(g.gl_pathv[i]);
cmd->argv[i] = NULL;
This makes sure you now have your own private copies of the strings. Be sure to free them (and argv) once you are done.
There are a few other problems with your code.
You are doing *p++; you should do p++, since you're not using the dereferenced value.
You should really check the return value of glob.
Your paths variable needs g.gl_pathc + 1 elements, not g.gl_pathc. (Or more correctly, you need to allocate g.gl_pathc + g.gl_offs times sizeof *paths bytes.)
Your for loop to copy strings should be for (j=1; j < g.gl_pathc + g.gl_offs; ++j).
Make sure you prevent the shell from expanding your glob, i.e., call ./a.out '*' instead of ./a.out *.
Don't you need to multiply g.gl_pathc by sizeof(char *)?