String character dropping off? - c

I have been using strcat to join several strings. Everything appears to be correct, prints:
/proc/573/fd/ <- with the trailing slash
13 <- length
After I try to copy the "src" string with strcpy to another string, the trailing character doesn't print in either the "dest" or the "src" strings:
/proc/573/fd <- same string prints without the trailing slash?
13 <- length is unchanged?
If I call strlen the length shows it is unchanged though?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// This function counts the number of digit places in 'pid'
int pid_digit_places(int pid)
{
int n = pid;
int places = 0;
while (n)
n /= 10;
places++;
return places;
}
char *construct_path(int pid, char *dir)
{
// get count of places in pid
int places = pid_digit_places(pid);
char *pid_str = calloc(places, sizeof(char));
// create string of pid
sprintf(pid_str, "%d", pid);
char *proc = "/proc/";
size_t plen = strlen(proc);
size_t dlen = strlen(dir) + 1;
char *path = calloc(plen + dlen + places, sizeof(char));
strcat(path, proc);
strcat(path, pid_str);
strcat(path, dir);
return path;
}
void fd_walk(int pid)
{
char *fd = "/fd/";
char *fdpath = construct_path(pid, fd);
// prints "/proc/573/fd/ - as expected
printf("Before: %s\n", fdpath);
// shows a length of 13
printf("Size Before: %d\n", (int)strlen(fdpath));
char *test = calloc(strlen(fdpath) + 1, sizeof(char));
strcpy(test, fdpath);
// prints "/proc/573/fd" no trailing "/"
printf("Copied Str: %s\n", test);
//shows a length of 13 though
printf("Copied Size: %d\n", (int)strlen(test));
// prints "/proc/573/fd" no trailing "/" now
printf("After: %s\n", fdpath);
// still shows length of 13
printf("Size After: %d\n", (int)strlen(fdpath));
}
int main(void)
{
// integer to create path around
int pid = 573;
fd_walk(pid);
return 0;
}
I'm compiling on gcc-4.8.2 with -Wall:
gcc -o src src.c -Wall
I've popped this small example into ideone.
I've made sure to add an extra space for the null-terminator when allocating memory.
I've re-examined how I'm initializing my pointers and haven't seen anything wrong. How can the string print as expected with printf and then, after copying it, print something different -- undefined behavior?

I have executed your exact code with no troubles. Nonetheless, I see two possible problems:
// This function counts the number of digit places in 'pid'
int pid_digit_places(int pid)
{
int n = pid;
int places = 0;
while (n) { // <-- The braces were missing here.
n /= 10;
places++;
}
return places;
}
char *construct_path(int pid, char *dir)
{
// get count of places in pid
int places = pid_digit_places(pid);
// You need "places" bytes for the digits, plus one for the zero
char *pid_str = calloc(places + 1, sizeof(char));
However, in general, I wouldn't spend effort allocating exactly the memory needed; the extra code costs more in size and complexity than the few bytes it saves.
Just make a guess at the largest possible value, and enforce that guess:
// avoid pid_digit_places altogether
pid_str = malloc(16);
if (pid > 999999999999L) {
// fprintf an error and abort.
// Better yet, see whether this is a limit #define'd in the OS,
// and place an appropriate compile-time # warning. Chances are
// that unless your code's trivial, running it on a system with
// such large PIDs (and therefore likely so different an arch!)
// would cause some other trouble to pop up.
// With an # error in place, you save also the pid check!
}
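As a rough sketch of the fixed-size-buffer idea, the whole path can be formatted with a single snprintf call, and truncation detected from its return value. The names build_proc_path and PROC_PATH_MAX are made up for this illustration and do not come from the question:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limit for this sketch: room for "/proc/", a PID and a short suffix. */
#define PROC_PATH_MAX 64

/*
 * Writes "/proc/<pid><dir>" into dest, which holds `size` bytes.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
int build_proc_path(char *dest, size_t size, int pid, const char *dir)
{
    assert(dest != NULL && dir != NULL);

    int n = snprintf(dest, size, "/proc/%d%s", pid, dir);

    /* snprintf returns the length it wanted to write, excluding the terminator */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

A caller would declare char path[PROC_PATH_MAX]; and call build_proc_path(path, sizeof path, 573, "/fd/"), which leaves "/proc/573/fd/" in path. No digit counting or strcat is needed.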

Related

Memory corruption when attempting to use strtok

When I run this code on my microcontroller, it crashes when attempting to printf on "price_right_of_period".
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define DEBUG
char *str = "$3.45";
int main()
{
char *decimal_pos; // where the decimal place is in the string
char buffer[64]; // temporary working buffer
int half_price_auto_calc = 1;
// the cost of the product price difference for half price mode
int price_save_left, price_save_right = 0;
int price_half_left, price_half_right = 0;
// EG: let's say we get the string "$2.89"
char *price_left_of_period; // this will contain "2"
char *price_right_of_period; // this will contain "89"
// find where the decimal is
decimal_pos = strstr(str, ".");
if (decimal_pos != NULL)
{
printf("\nThe decimal point was found at array index %d\n", decimal_pos - str);
printf("Splitting the string \"%s\" into left and right segments\n\n", str);
}
// get everything before the period
strcpy(buffer, str); // copy the string
price_left_of_period = strtok(buffer, ".");
// if the dollar sign exists, skip over it
if (price_left_of_period[0] == '$') price_left_of_period++;
#ifdef DEBUG
printf("price_left_of_period = \"%s\"\n", price_left_of_period);
#endif
// get everything after the period
//
// strtok remembers the last string it worked with and where it ended
// to get the next string, call it again with NULL as the first argument
price_right_of_period = strtok(NULL, "");
#ifdef DEBUG
printf("price_right_of_period = \"%s\"\n\n", price_right_of_period);
#endif
if (half_price_auto_calc == 1)
{
// calculate the amount we saved (before the decimal)
price_save_left = atoi((const char *)price_left_of_period);
// halve the value if half price value is true
price_half_left = price_save_left / 2;
// calculate the amount we saved (before the decimal)
price_save_right = atoi((const char *)price_right_of_period);
// halve the value if half price value is true
price_half_right = price_save_right / 2;
#ifdef DEBUG
printf("price_half_left = \"%d\"\n", price_half_left);
printf("price_half_right = \"%d\"", price_half_right);
#endif
}
return 0;
}
The code runs and works fine here: https://onlinegdb.com/kDAw2cJyz. However, as mentioned above, it crashes on my MCU.
Would anyone have any ideas why this might be happening as a result of my code? The code looks right to me but it's always nice to get a second opinion from other C experts :)
SOLUTION:
Your code does have one bug: printing decimal_pos - str with %d doesn't work, because decimal_pos - str has the wrong type to print through %d. You need to cast it (I doubt that's causing the crash, but you can test by commenting it out and re-testing).
The code indeed has a bug here:
printf("\nThe decimal point was found at array index %d\n", decimal_pos - str);
The difference of 2 pointers has type ptrdiff_t, which may be different from the int expected for %d. You should either use %td or cast the difference as (int)(decimal_pos - str). It would, however, be very surprising if this type mismatch were the cause of your problem.
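A minimal sketch of both fixes, formatting into a buffer so the result is easy to inspect. The helper name describe_offset is hypothetical. Some embedded C libraries may not implement the t length modifier; that is an assumption to check against your toolchain, and the cast is the safer choice if in doubt:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: finds the first '.' in s and writes its index
 * into out twice, once via %td and once via an int cast.
 * Returns 0 if a '.' was found, -1 otherwise.
 */
int describe_offset(const char *s, char *out, size_t outsize)
{
    assert(s != NULL && out != NULL);

    const char *dot = strchr(s, '.');
    if (dot == NULL)
        return -1;

    ptrdiff_t offset = dot - s;   /* pointer difference has type ptrdiff_t */

    /* %td matches ptrdiff_t; %d matches the explicit int cast */
    snprintf(out, outsize, "%td %d", offset, (int)offset);
    return 0;
}
```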
Note that you copy the string without testing its length in strcpy(buffer, str); which for this example is OK but might have undefined behavior if str points to a longer string.
The code is too complicated: there is no need for strtok() as you already have the offset to the decimal point if any. You can use atoi() with a pointer to the beginning of the integral portion without patching the . with a null byte. You could also use strtol() and avoid strstr() altogether.
Note also that the code will compute an incorrect price in most cases, because the dollars and the cents are halved separately with integer division: for "$3.45", 3 / 2 gives 1 and 45 / 2 gives 22, so the price becomes "$1.22", which is substantially more than a 50% rebate.
You should just convert the number as an integral number of cents and use that to compute the reduced price.
Here is a simplified version:
#include <stdio.h>
#include <stdlib.h>
int half_price_auto_calc = 1;
char *str = "$3.45";
int main() {
int cents;
char *p = str;
if (*p == '$')
p++;
// use strtol to convert dollars and cents
cents = strtol(p, &p, 10) * 100;
if (*p == '.') {
cents += strtol(p + 1, NULL, 10);
}
if (half_price_auto_calc) {
cents /= 2;
}
printf("reduced price: $%d.%02d\n", cents / 100, cents % 100);
return 0;
}
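To make the rounding problem concrete, here is a small sketch that compares the two approaches on a price already split into dollars and cents. The function names are invented for this illustration:

```c
#include <assert.h>

/* Halves dollars and cents separately, as the original code does; returns total cents. */
int half_by_parts(int dollars, int cents)
{
    assert(dollars >= 0 && cents >= 0 && cents < 100);
    return (dollars / 2) * 100 + cents / 2;
}

/* Halves the whole amount expressed in cents. */
int half_by_cents(int dollars, int cents)
{
    assert(dollars >= 0 && cents >= 0 && cents < 100);
    return (dollars * 100 + cents) / 2;
}
```

For $3.45, half_by_parts returns 122 ($1.22) and half_by_cents returns 172 ($1.72). The missing 50 cents is the odd dollar that 3 / 2 throws away.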

Cannot access empty string from array of strings in C

I'm using an array of strings in C to hold arguments given to a custom shell. I initialize the array of buffers using:
char *args[MAX_CHAR];
Once I parse the arguments, I send them to the following function to determine the type of IO redirection if there are any (this is just the first of 3 functions to check for redirection and it only checks for STDIN redirection).
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
Once I compile and run the shell, it crashes with a SIGSEGV. Using GDB I determined that the shell is crashing on the following line:
if (strlen(args[i]) == 0) {
This is because the address of args[i] (the first empty string after the parsed commands) is inaccessible. Here is the error from GDB and all relevant variables:
(gdb) next
359 if (strlen(args[i]) == 0) {
(gdb) p args[0]
$1 = 0x7fffffffe570 "echo"
(gdb) p args[1]
$2 = 0x7fffffffe575 "test"
(gdb) p args[2]
$3 = 0x0
(gdb) p i
$4 = 2
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
parseInputFile (args=0x7fffffffd570, inputFilePath=0x7fffffffd240 "") at shell.c:359
359 if (strlen(args[i]) == 0) {
I believe that the p args[2] returning $3 = 0x0 means that because the index has yet to be written to, it is mapped to address 0x0 which is out of the bounds of execution. Although I can't figure out why this is because it was declared as a buffer. Any suggestions on how to solve this problem?
EDIT: Per Kaylum's comment, here is a minimal reproducible example
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include <sys/stat.h>
#include<readline/readline.h>
#include<readline/history.h>
#include <fcntl.h>
// Defined values
#define MAX_CHAR 256
#define MAX_ARG 64
#define clear() printf("\033[H\033[J") // Clear window
#define DEFAULT_PROMPT_SUFFIX "> "
char PROMPT[MAX_CHAR], SPATH[1024];
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
int ioRedirectHandler(char **args) {
char inputFilePath[MAX_CHAR] = "";
// Check if any redirects exist
if (parseInputFile(args, inputFilePath)) {
return 1;
} else {
return 0;
}
}
void parseArgs(char *cmd, char **cmdArgs) {
int na;
// Separate each argument of a command to a separate string
for (na = 0; na < MAX_ARG; na++) {
cmdArgs[na] = strsep(&cmd, " ");
if (cmdArgs[na] == NULL) {
break;
}
if (strlen(cmdArgs[na]) == 0) {
na--;
}
}
}
int processInput(char* input, char **args, char **pipedArgs) {
// Parse the single command and args
parseArgs(input, args);
return 0;
}
int getInput(char *input) {
char *buf, loc_prompt[MAX_CHAR] = "\n";
strcat(loc_prompt, PROMPT);
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
add_history(buf);
strcpy(input, buf);
return 0;
} else {
return 1;
}
}
void init() {
char *uname;
clear();
uname = getenv("USER");
printf("\n\n \t\tWelcome to Student Shell, %s! \n\n", uname);
// Initialize the prompt
snprintf(PROMPT, MAX_CHAR, "%s%s", uname, DEFAULT_PROMPT_SUFFIX);
}
int main() {
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
int isPiped = 0, isIORedir = 0;
init();
while(1) {
// Get the user input
if (getInput(input)) {
continue;
}
isPiped = processInput(input, args, pipedArgs);
isIORedir = ioRedirectHandler(args);
}
return 0;
}
Note: If I forgot to include any important information, please let me know and I can get it updated.
When you write
char *args[MAX_CHAR];
you allocate room for MAX_CHAR pointers to char. You do not initialise the array. If it were a global variable, all the pointers would be initialised to NULL, but since it is declared inside a function, the elements are indeterminate and can point anywhere. You should not dereference them before you have set the pointers to point at something you are allowed to access.
You do set them, though, in parseArgs(), where you do this:
cmdArgs[na] = strsep(&cmd, " ");
There are two potential issues here, but let's deal with the one you hit first. When strsep() is through the tokens you are splitting, it returns NULL. You test for that to get out of parseArgs() so you already know this. However, where your program crashes you seem to have forgotten this again. You call strlen() on a NULL pointer, and that is a no-no.
There is a difference between NULL and the empty string. An empty string is a pointer to a buffer that has the zero char first; the string "" is a pointer to a location that holds the character '\0'. The NULL pointer is a special value for pointers, often address zero, that means that the pointer doesn't point anywhere. Obviously, the NULL pointer cannot point to an empty string. You need to check if an argument is NULL, not if it is the empty string.
If you want to check both for NULL and the empty string, you could do something like
if (!args[i] || strlen(args[i]) == 0) {
If args[i] is NULL then !args[i] is true, so you will enter the if body if you have NULL or if you have a pointer to an empty string.
(You could also check the empty string with !(*args[i]); *args[i] is the first character that args[i] points at. So *args[i] is zero if you have the empty string; zero is interpreted as false, so !(*args[i]) is true if and only if args[i] is the empty string. Not that this is more readable, but it shows again the difference between empty strings and NULL).
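The distinction can be captured in a tiny helper. This is only a sketch, and the name is_null_or_empty is hypothetical:

```c
#include <stddef.h>

/* Returns 1 if s is NULL or points at an empty string, 0 otherwise. */
int is_null_or_empty(const char *s)
{
    /* The NULL test must come first: || short-circuits, so *s is
       never evaluated for a NULL pointer. */
    return s == NULL || *s == '\0';
}
```

With that in place the loop condition could read if (is_null_or_empty(args[i])), which is safe for both the NULL sentinel and an empty token.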
I mentioned another issue with the parsed arguments. Whether it is a problem or not depends on the application. But when you parse a string with strsep(), you get pointers into the parsed string. You have to be careful not to free that string (it is input in your main() function) or to modify it after you have parsed the string. If you change the string, you have changed what all the parsed strings look at. You do not do this in your program, so it isn't a problem here, but it is worth keeping in mind. If you want your parsed arguments to survive longer than they do now, that is, beyond the point where the next command is parsed, you need to copy them. As the code stands, parsing the next command overwrites them.
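If the arguments do need to outlive the input buffer, one possible approach is to duplicate each token onto the heap. The sketch below uses hypothetical names (copy_args, free_args) and assumes the caller knows how many valid, non-NULL tokens there are:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stores heap copies of the first argc strings of src in dst.
 * Every src[i] must be a valid, non-NULL string.
 * Returns 0 on success; on allocation failure frees the copies
 * made so far and returns -1.
 */
int copy_args(char **dst, char *const *src, size_t argc)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < argc; i++) {
        size_t len = strlen(src[i]) + 1;   /* include the terminator */
        dst[i] = malloc(len);
        if (dst[i] == NULL) {
            while (i > 0)
                free(dst[--i]);
            return -1;
        }
        memcpy(dst[i], src[i], len);
    }
    return 0;
}

/* Releases the copies made by copy_args. */
void free_args(char **args, size_t argc)
{
    for (size_t i = 0; i < argc; i++)
        free(args[i]);
}
```

After copy_args succeeds, the input buffer can be reused for the next command without affecting the copies; they remain valid until free_args is called.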
In main
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
are all uninitialized. They contain indeterminate values. This could be a potential source of bugs, but is not the reason here, as
getInput modifies the contents of input to be a valid string before any reads occur.
pipedArgs is unused, so raises no issues (yet).
args is modified by parseArgs to (possibly!) contain a NULL sentinel value, without any indeterminate pointers being read first.
Firstly, in parseArgs it is possible to completely fill args without setting the NULL sentinel value that other parts of the program should rely on.
Looking deeper, in parseInputFile the following
if (strlen(args[i]) == 0)
contradicts the limits imposed by parseArgs that disallows empty strings in the array. More importantly, args[i] may be the sentinel NULL value, and strlen expects a non-NULL pointer to a valid string.
This termination condition should simply check if args[i] is NULL.
With
strcpy(inputFilePath, args[i+1]);
args[i+1] might also be the NULL sentinel value, and strcpy also expects non-NULL pointers to valid strings. You can see this in action when inputSymbol is a match for the final token in the array.
args[i+1] may also evaluate as args[MAX_ARG], which is past the last index the loop is meant to examine.
Additionally, inputFilePath has a string length limit of MAX_CHAR - 1, and args[i+1] is (possibly!) a dynamically allocated string whose length might exceed this.
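Putting these points together, a bounds-aware version of the lookup might look like the sketch below. It assumes the caller passes the number of valid entries (argc) and the size of the destination buffer; the name find_input_file and its parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Looks for "<" among the first argc entries of args and copies the
 * argument that follows it into dest (destsize bytes).
 * Returns 1 if a file name was copied, 0 if there is no usable redirection.
 */
int find_input_file(char *const *args, size_t argc, char *dest, size_t destsize)
{
    assert(args != NULL && dest != NULL && destsize > 0);

    dest[0] = '\0';
    for (size_t i = 0; i < argc && args[i] != NULL; i++) {
        if (strcmp(args[i], "<") != 0)
            continue;

        /* "<" must be followed by a real argument inside the array */
        if (i + 1 >= argc || args[i + 1] == NULL)
            return 0;

        size_t len = strlen(args[i + 1]);
        if (len >= destsize)        /* would not fit with its terminator */
            return 0;

        memcpy(dest, args[i + 1], len + 1);
        return 1;
    }
    return 0;
}
```

The loop stops at either the count or a NULL entry, so it never calls strcmp or strlen on a NULL pointer and never reads past the entries it was given.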
Some edge cases found in getInput:
Both arguments to
strcat(loc_prompt, PROMPT);
are of the size MAX_CHAR. Since loc_prompt has a length of 1, if PROMPT has the length MAX_CHAR - 1, the resulting string will have the length MAX_CHAR. This would leave no room for the NUL terminating byte.
readline can return NULL in some situations, so
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
can again pass the NULL pointer to strlen.
As before, there is a length issue: on success readline returns a string of dynamic length, and
strcpy(input, buf);
can cause a buffer overflow by attempting to copy a string greater in length than MAX_CHAR - 1.
buf is a pointer to data allocated by malloc. It's unclear what add_history does, but this pointer must eventually be passed to free.
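One way to handle the NULL and length cases in a single place is a small checked copy. This is only a sketch: the helper copy_line is hypothetical and does not call readline itself, so it can be tried without the library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies src into dest (destsize bytes) only if src is non-NULL,
 * non-empty and fits. Returns 0 on success, 1 otherwise.
 */
int copy_line(char *dest, size_t destsize, const char *src)
{
    assert(dest != NULL && destsize > 0);

    if (src == NULL || *src == '\0')
        return 1;

    size_t len = strlen(src);
    if (len >= destsize)
        return 1;

    memcpy(dest, src, len + 1);   /* len + 1 copies the terminator too */
    return 0;
}
```

In getInput this could be called as copy_line(input, MAX_CHAR, buf), followed by free(buf) on every path; free(NULL) is a no-op, so the NULL case needs no special handling there.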
Some considerations.
Firstly, it is a good habit to initialize your data, even if it might not matter.
Secondly, using constants (#define MAX_CHAR 256) might help to reduce magic numbers, but they can lead you to design your program too rigidly if used in the same way.
Consider building your functions to accept a limit as an argument, and return a length. This allows you to more strictly track the sizes of your data, and prevents you from always designing around the maximum potential case.
A slightly contrived example of designing like this. We can see that find does not have to concern itself with possibly checking MAX_ARGS elements, as it is told precisely how long the list of valid elements is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_ARGS 100
char *get_input(char *dest, size_t sz, const char *display) {
char *res;
if (display)
printf("%s", display);
if ((res = fgets(dest, sz, stdin)))
dest[strcspn(dest, "\n")] = '\0';
return res;
}
size_t find(char **list, size_t length, const char *str) {
for (size_t i = 0; i < length; i++)
if (strcmp(list[i], str) == 0)
return i;
return length;
}
size_t split(char **list, size_t limit, char *source, const char *delim) {
size_t length = 0;
char *token;
while (length < limit && (token = strsep(&source, delim)))
if (*token)
list[length++] = token;
return length;
}
int main(void) {
char input[512] = { 0 };
char *args[MAX_ARGS] = { 0 };
puts("Welcome to the shell.");
while (1) {
if (get_input(input, sizeof input, "$ ")) {
size_t argl = split(args, MAX_ARGS, input, " ");
size_t redirection = find(args, argl, "<");
puts("Command parts:");
for (size_t i = 0; i < redirection; i++)
printf("%zu: %s\n", i, args[i]);
puts("Input files:");
if (redirection == argl)
puts("[[NONE]]");
else for (size_t i = redirection + 1; i < argl; i++)
printf("%zu: %s\n", i, args[i]);
}
}
}

Extracting a string between two similar (or different) strings in C as fast as possible

I made a program in C that can find two similar or different strings and extract the string between them. This type of program has so many uses, and generally when you use such a program, you have a lot of info, so it needs to be fast. I would like tips on how to make this program as fast and efficient as possible.
I am looking for suggestions that won't make me resort to heavy libraries (such as regex).
The code must:
be able to extract a string between two similar or different strings
find the 1st occurrence of string1
find the 1st occurrence of string2 which occurs AFTER string1
extract the string between string1 and string2
be able to use string arguments of any size
be foolproof to human error and return NULL if such occurs (example, string1 exceeds entire text string length. don't crash in an element error, but gracefully return NULL)
focus on speed and efficiency
Below is my code. I am quite new to C, coming from C++, so I could probably use a few suggestions, especially regarding efficient/proper use of the 'malloc' command:
fast_strbetween.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include <stdio.h> // printf
#include <stdlib.h> // malloc
// inline function if it pleases the compiler gods
inline size_t fast_strlen(char *str)
{
int i; // Cannot return 'i' if inside for loop
for(i = 0; str[i] != '\0'; ++i);
return i;
}
char *fast_strbetween(char *str, char *str1, char *str2)
{
// size_t segfaults when incorrect length strings are entered (due to going below 0), so use int instead for increased robustness
int str0len = fast_strlen(str);
int str1len = fast_strlen(str1);
int str1pos = 0;
int charsfound = 0;
// Find str1
do {
charsfound = 0;
while (str1[charsfound] == str[str1pos + charsfound])
++charsfound;
} while (++str1pos < str0len - str1len && charsfound < str1len);
// '++str1pos' increments past by 1: needs to be set back by one
--str1pos;
// Whole string not found or logical impossibilty
if (charsfound < str1len)
return NULL;
/* Start searching 2 characters after last character found in str1. This will ensure that there will be space, and logical possibility, for the extracted text to exist or not, and allow immediate bail if the latter case; str1 cannot possibly have anything between it if str2 is right next to it!
Example:
str = 'aa'
str1 = 'a'
str2 = 'a'
returned = '' (should be NULL)
Without such preventative, str1 and str2 would would be found and '' would be returned, not NULL. This also saves 1 do/while loop, one check pertaining to returning null, and two additional calculations:
Example, if you didn't add +1 str2pos, you would need to change the code to:
if (charsfound < str2len || str2pos - str1pos - str1len < 1)
return NULL;
It also allows for text to be found between three similar strings—what??? I can feel my brain going fuzzy!
Let this example explain:
str = 'aaa'
str1 = 'a'
str2 = 'a'
result = '' (should be 'a')
Without the aforementioned preventative, the returned string is '', not 'a'; the program takes the first 'a' for str1 and the second 'a' for str2, and tries to return what is between them (nothing).
*/
int str2pos = str1pos + str1len + 1; // the '1' added to str2pos
int str2len = fast_strlen(str2);
// Find str2
do {
charsfound = 0;
while (str2[charsfound] == str[str2pos + charsfound])
++charsfound;
} while (++str2pos < str0len - str2len + 1 && charsfound < str2len);
// Deincrement due to '++str2pos' over-increment
--str2pos;
if (charsfound < str2len)
return NULL;
// Only allocate what is needed
char *strbetween = (char *)malloc(sizeof(char) * str2pos - str1pos - str1len);
unsigned int tmp = 0;
for (unsigned int i = str1pos + str1len; i < str2pos; i++)
strbetween[tmp++] = str[i];
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
//Result should be: 'abba'
printf("The string between is: \'%s\'\n", fast_strbetween(str, str1, str2));
// free malloc as we go
for (int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
In order to have some way of measuring progress, I have already timed the code above (extracting a small string 10000000 times):
$ time fast_strbetween
The string between is: 'abba'
0m11.09s real 0m11.09s user 0m00.00s system
Process used 99.3 - 100% CPU according to 'top' command (Linux).
Memory used while running: 3.7Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
If anyone would like to offer code, tips, pointers... I would appreciate it. I will also implement the changes and give a timed result for your troubles.
Oh, and one thing that I learned is to always de-allocate malloc; I ran the code above (with extra loops), just before posting this. My computer's ram filled up, and the computer froze. Luckily, Stack made a backup draft! Lesson learned!
* EDIT *
Here is the revised code using chqrlie's advice as best I could. Added extra checks for end of string, which ended up costing about a second of time with the tested phrase but can now bail very fast if the first string is not found. Using null or illogical strings should not result in error, hopefully. Lots of notes in the code, where they can be better understood. If I've left anything out or done something incorrectly, please let me know guys; it is not intentional.
fast_strbetween2.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween2.c -o fast_strbetween2
Corrections and additions courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
// Strings now set to 'const'
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
// string size will now be calculated by the characters picked up
size_t str1pos = 0;
size_t str1chars;
// Find str1
do{
str1chars = 0;
// Will the do/while str1 check for '\0' suffice?
// I haven't seen any issues yet, but not sure.
while(str1[str1chars] == str[str1pos + str1chars] && str1[str1chars] != '\0')
{
//printf("Found str1 char: %i num: %i pos: %i\n", str1[str1chars], str1chars + 1, str1pos);
++str1chars;
}
// Incrementing whilst not in conditional expression tested faster
++str1pos;
/* There are two checks for "str1[str1chars] != '\0'". Trying to find
another efficient way to do it in one. */
}while(str[str1pos] != '\0' && str1[str1chars] != '\0');
--str1pos;
//For testing:
//printf("str1pos: %i str1chars: %i\n", str1pos, str1chars);
// exit if no chars were found or if didn't reach end of str1
if(!str1chars || str1[str1chars] != '\0')
{
//printf("Bailing from str1 result\n");
return '\0';
}
/* Got rid of the '+1' code which didn't allow for '' returns.
I agree with your logic of <tag></tag> returning ''. */
size_t str2pos = str1pos + str1chars;
size_t str2chars;
//printf("Starting pos for str2: %i\n", str1pos + str1chars);
// Find str2
do{
str2chars = 0;
while(str2[str2chars] == str[str2pos + str2chars] && str2[str2chars] != '\0')
{
//printf("Found str2 char: %i num: %i pos: %i \n", str2[str2chars], str2chars + 1, str2pos);
++str2chars;
}
++str2pos;
}while(str[str2pos] != '\0' && str2[str2chars] != '\0');
--str2pos;
//For testing:
//printf("str2pos: %i str2chars: %i\n", str2pos, str2chars);
if(!str2chars || str2[str2chars] != '\0')
{
//printf("Bailing from str2 result!\n");
return '\0';
}
/* Trying to allocate strbetween with malloc. Is this correct? */
char * strbetween = malloc(2);
// Check if malloc succeeded:
if (strbetween == '\0') return '\0';
size_t tmp = 0;
// Grab and store the string between!
for(size_t i = str1pos + str1chars; i < str2pos; ++i)
{
strbetween[tmp] = str[i];
++tmp;
}
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", fast_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
** Results **
$ time fast_strbetween2
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m10.93s real 0m10.93s user 0m00.00s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
chqrlie's answer
I understand that this is just some example code that shows proper programming practices. Nonetheless, it can make for a decent control in testing.
Please note that I do not know how to deallocate malloc in your code, so it is NOT a fair test. As a result, ram usage builds up, taking 130Mb+ for the process alone. I was still able to run the test for the full 10000000 loops. I will say that I tried deallocating this code the way I did my code (via bringing the function 'simple_strbetween' down into main and deallocating with 'free(strndup(p, q - p));'), and the results weren't much different from not deallocating.
** simple_strbetween.c **
/*
Compile with:
gcc -Wall -O3 simple_strbetween.c -o simple_strbetween
Courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<string.h>
#include<stdio.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", simple_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
simple_strbetween(str, str1, str2);
return 0;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m19.68s real 0m19.34s user 0m00.32s system
Process used 100% CPU according to 'top' command (Linux).
Memory used while running: 130Mb (leak due to my lack of knowledge)
Executable size: 8380 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
Results for above code ran with this alternate strndup:
char *alt_strndup(const char *s, size_t n)
{
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m20.99s real 0m20.54s user 0m00.44s system
I kindly ask that nobody make judgements on the results until the code is properly run. I will revise the results as soon as it is figured out.
* Edit *
Was able to decrease the time by over 25% (11.93s vs 8.7s). This was done by using pointers to increment the positions, as opposed to size_t. Collecting the return string whilst checking the last string was likely what caused the biggest change. I feel there is still lots of room for improvement. A big loss comes from having to free malloc. If there is a better way, I'd like to know.
fast_strbetween3.c:
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
return NULL;
char *strbetween = malloc(1);
if (!strbetween)
return '\0';
str = spos;
int i = 0;
//char *p = &strbetween[0]; // Alt. for advancing strbetween (slower)
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
//*p = *str;
//++p;
strbetween[i] = *str;
++str;
++i;
}while(*str2 && *spos);
if (spos == str)
return NULL;
//*--p = '\0';
strbetween[i - 1] = '\0';
return strbetween;
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
printf("\nString: \'%s\'\n", fast_strbetween(s, s1, s2));
for(int i = 10000000; --i; )
free(fast_strbetween(s, s1, s2));
return 0;
}
String: 'abba'
0m08.70s real 0m08.67s user 0m00.01s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
* Edit *
This doesn't really count as it does not 'return' a value, and therefore is against my own rules, but it does pass a variable through, which is changed and brought back to main. It runs with 1 library and takes 3.6s. Getting rid of malloc was the key.
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
unsigned int fast_strbetween(const char *str, const char *str1, const char *str2, char *strbetween)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
str = spos;
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
*strbetween = *str;
++strbetween;
++str;
}while(*str2 && *spos);
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
*--strbetween = '\0';
return 1; // Successful (found text)
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
char sret[100];
fast_strbetween(s, s1, s2, sret);
printf("String: %s\n", sret);
for(int i = 10000000; --i; )
fast_strbetween(s, s1, s2, sret);
return 0;
}
Your code has multiple problems and is probably not as efficient as it should be:
you use types int and unsigned int for indexes into the strings. These types may be smaller than the range of size_t. You should revise your code to use size_t and avoid mixing signed and unsigned types in comparisons.
your functions' string arguments should be declared as const char * as you do not modify the strings and should be able to pass const strings without a warning.
redefining strlen is a bad idea: your version will be slower than the system's optimized, assembly coded and very likely inlined version.
computing the length of str is unnecessary and potentially costly: both str1 and str2 may appear close to the beginning of str, scanning for the end of str will be wasteful.
the while loop inside the first do / while loop is incorrect: while(str1[charsfound] == str[str1pos + charsfound]) charsfound++; may access characters beyond the end of str and str1 as the loop does not stop at the null terminator. If str1 only appears at the end of str, you have undefined behavior.
if str1 is an empty string, you will find it at the end of str instead of at the beginning.
why do you initialize str2pos as int str2pos = str1pos + str1len + 1;? If str2 immediately follows str1 inside str, an empty string should be allocated and returned. Your comment regarding this case is unreadable, you should break such long lines to fit within a typical screen width such as 80 columns. It is debatable whether strbetween("aa", "a", "a") should return "" or NULL. IMHO it should return an allocated empty string, which would be consistent with the expected behavior on strbetween("<name></name>", "<name>", "</name>") or strbetween("''", "'", "'"). Your specification preventing strbetween from returning an empty string produces a counter-intuitive border case.
the second scanning loop has the same problems as the first.
the line char *strbetween = (char *) malloc(sizeof(char) * str2pos - str1pos - str1len); has multiple problems: no cast is necessary in C, if you insist on specifying the element size sizeof(char), which is 1 by definition, you should parenthesize the number of elements, and last but not least, you must allocate one extra element for the null terminator.
You do not test if malloc() succeeded. If it returns NULL, you will have undefined behavior, whereas you should just return NULL.
the copying loop uses a mix of signed and unsigned types, causing potentially counterintuitive behavior on overflow.
you forget to set the null terminator, which is consistent with the allocation size error, but incorrect.
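The points above about the allocation size, the malloc() check and the null terminator can be sketched as one small helper. The name copy_range is hypothetical and does not appear in the original code; this is only a minimal illustration, assuming the caller already knows where the text starts and how long it is.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original code): returns a newly
 * allocated copy of the len characters starting at start, or NULL
 * if the allocation fails. The caller must free() the result. */
char *copy_range(const char *start, size_t len)
{
    char *out = malloc(len + 1);   /* one extra byte for '\0' */
    if (out == NULL)
        return NULL;               /* report failure to the caller */
    memcpy(out, start, len);
    out[len] = '\0';               /* always set the terminator */
    return out;
}
```

With such a helper, an empty match (len equal to 0) naturally yields an allocated empty string rather than NULL.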
Before you try and optimize code, you must ensure correctness! Your code is too complicated and has multiple flaws. Optimisation is a moot point.
You should first try a very simple implementation using standard C string functions: searching a string inside another one is performed efficiently by strstr.
Here is a simple implementation using strstr and strndup(), which should be available on your system:
#include <string.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
strndup() is defined in POSIX and is part of the Extensions to the C Library Part II: Dynamic Allocation Functions, ISO/IEC TR 24731-2:2010. If it is not available on your system, it can be redefined as:
#include <stdlib.h>
#include <string.h>
char *strndup(const char *s, size_t n) {
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
To ensure correctness, write a number of test cases, with border cases such as all combinations of empty strings and identical strings.
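A possible shape for such tests is sketched below. The names strbetween_ref, check_case and run_strbetween_tests are hypothetical. strbetween_ref follows the same logic as simple_strbetween, with malloc and memcpy standing in for strndup so that the sketch does not depend on POSIX. The expected values assume the convention argued for above, where an empty match yields an allocated empty string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same logic as simple_strbetween, without strndup (assumption: this
 * stands in for whichever implementation you want to test). */
char *strbetween_ref(const char *str, const char *str1, const char *str2)
{
    const char *p = strstr(str, str1);
    if (p == NULL)
        return NULL;
    p += strlen(str1);
    const char *q = *str2 ? strstr(p, str2) : p + strlen(p);
    if (q == NULL)
        return NULL;
    size_t n = (size_t)(q - p);
    char *out = malloc(n + 1);
    if (out != NULL) {
        memcpy(out, p, n);
        out[n] = '\0';
    }
    return out;
}

typedef char *(*strbetween_fn)(const char *, const char *, const char *);

/* Runs one case; want == NULL means "no match expected". Returns 1 on pass. */
static int check_case(strbetween_fn fn, const char *str, const char *s1,
                      const char *s2, const char *want)
{
    char *got = fn(str, s1, s2);
    int ok = (want == NULL) ? (got == NULL)
                            : (got != NULL && strcmp(got, want) == 0);
    if (!ok)
        printf("FAIL: (\"%s\", \"%s\", \"%s\")\n", str, s1, s2);
    free(got);
    return ok;
}

/* Returns the number of failed cases (0 means all passed). */
int run_strbetween_tests(strbetween_fn fn)
{
    int failed = 0;
    failed += !check_case(fn, "abaabbbaaaabbabbbaaabbb", "aaa", "bbb", "abba");
    failed += !check_case(fn, "<name></name>", "<name>", "</name>", "");
    failed += !check_case(fn, "''", "'", "'", "");
    failed += !check_case(fn, "aa", "a", "a", "");
    failed += !check_case(fn, "abc", "x", "c", NULL);   /* str1 missing */
    failed += !check_case(fn, "abc", "a", "x", NULL);   /* str2 missing */
    failed += !check_case(fn, "abc", "", "", "abc");    /* both empty */
    failed += !check_case(fn, "", "", "", "");          /* all empty */
    failed += !check_case(fn, "abc", "a", "", "bc");    /* empty str2 */
    return failed;
}
```

Calling run_strbetween_tests(strbetween_ref) from main and checking for a zero result would exercise all cases; any other implementation with the same signature can be passed in instead.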
Once you have thoroughly tested your strbetween function, you can write a benchmarking framework to test performance. Getting reliable performance figures is not so easy, as you will find out if you try. Remember to configure your compiler to select the appropriate optimisations, -O3 for example.
Only then can you move to the next step: if you are really restricted from using standard C library functions, you may first recode your versions of strstr and strlen and still use the same method. Test this new version both for correctness and for performance.
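If the standard functions really are off limits, straightforward replacements might look like the following. my_strlen and my_strstr are hypothetical names, and the search is the naive quadratic one, so do not expect it to beat an optimized library strstr.

```c
#include <stddef.h>

/* Length of s, not counting the terminator. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        ++p;
    return (size_t)(p - s);
}

/* Naive search: returns a pointer to the first occurrence of needle in
 * haystack, or NULL. An empty needle matches at the start, like strstr. */
const char *my_strstr(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return haystack;
    for (; *haystack; ++haystack) {
        const char *h = haystack;
        const char *n = needle;
        /* stops at the end of either string, so no out-of-bounds read */
        while (*n && *h == *n) {
            ++h;
            ++n;
        }
        if (*n == '\0')
            return haystack;
    }
    return NULL;
}
```

Both loops stop at the null terminator, which is the property the original scanning loops were missing.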
The redundant parts are the computation of strlen(str1), which strstr must already have determined when it found a match, and the scan in strndup(), which is unnecessary since no null byte is present between p and q. If you have time to waste, you can try to remove these redundancies at the expense of readability, risking non-conformity. I would be surprised if you got any improvement at all on average over a wide variety of test cases; 20% would be remarkable.

C - Char array SEEMS to copy, but only within loop scope

Right now, I'm attempting to familiarize myself with C by writing a function which, given a string, will replace all instances of a target substring with a new substring. However, I've run into a problem with a reallocation of a char* array. To my eyes, it seems as though I'm able to successfully reallocate the array string to a desired new size at the end of the main loop, then perform a strcpy to fill it with an updated string. However, it fails for the following scenario:
Original input for string: "use the restroom. Then I need"
Target to replace: "the" (case insensitive)
Desired replacement value: "th'"
At the end of the loop, the line printf("result: %s\n ",string); prints out the correct phrase "use th' restroom. Then I need". However, string seems to then reset itself: the call to strcasestr in the while() statement is successful, the line at the beginning of the loop printf("string: %s \n",string); prints the original input string, and the loop continues indefinitely.
Any ideas would be much appreciated (and I apologize in advance for my flailing debug printf statements). Thanks!
The code for the function is as follows:
int replaceSubstring(char *string, int strLen, char*oldSubstring,
int oldSublen, char*newSubstring, int newSublen )
{
printf("Starting replace\n");
char* strLoc;
while((strLoc = strcasestr(string, oldSubstring)) != NULL )
{
printf("string: %s \n",string);
printf("%d",newSublen);
char *newBuf = (char *) malloc((size_t)(strLen +
(newSublen - oldSublen)));
printf("got newbuf\n");
int stringIndex = 0;
int newBufIndex = 0;
char c;
while(true)
{
if(stringIndex > 500)
break;
if(&string[stringIndex] == strLoc)
{
int j;
for(j=0; j < newSublen; j++)
{
printf("new index: %d %c --> %c\n",
j+newBufIndex, newBuf[newBufIndex+j], newSubstring[j]);
newBuf[newBufIndex+j] = newSubstring[j];
}
stringIndex += oldSublen;
newBufIndex += newSublen;
}
else
{
printf("old index: %d %c --> %c\n", stringIndex,
newBuf[newBufIndex], string[stringIndex]);
newBuf[newBufIndex] = string[stringIndex];
if(string[stringIndex] == '\0')
break;
newBufIndex++;
stringIndex++;
}
}
int length = (size_t)(strLen + (newSublen - oldSublen));
string = (char*)realloc(string,
(size_t)(strLen + (newSublen - oldSublen)));
strcpy(string, newBuf);
printf("result: %s\n ",string);
free(newBuf);
}
printf("end result: %s ",string);
}
First, the task should be clarified with regard to the desired behavior and interface.
The title "Char array..." is not clear.
You pass strLen, oldSublen and newSublen, so it looks as though you want to work with raw memory buffers of a given length.
However, you use strcasestr, strcpy and string[stringIndex] == '\0', and also mention printf("result: %s\n ",string);.
So I assume that you want to work with null-terminated strings, which the caller can pass as string literals such as "abc".
In that case there is no need to pass all those lengths to the function.
It looks like you are trying to implement recursive string replacement: after each replacement you start again from the beginning.
Let's consider more complicated sets of parameters, for example, replace aba by ab in abaaba.
Case 1: single pass through input stream
Both occurrences of the old substring are replaced: "abaaba" => "abab"
That is how the standard sed string replacement works:
> echo "abaaba" | sed 's/aba/ab/g'
abab
Case 2: recursive replacement taking into account possible overlapping
The first replacement: "abaaba" => "ababa"
The second replacement in already replaced result: "ababa" => "abba"
Note that this case is not safe: for example, replacing "loop" with "loop loop" never terminates.
Suppose we want to implement a function that takes null-terminated strings and performs the replacement in one pass, as sed does.
In general, the replacement cannot be done in place (in the same memory as the input string), since the result may be longer than the input.
Note that realloc may allocate a new memory block at a different address, so you should return that address to the caller.
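To illustrate the point about realloc, here is a minimal sketch of one way to hand the new address back, using a pointer-to-pointer parameter instead of a return value. The helper append_str is hypothetical and is not part of the code in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends suffix to the heap string *buf.
 * *buf must be NULL or point to a malloc'd, null-terminated string.
 * Returns 0 on success, -1 if the allocation fails. */
int append_str(char **buf, const char *suffix)
{
    size_t old_len = *buf ? strlen(*buf) : 0;
    size_t add_len = strlen(suffix);
    char *tmp = realloc(*buf, old_len + add_len + 1);
    if (tmp == NULL)
        return -1;                 /* *buf is still valid and unchanged */
    memcpy(tmp + old_len, suffix, add_len + 1);   /* copies the '\0' too */
    *buf = tmp;                    /* pass the possibly moved block back */
    return 0;
}
```

Assigning the result of realloc to a temporary first means the original block is not lost if the call fails. In the question's function, string is a by-value parameter, so reassigning it there never updates the caller's pointer.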
To keep the implementation simple, the space required for the result can be calculated before allocating memory (Case 1 implementation), so no reallocation is needed:
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
char* replaceSubstring(const char* string, const char* oldSubstring,
const char* newSubstring)
{
size_t strLen = strlen(string);
size_t oldSublen = strlen(oldSubstring);
size_t newSublen = strlen(newSubstring);
const char* strLoc = string;
size_t replacements = 0;
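/* an empty pattern would match forever; return a plain copy instead */
if (oldSublen == 0)
return strdup(string);
/* count number of replacements */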
while ((strLoc = strcasestr(strLoc, oldSubstring)))
{
strLoc += oldSublen;
++replacements;
}
/* result size: initial size + replacement diff + sizeof('\0') */
size_t result_size = strLen + (newSublen - oldSublen) * replacements + 1;
char* result = malloc(result_size);
if (!result)
return NULL;
char* resCurrent = result;
const char* strCurrent = string;
strLoc = string;
while ((strLoc = strcasestr(strLoc, oldSubstring)))
{
memcpy(resCurrent, strCurrent, strLoc - strCurrent);
resCurrent += strLoc - strCurrent;
memcpy(resCurrent, newSubstring, newSublen);
resCurrent += newSublen;
strLoc += oldSublen;
strCurrent = strLoc;
}
strcpy(resCurrent, strCurrent);
return result;
}
int main()
{
char* res;
res = replaceSubstring("use the restroom. Then I need", "the", "th");
printf("%s\n", res);
free(res);
res = replaceSubstring("abaaba", "aba", "ab");
printf("%s\n", res);
free(res);
return 0;
}

How to add/remove a string in a dynamic array in c language

I have a defined array sample:
char *arguments[] = {"test-1","test-2","test-3"};
I am trying to add an argument given on the command line. I tried strcpy and also assigning directly to an array element, e.g. arguments[num+1] = argv[1], but neither worked.
I know that this is a very simple question but I am not an experienced programmer and all my experience comes from higher level programming languages (PHP, Perl).
The closest examples I found online are "C program to insert an element in an array" and "C program to delete an element from an array", but they are not exactly what I am looking for, and they work with integers rather than the strings I need.
My goal is to find a way to add and remove strings from a dynamic array that can grow and shrink based on the process of the script.
Thanks everyone for their time and effort to assist me.
A sample of the code I am working with is given below:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/* Set as minimum parameters 2 */
#define MIN_REQUIRED 2
#define MAX_CHARACTERS 46
/* Usage Instructions */
int help() {
printf("Usage: test.c [-s <arg0>]\n");
printf("\t-s: a string program name <arg0>\n");
printf("\t-s: a string sample name <arg1>\n");
return (1);
}
int main(int argc, char *argv[]) {
if ( argc < MIN_REQUIRED ) {
printf ("Please follow the instructions: not less than %i argument inputs\n",MIN_REQUIRED);
return help();
}
else if ( argc > MIN_REQUIRED ) {
printf ("Please follow the instructions: not more than %i argument inputs\n",MIN_REQUIRED);
return help();
}
else {
int size, realsize;
char *input = NULL;
char *arguments[] = {"test-1","test-2","test-3"};
int num = sizeof(arguments) / sizeof(arguments[0]);
printf("This is the number of elements before: %i\n",num);
int i;
for (i=0; i<num; i++) {
printf("This is the arguments before: [%i]: %s\n",i,arguments[i]);
}
printf("This is the input argument: %s\n",argv[1]);
printf("This is the array element: %i\n",num+1);
input = (char *)malloc(MAX_CHARACTERS);
if (input == NULL) {
printf("malloc_in failled\n");
exit(0);
}
memset ( input , '\0' , MAX_CHARACTERS);
int length_before = strlen(input);
printf("This is the length before: %i\n",length_before);
strcpy(input , argv[1]);
int length_after = strlen(input);
printf("This is the length after: %i\n",length_after);
//arguments[num+1] = input;
strcpy(arguments[num+1],input);
int num_2 = sizeof(arguments) / sizeof(arguments[0]);
printf("This is the number of elements after: %i\n",num);
for (i=0; i<num_2; i++) {
printf("This is the arguments after [%i]: %s\n",i,arguments[i]);
}
} // End of else condition
return 0;
} // Enf of int main ()
"My goal is to find a way to add and remove strings from a dynamic array":
char *arguments[] = {...} is statically allocated, so it cannot serve as a "dynamic array".
strcpy(arguments[num+1],input):
You cannot access arguments[num+1] when this array has only num entries.
Suggested fix - allocate and initialize arguments dynamically, according to the value of argc:
char* strings[] = {"test-1","test-2","test-3"};
int i, num = sizeof(strings) / sizeof(*strings);
char** arguments = malloc((num+argc-1)*sizeof(char*));
if (arguments == NULL)
; // Exit with a failure
for (i=0; i<num; i++)
{
arguments[i] = malloc(strlen(strings[i])+1);
if (arguments[i] == NULL)
; // Deallocate what's already been allocated, and exit with a failure
strcpy(arguments[i],strings[i]);
}
for (i=0; i<argc-1; i++)
{
arguments[num+i] = malloc(strlen(argv[i+1])+1);
if (arguments[num+i] == NULL)
; // Deallocate what's already been allocated, and exit with a failure
strcpy(arguments[num+i],argv[i+1]);
}
...
// Deallocate everything before ending the program
There are no built-in dynamic arrays in C. The array arguments has a fixed size and holds exactly 3 elements, so
strcpy(arguments[num+1],input);
is simply undefined behaviour: the expression arguments[num+1] accesses the array out of bounds (two elements past the last one), and no magical reallocation will happen.
In general, you have three options:
You have an upper bound of how many items you want to be able to store in the array and declare the array to have that size. Either keep track of the numbers actually stored in it or add some sentinel value (which needs additional space!) indicating the end.
You have an upper bound, and if the number of items you want to store happens to exceed this limit, you abort (returning an error indicator, telling the user that the input data is too big, …).
Allocate the array dynamically: look up malloc, realloc and free.
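The third option could be sketched as follows. The StrArray type and the strarray_* functions are hypothetical names invented for this illustration. The array doubles its capacity when full and stores its own copy of every string.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;   /* heap array of heap-allocated strings */
    size_t len;     /* number of strings stored */
    size_t cap;     /* number of slots allocated */
} StrArray;

/* Appends a copy of s; returns 0 on success, -1 on allocation failure. */
int strarray_add(StrArray *a, const char *s)
{
    if (a->len == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 4;
        char **tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        a->items = tmp;
        a->cap = new_cap;
    }
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    a->items[a->len++] = copy;
    return 0;
}

/* Removes the string at index and shifts later entries down.
 * Returns 0 on success, -1 if index is out of range. */
int strarray_remove(StrArray *a, size_t index)
{
    if (index >= a->len)
        return -1;
    free(a->items[index]);
    memmove(&a->items[index], &a->items[index + 1],
            (a->len - index - 1) * sizeof a->items[0]);
    a->len--;
    return 0;
}

/* Releases every string and the array itself. */
void strarray_free(StrArray *a)
{
    for (size_t i = 0; i < a->len; i++)
        free(a->items[i]);
    free(a->items);
    a->items = NULL;
    a->len = a->cap = 0;
}
```

Starting from StrArray a = {0};, the program could call strarray_add(&a, "test-1") for each initial string and then strarray_add(&a, argv[1]) for the command-line argument, finishing with strarray_free(&a).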

Resources