I have a string, and in it I need to find a substring and replace it. The one to be found and the one that'll replace it are of different lengths. My code, partially:
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
char *sub_s1 = strcasestr(source_str, pattern);
printf("sub_s1: %s\r\n", sub_s1);
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
char *new_str = (char *)malloc(strlen(source_str) - strlen(pattern) + strlen(new_sub_s) + 1);
strcat(new_str, '\0');
strcat(new_str, "??? part before pattern ???");
strcat(new_str, new_sub_s);
strcat(new_str, "??? part after pattern ???");
Why do I have memory corruption?
How do I efficiently extract and replace pattern with new_sub_s?
There are multiple problems in your code:
you do not test if sub_s1 was found in the string. What if there is no match?
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); passes a difference of pointers for %s that expects a string. The behavior is undefined.
strcat(new_str, '\0'); has undefined behavior because the destination string is uninitialized and you pass a null pointer as the string to concatenate. strcat expects a string pointer as its second argument, not a char, and '\0' is a character constant with type int (in C) and value 0, which the compiler will convert to a null pointer, with or without a warning. You probably meant to write *new_str = '\0';
You cannot compose the new string with strcat as posted, because the part of the string before the match is not a C string: it is a fragment of a C string, with no null terminator of its own. Instead, determine the lengths of the different parts of the source string and use memcpy to copy each fragment with an explicit length.
Here is an example:
char *patch_string(const char *source_str, const char *pattern, const char *replacement) {
char *match = strcasestr(source_str, pattern);
if (match != NULL) {
size_t len = strlen(source_str);
size_t n1 = match - source_str; // # bytes before the match
size_t n2 = strlen(pattern); // # bytes in the pattern string
size_t n3 = strlen(replacement); // # bytes in the replacement string
size_t n4 = len - n1 - n2; // # bytes after the pattern in the source string
char *result = malloc(n1 + n3 + n4 + 1);
if (result != NULL) {
// copy the initial portion
memcpy(result, source_str, n1);
// copy the replacement string
memcpy(result + n1, replacement, n3);
// copy the trailing bytes, including the null terminator
memcpy(result + n1 + n3, match + n2, n4 + 1);
}
return result;
} else {
return strdup(source_str); // always return an allocated string
}
}
Note that the above code assumes that the match in the source string has the same length as the pattern string (in the example, the strings "cccc" and "CcCc" have the same length). Given that strcasestr is expected to perform a case-independent search, which is confirmed by the example strings in the question, this assumption might fail: for example, if the encodings of the upper- and lowercase letters have different lengths, or if strcasestr matches accented letters as one would expect in French, where "é" and "E" should match but have different lengths when encoded in UTF-8. If strcasestr has this advanced behavior, it is not possible to determine the length of the matched portion of the source string without a more elaborate API.
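patch_string replaces only the first match, and strcasestr is not standard C: it is a GNU/BSD extension (glibc declares it only with _GNU_SOURCE). If a case-sensitive search is acceptable, the same length arithmetic extends to every match using only the standard strstr. A sketch under that assumption; replace_all is a hypothetical name, it does no case-insensitive matching, and it rejects an empty pattern:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src in which every
   non-overlapping occurrence of pattern is replaced by replacement, or NULL
   if pattern is empty or malloc fails. The caller must free the result. */
char *replace_all(const char *src, const char *pattern, const char *replacement)
{
    size_t plen = strlen(pattern);
    size_t rlen = strlen(replacement);
    size_t count = 0;
    const char *p;

    if (plen == 0)
        return NULL;                       /* an empty pattern would match everywhere */

    /* First pass: count the matches so the result can be sized exactly. */
    for (p = strstr(src, pattern); p != NULL; p = strstr(p + plen, pattern))
        count++;

    size_t srclen = strlen(src);
    char *result = malloc(srclen - count * plen + count * rlen + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    char *out = result;
    const char *start = src;
    while ((p = strstr(start, pattern)) != NULL) {
        size_t n = (size_t)(p - start);    /* bytes before this match */
        memcpy(out, start, n);
        out += n;
        memcpy(out, replacement, rlen);
        out += rlen;
        start = p + plen;                  /* resume after the match */
    }
    strcpy(out, start);                    /* tail, including the terminator */
    return result;
}
```

It makes two passes (one to count the matches and size the allocation, one to copy), so the result never needs to be grown with realloc.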
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
You're taking the difference of two pointers, and printing it as though it was a pointer to a string. In practice, on your machine, this probably calculates a meaningless number and interprets it as a memory address. Since this is a small number, when interpreted as an address, on your system, this probably points to unmapped memory, so your program crashes. Depending on the platform, on the compiler, on optimization settings, on what else there is in your program, and on the phase of the Moon, anything could happen. It's undefined behavior.
Any half-decent compiler would tell you that there's a type mismatch between the %s directive and the argument. Turn those warnings on. For example, with GCC:
gcc -Wall -Wextra -Werror -O my_program.c
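To print the part before the match, give printf a length instead of relying on a terminator: the precision in %.*s limits how many characters are printed, and it takes an int argument. For example, once sub_s1 is known to be non-NULL: printf("sub_str before pattern: %.*s\r\n", (int)(sub_s1 - source_str), source_str);. The same idea as a testable sketch that writes into a buffer with snprintf; prefix_before is a hypothetical name, and it uses case-sensitive strstr instead of strcasestr:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the part of source that precedes the first
   occurrence of pattern into out (size bytes, terminated when size > 0).
   Returns 0 on success, -1 if pattern does not occur. */
int prefix_before(const char *source, const char *pattern, char *out, size_t size)
{
    const char *match = strstr(source, pattern);
    if (match == NULL)
        return -1;                 /* always check for "no match" first */
    /* %.*s prints at most span characters, so the fragment does not need
       its own null terminator. The precision argument must be an int. */
    int span = (int)(match - source);
    snprintf(out, size, "%.*s", span, source);
    return 0;
}
```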
char *new_str = (char *)malloc(…);
strcat(new_str, '\0');
strcat(new_str, "…");
The first call to strcat attempts to append '\0'. This is a character, not a string. It happens that since this is the character 0, and C doesn't distinguish between characters and numbers, this is just a weird way of writing the integer 0. And any integer constant with the value 0 is a valid way of writing a null pointer constant. So strcat(new_str, '\0') is equivalent to strcat(new_str, NULL) which will probably crash due to attempting to dereference the null pointer. Depending on the compiler optimizations, it's possible that the compiler will think that this block of code is never executed, since it's attempting to dereference a null pointer, and this is undefined behavior: as far as the compiler is concerned, this can't happen. This is a case where you can plausibly expect that the undefined behavior causes the compiler to do something that looks preposterous, but makes perfect sense from the way the compiler sees the program.
Even if you'd written strcat(new_str, "\0") as you probably intended, that would be pointless. Note that "\0" is a pointless way of writing "": there's always a null terminator at the end of a string literal¹. And appending an empty string to a string wouldn't change it.
And there's another problem with the strcat calls. At this point, the content of new_str is not initialized. But strcat (if called correctly, even for strcat(new_str, ""), if the compiler doesn't optimize this away) will explore this uninitialized memory and look for the first null byte. Because the memory is uninitialized, there's no guarantee that there is a null byte in the allocated memory, so strcat may attempt to read from an unmapped address when it runs out of buffer, or it may corrupt whatever. Or it may make demons fly out of your nose: once again it's undefined behavior.
Before you do anything with the newly allocated memory area, make it contain the empty string: set the first character to 0. And before that, check that malloc succeeded. It will always succeed in your toy program, but not in the real world.
char *new_str = malloc(…);
if (new_str == NULL) {
return NULL; // or whatever you want to do to handle the error
}
new_str[0] = 0;
strcat(new_str, …);
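If you would rather keep building new_str with the str... functions, strncat takes an explicit maximum length, so unlike strcat it can append just the fragment before the match, and it always terminates the result. A sketch; compose_with_strncat is a hypothetical name, and it assumes the caller already found the match and knows the matched length:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds (source up to match) + replacement + (source
   after the match). match must point into source, pattern_len is the length
   of the matched text. Returns NULL if malloc fails; the caller frees it. */
char *compose_with_strncat(const char *source, const char *match,
                           size_t pattern_len, const char *replacement)
{
    size_t before = (size_t)(match - source);
    size_t total = strlen(source) - pattern_len + strlen(replacement) + 1;
    char *new_str = malloc(total);
    if (new_str == NULL)
        return NULL;
    new_str[0] = '\0';                     /* start from the empty string */
    strncat(new_str, source, before);      /* at most 'before' chars, then '\0' */
    strcat(new_str, replacement);
    strcat(new_str, match + pattern_len);  /* rest of the source */
    return new_str;
}
```

Each strcat and strncat call rescans new_str to find its end, which is fine for three pieces but becomes quadratic if you append many times.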
¹ The only time there isn't a null terminator at the end of a "…" is when you use it to initialize an array and the characters that are spelled out fill the whole array, leaving no room for a null terminator.
snprintf can be used to calculate the memory needed and then print the string to the allocated pointer.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ( void) {
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
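char *sub_s1 = strcasestr(source_str, pattern);
if (sub_s1 == NULL) {
return 1; // pattern not found
}
int span = (int)( sub_s1 - source_str);
char *tail = sub_s1 + strlen ( pattern);
// with a null buffer and size 0, snprintf returns the length the output would have
int size = snprintf ( NULL, 0, "%.*s%s%s", span, source_str, new_sub_s, tail);
if (size < 0) {
return 1; // encoding error
}
char *new_str = malloc( size + 1); // +1 for the null terminator
if (new_str == NULL) {
return 1;
}
// pass the full buffer size (size + 1), otherwise the last character is cut off
snprintf ( new_str, size + 1, "%.*s%s%s", span, source_str, new_sub_s, tail);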
printf ( "%s\n", new_str);
free ( new_str);
return 0;
}
This subprogram takes three user inputs: a text string, a path to a file, and a 1 digit flag. It loads the file into a buffer, then appends both the flag and the file buffer, in that order, to a char array that serves as a payload. It returns the payload and the original user string.
I ran into a bug where some of my string operations on the file buffer, flag, and payload appeared to corrupt the memory where user_string was located. I fixed the bug by changing strcat(flag, buffer) to strcpy(payload, flag) (which is what I intended to write originally), but I'm still perplexed as to what caused it.
My guess from reading the documentation (https://www.gnu.org/software/libc/manual/html_node/Concatenating-Strings.html) is that strcat extends the to string strlen(to) bytes into unprotected memory, which the file contents loaded into the buffer copied over in a buffer overflow.
My questions are:
Is my guess correct?
Is there a way to reliably prevent this from occurring? Catching this sort of thing with an if(){} check is kind of unreliable, as it doesn't consistently return something obviously wrong; you expect a string of length filelength+1 and get a string of filelength+1.
bonus/unrelated: is there any computational cost/drawbacks/effects with calling a variable without operating on it?
/*
user inputs:
argv[0] = tendigitaa/four
argv[1] = ~/Desktop/helloworld.txt
argv[2] = 1
helloworld.txt is a text file containing (no quotes) : "Hello World"
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
int main (int argc, char **argv) {
char user_string[100] = "0";
char file_path[100] = "0";
char flag[1] = "0";
strcpy(user_string, argv[1]);
strcpy(file_path, argv[2]);
strcpy(flag, argv[3]);
/*
at this point printfs of the three declared variables return the same as the user inputs.
======
======
a bunch of other stuff happens...
======
======
and then this point printfs of the three declared variables return the same as the user inputs.
*/
FILE *file;
char * buffer = 0;
long filelength;
file = fopen(file_path, "r");
if (file) {
fseek(file, 0, SEEK_END);
filelength = ftell(file);
fseek(file, 0, SEEK_SET);
buffer = malloc(filelength);
printf("stringcheck1: %s \n", user_string);
if (buffer) {
fread(buffer, 1, filelength, file);
}
}
long payloadlen = filelength + 1;
char payload[payloadlen];
printf("stringcheck2: %s \n", user_string);
strcpy(payload, flag);
printf("stringcheck3: %s \n", user_string);
strcat(flag, buffer);
printf("stringcheck4: %s \n", user_string); //bug here
free(buffer);
printf("stringcheck5: %s \n", user_string);
payload; user_string; //bonus question: does this line have any effect on the program or computational cost?
return 0;
}
/*
printf output:
stringcheck1: tendigitaa/four
stringcheck2: tendigitaa/four
stringcheck3: tendigitaa/four
stringcheck4: lo World
stringcheck5: lo World
*/
note: taking this section out of the main program caused stringcheck 4 to segfault instead of returning "lo World". The behavior was otherwise equivalent.
strcat does exactly what documentation says:
char *strcat(char *restrict s1, const char *restrict s2);
The strcat() function shall append a copy of the string pointed to by s2 (including the terminating null byte) to the end of the string pointed to by s1. The initial byte of s2 overwrites the null byte at the end of s1. If copying takes place between objects that overlap, the behavior is undefined.
s1 must have enough memory allocated to hold both strings plus the terminating null byte.
The linked article is about writing your own string concatenation functions. As it states, how to write such a function depends on the application; there are many ways.
In your program, the destination char array is not big enough, so the result is undefined behavior: flag is not even big enough to hold a one-character string, because there is no room for the terminator.
I strongly advise you to learn some C string basics.
If you want a safer strcat, you can write your own, for example:
char *mystrcat(const char *str1, const char *str2)
{
char *dest = NULL;
size_t str1_length, str2_length;
if(str1 && str2)
{
dest = malloc((str1_length = strlen(str1)) + (str2_length = strlen(str2)) + 1);
if(dest)
{
memcpy(dest, str1, str1_length);
memcpy(dest + str1_length, str2, str2_length + 1); // +1 also copies the null terminator
}
}
return dest;
}
But we always pay a price for safety: the code is longer and less efficient. C was designed to be as efficient as possible, sacrificing safety and introducing the idea of undefined behavior.
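When the destination is a fixed-size array such as flag or payload, another way to get the reliable check the question asks for is to pass the array's capacity along and refuse to append when the result would not fit. A minimal sketch; append_checked is a hypothetical name, and it assumes dst already holds a terminated string:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: appends src to the string in dst only if the result,
   including its terminator, fits in the size bytes of dst. Returns false and
   leaves dst unchanged otherwise. */
bool append_checked(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    size_t add = strlen(src);
    if (used >= size || add >= size - used)
        return false;                 /* no room for src plus '\0' */
    memcpy(dst + used, src, add + 1); /* copy src and its terminator */
    return true;
}
```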
You can't store a non-empty string in a 1-character array. A string needs room for the string contents and a null terminator.
So when you declare
char flag[1] = "1";
you've only allocated one byte, which contains the character 1. There's no null terminator.
Using this with any string functions will result in undefined behavior, because they look for the null terminator to find the end of the string.
strcat(flag, buffer) will search for the null terminator, which will be outside the array, and then append buffer after that. So this clearly causes a buffer overflow when writing.
strcpy(payload, flag) is also wrong. It will look for a null terminator after the flag bytes to know when to stop copying to payload, so it will copy more than just flag (unless there happens to be a null byte after it).
You can resolve the strcpy() problem by increasing the size:
char flag[2] = "1";
You can also leave the size empty, the compiler will make it large enough to hold the string that initializes it, including the null byte:
char flag[] = "1";
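Note also that in the question, buffer is allocated with malloc(filelength) and filled with fread, which does not add a null terminator, so buffer is not a string either: strcat would read past its end even with a larger flag. A sketch that builds the payload from flag and the raw file bytes using explicit lengths; build_payload is a hypothetical name, and the file bytes are passed in rather than read here:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap string containing flag followed by the
   first len bytes of data (e.g. bytes read with fread, which are not null
   terminated). Returns NULL if malloc fails; the caller frees the result. */
char *build_payload(const char *flag, const char *data, size_t len)
{
    size_t flen = strlen(flag);
    char *payload = malloc(flen + len + 1);   /* +1 for the terminator */
    if (payload == NULL)
        return NULL;
    memcpy(payload, flag, flen);              /* flag without its '\0' */
    memcpy(payload + flen, data, len);        /* raw file bytes */
    payload[flen + len] = '\0';               /* terminate explicitly */
    return payload;
}
```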
The problem is caused by the strcat() line: it tries to cram buffer into flag, which is only one character long, and you haven't allocated any more space to fit buffer.
If you want to put buffer into flag, flag has to be large enough. Since flag is a fixed-size array, realloc() cannot resize it; allocate it with malloc() instead, and then use realloc() to grow it to the current length plus the length of buffer plus one byte for the terminator.
Also the only thing you ever print is user_string. I'm not sure if you're trying to print the other string you're working with.
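realloc() only works on memory obtained from malloc(), calloc() or realloc() (or on a null pointer), so growing a string this way means keeping it on the heap. A minimal sketch; append_grow is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends src to *s, where *s is NULL or a string
   obtained from malloc/calloc/realloc. On failure returns -1 and leaves *s
   untouched (still valid and still owned by the caller). */
int append_grow(char **s, const char *src)
{
    size_t old = *s ? strlen(*s) : 0;
    size_t add = strlen(src);
    char *p = realloc(*s, old + add + 1);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    memcpy(p + old, src, add + 1);         /* copy src and its terminator */
    *s = p;
    return 0;
}
```

Assigning realloc's result to a temporary first avoids leaking the original block when realloc fails.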
I have the following code in C now
int length = 50;
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere(); // read a string from somewhere
// with length, say 20
memcpy(target_str, source_str, length);
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy source_str to target_str, I use memcpy() as above with length 50, which is the size of target_str. The reason I use length in memcpy is that source_str can be at most length long but is usually shorter (in the above example it's 20).
Now, if I want to copy source_str only up to its terminating character ('\0'), even if the memcpy length is greater than the index of the terminating character, is the above code the right way to do it? Or is there an alternative?
Thanks for any help.
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy the source_str to target_str i use memcpy() as above with length 50, which is the size of target_str.
Currently you ask memcpy to read 30 bytes beyond the 20 characters of the source string, because memcpy does not care about a possible null terminator in the source; this is undefined behavior.
Because you are copying a string, you can use strcpy rather than memcpy.
But the size problem can also be reversed: the target can be smaller than the source, and without protection you will again have undefined behavior.
So you can use strncpy, giving it the length of the target; just take care to add a final null character in case the target is smaller than the source:
int length = 50;
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere(); // length unknown
strncpy(target_str, source_str, length - 1); // -1 to leave room for '\0'
target_str[length - 1] = 0; // force a null character at the end in case the source was too long
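snprintf is another standard option here: unlike strncpy, it always terminates the result (when the size is nonzero), stops reading at the source's terminator, and its return value tells you whether the string was truncated. Unlike strncpy, it does not pad the rest of the target with zeros. A sketch; copy_string is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the string src into dst, a buffer of size
   bytes (size > 0), truncating if needed. Returns true if it all fit. */
bool copy_string(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);
    /* n is the length the output would have had without truncation */
    return n >= 0 && (size_t)n < size;
}
```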
If I want to copy the source_str to target_str i use memcpy() as above
with length 50, which is the size of target_str. The reason I use
length in memcpy is that, the source_str can have a max value of
length but is usually less than that (in the above example its 20).
It is crucially important to distinguish between
the size of the array to which source_str points, and
the length of the string, if any, to which source_str points (+/- the terminator).
If source_str is certain to point to an array of length 50 or more then the memcpy() approach you present is ok. If not, then it produces undefined behavior when source_str in fact points to a shorter array. Any result within the power of your C implementation may occur.
If source_str is certain to point to a (properly-terminated) C string of no more than length - 1 characters, and if it is its string value that you want to copy, then strcpy() is more natural than memcpy(). It will copy all the string contents, up to and including the terminator. This presents no problem when source_str points to an array shorter than length, so long as it contains a string terminator.
If neither of those cases is certain to hold, then it's not clear what you want to do. The strncpy() function may cover some of those cases, but it does not cover all of them.
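One case in between, where source_str is known to contain a terminator but might point to an array shorter than length, is covered by memccpy (POSIX, and part of standard C since C23): it copies up to and including the first '\0' and stops there, while still never copying more than the target size. A sketch; copy_until_nul is a hypothetical name, and the feature-test macro is an assumption about a glibc-like environment:

```c
#define _XOPEN_SOURCE 700   /* assumption: makes glibc declare memccpy */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (size bytes, size > 0), stopping
   after the first '\0' or after size bytes, whichever comes first. A
   truncated copy is terminated. Returns 1 if it all fit, 0 if truncated. */
int copy_until_nul(char *dst, const char *src, size_t size)
{
    /* memccpy returns a pointer just past the copied '\0', or NULL if it
       copied size bytes without finding one. */
    if (memccpy(dst, src, '\0', size) != NULL)
        return 1;
    dst[size - 1] = '\0';
    return 0;
}
```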
Now, if I want to copy till length of source_str based on its terminating character ('\0'), even if memcpy length is more than the index of terminating character, is the above code a right way to do it?
No; you'd be copying length bytes from source_str regardless of its null terminator: past the terminator if it occurs earlier, and past the end of the space allocated for the string if that space is shorter than length.
If your concern is minimizing the auxiliary space used by your program, what you could do is use strlen to determine the length of source_str, and allocate target_str based on that. Also, strcpy is similar to memcpy but is specifically intended for null-terminated strings (observe that it has no "size" or "length" parameter):
char *target_str = NULL;
char *source_str = read_string_from_somewhere();
size_t len = strlen(source_str);
target_str = malloc(len + 1);
strcpy(target_str, source_str);
// ...
free(target_str);
target_str = NULL;
memcpy is used to copy fixed-size blocks of memory, so if you want to copy something shorter that is terminated by '\0', you don't want to use memcpy.
There are other functions, like strncpy or strlcpy, that do similar things.
Best to check what the implementations do. I removed the optimized versions from the original source code for the sake of readability.
This is an example memcpy implementation: https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c
void *memcpy(void *restrict dest, const void *restrict src, size_t n)
{
unsigned char *d = dest;
const unsigned char *s = src;
for (; n; n--) *d++ = *s++;
return dest;
}
Here, exactly n bytes are read from src and written to dest, regardless of the size of the source or destination string, so if the string is shorter than n, memory past its end is copied too. That is bad and can cause various kinds of unwanted behavior.
this is strlcpy from: https://git.musl-libc.org/cgit/musl/tree/src/string/strlcpy.c
size_t strlcpy(char *d, const char *s, size_t n)
{
char *d0 = d;
if (!n--) goto finish;
for (; n && (*d=*s); n--, s++, d++);
*d = 0;
finish:
return d-d0 + strlen(s);
}
The trick here is that n && (*d=*s) evaluates to false as soon as n reaches zero or the character just copied is the null terminator, which ends the loop early.
Hence this gives you the wanted behaviour.
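strlcpy is a BSD function that glibc only added in version 2.38, so it may not be available everywhere. A portable sketch with the same contract as the musl code above (copy at most size - 1 bytes, always terminate when size > 0, return strlen(src) so that a result >= size signals truncation); bounded_copy is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper with strlcpy-style semantics. */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;  /* leave room for '\0' */
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;   /* caller checks: result >= size means truncated */
}
```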
Use strlen to determine the exact size of source_str and allocate accordingly, remembering to add an extra byte for the null terminator. Here's a full example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *source_str = "string_read_from_somewhere";
size_t len = strlen(source_str); // strlen returns size_t
char *target_str = malloc(len + 1);
if (!target_str) {
fprintf(stderr, "%s:%d: malloc failed\n", __FILE__, __LINE__);
return 1;
}
memcpy(target_str, source_str, len + 1);
puts(target_str);
free(target_str);
return 0;
}
Also, there's no need to cast the result of malloc. Don't forget to free the allocated memory.
As mentioned in the comments, you probably want to restrict the size of the malloced string to a sensible amount.
I need help with a char array. I want to create an array of length n and initialize its values, but after malloc() the array is longer than n*sizeof(char), and its content isn't only the chars I assign... There are a few random chars in the array and I don't know how to fix that... I need this part of the code for a school exam project, and I have to finish by Sunday... Please help :P
#include<stdlib.h>
#include<stdio.h>
int main(){
char *text;
int n = 10;
int i;
if((text = (char*) malloc((n)*sizeof(char))) == NULL){
fprintf(stderr, "allocation error");
}
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
int test = strlen(text);
printf("\n%d\n", test);
puts(text);
free(text);
return 0;
}
Well, before using strcat, do
text[0]=0;
strcat expects a null-terminated char array as its first argument too.
From the C standard, 7.24.3.1:
#include <string.h>
char *strcat(char * restrict s1,
const char * restrict s2);
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1.
How do you think strcat will know where the first string ends if you don't put a \0 in s1?
Also, don't forget to allocate an extra byte for the \0 character. Otherwise you write past the memory you allocated, which is again undefined behavior.
And earlier, by calling strcat on uninitialized memory, you also had undefined behavior.
Note:
You should check the return value of malloc to know whether the malloc invocation was successful or not.
Casting the return value of malloc is not needed. Conversion from void* to relevant pointer is done implicitly in this case.
strlen returns size_t, not int, so print it with printf("%zu\n", strlen(text));
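Putting the fixes from this answer together (one extra byte for the terminator, an initial empty string, and a malloc check), the question's strcat loop works. A sketch wrapped in a function so it can be tested; make_repeated is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap string holding n copies of c, built
   with strcat as in the question. Returns NULL if malloc fails. */
char *make_repeated(size_t n, char c)
{
    char piece[2] = { c, '\0' };         /* c as a one-character string */
    char *text = malloc(n + 1);          /* n chars + '\0' */
    if (text == NULL)
        return NULL;
    text[0] = '\0';                      /* strcat needs a string to append to */
    for (size_t i = 0; i < n; i++)
        strcat(text, piece);             /* rescans text each time: O(n^2) overall */
    return text;
}
```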
To start with, your way of using malloc in
text = (char*) malloc((n)*sizeof(char))
is not ideal. You can change that to
text = malloc(n * sizeof *text); // Don't cast; using *text is straightforward and easy.
So the statement could be
if (NULL == (text = malloc(n * sizeof *text))) {
fprintf(stderr, "allocation error");
return 1; // don't use text if the allocation failed
}
But the actual problem lies in
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
The strcat documentation says
dest − This is pointer to the destination array, which should contain
a C string, and should be large enough to contain the concatenated
resulting string.
Just to point out that the above method is flawed, consider that the C string "A" actually contains two characters: A and the terminating \0 (the null character). Even after initializing text to the empty string, each strcat needs room for the new A plus the terminator, so when i is n-1 the terminator is written one byte past the end of the n-byte buffer: an out-of-bounds access, or buffer overrun. If you wanted to fill the text array with A characters, you could have done
for(i = 0; i < n; i++){
// Note for n length, you can store n-1 chars plus terminating null
text[i] = (i == n - 1) ? '\0' : 'A'; // the last index is n-1 because the count starts from zero
}
//Then print the null terminated string
printf("Filled string : %s\n",text); // You're all good :-)
Note: Use a tool like valgrind to find memory leaks & out of bound memory accesses.
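Filling a buffer with a repeated character doesn't need strcat or a per-character conditional at all: memset does it in one call. A sketch using the convention from the comment in the loop above (n bytes hold n - 1 characters plus the terminator); filled_string is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap buffer of n bytes holding n - 1
   copies of c followed by the terminator, or NULL if n == 0 or malloc fails. */
char *filled_string(size_t n, char c)
{
    if (n == 0)
        return NULL;          /* no room even for the terminator */
    char *text = malloc(n);
    if (text == NULL)
        return NULL;
    memset(text, c, n - 1);   /* n - 1 visible characters */
    text[n - 1] = '\0';       /* last byte is the terminator */
    return text;
}
```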
I have a problem with printing a string in C (well, the string that *ptr points to).
I have the following code:
char *removeColon(char *word) {
size_t wordLength;
char word1[MAXLENGTH];
wordLength = strlen(word);
wordLength--;
memcpy(word1, word, wordLength);
printf("word1: %s\n", word1);
return *word1;
}
I ran this with word = "MAIN:" (the value of word comes from strtok on a string read from a file).
It works fine until the printf, where the result is:
word1: MAIN╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠ (followed by hundreds more ╠ characters)
and then there is an exception and everything breaks.
Any thoughts?
Your function removeColon should either
operate in place and modify the string passed as an argument
be given a destination buffer and copy the shortened string to it or
allocate memory for the shortened string and return that.
You copy just the characters into the local array, not the null terminator, and you do not set one in the buffer. Passing this array to printf("%s", ...) invokes undefined behavior: printf keeps printing the buffer contents until it finds a '\0' byte, even going beyond the end of the array, printing garbage, and eventually it may crash.
You cannot return a pointer to an automatic array because this array becomes unavailable as soon as the function returns. Dereferencing the pointer later will invoke undefined behavior.
Here is a function that works in place:
char *removeColon(char *word) {
if (*word) word[strlen(word) - 1] = '\0';
return word;
}
Here is one that copies to a destination buffer, assumed to be long enough:
char *removeColon(char *dest, const char *word) {
size_t len = strlen(word);
memcpy(dest, word, len - 1);
dest[len - 1] = '\0';
return dest;
}
Here is one that allocates memory:
char *removeColon(const char *word) {
size_t len = strlen(word);
char *dest = malloc(len);
if (dest == NULL)
return NULL; // allocation failed
memcpy(dest, word, len - 1);
dest[len - 1] = '\0';
return dest;
}
You must make sure (1) each string is nul-terminated, and (2) you are not attempting to modify a string-literal. You have many approaches you can take. A simple approach to remove the last character (any char) with strlen:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
s[strlen (s) - 1] = 0; /* overwrite last w/nul */
return s;
}
(You can also use the string.h functions strchr (searching for 0), strrchr (searching for your target char, if passed), strpbrk (searching for one of several chars), etc., to locate the last character as well.)
Or you can do the same thing with pointers:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
char *p = s;
for (; *p; p++) {} /* advance to end of str */
*--p = 0; /* overwrite last w/nul */
return s;
}
You can also pass the last character of interest if you want to limit removal to any specific character and make a simple comparison in the function before overwriting it with a nul-terminating character.
Look over both and let me know if you have any questions.
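The comparison suggested above, removing the last character only when it is the one you expect, could look like this sketch; rmlast_if is a hypothetical name, and s must be a modifiable string, not a string literal:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: removes the last character of s only if it equals c
   (for example ':'). Returns true if a character was removed. */
bool rmlast_if(char *s, char c)
{
    size_t len = strlen(s);
    if (len == 0 || s[len - 1] != c)
        return false;          /* empty string, or a different last char */
    s[len - 1] = '\0';         /* overwrite the last char with the terminator */
    return true;
}
```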
wordLength = strlen(word);
You have to include the null terminator in what you copy, because every string ends with a terminating character whose ASCII value is 0, spelled \0 in C. Also, prefer the str... family of functions over mem... here, since the former is intended for null-terminated strings and the latter for raw arrays. In addition, you cannot return a pointer to a local, stack-allocated array. Based on the code of the function, it sounds like you're removing the last character. If that is the case, it is better to do
void remlast(char *str)
{
str[strlen(str) - 1] = '\0';
}
Note that this does not work on empty strings.
You copy over wordLength bytes, but you fail to add a null terminating byte. Because word1 is uninitialized prior to this copy, the remaining bytes have indeterminate values.
So when printf attempts to print the string, it doesn't find a null terminator and keeps reading until it finds a null byte somewhere outside the bounds of the array. This is undefined behavior.
After copying the bytes, you need to manually add the null terminator:
memcpy(word1, word, wordLength);
word1[wordLength] = '\0';
Also, you're returning a pointer to a local variable. When the function returns, that variable is out of scope, and dereferencing that pointer is also undefined behavior.
Rather than making word1 a local array, you can allocate memory dynamically for it:
char *word1 = malloc(strlen(word));
If you do this, you'll need to free this memory somewhere in the calling function. The other option is to have the caller pass in a buffer of the proper size:
void removeColon(char *word, char *word1) {
I am trying to remove the first semicolon from a character array whose value is:
Input: ; Test: 876033074, 808989746, 825766962, ; Test1: 825766962,
Code:
char *cleaned = cleanResult(result);
printf("Returned BY CLEAN: %s\n",cleaned);
char *cleanResult(char *in)
{
printf("Cleaning this: %s\n",in);
char *firstOccur = strchr(in,';');
printf("CLEAN To Remove: %s\n",firstOccur);
char *restOfArray = firstOccur + 2;
printf("CLEAN To Remove: %s\n",restOfArray); //Correct Value Printed here
char *toRemove;
while ((toRemove = strstr(restOfArray + 2,", ;"))!=NULL)
{
printf("To Remove: %s\n",toRemove);
memmove (toRemove, toRemove + 2, strlen(toRemove + 2));
printf("Removed: %s\n",toRemove); //Correct Value Printed
}
return in;
}
Output (first semicolon still there): ; Test: 876033074, 808989746, 825766962; Test1: 825766962;
Regarding sizeof(cleaned): using sizeof to get the capacity of an array only works if the argument is an array, not a pointer:
char buffer[100];
const char *pointer = "something something dark side";
// Prints 100
printf("%zu\n", sizeof(buffer));
// Prints size of pointer itself, usually 4 or 8
printf("%zu\n", sizeof(pointer));
Although both a local array and a pointer can be subscripted, they behave differently when it comes to sizeof. Thus, you cannot determine the capacity of an array given only a pointer to it.
Also, bear this in mind:
void foo(char not_really_an_array[100])
{
// Prints size of pointer!
printf("%zu\n", sizeof(not_really_an_array));
// Compiles, since not_really_an_array is a regular pointer
not_really_an_array++;
}
Although not_really_an_array is declared like an array, it is a function parameter, so is actually a pointer. It is exactly the same as:
void foo(char *not_really_an_array)
{
...
Not really logical, but we're stuck with it.
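Since the callee only ever sees a pointer, the usual convention is for the caller, which still has the real array in scope, to pass its capacity along with the pointer. A sketch; describe is a hypothetical function that writes into whatever buffer it is given:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "item <n>" into buf, truncating to size.
   sizeof(buf) here would only give the size of a pointer, so the caller
   must pass the capacity explicitly. */
void describe(char *buf, size_t size, int n)
{
    snprintf(buf, size, "item %d", n);
}
```

At the call site, sizeof is applied where buffer is still an array: describe(buffer, sizeof buffer, 42);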
On to your question. I'm unclear on what you're trying to do. Simply removing the first character of a string (in-place) can be accomplished with a memmove:
memmove( buffer // destination
, buffer + 1 // source
, strlen(buffer) // number of bytes to copy: the remaining characters plus the null terminator
);
This takes linear time. It also works for an empty string, since it then copies zero bytes.
The reason strcpy(buffer, buffer + 1) won't do is that the strings overlap, so this yields undefined behavior. memmove, however, explicitly allows the source and destination to overlap.
For more complex character filtering, you should consider traversing the string manually, using a "read" pointer and a "write" pointer. Just make sure the write pointer does not get ahead of the read pointer, so the string won't be clobbered while it is read.
void remove_semicolons(char *buffer)
{
const char *r = buffer;
char *w = buffer;
for (; *r != '\0'; r++)
{
if (*r != ';')
*w++ = *r;
}
*w = 0; // Terminate the string at its new length
}
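The same read/write pointer scheme generalizes to removing every character from a set, by asking strchr whether the current character is in the set. A sketch; remove_chars is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper: deletes, in place, every character of buffer that
   appears in the string reject (for example ";,"). */
void remove_chars(char *buffer, const char *reject)
{
    const char *r = buffer;
    char *w = buffer;
    for (; *r != '\0'; r++) {
        /* strchr would also match '\0', but *r is never '\0' inside the loop */
        if (strchr(reject, *r) == NULL)
            *w++ = *r;         /* keep characters not in reject */
    }
    *w = '\0';                 /* terminate the string at its new length */
}
```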
You are using strcpy with overlapping input / output buffer, which results in undefined behavior.
You're searching for a sequence of three characters (comma space semicolon) and then removing the first two (the comma and the space). If you want to remove the semicolon too, you need to remove all three characters (use toRemove+3 instead of toRemove+2). You also need to add 1 to the strlen result to account for the NUL byte terminating the string.
If, as you say, you just want to remove the first semicolon and nothing else, you need to search for just the semicolon (which you can do with strchr):
if ((toRemove = strchr(in, ';'))) // find a semicolon
memmove(toRemove, toRemove+1, strlen(toRemove+1)+1); // remove 1 char at that position
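Wrapped in a function and applied to the question's input, the strchr plus memmove approach removes only the first semicolon and leaves the rest of the string intact. A sketch; remove_first is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper: removes the first occurrence of c from s in place,
   shifting the rest of the string, including its terminator, left by one.
   memmove is used because source and destination overlap. Returns s. */
char *remove_first(char *s, char c)
{
    char *p = (c != '\0') ? strchr(s, c) : NULL;  /* never "remove" the terminator */
    if (p != NULL)
        memmove(p, p + 1, strlen(p + 1) + 1);     /* +1 moves the '\0' too */
    return s;
}
```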