Strange behaviour of printf after memcpy - c

I have a problem with printing a string in C (well, the string that *ptr points to).
I have the following code:
char *removeColon(char *word) {
size_t wordLength;
char word1[MAXLENGTH];
wordLength = strlen(word);
wordLength--;
memcpy(word1, word, wordLength);
printf("word1: %s\n", word1);
return *word1;
}
I ran this with word = "MAIN:" (the value of word comes from strtok on a string read from a file).
It works fine until the printf, where the result is:
word1: MAIN╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
and then there is an exception and everything breaks.
Any thoughts?

Your function removeColon should do one of the following:
operate in place and modify the string passed as an argument,
be given a destination buffer and copy the shortened string to it, or
allocate memory for the shortened string and return that.
You copy just the characters into the local array, not the null terminator, and you do not set one in the buffer either. Passing this array to printf("%s", ...) invokes undefined behavior: printf keeps printing the buffer contents until it finds a '\0' byte, reading even beyond the end of the array, so it prints garbage and eventually crashes.
You cannot return a pointer to an automatic array because this array becomes unavailable as soon as the function returns. Dereferencing the pointer later will invoke undefined behavior.
Here is a function that works in place:
char *removeColon(char *word) {
if (*word) word[strlen(word) - 1] = '\0';
return word;
}
Here is one that copies to a destination buffer, assumed to be long enough:
char *removeColon(char *dest, const char *word) {
size_t len = strlen(word);
if (len > 0) len--; /* drop the last character, if any */
memcpy(dest, word, len);
dest[len] = '\0';
return dest;
}
Here is one that allocates memory:
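char *removeColon(const char *word) {
size_t len = strlen(word);
if (len > 0) len--; /* drop the last character, if any */
char *dest = malloc(len + 1); /* +1 for the terminator */
if (dest == NULL) return NULL; /* allocation failed */
memcpy(dest, word, len);
dest[len] = '\0';
return dest; /* the caller must free() this */
}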
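The destination-buffer variant above assumes the buffer is long enough. As a minimal sketch of how that assumption could be checked instead, the helper below takes the buffer size as an extra argument. The name remove_last_into and the dest_size parameter are hypothetical and not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy word without its last character into dest,
   which has room for dest_size bytes. Returns dest, or NULL if dest is
   too small to hold the result plus its terminator. */
char *remove_last_into(char *dest, size_t dest_size, const char *word)
{
    size_t len = strlen(word);
    if (len > 0)
        len--;                 /* drop the last character, if any */
    if (len + 1 > dest_size)
        return NULL;           /* no room for the copy plus '\0' */
    memcpy(dest, word, len);
    dest[len] = '\0';          /* memcpy does not add the terminator */
    return dest;
}
```

Returning NULL on a too-small buffer is only one possible convention; truncating the result would be another.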

You must make sure (1) each string is nul-terminated, and (2) you are not attempting to modify a string-literal. You have many approaches you can take. A simple approach to remove the last character (any char) with strlen:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
s[strlen (s) - 1] = 0; /* overwrite last w/nul */
return s;
}
(You can also use the string.h functions strchr (searching for 0), strrchr (searching for your target char, if passed), strpbrk (searching for one of several chars), etc. to locate the last character.)
Or you can do the same thing with pointers:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
char *p = s;
for (; *p; p++) {} /* advance to end of str */
*--p = 0; /* overwrite last w/nul */
return s;
}
You can also pass the last character of interest if you want to limit removal to any specific character and make a simple comparison in the function before overwriting it with a nul-terminating character.
Look over both and let me know if you have any questions.
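The character-specific variant mentioned above might look like the following sketch. The name rmlast_if and its second parameter are assumptions made for illustration. The string must be writable, so it cannot be a string literal.

```c
#include <string.h>

/* Hypothetical variant of rmlast: remove the last character of s only
   if it equals c. s must be writable. Returns s. */
char *rmlast_if(char *s, char c)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == c)   /* empty strings are left alone */
        s[len - 1] = '\0';
    return s;
}
```

With this, calling rmlast_if on a writable copy of "MAIN:" with ':' would leave "MAIN", while a string without a trailing colon is returned unchanged.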

wordLength = strlen(word);
strlen does not count the null terminator, so you have to account for it yourself: every string ends with a terminating character whose value is 0, spelled '\0' in C. Also, prefer the str... family of functions over mem... here, since the former is intended for null-terminated strings and the latter for arbitrary arrays. In addition, you cannot return a pointer to a local, stack-allocated array. Judging by the code of the function, you are removing the last character. If that is the case, it is better to do
void remlast(char *str)
{
str[strlen(str) - 1] = '\0';
}
Note that this does not work on empty strings.

You copy over wordLength bytes, but you fail to add a null terminating byte. Because word1 is uninitialized prior to this copy, the remaining bytes are indeterminate.
So when printf attempts to print the string, it doesn't find a null terminator and keeps reading until it finds a null byte somewhere outside the bounds of the array. This is undefined behavior.
After copying the bytes, you need to manually add the null terminator:
memcpy(word1, word, wordLength);
word1[wordLength] = '\0';
Also, you're returning a pointer to a local variable. When the function returns, that variable is out of scope, and dereferencing that pointer is also undefined behavior.
Rather than making word1 a local array, you can allocate memory dynamically for it:
char *word1 = malloc(strlen(word));
If you do this, you'll need to free this memory somewhere in the calling function. The other option is to have the caller pass in a buffer of the proper size:
void removeColon(char *word, char *word1) {

Related

Printing a string in C

I understand that in C, a string is an array of characters with a special '\0' character at the end of the array.
Say I have "Hello" stored in a char* named string and there is a '\0' at the end of the array.
When I call printf("%s\n", string);, it would print out "Hello".
My question is, what happens to '\0' when you call printf on a string?
The null character ('\0') at the end of a string is simply a sentinel value for C library functions to know where to stop processing a string pointer.
This is necessary for two reasons:
Arrays decay to pointers to their first element when passed to functions
It's entirely possible to have a string in an array of chars that doesn't use up the entire array.
For example, strlen, which determines the length of the string, might be implemented as:
size_t strlen(const char *s)
{
size_t len = 0;
while(*s++ != '\0') len++;
return len;
}
If you tried to emulate this behavior inline with a statically allocated array instead of a pointer, you still need the null terminator to know the string length:
char str[100];
size_t len = 0;
strcpy(str, "Hello World");
for(; len < 100; len++)
if(str[len]=='\0') break;
// len now contains the string length
Note that explicitly comparing for inequality with '\0' is redundant; I just included it for ease of understanding.
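To illustrate the sentinel role from the other direction, here is a small sketch of formatting bytes that have no terminator at all, by giving %s an explicit precision. The helper name format_prefix is hypothetical. snprintf is used rather than printf only so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: format the first n bytes of data, which need not
   be null-terminated, into out (of size out_size). Returns the value
   returned by snprintf. */
int format_prefix(char *out, size_t out_size, const char *data, size_t n)
{
    /* The precision in "%.*s" caps how many bytes are read from data,
       so no '\0' is required within those n bytes. */
    return snprintf(out, out_size, "%.*s", (int)n, data);
}
```

Without the precision, %s relies entirely on finding a '\0', which is why an unterminated array produces garbage output.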

Copy c-string char by char to dynamic char*

I have a const char* string, and I want to copy that string character by character to a dynamically allocated char*.
const char *constStr = "Hello world";
char *str = (char*) malloc(strlen(constStr)+1);
while(*constStr){
*str = *constStr;
constStr++;
str++;
}
printf("%s", str);
free(str);
The problem is that previous code just copies each character of constStr to only the first index of the str. I don't know why?
As others have pointed out, you are incrementing str pointer in each iteration, so you always end up printing the end of the string.
You can instead iterate over each character without incrementing the pointer. The following code worked for me:
const char *constStr = "Hello world";
int len = strlen(constStr);
char *str = (char *) malloc(len + 1);
int i;
for (i = 0; i <= len; ++i) {
str[i] = constStr[i];
}
printf("%s", str);
free(str);
Yes, you didn't null-terminate the string, and that was the primary problem. To be more precise, the problem is not the missing terminator in itself but that you use the array where a pointer to a null-terminated char array is expected. But even with the terminator, there are several other problems in the code.
You allocated the memory and then cast the return value of malloc, which is unnecessary. The void* to char* conversion is done implicitly.
malloc might not be able to service the request, in which case it returns a null pointer. It is important to check for this to prevent later attempts to dereference the null pointer.
Then you started copying, and you copied everything except the NUL terminating character. And then you passed it to printf's %s format specifier, which expects a pointer to a null-terminated char array. This is undefined behavior.
One position in the allocated block (the one meant for the terminator) is left uninitialized; beware that reading an uninitialized value may lead to undefined behavior.
There is also another problem. From the C standard, §7.22.3.3:
The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to free or realloc, the behavior is undefined.
So is that requirement met here? No. When you call free(str), str no longer points to the dynamically allocated memory returned by malloc, because it was incremented in the loop. This is again undefined behavior.
The solution is always to keep a pointer that stores the address of the allocated chunk. The other answers already show this, so I will not repeat them; both provide a good solution.
You can also use strdup or strcpy. Even if you don't need them now, get accustomed to them; it helps to know them. Note that strdup was not part of the C standard before C23; it comes from POSIX.
Example:
const char *constStr = "Hello world";
char *str = malloc(strlen(constStr)+1);
if( !str ){
perror("malloc");
exit(EXIT_FAILURE);
}
char *sstr = str;
while(*constStr){
*str = *constStr;
constStr++;
str++;
}
*str = 0;
printf("%s", sstr);
free(sstr);
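The strcpy alternative mentioned above could be sketched as follows. The function name copy_string is hypothetical; it does roughly what strdup does on systems that provide it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of src, or NULL
   if malloc fails. The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1);   /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    strcpy(copy, src);                      /* copies the '\0' as well */
    return copy;
}
```

Because the returned pointer is never incremented, it can be passed to free() unchanged.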
Here's the "classical" string copy solution:
const char *constStr = "Hello world";
char *str = malloc(strlen(constStr) + 1), *p = str;
/* Do not forget to check if str!=NULL !*/
while((*p++ = *constStr++));
puts(str);
The problem is that previous code just copies each character of
constStr to only the first index of the str. I don't know why?
Use an index variable.
Don't forget the terminating '\0'; without it you have a good chance of a segmentation fault.

C - Truncate char* string argument

I've been trying to do this simple task for a while now but can't get it to work properly as I'm not super familiar with pure C tricks.
Basically I have a function that gets called by a block of code that I didn't write myself and can't edit.
int myMethod(char* str);
My task is to find the position of a substring in the char* and if found get the string from index 0 to index of the found substring and assign it to the original char* str.
Here is what I tried to do:
int myMethod(char* str)
{
int splitPos = strstr(str, "Pikachu") - str;
char buffer[splitPos + 1];
strncpy(buffer, str, splitPos);
buffer[splitPos] = '\0';
memcpy(str, buffer, strlen(buffer) + 1);
}
And I get a segfault at the last call (the memcpy). I tried changing it to the following, with the same result:
memmove(str, buffer, ...)
strcpy(...)
There are multiple problems in your function:
you do not check the return value of strstr for success. Your code has undefined behavior if it fails to locate the substring.
strncpy will not null terminate the destination string if the source is longer than splitPos-1, which it is. Do not use strncpy, it does not do what you think, it is very error prone for both the programmer and the reader. For your purpose, memcpy with the same arguments is equivalent and less problematic.
strlen(buffer) is redundant, it evaluates to splitPos.
You actually do not need a temporary buffer for your goal: truncating the string can be done by simply setting the start of the substring to '\0'.
if the destination string is read-only, modifying it has undefined behavior, and might explain the observed segmentation fault.
Here is a simplified version:
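int myMethod(char *str) {
char *p = strstr(str, "Pikachu");
if (p != NULL) {
*p = '\0'; /* truncate at the start of the substring */
}
return 0; /* placeholder: the question does not say what to return */
}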
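Since the task also asks for the position of the substring, a variant that reports it could look like the sketch below. The name truncate_at, the needle parameter, and the convention of returning -1 when nothing is found are all assumptions, not something stated in the question.

```c
#include <string.h>

/* Hypothetical helper: cut str at the first occurrence of needle.
   Returns the index where the cut happened, or -1 if needle is absent.
   str must point to writable memory, not a string literal. */
int truncate_at(char *str, const char *needle)
{
    char *p = strstr(str, needle);
    if (p == NULL)
        return -1;             /* leave str untouched */
    *p = '\0';
    return (int)(p - str);
}
```

If the caller passes a string literal, this still crashes in the same way as the original code, because the write through p is the problem, not the copying.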
Conversely if you just need to manipulate the substring in further code in myMethod, you can make a copy to avoid the fateful attempt at modifying the original string:
int myMethod(char *str) {
char *p = strstr(str, "Pikachu");
size_t len = p ? (size_t)(p - str) : strlen(str);
char buffer[len + 1];
memcpy(buffer, str, len);
buffer[len] = '\0';
str = buffer;
/* use `str` in this function */
return 0; /* placeholder: the question does not say what to return */
}

Assigning NULL to the middle of a c-string

I'm writing a function eliminate(char *str, int character) that takes a c-string and a character to eliminate as input, scans str for instances of character and replaces the value at the current index with... what? I thought NULL, but this seems risky and could mess with other functions that rely on the null-terminator in a c-string. For example:
char *eliminate(char *str, int character) {
if (!str) return str;
int index = 0;
while (str[index])
if (str[index] == character)
str[index++] = '\0'; //THIS LINE IS IN QUESTION
return str;
}
My question is, how do I properly implement this function such that I'm effectively eliminating all instances of a specified character in a string? And if a proper elimination assigns '\0' to the character to be replaced, how does this not affect the entire string (i.e., it effectively ends at the first '\0' encountered). For example, if I were to run the above function twice on the same string, the second call would only examine the string up to where the last character was replaced.
It is fine to use such replacement if you know what you are doing.
It can work only if the char buffer char *str is writable (dynamically allocated, for example by malloc, or just char array on stack char str[SIZE]). It cannot work for string literals.
The standard function strtok also works in this way. By the way, probably you can use strtok for your task if you want to have null terminated substring.
It does not make much sense to use an integer type for the character argument: int character -> char character
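To make the strtok comparison concrete, here is a small sketch that counts tokens and, as a side effect, shows the delimiters being overwritten. The helper name count_tokens is hypothetical.

```c
#include <string.h>

/* Sketch: split buf on the characters in delim and count the tokens.
   strtok overwrites each delimiter it consumes with '\0', so buf must
   be writable and is modified by this call. */
size_t count_tokens(char *buf, const char *delim)
{
    size_t count = 0;
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;
    return count;
}
```

After calling this on a char array holding "a,b,c" with "," as the delimiter, the array contains 'a', '\0', 'b', '\0', 'c', '\0', so functions such as strlen now see only "a".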
Replacing the character by '\0' will likely cause confusion. I would eliminate the undesirable character by shifting the next eligible character into its spot.
char *eliminate(char *str, int character) {
if (!str) return str;
int index = 0, shiftIndex = 0;
while (str[index]) {
if (str[index] == character)
index++;
else {
str[shiftIndex] = str[index];
shiftIndex++, index++;
}
}
str[shiftIndex] = '\0';
return str;
}

*str = c gives me a segmentation fault [duplicate]

I am writing a function normalize that prepares a string for processing. This is the code:
/* The normalize procedure examines a character array of size len
in ONE PASS and does the following:
1) turns all upper-case letters into lower-case ones
2) turns any white-space character into a space character and,
shrinks any n>1 consecutive spaces into exactly 1 space only
3) removes all initial and final white-space characters
Hint: use the C library function isspace()
You must do the normalization IN PLACE so that when the procedure
returns, the character array buf contains the normalized string and
the return value is the length of the normalized string.
*/
int normalize(char *buf, /* The character array containing the string to be normalized*/
int len /* the size of the original character array */)
{
/* exit function and return error if buf or len are invalid values */
if (buf == NULL || len <= 0)
return -1;
char *str = buf;
char prev, temp;
len = 0;
/* skip over white space at the beginning */
while (isspace(*buf))
buf++;
/* process characters and update str until end of buf */
while (*buf != '\0') {
printf("processing %c, buf = %p, str = %p \n", *buf, buf, str);
/* str might point to same location as buf, so save previous value in case str ends up changing buf */
temp = *buf;
/* if character is whitespace and last char wasn't, then add a space to the result string */
if (isspace(*buf) && !isspace(prev)) {
*str++ = ' ';
len++;
}
/* if character is NOT whitespace, then add its lowercase form to the result string */
else if (!isspace(*buf)) {
*str++ = tolower(*buf);
len++;
}
/* update previous char and increment buf to point to next character */
prev = temp;
buf++;
}
/* if last character was a whitespace, then get rid of the trailing whitespace */
if (len > 0 && isspace(*(str-1))) {
str--;
len--;
}
/* append NULL character to terminate result string and return length */
*str = '\0';
return len;
}
However, I am getting a segmentation fault. I have narrowed down the problem to this line:
*str++ = *buf;
More specifically, if I try to dereference str and assign it a new char value (e.g. *str = c), the program crashes. However, str was initialized at the beginning to point to buf, so I have no clue why this is happening.
EDIT: This is how I am calling the function:
char *p = "string goes here";
normalize(p, strlen(p));
You can't call your function with p when p was declared as char *p = "Some string";, since p is a pointer initialized to a string literal. This means you can't modify the characters p points to, and attempting to do so results in undefined behavior (this is the cause of the segfault). However, you can, of course, make p point somewhere else, namely to a writable character sequence.
Alternatively, you could declare p to be an array of characters. You can initialize it just like you did with the pointer declaration, but array declaration makes the string writable:
char p[] = "Some string";
normalize(p, strlen(p));
Remember that arrays are not modifiable l-values, so you will not be able to assign to p, but you can change the content in p[i], which is what you want.
Apart from that, note that your code uses prev with garbage values in the first loop iteration, because you never initialize it. Because you only use prev to test if it is a space, maybe a better approach would be to have a flag prev_is_space, rather than explicitly storing the previous character. This would make it easy to start the loop, you just have to initialize prev_is_space to 0, or 1 if there are leading white spaces (this really depends on how you want your function to behave).
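As a rough sketch of the prev_is_space idea, the function below rewrites the loop with a flag instead of a stored previous character. The name normalize_sketch is hypothetical, and the len parameter is dropped for brevity. Starting the flag at 1 is an assumption that makes leading white space disappear without a separate loop.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch of an in-place normalize using a prev_is_space flag.
   buf must be a writable, null-terminated string.
   Returns the length of the normalized string, or -1 if buf is NULL. */
int normalize_sketch(char *buf)
{
    if (buf == NULL)
        return -1;
    char *out = buf;            /* write position, never ahead of in */
    int prev_is_space = 1;      /* pretend a space came first: drops leading white space */
    for (const char *in = buf; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;   /* ctype functions need unsigned char values */
        if (isspace(c)) {
            if (!prev_is_space)
                *out++ = ' ';   /* first white space of a run becomes one space */
            prev_is_space = 1;
        } else {
            *out++ = (char)tolower(c);
            prev_is_space = 0;
        }
    }
    if (out > buf && out[-1] == ' ')
        out--;                  /* remove the single trailing space, if any */
    *out = '\0';
    return (int)(out - buf);
}
```

As with the original, this must be called on a char array rather than on a pointer to a string literal.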
I don't see where you initialized prev before using it in isspace(prev).

Resources