I am writing a function normalize that prepares a string for processing. This is the code:
/* The normalize procedure examines a character array of size len
in ONE PASS and does the following:
1) turns all upper-case letters into lower-case ones
2) turns any white-space character into a space character and,
shrinks any n>1 consecutive spaces into exactly 1 space only
3) removes all initial and final white-space characters
Hint: use the C library function isspace()
You must do the normalization IN PLACE so that when the procedure
returns, the character array buf contains the normalized string and
the return value is the length of the normalized string.
*/
int normalize(char *buf, /* The character array containing the string to be normalized*/
int len /* the size of the original character array */)
{
/* exit function and return error if buf or len are invalid values */
if (buf == NULL || len <= 0)
return -1;
char *str = buf;
char prev, temp;
len = 0;
/* skip over white space at the beginning */
while (isspace(*buf))
buf++;
/* process characters and update str until end of buf */
while (*buf != '\0') {
printf("processing %c, buf = %p, str = %p \n", *buf, buf, str);
/* str might point to same location as buf, so save previous value in case str ends up changing buf */
temp = *buf;
/* if character is whitespace and last char wasn't, then add a space to the result string */
if (isspace(*buf) && !isspace(prev)) {
*str++ = ' ';
len++;
}
/* if character is NOT whitespace, then add its lowercase form to the result string */
else if (!isspace(*buf)) {
*str++ = tolower(*buf);
len++;
}
/* update previous char and increment buf to point to next character */
prev = temp;
buf++;
}
/* if last character was a whitespace, then get rid of the trailing whitespace */
if (len > 0 && isspace(*(str-1))) {
str--;
len--;
}
/* append NULL character to terminate result string and return length */
*str = '\0';
return len;
}
However, I am getting a segmentation fault. I have narrowed down the problem to this line:
*str++ = *buf;
More specifically, if I try to dereference str and assign it a new char value (e.g. *str = c), the program crashes. However, str was initialized at the beginning to point to buf, so I have no clue why this is happening.
*EDIT: This is how I am calling the function: *
char *p = "string goes here";
normalize(p, strlen(p));
You can't call your function with p when p is declared as char *p = "Some string";, because p then points to a string literal. Modifying a string literal is undefined behavior, which is what causes the segfault here. You can, of course, make p point somewhere else, namely to a writable character sequence.
Alternatively, you could declare p to be an array of characters. You can initialize it just like you did with the pointer declaration, but array declaration makes the string writable:
char p[] = "Some string";
normalize(p, strlen(p));
Remember that arrays are not modifiable l-values, so you will not be able to assign to p, but you can change the content in p[i], which is what you want.
Apart from that, note that your code uses prev with garbage values in the first loop iteration, because you never initialize it. Because you only use prev to test if it is a space, maybe a better approach would be to have a flag prev_is_space, rather than explicitly storing the previous character. This would make it easy to start the loop, you just have to initialize prev_is_space to 0, or 1 if there are leading white spaces (this really depends on how you want your function to behave).
I don't see where you initialized prev before using it in isspace(prev).
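The flag idea suggested above could be sketched roughly as follows. This is not the asker's code: normalize_sketch is a hypothetical name, the len parameter is dropped because the loop relies on the terminating '\0', and the flag is a "pending space" variant of prev_is_space so that leading and trailing whitespace never get written at all.

```c
#include <ctype.h>
#include <stddef.h>

/* Normalizes buf in place and returns the new length, or -1 if buf is NULL.
   Hypothetical sketch; buf must point to writable, null-terminated memory. */
int normalize_sketch(char *buf)
{
    if (buf == NULL)
        return -1;

    char *dst = buf;          /* write position */
    const char *src = buf;    /* read position, never behind dst */
    int pending_space = 0;    /* whitespace seen since the last word */

    for (; *src != '\0'; src++) {
        /* the <ctype.h> functions expect an unsigned char value */
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) {
            pending_space = 1;
        } else {
            /* emit one space between words, never before the first word */
            if (pending_space && dst != buf)
                *dst++ = ' ';
            pending_space = 0;
            *dst++ = (char)tolower(c);
        }
    }
    *dst = '\0';              /* trailing whitespace was never written */
    return (int)(dst - buf);
}
```

It has to be called on a writable array, for example char p[] = "  Some   STRING "; followed by normalize_sketch(p);.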
I understand that in C, a string is an array of characters with a special '\0' character at the end of the array.
Say I have "Hello" stored in a char* named string and there is a '\0' at the end of the array.
When I call printf("%s\n", string);, it would print out "Hello".
My question is, what happens to '\0' when you call printf on a string?
The null character ('\0') at the end of a string is simply a sentinel value for C library functions to know where to stop processing a string pointer.
This is necessary for two reasons:
Arrays decay to pointers to their first element when passed to functions
It's entirely possible to have a string in an array of chars that doesn't use up the entire array.
For example, strlen, which determines the length of the string, might be implemented as:
size_t strlen(const char *s)
{
size_t len = 0;
while(*s++ != '\0') len++;
return len;
}
If you tried to emulate this behavior inline with a statically allocated array instead of a pointer, you still need the null terminator to know the string length:
char str[100];
size_t len = 0;
strcpy(str, "Hello World");
for(; len < 100; len++)
if(str[len]=='\0') break;
// len now contains the string length
Note that explicitly comparing for inequality with '\0' is redundant; I just included it for ease of understanding.
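To see what happens to the '\0' with the %s conversion, it may help to format a buffer that has a null character in the middle. The sketch below uses a hypothetical helper, format_string, that wraps snprintf: the formatted output stops at the first '\0', and the terminator itself is never part of the output.

```c
#include <stdio.h>
#include <string.h>

/* Formats src with "%s" into out (capacity out_size) and returns the
   number of characters the conversion produced. Hypothetical helper
   used only for this illustration. */
int format_string(char *out, size_t out_size, const char *src)
{
    return snprintf(out, out_size, "%s", src);
}
```

With char raw[] = "Hello\0World"; the array occupies 12 bytes, yet format_string produces only the 5 characters of Hello, because everything after the first '\0' is invisible to %s.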
I need help with a char array. I want to create an n-length array and initialize its values, but after the malloc() call the array is longer than n*sizeof(char), and its content isn't only the chars that I assign... The array contains a few random chars and I don't know how to solve that... I need this part of the code for a school exam project, and I have to finish by Sunday... Please help :P
#include<stdlib.h>
#include<stdio.h>
int main(){
char *text;
int n = 10;
int i;
if((text = (char*) malloc((n)*sizeof(char))) == NULL){
fprintf(stderr, "allocation error");
}
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
int test = strlen(text);
printf("\n%d\n", test);
puts(text);
free(text);
return 0;
}
Well before using strcat make
text[0]=0;
strcat expects null terminated char array for the first argument also.
From standard 7.24.3.1
#include <string.h>
char *strcat(char * restrict s1,
const char * restrict s2);
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1.
How do you think strcat will know where the first string ends if you don't
put a \0 in s1.
Also don't forget to allocate an extra byte for the \0 character. Otherwise you are writing past what you have allocated for. This is again undefined behavior.
And earlier you had undefined behavior.
Note:
You should check the return value of malloc to know whether the malloc invocation was successful or not.
Casting the return value of malloc is not needed. Conversion from void* to relevant pointer is done implicitly in this case.
strlen returns size_t not int. printf("%zu",strlen(text))
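Putting these points together, a corrected version might look something like the sketch below. make_filled is a hypothetical helper name; the important parts are the extra byte for the terminator and the text[0] = '\0' before the first strcat.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string made of n copies of 'A', or NULL if
   the allocation fails. The caller must free the result.
   Hypothetical helper, not part of the original program. */
char *make_filled(size_t n)
{
    char *text = malloc(n + 1);   /* one extra byte for the '\0' */
    if (text == NULL)
        return NULL;

    text[0] = '\0';               /* make it a valid empty string first */
    for (size_t i = 0; i < n; i++)
        strcat(text, "A");        /* strcat can now find the end of text */

    return text;
}
```

The strcat loop is kept only to stay close to the original program; since strcat rescans the string on every call, memset(text, 'A', n) followed by text[n] = '\0' would be the simpler choice.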
To start with, your way of using malloc in
text = (char*) malloc((n)*sizeof(char));
is not ideal. You can change that to
text = malloc(n * sizeof *text); // No cast needed, and sizeof *text follows the type of text
So the statement could be
if(NULL == (text = malloc(n * sizeof *text))){
fprintf(stderr, "allocation error");
}
But the actual problem lies in
for(i = 0; i < n; i++){
//text[i] = 'A';
strcat(text,"A");
}
The strcat documentation says
dest − This is pointer to the destination array, which should contain
a C string, and should be large enough to contain the concatenated
resulting string.
Just to point out that the above method is flawed, consider that the C string "A" actually contains two characters: A and the terminating \0 (the null character). Even if text started out as an empty string, the last iteration (when i is n-1) would write the terminator one byte past the end of the n-byte buffer, which is an out-of-bounds access (buffer overrun). If you wanted to fill the entire text array with A, you could have done
for(i = 0; i < n; i++){
// Note: an n-byte buffer holds n-1 chars plus the terminating null
text[i] = (i < n - 1) ? 'A' : '\0'; // the last slot (index n-1) gets the terminator
}
//Then print the null terminated string
printf("Filled string : %s\n",text); // You're all good :-)
Note: Use a tool like valgrind to find memory leaks & out of bound memory accesses.
I have a problem with printing a string in C (well, the string that *ptr points to).
I have the following code:
char *removeColon(char *word) {
size_t wordLength;
char word1[MAXLENGTH];
wordLength = strlen(word);
wordLength--;
memcpy(word1, word, wordLength);
printf("word1: %s\n", word1);
return *word1;
}
I ran this with word = "MAIN:" (the value of word comes from strtok on a string read from a file).
It works fine until the printf, where the result is:
word1: MAIN╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠ (the ╠ characters continue for several hundred more)
and then there is an exception and everything breaks.
Any thoughts?
Your function removeColon should either
operate in place and modify the string passed as an argument
be given a destination buffer and copy the shortened string to it or
allocate memory for the shortened string and return that.
You copy only the characters into the local array, not the null terminator, and you never set one in the buffer. Passing this array to printf("%s", ...) therefore invokes undefined behavior: printf keeps printing the buffer contents until it finds a '\0' byte, reads beyond the end of the array, prints garbage, and eventually crashes.
You cannot return a pointer to an automatic array because this array becomes unavailable as soon as the function returns. Dereferencing the pointer later will invoke undefined behavior.
Here is a function that works in place:
char *removeColon(char *word) {
if (*word) word[strlen(word) - 1] = '\0';
return word;
}
Here is one that copies to a destination buffer, assumed to be long enough:
char *removeColon(char *dest, const char *word) {
size_t len = strlen(word);
if (len > 0) len--; /* drop the last character; an empty string stays empty */
memcpy(dest, word, len);
dest[len] = '\0';
return dest;
}
Here is one that allocates memory:
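char *removeColon(const char *word) {
size_t len = strlen(word);
if (len > 0) len--; /* drop the last character; an empty string stays empty */
char *dest = malloc(len + 1); /* room for the characters plus '\0' */
if (dest == NULL) return NULL; /* allocation failed */
memcpy(dest, word, len);
dest[len] = '\0';
return dest;
}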
You must make sure (1) each string is nul-terminated, and (2) you are not attempting to modify a string-literal. You have many approaches you can take. A simple approach to remove the last character (any char) with strlen:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
s[strlen (s) - 1] = 0; /* overwrite last w/nul */
return s;
}
(you can also use the string.h functions strchr (searching for 0), strrchr (searching for your target char, if passed), strpbrk (searching for one of several chars), etc.. to locate the last character as well)
Or you can do the same thing with pointers:
char *rmlast (char *s)
{
if (!*s) return s; /* return if empty-string */
char *p = s;
for (; *p; p++) {} /* advance to end of str */
*--p = 0; /* overwrite last w/nul */
return s;
}
You can also pass the last character of interest if you want to limit removal to any specific character and make a simple comparison in the function before overwriting it with a nul-terminating character.
Look over both and let me know if you have any questions.
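The variant that only removes a specific trailing character could be sketched like this. rmlast_if is a hypothetical name, and as with the versions above, s must be writable (not a string literal).

```c
#include <string.h>

/* Removes the last character of s only if it equals c; returns s.
   Hypothetical helper name. */
char *rmlast_if(char *s, char c)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == c)
        s[len - 1] = '\0';        /* overwrite the match with the terminator */
    return s;
}
```

For the "MAIN:" input from the question, rmlast_if(word, ':') leaves "MAIN", while a word without a trailing colon is returned unchanged.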
wordLength = strlen(word);
strlen does not count the null terminator, so copying strlen(word) - 1 bytes with memcpy leaves word1 without one; every string must end with a terminating character whose ASCII value is 0, spelled \0 in C. Also, use the str... family of functions instead of mem..., since the former is intended for null terminated strings, but the latter for arrays. In addition, you cannot return a local stack allocated array. Based on the code of the function, it sounds like you're removing the last character. If that is the case, it is better to do
void remlast(char *str)
{
str[strlen(str) - 1] = '\0';
}
Note that this does not work on empty strings.
You copy over wordLength bytes, but you fail to add a null terminating byte. Because word1 is uninitialized prior to this copy, the remaining bytes are undefined.
So when printf attempts to print the string, it doesn't find a null terminator and keeps reading until it finds a null byte somewhere outside the bounds of the array. This is undefined behavior.
After copying the bytes, you need to manually add the null terminator:
memcpy(word1, word, wordLength);
word1[wordLength] = '\0';
Also, you're returning a pointer to a local variable. When the function returns, that variable is out of scope, and dereferencing that pointer is also undefined behavior.
Rather than making word1 a local array, you can allocate memory dynamically for it:
char *word1 = malloc(strlen(word));
If you do this, you'll need to free this memory somewhere in the calling function. The other option is to have the caller pass in a buffer of the proper size:
void removeColon(char *word, char *word1) {
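One possible shape for a caller-supplied-buffer version is sketched below. The name remove_last_into, the explicit out_size parameter, and the int return code are assumptions added for illustration; they are not part of the signature above.

```c
#include <stddef.h>
#include <string.h>

/* Copies word without its last character into out, which has room for
   out_size bytes. Returns 0 on success, -1 if out is too small.
   Hypothetical helper; the size parameter is an added assumption. */
int remove_last_into(char *out, size_t out_size, const char *word)
{
    size_t len = strlen(word);
    if (len > 0)
        len--;                    /* drop the final character */
    if (len + 1 > out_size)
        return -1;                /* no room for the text plus '\0' */
    memcpy(out, word, len);
    out[len] = '\0';              /* memcpy did not copy a terminator */
    return 0;
}
```

The caller owns the buffer, so nothing needs to be freed and no pointer to a local array ever leaves the function.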
I'm writing a function eliminate(char *str, int character) that takes a c-string and a character to eliminate as input, scans str for instances of character and replaces the value at the current index with... what? I thought NULL, but this seems risky and could mess with other functions that rely on the null-terminator in a c-string. For example:
char *eliminate(char *str, int character) {
if (!str) return str;
int index = 0;
while (str[index])
if (str[index] == character)
str[index++] = '\0'; //THIS LINE IS IN QUESTION
return str;
}
My question is, how do I properly implement this function such that I'm effectively eliminating all instances of a specified character in a string? And if a proper elimination assigns '\0' to the character to be replaced, how does this not affect the entire string (i.e., it effectively ends at the first '\0' encountered). For example, if I were to run the above function twice on the same string, the second call would only examine the string up to where the last character was replaced.
It is fine to use such replacement if you know what you are doing.
It can work only if the char buffer char *str is writable (dynamically allocated, for example by malloc, or just char array on stack char str[SIZE]). It cannot work for string literals.
The standard function strtok also works in this way. By the way, probably you can use strtok for your task if you want to have null terminated substring.
It does not make sense to have an integer type for the character function argument: int character -> char character
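To illustrate the strtok remark, here is a small sketch of how it overwrites delimiters with '\0' and hands back pointers into the original buffer. split_tokens is a hypothetical wrapper; only strtok itself is a standard function.

```c
#include <string.h>

/* Splits str in place on any character in delims and stores up to max
   token pointers in tokens. Returns the number of tokens stored.
   Hypothetical wrapper around the standard strtok. */
size_t split_tokens(char *str, const char *delims, char **tokens, size_t max)
{
    size_t count = 0;
    for (char *tok = strtok(str, delims);
         tok != NULL && count < max;
         tok = strtok(NULL, delims))
        tokens[count++] = tok;    /* each token points into str itself */
    return count;
}
```

After the call, each delimiter that ended a token has been replaced by '\0', which is why str has to be writable and why the original string can no longer be read as a whole.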
Replacing the character by '\0' will likely cause confusion. I would eliminate the undesirable character by shifting the next eligible character into its spot.
char *eliminate(char *str, int character) {
if (!str) return str;
int index = 0, shiftIndex = 0;
while (str[index]) {
if (str[index] == character)
index++;
else {
str[shiftIndex] = str[index];
shiftIndex++, index++;
}
}
str[shiftIndex] = '\0';
return str;
}
I am trying to remove the first semicolon from a character array whose value is:
Input: ; Test: 876033074, 808989746, 825766962, ; Test1: 825766962,
Code:
char *cleaned = cleanResult(result);
printf("Returned BY CLEAN: %s\n",cleaned);
char *cleanResult(char *in)
{
printf("Cleaning this: %s\n",in);
char *firstOccur = strchr(in,';');
printf("CLEAN To Remove: %s\n",firstOccur);
char *restOfArray = firstOccur + 2;
printf("CLEAN To Remove: %s\n",restOfArray); //Correct Value Printed here
char *toRemove;
while ((toRemove = strstr(restOfArray + 2,", ;"))!=NULL)
{
printf("To Remove: %s\n",toRemove);
memmove (toRemove, toRemove + 2, strlen(toRemove + 2));
printf("Removed: %s\n",toRemove); //Correct Value Printed
}
return in;
}
Output (first semicolon still there): ; Test: 876033074, 808989746, 825766962; Test1: 825766962;
Regarding sizeof(cleaned): using sizeof to get the capacity of an array only works if the argument is an array, not a pointer:
char buffer[100];
const char *pointer = "something something dark side";
// Prints 100
printf("%zu\n", sizeof(buffer));
// Prints size of pointer itself, usually 4 or 8
printf("%zu\n", sizeof(pointer));
Although both a local array and a pointer can be subscripted, they behave differently when it comes to sizeof. Thus, you cannot determine the capacity of an array given only a pointer to it.
Also, bear this in mind:
void foo(char not_really_an_array[100])
{
// Prints size of pointer!
printf("%zu\n", sizeof(not_really_an_array));
// Compiles, since not_really_an_array is a regular pointer
not_really_an_array++;
}
Although not_really_an_array is declared like an array, it is a function parameter, so is actually a pointer. It is exactly the same as:
void foo(char *not_really_an_array)
{
...
Not really logical, but we're stuck with it.
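Because the size information is lost at the function boundary, the usual workaround is to pass the capacity as a separate argument. A minimal sketch, with copy_bounded as a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, truncating if necessary. cap is the capacity of
   dst in bytes, supplied by the caller (typically via sizeof on the real
   array). Returns the number of characters stored, not counting the
   terminator. Hypothetical helper. */
size_t copy_bounded(char *dst, size_t cap, const char *src)
{
    if (cap == 0)
        return 0;                 /* no room even for the terminator */
    size_t n = strlen(src);
    if (n > cap - 1)
        n = cap - 1;              /* leave one byte for '\0' */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

The caller, where buffer is still a real array, would write copy_bounded(buffer, sizeof buffer, pointer).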
On to your question. I'm unclear on what you're trying to do. Simply removing the first character of a string (in-place) can be accomplished with a memmove:
memmove( buffer // destination
, buffer + 1 // source
, strlen(buffer) // number of bytes to copy: the remaining characters plus the '\0'
);
This takes linear time, and assumes buffer does not contain an empty string.
The reason strcpy(buffer, buffer + 1) won't do is because the strings overlap, so this yields undefined behavior. memmove, however, explicitly allows the source and destination to overlap.
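Wrapped in a function that also tolerates the empty string, the memmove approach might be sketched like this (remove_first_char is a hypothetical name):

```c
#include <string.h>

/* Removes the first character of s in place and returns s.
   An empty string is left unchanged. Hypothetical helper. */
char *remove_first_char(char *s)
{
    size_t len = strlen(s);
    if (len > 0)
        memmove(s, s + 1, len);   /* len bytes: the rest of the text plus '\0' */
    return s;
}
```

The byte count is len rather than len - 1 so that the terminator moves along with the remaining characters.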
For more complex character filtering, you should consider traversing the string manually, using a "read" pointer and a "write" pointer. Just make sure the write pointer does not get ahead of the read pointer, so the string won't be clobbered while it is read.
void remove_semicolons(char *buffer)
{
const char *r = buffer;
char *w = buffer;
for (; *r != '\0'; r++)
{
if (*r != ';')
*w++ = *r;
}
*w = 0; // Terminate the string at its new length
}
You are using strcpy with overlapping input / output buffer, which results in undefined behavior.
You're searching for a sequence of three characters (comma space semicolon) and then removing the first two (the comma and the space). If you want to remove the semicolon too, you need to remove all three characters (use toRemove+3 instead of toRemove+2). You also need to add 1 to the strlen result to account for the NUL byte terminating the string.
If, as you say, you just want to remove the first semicolon and nothing else, you need to search for just the semicolon (which you can do with strchr):
if ((toRemove = strchr(in, ';')) != NULL) // find a semicolon
memmove(toRemove, toRemove+1, strlen(toRemove+1)+1); // remove 1 char at that position
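For the three-character case described above, the same memmove pattern can be generalized to delete every occurrence of a substring. This is a sketch under the assumption that all matches should go; remove_all is a hypothetical helper name.

```c
#include <string.h>

/* Deletes every occurrence of pat from s, in place, and returns s.
   Hypothetical helper; s must be writable and null-terminated. */
char *remove_all(char *s, const char *pat)
{
    size_t plen = strlen(pat);
    if (plen == 0)
        return s;                 /* nothing to remove */

    char *hit = s;
    while ((hit = strstr(hit, pat)) != NULL) {
        /* shift the tail, including its '\0', over the match */
        memmove(hit, hit + plen, strlen(hit + plen) + 1);
    }
    return s;
}
```

Calling remove_all(result, ", ;") removes the comma, the space, and the semicolon each time; the + 1 in the byte count is what carries the terminating NUL along.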