C code removing wrong character from string - c

This code is supposed to remove any leading spaces from the given string, and it was working correctly. Then, for seemingly no reason at all, it started removing characters in the middle of the word. In this example the word "CHEDDAR" is given, which has no leading spaces so it should be passed back the same as it was input, however it's returning "CHEDDR" and I have no idea why. Does anyone know how this is even possible? I assume it has to do with pointers and memory, but I am not fluent in C and I need some help. Running on RHEL. Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define REMOVE_LEADING_SPACES(input) \
{ \
stripFrontChar( input, ' ' ); \
}
char *stripFrontChar(char *startingString, char removeChar) {
while (*startingString == removeChar)
strcpy(startingString, startingString + 1);
return (startingString);
}
void main(argc, argv)
char **argv;int argc; {
char *result = "CHEDDAR";
REMOVE_LEADING_SPACES(result);
printf("%s\n", result);
}
EDIT: It's a little late now but based on the comments I should have shown that the word (CHEDDAR I used as an example) is read from a file, not a literal as shown in my code. I was trying to simplify it for the question and I realize now it's a completely different scenario, so I shouldn't have. Thanks, looks like I need to use memmove.
EDIT2: There actually is a space like " CHEDDAR", so I really just need to change it to memmove, thanks again everyone.

You copy a string using overlapping memory area:
strcpy(startingString, startingString + 1);
From the C standard:
7.24.2.3 The strcpy function
If copying takes place between objects that overlap, the behavior is undefined.
You need to use memmove (and provide the proper length, including the terminating null character) or you need to move the characters on your own. You can also improve the performance if you start by counting the characters that need to be removed and then move everything in one go.
Another issue that was pointed out by Joop Eggen in a comment:
char *result = "CHEDDAR";
You are not allowed to modify string literals.
If you try to remove leading characters, you invoke undefined behaviour.
You should change this to
char result[] = "CHEDDAR";
As your sample string does not contain a leading space, this does not cause trouble yet. But you should fix it nevertheless.

This code
strcpy(startingString, startingString + 1);
copies overlapping strings.
Per 7.24.2.3 The strcpy function, paragraph 2 of the C standard:
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
You are invoking undefined behavior.

The other answers correctly identify the overlapping copy as undefined behavior, but there is a second cause: you use a string literal. Modifying a literal is also undefined behavior, and on most platforms literals live in read-only memory, so the write will typically abort the program.
Instead of:
char *result = "CHEDDAR";
Use:
char result[] = "CHEDDAR";
(Note: looking at how most strcpy functions will have been implemented, namely a loop that terminates when seeing the null character of the source string, then the overlap that you use will still see the null character of the source and place it in the destination (down-copying). Copying the other way around (up-copying) would not see the null terminator anymore, as it will have been overwritten, and may continue copying beyond the destination string.)

In your case, where no modification is needed and no allocs are done, you only need to find the start without copying anything.
char *stripFrontChar(char *startingString, char removeChar) {
for( ; *startingString == removeChar; startingString++)
;
return (startingString);
}
But you have to use the return value of stripFrontChar():
printf("%s\n", stripFrontChar(result, ' '));

Related

Segmentation fault of small code

I am trying to test something and I made a small test file to do so. The code is:
void main(){
int i = 0;
char array1 [3];
array1[0] = 'a';
array1[1] = 'b';
array1[2] = 'c';
printf("%s", array1[i+1]);
printf("%d", i);
}
I receive a segmentation error when I compile and try to run. Please let me know what my issue is.
Firstly, char array1[3] is not null terminated, because there is not enough space to put '\0' at the end of array1. To avoid this undefined behavior, increase the size of array1 and store the terminator.
Secondly, array1[i+1] is a single char not string, so use %c instead of %s as
printf("%c", array1[i+1]);
I suggest you get yourself a good book/video series on C. It's not a language that's fun to pick up out of the blue.
Regardless, your problem here is that you haven't formed a correct string. In C, a string is a pointer to the start of a contiguous region of memory that happens to be filled with characters. There is no data whatsoever stored about its size or any other characteristics. Only where it starts and what it is. Therefore you must provide information as to when the string ends explicitly. This is done by having the very last character in a string be set to the so-called null character (in C represented by the escape sequence '\0').
This implies that any string must be one character longer than the content you want it to hold. You should also never be setting up a string manually like this. Use a library function like strlcpy to do it, where your platform provides it (it is not part of standard C). It will automatically add in a null character, even if your array is too small (by truncating the string). Alternatively you can statically create a literal string like this:
char array[] = "abc";
It will automatically be null terminated and be of size 4.
Strings need to have a NUL terminator, and you don't have one, nor is there room for one.
The solution is to add one more character:
char array1[4];
// ...
array1[3] = 0;
Also you're asking to print a string but supplying a character instead. You need to supply the whole buffer:
printf("%s", array1);
Then you're fine.
Spend the time to learn about how C strings work, in particular about the requirement for the terminator, as buffer overflow bugs are no joke.
When printf sees a "%s" specifier in the format string, it expects a char * as the corresponding argument, but you passed the char value of the expression array1[i+1]. That char gets promoted to int, which is still incompatible with char *. And even if it were compatible, it would have no chance of being a valid pointer to any meaningful character string.

Getting substring from string in C

I have a string "abcdefg-this-is-a-test" and I want to delete the first 6 characters of the string. This is what I am trying:
char contentSave2[180] = "abcdefg-this-is-a-test";
strncpy(contentSave2, contentSave2+8, 4);
No luck so far, processor gets stuck and resets itself.
Any help will be appreciated.
Question: How can I trim down a string in C?
////EDIT////
I also tried this:
memcpy(contentSave2, &contentSave2[6], 10);
Doesn't work, same problem.
size_t i, len = strlen(contentSave2);
for (i = 6; i < len; i++)
contentSave2[i-6] = contentSave2[i];
contentSave2[i-6] = '\0';
This will delete the first 6 characters. Modify it to suit your requirements. If you want to use a built-in function, try memmove.
The problem with your first code snippet is that it copies the middle four characters to the beginning of the string, and then stops.
Unfortunately, you cannot expand it to cover the entire string, because in that case the source and output buffers would overlap, causing UB:
If the strings overlap, the behavior is undefined.
Overlapping buffers is the problem with your second attempt: memcpy does not allow overlapping buffers, so the behavior is undefined.
If all you need is to remove characters at the beginning of the string, you do not need to copy it at all: simply take the address of the initial character, and use it as your new string:
char *strWithoutPrefix = &contentSave2[8];
For copying of strings from one buffer to another use memcpy:
char middle[5];
memcpy(middle, &contentSave2[8], 4);
middle[4] = '\0'; // "this"
For copying potentially overlapping buffers use memmove:
char contentSave2[180] = "abcdefg-this-is-a-test";
printf("%s\n", contentSave2);
memmove(contentSave2, contentSave2+8, strlen(contentSave2)-8+1);
printf("%s\n", contentSave2);
You can simply use a pointer, because contentSave2 here decays to a pointer to the first char of the array. This is also quick and short.
char* ptr = contentSave2 + 6;
ptr[0] will be equal to contentSave2[6]
You can use memmove function.
It is specially used when source and destination memory addresses overlap.
Small word of advice: try to avoid copying between overlapping source and destination. It is simply bug-prone.
The following snippet should work fine:
#include <stdio.h>
#include <string.h>
int main() {
char contentSave2[180] = "abcdefg-this-is-a-test";
strncpy(contentSave2, contentSave2+8, 4);
printf("%s\n", contentSave2);
return 0;
}
I would suggest posting the rest of your code because your issue is elsewhere. As others pointed out, watch out for overlap when you use strncpy, though in this specific case the source and destination ranges do not overlap, so it should work.

Space for Null character in c strings

When is it necessary to explicitly provide space for a NULL character in C strings?
For example:
This works without any error although I haven't declared str to be 7 characters long, i.e. for the characters of the string plus the NULL character.
#include<stdio.h>
int main(){
char str[6] = "string";
printf("%s", str);
return 0;
}
Though in this question https://stackoverflow.com/a/7652089 the user says
"This is useful if you need to modify the string later on, but know that it will not exceed 40 characters (or 39 characters followed by a null terminator, depending on context)."
What does it mean by "depending on context" ?
When is it necessary to explicitly provide space for a NULL character in C strings?
Always. Not having that \0 character there will make functions like strcpy, strlen and printing via %s misbehave. It might appear to work for some examples (like your own) but I wouldn't bet anything on that.
On the other hand, if your string is binary and you know the length of the packet you don't need that extra space. But then you cannot use str* functions. And this is not the case of your question, anyway.
It is buggy, keyword "buffer overflow". The memory is overwritten.
char str[4] = "stringulation";
char str2[20];
printf("%s", str);
printf("%s", str2);
Writing to memory that you have not allocated may lead to data corruption, random output, or other undefined behaviour.
Your code invokes undefined behaviour. You may think it works, but the code is broken.
To store a C string with 6 characters, and a null-terminator, you need a character array of length 7 or more.
When is it necessary to explicitly provide space for a NULL character in C strings
There are no exceptions. A C string must always include a null terminating character.
What does it mean by "depending on context"?
The answer there is drawing the distinction between a string variable that you intend to modify at a later time, or a string variable that you will not modify. In the former case, you may choose to allocate more than you need for the initial contents, because you want to be able to add more later. In the latter case, you can simply allocate as many characters as are needed for the initial value (including its terminator), and no more.
That 0 terminator [1] is how the various library functions (strcpy(), strlen(), printf(), etc.) identify the end of a string. When you call a function like
char foo[6] = "hello";
printf( "%s\n", foo );
the array expression foo is converted to a pointer value before it's passed to the function, so all the function receives is the address of the first character; it doesn't know how long the foo array is. So it needs some way to know where the end of the string is. If foo didn't have that space for the 0 terminator, printf() would continue to print characters beyond the end of the array until it saw a 0-valued byte.
1. I prefer using the term "0 terminator" instead of "NULL terminator", just to avoid confusion with the NULL pointer, which is a different thing.

How strcpy works behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy actually works behind the scenes. For example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%d\n", sizeof(s));
return 0;
}
As I am declaring s to be a static array with a size smaller than that of the source, I thought it wouldn't print the whole word, but it did print world isnsadsdas. So I thought that this strcpy function might be allocating new space if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it prints more than that. How does that actually work?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know by making your first buffer 6 chars long for a 5 character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy over the boundary of the array. This is also why your print is reading the buffer past its size: it reads till \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the arrays on the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas' (but I can't be sure what it would look like even if it is polluting a, because there are sometimes bytes in between the stack entries for various reasons).
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are more modern functions that take a max length like http://www.cplusplus.com/reference/cstring/strncpy/
Oh and lastly, this is all called a buffer overflow. Some compilers add a nice little blob of bytes randomly chosen by the OS before and after every stack entry. After every copy the OS checks these bytes against its copy and terminates the program if they differ. This solves a lot of security problems, but it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function to handle what happens when those bytes have been changed thus letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off in order to have better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a newer variant, strcpy_s(), was introduced, where you can specify the target buffer size.
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters s holds. When you perform strcpy(), the destination string is overwritten by the source string, so your output won't be "Helloworld isnsadsdas".
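#include <stdio.h>
#include <string.h>
int main(void)
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a); /* still overflows s, as in the question */
printf("%s\n",s);
printf("%zu\n", strlen(s)); /* strlen returns size_t, so use %zu */
return 0;
}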
You are relying on undefined behaviour inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}

C: How to copy over null terminator to structure member, in cleaner way?

Essentially I am tokenizing a string and strncpying the string found to a structure member, i.e. stringid. It of course suffers from the problem of lack of termination, I have added an extra array space for it, I've no clue how to add it properly.
I had done it like so:
my_struct[iteration].stringID[ID_SIZE-1] = '\0'; //updated
I am unsure if that really works, it looks horrible IMO.
Str(n)cpying a null character, or 0, results in a warning generated by GCC and MinGW:
warning: null argument where non-null required (arg 2)
Am I blind on how to do this in a clean manner? I was thinking of memsetting the member array to all zeros, and then copying the string in to nicely fit with null termination. Do you have any suggestions or practises?
Two things:
Beware that strncpy() has very unexpected semantics, it will always 0-fill the buffer if not totally filled by the string, and it will not terminate the string if it completely fills the buffer. Both of these are weird enough that I recommend against using it.
Never index an array with its size, like stringID[ID_SIZE] seems to be doing; that is out of bounds.
The best solution is to write a custom version of strncpy() that is less weird, or (if you know the length of the input) just use strcpy().
UPDATE: If the length of your input tokens is static, but they're not 0-terminated in the source buffer due to your tokenization process, then just use memcpy() and manual termination:
const char * token = ...; /* Extract from tokenization somehow. Not 0-terminated. */
const size_t token_length = ...; /* Perhaps from tokenization step. Must be less than the size of stringID. */
memcpy(my_struct[iteration].stringID, token, token_length);
my_struct[iteration].stringID[token_length] = '\0';
I don't see a need to "wrap" the above in a macro.
Actually, null terminating the way you suggested isn't horrible at all and I personally very much like it.
The best way, in my opinion, would be to define it as a macro in similar fashion:
// for char* blah;
#define TERMINATE_DYNAMIC_STRING(str, len) ((str)[(len)] = '\0')
// for char mytext[] = "hello";
#define TERMINATE_STRING(str) ((str)[sizeof(str)/sizeof((str)[0]) - 1] = '\0')
Then you can use it all around your code as much as you want.
On Windows Microsoft gives you the following functions which null terminate when copying string: StringCchCopy
As others have noted, strncpy has odd semantics. The idiomatic way to do a bounded string copy is to strncat onto an empty string:
my_struct[iteration].stringID[0] = '\0';
strncat(my_struct[iteration].stringID, src, ID_SIZE-1);
This always appends a terminating NUL, (and fills at most ID_SIZE characters including the NUL).
I ended up writing a strncpyz(char* pszTo, const char* pszFrom, size_t lSize) function that forces the NUL termination. This works pretty well if you have a library to put it in. Using it also requires minimal code changes.
I'm not keen on the macro approach because somebody will pass a pointer to the wrong macro.

Resources