My code is crashing because of a lack of the char '\0' at the end of some strings.
It's pretty clear to me why we have to use this termination char. My question is,
is there a problem adding a potential 2nd null character to a character array - to solve string problems?
I think it's cheaper to just add a '\0' to every string than to check whether it needs one and then add it, but I don't know if that's a good thing to do.
Is there a problem with having this char ('\0') twice at the end of a string?
This question lacks clarity as "string" means different things to people.
Let us use the C specification definition as this is a C post.
A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1
So a string, cannot have 2 null characters as the string ends upon reaching its first one. #Michael Walz
Instead, re-parse to "is there a problem adding a potential 2nd null character to a character array - to solve string problems?"
A problem with attempting to add a null character to a string is confusion. The str...() functions work with C strings as defined above.
// If str1 was not a string, strcat(str1, anything) would be undefined behavior.
strcat(str1, "\0"); // appends an empty string: no change to str1
char str2[] = "abc";
str2[strlen(str2)] = '\0'; // OK but only, re-assigns the \0 to a \0
// attempt to add another \0
str2[strlen(str2)+1] = '\0'; // Bad: assigning outside `str2[]` as the array is too small
char str3[10] = "abc";
str3[strlen(str3)+1] = '\0'; // OK, in this case
puts(str3); // Adding that \0 served no purpose
As many have commented, adding a spare '\0' does not directly address the code's fundamental problem. #Haris #Malcolm McLean
That unposted code is the real issue that needs solving #Yunnosch, and not by attempting to append a second '\0'.
I think it's cheaper to just add a '\0' to every string than to check whether it needs one and then add it, but I don't know if that's a good thing to do.
Where would you add it? Let's assume we've done something like this:
char *p = malloc(32);
Now, if we know the allocated length, we could put a '\0' as the last character of the allocated area, as in p[31] = '\0'. But we don't know how long the contents of the string are supposed to be. If there's supposed to be just foobar, then there'd still be 25 bytes of garbage, which might cause other issues if processed or printed.
Let alone the fact that if all you have is the pointer to the string, it's hard to know the length of the allocated area.
Probably better to fix the places where you build the strings to do it correctly.
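A minimal sketch of fixing it at the place where the string is built, assuming the destination size is known there. The helper name copy_terminated and its signature are hypothetical, not something from the question. It writes exactly one terminator, so no later pass has to add a spare '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src into dst
   and always terminate the result. Returns the number of characters stored
   (not counting the terminator). */
size_t copy_terminated(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;

    if (dst_size == 0)
        return 0;                 /* no room even for the terminator */

    while (n + 1 < dst_size && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';                /* the one and only terminator */
    return n;
}
```

With a 32-byte buffer from malloc(32), a call such as copy_terminated(p, 32, "foobar") leaves a proper string in p, and anything after the terminator is simply never looked at by the str...() functions.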
Having an extra '\0' is not a problem, as long as you have not gone out of bounds of that char array.
You do have to understand that if you have '\0' twice, no string operation will even know that there is a second '\0'. They will just read until the first '\0' and be done with it. For them, the first '\0' is the null terminating character, and there should not be anything after it.
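A small illustration of that point, as a sketch only (the function name is made up for the example): the buffer below holds a second, explicit terminator, and strlen never gets that far.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length the library sees for a buffer with two terminators. */
size_t length_with_two_terminators(void)
{
    char buf[8] = "abc";   /* buf[3] is '\0'; the remaining bytes are zero too */

    buf[4] = '\0';         /* explicit "second" terminator, still in bounds */
    return strlen(buf);    /* stops at buf[3], never reaches buf[4] */
}
```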
Related
Given for example a char *p that points to the first character in "there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0",
how would strrchr() find the last occurrence of the null character?
The following questions arise:
=> What condition would it depend on to stop the loop?
=> I think in all cases it will try to access the next memory area to check its condition, at some point going past the string boundaries, which is UB. So is it safe?
Please feel free to correct me if I'm wrong!
It's very simple, as explained in the comments.
The first \0 is the last and the only one in a C string.
So if you write
char *str = "there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0";
char *p = strrchr(str, 's');
printf("%s\n", p);
it will print
so
because strrchr will find the 's' in "so", which is the last 's' in the string you gave it. And (to answer your specific question) if you write
p = strrchr(str, '\0');
printf("%d %s\n", (int)(p - str), p+1);
it will print
12 ma
proving that strrchr found the first \0.
It's obvious to you that str is a long string with some embedded \0's in it. But, in C, there is no such thing as a "string with embedded \0's in it". It is impossible, by definition, for a C string to contain an embedded \0. The first \0, by definition, ends the string.
One more point. You had mentioned that if you were to "access the next memory area", you would be "at some point bypassing the string boundaries, UB!" And you're right. In my answer, I flirted with danger when I said
p = strrchr(str, '\0');
printf("%d %s\n", (int)(p - str), p+1);
Here, p points to what strrchr thinks is the end of the string, so when I compute p+1 and try to print it using %s, if we don't know better it looks like I've indeed strayed into Undefined Behavior. In this case it's safe, of course, because we know exactly what's beyond the first \0. But if I were to write
char *str2 = "hello";
p = strrchr(str2, '\0');
printf("%s\n", p+1); /* WRONG */
then I'd definitely be well over the edge.
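The same observation as a small sketch (the wrapper name terminator_offset is hypothetical): for any valid string, strrchr(s, '\0') lands on the first and only terminator, so the offset it reports equals strlen(s). The only safe thing to do with that pointer, in general, is to read the '\0' itself.

```c
#include <stddef.h>
#include <string.h>

/* Offset of the terminator that strrchr reports for the string s. */
size_t terminator_offset(const char *s)
{
    /* The terminating null counts as part of the string for strrchr. */
    const char *p = strrchr(s, '\0');

    return (size_t)(p - s);
}
```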
There is a difference between "a string", "an array of characters" and "a char* pointer".
A C String is a number of characters terminated by a null character.
An array of characters is a defined number of characters.
A char* pointer is technically a pointer to a single character, but often used to mark a point in a C style string.
You say you have a pointer to a character (char*p) and the value of *p is 't', but you believe that *p is the first character of a C style string
"there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0".
As others have said, because you said this is a C style string and you don't know the length of it then the first null after p will mark the end of the string.
If this was a character array char str[40] then you could find the last null by looping from the end of the array towards the start, for (i=39; i>=0; i--), BUT you don't know the length, so that won't work.
Hope that helps, and please excuse me if I have strayed into C++, it's 25 years since I did C :)
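For the array case, where the size really is known, the backwards loop might look like the sketch below. The helper name last_nul_index is made up for this example, and it is only meaningful when n is the true size of the array, which is exactly what a bare char* does not give you.

```c
#include <stddef.h>

/* Hypothetical helper: index of the last '\0' within the first n bytes of
   arr, or -1 if there is none. */
long last_nul_index(const char *arr, size_t n)
{
    /* Walk from the end towards the start; i is one past the index tested. */
    for (size_t i = n; i > 0; i--) {
        if (arr[i - 1] == '\0')
            return (long)(i - 1);
    }
    return -1;
}
```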
In the case you present, you can never know if the null character you've found is the last one since you have no guarantee for the end of the string. As it is a c-string, it is guaranteed that the string ends with a '\0', but if you decide to go beyond that, you can't know if the memory you're accessing is yours. Accessing memory out of an array has undefined behaviour as you can either be accessing just the next object that is in memory that is yours or you could touch memory that is unallocated, but its block still belongs to your process, or you can try to touch a segment that is not yours at all. And only the third one will cause a SIGSEGV. You can see this question to check for segmentation fault without crashing your program, but your string could have ended way before you can catch it that way.
There is a reason for strings to have an ending character. If you insist on having \0 in multiple places in your data, you can terminate with another character instead, but note that all library functions will still consider the first \0 to be the end of the string.
It is considered bad practice to have multiple \0 in your strings, so avoid it if you can.
I have a function taking two strings string_one and string_two both a pointer to a character.
I thought of a way to add them together:
while (*string_one){
string_one++;
}
*string_one = *string_two;
but I can't see the second string in the output!
How do I add two strings together? Did I get anywhere close?
When you write
*string_one = *string_two;
you are just copying one character, since you are dereferencing a char pointer.
To add two strings to one another, you need to overwrite the \0 of the first string and then append the characters from the second string (provided you have enough space to do so).
so this
while (*string_one) {
string_one++;
}
*string_one = *string_two;
will only overwrite the \0 with the first character from string_two, which means the first string will no longer be null-terminated.
Instead, you should do something like this:
while (*string_one) {
string_one++;
}
while (*string_two) {
*string_one++ = *string_two++;
}
*string_one = '\0';
again with the premise that string_one originally pointed to a character string large enough to hold both strings.
Did you try your code at all?
First, you need to make sure that there is enough space for both strings. Concatenating strings and not checking that there is enough space for both is probably the cause of 50% of all hacked computers in the world.
Second, a C string is an array of char with a zero byte as the last char. Your code overwrites the zero byte at the end of string_one with the first char of string_two.
Third, there is a function named strcat in the standard C library doing exactly what you want to do. It doesn't check whether there is enough space, you have to do that before the call.
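A sketch of doing that check before the call, assuming the caller knows the size of the destination buffer. The helper name append_checked and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to dst only if the result, including the
   terminator, fits in dst_size bytes. Returns 0 on success, -1 otherwise. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t dst_len = strlen(dst);   /* dst must already hold a valid string */
    size_t src_len = strlen(src);

    /* Need dst_len + src_len + 1 <= dst_size, written to avoid overflow. */
    if (dst_len >= dst_size || src_len >= dst_size - dst_len)
        return -1;

    strcat(dst, src);               /* safe now: the space was verified */
    return 0;
}
```

When the function returns -1, dst is left untouched, so the caller can decide whether to truncate, grow the buffer, or report an error.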
I'm afraid, your code is not valid.
*string_one = *string_two;
will copy the first element of string_two to string_one. This never adds anything. Moreover, you need to append the complete string, not only one element.
What you need is strcat(). You can find more on that here.
The general description :
The strcat() function appends the src string to the dest string, overwriting the terminating null byte (\0) at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable.
If, by mistake, I define a char array with no \0 as its last character, what happens then?
I'm asking this because I noticed that if I try to iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index to the array, and simultaneously print the cnt values to monitor what's happening, the iteration stops at the last character +2. The extra characters are of course random, but I can't work out why it has to stop after 2. Does the compiler automatically insert a \0 character? Links to relevant documentation would be appreciated.
To make it clear, here is an example. Let's say that the array str contains the word doh (with no '\0'). Printing the cnt variable at every loop would give me this:
doh+
or doh^
and so on.
EDIT (undefined behaviour)
Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!
A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
Anyway, what you see is a coincidence!
But it might happen like this
                        ,------------------ garbage
                        | ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
                  |   |   \_____/ -------- cnt (big-endian, properly 4-byte aligned)
                  \___/ ------------------ str
If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:
char strings[] = {'h', 'e', 'l', 'l', 'o'};
The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily have stopped at +50 or anywhere else, depending on whether there happened to be a \0 character in the memory following your string.
If you define a string as:
char strings[] = "hello";
Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.
There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.
However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault.)
In C language the term string refers to a zero-terminated array of characters. So, pedantically speaking there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.
Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such character array as if it is a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
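If it is unclear whether a given array holds a string, one option is to check within the array's own bounds before handing it to any str...() function. The sketch below assumes the array size is known at that point; the helper name is_terminated is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of arr contain a '\0',
   i.e. arr can safely be treated as a string, and 0 otherwise. */
int is_terminated(const char *arr, size_t n)
{
    /* memchr never reads past arr + n, unlike strlen on a non-string. */
    return memchr(arr, '\0', n) != NULL;
}
```

For example, with char raw[] = {'d', 'o', 'h'}; the call is_terminated(raw, sizeof raw) reports 0, while the same call on char str[] = "doh"; reports 1.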
This would happen if, by coincidence, the byte at *(str + 5) is 0 (as a number, not ASCII)
As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:
Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.
Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" in Linux systems).
Your program will run out of space when copying over the string and will, again, cause a memory error.
And, no, the C compiler will not normally do anything but what you specify in your program - for example it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.
I am new to C and I am very much confused with the C strings. Following are my questions.
Finding last character from a string
How can I find out the last character from a string? I came with something like,
char *str = "hello";
printf("%c", str[strlen(str) - 1]);
return 0;
Is this the way to go? I somehow think that, this is not the correct way because strlen has to iterate over the characters to get the length. So this operation will have a O(n) complexity.
Converting char to char*
I have a string and need to append a char to it. How can i do that? strcat accepts only char*. I tried the following,
char delimiter = ',';
char text[6];
strcpy(text, "hello");
strcat(text, delimiter);
Using strcat with variables that has local scope
Please consider the following code,
void foo(char *output)
{
char *delimiter = ',';
strcpy(output, "hello");
strcat(output, delimiter);
}
In the above code,delimiter is a local variable which gets destroyed after foo returned. Is it OK to append it to variable output?
How strcat handles null terminating character?
If I am concatenating two null terminated strings, will strcat append two null terminating characters to the resultant string?
Is there a good beginner level article which explains how strings work in C and how can I perform the usual string manipulations?
Any help would be great!
Last character: your approach is correct. If you will need to do this a lot on large strings, your data structure containing strings should store lengths with them. If not, it doesn't matter that it's O(n).
Appending a character: you have several bugs. For one thing, your buffer is too small to hold another character. As for how to call strcat, you can either put the character in a string (an array with 2 entries, the second being 0), or you can just manually use the length to write the character to the end.
Your worry about 2 nul terminators is unfounded. While it occupies memory contiguous with the string and is necessary, the nul byte at the end is NOT "part of the string" in the sense of length, etc. It's purely a marker of the end. strcat will overwrite the old nul and put a new one at the very end, after the concatenated string. Again, you need to make sure your buffer is large enough before you call strcat!
O(n) is the best you can do, because of the way C strings work.
char delimiter[] = ",";. This makes delimiter a character array holding a comma and a NUL. Also, text needs to have length 7: hello is 5, then you have the comma and a NUL.
If you define delimiter correctly, that's fine (as is, you're assigning a character to a pointer, which is wrong). The contents of output won't depend on delimiter later on.
It will overwrite the first NUL.
You're on the right track. I highly recommend you read K&R C 2nd Edition. It will help you with strings, pointers, and more. And don't forget man pages and documentation. They will answer questions like the one on strcat quite clearly. Two good sites are The Open Group and cplusplus.com.
A "C string" is in reality a simple array of chars, with str[0] containing the first character, str[1] the second and so on. After the last character, the array contains one more element, which holds a zero. This zero by convention signifies the end of the string. For example, those two lines are equivalent:
char str[] = "foo"; //str is 4 bytes
char str[] = {'f', 'o', 'o', 0};
And now for your questions:
Finding last character from a string
Your way is the right one. There is no faster way to know where the string ends than scanning through it to find the final zero.
Converting char to char*
As said before, a "string" is simply an array of chars, with a zero terminator added to the end. So if you want a string of one character, you declare an array of two chars - your character and the final zero, like this:
char str[2];
str[0] = ',';
str[1] = 0;
Or simply:
char str[2] = {',', 0};
Using strcat with variables that has local scope
strcat() simply copies the contents of the source array to the destination array, at the offset of the null character in the destination array. So it is irrelevant what happens to the source after the operation. But you DO need to worry if the destination array is big enough to hold the data - otherwise strcat() will overwrite whatever data sits in memory right after the array! The needed size is strlen(str1) + strlen(str2) + 1.
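The size formula can be applied directly when the destination is allocated on demand. This is only a sketch: the helper name concat_alloc is hypothetical, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a followed
   by b, or NULL if the allocation fails. The caller frees the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *out = malloc(len_a + len_b + 1);   /* +1 for the single terminator */

    if (out == NULL)
        return NULL;

    memcpy(out, a, len_a);                   /* a without its terminator */
    memcpy(out + len_a, b, len_b + 1);       /* b including its terminator */
    return out;
}
```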
How strcat handles null terminating character?
The final zero is expected to terminate both input strings, and is appended to the output string.
Finding last character from a string
I propose a thought experiment: if it were generally possible to find the last character
of a string in better than O(n) time, then could you not also implement strlen
in better than O(n) time?
Converting char to char*
You temporarily can store the char in an array-of-char, and that will decay into
a pointer-to-char:
char delimiterBuf[2] = "";
delimiterBuf[0] = delimiter;
...
strcat(text, delimiterBuf);
If you're just using character literals, though, you can simply use string literals instead.
Using strcat with variables that has local scope
The variable itself isn't referenced outside the scope. When the function returns,
that local variable has already been evaluated and its contents have already been
copied.
How strcat handles null terminating character?
"Strings" in C are NUL-terminated sequences of characters. Both inputs to
strcat must be NUL-terminated, and the result will be NUL-terminated. It
wouldn't be useful for strcat to write an extra NUL-byte to the result if it
doesn't need to.
(And if you're wondering what if the input strings have multiple trailing
NUL bytes already, I propose another thought experiment: how would strcat know
how many trailing NUL-bytes there are in a string?)
BTW, since you tagged this with "best-practices", I'll also recommend that you take care not to write past the end of your destination buffers. Typically this means avoiding strcat and strcpy (unless you've already checked that the input strings won't overflow the destination) and using safer versions (e.g. strncat. Note that strncpy has its own pitfalls, so that's a poor substitute. There also are safer versions that are non-standard, such as strlcpy/strlcat and strcpy_s/strcat_s.)
Similarly, functions like your foo function always should take an additional argument specifying what the size of the destination buffer is (and documentation should make it explicitly clear whether that size accounts for a NUL terminator or not).
How can I find out the last character
from a string?
Your technique with str[strlen(str) - 1] is fine. As pointed out, you should avoid repeated, unnecessary calls to strlen and store the results.
I somehow think that, this is not the
correct way because strlen has to
iterate over the characters to get the
length. So this operation will have a
O(n) complexity.
Repeated calls to strlen can be a bane of C programs. However, you should avoid premature optimization. If a profiler actually demonstrates a hotspot where strlen is expensive, then you can do something like this for your literal string case:
const char test[] = "foo";
sizeof test // 4
Of course if you create 'test' on the stack, it incurs a little overhead (incrementing/decrementing stack pointer), but no linear time operation involved.
Literal strings are generally not going to be so gigantic. For other cases like reading a large string from a file, you can store the length of the string in advance as but one example to avoid recomputing the length of the string. This can also be helpful as it'll tell you in advance how much memory to allocate for your character buffer.
I have a string and need to append a
char to it. How can i do that? strcat
accepts only char*.
If you have a char and cannot make a string out of it (char* c = "a"), then I believe you can use strncat (need verification on this):
char ch = 'a';
strncat(str, &ch, 1);
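On the verification: strncat appends at most n characters from the source and then always writes a terminating null, so the source does not have to be a terminated string as long as n characters are readable. A sketch wrapping the call (the name append_one is hypothetical, and the destination is assumed to have room for one more character plus the terminator):

```c
#include <string.h>

/* Appends ch to the string in str. The caller must guarantee that str has
   room for one more character plus the terminator. */
void append_one(char *str, char ch)
{
    strncat(str, &ch, 1);   /* reads exactly one char from &ch, then adds '\0' */
}
```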
In the above code,delimiter is a local
variable which gets destroyed after
foo returned. Is it OK to append it to
variable output?
Yes: functions like strcat and strcpy make deep copies of the source string. They don't leave shallow pointers behind, so it's fine for the local data to be destroyed after these operations are performed.
If I am concatenating two null
terminated strings, will strcat
append two null terminating characters
to the resultant string?
No, strcat will basically overwrite the null terminator on the dest string and write past it, then append a new null terminator when it's finished.
How can I find out the last character from a string?
Your approach is almost correct. The only way to find the end of a C string is to iterate through the characters, looking for the nul.
There is a bug in your answer though (in the general case). If strlen(str) is zero, you access the character before the start of the string.
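A sketch that guards against the empty-string case. The name last_char and the choice to return '\0' for an empty string are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns the last character of s, or '\0' if s is the empty string. */
char last_char(const char *s)
{
    size_t len = strlen(s);

    return len > 0 ? s[len - 1] : '\0';   /* never indexes before s[0] */
}
```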
I have a string and need to append a char to it. How can i do that?
Your approach is wrong. A C string is just an array of C characters with the last one being '\0'. So in theory, you can append a character like this:
char delimiter = ',';
char text[7];
strcpy(text, "hello");
int textSize = strlen(text);
text[textSize] = delimiter;
text[textSize + 1] = '\0';
However, if I leave it like that I'll get zillions of down votes because there are three places where I have a potential buffer overflow (if I didn't know that my initial string was "hello"). Before doing the copy, you need to put in a check that text is big enough to contain all the characters from the string plus one for the delimiter plus one for the terminating nul.
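One way the check could look, as a sketch only: the function name append_char_checked and its return convention are hypothetical, and text is assumed to already hold a valid string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append ch to the string in text if the buffer of
   text_size bytes has room. Returns 0 on success, -1 if it does not fit. */
int append_char_checked(char *text, size_t text_size, char ch)
{
    size_t len = strlen(text);

    if (len + 2 > text_size)     /* need room for ch and the terminator */
        return -1;

    text[len] = ch;              /* overwrite the old terminator */
    text[len + 1] = '\0';        /* write the new one */
    return 0;
}
```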
... delimiter is a local variable which gets destroyed after foo returned. Is it OK to append it to variable output?
Yes that's fine. strcat copies characters. But your code sample does no checks that output is big enough for all the stuff you are putting into it.
If I am concatenating two null terminated strings, will strcat append two null terminating characters to the resultant string?
No.
I somehow think that, this is not the correct way because strlen has to iterate over the characters to get the length. So this operation will have a O(n) complexity.
You are right; read Joel Spolsky on why C strings suck. There are a few ways around it: either don't use C strings (for example, use Pascal strings and create your own library to handle them), or don't use C (use, say, C++, which has a string class that is slow for different reasons, but you could also write your own class to handle Pascal strings more easily than in C).
Regarding adding a char to a C string; a C string is simply a char array with a nul terminator, so long as you preserve the terminator it is a string, there's no magic.
char* straddch( char* str, char ch )
{
char* end = &str[strlen(str)] ;
*end = ch ;
end++ ;
*end = 0 ;
return str ;
}
Just like strcat(), you have to know that the array that str is created in is long enough to accommodate the longer string, the compiler will not help you. It is both inelegant and unsafe.
If I am concatenating two null
terminated strings, will strcat append
two null terminating characters to the
resultant string?
No, just one, but what ever follows that may just happen to be nul, or whatever happened to be in memory. Consider the following equivalent:
char* my_strcat( char* s1, const char* s2 )
{
strcpy( &s1[strlen(s1)], s2 ) ;
return s1 ;
}
the first character of s2 overwrites the terminator in s1.
In the above code,delimiter is a local
variable which gets destroyed after
foo returned. Is it OK to append it to
variable output?
In your example delimiter is not a string, and initialising a pointer with a char makes no sense. However if it were a string, the code would be fine, strcat() copies the data from the second string, so the lifetime of the second argument is irrelevant. Of course you could in your example use a char (not a char*) and the straddch() function suggested above.
I'm writing C code in which a char array of length 2 contains the string My. But when I print it to the screen using puts(), I get this output:
My £■ 0√"
What is the reason for these extra characters?
And if my array length is 2, how can I get output longer than 2 characters?
Sounds like you are missing the null terminator: the array needs to hold three chars, 'M', 'y', '\0'.
If you've explicitly set the length of the string to 2, you're not leaving room for a NUL terminator, which is what puts uses to find the end of the string. Since you don't have one, it'll continue printing out the contents of memory following the string you defined, until it gets to a byte in memory that happens to contain a 0.
To avoid that, you generally should not specify the length when you initialize an array from a string literal:
char string[2] = "My"; // avoid this
char string2[] = "My"; // use this instead.
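A quick way to see the difference, sketched with made-up names (greeting, greeting_size): when the length is left out, the compiler sizes the array from the literal and the terminator is included in the count.

```c
#include <stddef.h>

/* The compiler sizes this array from the literal: 'M', 'y', '\0'. */
static const char greeting[] = "My";

/* Size of the array in bytes, terminator included. */
size_t greeting_size(void)
{
    return sizeof greeting;
}
```

Here greeting_size() reports 3 while strlen(greeting) reports 2, which is why an array declared with length 2 has no room left for the terminator.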
It's been a while since I did C. But I suspect that your character array doesn't end with a null character. So you need to end your array with '\0'
Your array length should be at least 3, one for each character and one for a \0 character.
make sure you've got the terminating char.