A null character '\0' at the end of a string - c

I have the following code.
#include <stdio.h>
#include <string.h>
#define MAXLINE 1000
int main()
{
int i;
char s[MAXLINE];
for (i = 0; i < 20; ++i) {
s[i] = i + 'A';
printf("%c", s[i]);
}
printf("\nstrlen = %d\n", strlen(s)); // strlen(s): 20
return 0;
}
Should I write
s[i] = '\0';
explicitly after the loop to mark the end of the string, or is that done automatically? Without s[i] = '\0'; the call strlen(s) still returns the correct value, 20.

Yes, you need to add a null terminator yourself. One is not added automatically.
You can verify this by explicitly initializing s to something that doesn't contain a NUL at byte 20.
char s[MAXLINE] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
If you do that strlen(s) won't return 20.

Yes, you should add the null terminator after the loop. Alternatively, you could initialize the entire array with 0. That way, you don't have to add a 0 after the loop because there is one already:
...
char s[MAXLINE] = {0};
...

You MUST add a NUL terminator to mark the end of a C string.
Adding a NUL terminator character isn't automatic (unless documentation states that a function call writes the NUL terminator character for you).
In your case, use:
s[20] = 0;
As mentioned in the comments, C strings are defined by the terminator NUL character. The NUL character is required also by all the strXXX C functions.
If you don't mark the end of the string with a NUL, you have a (binary) sequence of characters, but not a C string. These are sometimes referred to as binary strings and they cannot use the strXXX library functions.
Why do you get Correct Results
It is likely that you get correct results mostly by chance.
The most probable explanation for the correct results is that the OS you are using provides you with a "clean" memory stack (the initial stack memory is all zero)... this isn't always the case.
Since you never wrote on the stack memory prior to executing your code, the following byte is whatever was there before (on your OS, that byte was set to zero when the stack was first initialized).
However, this will not be true if the OS does not provide you with a "clean" stack or if your code runs on a previously used stack.

Related

Random character strange defined behavior

I'm using an array in this code because I need a string that is modified repeatedly, which is why I'm not using a pointer. However, every time I run the code I get strange behavior at the 31st iteration.
code
int i = 0;
char name[100];
srand(getpid());
while(i<100) {
name[i] += (char)'A'+(rand()%26);
printf("%d strlen\n", i+1);
printf("%s\n", name);
printf("/////\n");
i++;
}
output
/////
30 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOY
/////
31 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJ
/////
32 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJWttime
/////
33 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW�time
/////
34 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW��ime
/////
35 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW���me
/////
36 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW����e
/////
37 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW�����
In other words, it always prints ttime after the 31st character, then the code overwrites each character of that word and I get question marks as a result.
As it goes on, things get even worse; look at the final output:
100 strlen
IKXVKZOLKHLTKBFFTUZCYXHYVEBZOYJW�����K��ȶ������MKRLHALEV�SNNRVWNOEXUVQNJUHAEWN�W�YPMCW�N�PXHNT��0�
/////
Why does this happen?
You are printing garbage values, so the behavior is undefined. The uninitialized bytes (with your random number added) may happen to be the ASCII codes of printable characters, or they may be non-printable. You should either initialize the char array with '\0's (this serves two purposes: it provides the terminating '\0' for the growing string, and it guarantees that the addition yields a printable character) or simply assign instead of adding:
name[i] = 'A'+(rand()%26);
Also put a '\0' at the end of the string. Otherwise printf will read past the end of the array until it finds a '\0', which invokes undefined behavior.
There is nothing special about 31; it can be any position the next time you run it.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) {
size_t i = 0;
char name[100]={0}; // = "";
srand(getpid());
while(i<99) { // stop at 99 so the last NUL is never overwritten
name[i] ='A'+(rand()%26);
printf("%zu strlen\n", i+1); // i is a size_t, so print it with %zu
printf("%s\n", name);
printf("/////\n");
i++;
}
return 0;
}
Note that the loop only runs up to index 98, because index 99 holds the terminating NUL character.
char name[100]; is not a string by default. It is just another 100 element char array.
In C, a string always carries (at least) one '\0' character to mark its end. printf(), almost all str*() functions and many other functions rely on this terminating '\0'.
Also what is the idea behind adding to the array elements?
name[i] += ...
Their values are not set; they are garbage. Even worse, adding to them means reading uninitialised memory first, which in turn provokes undefined behaviour.
So to fix your code, drop the addition and add the terminator by hand:
while (i < 99) {
name[i] = (char) 'A' + (rand() % 26);
name[i + 1] = '\0'; /* keep the string terminated after every append */
/* ... print as before ... */
i++;
}
Or go for the lazy approach and initialise name to all '\0' before starting:
char name[100] = ""; /* or ... = {0}; */
(this would allow you to stick to doing name[i] += .... Still, as all elements are 0, adding is of no use.)
In any case, do not fill every element of the array (100 here). Always stop one short, as the last element is reserved for the terminating '\0'.
If the char name[100] array is a local variable inside a function, its initial value is undefined. So it will contain whatever random junk was in that chunk of memory before.
Therefore when you are doing
name[i] += (char)'A'+(rand()%26);
you are actually doing
name[i] = RANDOM JUNK + (char)'A'+(rand()%26);

function returning a string of zeros in c

I'm trying to create a function that returns a string of zeros as a char array and prints that array to a text file, but when I return the string an additional character is returned.
This is the string the program wrote to the text file.
This is my function:
char *zeros_maker (int kj,int kj1)
{
char *zeros;
zeros = (char *) malloc(sizeof(char)*(kj-kj1));
int i;
for(i =0;i<kj-kj1;i++)
zeros[i]='0';
printf("%s\n",zeros);
return zeros;
}
This is the statement I used to print to the file:
fprintf(pFile,"%c%s%c &",34,zeros_maker(added_zeros,0),34);
Thanks in advance
'0' in C is the value of the encoding used for the digit zero. This is not allowed to have the value 0 by the C standard.
You need to add a NUL-terminator '\0' to the end of the char array, in order for the printf function to work correctly.
Else you run the risk of it running past the end of the char array, with undefined results.
Finally, don't forget to free the allocated memory at some point in your program.
Read about how strings in C are meant to be terminated.
Each string terminates with the null character '\0' (the NUL character, ASCII value 0, not to be confused with the character '0', which has ASCII value 48). It marks the end of the string. Allocate one extra byte for it (kj-kj1+1 bytes in total) so that this write stays in bounds:
zeros[kj-kj1]='\0';
Also always check whether you are accessing an element out of bounds. In this case it happens if kj1 > kj.
Instead of the for loop, you can use memset.
char* zeros_maker(int kj,int kj1)
{
int len=kj-kj1;
char *zeros=malloc(sizeof(char)*(len+1)); // one extra byte for the terminator
if(zeros==NULL) return NULL;
memset(zeros,'0',len);
zeros[len]=0;
printf("%s\n",zeros);
//fflush(stdout);
return zeros;
}
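Or, if you are not a fan of C-style strings and the data is going to be ASCII only, the following could be used too. Just be careful what you are doing this way: the buffer is not NUL-terminated, so it must never be passed to %s or the str* functions.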
char* zeros_maker_pascal_form(int kj,int kj1)
{
int len=kj-kj1;
char *zeros=malloc(sizeof(char)*(len));
memset(zeros,'0',len);
for(int a=0;a<len;a++){
printf("%c",zeros[a]);
}
printf("\n");
//fflush(stdout);
return zeros;
}
Your code has a few basic issues, the main one is that it fails to terminate the string (and include space for the terminator).
Here's a fixed and cleaned-up version:
char * zeros_maker(size_t length)
{
char *s = malloc(length + 1);
if(s != NULL)
{
memset(s, '0', length);
s[length] = '\0';
}
return s;
}
This has the following improvements over your code:
It simplifies the interface, just taking the number of zeroes that should be returned (the length of the returned string). Do the subtraction at the call site, where those two values make sense.
No cast of the return value from malloc(), and no scaling by sizeof (char) since I consider that pointless.
Check for NULL being returned by malloc() before using the memory.
Use memset() to set a range of bytes to a single value, that's a standard C function and much easier to know and verify than a custom loop.
Terminate the string, of course.
Call it like so:
char *zs = zeros_maker(kj - kj1);
puts(zs);
free(zs);
Remember to free() the string once you're done with it.

Where does C store '\0'?

The code below reads a line and returns the line length. lim is the length of the array s[].
When the input line length is lim, then s[lim] = '\0'. But the array s[] is only lim elements long, from s[0] to s[lim-1]. Will it cause a buffer overflow? I tested it many times, but the code seemed to work just fine.
int getline(char s[], int lim)
{
int c, i;
for(i = 0; i < lim-1 && ( c = getchar())!= EOF && c!= '\n'; i++)
s[i] = c;
if( c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The '\0' is just another character. It is stored right after the last character of the string.
Often, you can "get away" with writing off the end of a buffer with no obvious harm, but don't do it. It's a bug.
I once had to debug a program that contained an error like this. The program was writing a single byte past the end of one buffer. In the debug build, there was enough extra stuff on the stack that the single byte extra caused no harm; the crash only occurred in the release build, but the debugger didn't really work since it was the non-debug build. This is an example of why it is good to test your code both in a "debug" build and in a release build (compiled the way you would give it to your users).
This is a good example of why an interface needs a clear definition of its inputs and return value:
int getline(char s[], int lim)
One possible definition of "lim" is the maximum number of characters to be copied to s[], excluding the terminating null character '\0'.
Example:
char arr[] = "hello";
getline(arr, strlen(arr));
The other definition of "lim" is the maximum number of characters to be copied into s[], including the terminating null character.
Example:
char arr[] = "hello";
getline(arr, sizeof(arr));
You seem to be supposing the 2nd definition of "lim".
This is a function straight out of "The C Programming Language" by K&R. It's from chapter one. It works because it is correct.
Consider "cat". This is a four character array {'c','a','t','\0'}. The length of the string is 3.
If s[]="cat" then s[0]='c', s[3]='\0'. Eh?
The string length returned by strlen or what have you is the number of array elements minus one, because the terminator is not counted. The array is allocated to hold all 4 characters. That's where the '\0' is, at the end of the array.
No, it won't cause a buffer overflow. The '\0' is the NUL character that marks the end of the string (it is not the NULL pointer). When you walk a string from its beginning to its end, the position containing the '\0' character is never treated as a position containing valid data.
You could go over the whole array by using while(index < size) as a condition, or over the string by using while(array[position] != '\0')

Initializing end of the string in C

I am learning C now and I don't really understand the point of ending the string with the null character '\0'. Below is the example from the book:
#include <stdio.h>
#include <string.h>
int main(){
int i;
char str1[] = "String to copy";
char str2[20];
for(i = 0; str1[i]; i++)
str2[i] = str1[i];
str2[i] = '\0'; //<====WHY ADDING THIS LINE??
printf("String str2 %s\n\n", str2);
return 0;
}
So, why do I have to add the null character? It works without that line as well. Also, is there a difference if I use:
for (i = 0; str1[i]; i++){
str2[i] = str1[i];
}
Thanks for your time.
The line you're referring to is there for safety. When you copy characters into a string you always want to be sure that the result is null-terminated; otherwise anything reading the string will continue past the point where you want it to end, because without the terminator it has no way of knowing where to stop.
There is no difference with the alternative code you posted: you have only put braces around the single line below the for statement, and that line is the loop body anyway when the braces are omitted.
In C, the end of a string is detected by the null character. Consider the string "abcd". If another variable is stored in memory immediately after the 'd' and there is no terminator, the string functions will treat the following bytes as part of the string and keep reading. This is called a buffer overrun.
Your declaration reserves 20 bytes for str2, and they may happen to be filled with zeroes. However, this is not required and you cannot rely on it. Additionally, suppose you copy a 15-character string into str2. If it started out as 20 zeroes, this will work. However, suppose you then copy a 10-character string into str2 without a terminator. The remaining 5 characters are unchanged, so you end up with a 15-character string consisting of the new 10 characters followed by the last five characters of the earlier copy.
In the code above, the for loop says: copy the character in str1 to str2 and advance to the next character. If the character now pointed to in str1 is not 0, loop back and do it again; otherwise drop out of the loop. Then add the null character to the end of str2. If you left that out, the null character at the end of str1 would not be copied to str2, and you would have no null character at the end of str2.
This can be expressed as
i = 0;
label:
if (str1[i] == 0) goto end;
str2[i] = str1[i];
i = i + 1;
goto label;
end: /* This is the end of the loop*/
Note that the '\0' character has not yet been moved into str2.
Since C uses braces to delimit the body of a for, and there are none here, only the first statement after the for is part of the loop. If i were local to the loop and went out of scope when the loop ended, you could not simply fall out of the loop and then write the 0: you would no longer have a valid index i telling you where in str2 the 0 belongs.
An example is C++ (or C99 and later), which allows (syntactically)
for (int i = 0; str1[i]; i++)
{
str2[i] = str1[i];
}
str2[i] = 0;
This would fail because the i used after the loop is not the loop's i: it refers to an outer i, which still has whatever value it had before the loop (probably 0). If no i had been declared before the loop, you would get an undeclared-variable compiler error.
I see that you fixed the indentation, but had the original indentation stayed there, the following comment would apply.
C does not work by indentation (as Python does, for example). If it did, the logic would be as follows, and it would fail because every element written to str2 would immediately be overwritten with 0.
for (int i = 0; str1[i]; i++)
{
str2[i] = str1[i];
str2[i] = 0;
}
You should only add a \0 (also called the null byte) at the end of the string. Do as follows:
...
for(i = 0; str1[i]; i++) {
str2[i] = str1[i];
}
str2[i] = '\0'; //<====WHY ADDING THIS LINE??
...
(note that I simply added braces to make the code more readable, it was confusing before)
For me, that is clearer. What you were doing before was basically taking advantage of the fact that the integer i you declared is still available after the loop has run, and using it to add a \0 at the end of str2.
The way strings work in C is that they are basically a pointer to the location of the first character and string functions (such as the ones you can find in string.h) will read every single char until they find a \0 (null byte). It is simply a convention for marking the end of the string.
Some further reading: http://www.cs.nyu.edu/courses/spring05/V22.0201-001/c_tutorial/classes/String.html
'\0' is used to denote the end of a string. It is not for the compiler; it is for the libraries and possibly your own code. C does not support arrays as first-class values. You can have local arrays, but you cannot pass them around by value: if you try, you just pass the start address (the address of the first element). So you can either make the last element special, e.g. '\0', or always pass the size alongside, being careful not to get it wrong.
For example:
If your string is like this:
char str[]="Hello \0 World";
what would be displayed if you printed str?
Output is:
Hello
The same applies to any character array, so to be on the safe side it is good to add '\0' at the end of the string.
If you didn't add '\0', some garbage values might get printed, and printing would continue until a '\0' was reached.
In C, a char[] does not know the length of the string it holds. The character '\0' (ASCII 0) is therefore needed to indicate the end of the string. Your for loop does not copy the '\0', so the output can be longer than the string you copied into str2: printing continues until a '\0' happens to be found.
Try:
#include <stdio.h>
#include <string.h>
int main(){
int i;
char str[5] = "1234";
str[4] = '5';
printf("String %s\n\n", str);
return 0;
}

How to replace a character in a string with NULL in ANSI C?

I want to replace all 'a' characters from a string in ANSI C. Here's my code:
#include <stdio.h>
#include <stdlib.h>
void sos(char *dst){
while(*dst){
if(*dst == 'a')
*dst = '\0';
dst++;
}
}
int main(void){
char str[20] = "pasternak";
sos(str);
printf("str2 = %s \n", str);
return 0;
}
When I run it, result is:
str2 = p
But it should be
str2 = psternk
It works fine with other characters like 'b' etc. I tried to assign NULL to *dst, but I got an error during compilation.
How can I remove 'a' characters now?
In C, strings are zero-terminated, which means that when there's a '\0' in the string it is the end of the string.
So what you're doing is splitting the string into 3 different ones:
p
stern
k
If you want to delete the a's, you must move all the characters after each a back by one position.
What printf does is: read bytes until a '\0' is found.
You transformed "pasternak" to "p\0stern\0k", so printf prints p.
This convention is used by the string functions of the standard library so that you don't have to pass the string length as an argument.
This is why it is said that in C strings are null terminated: it is just a convention followed by the C stdlib.
The downside, as you discovered, is that strings cannot contain \0.
If you really want to print a given number of bytes, use something like fwrite, which counts the number of bytes to be printed, so it can print a \0.
The previous answers explain perfectly why your code does not work. But you can also try using strtok to split the string on the 'a' characters, and then join the parts together or simply print them one after another. Check this example: http://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
'\0' is how the C language tools recognize the end of the string. In order to actually remove a character, you'll need to shift all of the subsequent characters forward.
void sos(char *dst) {
int offset = 0;
do {
while (dst[offset] == 'a') ++offset;
*dst = dst[offset];
} while (*dst++);
}
