Space for null character in C strings

When is it necessary to explicitly provide space for a NULL character in C strings?
For example, this works without any error although I haven't declared str to be 7 characters long, i.e. the characters of the string plus the NULL character.
#include<stdio.h>
int main(){
char str[6] = "string";
printf("%s", str);
return 0;
}
Though in this question https://stackoverflow.com/a/7652089 the user says
"This is useful if you need to modify the string later on, but know that it will not exceed 40 characters (or 39 characters followed by a null terminator, depending on context)."
What does it mean by "depending on context" ?

When is it necessary to explicitly provide space for a NULL character in C strings?
Always. Without that \0 character, functions like strcpy and strlen, and printing via %s, will misbehave. It might appear to work for some examples (like your own), but I wouldn't bet anything on that.
On the other hand, if your string is binary and you know the length of the packet you don't need that extra space. But then you cannot use str* functions. And this is not the case of your question, anyway.

It is buggy; the keyword is "buffer overflow". The memory is overwritten.
char str[4] = "stringulation";
char str2[20];
printf("%s", str);
printf("%s", str2);

Writing to an address that you have not requested may lead to data corruption, random output, or other undefined behaviour.

Your code invokes undefined behaviour. You may think it works, but the code is broken.
To store a C string with 6 characters, and a null-terminator, you need a character array of length 7 or more.
When is it necessary to explicitly provide space for a NULL character in C strings?
There are no exceptions. A C string must always include a null terminating character.
What does it mean by "depending on context"?
The answer there is drawing the distinction between a string variable that you intend to modify at a later time and a string variable that you will not modify. In the former case, you may choose to allocate more than you need for the initial contents, because you want to be able to add more later. In the latter case, you can simply allocate as many characters as are needed for the initial value, and no more.

That 0 terminator[1] is how the various library functions (strcpy(), strlen(), printf(), etc.) identify the end of a string. When you call a function like
char foo[6] = "hello";
printf( "%s\n", foo );
the array expression foo is converted to a pointer value before it's passed to the function, so all the function receives is the address of the first character; it doesn't know how long the foo array is. So it needs some way to know where the end of the string is. If foo didn't have that space for the 0 terminator, printf() would continue to print characters beyond the end of the array until it saw a 0-valued byte.
1. I prefer using the term "0 terminator" instead of "NULL terminator", just to avoid confusion with the NULL pointer, which is a different thing.

Related

best method to assign new string value to char array

I know that I have to use strcpy / strncpy to assign a new string value to an existing char array. Recently I saw a lot of code like this
char arr[128] = "\0";
sprintf(arr, "Hello World"); // only string constants no variable input
// or
sprintf(arr, "%s", "Hello World");
Both variants give the same result. What is the advantage of the latter variant?
It depends on whether the string to be copied is a literal, as shown, or can vary.
The best technique for the array shown would be:
char arr[128] = "Hello World";
If you're in charge of the string and it contains no % symbols, then there's not much difference between the two sprintf() calls. Strictly, the first uses the string as the format and copies the characters directly, while the second notes it has %s as the format and copies the characters from the extra argument directly; it's slower, but by an amount too small to measure. There's a case for:
snprintf(arr, sizeof(arr), "%s", "Hello World");
which ensures no buffer overflow even if "Hello World" becomes a much longer diatribe.
If you're not in charge of the string, then using snprintf() as shown becomes important as even if the string contains % symbols, it is simply copied and there's no overflow. You have to check the return value to establish whether any data was truncated.
Using strcpy() is reasonable if you know how long the string is and that there's space to hold it. Using strncpy() is fraught — it null pads to full length if the source is shorter than the target, and doesn't null terminate if the source is too long for the target.
If you've established the length of the string is short enough, using memmove() or memcpy() is reasonable too. If the string is too long, you have to choose an error handling strategy — truncation or error.
If the trailing (unused) space in the target array must be null bytes (for security reasons, to ensure there's no leftover password hidden in it), then using strncpy() may be sensible — but beware of ensuring null termination if the source is too long. In most cases, the initializer for the array is not really needed.
The compiler may be able to optimize the simple cases.
The first version won't work if the string contains any % characters, because sprintf() will treat them as formatting operators that need to be filled in using additional arguments. This isn't a problem with a fixed string like Hello World, but if you're getting the string dynamically it could cause undefined behavior, because there won't be any arguments to match the formatting operators. This can potentially cause security exploits.
If you're not actually doing any formatting, a better way is to just use strcpy():
strcpy(arr, "Hello World");
Also, when initializing the string it's not necessary to put an explicit \0 in the string. A string literal always ends with a null byte. So you can initialize it as:
char arr[128] = "";
And if you're immediately overwriting the variable with sprintf() or strcpy(), you don't need to initialize it in the first place.

Segmentation fault of small code

I am trying to test something and I made a small test file to do so. The code is:
void main(){
int i = 0;
char array1 [3];
array1[0] = 'a';
array1[1] = 'b';
array1[2] = 'c';
printf("%s", array1[i+1]);
printf("%d", i);
}
I receive a segmentation fault when I compile and try to run. Please let me know what my issue is.
Firstly, char array1[3]; is not null-terminated, as there is not enough space to put '\0' at the end of array1. To avoid this undefined behavior, increase the size of array1 and store a '\0' after the last character.
Secondly, array1[i+1] is a single char, not a string, so use %c instead of %s, as in
printf("%c", array1[i+1]);
I suggest you get yourself a good book/video series on C. It's not a language that's fun to pick up out of the blue.
Regardless, your problem here is that you haven't formed a correct string. In C, a string is a pointer to the start of a contiguous region of memory that happens to be filled with characters. There is no data whatsoever stored about its size or any other characteristics. Only where it starts and what it is. Therefore you must provide information as to when the string ends explicitly. This is done by having the very last character in a string be set to the so-called null character (in C represented by the escape sequence '\0').
This implies that any string must be one character longer than the content you want it to hold. You should also never be setting up a string manually like this. Use a library function like strlcpy to do it (note that strlcpy is a BSD extension rather than standard C, so it may not be available everywhere). It will automatically add in a null character, even if your array is too small (by truncating the string). Alternatively you can statically create a literal string like this:
char array[] = "abc";
It will automatically be null terminated and be of size 4.
Strings need to have a NUL terminator, and you don't have one, nor is there room for one.
The solution is to add one more character:
char array1[4];
// ...
array1[3] = 0;
Also you're asking to print a string but supplying a character instead. You need to supply the whole buffer:
printf("%s", array1);
Then you're fine.
Spend the time to learn about how C strings work, in particular about the requirement for the terminator, as buffer overflow bugs are no joke.
When printf sees a "%s" specifier in the format string, it expects a char * as the corresponding argument, but you passed the char value of the expression array1[i+1]. That char gets promoted to int, which is still incompatible with char *. And even if it were compatible, it has no chance of being a valid pointer to any meaningful character string.

How do you assign a string in C

Printing the initials (first character) of the string held in the variable 'fn' and the variable 'ln'
#include <stdio.h>
#include <cs50.h>
int main(void)
{
string fn, ln, initials;
fn = get_string("\nFirst Name: ");
ln = get_string("Last Name: ");
initials = 'fn[0]', 'ln[0]';
printf("%s", initials)
}
Read more about C. In particular, read some good C programming book, and some C reference site and read the C11 standard n1570. Notice that cs50.h is not a standard C header (and I never encountered it).
The string type does not exist in standard C, so your example doesn't compile and is not valid C code.
An important (and difficult) notion in C is : undefined behavior (UB). I won't explain what is it here, but see this, read much more about UB, and be really afraid of UB.
Even if you (wrongly) add something like
typedef char* string;
(and your cs50.h might do that) you need to understand that:
not every pointer is valid, and some pointers may contain an invalid address (such as NULL, or most random addresses; in particular an uninitialized pointer variable often has an invalid pointer). Be aware that in your virtual address space most addresses are invalid. Dereferencing an invalid pointer is UB (often, but not always, giving a segmentation fault).
even when a pointer to char is valid, it could point to something which is not a string (e.g. some sequence of bytes which is not NUL terminated). Passing such a pointer (to a non-string data) to string related functions -e.g. strlen or printf with %s is UB.
A string is a sequence of bytes, with additional conventions: at the very least it should be NUL terminated and you generally want it to be a valid string for your system. For example, my Linux is using UTF-8 (in 2017 UTF-8 is used everywhere) so in practice only valid UTF-8 strings can be correctly displayed in my terminals.
Arrays are decayed into pointers (read more to understand what that means, it is tricky). So in several occasions you might declare an array variable (a buffer)
char buf[50];
then fill it, perhaps using strcpy like
strcpy(buf, "abc");
or using snprintf like
int xx = something();
snprintf(buf, sizeof(buf), "x%d", xx);
and later you can use it as a "string", e.g.
printf("buf is: %s\n", buf);
In some cases (but not always!), you might even do some array accesses like
char c=buf[4];
printf("c is %c\n", c);
or pointer arithmetic like
printf("buf+8 is %s\n", buf+8);
BTW, since stdio is buffered, I recommend ending your printf control format strings with \n or using fflush.
Beware and be very careful about buffer overflows. It is another common cause of UB.
You might want to declare
char initials[8];
and fill that memory zone to become a proper string:
initials[0] = fn[0];
initials[1] = ln[0];
initials[2] = (char)0;
the last assignment (to initials[2]) is putting the NUL terminating byte and makes that initials buffer a proper string. Then you could output it using printf or fputs
fputs(initials, stdout);
and you had better output a newline with
putchar('\n');
(or you might just do puts(initials); ....)
Please compile with all warnings and debug info, so gcc -Wall -Wextra -g with GCC. Improve your code to get no warnings. Learn how to use your compiler and your debugger gdb. Use gdb to run your program step by step and query its state. Take time to read the documentation of every standard function that you are using (e.g. strcpy, printf, scanf, fgets) even if at first you don't understand all of it.
char initials[]={ fn[0], ln[0], '\0'};
This will form the char array and you can print it with
printf("%s", initials); // This is a string: a null-terminated character array.
There is no string datatype in C. We simulate it using a null-terminated character array.
If you don't put the \0 at the end, it won't be a null-terminated char array, and to print it you will have to index into the array and output the individual characters. (You can't pass it to printf with %s or to the other standard string functions.)
char s[] = {'h', 'i'}; // not null terminated
// But you can still work with this by iterating over the elements.
for (size_t i = 0; i < sizeof s; i++)
    printf("%c", s[i]);
To explain further, there is no string datatype in C, so you simulate it using char [], and that is sufficient for the job.
For example, you have to declare this to hold a string:
char fn[MAXLEN], ln[MAXLEN];
Reading an input can be like :
if(!fgets(fn, MAXLEN,stdin) ){
fprintf(stderr,"Error in input");
}
Do similarly for the second char array.
Then you initialize the array initials:
char initials[] = {fn[0], ln[0], '\0'};
The benefit of a null-terminated char array is that you can pass it to functions that work on char *, such as strcmp() or strcpy(), and get a correct result.
There are also lots of ways to get input from stdin, and it is always better to check the return value of the standard functions that you use.
The standard doesn't require that all char arrays be null-terminated. But if we don't do it that way, the array is hardly useful in common cases. The array I showed earlier (without the null terminator) can't be passed to strlen(), strcpy(), etc.
Also, knowingly or unknowingly, you have used something interesting: the comma operator.
Suppose you write a statement like this:
char initialChar = fn[0] , ln[0]; //This is error
char initialChar = (fn[0] , ln[0]); // This is correct and the result will be `ln[0]`
The , operator first evaluates the first expression, fn[0], and discards its value, then evaluates the second, ln[0]; that value becomes the value of the whole expression and is assigned to initialChar.
You can check these helpful links to get you started
Beginner's Guide Away from scanf()
How to debug small programs

C: Why string variable accepts more characters than its size?

I have the following code and its output:
#include<stdio.h>
int main()
{
char pal_tmp[4];
printf("Size of String Variable %d\n",sizeof(pal_tmp));
strcpy(pal_tmp,"123456789");
printf("Printing Extended Ascii: %s\n",pal_tmp);
printf("Size of String Variable %d\n",sizeof(pal_tmp));
}
Output:
Size of String Variable 4
Printing Extended Ascii: 123456789
Size of String Variable 4
My question is: why does a string variable (character array) accept more characters than its capacity? Shouldn't it just print 1234 instead of 123456789?
Am I doing something wrong?
Well yes. You are doing something wrong. You're putting more characters into the string than you are supposed to. According to the C specification, that is wrong and referred to as "undefined behaviour".
However, that very same C specification does not require the compiler (nor runtime) to actually flag that as an error. "Undefined behaviour" means that anything could happen, including getting an error, random data corruption or the program actually working.
In this particular case, your call to strcpy simply writes outside the reserved memory and will overwrite whatever happens to be stored after the array. There is probably nothing of importance there, which is why nothing bad seems to happen.
As an example of what would happen if you do have something relevant after the array, let's add a variable to see what happens to it:
#include <stdio.h>
#include <string.h>
int main( void )
{
char foo[4];
int bar = 0;
strcpy( foo, "a long string here" );
printf( "%d\n", bar );
return 0;
}
When run, I get the result 1701322855 on my machine (the results on yours will likely be different).
The call to strcpy clobbered the content of the bar variable, resulting in the random output that you saw.
Well yes, you are overwriting memory that doesn't belong to that buffer (pal_tmp). In some cases this might work, in others you might get a segfault and your program will crash. In the case you showed, it looks like you happened to not overwrite anything "useful". If you tried to write more, you'll be more likely to overwrite something useful and crash the program.
C arrays of char don't have a predefined size, as far as the string handling functions are concerned. The functions will happily write off the end of the array into other variables (bad), or malloc's bookkeeping data (worse), or the call stack's bookkeeping data (even worse). The C standard makes this undefined behaviour, and for good reason.
If a version of a particular function accepts a size argument to limit how much data it writes, use it. It protects you against this stuff.
C does not keep track of the size of strings (or arrays, or allocated memory, etc.), so that is your job. If you create a string, you must be careful to always make sure it never gets longer than the amount of memory you've allocated to it.
In C, strings are represented as an array of characters, or as a pointer to a region of memory containing characters. A string in C is a sequence of zero or more characters followed by a null character, '\0'. It is important to preserve the terminating null character, as it is how C defines and manages variable-length strings. All the C standard library string functions require it to operate correctly.
For complete reference refer this
The function strcpy has no knowledge of the length of the destination array, which is why it is considered unsafe.
You may use strncpy, where you pass the size of the buffer; if a longer argument is provided, only the memory of the buffer is written and nothing else is changed. Note that in that case strncpy does not write a terminating null character, so you must add one yourself.

What sorts of strings are legal inputs to C string functions?

Do functions like strcat and strcmp require null-terminated strings as arguments, or is any array of characters acceptable ?
All documentations suggest it must be null-terminated, but one of the most well known online references (http://cplusplus.com) gives the following as example of strcmp:
/* strcmp example */
#include <stdio.h>
#include <string.h>
int main ()
{
char szKey[] = "apple";
char szInput[80];
do {
printf ("Guess my favourite fruit? ");
gets (szInput);
} while (strcmp (szKey,szInput) != 0);
puts ("Correct answer!");
return 0;
}
Yes, the functions require null-terminated strings. However, the example you've listed above does indeed use null-terminated strings. For example, the line
char szKey[] = "apple";
Describes a string that has a null terminator appended, even though it's not immediately apparent in the source code. Any string literal in C is automatically null-terminated, even if you don't explicitly put the request in yourself (though there is an exception, as we'll see in a minute).
Moreover, in the line
gets (szInput);
The function gets automatically appends a null terminator to the end of the string that it reads from the console. In fact, with few exceptions (such as the notoriously tricky strncpy function), all string manipulation functions in <string.h> automatically append a null terminator. It is rare in common usage to end up with a non-null-terminated string unless you're explicitly messing around with the character bytes yourself.
That said, there are many ways to get strings that aren't null-terminated. For example, if you define a string like this:
char hello[5] = {'h', 'e', 'l', 'l', 'o'}; /* Careful! */
This array will not be null-terminated, because you've explicitly listed off the values you'd like it to have. This means that what you have is an array of characters rather than a string. If you then tried calling
printf("%s\n", hello);
You would run into undefined behavior because the array is not null-terminated.
Additionally, if you use any of the raw memory manipulation routines like memcpy or memmove, then you need to be careful to ensure that the null terminator is copied or set explicitly, since these routines have no concept of null-terminators.
Also, one quick bit of terminology - NULL usually refers to a null pointer, that is, a pointer that is explicitly marked as pointing to no object. The null in null-terminator refers to the character with numeric value 0 and is a character (not a pointer) used to indicate that the end of a string has been reached. While the names are the same (and there are similarities), it's best not to confuse the two.
Hope this helps!
gets() does null-terminate, and szKey[] = "apple"; is null-terminated. "apple" is a string literal which is always null terminated.
strcmp requires the strings being compared to have a \0 terminator; otherwise it could run off the end of a string and cause access violations.
strcat also requires its arguments to be \0 terminated.
Strings need to be NUL terminated, but I don't see any problem with NUL termination in that code.
I should add that this is nearly the only problem it doesn't have. Using gets, in particular, is inexcusable.
By definition, a string is always NULL-terminated.
"apple" gets actually treated as apple\0 by the compiler.
Absolutely they must be null terminated.
Comparison functions like strcmp must have something to stop the comparison and that is the null character.
There are other functions like strncmp which take a length parameter. In this case strings do not have to be null terminated as you provide the number of characters to compare.
They have to be NULL-terminated, because the length is unknown. The way the library functions work is they start at the beginning of a string, and iterate through every character until the null byte is encountered. If this isn't the case, the function will continue on into memory where it should not be.
What makes you think those /aren't/ null-terminated? String literals are automatically null terminated, and (provided that there's enough space in the buffer) gets() will also null-terminate.
That point there, that gets() has no idea how much space there is in the buffer, is the reason you should use fgets() instead of gets().

Resources