Can the NUL added by strncat cause the array to go out of bounds? - c

I have some trouble with strncat(). The book Pointers on C says that strncat() always adds a NUL at the end of the resulting string. To understand it better, I ran an experiment.
#include<stdio.h>
#include<string.h>
int main(void)
{
char a[14]="mynameiszhm";
strncat(a,"hello",3);
printf("%s",a);
return 0;
}
The result is mynameiszhmhel
In this case the array has room for 14 chars, and it originally held 11 characters, not counting the NUL. So when I append three more characters, all 14 elements of the array are filled. When the function then adds a NUL, the NUL lands in memory outside the array. This makes the write go out of bounds, yet the program above runs without any warning. Why? Will this cause something unexpected?
So when we use strncat, should we account for the NUL so that the array does not go out of bounds?
I also noticed that strncpy doesn't add a NUL. Why do these two string functions handle the same thing differently? And why did the designers of C choose this design?

This causes the array to go out of bounds, but the program above can run without any warning. Why?
With strncat(a,"hello",3);, the code attempts to write beyond the 14 elements of a[]: the three appended characters fill the array, and the terminating null character would go at a[14]. This is undefined behavior (UB). Anything is allowed, including running without any warning.
Will this cause something unexpected?
Maybe; the behavior is not defined. It might work just as you expect, whatever that is.
So when we use strncat, should we consider the NUL, in case it causes the array to go out of bounds?
Yes. strncat(dest, src, n) appends up to n characters and then a null character, so the destination needs room for strlen(dest) + n + 1 characters; otherwise the result is UB.
I also noticed that strncpy doesn't add a NUL. Why do these two string functions handle the same thing differently? And why did the designers of C choose this design?
The two functions strncpy() and strncat() simply share similar names; they are not as closely paired in functionality as strcpy() and strcat().
Consider that in the early 1970s memory was far more expensive, and many design decisions can be traced back to a byte of memory costing more than an hour's wage. Uniformity of functionality and names was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array, followed by 3 NULs." There is no partial initialization in C: the elements not covered by the initializer are set to zero.

This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include<stdio.h>
#include<string.h>
int main(void)
{
char p[]="***";
char a[14]="mynameiszhm";
char q[]="***";
strncat(a,"hello",3);
printf("%s%s%s", p, a, q);
return 0;
}
The results of this program depend on where p and q are located in memory relative to a. If neither is adjacent to a, the results are not so clear, but if either p or q comes immediately after a, then your strncat will overwrite its first * with the NUL. That string then has length 0 and is no longer printed.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

Related

Does specifying array size for a user input string in C matter?

I am writing a code to take a user's input from the terminal as a string. I've read online that the correct way to instantiate a string in C is to use an array of characters. My question is if I instantiate an array of size [10], is that 10 indexes? 10 bits? 10 bytes? See the code below:
#include <stdio.h>
int main(int argc, char **argv){
char str[10] = "Jessica";
scanf("%s", &str);
printf("%c\n", str[15]);
}
In this example "str" is initialized to size 10, and I am able to print out str[15], assuming that when the user inputs a string it goes up to that index.
My questions are:
Does the size of the "str" array increase after taking a value from scanf?
At what amount of string characters will my original array have overflow?
When you declare an array of char as you have done:
char str[10] = "Jessica";
then you are telling the compiler that the array will hold up to 10 values of the type char (a char is one byte: at least 8 bits, and exactly 8 on virtually every current platform). When you then try to access a 'member' of that array with an index that goes beyond the allocated size, you get what is known as Undefined Behaviour, which means that absolutely anything may happen: your program may crash; you may get what looks like a 'sensible' value; you may find that your hard disk is entirely erased! The behaviour is undefined. So, make sure you stick within the limits you set in the declaration: for str[n] in your case, the behaviour is undefined if n < 0 or n > 9 (array indexes start at ZERO). Your code:
printf("%c\n", str[15]);
does just what I have described - it goes beyond the 'bounds' of your str array and, thus, will cause the described undefined behaviour (UB).
Also, your scanf("%s", &str); may also cause such UB, if the user enters a string of characters longer than 9 (one must be reserved for a terminating nul character)! You can prevent this by telling the scanf function to accept a maximum number of characters:
scanf("%9s", str);
where the integer given after the % is the maximum number of characters to store (any further input is left in the stream for the next read). Also, as str is defined as an array, you don't need the explicit "address of" operator (&) in scanf: used on its own, the array name already decays to a pointer to its first element, which is what %s expects!
Hope this helps! Feel free to ask for further clarification and/or explanation.
One of C's funny little foibles is that in almost all cases it does not check to make sure you are not overflowing your arrays.
It's your job to make sure you don't access outside the bounds of your arrays, and if you accidentally do, almost anything can happen. (Formally, it's undefined behavior.)
About the only thing that can't happen is that you get a nice error message
Error: array out-of-bounds access at line 23
(Well, theoretically that could happen, but in practice, virtually no C implementation checks for array bounds violations or issues messages like that.)
See also this answer to a similar question.
An array declares the given number of whatever you are declaring. So in the case of:
char str[10]
You are declaring an array of ten chars.
Does the size of the "str" array increase after taking a value from scanf?
No, the size does not change.
At what amount of string characters will my original array have overflow?
An array of 10 chars will hold nine characters and the null terminator. So, technically, it limits the string to nine characters.
printf("%c\n", str[15]);
This code references the 16th character in your array. Because your array only holds ten characters, you are accessing memory outside of the array. It's anyone's guess as to if your program even owns that memory and, if it does, you are referencing memory that is part of another variable. This is a recipe for disaster.

What happens when strnlen() is used with a larger maximum length than the buffer size actually is?

I've written the following code to understand better how strnlen behaves:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char bufferOnStack[10]={'a','b','c','d','e','f','g','h','i','j'};
char *bufferOnHeap = (char *) malloc(10);
bufferOnHeap[ 0]='a';
bufferOnHeap[ 1]='b';
bufferOnHeap[ 2]='c';
bufferOnHeap[ 3]='d';
bufferOnHeap[ 4]='e';
bufferOnHeap[ 5]='f';
bufferOnHeap[ 6]='g';
bufferOnHeap[ 7]='h';
bufferOnHeap[ 8]='i';
bufferOnHeap[ 9]='j';
int lengthOnStack = strnlen(bufferOnStack,39);
int lengthOnHeap = strnlen(bufferOnHeap, 39);
printf("lengthOnStack = %d\n",lengthOnStack);
printf("lengthOnHeap = %d\n",lengthOnHeap);
return 0;
}
Note the deliberate lack of null termination in both buffers.
According to the documentation, it seems that the lengths should
both be 39:
RETURN VALUE
The strnlen() function returns strlen(s), if that is less than maxlen, or
maxlen if there is no null terminating ('\0') among the first maxlen characters
pointed to by s.
Here's my compile line:
$ gcc ./main_08.c -o main
And the output:
$ ./main
lengthOnStack = 10
lengthOnHeap = 10
What's going on here? Thanks!
First of all, strnlen() is not defined by the C standard; it's a POSIX function.
That being said, read the documentation carefully:
The strnlen() function returns the number of bytes in the string pointed to by s, excluding the terminating null byte ('\0'), but at most maxlen. In doing this, strnlen() looks only at the first maxlen bytes at s and never beyond s+maxlen.
So when calling the function, you need to make sure that, for the value you provide for maxlen, index [maxlen - 1] is valid for the supplied array, i.e. the array has at least maxlen elements (unless a null terminator appears earlier).
Otherwise, while scanning, the function will venture into memory that is not allocated to you (an out-of-bounds array access), thereby invoking undefined behaviour.
Remember, this function calculates the length of a string, bounded above by a value (maxlen). That implies the supplied array is at least as large as the bound, not the other way around.
[Footnote]:
By definition, a string is null-terminated.
Quoting C11, chapter §7.1.1, Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
Firstly, don't cast malloc.
Secondly, you are reading past the end of your arrays. The contents of memory outside your array bounds are indeterminate, so there is no guarantee that a given byte is not zero; in this instance, it is!
In general, this kind of behaviour is sloppy - see this answer for a good summary of the potential consequences
Your question is roughly equivalent to the following:
I know that a burglar alarm is supposed to prevent your house from getting robbed. This morning when I left the house, I turned off the burglar alarm. Sometime during the day when I was away, a burglar broke in and stole my stuff. How did this happen?
Or to this:
I know you can use the cruise control on your car to help you avoid getting speeding tickets. Yesterday I was driving on a road where the speed limit was 65. I set the cruise control to 95. A cop pulled me over and I got a speeding ticket. How did this happen?
Actually, those aren't quite right. Here's a more contrived analogy:
I live in a house with a 10 yard long driveway to the street. I have trained my dog to fetch my newspaper. One day I made sure there were no newspapers on the driveway. I put my dog on a 39 yard leash, and I told him to fetch the newspaper. I expected him to go to the end of the leash, 39 yards away. But instead, he only went 10 yards, then stopped. How did this happen?
And of course there are many answers. Perhaps, when your dog got to the end of your newspaper-free driveway, right away he found someone else's newspaper in the gutter. Or perhaps, when the leash failed to stop him at the end of the driveway and he continued into the street, he got run over by a car.
The point of putting your dog on a leash is to restrict him to a safe area -- in this case, your property, that you control. If you put him on such a long leash that he can go off into the street, or into the woods, you're kind of defeating the purpose of controlling him by putting him on a leash.
Similarly, the whole point of strnlen is to behave gracefully if, within the buffer you have defined, there is no null character for strnlen to find.
The problem with non-null-terminated strings is that functions like strlen (which blindly search for null terminators) sail off the end and rummage blindly around in undefined memory, desperately trying to find the terminator. For example, if you say
char non_null_terminated_string[3] = "abc";
int len = strlen(non_null_terminated_string);
the behavior is undefined, because strlen sails off the end. One way to fix this is to use strnlen:
char non_null_terminated_string[3] = "abc";
int len = strnlen(non_null_terminated_string, 3);
But if you hand a bigger number to strnlen, it defeats the whole purpose. You're back wondering what will happen when strnlen sails off the end, and there's no way to answer that.
What happens when ... "Undefined behaviour (UB)"?
“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”
Your heading does not actually describe UB, since calling strnlen("hi", 5) is perfectly legal, but the specifics of your question show that your code is indeed UB...
strlen expects a string, i.e. a nul-terminated char sequence. strnlen also accepts an unterminated char array, but only if that array holds at least maxlen chars. Passing your 10-char non-nul-terminated array with a maxlen of 39 is UB.
What happens in your case is that the function reads the first 10 chars, finds no '\0', and, since it has not yet reached maxlen, continues to read further, thereby invoking UB (reading unallocated memory). It could be that your compiler took the liberty of ending your array with '\0', or it could be that the '\0' was there before... the possibilities are limited only by the compiler designers.

Difference between array and malloc

Here is my code :
#include<stdio.h>
#include <stdlib.h>
#define LEN 2
int main(void)
{
char num1[LEN],num2[LEN]; //works fine with
//char *num1= malloc(LEN), *num2= malloc(LEN);
int number1,number2;
int sum;
printf("first integer to add = ");
scanf("%s",num1);
printf("second integer to add = ");
scanf("%s",num2);
//adds integers
number1= atoi(num1);
number2= atoi(num2);
sum = number1 + number2;
//prints sum
printf("Sum of %d and %d = %d \n",number1, number2, sum);
return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it taking 0 instead of the first value, 15?
I cannot understand why this is happening.
It is working fine if I am using
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should work fine with this.
Edited :
Yes, it worked for LEN 3, but why did it show this undefined behaviour? I mean, it did not work with the normal arrays but it worked with malloc. Now I understand that it should not work with malloc either. But why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system, compiler or IDE?
Please explain a bit more, as it will be helpful, or provide links to resources, because I don't want to be unlucky anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance of producing the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to an implementation-dependent "round" boundary (like 8 or 16 bytes). That means that your malloc calls actually allocate more memory than you ask for. This might temporarily hide the buffer overrun problems present in your code and make your program appear to be "working fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers: it gives you no way to detect errors. The function to use for converting strings to integers is strtol.

String behavior in C

I want to understand a number of things about strings in C:
I cannot understand why you can't change a string with a normal assignment (but only through the functions of string.h). For example, I can't do d="aa" (d is a pointer to char or an array of char).
Can someone explain to me what's going on behind the scenes? The compiler lets such code build and run, and then you get a segmentation fault.
Something else, I run a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I enter more than 2 letters (in scanf) I get a segmentation fault. Why is this happening?
If I enter two letters, the first letter is printed correctly, but the string is printed with a lot of extra characters (incorrect).
If I enter one letter, the letter is printed correctly, but the string is printed with a lot of extra characters and something weird at the end (a square with four numbers containing zeros and ones).
Can anyone explain what is happening behind?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16]="a",*pc= buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s",pc);
printf("your first char is: %c",buf[0]);
printf("your string is: %s",pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and, by bad luck, still happen to work sometimes. Read carefully about undefined behavior. Avoid it absolutely (your points 1, 2 and 3 are probably UB). Sadly, UB may sometimes happen to "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags (notably optimization flags) passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, and so on. Undefined behavior is often not reproducible (e.g. because of ASLR); read about heisenbugs. To explain the behavior of points 1, 2 and 3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest that you compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC), improve the code until you get no warnings, and learn how to use the debugger (e.g. gdb) to run your code step by step.
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
Look at char c initialized with "a": that is 'a' followed by '\0', two characters being written into a one-byte memory location.
scanf("%s") stores the characters it reads plus a terminating '\0', so even a single letter needs two bytes. If scanf() uses this memory to store more than one byte, that is simply undefined behavior.
char c="a"; is a wrong declaration in C, since even a single character enclosed in double quotes ("") is treated as a string: it is stored as "a\0", because every string ends with a '\0' null character.
char c="a"; is wrong, whereas char c='c'; is correct.
Also note that the memory allocated for a char is only 1 byte, so it can hold only one character; memory allocation details for datatypes are described below

C String Null Zero?

I have a basic C programming question. Here is the situation: if I create a character array and want to treat that array as a string using the %s conversion code, do I have to include a null zero? Example:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
The console output for this is:
abcdef
Notice that there is not a null zero as the last element in the array, yet I am still printing this as a string.
I am new to programming...So I am reading a beginners C book, which states that since I am not using a null zero in the last element I cannot treat it as a string.
This is the same output as above, although I include the null zero.
char name[7] = {'a','b','c','d','e','f','\0'};
printf("%s",name);
You're just being lucky; probably there's a zero on the stack just after the end of that array, so printf stops reading right after the last character. If your program is very short and that zone of the stack is still "unexplored" (i.e. the stack hasn't grown up to that point yet), it is quite likely to be zero, since modern OSes generally give applications pages that are initially zeroed.
More formally: by not having explicitly the NUL terminator, you're going in the land of undefined behavior, which means that anything can happen; such anything may also be that your program works fine, but it's just luck - and it's the worst type of bug, since, if it generally works fine, it's difficult to spot.
TL;DR version: don't do that. Stick to what is guaranteed to work and you won't introduce sneaky bugs in your application.
The output of your first printf is not predictable, specifically because you failed to include the terminating zero character. If it appears to work in your experiment, it is only because, by random chance, the next byte in memory happened to be zero and worked as a zero terminator. The chances of this happening depend greatly on where you declare your name array (it is not clear from your example). For a static array the chances might be pretty high, while for a local (automatic) array you'll run into various garbage being printed pretty often.
You must include the null character at the end.
It worked without error because of luck, and luck alone. Try this:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
printf("%d",name[6]);
You'll most probably see that you can read that memory, and that there's a zero in it. But it's sheer luck.
What most likely happened is that there happened to be the value of 0 at memory location name + 6. This is not defined behavior though, you could get different output on a different system or with a different compiler.
Yes. You do. There are a few other ways to do it.
This form of initialization puts the NUL character in for you automatically.
char name[7] = "abcdef";
printf("%s",name);
Note that I added 1 to the array size, to make space for that NUL.
One can also get away with omitting the size, and letting the compiler figure it out.
char name[] = "abcdef";
printf("%s",name);
Another method is to use a pointer to char that points at a string literal (the literal itself must not be modified).
char *name = "abcdef";
printf("%s",name);
