Suppose I have an array of size 10 characters (memset to 0), which I am passing to strncat as destination, and in source I am passing a string which is say 20 characters in length (null terminated), now should I pass the 'count' as 10 or 9?
My doubt is: does strncat treat 'count' as the size of the destination buffer, or does it just copy 10 characters to the destination and then append a terminating null character in the 11th position?
Sorry if the question appears too trivial, but I was unable to work this out from the documentation of strncat.
You should probably just read the man page a bit more. The operative sentence seems to be this one:
The strncat() function is similar, except that it will use at most n characters from src. Since the result is always terminated with '\0', at most n+1 characters are written.
strncat appends the null terminator for you. Since it has no knowledge of the size of the buffer you are pointing to, it assumes there is space for the null terminator. So you want to pass in 9.
If your array only has room for 10 characters then your count should be 9, as strncat will try to append count characters from src plus a null terminator.
Also, in this case your destination should have a null terminator in the first position, because that is where you need it to start appending.
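To make the count calculation concrete, here is a minimal sketch. The helper name append_bounded is made up for illustration, and it assumes dest already holds a null-terminated string that fits inside dest_size bytes.

```c
#include <string.h>

/* Hypothetical helper: append src to the string already in dest without
   writing past dest_size bytes. Returns the number of characters appended. */
size_t append_bounded(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);           /* characters already in dest */
    if (used + 1 >= dest_size)
        return 0;                         /* only the terminator fits */
    size_t room = dest_size - used - 1;   /* keep one byte for '\0' */
    strncat(dest, src, room);             /* at most room chars, then '\0' */
    return strlen(dest) - used;
}
```

With a zeroed char buf[10] and a 20-character source, append_bounded(buf, sizeof buf, src) passes 9 as the count, so 9 characters plus the terminator land in the 10 bytes.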
$ man strncat
If src contains n or more characters, strncat() writes n+1 characters
to dest (n from src plus the terminating null byte). Therefore, the
size of dest must be at least strlen(dest)+n+1
$
The simple answer is: ensure your buffer is big enough. If your buffer is to hold 10 characters, add one to the size of the buffer to accommodate the null character \0. That cannot be stressed enough and is one of the biggest stumbling blocks of learning C.
If you do not pass a length that leaves room for the null character, the buffer overflows and the behavior is undefined: the program may crash, or jump off into the woods never to be seen again.
Hope this helps,
Best regards,
Tom.
I know a string in C is terminated by the character \0.
However, if I do char a[5]="abcd\n" , where would \0 be?
Or do I need to reserve at least one position for \0, whenever I try to use char[] to store a string?
Thank you for any help!
You should do:
char a[]="abcd\n";
without specifying the size, to let the compiler figure out the buffer size. The actual buffer will have a size of 6 to accommodate your 5 bytes + 1 byte for the terminating zero. When you write a string literal such as "something" on its own, the compiler puts that string in a dedicated place in the program with a zero byte after the last character.
Writing
char a[5]="abcd\n"
is bad practice because it causes functions like strcpy() to behave in an undefined manner: your variable 'a' is not a C string, just a buffer of characters with no terminating \0 (they happen to be all printable characters plus the trailing \n).
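A small sketch of how you could check this yourself. The helper has_terminator is a made-up name; it only looks inside the bytes that belong to the array.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the size bytes at buf contain a '\0',
   0 otherwise. It never reads past buf[size - 1]. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For char sized[] = "abcd\n", sizeof sized is 6 and has_terminator(sized, sizeof sized) is 1. For an array holding only the five bytes 'a', 'b', 'c', 'd', '\n' (which is what char a[5] = "abcd\n" gives you in C), it is 0, so the array must not be handed to strcpy() or strlen().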
In the Linux manpage of strncpy I read:
If the length of src is less than n, strncpy() writes additional
null bytes to dest to ensure that a total of n bytes are written.
In this limit case (no \0 at the end of both strings) where n>4:
char dest[8]="qqqqqqqq";
char src[4] = "abcd";
strncpy(dest, src, 5);
printing dest char by char gives "abcdqqqq".
As src has no \0 in it, no \0 is copied from the src to dest, but if I understand correctly the man page, other 4 characters should be copied in any case, and they should be \0s.
Moreover, if src is "abc" (so it is NUL terminated), dest contains "abc\0\0qqq".
I add the whole code I used to test (yes it is going to the 8-th character to look at it also):
#include <stdio.h>
#include <string.h>
int main()
{
char dest[8]="qqqqqqqq";
char src[4] = "abcd"; // "abc"
strncpy(dest, src, 5);
for (int i=0; i<9; i++)
printf("%2x ", dest[i]);
putchar('\n');
return 0;
}
Is this a faulty implementation or do I miss something?
If the length of src
That's the problem, your src is not a null terminated string so strncpy has no idea how long it is. The manpage also has a sample code showing how strncpy works: it stops at a null byte and if it doesn't find one it simply keeps reading until it reaches n. And in your case it reads the 4 characters of src and gets the fifth from dest because it's next in memory. (You can try printing src[4], nothing will stop you and you'll get the q from dest. Isn't C nice?)
You seem to be using strncpy() with a source argument that is not a string but an array of characters. So why do you expect it to behave sensibly? The manual on Linux clearly says it takes a string as the source argument, and that it copies it "including the terminating null byte" (which presumes such a byte). Note that there is no such thing as a "non-terminated string". An array of characters is just an array of characters, and string functions are not guaranteed to work on it.
However, the specific behavior that you are observing is expected and documented.
The rationale for strncpy() in the POSIX standard contains the following sentence:
If there is no NUL character byte in the first n bytes of the array pointed to by s2, the result is not null-terminated.
... where s2 is what you call src.
The manual for strncpy() on at least Ubuntu and OpenBSD contains similar wordings.
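A sketch of the usual workaround, assuming src is a proper null-terminated string. The name copy_terminated is hypothetical and dest_size must be at least 1.

```c
#include <string.h>

/* Hypothetical helper: copy src into dest, truncating if needed, and always
   leave dest null-terminated. dest_size must be at least 1. */
char *copy_terminated(char *dest, size_t dest_size, const char *src)
{
    strncpy(dest, src, dest_size - 1);   /* pads with '\0' if src is shorter */
    dest[dest_size - 1] = '\0';          /* strncpy skips this when src is too long */
    return dest;
}
```

With an 8-byte dest, copying "abc" leaves "abc" followed by zero bytes, and copying a 10-character string leaves the first 7 characters plus the terminator.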
Let’s look at this logically, using quotes from the GNU man page dated 2017-09-15, as included in Debian 10:
If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
As you correctly state (sort of), n == 5. So what is the “length of src”? As we should all know, C strings are null-terminated. The man page hints at this:
The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0') … The strncpy() function is similar, except that at most n bytes of src are copied.
Your string is not null-terminated, though. So after reading the 4 bytes you defined, strncpy tries to read a fifth byte, and what does it find? The first byte of dest, apparently (actually, this is implementation-dependent, as far as I know). It still has not found a null to terminate the string, so does it keep reading? No, because as stated above:
a total of n bytes are written
and
at most n bytes of src are copied.
So it copies the 5 bytes "abcdq" into dest.
You said:
other 4 characters should be copied in any case, and they should be \0s
This implies a total length of 8 bytes. Where would you get this from? This is the “length” (or at least declared array length) of dest, not src, and in any case would be overridden by the fact that n < 8.
The case where src is null-terminated is straightforward.
So no, the implementation is not faulty.
Actually, you got lucky:
The strings may not overlap
They do here, so, technically, anything could happen. Indeed, a comment on the question says that Valgrind warns about this.
and the destination string dest must be large enough to receive the copy. Beware of buffer overruns!
This carelessness with string lengths should be a warning to take extra care, before you create a buffer overrun.
On another note, while it is possible that a function like strncpy would have a bug, it is extremely unlikely. Standard implementations have been thoroughly tested in a wide variety of environments. Any apparent bug is much more likely to be a bug in your own code, or a lack of understanding of your own code, as is the case here.
I have been using C for quite some time, and I have this trivial problem that I want to ask about.
Say I want to create a character array that stores up to 1000 characters. When I use malloc for this, do I specify the size of the array as 1001 characters [1000 characters + null] or just 1000?
Also, having come across this problem, how could I have found the answer on my own, maybe by using some test programs? I understand the size of a string is calculated without the null character, but when I am allocating the memory for it, do I take the null character into account too?
If you need that block for storing a null-terminated string then yes, you need to explicitly ask malloc() to allocate an extra byte for the null terminator; malloc() will not do it for you. If you intend to store the string length somewhere else, and so don't need the null terminator, you can get away without allocating the extra byte. Of course it's up to you whether you need null-termination for strings, just don't forget that C library string handling functions only work with null-terminated strings.
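A minimal sketch of the "one extra byte" rule. The helper copy_string is a made-up name, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails.
   The caller frees the result. */
char *copy_string(const char *s)
{
    size_t len = strlen(s);          /* does not count the '\0' */
    char *copy = malloc(len + 1);    /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);        /* copies the characters and the '\0' */
    return copy;
}
```

For a 1000-character string this asks malloc for 1001 bytes.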
malloc and family allocate memory in chunks of bytes. So if you do malloc(1000) you get 1000 bytes. malloc will not care if you allocated those 1000 bytes to hold a string or any other data type.
Since strings in C consist of one byte per character and ideally have to be null terminated you need to make sure you have enough memory to hold that. So the answer is: Yes, you need to allocate 1001 bytes if you wish to hold a string of 1000 characters plus null terminator.
Advanced tip: Also keep in mind that depending on how you use it you may or may not need to null terminate a string.
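If, for instance, you know the exact length of your string, you can pass it as the precision when printing with printf:
printf("%.*s", length, string);
will print at most length characters (length must be an int) from the buffer pointed to by string, stopping earlier only if it meets a null character. Note that "%*s" without the dot sets the minimum field width instead and still requires a null-terminated string.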
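A sketch of printing from an unterminated buffer using the precision form "%.*s". It uses snprintf instead of printf so the result can be inspected; format_prefix is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: format at most length characters of buf into out.
   buf needs no '\0' as long as it holds at least length characters. */
int format_prefix(char *out, size_t out_size, const char *buf, int length)
{
    return snprintf(out, out_size, "%.*s", length, buf);
}
```

Given char raw[4] = {'a', 'b', 'c', 'd'}, calling format_prefix(out, sizeof out, raw, 4) stores "abcd" in out without ever reading a fifth byte of raw.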
It's up to you to provide the null-terminating character.
malloc allocates memory for you but it doesn't set it to anything.
If you strcpy to the allocated memory then you will have a null-terminator provided for you.
Alternatively, use calloc as it will set all elements to 0, which is in effect the null-terminator. Then if you do, say, memcpy, you wouldn't have to worry about terminating the string properly.
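The calloc approach might look like this sketch. The name copy_prefix is made up, and data is assumed to hold at least n bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a string from the first n bytes of data.
   Returns NULL if calloc fails; the caller frees the result. */
char *copy_prefix(const char *data, size_t n)
{
    char *s = calloc(n + 1, 1);   /* n + 1 bytes, all set to 0 */
    if (s == NULL)
        return NULL;
    memcpy(s, data, n);           /* byte n is untouched and stays '\0' */
    return s;
}
```

The terminator comes for free only because the allocation is n + 1 bytes; calloc(n, 1) followed by a memcpy of n bytes would leave no zero byte at the end.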
You do indeed need to allocate the memory for the null terminator.
Conceptually the null terminator is just a convenient way of marking the end of a string. The C standard library exploits this convention when modelling a string. For example, strlen computes the length of a string by examining the memory from the input location (probably a char*) until it reaches a null terminator; but the null terminator itself is not included in the length. But it's still part of the memory consumed by the string.
Consider following case:
#include<stdio.h>
int main()
{
char A[5];
scanf("%s",A);
printf("%s",A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?
If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
/* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
if (length > 0 && A[length - 1] == '\n') {
A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
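For what it's worth, the newline removal can be shortened with strcspn(). This is only a sketch; strip_newline and read_line are hypothetical names, not standard functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut the string at its first '\n', if any. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';   /* strcspn returns strlen(s) if no '\n' */
}

/* Hypothetical helper: read one line into buf (at most size - 1 chars).
   Returns 1 on success, 0 on EOF or error. */
int read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    strip_newline(buf);
    return 1;
}
```

Usage would be read_line(A, sizeof A, stdin). A line longer than the buffer is not discarded; the rest of it is returned by the next call.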
As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.
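The width limit can be tried without typing anything by using sscanf, which takes the same format string. A sketch, with read_word4 as a made-up helper name:

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word of at most
   4 characters from input into out. Returns 1 if a word was read. */
int read_word4(const char *input, char out[5])
{
    return sscanf(input, "%4s", out) == 1;
}
```

With the input "abcdefgh", out ends up holding "abcd" plus the terminator, which is exactly 5 bytes.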
A character array in C is merely a block of memory. If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more than 5 bytes in there, you will just overwrite the memory past the 5 bytes you reserved.
That is why C can have serious security implications. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.
You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to memory that it is not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.
Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);
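The array-to-pointer decay mentioned above can be seen with a tiny sketch (size_seen_by_callee is a hypothetical name):

```c
#include <stddef.h>

/* Inside the function, buf is a pointer, so sizeof reports the size of
   a pointer, not the size of the caller's array. */
size_t size_seen_by_callee(char *buf)
{
    return sizeof buf;
}
```

In the caller, sizeof A is 5 for char A[5], while size_seen_by_callee(A) returns sizeof(char *). That is why functions such as fgets take the buffer size as a separate argument.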
There isn't a character that is reserved, so you must be careful not to fill the entire array to the point where it can't be null terminated. String functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and stpncpy() functions return a pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid memory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.
The null character marks the end of the string stored in the array, not the end of the array itself. Functions that write strings, such as scanf with %s, add it after the last character so that other string functions can tell where the string ends; the array does not reserve a slot for it.
\0 is a terminator character, not an operator.
If the array is not full, \0 sits right after the last character of the string.
If the string fills the array, there is no room left for \0, and it is written past the end of the array.
I'm studying for an exam and I need some help with strings
Assume the following declarations, and further assume that string.h is included
char rocky[21], bw[21], boris[21];
int result;
a.) Write a scanf statement that would enable the string Beauregard to be read into rocky
my answer= scanf("%s", &rocky);
b.) Assuming that the text Beauregard is the only thing on the line of standard input, write a statement to read in the text and store it in rocky using an alternative to scanf
my answer= gets (Beauregard);
strcpy(rocky);
c.) assuming that the text read in is Beauregard, what is the value of result after the following statement is executed?
result=strlen(rocky);
my answer= i have no clue..
d.) what does the following statement do?
strcpy(boris, rocky);
answer= makes a copy of the string..(dont know much more than that)
e.) what does the following statement do? What are the values of rocky and bw?
strncpy(bw,rocky,3);
my answer= not a clue
help is much appreciated, and an explanation would also help :)
Thanks!
a. Arrays and pointers are closely related in C. In particular, the name of an array decays to a pointer to the first element of the array, so your answer should be
scanf("%s", rocky); /* note the lack of an & in front of rocky */
b. gets(Beauregard) doesn't really make sense. The gets function reads a string from standard input (think, "keyboard") and stores it in the character array pointed to by the argument you pass it. So you're supposed to assume the user will type "Beauregard", and you should read it into the rocky array with
gets(rocky);
c. strlen returns the length of the string, not including the trailing \0 character, so in this case, 10.
d. strcpy just copies the contents of the rocky array into the boris array, so they'd both contain "Beauregard".
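e. strncpy works like strcpy, but only copies up to n characters (where n is the last argument), so in this case the first three bytes of bw would be "Bea" without a terminating null character. Since bw is uninitialized, the rest of the array is indeterminate and bw is not a valid string until you add the terminator yourself.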
Note that several of these statements are really bad ideas in any real program. There is never a reason to use gets, for example, as any use of gets opens up security flaws. You should always use fgets instead. The scanf function can be used safely if you specify the width, but you haven't done so here. I mention these things just in case your teacher has covered them and you've forgotten.
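Parts (c) to (e) can be checked with a short sketch like the following. The function run_exercise is made up for illustration, and the line that sets bw[3] is an addition that is not part of the exam statement.

```c
#include <string.h>

/* Hypothetical helper: runs statements (c) to (e) of the exercise and
   returns the value that result would have. */
int run_exercise(char rocky[21], char boris[21], char bw[21])
{
    int result = (int)strlen(rocky);   /* (c) 10 for "Beauregard" */
    strcpy(boris, rocky);              /* (d) boris also holds "Beauregard" */
    strncpy(bw, rocky, 3);             /* (e) writes 'B', 'e', 'a', no '\0' */
    bw[3] = '\0';                      /* added here so bw is a valid string */
    return result;
}
```

Without the bw[3] line, printing bw with %s would read whatever bytes happen to follow "Bea" in the array.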
strlen returns the length of the C string. strlen("Beauregard"); would return 10 because the string is 10 characters long.
strcpy just copies a string, you're right.
strncpy allows you to specify the maximum number of characters you want. So if you pass it 3, you'll get the first 3 characters. Because rocky is longer than 3 characters, strncpy does not put a null terminator on the end, so you have to add one yourself.
strlen(rocky) is going to return the string length of what is pointed to by rocky. The number of letters that make up 'Beauregard'.
strncpy(bw, rocky, 3) copies the first 3 letters from the string pointed to by rocky into bw.
You should read man pages for strlen, strcpy and strncpy.
c.) strlen counts until it reaches the terminating null character, \0. So the answer would be 10, because there are 10 letters in Beauregard.
d.) Yes, it copies a string. More specifically, it copies rocky into boris. I'm not sure what else they would want you to give as part of an answer there...
e.) It copies the first 3 characters of rocky into bw. However, it does NOT add a terminating null character.