Recently I was programming in my Code Blocks and I did a little program only for hobby in C.
char littleString[1];
fflush( stdin );
scanf( "%s", littleString );
printf( "\n%s", littleString);
If I created a string of one character, why does the CodeBlocks allow me to save 13 characters?
C have no bounds-checking, writing out of bounds of arrays or dynamically allocated memory can't be checked by the compiler. Instead it will lead to undefined behavior.
To prevent buffer overflow with scanf you can tell it to only read a specific number of characters, and nothing more. So to tell it to read only one character you use the format "%1s".
As a small side-note: Remember that strings in C have an extra character in them, the terminator (character '\0'). So if you have a string that should contain one character, the size actually needs to be two characters.
LittleString is not a string. It is a char array of length one. In order for a char array to be a string, it must be null terminated with an \0. You are writing past the memory you have allotted for littleString. This is undefined behavior.Scanf just reads user input from the console and assigns it to the variable specified, in this case littleString. If you would like to control the length of user input which is assigned to the variable, I would suggest using scanf_s. Please note that scanf_s is not a C99 standard
Many functions in C is implemented without any checks for correctness of use. In other words, it is the callers responsibility that the arguments fulfill some rules set by the function.
Example: For strcpy the Linux man page says
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer
pointed to by dest. The strings may not overlap, and the
destination string dest must be large enough to receive the copy.
If you as a caller break that contract by passing a too small buffer, you'll have undefined behavior and anything can happen.
The program may crash or even do exactly what you expected in 99 out of 100 times and do something strange in 1 out of 100 times.
Related
In the various cases that a buffer is provided to the standard library's many string functions, is it guaranteed that the buffer will not be modified beyond the null terminator? For example:
char buffer[17] = "abcdefghijklmnop";
sscanf("123", "%16s", buffer);
Is buffer now required to equal "123\0efghijklmnop"?
Another example:
char buffer[10];
fgets(buffer, 10, fp);
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
The C99 draft standard does not explicitly state what should happen in those cases, but by considering multiple variations, you can show that it must work a certain way so that it meets the specification in all cases.
The standard says:
%s - Matches a sequence of non-white-space characters.252)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept the
sequence and a terminating null character, which will be added automatically.
Here's a pair of examples that show it must work the way you are proposing to meet the standard.
Example A:
char buffer[4] = "abcd";
char buffer2[10]; // Note the this could be placed at what would be buffer+4
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\0"
// buffer2 = "4\0"
Example B:
char buffer[17] = "abcdefghijklmnop";
char* buffer2 = &buffer[4];
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\04\0"
Note that the interface of sscanf doesn't provide enough information to really know that these were different. So, if Example B is to work properly, it must not mess with the bytes after the null character in Example A. This is because it must work in both cases according to this bit of spec.
So implicitly it must work as you stated due to the spec.
Similar arguments can be placed for other functions, but I think you can see the idea from this example.
NOTE:
Providing size limits in the format, such as "%16s", could change the behavior. By the specification, it would be functionally acceptable for sscanf to zero out a buffer to its limits before writing the data into the buffer. In practice, most implementations opt for performance, which means they leave the remainder alone.
When the intent of the specification is to do this sort of zeroing out, it is usually explicitly specified. strncpy is an example. If the length of the string is less than the maximum buffer length specified, it will fill the rest of the space with null characters. The fact that this same "string" function could return a non-terminated string as well makes this one of the most common functions for people to roll their own version.
As far as fgets, a similar situation could arise. The only gotcha is that the specification explicitly states that if nothing is read in, the buffer remains untouched. An acceptable functional implementation could sidestep this by checking to see if there is at least one byte to read before zeroing out the buffer.
Each individual byte in the buffer is an object. Unless some part of the function description of sscanf or fgets mentions modifying those bytes, or even implies their values may change e.g. by stating their values become unspecified, then the general rule applies: (emphasis mine)
6.2.4 Storage durations of objects
2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]
It's this same principle that guarantees that
#include <stdio.h>
int a = 1;
int main() {
printf ("%d\n", a);
printf ("%d\n", a);
}
attempts to print 1 twice. Even though a is global, printf can access global variables, and the description of printf doesn't mention not modifying a.
Neither the description of fgets nor that of sscanf mentions modifying buffers past the bytes that actually were supposed to be written (except in the case of a read error), so those bytes don't get modified.
The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is: yes, it's not allowed to write more bytes to the buffer than it read+null. On the other hand, a stricter reading/interpretation of the text could conclude that the answer is no, there's no guarantee. Here's what a publicly avaialble draft says about fgets.
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
There's a guarantee about how much it is supposed to read from the input, i.e. stop reading at newline or EOF and not read more than n-1 bytes. Although nothing is said explicitly about how much it's allowed to write to the buffer, the common knowledge is that fgets's n parameter is used to prevent buffer overflows. It's a little strange that the standard uses the ambiguous term read, which may not necessarily imply that gets can't write to the buffer more than n bytes, if you want to nitpick on the terminology it uses. But note that the same "read" terminology is used about both issues: the n-limit and the EOF/newline limit. So if you interpret the n-related "read" as a buffer-write limit, then [for consistency] you can/should interpret the other "read" the same way, i.e. not write more than what it read when string is shorter than the buffer.
On the other hand, if you distinguish between the uses of the phrase-verb "read into" (="write") and just "read", then you can't read the committee's text the same way. You are guaranteed that it won't "read into" (="write to") the array more than n bytes, but if the input string is terminated sooner by newline or EOF you're only guaranteed the rest (of the input) won't be "read", but whether that implies in won't be "read into" (="written to") the buffer is unclear under this stricter reading. The crucial issue is keyword is "into", which is elided, so the problem is whether the completion given by me in brackets in the following modified quote is the intended interpretation:
No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.
Frankly a single postcondition stated as a formula (and would be pretty short in this case) would have been a lot more helpful than the verbiage I quoted...
I can't be bothered to try and analyze their writeup about the *scanf family, because I suspect it's going to be even more complicated given all the other things that happen in those functions; their writeup for fscanf is about five pages long... But I suspect a similar logic applies.
is it guaranteed that the buffer will not be modified beyond the null
terminator?
No, there's no guarantee.
Is buffer now required to equal "123\0efghijklmnop"?
Yes. But that's only because you've used correct parameters to your string related functions. Should you mess up buffer length, input modifiers to sscanf and such, then you program will compile. But it will most likely fail during runtime.
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
Yes. Once fgets() figures you have a 3 character input string it stores the input in the provided buffer, and it doesn't care about the reset of provided space at all.
Is buffer now required to equal "123\0efghijklmnop"?
Here buffer is just consists of 123 string guaranteed terminating at NUL.
Yes the memory allocated for array buffer will not get de-allocated, however you are making sure/restricting your string buffer can atmost only have 16 char elements which you can read into it at any point of time. Now depends whether you write just a single char or maximum what buffer can take.
For example:
char buffer[4096] = "abc";`
actually does something below,
memcpy(buffer, "abc", sizeof("abc"));
memset(&buffer[sizeof("abc")], 0, sizeof(buffer)-sizeof("abc"));
The standard insists that if any part of char array is initialized that is all it consists of at any moment until obeying its memory boundary.
There are no any guarantees from standard, which is why the functions sscanf and fgets are recommended to be used (with respect to the size of the buffer) as you show in your question (and using of fgets is considered preferable compared with gets).
However, some standard functions use null-terminator in their work, e.g. strlen (but I suppose you ask about string modification)
EDIT:
In your example
fgets(buffer, 10, fp);
untouching characters after 10-th is guaranteed (content and length of buffer will not be considered by fgets)
EDIT2:
Moreover, when using fgets keep in mind that '\n' will be stored in the buffers. e.g.
"123\n\0fghijklmnop"
instead of expected
"123\0efghijklmnop"
Depends on the function in use (and to a lesser degree its implementation). sscanf will start writing when it encounters its first non-whitespace character, and continue writing until its first whitespace character, where it will add a finishing 0 and return. But a function like strncpy (famously) zeroes out the rest of the buffer.
There is however nothing in the C standard which mandates how these functions behave.
I am writing a C program, which has a 5-element array to store a string. And I am using gets() to get input. When I typed in more than 5 characters and then output the string, it just gave me all the characters I typed in. I know the string is terminated by a \0 so even I exceeded my array, it will still output the whole thing.
But what I am curious is where exactly gets() stores input, either buffer or just directly goes to my array?
What if I type in a long long string, will gets() try to store characters in the memories that should not be touched? Would it gives me a segment fault?
That's why gets is an evil. It does not check array bound and often invokes undefined behavior. Never use gets, instead you can use fgets.
By the way, now gets is no longer be a part of C. It has been removed in C11 standard in favor of a new safe alternative, gets_s1 (see the wiki). So, better to forget about gets.
1. C11: K.3.5.4.1 The gets_s function
Synopsis
#define _ _STDC_WANT_LIB_EXT1_ _ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
gets() will store the characters in the 5-element buffer. If you type in more than 4 characters, the end of string character will be missed and the result may not work well in any string operations in your program.
excerpt from man page on Ubuntu Linux
gets() reads a line from stdin into the buffer pointed to by s until
either a terminating newline or EOF, which it replaces with a null byte
('\0'). No check for buffer overrun is performed
The string is stored in the buffer and if it is too long it is stored in contiguous memory after the buffer. This can lead to unintended writing over of data or a SEGV fault or other problems. It is a security issue as it can be used to inject code into programs.
gets() stores the characters you type directly into your array and you can safely use/modify them. But indeed, as haccks and unxnut correctly state, gets doesn't care about the size of the array you give it to store its chars in, and when you type more characters than the array has space for you might eventually get a segmentation fault or some other weird results.
Just for the sake of completeness, gets() reads from a buffered file called stdin which contains the chars you typed. More specifically, it takes the chars until it reaches a newline. That newline too is put into your array and next the '\0' terminator. You should, as haccks says, use fgets which is very much alike:
char buf[100]; // the input buffer
fgets(buf, 100, stdin); // reads until it finds a newline (your enter) but never
// more than 99 chars, using the last char for the '\0'
// you can now use and modify buf
I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There's several issues with your program.
First off, you're defining a char buffer way too short, a 1 char buffer for a string can only hold one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function which is very unsafe, (as your compiler almost certainly warned you about), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, you're going to be automatically overwriting the end of your string into other areas of memory which could and probably does contain important information, such as your rsp (your return pointer). This is the classic method of smashing the stack.
Third, you're passing the output of a printf function to another printf. printf isn't designed for formating strings and returning strings, there are other functions for that. Generally the one you will want to use is sprintf and pass it in a string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get any thing. At least str should be of 2 bytes if you are not passing a empty string.
When you declare a variable some space is reserved to store the value.
The reserved space can be a space that was previously used by some other
code and has values. When the variable goes out of scope or is freed
the value is not erased (or it may be, anything goes.) only the programs access
to that variable is revoked.
When you read from an unitialised location you can get anything.
This is undefined behaviour and you are doing that,
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
For above program your input is "(null)", So you are getting "(null)6". Here "6" is the output from printf (number of characters successfully printed).
I've seen several usage of fgets (for example, here) that go like this:
char buff[7]="";
(...)
fgets(buff, sizeof(buff), stdin);
The interest being that, if I supply a long input like "aaaaaaaaaaa", fgets will truncate it to "aaaaaa" here, because the 7th character will be used to store '\0'.
However, when doing this:
int i=0;
for (i=0;i<7;i++)
{
buff[i]='a';
}
printf("%s\n",buff);
I will always get 7 'a's, and the program will not crash. But if I try to write 8 'a's, it will.
As I saw it later, the reason for this is that, at least on my system, when I allocate char buff[7] (with or without =""), the 8th byte (counting from 1, not from 0) gets set to 0. From what I guess, things are done like this precisely so that a for loop with 7 writes, followed by a string formatted read, could succeed, whether the last character to be written was '\0' or not, and thus avoiding the need for the programmer to set the last '\0' himself, when writing chars individually.
From this, it follows that in the case of
fgets(buff, sizeof(buff), stdin);
and then providing a too long input, the resulting buffstring will automatically have two '\0' characters, one inside the array, and one right after it that was written by the system.
I have also observed that doing
fgets(buff,(sizeof(buff)+17),stdin);
will still work, and output a very long string, without crashing. From what I guessed, this is because fgets will keep writing until sizeof(buff)+17, and the last char to be written will precisely be a '\0', ensuring that any forthcoming string reading process would terminate properly (although the memory is messed up anyway).
But then, what about fgets(buff, (sizeof(buff)+1),stdin);? this would use up all the space that was rightfully allocated in buff, and then write a '\0' right after it, thus overwriting...the '\0' previously written by the system. In other words, yes, fgets would go out of bounds, but it can be proven that when adding only one to the length of the write, the program will never crash.
So in the end, here comes the question: why does fgets always terminates its write with a '\0', when another '\0', placed by the system right after the array, already exists? why not do like in the one by one for-loop based write, that can access the whole of the array and write anything the programmer wants, without endangering anything?
Thank you very much for your answer!
EDIT: indeed, there is no proof possible, as long as I do not know whether this 8th '\0' that mysteriously appears upon allocation of buff[7], is part of the C standard or not, specifically for string arrays. If not, then...it's just luck that it works :-)
but it can be proven that when adding only one to the length of the write, the program will never crash.
No! You can't prove that! Not in the sense of a mathematical proof. You have only shown that on your system, with your compiler, with those particular compiler settings you used, with particular environment configuration, it might not crash. This is far from a mathematical proof!
In fact the C standard itself, although it guarantees that you can get the address of "one place after the last element of an array", it also states that dereferencing that address (i.e. trying to read or write from that address) is undefined behaviour.
That means that an implementation can do everything in this case. It can even do what you expect with naive reasoning (i.e. work - but it's sheer luck), but it may also crash or it may also format your HD (if your are very, very unlucky). This is especially true when writing system software (e.g. a device driver or a program running on the bare metal), i.e. when there is no OS to shield you from the nastiest consequences of writing bad code!
Edit This should answer the question made in a comment (C99 draft standard):
7.19.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Returns
The fgets function returns s if successful. If end-of-file is encountered and no
characters have been read into the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the operation, the array contents are
indeterminate and a null pointer is returned.
Edit: Since it seems that the problem lies in a misunderstanding of what a string is, this is the relevant excerpt from the standard (emphasis mine):
7.1.1 Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. The term multibyte string is sometimes used instead to emphasize special
processing given to multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial (lowest addressed)
character. The length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained characters, in order.
From C11 standard draft:
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no
characters have been read into the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
The behaviour you describe is undefined.
Consider following case:
#include<stdio.h>
int main()
{
char A[5];
scanf("%s",A);
printf("%s",A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?
If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
/* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
if (A[length - 1] == '\n') {
A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.
character arrays in c are merely pointers to blocks of memory. If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more then 5 bytes in there, it will just overwrite the memory past the 5 bytes you reserved.
That is why c can have serious security implementations. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.
You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to a memory, that it's not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.
Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);
There isn't a character that is reserved, so you must be careful not to fill the entire array to the point it can't be null terminated. Char functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and
stpncpy() functions return a
pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL
character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid mem-
ory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.
the null character is used for the termination of array. it is at the end of the array and shows that the array is end at that point. the array automatically make last character as null character so that the compiler can easily understand that the array is ended.
\0 is an terminator operator which terminates itself when array is full
if array is not full then \0 will be at the end of the array
when you enter a string it will read from the end of the array