I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There's several issues with your program.
First off, you're defining a char buffer way too short, a 1 char buffer for a string can only hold one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function which is very unsafe, (as your compiler almost certainly warned you about), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, you're going to be automatically overwriting the end of your string into other areas of memory which could and probably does contain important information, such as your rsp (your return pointer). This is the classic method of smashing the stack.
Third, you're passing the output of a printf function to another printf. printf isn't designed for formating strings and returning strings, there are other functions for that. Generally the one you will want to use is sprintf and pass it in a string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get any thing. At least str should be of 2 bytes if you are not passing a empty string.
When you declare a variable some space is reserved to store the value.
The reserved space can be a space that was previously used by some other
code and has values. When the variable goes out of scope or is freed
the value is not erased (or it may be, anything goes.) only the programs access
to that variable is revoked.
When you read from an unitialised location you can get anything.
This is undefined behaviour and you are doing that,
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
For above program your input is "(null)", So you are getting "(null)6". Here "6" is the output from printf (number of characters successfully printed).
Related
Recently I was programming in my Code Blocks and I did a little program only for hobby in C.
char littleString[1];
fflush( stdin );
scanf( "%s", littleString );
printf( "\n%s", littleString);
If I created a string of one character, why does the CodeBlocks allow me to save 13 characters?
C have no bounds-checking, writing out of bounds of arrays or dynamically allocated memory can't be checked by the compiler. Instead it will lead to undefined behavior.
To prevent buffer overflow with scanf you can tell it to only read a specific number of characters, and nothing more. So to tell it to read only one character you use the format "%1s".
As a small side-note: Remember that strings in C have an extra character in them, the terminator (character '\0'). So if you have a string that should contain one character, the size actually needs to be two characters.
LittleString is not a string. It is a char array of length one. In order for a char array to be a string, it must be null terminated with an \0. You are writing past the memory you have allotted for littleString. This is undefined behavior.Scanf just reads user input from the console and assigns it to the variable specified, in this case littleString. If you would like to control the length of user input which is assigned to the variable, I would suggest using scanf_s. Please note that scanf_s is not a C99 standard
Many functions in C is implemented without any checks for correctness of use. In other words, it is the callers responsibility that the arguments fulfill some rules set by the function.
Example: For strcpy the Linux man page says
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer
pointed to by dest. The strings may not overlap, and the
destination string dest must be large enough to receive the copy.
If you as a caller break that contract by passing a too small buffer, you'll have undefined behavior and anything can happen.
The program may crash or even do exactly what you expected in 99 out of 100 times and do something strange in 1 out of 100 times.
I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situations. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, give you much, much more ways to shoot you in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct and it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret"--a user's password or key is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know when the string is terminanted and would try to access bytes (or chars) while they are not 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided a good C compiler should issue at the minimum a warning and probably an error. However it looks like your compiler is truncating the string to the array size and is not adding the additional zero string terminator. And you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well ,the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended explicitly, so the string is not properly terminated when passed to `printf()".
Reason this does not produce an error is likely some legacy code which did exploit this to safe some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, it simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give warning like:
string for array of chars is too long
Because the behavior is Undefined, still compiler may pass it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'V'
name[3] '\0'
And so the output is
Giv
Not Give as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv
I did a lot of searching around for this, couldn't find any question with the same exact issue.
Here is my code:
void fun(char* name){
printf("%s",name);
}
char name[6];
sscanf(input,"RECTANGLE_SEARCH(%6[A-Za-z0-9])",name)
printf("%s",name);
fun(name);
The name is grabbed from scanf, and it printed out fine at first. Then when fun is called, there is a segmentation fault when it tries to print out name. Why is this?
After looking in my scrying-glass, I have it:
Your scanf did overflow the buffer (more than 6 byte including terminator read), with ill-effect slightly delayed due to circumstance:
Nobody else relied on or re-used the memory corrupted at first, thus the first printf seems to work.
Somewhere after the first and before the second call to printf the space you overwrote got re-used, so the string you read was no longer terminated before encountering not allocated pages.
Thus, a segmentation-fault at last.
Of course, your program was toast the moment it overflowed the buffer, not later when it finally crashed.
Morale: Never write to memory you have not dedicated for that.
Looking at your edit, the format %6[A-Za-z0-9] tries to read up to 6 characters exclusive the terminator, not inclusive!
Since you're reading 6 characters, you have to declare name to be 7 characters, so there's room for the terminating null character:
char name[7];
Otherwise, you'll get a buffer overflow, and the consequences are undefined. Once you have undefined consequences, anything can happen, including 2 successful calls to printf() followed by a segfault when you call another function.
You're probably walking off the end of the array with your printf statement. Printf uses the terminating null character '\0' to know where the end of the string is. Try allocating your array like this:
char name[6] = {'\0'};
This will allocate your array with every element initially set to the '\0' character, which means that as long as you don't overwrite the entire array with your scanf, printf will terminate before walking off the end.
Are you sure that name is zero byte terminated? scanf can overflow your buffer depending on how you are calling it.
If that happens then printf will read beyond the end of the array resulting in undefined behavior and probably a segmentation fault.
int main()
{
char *p;
p = (char* ) malloc(sizeof(char) * 0);
printf("Hello Enter the data without spaces :\n");
scanf("%s",p);
printf("The entered string is %s\n",p);
//puts(p);
}
On compiling the above code and running it , the program is able to read the string even though we assigned a 0 byte memory to the pointer p .
What actually happens in the statement p = (char* ) malloc(0) ?
It is implementation defined what malloc() will return but it is undefined behavior to use that pointer. And Undefined behavior means that anything can happen literally from program working without glitch to a crash, all safe bets are off.
C99 Standard:
7.22.3 Memory management functions
Para 1:
If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
In addition to Als comment - what happens: you write somewhere into memory and retrieve the data from there. So depending of your system and OS type you get a exception or just some undefined behaviour
Just out of curiosity, I tested your code using gcc on linux, and its a lot more robust than I would expect (after all, writing data to a character buffer of length 0 is undefined behavior... I would expect it to crash).
Here's my modification of your code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *p;
p = malloc(sizeof(char)*0);
printf("Hello Enter some without spaces :\n");
scanf("%s",p);
char *q;
q = malloc(sizeof(char)*0);
printf("Hello Enter more data without spaces :\n");
scanf("%s",q);
printf("The first string is '%s'\n",p);
printf("The second string is '%s'\n",q);
}
My first thought was that you might be saved by the fact that you're only reading data into a single memory location -- if you use two buffers, the second might overwrite the first... so I broke the code into input and output sections:
Hello Enter some without spaces :
asdf
Hello Enter more data without spaces :
tutututu
The first string is 'asdf'
The second string is 'tutututu'
If the first buffer had been overwritten, we would see
The first string is 'tutututu'
The second string is 'tutututu'
So that's not the case. [but this depends on how much data you pack into each buffer... see below]
Then, I pasted a crazy amount of data into both variables:
perl -e 'print "c" x 5000000 . "\n" ' | xsel -i
(This put 4+ MB of 'c's into the copy buffer). I pasted this in to both the first and second scanf calls. The program took it without a segmentation fault.
Even though I didn't have a segmentation fault, the first buffer did get overwritten. I couldn't tell it because so much data went flying up the screen. Here's a run with less data:
$ ./foo
Hello Enter some without spaces :
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Hello Enter more data without spaces :
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
The first string is 'aaaaaaaaaaaa'
The second string is 'ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc'
There was a little glyph after aaaaaaaaaaaa, which is how my terminal represents a unicode character that it can't display. This is typical of overwritten data: you don't know what is going to overwrite your data... it's undefined behavior, so you're prone to nasal demons.
The bottom line is that when you write to memory that you haven't allocated space for (either explicitly using malloc or implicitly with an array), you're playing with fire. Sooner or later, you'll overwrite memory and cause yourself all sorts of grief.
The real lesson here is that C doesn't do bounds checking. It will happily let you write to memory that you don't own. You can do it all day long. Your program may run correctly, and it may not. It may crash, it may write back corrupted data, or it might work until you scan in one more byte than you used while testing. It doesn't care, so you have to.
The case of malloc(0) is simply a special case of this question.
If by mistake,I define a char array with no \0 as its last character, what happens then?
I'm asking this because I noticed that if I try to iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index to the array, and simultaneously print the cnt values to monitor what's happening the iteration stops at the last character +2.The extra characters are of course random but I can't get it why it has to stop after 2.Does the compiler automatically inserts a \0 character? Links to relevant documentation would be appreciated.
To make it clear I give an example. Let's say that the array str contains the word doh(with no '\0'). Printing the cnt variable at every loop would give me this:
doh+
or doh^
and so on.
EDIT (undefined behaviour)
Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!
A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
Anyway, what you see is a coincidence!
But it might happen like this
,------------------ garbage
| ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
| | \_____/ -------- cnt (big-endian, properly 4-byte aligned)
\___/ ------------------ str
If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:
char strings[] = {'h', 'e', 'l', 'l', 'o'};
The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily stopped at +50 or anywhere else, depending on whether there happened to be \0 character in the memory following your string.
If you define a string as:
char strings[] = "hello";
Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.
There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.
However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault.)
In C language the term string refers to a zero-terminated array of characters. So, pedantically speaking there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.
Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such character array as if it is a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
This would happen if, by coincidence, the byte at *(str + 5) is 0 (as a number, not ASCII)
As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:
Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.
Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" in Linux systems).
Your program will run out of space when copying over the string and will, again, cause a memory error.
And, no, the C compiler will not normally do anything but what you specify in your program - for example it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.