C program crashes when adding an extra int

I am new to C and am using the Eclipse IDE.
The following code works fine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *lineName;
    int stationNo;
    int i;
    while (scanf("%s (%d)", lineName, &stationNo) != EOF) {
        for (i = 0; i < 5; i++) {
            printf("%d", i);
        }
    }
    return 0;
}
Input:
Green (21)
Red (38)
Output:
Green (21)
Red (38)
0123401234
However, when I simply add a new int:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *lineName;
    int stationNo;
    int i, b = 0;
    while (scanf("%s (%d)", lineName, &stationNo) != EOF) {
        printf("%d", b);
        for (i = 0; i < 5; i++) {
            printf("%d", i);
        }
    }
    return 0;
}
The program will crash with the same input.
Can anybody tell me why?

You said your first program "works", but it works only by accident. It's like a car zooming down the road with no lugnuts holding on the front wheels, only by some miracle they haven't fallen off — yet.
You said
char *lineName;
This gives you a pointer variable that can point to some characters, but it doesn't point anywhere yet. The value of this pointer is undefined. It's sort of like saying "int i" and asking what the value of i is.
Next you said
scanf("%s (%d)", lineName, &stationNo)
You're asking scanf to read a line name and store the string in the memory pointed to by lineName. But where is that memory? We have no idea whatsoever!
The situation with uninitialized pointers is a little trickier to think about because, as always, with pointers we have to distinguish between the value of the pointer as opposed to the data at the memory which the pointer points to. Earlier I mentioned saying int i and asking what the value of i is. Now, there's going to be some bit pattern in i — it might be 0, or 1, or -23, or 8675309.
Similarly, there's going to be some bit pattern in lineName — it might "point at" memory location 0x00000000, or 0xffe01234, or 0xdeadbeef. But then the questions are: is there actually any memory at that location, and do we have permission to write to it, and is it being used for anything else? If there is memory and we do have permission and it's not being used for anything else, the program might seem to work — for now. But those are three pretty big ifs! If the memory doesn't exist, or if we don't have permission to write to it, the program is probably going to crash when it tries. And if the memory is being used for something else, something's going to go wrong — if not now, then later — when we ask scanf to write its string there.
And, really, if what we care about is writing programs that work (and that work for the right reasons), we don't have to ask any of these questions. We don't have to ask where lineName points when we don't initialize it, or whether there's any memory there, or if we have permission to write to it, or if it's being used for something else. Instead, we should simply, actually, initialize lineName! We should explicitly make it point to memory that we do own and that we are allowed to write to and that isn't being used for anything else!
There are several ways to do this. The easiest is to use an array for lineName, not a pointer:
char lineName[20];
Or, if we have our hearts set on using a pointer, we can call malloc:
char *lineName = malloc(20);
However, if we do that, we have to check to make sure malloc succeeded:
if(lineName == NULL) {
fprintf(stderr, "out of memory!\n");
exit(1);
}
If you make either of those changes, your program will work.
...Well, actually, we're still in a situation where your program will seem to work, even though it still has another, pretty serious, lurking problem. We've allocated 20 characters for lineName, which gives us 19 actual characters, plus the trailing '\0'. But we don't know what the user is going to type. What if the user types 20 or more characters? That will cause scanf to write more than 20 characters to lineName, off past the end of what lineName's memory is allowed to hold, and we're back in the situation of writing to memory that we don't own and that might be in use for something else.
One solution is to make lineName bigger — declare it as char lineName[100], or call malloc(100). But that just moves the problem around — now we have to worry about the (perhaps smaller) chance that the user will type 100 or more characters. So the next thing to do is to tell scanf not to write more to lineName than we've arranged for it to hold. This is actually pretty simple. If lineName is still set up to hold 20 characters, just call
scanf("%19s (%d)", lineName, &stationNo)
That format specifier %19s tells scanf that it's only allowed to read and store a string of up to 19 characters long, leaving one byte free for the terminating '\0' that it's also going to add.
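Putting those pieces together, here is one sketch of the corrected program (using the array variant, and assuming the same input format as in the question):
#include <stdio.h>

int main()
{
    char lineName[20];   /* real memory that scanf may fill in */
    int stationNo;
    int i;
    /* %19s leaves room for the terminating '\0' in the 20-char array */
    while (scanf("%19s (%d)", lineName, &stationNo) != EOF) {
        for (i = 0; i < 5; i++) {
            printf("%d", i);
        }
    }
    return 0;
}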
Now, I've said a lot here, but I realize I haven't actually gotten around to answering the question of why your program went from working to crashing when you made that seemingly trivial, seemingly unrelated change. This ends up being a hard question to answer satisfactorily. Going back to the analogy I started this answer with, it's like asking why you were able to drive the car with no lugnuts to the store with no problem, but when you tried to drive to grandma's house, the wheels fell off and you crashed into a ditch. There are a million possible factors that might have come into play, but none of them change the underlying fact that driving a car with the wheels not fastened on is a crazy idea that's not guaranteed to work at all.
In your case, the variables you're talking about — lineName, stationNo, i, and then b — are all local variables, typically allocated on the stack. Now, one of the characteristics of the stack is that it gets used for all sorts of stuff, and it never gets cleared between uses. So if you have an uninitialized local variable, the particular random bits that it ends up containing depend on whatever was using that piece of the stack last time. If you change your program slightly so that different functions get called, those different functions may leave different random values lying around on the stack. Or if you change your function to allocate different local variables, the compiler may place them in different spots on the stack, meaning that they'll end up picking up different random values from whatever was there last time.
Anyway, somehow, with the first version of your program, lineName ended up containing a random value that corresponded to a pointer that pointed to actual memory which you could get away with writing to. But when you added that fourth variable b, things moved around just enough that lineName ended up pointing to memory that didn't exist or that you didn't have permission to write to, and your program crashed.
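If you want to see this "leftover stack contents" effect for yourself, here is a deliberately broken sketch. It reads an uninitialized variable, which is undefined behavior, so the output can be anything at all - that is exactly the point:
#include <stdio.h>

void leave_something_on_the_stack(void)
{
    int x = 12345;     /* written into some stack slot */
    (void)x;
}

void read_uninitialized(void)
{
    int y;             /* never initialized: reading it is undefined behavior */
    printf("%d\n", y); /* might print 12345, 0, or anything else, or the compiler may warn */
}

int main()
{
    leave_something_on_the_stack();
    read_uninitialized();
    return 0;
}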
Make sense?
And now, one more thing, if you're still with me. If you stop and think, this whole thing might be kind of unsettling. You had a program (your first program) that seemed to work just fine, but actually had a decently horrible bug. It wrote to random, unallocated memory. But when you compiled it you got no fatal error messages, and when you ran it there was no indication that anything was amiss. What's up with that?
The answer, as a couple of the comments alluded to, involves what we call undefined behavior.
It turns out that there are three kinds of C programs, which we might call the good, the bad, and the ugly.
Good programs work for the right reasons. They don't break any rules, they don't do anything illegal. They don't get any warnings or error messages when you compile them, and when you run them, they just work.
Bad programs break some rule, and the compiler catches this, and issues a fatal error message, and declines to produce a broken program for you to try to run.
But then there are the ugly programs, that engage in undefined behavior. These are the ones that break a different set of rules, the ones that, for various reasons, the compiler is not obliged to complain about. (Indeed the compiler may or may not even be able to detect them). And programs that engage in undefined behavior can do anything.
Let's think about that last point a little more. The compiler is not obligated to generate error messages when you write a program that uses undefined behavior, so you might not realize you've done it. And the program is allowed to do anything, including work as you expect. But then, since it's allowed to do anything, it might stop working tomorrow, seemingly for no reason at all, either because you made some seemingly innocuous change to it, or merely because you're not around to defend it, as it quietly runs amok and deletes all your customer's data.
So what are you supposed to do about this?
One thing is to use a modern compiler if you can, and turn on its warnings, and pay attention to them. (Good compilers even have an option called "treat warnings as errors", and programmers who care about correct programs usually turn this option on.) Even though, as I said, they're not required to, compilers are getting better and better at detecting undefined behavior and warning you about it, if you ask them to.
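For example, with GCC or Clang you might compile with something like:
$ gcc -Wall -Wextra -Werror myprogram.c
With those flags, the uninitialized use of lineName in the original program may well be reported (GCC calls this class of warning -Wuninitialized / -Wmaybe-uninitialized, sometimes only at higher optimization levels), and -Werror stops the build until you fix it.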
And then the other thing, if you're going to be doing a lot of C programming, is to take care to learn the language, what you're allowed to do, what you're not supposed to do. Make a point of writing programs that work for the right reasons. Don't settle for a program that merely seems to work today. And if someone points out that you're depending on undefined behavior, don't say, "But my program works — why should I care?" (You didn't say this, but some people do.)

Related

How defining a const char* variable indirectly causes core dumping?

I've run this:
int main(){
    //const char* variable="Hello, World!";//random string
    for(char i=0;i<10;i++){//random limit
        char* arr;
        arr[0]=42;//random number
    }
    return 0;
}
It didn't dump core. But when I uncommented the commented line and ran it again, it produced this error message:
/usr/bin/timeout: the monitored command dumped core
sh: line 1: 14403 Segmentation fault /usr/bin/timeout 10s main
I was using https://www.tutorialspoint.com/compile_c_online.php.
Why is this happening and what can I do to prevent it?
arr[0]=42;
is the same as
*(arr + 0)=42;
and also
*arr=42;
So you are putting the value 42 into the object that arr points to. However, you do:
char* arr;
so arr is uninitialized and may point "anywhere", including illegal addresses that will cause a crash. It can also happen that it points to some legal address, in which case the program will appear to work. So sometimes it crashes, and other times it seems to work. That's in general called "undefined behavior". You can't know what such code will do...
To prevent this situation, you need to initialize arr to point to a valid object.
For instance:
char* arr = malloc(sizeof *arr);
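For instance, a sketch of the loop from the question with arr properly initialized (and freed again) on every iteration:
#include <stdlib.h>

int main(){
    for(char i=0;i<10;i++){
        char* arr = malloc(sizeof *arr); /* arr now points at real memory */
        if(arr == NULL)
            return 1;                    /* malloc can fail: bail out */
        arr[0]=42;                       /* now a perfectly legal write */
        free(arr);                       /* give the byte back */
    }
    return 0;
}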
What the uninitialised pointer arr points to is undefined and non-deterministic. Anything can happen, including both apparently nothing and a core dump. Modifying the code simply changes what arr happens to point to.
In my test at https://onlinegdb.com/Q1k0Fd5oB it simply ran to completion in both cases (in both cases arr == 0). That's the thing about undefined behaviour. It is also worth noting that this code is trivially optimised to a no-op (https://godbolt.org/z/7dTvrGaEf), in which case it would not core-dump.
Excellent example of undefined behaviour.
If you corrupt even a single byte of memory (as you do here by writing into an unallocated array), you are likely to get away with it for a while (i.e. nothing will seem to happen), until a totally unrelated change in your code makes your application behave in all sorts of funny ways.
Consider yourself lucky: the crash is systematic and the modification that causes it is very close to the source of the bug. In real life, this corruption could cause an erratic behaviour, your program crashing out of the blue once per hour or per day, spouting the occasional corrupted data, etc.
And the cause of these malfunctions could be located in an entirely different part of the code, written weeks or months ago. That's right, the bug could lie dormant for months until some totally unrelated code change turns it into an app killer. Picture yourself sifting through a few months of code production to locate the source of the problem.
C and C++ are particularly unforgiving languages, leaving the programmer in charge of every single byte of allocated memory. Corrupting memory is extremely easy, and one single byte written where it shouldn't is enough to spell the doom of an entire application.
Moral of the story: sloppy programming is not an option in C/C++. If you don't learn to test your code extensively and adopt some basic defensive and offensive programming techniques early on, you (and your coworkers) are in for a world of pain.

Comparison to NULL pointer in c

This is code from a tutorial in which the user enters the size of a string and then the string itself. The code uses memory allocation to reproduce the same string. I have a few doubts about the code:
1. Why is the *text pointer initialized to NULL at the beginning? Was this initialization useful in a later part of the program, or is it just good practice to initialize to NULL?
2. Why is it comparing the pointer to NULL? Won't the address change once we allocate a string to the pointer? At the end of the string, will the pointer point to NULL (no address)?
3. What is the use of scanf(" ")?
4. After freeing *text pointer, it was again allocated to NULL. So did it effectively free up memory?
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int size;
    char *text = NULL; //---------------------------------------------------------->1
    printf("Enter limit of the text: \n");
    scanf("%d", &size);
    text = (char *) malloc(size * sizeof(char));
    if (text != NULL) //---------------------------------------------------------->2
    {
        printf("Enter some text: \n");
        scanf(" "); //---------------------------------------------------------->3
        gets(text);
        printf("Inputted text is: %s\n", text);
    }
    free(text);
    text = NULL; //---------------------------------------------------------->4
    return 0;
}
Why is the *text pointer initialized to NULL at the beginning?
To protect you from your own humanity, mainly. As the code evolves, it's often easy to forget to initialize the pointer in one or more branches of code, and then you're dereferencing an uninitialized pointer - it is undefined behavior, and as such it's not guaranteed to crash. In the worst case, if you don't use proper tools such as Valgrind (it'd point it out right away), you can spend hours or days finding such a problem because of how unpredictable it is, and because the behavior changes based on what else was on the stack before the call - so you might see a "bug" in completely unrelated and perfectly non-buggy code.
Why is it comparing the pointer to NULL.
Because malloc can return NULL, and just because it returned a pointer doesn't mean you can dereference it. The null pointer value is special: it means "hey, this pointer is not valid, don't use it for anything". So before you dereference anything returned from malloc, you have to check that it's not null. To do otherwise is undefined behavior, and modern compilers may do quite unexpected things to your code when such behavior is present. But before asking such a question I'd advise you to always check what the function you're wondering about is actually designed to do. Google cppref malloc and the first hit is: https://en.cppreference.com/w/c/memory/malloc. There, under the heading of Return value, we read:
On failure, returns a null pointer.
That's why it's comparing the pointer to NULL!
What is the use of scanf(" ")?
That one is easy: you could have looked it up yourself. The C standard library is well documented: https://en.cppreference.com/w/c/io/fscanf
When you read it, the relevant part is:
format: pointer to a null-terminated character string specifying how to read the input.
The format string consists of [...]
whitespace characters: any single whitespace character in the format string consumes all available consecutive whitespace characters from the input (determined as if by calling isspace in a loop). Note that there is no difference between "\n", " ", "\t\t", or other whitespace in the format string.
And there's your answer: scanf(" ") will consume any whitespace characters in the input, until it reaches either EOF or the first non-whitespace character.
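As a minimal sketch of that behavior (using fgets rather than the unsafe gets - more on that in the answers below):
#include <stdio.h>

int main()
{
    int size;
    char line[100];
    scanf("%d", &size);  /* leaves the Enter keypress (newline) in the input */
    scanf(" ");          /* eats that newline, and any other whitespace */
    if (fgets(line, sizeof line, stdin) != NULL)
        printf("size=%d, line=%s", size, line);
    return 0;
}
Without the scanf(" "), the fgets would immediately consume the leftover newline and return an empty line.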
After freeing *text pointer, it was again allocated to NULL. So did it effectively free up memory?
No. First of all, the language used here is wrong: the pointer was assigned a value of NULL. Nothing was allocated! A pointer is like a postal address. You can replace it with the word "NOWHERE", and that's what NULL is. But by putting something like "this person has no address" in your address book, you have not "allocated" anything.
Yes - free did free the memory. Then you set the pointer to NULL because you're human, so that you won't so easily forget that its value is no longer valid. It's in this case a "note to self". Humans tend to forget that a pointer is null and then use it anyway. Such use is undefined behavior (your program can do anything, for example erase your hard drive). So the text = NULL assignment has nothing to do with the machine. It has everything to do with you: humans are not perfect, and it's best to program defensively so that you give yourself fewer chances to introduce a bug as you change the code, or as you work under deadline pressure, etc.
Generally speaking, the NULL assignment at the end of main is not necessary in such a simple short program. But you have to recognize the fact that text cannot be dereferenced after it has been free-d.
Personally, I find it best to leverage the property of C language that gives variables lexical scope. Once the scope ends, the variable is not accessible, so you can't write a bug that would use text - it won't compile. This is called "correctness by design": you design the software in such a way that some bugs are impossible by construction, and if you code the bug then the code won't compile. That's a million times better than catching the bug at runtime, or worse - having to debug it, potentially in unrelated code (remember: undefined behavior is nasty - it often manifests as problems thousands of lines away from the source).
So here's how I'd rewrite it just to address this one issue (there are others still left there):
#include <stdio.h>
#include <stdlib.h>

void process_text(int size)
{
    char *const text = malloc(size * sizeof(char));
    if (!text) return;
    printf("Enter some text: \n");
    scanf(" ");
    gets(text);
    printf("Inputted text is: %s\n", text);
    free(text);
}

int main()
{
    int size;
    printf("Enter limit of the text: \n");
    scanf("%d", &size);
    process_text(size);
}
The scope of text is limited to the block of process_text. You initialize it immediately at the point of declaration: that's always preferred. There's no need to set it to NULL first, since you assign the desired value right away. You check if maybe malloc has returned NULL (i.e. it failed to allocate memory), and if so you immediately return from the function. A NULL check is idiomatically written as if (pointer) /* do something if the pointer is non-null */ or as if (!pointer) /* do something if the pointer IS NULL */. It's less verbose that way, and anyone reading such code is supposed to know what it means if they have any sort of experience. Now you know too what such code means. It's not a big hurdle to be aware of this idiom. It's less typing and less distraction.
Generally speaking, code that returns early should be preferred to nested if blocks and unending levels of indentation. When there are multiple checks before a function can do its job, they often end up in nested if statements, making the function much harder to read.
There's a flip side to that: in C++ the code is supposed to leverage C++ (i.e. it's not just C compiled with a C++ compiler), and the resources that have to be released when returning from a function should be automatically released by the compiler generated code that invokes destructors. But in C no such automatic destructor calls are made. So if you return from a function early, you have to make sure that you've released any resources that were allocated earlier on. Sometimes the nested if statements help with that, so you shouldn't be blindly following some advice without understanding the context and assumptions the advice makes :)
Although it's truly a matter of preference - and I have C++ background where the code written as above is way more natural - in C probably it'd be better not to return early:
void process_text_alternative_version(int size)
{
    char *text = malloc(size * sizeof(char));
    if (text) {
        printf("Enter some text: \n");
        scanf(" ");
        gets(text);
        printf("Inputted text is: %s\n", text);
    }
    free(text);
}
The value of text is only used if it's not null, but we don't return from the function early. This ensures that in all cases the memory block pointed to by text - if any - gets freed! This is very important: it's yet another way to write code that's correct by design, i.e. in a way that makes certain mistakes either impossible or much harder to commit. Written as above, you have no way of forgetting to free the memory (unless you add a return statement somewhere inside).
It must be said that even though some decisions made in the design of the C language library have been atrocious, the interface to free has been thoughtfully made in a way that makes the above code valid. free is explicitly allowed to be passed a null pointer. When you pass it a null pointer - e.g. when malloc above failed to allocate the memory - it will do nothing. That is: "freeing" a null pointer is a perfectly valid thing to do. It doesn't do anything, but it's not a bug. It enables writing code like above, where it's easy to see that in all cases text will be freed.
A VERY IMPORTANT COROLLARY: null pointer checks before free (in C) or delete (in C++) indicate that the author of the code doesn't have a clue about the most basic behavior of free and delete: it's usually an indicator that the code will be written as if it was a black magical art that no mere mortal understands. If the author doesn't understand it, that is. But we can and must do better: we can educate ourselves about what the functions/operators that we use do. It's documented. It costs no money to look that documentation up. People have spent long hours making sure the documentation is there for anyone so inclined to see. Ignoring it is IMHO the very definition of insanity. It's sheer irrationality on a wild rollercoaster ride. For the sane among us: all it takes is a google search that includes the word cppref somewhere. You'll get cppreference links up top, and that's a reliable resource - and collaboratively written, so you can fix any shortcomings you note, since it's a wiki. It's called "cpp"reference, but it really is two references in one: a C++ Reference as well as a C Reference.
Back to the code in question, though: someone could have written it as follows:
void process_text_alternative_version_not_recommended(int size)
{
    char *text = malloc(size * sizeof(char));
    if (text) {
        printf("Enter some text: \n");
        scanf(" ");
        gets(text);
        printf("Inputted text is: %s\n", text);
        free(text);
    }
}
It's just as valid, but such form defeats the purpose: it's not clear at a glance that text is always freed. You have to inspect the condition of the if block to convince yourself that indeed it will get freed. This code will be OK for a while, and then years later someone will change it to have a bit fancier if condition. And now you got yourself a memory leak, since in some cases malloc will succeed, but free won't be called. You're now hoping that some future programmer, working under pressure and stress (almost invariably!) will notice and catch the problem. Defensive programming means that we protect ourselves not only from bad inputs (whether errant or malicious), but also from our own inherent human fallibility. Thus it makes most sense in my opinion to use the first alternative version: it won't turn into a memory leak no matter how you modify the if condition. But beware: messing up the if condition may turn it into undefined behavior if the test becomes broken such that the body of if executes in spite of the pointer being null. It's not possible to completely protect ourselves from us, sometimes.
As far as constness is concerned, there are 4 ways of declaring the text pointer. I'll explain what they all mean:
char *text - a non-const pointer to non-const character(s): the pointer can be changed later to point to something else, and the characters it points to can be changed as well (or at least the compiler won't prevent you from doing it).
char *const text - a const pointer to non-const character(s) - the pointer itself cannot be changed past this point (the code won't compile if you try), but the characters will be allowed to be changed (the compiler won't complain but that doesn't mean that it's valid to do it - it's up to you the programmer to understand what the circumstances are).
const char *text - a non-const pointer to const character(s): the pointer can be changed later to point somewhere else, but the characters it points to cannot be changed using that pointer - if you try, the code won't compile.
const char *const text - a const pointer to const character(s): the pointer cannot be changed after its definition, and it cannot be used to change the character(s) it points to - an attempt to do either will prevent the code from compiling.
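A compact sketch of all four variants (the commented-out lines are the ones a compiler would reject):
char buf[4] = "abc";

char *p1 = buf;              /* #1: both pointer and characters mutable */
p1[0] = 'x';                 /* OK */
p1 = buf;                    /* OK */

char *const p2 = buf;        /* #2: const pointer, mutable characters */
p2[0] = 'y';                 /* OK */
/* p2 = buf; */              /* error: assignment of read-only variable */

const char *p3 = buf;        /* #3: mutable pointer, const characters */
p3 = buf;                    /* OK */
/* p3[0] = 'z'; */           /* error: assignment of read-only location */

const char *const p4 = buf;  /* #4: both const */
/* p4 = buf;    */           /* error */
/* p4[0] = 'w'; */           /* error */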
We chose variant #2: the pointed-to characters can't be constant, since gets will definitely be altering them. If you used variant #4, the code wouldn't compile, since gets expects a pointer to non-const characters.
Choosing #2, we're less likely to mess it up, and we're explicit: this pointer here will remain the same for the duration of the rest of this function.
We also free the pointer immediately before leaving the function: there's no chance we'll inadvertently use it after it was freed, because there's literally nothing done after free.
This coding style protects you from your own humanity. Remember that a lot of software engineering has nothing whatsoever to do with machines. The machine doesn't care much about how comprehensible the code is: it will do what it's told - the code can be completely impenetrable to any human being. The machine doesn't care one bit. The only entities that are affected - positively or negatively - by the design of the code are the human developers, maintainers, and users. Their humanity is an inseparable aspect of their being, and that implies that they are imperfect (as opposed to the machine, which normally is completely dependable).
Finally, this code has a big problem - it again has to do with humans. Indeed you ask the user to enter the size limit for the text. But the assumption must be that humans - being humans - will invariably mess it up. And you'll be absolutely in the wrong if you blame them for messing it up: to err is human, and if you pretend otherwise then you're just an ostrich sticking your head in the sand and pretending there's no problem.
The user can easily make a mistake and enter text longer than the size they declared. That's undefined behavior: the program at this point can do anything, up to and including erasing your hard drive. Here it's not even a joke: in some circumstances it's possible to artificially create an input to this program that would cause the hard drive to indeed be wiped. You may think that it's a far-off possibility, but that's not the case. If you wrote this sort of a program on an Arduino, with an SD card attached, I could create input for both size and text that would cause the contents of the SD card to be zeroed - possibly even an input that can all be typed on a keyboard without use of special control characters. I'm 100% serious here.
Yes, typically this "undefined behavior means you'll format your hard drive" is said tongue-in-cheek, but that doesn't preclude it from being a true statement in the right circumstances (usually the more expensive the circumstances, the truer it becomes - such is life). Of course in most cases the user is not malicious - merely error-prone: they'll burn your house down because they were drunk, not because they tried to kill you - that's an awesome consolation I'm sure! But if you get a user that's an adversary - oh boy, they absolutely will leverage all such buffer overrun bugs to take over your system, and soon make you think hard about your choice of career. Maybe landscaping doesn't look all that bad in retrospect when the alternative is to face a massive lawsuit over loss of data (whether disclosure of data or a true loss when the data is wiped and lost).
To this effect, gets() is an absolutely forbidden sort of an interface: it's not possible to make it safe, that is: to make it work when faced with users that are either human, drunk and just error-prone, or worse - an adversary determined to create yet another "Data leak from Bobby Tables' Bank, Inc." headline in the newspaper.
In the second round of fixes, we need to get rid of the gets call: it's basically a big, absurdly bad mistake that the authors of the original C standard library have committed. I am not joking when I say that millions if not billions of dollars have been lost over decades because gets and similarly unsafe interfaces should never ever have been born, and because programmers have been unwittingly using them in spite of their inherently broken, dangerous and unsafe design. What's the problem: well, how on Earth can you tell gets to limit the length of input to actually fit in however much memory you have provided? Sadly, you can't. gets assumes that you-the-programmer have made no mistakes, and that wherever the input's coming from will fit into the space available. Ergo gets is totally utterly broken and any reasonable C coding standard will simply state "Calls to gets are not allowed".
Yes. Forget about gets. Forget about any examples you saw of people calling gets. They are all wrong. Every single one of them. I'm serious. All code using gets is broken; there's no qualification here. If you use gets, you're basically saying "Hey, I've got nothing to lose. If some big institution exposes millions of their users' data, I'm fine with getting sued and having to live under a bridge thereafter". I bet you'd not be so happy about getting sued by a million angry users, so that's where the tale of gets ends. From now on it doesn't exist, and if someone tells you about using gets, you need to look at them weird and tell them "WTF are you talking about? Have you lost your mind?!". That's the only proper response. It's that bad of a problem. No exaggeration here, I'm not trying to scare you.
As for what to do instead of gets? Of course it's a well solved problem. See this question to learn everything you should know about it!
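For reference, here is a sketch of the input step rewritten with fgets, which - unlike gets - is told how big the buffer is:
#include <stdio.h>
#include <string.h>

void read_text_safely(void)
{
    char text[100];
    printf("Enter some text: \n");
    if (fgets(text, sizeof text, stdin) != NULL) {
        text[strcspn(text, "\n")] = '\0'; /* strip the trailing newline, if any */
        printf("Inputted text is: %s\n", text);
    }
}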
In this function:
1. It is not needed at all, as there is no danger that the automatic variable will be used uninitialized in this function.
2. This test checks whether malloc was successful. If malloc fails, it returns NULL.
3. A bit weird way to skip blanks.
4. This statement is not needed at all. The function terminates and the variable stops existing.
The conclusion: I would rather not recommend this kind of code as an example when you learn programming. The author's C knowledge is IMO very limited.
Whenever we declare a variable, it is good practice to initialize it with some value. As you are declaring a dynamically allocated array here, you initialize the pointer with NULL.
It is set to NULL so that it can be used to check whether text is valid. If malloc somehow failed, text will still be NULL. So you can check whether or not malloc managed to allocate the memory. Try to put in an invalid number for size, like -1. You will see that the program won't prompt for the text input, as malloc failed and text is still NULL. I think this answers your queries 1, 2, and 4 about why text is being set to NULL and why it is checked against NULL.
For the 3rd query: after you enter the size with scanf("%d", &size);, you press Enter. If you don't use scanf(" "), that pressed Enter (newline) will be taken as the end of gets(text)'s input, and text will always be empty. So scanf(" ") is used to ignore the Enter pressed after scanf("%d", &size);.

Why can I use a char pointer without malloc?

I've programmed something similar and I'm wondering why it works...
char* produceAString(void){
    char* myString;
    while(somethingIsGoingOn){
        //fill myString with a random amount of chars
    }
    return myString;
}
The theory tells me that I should use malloc to allocate space when I'm using pointers. But in this case I don't know how much space I need for myString, therefore I just skipped it.
But why does this work? Is it just bad code, which luckily worked for me, or is there something special behind char pointers?
It worked due to pure chance. It might not work the next time you try it. Uninitialized pointers can point anywhere in memory. Writing to them can cause an instant access violation, or a problem that will manifest later, or nothing at all.
This is generally bad code, yes. Also, whatever compiler you are using probably either isn't very smart or has warnings turned off, since compilers usually throw an error or at least a warning like "variable used uninitialized" here, which is completely accurate.
You are in (bad) luck: when the code runs, the pointer is garbage, and somehow the OS allows the write (or read). Perhaps you are running in debug mode?
My personal experience is that in some cases it's predictable what the OS will do, but you should never ever rely on such things. One example: if you build with MinGW in debug mode, the uninitialized values usually follow a pattern or are zero, while in a release build they're usually complete random junk.
Since you "point to a memory location", the pointer must point to a valid location, whether that's another variable (pointing to another variable) or space allocated at run time (malloc). What you are doing is neither, so you are basically reading/writing a random memory block, and by some black magic the app doesn't crash because of it. Are you running on Windows? Windows 2000 or XP? Those are not as restrictive as Windows has been since Vista; I remember that back in the day I did a similar thing under Windows XP and nothing happened when it was supposed to crash.
So generally, allocate or point to a memory block you want to use before you use the pointer. If you don't know how much memory you need, use realloc, or simply figure out a strategy that has the smallest footprint for your specific case.
One way to see what C actually does is to change this line
char* myString;
into
char* myString=(char*)0;
and break before that line with a debugger and watch the myString variable: it'll be junk. Once the variable is initialized it'll be 0, and then the rest of your code fails with an access violation, because the pointer points "nowhere".
The normal operation would be
char* myString=(char*)malloc(125); // whatever amount you want
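And if, as in the original question, you don't know the amount of characters up front, a common pattern is to grow the buffer with realloc as you go. A sketch (assuming the characters come from getchar, since the question's loop body was elided):
#include <stdio.h>
#include <stdlib.h>

char* produceAString(void){
    size_t cap = 16, len = 0;
    char* myString = malloc(cap);
    if (myString == NULL) return NULL;
    int c;
    while ((c = getchar()) != EOF && c != '\n') {
        if (len + 1 >= cap) {                          /* keep room for the '\0' */
            char* bigger = realloc(myString, cap * 2); /* double the buffer */
            if (bigger == NULL) { free(myString); return NULL; }
            myString = bigger;
            cap *= 2;
        }
        myString[len++] = (char)c;
    }
    myString[len] = '\0';
    return myString;                                   /* caller must free() it */
}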

Why is the compiler OK with this?

I spent an embarrassing amount of time last night tracking down a segfault in my application. Ultimately, it turned out I'd written:
ANNE_SPRITE_FRAME *desiredFrame;
*desiredFrame = anne_sprite_copy_frame(&sprite->current);
instead of:
ANNE_SPRITE_FRAME desiredFrame;
desiredFrame = anne_sprite_copy_frame(&sprite->current);
In line 1 I created a typed pointer, and in line 2 I set the value of the dereferenced pointer to the struct returned by anne_sprite_copy_frame().
Why was this a problem? And why did the compiler accept this at all? All I can figure is that the problem in example 1 is either:
I'm reserving space for the pointer but not the contents that it points to, or
(unlikely) it's trying to store the return value in the memory of the pointer itself
In line 1 I've created a typed pointer, and in line 2 I set the value of the dereferenced pointer to the struct returned by anne_sprite_copy_frame().
Both of these are allowed in C, which is why this is perfectly acceptable by the compiler.
The compiler doesn't check to make sure your pointer actually points to anything meaningful - it just dereferences and assigns.
One of the best and worst features of C is that the compiler does very little sanity checking for you - it follows your instructions, and does exactly what you tell it to do. You told it to do two legal operations - even though the variables were not initialized properly. As such, you get runtime issues, not compile time problems.
I'm reserving space for the pointer but not the contents that it points to
Yeah, exactly. But the compiler (unless it does some static analysis) can't infer that. It only sees that the syntax is valid and the types match, so it compiles your program. Dereferencing an uninitialized pointer is undefined behavior, though, so your program will most likely work erroneously.
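If you really did want a pointer, the fix would be to first make it point at memory you own - a sketch, reusing the types from the question (plus #include <stdlib.h> for malloc):
ANNE_SPRITE_FRAME *desiredFrame = malloc(sizeof *desiredFrame); /* space for the struct itself */
if (desiredFrame != NULL)
    *desiredFrame = anne_sprite_copy_frame(&sprite->current);   /* now the write has somewhere to go */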
The pointer is uninitialized, but it still has a value so it points somewhere. Writing the return value to that memory address overwrites whatever happens to be there, invoking undefined behavior.
Technically the compiler is not in the business of telling you that a syntactically valid construct will result in undefined (or even likely unexpected) behavior, but I would be surprised if there was no warning issued about this particular usage.
C is weakly typed. You can assign almost anything to anything, with the obvious consequences. You have to be very careful and disciplined if you do not want to spend nights uncovering bugs that turn out "stupid". I mean no offense. I went through the same issues due to an array-bounds overflow that overwrote other variables, and it only showed up in some other part of the code trying to use those variables. Nightmare! That's why Java is so much easier to deal with. With C you are an acrobat without a net; with Java, you can afford to fall. That said, I do not mean to say Java is better. C has its raison d'être.

Exceeding array bound in C -- Why does this NOT crash?

I have this piece of code, and it runs perfectly fine, and I don't know why:
int main(){
    int len = 10;
    char arr[len];
    arr[150] = 'x';
}
Seriously, try it! It works (at least on my machine)!
It doesn't, however, work if I try to change elements at indices that are too large, for instance index 20,000. So the compiler apparently isn't smart enough to just ignore that one line.
So how is this possible? I'm really confused here...
Okay, thanks for all the answers!
So I can use this to write into memory consumed by other variables on the stack, like so:
#include <stdio.h>

int main(){
    char b[4] = "man";
    char a[10];
    a[10] = 'c';
    puts(b);
}
Outputs "can". That's a really bad thing to do.
Okay, thanks.
C compilers generally do not generate code to check array bounds, for the sake of efficiency. Out-of-bounds array accesses result in "undefined behavior", and one possible outcome is that "it works". It's not guaranteed to cause a crash or other diagnostic, but if you're on an operating system with virtual memory support, and your array index points to a virtual memory location that hasn't yet been mapped to physical memory, your program is more likely to crash.
So how is this possible?
Because the stack was, on your machine, large enough that there happened to be a memory location on the stack corresponding to &arr[150], and because your small example program exited before anything else referred to that location (and perhaps crashed because you'd overwritten it).
The compiler you're using doesn't check for attempts to go past the end of the array (the C99 spec says that the result of arr[150], in your sample program, would be "undefined", so it could fail to compile it, but most C compilers don't).
Most implementations don't check for these kinds of errors. Memory access granularity is often very large (4 KiB boundaries), and the cost of finer-grained access control means that it is not enabled by default. There are two common ways for errors to cause crashes on modern OSs: either you read or write data from an unmapped page (instant segfault), or you overwrite data that leads to a crash somewhere else. If you're unlucky, then a buffer overrun won't crash (that's right, unlucky) and you won't be able to diagnose it easily.
You can turn instrumentation on, however. When using GCC, compile with Mudflap enabled.
$ gcc -fmudflap -Wall -Wextra test999.c -lmudflap
test999.c: In function ‘main’:
test999.c:3:9: warning: variable ‘arr’ set but not used [-Wunused-but-set-variable]
test999.c:5:1: warning: control reaches end of non-void function [-Wreturn-type]
Here's what happens when you run it:
$ ./a.out
*******
mudflap violation 1 (check/write): time=1362621592.763935 ptr=0x91f910 size=151
pc=0x7f43f08ae6a1 location=`test999.c:4:13 (main)'
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_check+0x41) [0x7f43f08ae6a1]
./a.out(main+0xa6) [0x400a82]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
Nearby object 1: checked region begins 0B into and ends 141B after
mudflap object 0x91f960: name=`alloca region'
bounds=[0x91f910,0x91f919] size=10 area=heap check=0r/3w liveness=3
alloc time=1362621592.763807 pc=0x7f43f08adda1
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_register+0x41) [0x7f43f08adda1]
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_wrap_alloca_indirect+0x1a4) [0x7f43f08afa54]
./a.out(main+0x45) [0x400a21]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
number of nearby objects: 1
Oh look, it crashed.
Note that Mudflap is not perfect, it won't catch all of your errors.
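(A note for readers on newer toolchains: Mudflap was removed from GCC in version 4.9. The modern equivalent is AddressSanitizer, available in both GCC and Clang, which catches the same class of errors at runtime:
$ gcc -fsanitize=address -g test999.c
$ ./a.out
ASan aborts the program and prints a buffer-overflow report pointing at the offending line.)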
Native C arrays do not get bounds checking. That would require additional instructions and data structures. C is designed for efficiency and leanness, so it doesn't specify features that trade performance for safety.
You can use a tool like valgrind, which runs your program in a kind of emulator and attempts to detect such things as buffer overflows by tracking which bytes are initialized and which aren't. But it's not infallible, for example if the overflowing access happens to perform an otherwise-legal access to another variable.
Under the hood, array indexing is just pointer arithmetic. When you say arr[ 150 ], you are just adding 150 times the sizeof one element and adding that to the address of arr to obtain the address of a particular object. That address is just a number, and it might be nonsense, invalid, or itself an arithmetic overflow. Some of these conditions result in the hardware generating a crash, when it can't find memory to access or detects virus-like activity, but none result in software-generated exceptions because there is no room for a software hook. If you want a safe array, you'll need to build functions around the principle of addition.
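For example, here is a sketch of such a checked-access function (the names are mine, not standard):
#include <stdio.h>
#include <stdlib.h>

/* Write arr[i] = value, but trap instead of corrupting memory when i is out of range. */
void checked_store(char *arr, size_t len, size_t i, char value)
{
    if (i >= len) {
        fprintf(stderr, "index %zu out of bounds (length %zu)\n", i, len);
        abort();
    }
    arr[i] = value;
}
Called as checked_store(arr, 10, 150, 'x'), this aborts with a clear message instead of silently scribbling on the stack.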
By the way, the array in your example isn't even technically of fixed size.
int len = 10; /* variable of type int */
char arr[len]; /* variable-length array */
Using a non-const object to set the array size is a new feature since C99. You could just as well have len be a function parameter, user input, etc. This would be better for compile-time analysis:
const int len = 10; /* constant of type int */
char arr[len]; /* constant-length array */
For the sake of completeness: The C standard doesn't specify bounds checking but neither is it prohibited. It falls under the category of undefined behavior, or errors that need not generate error messages, and can have any effect. It is possible to implement safe arrays, various approximations of the feature exist. C does nod in this direction by making it illegal, for example, to take the difference between two arrays in order to find the correct out-of-bounds index to access an arbitrary object A from array B. But the language is very free-form, and if A and B are part of the same memory block from malloc it is legal. In other words, the more C-specific memory tricks you use, the harder automatic verification becomes even with C-oriented tools.
Under the C spec, accessing an element past the end of an array is undefined behaviour. Undefined behaviour means that the specification does not say what would happen -- therefore, anything could happen, in theory. The program might crash, or it might not, or it might crash hours later in a completely unrelated function, or it might wipe your hard drive (if you got unlucky and poked just the right bits into the right place).
Undefined behaviour is not easily predictable, and it should absolutely never be relied upon. Just because something appears to work does not make it right, if it invokes undefined behaviour.
Because you were lucky. Or rather unlucky, because it means it's harder to find the bug.
The runtime will only crash if you start using the memory of another process (or in some cases unallocated memory). Your application is given a certain amount of memory when it opens, which in this case is enough, and you can mess about in your own memory as much as you like, but you'll give yourself a nightmare of a debugging job.
