Comparison to NULL pointer in C

This is code from a tutorial in which the user enters the size of a string and then the string itself. The code uses dynamic memory allocation to reproduce the same string. I have a few doubts about the code:
Why is the *text pointer initialized to NULL at the beginning? Is this initialization used in a later part of the program, or is it just good practice to initialize pointers to NULL?
Why is it comparing the pointer to NULL? Won't the address change once we allocate a string to the pointer? At the end of the string, will the pointer point to NULL (no address)?
What is the use of scanf(" ")?
After freeing the *text pointer, it was again set to NULL. So did it effectively free up the memory?
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int size;
    char *text = NULL; //---------------------------------------------------------->1
    printf("Enter limit of the text: \n");
    scanf("%d", &size);
    text = (char *) malloc(size * sizeof(char));
    if (text != NULL) //---------------------------------------------------------->2
    {
        printf("Enter some text: \n");
        scanf(" "); //---------------------------------------------------------->3
        gets(text);
        printf("Inputted text is: %s\n", text);
    }
    free(text);
    text = NULL; //---------------------------------------------------------->4
    return 0;
}

Why is the *text pointer initialized to NULL at the beginning?
To protect you from your own humanity, mainly. As the code evolves, it's often easy to forget to initialize the pointer in one or more branches of code, and then you're dereferencing an uninitialized pointer - that is undefined behavior, and as such it's not guaranteed to crash. In the worst case, if you don't use proper tools such as Valgrind (it'd point it out right away), you can spend hours or days finding such a problem because of how unpredictable it is, and because the behavior changes based on what else was on the stack before the call - so you might see a "bug" in completely unrelated and perfectly bug-free code.
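To make that concrete, here's a minimal sketch (the function and its name are made up for illustration): when one branch "forgets" to assign the pointer, the NULL initialization leaves the caller with a deterministic, testable value instead of a garbage address.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup: only one branch assigns the result. Because the
 * pointer starts out as NULL, the forgotten branch yields a value the
 * caller can check for, instead of an unpredictable garbage address. */
static char *make_greeting(int formal)
{
    char *result = NULL;                 /* the protective initialization */
    if (formal) {
        result = malloc(sizeof "Good day");
        if (result)
            strcpy(result, "Good day");
    }
    /* the informal branch forgot to assign -- result is NULL, not garbage */
    return result;
}
```

Without the `= NULL`, the informal branch would return whatever happened to be on the stack, and dereferencing it would be undefined behavior that may or may not crash.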
Why is it comparing the pointer to NULL.
Because malloc can return NULL, and just because it returns something doesn't mean you can dereference it. The null pointer value is special: it means "hey, this pointer is not valid, don't use it for anything". So before you dereference anything returned from malloc, you have to check that it's not null. To do otherwise is undefined behavior, and modern compilers may do quite unexpected things to your code when such behavior is present. But before asking such a question I'd advise always checking what the function you're wondering about is actually designed to do. Google cppref malloc and the first hit is: https://en.cppreference.com/w/c/memory/malloc. There, under the heading of Return value, we read:
On failure, returns a null pointer.
That's why it's comparing the pointer to NULL!
What is the use of scanf(" ")?
That one is easy: you could have looked it up yourself. The C standard library is well documented: https://en.cppreference.com/w/c/io/fscanf
When you read it, the relevant part is:
format: pointer to a null-terminated character string specifying how to read the input.
The format string consists of [...]
whitespace characters: any single whitespace character in the format string consumes all available consecutive whitespace characters from the input (determined as if by calling isspace in a loop). Note that there is no difference between "\n", " ", "\t\t", or other whitespace in the format string.
And there's your answer: scanf(" ") will consume any whitespace characters in the input, until it reaches either EOF or the first non-whitespace character.
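You can observe that behavior without touching stdin by using sscanf, which applies the same format rules to a string. A small sketch (the helper name is mine); the %n conversion records how many characters were consumed:

```c
#include <stdio.h>

/* Count how many characters of `input` a leading " " whitespace
 * directive consumes. sscanf follows the same rules as scanf but
 * reads from a string, which makes the behavior easy to observe. */
static int whitespace_consumed(const char *input)
{
    int pos = 0;
    sscanf(input, " %n", &pos);   /* %n records how far scanning got */
    return pos;
}
```

With input `"   hi"` the directive consumes the three leading spaces and stops at `h`; with `"hi"` it consumes nothing. In the program above, scanf(" ") does exactly the same thing against stdin.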
After freeing *text pointer, it was again allocated to NULL. So did it effectively free up memory?
No. First of all, the language used here is wrong: the pointer was assigned the value NULL. Nothing was allocated! A pointer is like a postal address. You can replace it with the word "NOWHERE", and that's what NULL is. By putting something like "this person has no address" in your address book, you have not "allocated" anything.
Yes - free did free the memory. Then you can set it to NULL because you're human, so that you won't forget so easily that the pointer's value is not valid anymore. It's in this case a "note to self". Humans tend to forget that a pointer is null and then will use it. Such use is undefined behavior (your program can do anything, for example erase your hard drive). So the text = NULL assignment has nothing to do with the machine. It has everything to do with you: humans are not perfect, and it's best to program defensively so that you give yourself less chances to introduce a bug as you change the code, or as you work under deadline pressure, etc.
Generally speaking, the NULL assignment at the end of main is not necessary in such a simple, short program. But you have to recognize the fact that text cannot be dereferenced after it has been freed.
Personally, I find it best to leverage the property of C language that gives variables lexical scope. Once the scope ends, the variable is not accessible, so you can't write a bug that would use text - it won't compile. This is called "correctness by design": you design the software in such a way that some bugs are impossible by construction, and if you code the bug then the code won't compile. That's a million times better than catching the bug at runtime, or worse - having to debug it, potentially in unrelated code (remember: undefined behavior is nasty - it often manifests as problems thousands of lines away from the source).
So here's how I'd rewrite it just to address this one issue (there are others still left there):
#include <stdio.h>
#include <stdlib.h>

void process_text(int size)
{
    char *const text = malloc(size * sizeof(char));
    if (!text) return;
    printf("Enter some text: \n");
    scanf(" ");
    gets(text);
    printf("Inputted text is: %s\n", text);
    free(text);
}

int main()
{
    int size;
    printf("Enter limit of the text: \n");
    scanf("%d", &size);
    process_text(size);
}
The scope of text is limited to the block of process_text. You initialize it immediately at the point of declaration: that's always preferred. There's no need to set it to NULL first, since you assign the desired value right away. You check if maybe malloc has returned NULL (i.e. it failed to allocate memory), and if so you immediately return from the function. A NULL check is idiomatically written as if (pointer) /* do something if the pointer is non-null */ or as if (!pointer) /* do something if the pointer IS NULL */. It's less verbose that way, and anyone reading such code is supposed to know what it means if they have any sort of experience. Now you know too what such code means. It's not a big hurdle to be aware of this idiom. It's less typing and less distraction.
Generally speaking, code that returns early should be preferred to nested if blocks and unending levels of indentation. When there are multiple checks before a function can do its job, they often end up in nested if statements, making the function much harder to read.
There's a flip side to that: in C++ the code is supposed to leverage C++ (i.e. it's not just C compiled with a C++ compiler), and the resources that have to be released when returning from a function should be automatically released by the compiler generated code that invokes destructors. But in C no such automatic destructor calls are made. So if you return from a function early, you have to make sure that you've released any resources that were allocated earlier on. Sometimes the nested if statements help with that, so you shouldn't be blindly following some advice without understanding the context and assumptions the advice makes :)
Although it's truly a matter of preference - and I have C++ background where the code written as above is way more natural - in C probably it'd be better not to return early:
void process_text_alternative_version(int size)
{
    char *text = malloc(size * sizeof(char));
    if (text) {
        printf("Enter some text: \n");
        scanf(" ");
        gets(text);
        printf("Inputted text is: %s\n", text);
    }
    free(text);
}
The value of text is only used if it's not null, but we don't return from the function early. This ensures that in all cases the memory block pointed to by text - if any - gets freed! This is very important: it's yet another way to write code that's correct by design, i.e. in a way that makes certain mistakes either impossible or much harder to commit. Written as above, you have no way of forgetting to free the memory (unless you add a return statement somewhere inside).
It must be said that even though some decisions made in the design of the C language library have been atrocious, the interface to free has been thoughtfully made in a way that makes the above code valid. free is explicitly allowed to be passed a null pointer. When you pass it a null pointer - e.g. when malloc above failed to allocate the memory - it will do nothing. That is: "freeing" a null pointer is a perfectly valid thing to do. It doesn't do anything, but it's not a bug. It enables writing code like above, where it's easy to see that in all cases text will be freed.
A VERY IMPORTANT COROLLARY: null pointer checks before free (in C) or delete (in C++) indicate that the author of the code doesn't have a clue about the most basic behavior of free and delete: it's usually an indicator that the code will be written as if it was a black magical art that no mere mortal understands. If the author doesn't understand it, that is. But we can and must do better: we can educate ourselves about what the functions/operators that we use do. It's documented. It costs no money to look that documentation up. People have spent long hours making sure the documentation is there for anyone so inclined to see. Ignoring it is IMHO the very definition of insanity. It's sheer irrationality on a wild rollercoaster ride. For the sane among us: all it takes is a google search that includes the word cppref somewhere. You'll get cppreference links up top, and that's a reliable resource - and collaboratively written, so you can fix any shortcomings you note, since it's a wiki. It's called "cpp"reference, but it really is two references in one: a C++ Reference as well as a C Reference.
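A small sketch of that property (the helper is hypothetical): the cleanup path is identical whether malloc succeeded or failed, precisely because free(NULL) is a defined no-op.

```c
#include <stdlib.h>

/* Allocate, (pretend to) use, and release a buffer. No null check is
 * needed before free: passing a null pointer to free does nothing. */
static int try_roundtrip(size_t n)
{
    char *buf = malloc(n);
    int ok = (buf != NULL);
    if (ok)
        buf[0] = '\0';   /* only touch the memory when it exists */
    free(buf);           /* valid in both the success and failure case */
    return ok;
}
```

Asking malloc for an absurd size (e.g. SIZE_MAX) makes it fail and return NULL on any realistic system, and the free(buf) line is still perfectly fine.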
Back to the code in question, though: someone could have written it as follows:
void process_text_alternative_version_not_recommended(int size)
{
    char *text = malloc(size * sizeof(char));
    if (text) {
        printf("Enter some text: \n");
        scanf(" ");
        gets(text);
        printf("Inputted text is: %s\n", text);
        free(text);
    }
}
It's just as valid, but such a form defeats the purpose: it's not clear at a glance that text is always freed. You have to inspect the condition of the if block to convince yourself that it will indeed get freed. This code will be OK for a while, and then years later someone will change it to have a slightly fancier if condition. And now you've got yourself a memory leak, since in some cases malloc will succeed, but free won't be called. You're now hoping that some future programmer, working under pressure and stress (almost invariably!), will notice and catch the problem. Defensive programming means that we protect ourselves not only from bad inputs (whether errant or malicious), but also from our own inherent human fallibility. Thus it makes most sense in my opinion to use the first alternative version: it won't turn into a memory leak no matter how you modify the if condition. But beware: messing up the if condition may turn it into undefined behavior if the test becomes broken such that the body of the if executes in spite of the pointer being null. Sometimes it's not possible to completely protect ourselves from ourselves.
As far as constness is concerned, there are 4 ways of declaring the text pointer. I'll explain what they all mean:
char *text - a non-const pointer to non-const character(s): the pointer can be changed later to point to something else, and the characters it points to can be changed as well (or at least the compiler won't prevent you from doing it).
char *const text - a const pointer to non-const character(s) - the pointer itself cannot be changed past this point (the code won't compile if you try), but the characters will be allowed to be changed (the compiler won't complain but that doesn't mean that it's valid to do it - it's up to you the programmer to understand what the circumstances are).
const char *text - a non-const pointer to const character(s): the pointer can be changed later to point somewhere else, but the characters it points to cannot be changed using that pointer - if you try, the code won't compile.
const char *const text - a const pointer to const character(s): the pointer cannot be changed after its definition, and it cannot be used to change the character(s) it points to - an attempt to do either will prevent the code from compiling.
We chose variant #2: the pointed-to characters can't be constant since gets will definitely be altering them. If you used the variant #4, the code wouldn't compile, since gets expects a pointer to non-const characters.
Choosing #2 we're less likely to mess it up, and we're explicit: this pointer here will remain the same for the duration of the rest of this function.
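To make the four variants tangible, here's a small compile-and-run sketch (the function is made up for illustration); the commented-out assignments are the ones the compiler rejects:

```c
/* The four const placements from the list above, exercised on a local
 * buffer. The commented-out lines are the ones that fail to compile. */
static char demo_const_variants(void)
{
    char buf[] = "hello";

    char *p1 = buf;               /* 1: pointer and chars both mutable  */
    char *const p2 = buf;         /* 2: pointer fixed, chars mutable    */
    const char *p3 = buf;         /* 3: pointer mutable, chars fixed    */
    const char *const p4 = buf;   /* 4: both fixed                      */

    p1 = buf + 1;                 /* OK for variant 1                   */
    p2[0] = 'H';                  /* OK: chars behind a char *const     */
    p3 = buf + 2;                 /* OK: the pointer may be reseated    */
    /* p2 = buf;     error: assignment of read-only variable p2         */
    /* p3[0] = 'x';  error: assignment of read-only location            */
    /* p4 = buf;     error: p4 is read-only in both respects            */
    (void)p1; (void)p3; (void)p4;

    return buf[0];                /* 'H' after the write through p2     */
}
```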
We also free the pointer immediately before leaving the function: there's no chance we'll inadvertently use it after it was freed, because there's literally nothing done after free.
This coding style protects you from your own humanity. Remember that a lot of software engineering has nothing whatsoever to do with machines. The machine doesn't care much about how comprehensible the code is: it will do what it's told - the code can be completely impenetrable to any human being. The machine doesn't care one bit. The only entities that are affected - positively or negatively - by the design of the code are the human developers, maintainers, and users. Their humanity is an inseparable aspect of their being, and that implies that they are imperfect (as opposed to the machine, which normally is completely dependable).
Finally, this code has a big problem - it again has to do with humans. Indeed you ask the user to enter the size limit for the text. But the assumption must be that humans - being humans - will invariably mess it up. And you'll be absolutely in the wrong if you blame them for messing it up: to err is human, and if you pretend otherwise then you're just an ostrich sticking your head in the sand and pretending there's no problem.
The user can easily make a mistake and enter text longer than the size they declared. That's undefined behavior: the program at this point can do anything, up to and including erasing your hard drive. Here it's not even a joke: in some circumstances it's possible to artificially create an input to this program that would cause the hard drive to indeed be wiped. You may think that it's a far-off possibility, but that's not the case. If you wrote this sort of a program on an Arduino, with an SD card attached, I could create input for both size and text that would cause the contents of the SD card to be zeroed - possibly even an input that can all be typed on a keyboard without use of special control characters. I'm 100% serious here.
Yes, typically this "undefined behavior means you'll format your hard drive" is said tongue-in-cheek, but that doesn't preclude it from being a true statement in the right circumstances (usually the more expensive the circumstances, the truer it becomes - such is life). Of course in most cases the user is not malicious - merely error-prone: they'll burn your house down because they were drunk, not because they tried to kill you - that's an awesome consolation, I'm sure! But if you get a user that's an adversary - oh boy, they absolutely will leverage all such buffer overrun bugs to take over your system, and soon make you think hard about your choice of career. Maybe landscaping doesn't look all that bad in retrospect when the alternative is to face a massive lawsuit over loss of data (whether disclosure of data or a true loss when the data is wiped and lost).
To this effect, gets() is an absolutely forbidden sort of an interface: it's not possible to make it safe, that is: to make it work when faced with users that are either human, drunk and just error-prone, or worse - an adversary determined to create yet another "Data leak from Bobby Tables' Bank, Inc." headline in the newspaper.
In the second round of fixes, we need to get rid of the gets call: it's basically a big, absurdly bad mistake that the authors of the original C standard library have committed. I am not joking when I say that millions if not billions of dollars have been lost over decades because gets and similarly unsafe interfaces should never ever have been born, and because programmers have been unwittingly using them in spite of their inherently broken, dangerous and unsafe design. What's the problem: well, how on Earth can you tell gets to limit the length of input to actually fit in however much memory you have provided? Sadly, you can't. gets assumes that you-the-programmer have made no mistakes, and that wherever the input's coming from will fit into the space available. Ergo gets is totally utterly broken and any reasonable C coding standard will simply state "Calls to gets are not allowed".
Yes. Forget about gets. Forget about any examples you saw of people calling gets. They are all wrong. Every single one of them. I'm serious. All code using gets is broken; there's no qualification here. If you use gets, you're basically saying "Hey, I've got nothing to lose. If some big institution exposes millions of their users' data, I'm fine with getting sued and having to live under a bridge thereafter". I bet you'd be not so happy about getting sued by a million angry users, so that's where the tale of gets ends. From now on it doesn't exist, and if someone tells you about using gets, you need to look at them weird and say "WTF are you talking about? Have you lost your mind?!". That's the only proper response. It's that bad of a problem. No exaggeration here, I'm not trying to scare you.
As for what to do instead of gets? Of course it's a well solved problem. See this question to learn everything you should know about it!
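For completeness, here's the usual fgets-based idiom that replaces gets - a sketch, with a helper name of my choosing. It never writes more than size bytes into the buffer and strips the trailing newline that fgets keeps:

```c
#include <stdio.h>
#include <string.h>

/* A bounded line reader: reads at most size-1 characters into buf and
 * strips the trailing newline, if any. Returns buf, or NULL on
 * EOF/error. This is the standard safe replacement for gets. */
static char *read_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if present */
    return buf;
}
```

Unlike gets, input longer than the buffer is simply truncated instead of trampling over whatever lives next to the buffer in memory.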

In this function:
1. It is not needed at all, as there is no danger that the automatic variable will be used uninitialized in this function.
2. This test checks whether malloc was successful or not. If malloc fails, it returns NULL.
3. A bit weird way to skip blanks.
4. This statement is not needed at all. The function terminates and the variable ceases to exist.
The conclusion: I would rather not recommend this kind of code to be used as an example when you learn programming. The author's C knowledge is IMO very limited.

Whenever we declare a variable, it is good practice to initialize it with some value. As you are declaring a pointer for a dynamically allocated array here, you initialize it with NULL.
It is set to NULL so that it can be used to check whether text is valid or not. If malloc somehow failed, text would still be NULL. So you can check whether malloc failed to allocate the memory. Try to put an invalid number for size, like -1. You will see that the program won't prompt for the text input, as malloc failed and text is still NULL. I think this answers your queries 1, 2, and 4 about why text is being set to NULL and why it is checked against NULL.
For the 3rd query: after you enter the size with scanf("%d", &size), you press Enter. If you don't use scanf(" "), that Enter (the newline left in the input buffer) will be taken as the end of input for gets(text), and text will always be empty. So scanf(" ") is used to ignore the Enter pressed after scanf("%d", &size).

Related

Assign to a null pointer an area inside a function and preserve the value outside

I have a function that reads from a socket. It returns a char** where the packets are stored, and my intention is to use an unsigned int pointer, initialized to NULL, in which I store the length of each packet.
char **readPackets(int numToRead, unsigned int **lens, int socket)
{
    char **packets = (char **)malloc(numToRead);
    int *len = (int *)malloc(sizeof(int) * numToRead);
    *(lens) = len;
    for (int i = 0; i < numToRead; i++) {
        //read
        packets[i] = (char *)malloc(MAX_ETH_LEN);
        register int pack_len = read(socket, packets[i], MAX_ETH_LEN);
        //TODO handle error in case of freezing
        if (pack_len <= 0) {
            i--;
            continue;
        }
        len[i] = pack_len;
    }
    return packets;
}
I use it in this way:
unsigned int *lens_out=NULL;
char **packets=readPackets(N_PACK,&lens,sniff_sock[handler]);
where N_PACK is a constant defined previously.
Now the problem is that when I am inside the function everything works, in fact *(lens) points to the same memory area of len and outside the function lens_out points to the same area too. Inside the function len[i] equals to *(lens[i]) (I checked it with gdb).
The problem is that outside the function even if lens_out points to the same area of len elements with same index are different for example
len[0]=46
lens_out[0]=4026546640
Can anyone explain where I made the mistake?
Your statement char** packets=(char**)malloc(numToRead) for sure does not reserve enough memory. Note that an element of the packets array is of type char*, and that sizeof(char*) is probably 8 (possibly 4), but very, very unlikely 1. So you should write
char** packets = malloc(sizeof(char*) * numToRead)
Otherwise, you write out of the bounds of reserved memory, thereby yielding undefined behaviour (probably the one you explained).
Note further that with i--; continue;, you get memory leaks since you assign a new memory block to the ith element, but you lose reference to the memory reserved right before. Write free(packets[i]);i--;continue; instead.
Further, len[0] is an integral type, whereas lens[0] refers to a pointer to int. Comparing these two does not make sense.
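Putting those fixes together, the allocation part might look like the following sketch (the helper name and the error-handling strategy are mine; the read loop from the question would follow). It also makes the length array unsigned int so it matches the lens parameter's type:

```c
#include <stdlib.h>

/* Allocate `count` packet buffers of `bytes` each, plus a parallel
 * length array handed back through *lens. Returns NULL (and frees
 * everything allocated so far) if any allocation fails. This is only
 * the corrected allocation logic; the socket-reading loop is separate. */
static char **alloc_packets(int count, unsigned int **lens, size_t bytes)
{
    char **packets = malloc(sizeof *packets * count);  /* element size! */
    unsigned int *len = malloc(sizeof *len * count);
    if (!packets || !len) {
        free(packets);          /* free(NULL) is fine */
        free(len);
        return NULL;
    }
    for (int i = 0; i < count; i++) {
        packets[i] = malloc(bytes);
        if (!packets[i]) {
            while (i-- > 0)     /* unwind what we already allocated */
                free(packets[i]);
            free(packets);
            free(len);
            return NULL;
        }
    }
    *lens = len;
    return packets;
}
```

The key change is `sizeof *packets * count` instead of a bare `count`: malloc knows nothing about the element type, so the caller must supply the full byte count.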
Firstly, I want to put it out there that you should write clear code for the sake of future maintenance, and not for what you think is optimal (yet). This entire function should merely be replaced with read. That's the crux of my post.
Now the problem is that when I am inside the function everything works
I disagree. On a slightly broader topic, the biggest problem here is that you've posted a question containing code which doesn't compile when copied and pasted unmodified, and the question isn't about the error messages, so we can't answer the question without guessing.
My guess is that you haven't noticed these error messages; you're running a stale binary which we don't have the source code for, we can't reproduce the issue and we can't see the old source code, so we can't help you. It is as valid as any other guess. For example, there's another answer which speculates:
Your statement char** packets=(char**)malloc(numToRead) for sure does not reserve enough memory.
The malloc manual doesn't guarantee that precisely numToRead bytes will be allocated; in fact, allocations to processes tend to be performed in pages, and just as the sleep manual doesn't guarantee a precise number of milliseconds/microseconds, it may allocate more than requested; when it cannot satisfy the request, malloc must return NULL, which your code needs to check.
It's extremely common for implementations to seem to behave correctly when a buffer is overflowed anyway. Nonetheless, it'd be best if you fixed that buffer overflow. malloc doesn't know about the type you're allocating; you need to tell it everything about the size, not just the number of elements.
P.S. You probably want select or sleep within your loop, you know, to "handle error in case of freezing" or whatever. Generally, OSes will switch context to another program when you call one of those, and only switch back when there's data ready to process. By calling sleep after sending or receiving, you give the OS a heads up that it needs to perform some I/O. The ability to choose that timing can be beneficial, when you're optimising. Not at the moment, though.
Inside the function len[i] equals to *(lens[i]) (I checked it with gdb).
I'm fairly sure you've misunderstood that. Perhaps gdb is implicitly dereferencing your pointers, for you; that's really irrelevant to C (so don't confuse anything you learn from gdb with anything C-related).
In fact, I strongly recommend learning a little bit less about gdb and a lot more about assert, because the former won't help you document your code for future maintenance from other people, including us, those who you ask questions to, where-as the latter will. If you include assert in your code, you're almost certainly strengthening your question (and code) much more than including gdb into your question would.
The types of len[i] and *(len[i]) are different, and their values are affected by the way types are interpreted. These values can only be considered equal when they're converted to the same type. We can see this through C11/3.19p1 (the definition of "value", where the standard establishes it is dependent upon type). len[i] is an int * value, whereas *(len[i]) is an int value. The two categories of values might have different alignment, representation and... well, they have different semantics entirely. One is for integral data, and the other is a reference to an object or array. You shouldn't be comparing them, no matter how equal they may seem; the information you obtain from such a comparison is virtually useless.
You can't use len[i] in a multiplication expression, for example. They're certainly not equal in that respect. They might compare equal (as a side-effect of comparison introducing implicit conversions), which is useless information for you to have, and that is a different story.
memcmp((int[]){0}, (unsigned char[]){ [sizeof(int)] = 42 }, sizeof(int)) may return 0 indicating that they're equal, but you know that array of characters contains an extra byte, right? Yeh... they're equal...
You must check the return value of malloc (and don't cast the return value), if you're using it, though I really think you should reconsider your options with that regard.
The fact that you use malloc means everyone who uses your function must then use free; it's locking down-stream programmers into an anti-pattern that can tear the architecture of software apart. You should separate categories of allocation logic and user interface logic from processing logic.
For example, you use read which gives you the opportunity to choose whatever storage duration you like. This means you have an immense number of optimisation opportunities. It gives you, the downstream programmer, the opportunity to write flexible code which assigns whatever storage duration you like to the memory used. Imagine if, on the other hand, you had to free every return value from every function... That's the mess you're encouraging.
This is especially a poor, inefficient design when constants are involved (i.e. your usecase), because you could just use an automatic array and get rid of the calls to malloc and free altogether... Your downstream programmers code could be:
char packet[size][count];
int n_read = read(fd, packet, size * count);
Perhaps you think using malloc to allocate (and later read) n spaces for packets is faster than using something else to allocate n spaces. You should test that theory, because from my experience computers tend to be optimised for simpler, shorter, more concise logic.
In anticipation:
But I can't return packet; like that!
True. You can't return packet; to your downstream programmer, so you modify an object pointed at by an argument. That doesn't mean you should use malloc, though.
Unfortunately, too many programs are adopting this "use malloc everywhere" mentality. It's reminiscent of the "don't use goto" crap that we've been fed. Rather than listening to cargo cult propaganda, I recommend thinking critically about what you hear, because your peers are in the same position as you; they don't necessarily know what they're talking about.

array of pointers in c the code should not run

//this code should give a segmentation error... but it works fine... how is that possible? I just got this code by hit and trial while I was trying out some code on the topic of ARRAYS OF POINTERS... please can anyone explain
int main()
{
    int i, size;
    printf("enter the no of names to be entered\n");
    scanf("%d", &size);
    char *name[size];
    for (i = 0; i < size; i++)
    {
        scanf("%s", name[i]);
    }
    printf("the names in your array are\n");
    for (i = 0; i < size; i++)
    {
        printf("%s\n", &name[i]);
    }
    return 0
The problem in your code (which is incomplete, BTW; you need #include <stdio.h> at the top and a closing } at the bottom) can be illustrated in a much shorter chunk of code:
char *name[10]; // make the size an arbitrary constant
scanf("%s", name[0]); // Read into memory pointed to by an uninitialized pointer
(name could be a single pointer rather than an array, but I wanted to preserve your program's structure for clarity.)
The pointer name[0] has not been initialized, so its value is garbage. You pass that garbage pointer value to scanf, which reads characters from stdin and stores them in whatever memory location that garbage pointer happens to point to.
The behavior is undefined.
That doesn't mean that the program will die with a segmentation fault. C does not require checking for invalid pointers (nor does it forbid it, but most implementations don't do that kind of checking). So the most likely behavior is that your program will take whatever input you provide and attempt to store it in some arbitrary memory location.
If the garbage value of name[0] happens to point to a detectably invalid memory location, your program might die with a segmentation fault. That's if you're lucky. If you're not, it might happen to point to some writable memory location that your program is able to modify. Storing data in that location might be harmless, or it might clobber some critical internal data structure that your program depends on.
Again, your program's behavior is undefined. That means the C standard imposes no requirements on its behavior. It might appear to "work", it might blow up in your face, or it might do anything that it's physically possible for a program to do. Appearing to behave correctly is probably the worst consequence of undefined behavior, since it makes it difficult to diagnose the problem (which will probably appear during a critical demo).
Incidentally, using scanf with a bare %s format specifier (no maximum field width) is inherently unsafe, since there's then no limit on the amount of data it will attempt to read. Even with a properly initialized pointer, there's no way to guarantee that it points to enough memory to hold whatever input it receives.
You may be accustomed to languages that do run-time checking and can reliably detect (most) problems like this. C is not such a language.
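As a sketch of how the program could be repaired (the helper name and the length limit are assumptions of mine): allocate storage for each name before reading into it, and bound the conversion with a field width so it cannot overflow that storage:

```c
#include <stdio.h>
#include <stdlib.h>

enum { NAME_MAX_LEN = 31 };   /* assumed limit for this sketch */

/* Read one whitespace-delimited name into freshly allocated storage,
 * bounding the scanf conversion with a field width so it cannot
 * overflow the buffer. Returns NULL on allocation failure or EOF. */
static char *read_name(FILE *in)
{
    char *name = malloc(NAME_MAX_LEN + 1);
    if (name == NULL)
        return NULL;
    if (fscanf(in, "%31s", name) != 1) {  /* width must match the size */
        free(name);
        return NULL;
    }
    return name;
}
```

The caller would invoke read_name once per loop iteration, store the result in name[i], check it for NULL, and free each element afterwards.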
I'm not sure what your test case is (not enough reputation to post a comment). I just tried inputs of 0\n and 1\n1\n2\n.
It's a little complex to explain in detail, but let's start :-). There are two things you should know. First, main() is a function. Second, char *name[size]; uses a C99 feature, the variable-length array, or the GNU extension of zero-length arrays (supported by gcc).
main() is a function, so all the variables declared in it are local variables. Local variables live in the stack section. You must know that first.
If you input 1\n1\n2\n, a variable-length array is used. The implementation allocates it on the stack too. Notice that the value of each element in the array is not initialized to 0. That is a possible explanation for why your program runs without a segmentation fault: you can't guarantee that the garbage value will point to an address that isn't writable (at least, it didn't fail for me).
If the input is 0\n, you use the zero-length array extension supported by GNU. As you saw, it means there are no elements in the array. The value of name is equal to &size, because size is the last local variable you declared before name (think of the stack pointer). So name[0] is what you get by dereferencing &size, which is zero (= '\0'), and it appears to work fine.
The simple answer to your question is that a segmentation fault is:
A segmentation fault (aka segfault) is caused by a program trying to read or write an illegal memory location.
So it all depends upon what is classed as illegal. If the memory in question is part of the valid address space for the process (e.g. the stack), it may not cause a segfault.
When I run this code in a debugger the line:
scanf("%s", name[i]);
overwrites the content of the size variable, clearly not the intended behaviour, and the code essentially goes into an infinite loop.
But that is just what happens on my 64 bit Intel linux machine using gcc 5.4. Another environment will probably do something different.
If I put the missing & in front of name[i], it works OK. Whether that is luck, or expertly exploits the intended behaviour of C99 variable-length arrays as suggested, I'm afraid I don't know.
So welcome to the world of subtle memory overwriting bugs.

Char array can hold more than expected

I tried to run this code in C and expected a runtime error, but it actually ran without errors. Can you tell me the reason why this happens?
char str[10];
scanf("%s",str);
printf("%s",str);
As I declared the size of the array as 10, how can the code print a string of more than 10 letters?
As soon as you read or write from an array outside of its bounds, you invoke undefined behavior.
Whenever this happens, the program may do anything it wants. It is even allowed to play you a birthday song although it's not your birthday, or to transfer money from your bank account. Or it may crash, or delete files, or just pretend nothing bad happened.
In your case, it did the latter, but it is in no way guaranteed.
To learn further details about this phenomenon, read something about exploiting buffer overflows, this is a large topic.
C doesn't perform any bounds checking on arrays. This can lead to buffer overflow attacks on your executable.
The bounds checking has to be done by you, the programmer, to make the code resistant to buffer overflows.
Instead of typing magic numbers when reading input with fgets into an array, always pass sizeof(array) as the size; fgets then reads at most sizeof(array) - 1 characters, leaving space for the '\0' character.
This is a good question, and the answer is that there is indeed a memory problem. The string is read and stored starting at the address of str, for as many bytes as the actual input demands, and that exceeds the space you allocated for it.
Now, it may not crash immediately, or even ever for short programs, but it's very likely that when you expand the program and define other variables, this string will overrun them, creating weird bugs of all kinds, and it may eventually also crash.
In short, this is a real error, but it's not uncommon to have memory bugs like this one which have no visible effect at first, yet create bugs or crash the program later.

C Program crashes when adding an extra int

I am new to C and am using the Eclipse IDE.
The following code works fine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *lineName;
int stationNo;
int i;
while (scanf("%s (%d)", lineName, &stationNo)!=EOF) {
for (i=0; i<5 ; i++ ){
printf("%d",i);
}
}
return 0;
}
Input:
Green (21)
Red (38)
Output:
Green (21)
Red (38)
0123401234
However, when simply add a new int:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *lineName;
int stationNo;
int i,b=0;
while (scanf("%s (%d)", lineName, &stationNo)!=EOF) {
printf("%d",b);
for (i=0; i<5 ; i++ ){
printf("%d",i);
}
}
return 0;
}
The program will crash with the same input.
Can anybody tell me why?
You said your first program "works", but it works only by accident. It's like a car zooming down the road with no lugnuts holding on the front wheels, only by some miracle they haven't fallen off — yet.
You said
char *lineName;
This gives you a pointer variable that can point to some characters, but it doesn't point anywhere yet. The value of this pointer is undefined. It's sort of like saying "int i" and asking what the value of i is.
Next you said
scanf("%s (%d)", lineName, &stationNo)
You're asking scanf to read a line name and store the string in the memory pointed to by lineName. But where is that memory? We have no idea whatsoever!
The situation with uninitialized pointers is a little trickier to think about because, as always, with pointers we have to distinguish between the value of the pointer as opposed to the data at the memory which the pointer points to. Earlier I mentioned saying int i and asking what the value of i is. Now, there's going to be some bit pattern in i — it might be 0, or 1, or -23, or 8675309.
Similarly, there's going to be some bit pattern in lineName — it might "point at" memory location 0x00000000, or 0xffe01234, or 0xdeadbeef. But then the questions are: is there actually any memory at that location, and do we have permission to write to it, and is it being used for anything else? If there is memory and we do have permission and it's not being used for anything else, the program might seem to work — for now. But those are three pretty big ifs! If the memory doesn't exist, or if we don't have permission to write to it, the program is probably going to crash when it tries. And if the memory is being used for something else, something's going to go wrong — if not now, then later — when we ask scanf to write its string there.
And, really, if what we care about is writing programs that work (and that work for the right reasons), we don't have to ask any of these questions. We don't have to ask where lineName points when we don't initialize it, or whether there's any memory there, or if we have permission to write to it, or if it's being used for something else. Instead, we should simply, actually, initialize lineName! We should explicitly make it point to memory that we do own and that we are allowed to write to and that isn't being used for anything else!
There are several ways to do this. The easiest is to use an array for lineName, not a pointer:
char lineName[20];
Or, if we have our hearts set on using a pointer, we can call malloc:
char *lineName = malloc(20);
However, if we do that, we have to check to make sure malloc succeeded:
if(lineName == NULL) {
fprintf(stderr, "out of memory!\n");
exit(1);
}
If you make either of those changes, your program will work.
...Well, actually, we're still in a situation where your program will seem to work, even though it still has another, pretty serious, lurking problem. We've allocated 20 characters for lineName, which gives us 19 actual characters, plus the trailing '\0'. But we don't know what the user is going to type. What if the user types 20 or more characters? That will cause scanf to write more than 20 characters to lineName, off past the end of what lineName's memory is allowed to hold, and we're back in the situation of writing to memory that we don't own and that might be in use for something else.
One solution is to make lineName bigger — declare it as char lineName[100], or call malloc(100). But that just moves the problem around — now we have to worry about the (perhaps smaller) chance that the user will type 100 or more characters. So the next thing to do is to tell scanf not to write more to lineName than we've arranged for it to hold. This is actually pretty simple. If lineName is still set up to hold 20 characters, just call
scanf("%19s (%d)", lineName, &stationNo)
That format specifier %19s tells scanf that it's only allowed to read and store a string of up to 19 characters long, leaving one byte free for the terminating '\0' that it's also going to add.
Now, I've said a lot here, but I realize I haven't actually gotten around to answering the question of why your program went from working to crashing when you made that seemingly trivial, seemingly unrelated change. This ends up being a hard question to answer satisfactorily. Going back to the analogy I started this answer with, it's like asking why you were able to drive the car with no lugnuts to the store with no problem, but when you tried to drive to grandma's house, the wheels fell off and you crashed into a ditch. There are a million possible factors that might have come into play, but none of them change the underlying fact that driving a car with the wheels not fastened on is a crazy idea, that's not guaranteed to work at all.
In your case, the variables you're talking about — lineName, stationNo, i, and then b — are all local variables, typically allocated on the stack. Now, one of the characteristics of the stack is that it gets used for all sorts of stuff, and it never gets cleared between uses. So if you have an uninitialized local variable, the particular random bits that it ends up containing depend on whatever was using that piece of the stack last time. If you change your program slightly so that different functions get called, those different functions may leave different random values lying around on the stack. Or if you change your function to allocate different local variables, the compiler may place them in different spots on the stack, meaning that they'll end up picking up different random values from whatever was there last time.
Anyway, somehow, with the first version of your program, lineName ended up containing a random value that corresponded to a pointer that pointed to actual memory which you could get away with writing to. But when you added that fourth variable b, things moved around just enough that lineName ended up pointing to memory that didn't exist or that you didn't have permission to write to, and your program crashed.
Make sense?
And now, one more thing, if you're still with me. If you stop and think, this whole thing might be kind of unsettling. You had a program (your first program) that seemed to work just fine, but actually had a decently horrible bug. It wrote to random, unallocated memory. But when you compiled it you got no fatal error messages, and when you ran it there was no indication that anything was amiss. What's up with that?
The answer, as a couple of the comments alluded to, involves what we call undefined behavior.
It turns out that there are three kinds of C programs, which we might call the good, the bad, and the ugly.
Good programs work for the right reasons. They don't break any rules, they don't do anything illegal. They don't get any warnings or error messages when you compile them, and when you run them, they just work.
Bad programs break some rule, and the compiler catches this, and issues a fatal error message, and declines to produce a broken program for you to try to run.
But then there are the ugly programs, that engage in undefined behavior. These are the ones that break a different set of rules, the ones that, for various reasons, the compiler is not obliged to complain about. (Indeed the compiler may or may not even be able to detect them). And programs that engage in undefined behavior can do anything.
Let's think about that last point a little more. The compiler is not obligated to generate error messages when you write a program that uses undefined behavior, so you might not realize you've done it. And the program is allowed to do anything, including work as you expect. But then, since it's allowed to do anything, it might stop working tomorrow, seemingly for no reason at all, either because you made some seemingly innocuous change to it, or merely because you're not around to defend it, as it quietly runs amok and deletes all your customer's data.
So what are you supposed to do about this?
One thing is to use a modern compiler if you can, and turn on its warnings, and pay attention to them. (Good compilers even have an option called "treat warnings as errors", and programmers who care about correct programs usually turn this option on.) Even though, as I said, they're not required to, compilers are getting better and better at detecting undefined behavior and warning you about it, if you ask them to.
And then the other thing, if you're going to be doing a lot of C programming, is to take care to learn the language, what you're allowed to do, what you're not supposed to do. Make a point of writing programs that work for the right reasons. Don't settle for a program that merely seems to work today. And if someone points out that you're depending on undefined behavior, don't say, "But my program works — why should I care?" (You didn't say this, but some people do.)

Is the function strcpy always dangerous?

Are functions like strcpy, gets, etc. always dangerous? What if I write a code like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
char *str2 = malloc(100);
strcpy(str2, str1);
}
This way the function doesn't accept arguments(parameters...) and the str variable will always be the same length...which is here 16 or slightly more depending on the compiler version...but yeah 100 will suffice as of march, 2011 :).
Is there a way for a hacker to take advantage of the code above?
Thanks!
Absolutely not. Contrary to Microsoft's marketing campaign for their non-standard functions, strcpy is safe when used properly.
The above is redundant, but mostly safe. The only potential issue is that you're not checking the malloc return value, so you may be dereferencing null (as pointed out by kotlinski). In practice, this likely to cause an immediate SIGSEGV and program termination.
An improper and dangerous use would be:
char array[100];
// ... Read line into uncheckedInput
// Extract substring without checking length
strcpy(array, uncheckedInput + 10);
This is unsafe because the strcpy may overflow, causing undefined behavior. In practice, this is likely to overwrite other local variables (itself a major security breach). One of these may be the return address. Through a return to lib C attack, the attacker may be able to use C functions like system to execute arbitrary programs. There are other possible consequences to overflows.
However, gets is indeed inherently unsafe, and will be removed from the next version of C (C1X). There is simply no way to ensure the input won't overflow (causing the same consequences given above). Some people would argue it's safe when used with a known input file, but there's really no reason to ever use it. POSIX's getline is a far better alternative.
Also, the length of str1 doesn't vary by compiler. It should always be 17, including the terminating NUL.
You are forcefully stuffing completely different things into one category.
The function gets is indeed always dangerous. There's no way to make a safe call to gets regardless of what steps you are willing to take and how defensive you are willing to get.
Function strcpy is perfectly safe if you are willing to take the [simple] necessary steps to make sure that your calls to strcpy are safe.
That already puts gets and strcpy in vastly different categories, which have nothing in common with regard to safety.
The popular criticisms directed at safety aspects of strcpy are based entirely on anecdotal social observations as opposed to formal facts, e.g. "programmers are lazy and incompetent, so don't let them use strcpy". Taken in the context of C programming, this is, of course, utter nonsense. Following this logic we should also declare the division operator exactly as unsafe for exactly the same reasons.
In reality, there are no problems with strcpy whatsoever. gets, on the other hand, is a completely different story, as I said above.
Yes, it is dangerous. After 5 years of maintenance, your code will look like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
{enough lines have been inserted here so as to not have str1 and str2 nice and close to each other on the screen}
char *str2 = malloc(100);
strcpy(str2, str1);
}
at that point, someone will go and change str1 to
str1 = "THIS IS A REALLY LONG STRING WHICH WILL NOW OVERRUN ANY BUFFER BEING USED TO COPY IT INTO UNLESS PRECAUTIONS ARE TAKEN TO RANGE CHECK THE LIMITS OF THE STRING. AND FEW PEOPLE REMEMBER TO DO THAT WHEN BUGFIXING A PROBLEM IN A 5 YEAR OLD BUGGY PROGRAM"
and forget to look where str1 is used and then random errors will start happening...
Your code is not safe. The return value of malloc is unchecked, if it fails and returns 0 the strcpy will give undefined behavior.
Besides that, I see no problem other than that the example basically does not do anything.
strcpy isn't dangerous as far as you know that the destination buffer is large enough to hold the characters of the source string; otherwise strcpy will happily copy more characters than your target buffer can hold, which can lead to several unfortunate consequences (stack/other variables overwriting, which can result in crashes, stack smashing attacks & co.).
But: if you have a generic char * in input which hasn't been already checked, the only way to be sure is to apply strlen to such string and check if it's too large for your buffer; however, now you have to walk the entire source string twice, once for checking its length, once to perform the copy.
This is suboptimal, since, if strcpy were a little bit more advanced, it could receive as a parameter the size of the buffer and stop copying if the source string were too long; in a perfect world, this is how strncpy would perform (following the pattern of other strn*** functions). However, this is not a perfect world, and strncpy is not designed to do this. Instead, the nonstandard (but popular) alternative is strlcpy, which, instead of going out of the bounds of the target buffer, truncates.
Several CRT implementations do not provide this function (notably glibc), but you can still get one of the BSD implementations and put it in your application. A standard (but slower) alternative can be to use snprintf with "%s" as format string.
That said, since you're programming in C++ (edit I see now that the C++ tag has been removed), why don't you just avoid all the C-string nonsense (when you can, obviously) and go with std::string? All these potential security problems vanish and string operations become much easier.
The only way malloc may fail is when an out-of-memory error occurs, which is a disaster by itself. You cannot reliably recover from it because virtually anything may trigger it again, and the OS is likely to kill your process anyway.
As you point out, under constrained circumstances strcpy isn't dangerous. It is more typical to take in a string parameter and copy it to a local buffer, which is when things can get dangerous and lead to a buffer overrun. Just remember to check your copy lengths before calling strcpy and null terminate the string afterward.
Aside for potentially dereferencing NULL (as you do not check the result from malloc) which is UB and likely not a security threat, there is no potential security problem with this.
gets() is always unsafe; the other functions can be used safely.
gets() is unsafe even when you have full control on the input -- someday, the program may be run by someone else.
The only safe way to use gets() is to use it for a single run thing: create the source; compile; run; delete the binary and the source; interpret results.
