Why can I use a char pointer without malloc? - c

I've programmed something similar and I'm wondering why it works...
char *produceAString(void) {
    char *myString;
    while (somethingIsGoingOn) {
        // fill myString with a random amount of chars
    }
    return myString;
}
The theory tells me that I should use malloc to allocate space when I'm using pointers. But in this case I don't know how much space I need for myString, so I just skipped it.
But why does this work? Is it just bad code, which luckily worked for me, or is there something special behind char pointers?

It worked due to pure chance. It might not work the next time you try it. Uninitialized pointers can point anywhere in memory. Writing to them can cause an instant access violation, or a problem that will manifest later, or nothing at all.

This is generally bad code, yes. Also, whatever compiler you use is probably either not very smart or has warnings turned off, since compilers usually throw an error, or at least a warning like "variable used uninitialized", which is exactly what is happening here.
You are in (bad) luck that when the code runs, the pointer is garbage, and somehow the OS allows the write (or read). Perhaps you are running in debug mode?
My personal experience is that in some cases what the OS will do is predictable, but you should never, ever rely on that. One example: if you build with MinGW in debug mode, the uninitialized values usually follow a pattern or are zero; in a release build they are usually complete random junk.
Since you are "pointing to a memory location", the pointer must point to a valid location, either another variable or space allocated at run time (malloc). What you are doing is neither, so you basically read/write a random memory block, and because of some black magic the app doesn't crash. Are you running on Windows 2000 or XP? Those are not as restrictive as Windows has been since Vista; I remember that back in the day I did something similar under Windows XP and nothing happened when it was supposed to crash.
So, generally: allocate, or point to, a memory block before you use the pointer. If you don't know how much memory you need, use realloc, or simply figure out a strategy that has the smallest footprint for your specific case.
One way to see what C actually does is to change this line
char* myString;
into
char* myString=(char*)0;
and break on that line with a debugger and watch the myString variable: it will be junk, or, if the compiler zero-initializes it, it will be 0. Then the rest of your code fails with an access violation because the pointer points "nowhere".
The normal operation would be
char* myString=(char*)malloc(125); // whatever amount you want
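If you don't know the final size up front, as in the original question, one workable strategy is to start with a small buffer and grow it with realloc as data arrives. A minimal sketch (the function name, initial capacity, and the letter-generating loop standing in for "somethingIsGoingOn" are all illustrative):

```c
#include <stdlib.h>

/* Build a string of unknown-in-advance length by growing the buffer. */
char *produce_a_string(void) {
    size_t capacity = 16;               /* initial guess */
    size_t length = 0;
    char *buf = malloc(capacity);
    if (buf == NULL)
        return NULL;

    for (int i = 0; i < 100; i++) {     /* stand-in for the original while loop */
        if (length + 1 >= capacity) {   /* +1 keeps room for the final '\0' */
            char *bigger = realloc(buf, capacity * 2);
            if (bigger == NULL) {
                free(buf);              /* realloc failure leaves the old block valid */
                return NULL;
            }
            buf = bigger;
            capacity *= 2;
        }
        buf[length++] = 'a' + (i % 26);
    }
    buf[length] = '\0';
    return buf;                         /* caller is responsible for free() */
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the final length.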

Related

C Program crashes when adding an extra int

I am new to C and using Eclipse IDE
The following code works fine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *lineName;
    int stationNo;
    int i;
    while (scanf("%s (%d)", lineName, &stationNo) != EOF) {
        for (i = 0; i < 5; i++) {
            printf("%d", i);
        }
    }
    return 0;
}
Input:
Green (21)
Red (38)
Output:
Green (21)
Red (38)
0123401234
However, when I simply add a new int:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *lineName;
    int stationNo;
    int i, b = 0;
    while (scanf("%s (%d)", lineName, &stationNo) != EOF) {
        printf("%d", b);
        for (i = 0; i < 5; i++) {
            printf("%d", i);
        }
    }
    return 0;
}
The program will crash with the same input.
Can anybody tell me why?
You said your first program "works", but it works only by accident. It's like a car zooming down the road with no lugnuts holding on the front wheels, only by some miracle they haven't fallen off — yet.
You said
char *lineName;
This gives you a pointer variable that can point to some characters, but it doesn't point anywhere yet. The value of this pointer is undefined. It's sort of like saying "int i" and asking what the value of i is.
Next you said
scanf("%s (%d)", lineName, &stationNo)
You're asking scanf to read a line name and store the string in the memory pointed to by lineName. But where is that memory? We have no idea whatsoever!
The situation with uninitialized pointers is a little trickier to think about because, as always, with pointers we have to distinguish between the value of the pointer as opposed to the data at the memory which the pointer points to. Earlier I mentioned saying int i and asking what the value of i is. Now, there's going to be some bit pattern in i — it might be 0, or 1, or -23, or 8675309.
Similarly, there's going to be some bit pattern in lineName — it might "point at" memory location 0x00000000, or 0xffe01234, or 0xdeadbeef. But then the questions are: is there actually any memory at that location, and do we have permission to write to it, and is it being used for anything else? If there is memory and we do have permission and it's not being used for anything else, the program might seem to work — for now. But those are three pretty big ifs! If the memory doesn't exist, or if we don't have permission to write to it, the program is probably going to crash when it tries. And if the memory is being used for something else, something's going to go wrong — if not now, then later — when we ask scanf to write its string there.
And, really, if what we care about is writing programs that work (and that work for the right reasons), we don't have to ask any of these questions. We don't have to ask where lineName points when we don't initialize it, or whether there's any memory there, or if we have permission to write to it, or if it's being used for something else. Instead, we should simply, actually, initialize lineName! We should explicitly make it point to memory that we do own and that we are allowed to write to and that isn't being used for anything else!
There are several ways to do this. The easiest is to use an array for lineName, not a pointer:
char lineName[20];
Or, if we have our hearts set on using a pointer, we can call malloc:
char *lineName = malloc(20);
However, if we do that, we have to check to make sure malloc succeeded:
if (lineName == NULL) {
    fprintf(stderr, "out of memory!\n");
    exit(1);
}
If you make either of those changes, your program will work.
...Well, actually, we're still in a situation where your program will seem to work, even though it still has another, pretty serious, lurking problem. We've allocated 20 characters for lineName, which gives us 19 actual characters, plus the trailing '\0'. But we don't know what the user is going to type. What if the user types 20 or more characters? That will cause scanf to write more than 20 characters to lineName, off past the end of what lineName's memory is allowed to hold, and we're back in the situation of writing to memory that we don't own and that might be in use for something else.
One solution is to make lineName bigger — declare it as char lineName[100], or call malloc(100). But that just moves the problem around — now we have to worry about the (perhaps smaller) chance that the user will type 100 or more characters. So the next thing to do is to tell scanf not to write more to lineName than we've arranged for it to hold. This is actually pretty simple. If lineName is still set up to hold 20 characters, just call
scanf("%19s (%d)", lineName, &stationNo)
That format specifier %19s tells scanf that it's only allowed to read and store a string of up to 19 characters long, leaving one byte free for the terminating '\0' that it's also going to add.
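The same width-limited parsing can be sketched without interactive input by running the format against a string with sscanf. The parse_line helper below is illustrative (not part of the original program); note it checks for a return of 2, which is more robust than comparing against EOF, since scanf and sscanf return the number of items converted:

```c
#include <stdio.h>

/* Parse one "Name (number)" record. The %19s width guarantees sscanf
   never stores more than 19 characters plus '\0' into lineName. */
int parse_line(const char *input, char lineName[20], int *stationNo) {
    return sscanf(input, "%19s (%d)", lineName, stationNo) == 2;
}
```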
Now, I've said a lot here, but I realize I haven't actually gotten around to answering the question of why your program went from working to crashing when you made that seemingly trivial, seemingly unrelated change. This ends up being a hard question to answer satisfactorily. Going back to the analogy I started this answer with, it's like asking why you were able to drive the car with no lugnuts to the store with no problem, but when you tried to drive to grandma's house, the wheels fell off and you crashed into a ditch. There are a million possible factors that might have come into play, but none of them change the underlying fact that driving a car with the wheels not fastened on is a crazy idea, that's not guaranteed to work at all.
In your case, the variables you're talking about — lineName, stationNo, i, and then b — are all local variables, typically allocated on the stack. Now, one of the characteristics of the stack is that it gets used for all sorts of stuff, and it never gets cleared between uses. So if you have an uninitialized local variable, the particular random bits that it ends up containing depend on whatever was using that piece of the stack last time. If you change your program slightly so that different functions get called, those different functions may leave different random values lying around on the stack. Or if you change your function to allocate different local variables, the compiler may place them in different spots on the stack, meaning that they'll end up picking up different random values from whatever was there last time.
Anyway, somehow, with the first version of your program, lineName ended up containing a random value that corresponded to a pointer that pointed to actual memory which you could get away with writing to. But when you added that fourth variable b, things moved around just enough that lineName ended up pointing to memory that didn't exist or that you didn't have permission to write to, and your program crashed.
Make sense?
And now, one more thing, if you're still with me. If you stop and think, this whole thing might be kind of unsettling. You had a program (your first program) that seemed to work just fine, but actually had a decently horrible bug. It wrote to random, unallocated memory. But when you compiled it you got no fatal error messages, and when you ran it there was no indication that anything was amiss. What's up with that?
The answer, as a couple of the comments alluded to, involves what we call undefined behavior.
It turns out that there are three kinds of C programs, which we might call the good, the bad, and the ugly.
Good programs work for the right reasons. They don't break any rules, they don't do anything illegal. They don't get any warnings or error messages when you compile them, and when you run them, they just work.
Bad programs break some rule, and the compiler catches this, and issues a fatal error message, and declines to produce a broken program for you to try to run.
But then there are the ugly programs, that engage in undefined behavior. These are the ones that break a different set of rules, the ones that, for various reasons, the compiler is not obliged to complain about. (Indeed the compiler may or may not even be able to detect them). And programs that engage in undefined behavior can do anything.
Let's think about that last point a little more. The compiler is not obligated to generate error messages when you write a program that uses undefined behavior, so you might not realize you've done it. And the program is allowed to do anything, including work as you expect. But then, since it's allowed to do anything, it might stop working tomorrow, seemingly for no reason at all, either because you made some seemingly innocuous change to it, or merely because you're not around to defend it, as it quietly runs amok and deletes all your customer's data.
So what are you supposed to do about this?
One thing is to use a modern compiler if you can, and turn on its warnings, and pay attention to them. (Good compilers even have an option called "treat warnings as errors", and programmers who care about correct programs usually turn this option on.) Even though, as I said, they're not required to, compilers are getting better and better at detecting undefined behavior and warning you about it, if you ask them to.
And then the other thing, if you're going to be doing a lot of C programming, is to take care to learn the language, what you're allowed to do, what you're not supposed to do. Make a point of writing programs that work for the right reasons. Don't settle for a program that merely seems to work today. And if someone points out that you're depending on undefined behavior, don't say, "But my program works — why should I care?" (You didn't say this, but some people do.)

Using Malloc to allocate array size in C

In a program I'm writing, I have an array of accounts (account is a struct I made). I need this visible to all functions and threads in my program. However, I won't know the size it has to be until the main function figures that out, so I created it with:
account *accounts;
and try to allocate space to it in main with this:
numberOfAccounts = 100; // for example
accounts = (account*)malloc(numberOfAccounts * sizeof (account));
However, it appears to be sizing the array larger than it needs to be. For example, accounts[150] exists, and so on.
Is there anything I am doing wrong? How can I get the size of accounts to be exactly 100?
Thanks
You can't do that - malloc() doesn't provide any guarantees about how much memory it actually allocates (except that if it succeeds it will return a pointer to at least as much as you requested). If you access anything outside the range you asked for, it causes undefined behaviour. That means it might appear to work, but there's nothing you can do about that.
BTW, in C you don't need to typecast the returned value from malloc().
Even though it may look like it, accounts[150] does not truly exist.
So why does your program continue to run? Well, that's because even though accounts[150] isn't a real element, it lies within the memory space your program is allowed to access.
C contains no runtime checking of indexes - it just calculates the appropriate address and accesses that. If your program doesn't have access to that memory address, it'll crash with a segmentation fault (or, in Windows terms, an access violation). If, on the other hand, the program is allowed to access that memory address, then it'll simply treat whatever is at that address as an account.
If you try to modify that, almost anything can happen - depending on a wide variety of factors, it could modify some other variables in your program, or given some very unlucky circumstances, it could even modify the program code itself, which could lead to all kinds of funky behavior (including a crash). It is even possible that no side effects can ever be observed if malloc (for whatever reason) allocated more memory than you explicitly requested (which is possible).
If you want to make sure that such errors are caught at runtime, you'll have to implement your own checking and error handling.
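As a sketch of what such checking might look like, here is a tiny wrapper that carries the element count alongside the pointer; the checked_array type and checked_get function are illustrative names, not a standard API:

```c
#include <stddef.h>

/* A pointer bundled with the number of elements it owns. */
typedef struct {
    int *data;
    size_t count;
} checked_array;

/* Returns 1 and stores the element on success; returns 0 for an
   out-of-range index instead of touching stray memory. */
int checked_get(const checked_array *a, size_t index, int *out) {
    if (index >= a->count)
        return 0;
    *out = a->data[index];
    return 1;
}
```

With this, an access like index 150 on a 100-element array is reported instead of silently reading past the end.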
I can't seem to find anything wrong with what you provide. If you have a struct, e.g.:
struct account {
    int a, b, c, d;
    float e, f, g, h;
};
Then you can indeed create an array of accounts using:
struct account *accounts = (struct account *) malloc(numAccounts * sizeof(struct account));
Note that for C the casting of void* (return type of malloc) is not necessary; it is converted automatically.
[edit]
Ahhh! I see your problem now! Right. Yes you can still access accounts[150], but basically what happens is that accounts will point to some memory location. accounts[150] simply points 150 times the size of the struct further. You can get the same result by doing this:
*(accounts + 150), which basically says: Give me the value at location accounts+150.
This memory is simply not reserved, and therefore causes undefined behavior. It basically comes down to: Don't do this!
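That equivalence is easy to check with addresses, using an index that is still inside the allocation so the demonstration itself stays well-defined; address_of_index is an illustrative helper:

```c
#include <stddef.h>

struct account { int id; double balance; };  /* illustrative fields */

/* accounts[i] is defined as *(accounts + i), so taking its address
   yields exactly the pointer accounts + i. */
struct account *address_of_index(struct account *accounts, size_t i) {
    return &accounts[i];
}
```

For an out-of-range index like 150, C would compute an address in exactly the same way; the computation itself doesn't fail, the memory it lands on just isn't yours.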
Your code is fine. If your code is crashing when accessing accounts[150] (assuming numberOfAccounts = 100), then that is to be expected: you are accessing memory outside what you allocated.
And accounts[150] doesn't really exist; you are just walking off the end of the array, and the pointer you get back refers to a different area of memory than the one you allocated.
The size of accounts is exactly 100 structures, starting at the pointer malloc returned (provided that pointer is non-null).
Just because it works doesn't mean it exists as part of the memory you allocated, most likely it belongs to someone else.
C doesn't care or know that your account* came from malloc; all it knows is that it is a pointer to something of size sizeof(account).
accounts[150] accesses the 150th account-sized object from the value in the pointer, which may be random data, may be something else, depending on your system it may even be your program.
The reason things seem to "work" is that whatever is there happens to be unimportant, but that might not always be the case.

Strange C program behaviour

I really have a strange situation. I'm making a Linux multi-threaded C application using all the nitty-gritty memory stuff involving char* strings, and I'm stuck in a really odd position.
Basically, what happens is, using POSIX threads, I'm reading and writing to a two-dimensional char array, but it has unusual errors. You have my word that I have done extensive testing on what they are individually accessing: they don't read another thread's data, let alone write to it. When the last thread that works with the array changes its parts of the array, the last few chars of its strings end up containing characters that I can't explain, mainly ones that print as those black-diamond question marks.
I use valgrind and GDB, and they don't really help. As far as I can tell, all should work. Valgrind tells me I'm not freeing everything.
I know all that sounds fairly undescriptive, but here's where it gets weird: if I compile my program with electric fence, then it all works. Valgrind tells me I'm freeing everything and that there's no memory errors at all, just as I thought it should have been. It works absolutely flawlessly!
So, I guess my question is, why does my program work fine when compiled with electric fence?
(And also as a side question, what steps need to be taken to ensure 100% "thread-safe" code?)
Electric Fence allocates pages, at least two from what I've heard, for each allocation you make. It uses the OS's paging mechanisms to check for accesses outside the allocation. This means that if you want a new 14-character array, you end up with a whole new page to hold it, say 8k. Most of the page is unused, but you can detect errant accesses by watching which pages get used. I can imagine that, with so much extra space, a problem that gets past the guard pages wouldn't show up as an error.
If you don't have a bad access but rather corruption due to two threads not locking correctly efence won't detect it. efence also likely keeps pointers to allocated memory, fooling valgrind into reporting no problems. You should run valgrind with the --show-reachable=yes flag and see what's unclaimed at the end of your run.
It sounds like you're trashing your data structures. Try putting canaries at the beginning and end of your arrays, open up GDB, then put write breakpoints on the canaries.
A canary is a const value that should never be changed - its only purpose is to detect memory corruption should it be overwritten. For example:
int the_size_i_need = 64;   /* for example */
char *array = malloc(the_size_i_need + 2);
array[0] = (char)0xAA;
array[the_size_i_need + 1] = (char)0xFF;
char *real_array = array + 1;
/* Do some stuff here using real_array */
if (array[0] != (char)0xAA || array[the_size_i_need + 1] != (char)0xFF) {
    printf("Oh noes! We're corrupted\n");
}
(The casts matter: if plain char is signed on your platform, comparing it against 0xAA without a cast would always report corruption.)
Oh god, I'm so sorry. I've worked it out: there was a variable given to each thread to put its answer into, but I didn't initialize it to zero, and it contained two junk chars. Maybe the Electric Fence malloc() returns zeroed memory like calloc() does, but the standard malloc() of course doesn't.
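If zeroed memory is what you actually need, the portable way to get it is calloc rather than hoping malloc hands back clean pages. A minimal sketch (make_zeroed_buffer is an illustrative name):

```c
#include <stdlib.h>

/* calloc guarantees the returned bytes are all zero;
   plain malloc leaves the contents indeterminate. */
char *make_zeroed_buffer(size_t n) {
    return calloc(n, 1);
}
```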

Why freed struct in C still has data?

When I run this code:
#include <stdio.h>
#include <stdlib.h>

typedef struct _Food
{
    char name[128];
} Food;

int
main (int argc, char **argv)
{
    Food *food;

    food = (Food *) malloc (sizeof (Food));
    snprintf (food->name, 128, "%s", "Corn");
    free (food);

    printf ("%zu\n", sizeof *food);
    printf ("%s\n", food->name);
    return 0;
}
I still get
128
Corn
although I have freed food. Why is this? Is memory really freed?
When you free 'food', you are saying you are done with it. However, the pointer food still points to the same address, and that data is still there (it would be too much overhead to have to zero out every bit of memory that's freed when not necessary)
Basically it's because it's such a small example that this works. If any other malloc calls were in between the free and the print statements, there's a chance that you wouldn't be seeing this, and would most likely crash in some awful way. You shouldn't rely on this behavior.
There is nothing like free food :)
When you "free" something, it means that the same space is ready to be used again by something else. It does NOT mean the space gets filled with garbage.
Secondly, the pointer value has not changed -- if you are seriously coding you should set a pointer to NULL once you have freed it so that potential junk accesses like this do not happen.
Freeing memory doesn't necessarily overwrite the contents of it.
sizeof is a compile-time operation, so memory allocation won't change how it works.
free does not erase memory, it just marks the block as unused. Even if you allocate a few hundred megabytes of memory, your pointer may still not be overwritten (modern computers have lots of RAM). However, after you free memory, you can no longer depend on its value.
See if your development environment has a memory allocation debugging setting -- some have settings to overwrite blocks with something like 0xDEADBEEF when you free them.
Also, you may wish to adopt the habit of setting your pointer to NULL immediately after calling free (to help encourage your program to crash early and loudly).
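That free-then-NULL habit can be captured in a small macro; FREE_AND_NULL is an illustrative name, not a standard facility:

```c
#include <stdlib.h>

/* Free the block and immediately clear the pointer so that a later
   accidental dereference faults on NULL instead of reading stale data. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```

Note this only clears the one pointer variable it is given; other copies of the same address still dangle.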
free tells the memory allocator that it can reuse that memory block, nothing else. It doesn't overwrite the block with zeros or anything - luckily, because that could be quite an expensive operation! What it does do is make any further dereferencing of the pointer undefined, but 'undefined' behaviour can very well mean 'do the same thing as before' - you just can't rely on it. In another compiler, another runime, or under other conditions it might throw an exception, or terminate the program, or corrupt other data, so... just DON'T.
There's no such thing as "struct has data" or "struct doesn't have data" in C. In your program you have a pointer that points somewhere in memory. As long as this memory belongs to your application (i.e. not returned to the system) it will always contain something. That "something" might be total garbage, or it might look more or less meaningful. Moreover, memory might contain garbage that appears as something meaningful (remains of the data previously stored there).
This is exactly what you observe in your experiment. Once you deallocated the struct, the memory formerly occupied by it officially contains garbage. However, that garbage might still resemble bits and pieces of the original data stored in that struct object at the moment it was deallocated. In your case you got lucky, so the data looks intact. Don't count on it though - next time it might get totally destroyed.
As far as C language is concerned, what you are doing constitutes undefined behavior. You are not allowed to check whether a deallocated struct "has data" or not. The "why" question you are asking does not really exist in the realm of C language.
In some systems freeing memory will unmap it from the address space and you will get a core dump or equivalent if you try to access it after unallocating it.
In Win32 systems (at least up through XP) this is specifically not the case. Microsoft purposely made the memory subsystem of 32-bit Windows let freed blocks linger, to maintain compatibility with well-known MS-DOS applications that used memory after freeing it.
In the MS-DOS programming model there is no concept of mapping or process space so these types of bugs didn't show up as program failures until they were executed as DOS-mode programs under Windows95.
That behavior persisted for 32-bit Windows for over a decade. It may change now that legacy compatibility is being withdrawn in systems such as Vista and 7.

string overflow detection in C

We are using DevPartner's BoundsChecker for detecting memory leak issues. It is doing a wonderful job, though it does not find string overflows like the following:
char szTest[1] = "";
for (i = 0; i < 100; i++) {
    strcat(szTest, "hi");
}
Question 1: Is there any way I can make BoundsChecker detect this?
Question 2: Is there any other tool that can detect such issues?
I tried it in my DevPartner (MSVC 6.6, DevPartner 7.2.0.372) and I confirm the behavior you observe.
I get an access violation after about 63 passes of the loop.
What does Compuware have to say about the issue?
CppCheck will detect this issue.
One option is to simply ban the use of string functions that don't have information about the destination buffer. A set of macros like the following in a universally included header can be helpful:
#define strcpy strcpy_is_banned_use_strlcpy
#define strcat strcat_is_banned_use_strlcat
#define strncpy strncpy_is_banned_use_strlcpy
#define strncat strncat_is_banned_use_strlcat
#define sprintf sprintf_is_banned_use_snprintf
So any attempted uses of the 'banned' routines will result in a linker error that also tells you what you should use instead. MSVC has done something similar that can be controlled using macros like _CRT_SECURE_NO_DEPRECATE.
The drawback to this technique is that if you have a large set of existing code, it can be a huge chore to get things moved over to using the new, safer routines. It can drive you crazy until you've gotten rid of the functions considered dangerous.
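If strlcat isn't available (it is a BSD extension, not standard C), a bounded append can be sketched on top of snprintf; append_bounded is an illustrative name:

```c
#include <stdio.h>
#include <string.h>

/* Append src to dst without ever writing past dst[dstsize - 1];
   the result is always null-terminated. Returns 1 if all of src
   fit, 0 if the output was truncated. */
int append_bounded(char *dst, size_t dstsize, const char *src) {
    size_t used = strlen(dst);
    if (used >= dstsize)
        return 0;                       /* dst isn't a valid string for this size */
    int written = snprintf(dst + used, dstsize - used, "%s", src);
    return written >= 0 && (size_t)written < dstsize - used;
}
```

Run against the question's loop, this would truncate at the buffer boundary instead of overflowing it.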
valgrind will detect writing past dynamically allocated data, but I don't think it can do so for automatic arrays like in your example. If you are using strcat, strcpy, etc., you have to make sure that the destination is big enough.
Edit: I was right about valgrind, but there is some hope:
Unfortunately, Memcheck doesn't do bounds checking on static or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry.
However, the experimental tool Ptrcheck can detect errors like this. Run Valgrind with the --tool=exp-ptrcheck option to try it, but beware that it is not as robust as Memcheck.
I haven't used Ptrcheck.
You may find that your compiler can help. For example, in Visual Studio 2008, check the project properties - C/C++ - Code Generation page. There's a "Buffer Security Check" option.
My guess would be that it reserves a bit of extra memory and writes a known sequence in there. If that sequence gets modified, it assumes a buffer overrun. I'm not sure, though - I remember reading this somewhere, but I don't remember for certain if it was about VC++.
Given that you've tagged this C++, why use a pointer to char at all?
#include <sstream>
#include <iterator>
#include <algorithm>
#include <string>

std::stringstream test;
std::fill_n(std::ostream_iterator<std::string>(test), 100, "hi");
If you enable the /RTCs compiler switch, it may help catch problems like this. With this switch on, the test caused an access violation after running the strcat only one time.
Another useful utility that helps with problems like this (more heap-oriented than stack but extremely helpful) is application verifier. It is free and can catch a lot of problems related to heap overflow.
An alternative: our Memory Safety Checker.
I think it will handle this case.
The problem was that by default, the API Validation subsystem is not enabled, and the messages you were interested in come from there.
I can't speak for older versions of BoundsChecker, but version 10.5 has no particular problems with this test. It reports the correct results and BoundsChecker itself does not crash. The test application does, however, because this particular test case completely corrupts the call stack that led to the function where the test code was, and as soon as that function terminated, the application did too.
The results: 100 messages about write overrun to a local variable, and 99 messages about the destination string not being null terminated. Technically, that second message is not right, but BoundsChecker only searches for the null termination within the bounds of the destination string itself, and after the first strcat call, it no longer contains a zero byte within its bounds.
Disclaimer: I work for MicroFocus as a developer working on BoundsChecker.
