I would rather not dump code, but explain my problem. After hours of debugging I managed to understand that at some point in my code, a float value that is not explicitly modified turns HUGE (more than 1e15). I do use a lot of memory in my program (a string array containing 800+ words), other than that though, I have no idea what could cause this.
If anyone has any ideas regarding this, please share. Otherwise, I'll post a pastebin of the
code soon.
EDIT:
Here is the code: http://pastebin.com/vgiZweNq. The problem rests in the next_generation() function, where the sumfit variable goes nuts at random times in the loop.
Also, I've compiled this on linux using -fno-stack-limit and -fstack-check, to avoid stack overflows.
EDIT 2:
I've changed the program to use a dynamically allocated linked list, to further avoid stack overflows. Still, sumfit gets changed to Floatzilla at random points, usually pretty early on.
Cheers!
Since the variable is obviously being modified from an unexpected point, you might want to check some possibilities:
Is it being modified from a different thread or from an interrupt / event handler? If so, is the access properly synchronized to prevent a data race?
Are you doing pointer arithmetic that might be buggy and cause access outside the intended buffer?
Are you casting pointers between types of different sizes?
Especially if you are working on an embedded device: Maybe the memory is full and your stack is overlapping the heap, or the global variables.
More information about the platform this happens on would be helpful.
You're using strcpy on the chrom array, but i don't see where they ever get null terminated.
Maybe I'm just missing it, though.
You've got a huge string array. I reckon you're probably going off the end of it. Keep track of the size of data going into that array.
Related
I am currently using dlmalloc() to see how much faster it can be than the original libc malloc().
However, running free() keeps giving me a segmentation fault...
Does anyone know some logical reasons why this could keep happening?
A segfault inside the memory management functions almost always indicates that you've done something wrong (like overwriting memory beyond the valid bounds) before the call that actually segfaults.
Running your code under Valgrind may help you determine the real source of the problem.
I would be looking first into memory corruption issues. For example, if you allocate N bytes and then write to N+100 of them, you're very likely to corrupt the memory arena.
That's because many implementations keep their housekeeping information (block sizes, list pointers and so on) in-line (between the actual data areas).
Another possibility would be double freeing of blocks which may cause problems if that memory has since been used for some other allocation (especially if your address is now in the middle of a data area rather than at the beginning).
First things first, make sure you're following the rules. Any thing else is undefined behaviour and all bets are off.
You may also want to post the source code for the problem you're having so we can examine it. If you do this, try to reduce it to the smallest example that exhibits the problem. Only the most dedicated SOer (a) will want to look over some 10,000-line behemoth to find your issue .
(a) And I'm certainly not that dedicated :-)
I have a function and i'm accessing a struct's members a lot of times in it.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
int x;
int y;
}
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the object in a function, I usually create (or is passed as argument) pointer and point it to the required object (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if i created temporary variables for the struct variables (only the ones I need, I certainly don't use all the members) and copy over the value and then set the actual struct's value before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.
This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better that the opinions of a million highly experience expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. With out that data nothing anyone tells you is anything but an ego based guess. Data answers the question.
Bob Pendleton
First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.
Generally, the case in which you want to make a copy of a passed struct in C is if you want to manipulate the data in place. That is to say, have your changes not be reflected in the struct it self but rather only in the return value. As for which is more expensive, it depends on a lot of things. Many of which change implementation to implementation so I would need more specific information to be more helpful. Though, I would expect, that in an embedded environment you memory is at a greater premium than your processing power. Really this reads like needless micro optimization, your compiler should handle it.
In this case creating temp variable on the stack will be faster. But if your structure is much bigger then you might be better with dereferencing.
I've got a buffer overrun I absolutely can't see to figure out (in C). First of all, it only happens maybe 10% of the time or so. The data that it is pulling from the DB each time doesn't seem to be all that much different between executions... at least not different enough for me to find any discernible pattern as to when it happens. The exact message from Visual Studio is this:
A buffer overrun has occurred in
hub.exe which has corrupted the
program's internal state. Press
Break to debug the program or Continue
to terminate the program.
For more details please see Help topic
'How to debug Buffer Overrun Issues'.
If I debug, I find that it is broken in __report_gsfailure() which I'm pretty sure is from the /GS flag on the compiler and also signifies that this is an overrun on the stack rather than the heap. I can also see the function it threw this on as it was leaving, but I can't see anything in there that would cause this behavior, the function has also existed for a long time (10+ years, albeit with some minor modifications) and as far as I know, this has never happened.
I'd post the code of the function, but it's decently long and references a lot of proprietary functions/variables/etc.
I'm basically just looking for either some idea of what I should be looking for that I haven't or perhaps some tools that may help. Unfortunately, nearly every tool I've found only helps with debugging overruns on the heap, and unless I'm mistaken, this is on the stack. Thanks in advance.
You could try putting some local variables on either end of the buffer, or even sentinels into the (slightly expanded) buffer itself, and trigger a breakpoint if those values aren't what you think they should be. Obviously, using a pattern that is not likely in the data would be a good idea.
While it won't help you in Windows, Valgrind is by far the best tool for detecting bad memory behavior.
If you are debugging the stack, your need to get to low level tools - place a canary in the stack frame (perhaps a buffer filled with something like 0xA5) around any potential suspects. Run the program in a debugger and see which canaries are no longer the right size and contain the right contents. You will gobble up a large chunk of stack doing this, but it may help you spot exactly what is occurring.
One thing I have done in the past to help narrow down a mystery bug like this was to create a variable with global visibility named checkpoint. Inside the culprit function, I set checkpoint = 0; as the very first line. Then, I added ++checkpoint; statements before and after function calls or memory operations that I even remotely suspected might be able to cause an out-of-bounds memory reference (plus peppering the rest of the code so that I had a checkpoint at least every 10 lines or so). When your program crashes, the value of checkpoint will narrow down the range you need to focus on to a handful of lines of code. This may be a bit overkill, I do this sort of thing on embedded systems (where tools like valgrind can't be used) but it should still be useful.
Wrap it in an exception handler and dump out useful information when it occurs.
Does this program recurse at all? If so, I check there to ensure you don't have an infinite recursion bug. If you can't see it manually, sometimes you can catch it in the debugger by pausing frequently and observing the stack.
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
Improve this question
So I have a C program. And I don't think I can post any code snippets due to complexity issues. But I'll outline my error, because it's weird, and see if anyone can give any insights.
I set a pointer to NULL. If, in the same function where I set the pointer to NULL, I printf() the pointer (with "%p"), I get 0x0, and when I print that same pointer a million miles away at the end of my program, I get 0x0. If I remove the printf() and make absolutely no other changes, then when the pointer is printed later, I get 0x1, and other random variables in my structure have incorrect values as well. I'm compiling it with GCC on -O2, but it has the same behavior if I take off optimization, so that's not hte problem.
This sounds like a Heisenbug, and I have no idea why it's happening, nor how to fix it. Does anyone who has dealt with something like this in the past have advice on how they approached this kind of problem? I know this may sound kind of vague.
EDIT: Somehow, it works now. Thank you, all of you, for your suggestions.
The debugger told me interesting things - that my variable was getting optimized away. So I rewrote the function so it didn't need the intermediate variable, and now it works with and without the printf(). I have a vague idea of what might have been happening, but I need sleep more than I need to know what was happening.
Are you using multiple threads? I've often found that the act of printing something out can be enough to effectively suppress a race condition (i.e. not remove the bug, just make it harder to spot).
As for how to diagnose/fix it... can you move the second print earlier and earlier until you can see where it's changing?
Do you always see 0x1 later on when you don't have the printf in there?
One way of avoiding the delay/synchronization of printf would be to copy the pointer value into another variable at the location of the first printf and then print out that value later on - so you can see what the value was at that point, but in a less time-critical spot. Of course, as you've got odd value "corruption" going on, that may not be as reliable as it sounds...
EDIT: The fact that you're always seeing 0x1 is encouraging. It should make it easier to track down. Not being multithreaded does make it slightly harder to explain, admittedly.
I wonder whether it's something to do with the extra printf call making a difference to the size of stack. What happens if you print the value of a different variable in the same place as the first printf call was?
EDIT: Okay, let's take the stack idea a bit further. Can you create another function with the same sort of signature as printf and with enough code to avoid it being inlined, but which doesn't actually print anything? Call that instead of printf, and see what happens. I suspect you'll still be okay.
Basically I suspect you're screwing with your stack memory somewhere, e.g. by writing past the end of an array on the stack; changing how the stack is used by calling a function may be disguising it.
If you're running on a processor that supports hardware data breakpoints (like x86), just set a breakpoint on writes to the pointer.
Do you have a debugger available to you? If so, what do the values look like in that? Can you set any kind of memory/hardware breakpoint on the value? Maybe there's something trampling over the memory elsewhere, and the printf moves things around enough to move or hide the bug?
Probably worth looking at the asm to see if there's anything obviously wrong there. Also, if you haven't already, do a full clean rebuild. If the definition of the struct has changed recently, there's a vague change that the compiler could be getting it wrong if the dependency checking failed to correctly rebuild everything it needed to.
Have you tried setting a condition in your debugger which notifies you when that value is modified? Or running it through Valgrind? These are the two major things that I would try, especially Valgrind if you're using Linux. There's no better way to figure out memory errors.
Without code, it's a little hard to help, but I understand why you don't want to foist copious amounts on us.
Here's my first suggestion: use a debugger and set a watchpoint on that pointer location.
If that's not possible, or the bug disappears again, here's my second suggestion.
1/ Start with the buggy code, the one where you print the pointer value and you see 0x1.
2/ Insert another printf a little way back from there (in terms of code execution path).
3/ If it's still 0x1, go back to step 2, moving a little back through the execution path each time.
4/ If it's 0x0, you know where the problem lies.
If there's nothing obvious between the 0x0 printf and the 0x1 printf, it's likely to be corruption of some sort. Without a watchpoint, that'll be hard to track down - you need to check every single stack variable to ensure there's no possibility of overrun.
I'm assuming that pointer is a global since you set it and print it "a million miles away". If it is, lok at the variables you define on either side of it (in the source). They're the ones most likely to be causing overrun.
Another possibility is to turn off the optimization to see if the problem still occurs. We've occasionally had to ship code like that in cases where we couldn't fix the bug before deadlines (we'll always go back and fix it later, of course).
You may think that this is a coincidence that the topic of my question is similar to the name of the forum but I actually got here by googling the term "stack overflow".
I use the OPNET network simulator in which I program using C. I think I am having a problem with big array sizes. It seems that I am hitting some sort of memory allocation limitation. It may have to do with OPNET, Windows, my laptop memory or most likely C language. The problem is caused when I try to use nested arrays with a total number of elements coming to several thousand integers. I think I am exceeding an overall memory allocation limit and I am wondering if there is a way to increase this cap.
Here's the exact problem description:
I basically have a routing table. Let's call it routing_tbl[n], meaning I am supporting 30 nodes (routers). Now, for each node in this table, I keep info. about many (hundreds) available paths, in an array called paths[p]. Again, for each path in this array, I keep the list of nodes that belong to it in an array called hops[h]. So, I am using at least nph integers worth of memory but this table contains other information as well. In the same function, I am also using another nested array that consumes almost 40,000 integers as well.
As soon as I run my simulation, it quits complaining about stack overflow. It works when I reduce the total size of the routing table.
What do you think causes the problem and how can it be solved?
Much appreciated
Ali
It may help if you post some code. Edit the question to include the problem function and the error.
Meanwhile, here's a very generic answer:
The two principal causes of a stack overflow are 1) a recursive function, or 2) the allocation of a large number of local variables.
Recursion
if your function calls itself, like this:
int recurse(int number) {
return (recurse(number));
}
Since local variables and function arguments are stored on the stack, then it will in fill the stack and cause a stack overflow.
Large local variables
If you try to allocate a large array of local variables then you can overflow the stack in one easy go. A function like this may cause the issue:
void hugeStack (void) {
unsigned long long reallyBig[100000000][1000000000];
...
}
There is quite a detailed answer to this similar question.
Somehow you are using a lot of stack. Possible causes include that you're creating the routing table on the stack, you're passing it on the stack, or else you're generating lots of calls (eg by recursively processing the whole thing).
In the first two cases you should create it on the heap and pass around a pointer to it. In the third case you'll need to rewrite your algorithm in an iterative form.
Stack overflows can happen in C when the number of embedded recursive calls is too high. Perhaps you are calling a function from itself too many times?
This error may also be due to allocating too much memory in static declarations. You can switch to dynamic allocations through malloc() to fix this type of problem.
Is there a reason why you cannot use the debugger on this program?
It depends on where you have declared the variable.
A local variable (i.e. one declared on the stack is limited by the maximum frame size) This is a limit of the compiler you are using (and can usually be adjusted with compiler flags).
A dynamically allocated object (i.e. one that is on the heap) is limited by the amount of available memory. This is a property of the OS (and can technically by larger the physical memory if you have a smart OS).
Many operating systems dynamically expand the stack as you use more of it. When you start writing to a memory address that's just beyond the stack, the OS assumes your stack has just grown a bit more and allocates it an extra page (usually 4096Kib on x86 - exactly 1024 ints).
The problem is, on the x86 (and some other architectures) the stack grows downwards but C arrays grow upwards. This means if you access the start of a large array, you'll be accessing memory that's more than a page away from the edge of the stack.
If you initialise your array to 0 starting from the end of the array (that's right, make a for loop to do it), the errors might go away. If they do, this is indeed the problem.
You might be able to find some OS API functions to force stack allocation, or compiler pragmas/flags. I'm not sure about how this can be done portably, except of course for using malloc() and free()!
You are unlikely to run into a stack overflow with unthreaded compiled C unless you do something particularly egregious like have runaway recursion or a cosmic memory leak. However, your simulator probably has a threading package which will impose stack size limits. When you start a new thread it will allocate a chunk of memory for the stack for that thread. Likely, there is a parameter you can set somewhere that establishes the the default stack size, or there may be a way to grow the stack dynamically. For example, pthreads has a function pthread_attr_setstacksize() which you call prior to starting a new thread to set its size. Your simulator may or may not be using pthreads. Consult your simulator reference documentation.