I've got a problem that makes no sense to me. So here goes:
I have a function that counts how many times a word appears in a file, so the function returns an integer (int). Another function then uses that counter. Now, for some reason, it has started raising a "stack smashing detected" error. I had been testing the whole program for two weeks and it worked perfectly. Now I get this error, which really makes no sense. What in the world is going on? And the error appears right there: after the function has counted and it returns, it raises the "stack smashing detected" error.
Edit:
I kept searching, and yes, I get a "stack smashing detected" error when returning from an int function. Any ideas? If I take that code out, it does not crash. I really have no idea.
Any suggestions?
Thanks...
May I suggest compiling your program with debugging information and running it under Valgrind? See also this related question.
If you need it, I have posted some hints on using Valgrind in an older answer of mine.
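For what it's worth, the most common cause of "stack smashing detected" firing exactly on return from a function is writing past the end of a fixed-size local array; the compiler's canary check only runs when the function returns, which is why the crash appears after the counting has finished. A minimal sketch of a counter that avoids this, with hypothetical names (your actual code may differ):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a counter like the one described. Using a
 * bounded read (fgets with sizeof line) instead of gets() or an
 * unbounded scanf("%s") is what keeps the local buffer from being
 * overrun -- the classic trigger for "stack smashing detected". */
int count_word(FILE *fp, const char *word)
{
    char line[256];                 /* fixed-size local buffer */
    size_t len = strlen(word);
    int count = 0;

    if (len == 0)
        return 0;

    while (fgets(line, sizeof line, fp) != NULL) {  /* bounded read */
        const char *p = line;
        while ((p = strstr(p, word)) != NULL) {
            count++;
            p += len;
        }
    }
    return count;
}
```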
Related
I get a segmentation fault in a certain scenario (it is C code with DEC VAX FMS (Forms Management System) calls to get a certain field on a CRT screen - pretty old legacy code). I am on an AIX machine and have only dbx installed on it. GDB, Valgrind, etc. are not available.
Here is what I get when I try to debug:
Unreadable instruction at address 0x53484950
I do not know how to proceed from here.
I have tried a few things:
1.
(dbx) up
not that many levels
(dbx) down
not that many levels
(dbx) n
where
Segmentation fault in . at 0x53484950 ($t1)
0x53484950 (???) Unreadable instruction at address 0x53484950
I tried tracei (for machine instructions), dump (which gives so much output that I am unable to make sense of it), etc., but nothing seems to help.
(dbx) &0x53484950/X
expected variable, found "1397246288"
I am used to getting a stack trace from "where" and going on from there. This is something I have not encountered before, and it appears I am not very good at dbx either. Any help getting to at least the line of code that is causing trouble would be appreciated.
Once you have hit a segfault, there is no way to continue, so the n command is not going to do anything. At that point, all you can do is examine the stack and the variables, and that will be meaningless unless you have the source code and can recompile it.
In fact, without the source code, I am not sure how you could possibly proceed with fixing the program. Even if you could "decompile" the program, or at least disassemble the program, the risk of making a mistake when trying to patch the binary in order to fix it is virtually 100%.
I'm sorry. Given the limitations you are working under, I would argue that the problem is unsolvable. Without tools such as gdb or Valgrind it will be difficult to find the problem, and without the source code it will be very difficult to fix the problem once you have found it.
Let's say I've got a program foo that allocates and frees memory. I run it like this:
./foo something.foo
It works perfectly, exits with no errors. Now, if I set the first line of the file to #!/path/foo, change the permissions, and run it as ./something.foo, the program runs correctly, but before exiting, I see this:
*** Error in '/path/foo': free(): invalid next size(fast): 0x019e2008 ***
Aborted
I've seen a lot of questions about free(): invalid next size (fast), all with specific code examples. So I've got two questions:
Why might the error appear when using #!/path/foo instead of ./foo?
What exactly does the error mean - what conditions must be present for it to appear?
Huh, fixed this by changing
some_var = malloc(sizeof(char *));
to
some_var = malloc(CONSTANT);
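That fix works because sizeof(char *) is only the size of a pointer (4 or 8 bytes), not the size of what it points to, so copying anything longer into the block overruns it. A minimal sketch of the corrected pattern (dup_string is an illustrative name, assuming the buffer holds a string):

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: allocate room for a string's contents, not
 * for a pointer. malloc(sizeof(char *)) reserves only 4 or 8 bytes;
 * writing a longer string into it overruns the heap block and
 * corrupts the allocator metadata that free() later checks. */
char *dup_string(const char *src)
{
    char *buf = malloc(strlen(src) + 1);  /* contents + '\0' */
    if (buf != NULL)
        strcpy(buf, src);
    return buf;
}
```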
It means you have heap corruption in your program. The message is telling you how the C library detected the corruption, not how the corruption occurred.
Heap corruption is a particularly insidious bug to track down, as it generally does not manifest at the point where the bug occurs but at some later point. It's quite possible for the program to continue to work despite the corruption, meaning the bug may have been present in your code for weeks or months and have nothing to do with any recent changes you are testing and debugging.
The best response to heap corruption is usually a tool like Valgrind, which runs alongside your program and will often (though not always) be able to pinpoint the offending code.
I have a critical bug in my project. When I use gdb to open the .core file, it shows me something like this (I didn't include all the gdb output, for ease of reading):
This is the very suspicious, newly written part of the code:
0x00000000004579fe in http_chunk_count_loop
(f=0x82e68dbf0, pl=0x817606e8a Address 0x817606e8a out of bounds)
This is a very mature part of the code, which worked for a long time without problems:
0x000000000045c8a5 in packet_handler_http
(f=0x82e68dbf0, pl=0x817606e8a Address 0x817606e8a out of bounds)
OK, now what messes with my mind is the pl=0x817606e8a Address 0x817606e8a out of bounds: gdb shows the pointer was already out of bounds before it reached the newly written code. This makes me think the problem is caused by the function that calls packet_handler_http.
But packet_handler_http is very mature and has been working for a long time without problems, which makes me think I am misunderstanding the gdb output.
The problem is with packet_handler_http, I guess, but because this was already-working code I am confused. Am I right with my guess, or am I missing something?
To detect "memory errors" you might like to run the program under Valgrind: http://valgrind.org
If you have compiled the program with symbols (-g for gcc), it can quite reliably detect "out of bounds" conditions down to the line of code where the error occurs, as well as the line of code that allocated the memory (if any).
The problem is with packet_handler_http I guess
That guess is unlikely to be correct: if packet_handler_http is really receiving an invalid pointer, then the corruption happened "upstream" from it.
This is very mature part of code which worked for a long time without problem
I routinely find bugs in code that worked "without problem" for 10+ years. Also, the corruption may be happening in newly-added code, but causing problems elsewhere. Heap and stack buffer overflows are often just like that.
As alk already suggested, run your executable under Valgrind, or Address Sanitizer (also included in GCC-4.8), and fix any problems they find.
Thanks, guys, for your contributions. Even though gdb said otherwise, it turned out the pointer was good.
There was a part of the new code that caused an out-of-bounds problem.
There was a line like (goodpointer + offset), where the offset was the HTTP chunk size, which I was taking from the network (data sniffing). There is a kind of attack where this offset is extremely big, which caused an integer overflow and resulted in the out-of-bounds access.
My conclusions: never trust parameters that come from the network, AND gdb may not always show parameters correctly in a core dump, because at the moment of the crash things can get messy on the stack.
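A sketch of the kind of check that fixes this (names are illustrative, not the actual code): validate the attacker-controlled size against the buffer before doing the pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: reject a chunk size taken from the network
 * before using it in (goodpointer + offset)-style arithmetic. The
 * comparisons are ordered so they cannot themselves overflow. */
const char *advance_chunk(const char *base, size_t buf_len,
                          size_t pos, uint64_t chunk_size)
{
    if (pos > buf_len || chunk_size > (uint64_t)(buf_len - pos))
        return NULL;          /* would run past the buffer: drop it */
    return base + pos + (size_t)chunk_size;
}
```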
My program is crashing on the second run at this line:
char* temp_directive = (char *)malloc(7);
with this error:
Critical error detected c0000374
Windows has triggered a breakpoint in Maman14.exe.
This may be due to a corruption of the heap, which indicates a bug in Maman14.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while Maman14.exe has focus.
I can't understand why; it always happens on the second run.
I've tried adding free(temp_directive), but it didn't help.
Is anyone familiar with this issue?
http://blogs.msdn.com/b/jiangyue/archive/2010/03/16/windows-heap-overrun-monitoring.aspx
Sounds like you ran off the end of an array earlier in the code, and the memory manager doesn't notice until you try to malloc in that memory space again.
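For reference, the classic off-by-one that produces exactly this delayed heap error is allocating strlen(word) bytes instead of strlen(word) + 1: a 7-character string needs 8 bytes, and malloc(7) leaves no room for the terminating '\0', so the copy writes one byte past the block. The debug heap typically reports the corruption on a later malloc or free rather than at the overflow itself, which is why it only shows up "on the second run". A hypothetical sketch (copy_directive is an illustrative name):

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: the bug pattern is malloc(strlen(word)),
 * one byte short of what strcpy writes. The corruption is usually
 * detected later, on the next heap operation, not at the overflow. */
char *copy_directive(const char *word)
{
    char *buf = malloc(strlen(word) + 1);   /* + 1 for the '\0' */
    if (buf != NULL)
        strcpy(buf, word);
    return buf;
}
```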
Found the problem; it was caused by a different realloc. Thanks, everyone!
Let's state the conditions under which sqlcxt() can cause a segmentation fault. I am working on Unix, using Pro*C for database connections to an Oracle database.
My program crashes, and the core file shows that the crash is due to the sqlcxt() function:
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
...
dbx: warning: Some symbolic information might be incorrect.
...
t#null (l#1) program terminated by signal SEGV
(no mapping at the fault address)0xffffffffffffffff:
<bad address 0xffffffffffffffff>
Current function is dbMatchConsortium
442 sqlcxt((void **)0, &sqlctx, &sqlstm, &sqlfpn);
There is a decent chance that the problem you are having is some sort of pointer error / memory allocation error in your C code. These things are never easy to find. Some things that you might try:
See if you can comment out (or #ifdef out) sections of your program and whether the problem disappears. If so, you can close in on the bad section.
Run your program in a debugger.
Do a code review with somebody else - this will often lead to finding more than one problem (usually works in my code).
I hope that this helps. Please add more details, and I will check back on this question to see if I can help you.
It's probably an allocation error in your program. When I got this kind of behaviour, it was always my fault. I develop on Solaris/SPARC with Oracle 10g. Once it was a double free (i.e., I freed the same pointer twice), and the other time I got a core dump in the Oracle part of the program when I freed a pointer that was not an allocated memory block.
If you're on Solaris, you can try the libumem allocation library (google it for details) to see if the behaviour changes.
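One defensive habit that neutralizes the double-free case specifically: null the pointer when you free it, since the C standard defines free(NULL) as a no-op. A small sketch (free_and_null is an illustrative name; it does nothing for the other bug, freeing a pointer that malloc never returned):

```c
#include <stdlib.h>

/* Illustrative sketch: wrap free() so the pointer is nulled
 * afterwards. An accidental second call then passes NULL to free(),
 * which is defined to do nothing, instead of corrupting the heap. */
void free_and_null(void **pp)
{
    if (pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```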
A solution that worked for me: delete the C files created by Pro*C and make (recompile).
Pro*C files (*.pc) are 'compiled'/preprocessed into C files, and sometimes errors occur while 'compiling' them (in my case there was no space left); even if the build succeeds, I would get a SIGSEGV in sqlcxt in libclntsh.so when executing them.
pstack and gdb can help you debug if that is not the case.