I got a segmentation fault in an object like this:
http_client_reset(struct http_client *client) {
    if (client->last_req) {
        /* client should never be NULL, but whether it is
           a valid object, I don't know */
        ...
    }
}
By debugging the core dump file in GDB, I found that the memory address of client is 0x40a651c0. I have tried several times, and the address is always the same.
Then I tried the bt command in GDB:
(gdb) bt
#0 0x0804c80e in http_client_reset (
c=<error reading variable: Cannot access memory at address 0x40a651c0>,
c@entry=<error reading variable: Cannot access memory at address 0x40a651bc>)
at http/client.c:170
Cannot access memory at address 0x40a651bc
There is no backtrace beyond this. I have grepped my source code, and there is only one call to http_client_reset.
How can I debug such a bug given only a memory address?
Is there a way to judge whether an object is valid before accessing its fields (other than checking obj == NULL)?
Debugging a core dump is never a black-and-white matter, so you will not get an exact answer to these questions. However, most core dumps are due to programming errors that can be classified into broad areas. I will describe some of these areas and some debugging mechanisms that might help you.
Classes of programming error leading to a crash
Multi-threaded code - check for a missing critical section while accessing shared data; this can corrupt the data, leading to such a crash. In your case, audit every place the http_client pointer is created, read, updated, or deleted (CRUD) and how it is accessed.
Heap corruption - in most cases the pointer itself is valid, but incorrect heap handling in another section of code causes the memory behind the valid pointer to be overwritten. Think of an array in and around the pointer's location - ABW (array bounds write) style issues would easily cause this problem.
Stack corruption - less likely, but hard to track down. If you overwrite stack data - similar to the array example above, but on the stack - the same kind of crash occurs.
Ways to unearth the root cause of a core dump
You need to understand that, technically, a core dump results from an illegal operation causing an unhandled exception, leading to a crash. Since most such crashes are related to memory handling, a static-analysis tool such as Klocwork or PC-Lint would capture perhaps 80% of the issues. I would then run under Valgrind or Purify, which would most probably uncover the rest. Very few issues escape both - usually sequencing/timing-related code, which can be found by code review.
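As for your second question: there is no portable way to prove that a pointer refers to a live, valid object. A common heuristic is a magic-number field that is set at creation and cleared at destruction. A minimal sketch, assuming you can change the struct and its constructor/destructor (the names here are invented, not from your code):

#include <stdlib.h>

#define HTTP_CLIENT_MAGIC 0x48544350u   /* arbitrary tag, "HTCP" in ASCII */

struct http_client {
    unsigned magic;                 /* first field: set while object is live */
    struct http_request *last_req;
    /* ... */
};

struct http_client *http_client_create(void)
{
    struct http_client *c = calloc(1, sizeof *c);
    if (c)
        c->magic = HTTP_CLIENT_MAGIC;
    return c;
}

void http_client_destroy(struct http_client *c)
{
    if (c) {
        c->magic = 0;               /* stale pointers now fail the check */
        free(c);
    }
}

int http_client_is_valid(const struct http_client *c)
{
    return c != NULL && c->magic == HTTP_CLIENT_MAGIC;
}

Note the limits: in your backtrace the pointer itself lands on an unmapped page, so even reading the magic field would fault. This check catches stale or mistyped objects sitting on readable memory, nothing more.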
HTH!
Related
I have a large body of legacy code that I inherited. It has worked fine until now; suddenly, at a customer trial that I cannot reproduce in-house, it crashes in malloc. I think I need to add instrumentation, e.g. on top of malloc I have my own malloc that stores some meta information about each allocation, such as who made the malloc call. When it crashes, I can then look up the meta information and see what was happening. I did something similar years ago but cannot recall the details now... I am sure people have come up with better ideas since. I would be glad to have your input.
Thanks
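For reference, a minimal sketch of the kind of wrapper described above; the header layout and names are illustrative, not from any particular library:

#include <stdlib.h>

/* Prepended to every allocation so a debugger can see who allocated it. */
struct alloc_hdr {
    const char *file;   /* call site, captured via __FILE__/__LINE__ */
    int         line;
    size_t      size;
};

void *debug_malloc(size_t size, const char *file, int line)
{
    /* Note: for simplicity this ignores max_align_t alignment concerns. */
    struct alloc_hdr *h = malloc(sizeof *h + size);
    if (!h)
        return NULL;
    h->file = file;
    h->line = line;
    h->size = size;
    return h + 1;                   /* caller's memory starts after the header */
}

void debug_free(void *p)
{
    if (p) {
        struct alloc_hdr *h = (struct alloc_hdr *)p - 1;
        /* When the crash hits, inspect h->file and h->line in the core. */
        free(h);
    }
}

/* Route existing call sites through the wrapper without editing them. */
#define malloc(n) debug_malloc((n), __FILE__, __LINE__)
#define free(p)   debug_free(p)

A canary value written at the end of each block and verified in debug_free would additionally catch overruns at the moment of release.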
Is memory allocation broken?
Try valgrind.
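For example, with Memcheck (Valgrind's default tool):

valgrind --leak-check=full ./yourprogram

It reports out-of-bounds reads and writes, use of freed memory, and leaks, each with a stack trace at both the bad access and the original allocation.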
Malloc is still crashing.
Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption, which by itself does not cause a segmentation fault; it is usually the result of an array access outside the array's bounds, and that access is usually nowhere near the point where you call malloc.
malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn't and starts interpreting random memory as part of the heap. Eventually its luck runs out.
Note that free can have the same thing happen, if the block being released or the free block list is messed up.
How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.
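A hedged illustration of both accesses; all of this is undefined behaviour, so the exact effect depends on your allocator (glibc, for instance, keeps the chunk size just before the pointer it hands out):

#include <stdlib.h>

int main(void)
{
    int *a = malloc(8 * sizeof *a);

    a[-1] = 0;   /* may clobber the allocator's header for this block */
    a[8]  = 0;   /* may clobber the header of the *next* block        */

    /* Neither write faults: both addresses sit on valid heap pages.  */
    free(a);                         /* the crash, if any, comes here  */
    int *b = malloc(4 * sizeof *b);  /* ...or in a later malloc/free   */
    free(b);
    return 0;
}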
So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or that the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data.
Try ASan
AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds:
Use after free (dangling pointer dereference)
Heap buffer overflow
Stack buffer overflow
Global buffer overflow
Use after return
Use after scope
Initialization order bugs
Memory leaks
See the following links to learn more about it and how to use it:
https://github.com/google/sanitizers/wiki/AddressSanitizer and
https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
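With GCC or Clang, enabling it is a single compiler flag. A minimal demonstration:

/* compile: gcc -fsanitize=address -g asan_demo.c -o asan_demo
   Running it makes ASan abort with a heap-buffer-overflow report that
   shows both the bad access and the allocation site (line numbers
   courtesy of -g). */
#include <stdlib.h>

int main(void)
{
    int *p = malloc(10 * sizeof *p);
    p[10] = 42;               /* one element past the end of the buffer */
    free(p);
    return 0;
}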
I know this is old, but issues like this will continue to exist as long as we have pointers. Although Valgrind is the best tool for this purpose, it has a steep learning curve, and the results are often too intimidating to understand.
Assuming you are working on some *nix, another tool I can suggest is Electric Fence. Quoting its documentation:
Electric Fence helps you detect two common programming bugs:
software that overruns the boundaries of a malloc() memory allocation,
software that touches a memory allocation that has been released by free().
Unlike other malloc() debuggers, Electric Fence will detect read accesses
as well as writes, and it will pinpoint the exact instruction that causes
an error.
Usage is amazingly simple: just link your code with the additional library, libefence (pass -lefence to the linker).
When you run the application, a core file will be generated at the moment memory is corrupted, instead of later, when the corrupted memory is used.
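A typical session, assuming the library is installed as libefence, looks something like this:

gcc -g myprog.c -o myprog -lefence
gdb ./myprog
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.

GDB then stops on the exact line that overruns the allocation, rather than somewhere downstream.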
Recently I have been facing a - to me - strange behavior in my embedded software.
What I have: a 32-bit AVR32 controller running the program from external SDRAM, as the binary is too big to run directly from the microcontroller's flash. Due to the physical memory map, the memory areas are split between:
stack (starts at 0x1000, length 0xF000; addresses below 0x1000 are protected by the MPU)
EBI SDRAM (starts at 0xD0000000, length 0x00400000).
What happens: unfortunately I get an exception which is not reproducible. Looking at the stack trace I am given, the following event occurs irregularly:
Name: Bus error data fetch - Event source: Data bus - Stored Return Address: First non-completed instruction
Additionally, the stack pointer has a valid value, whereas the address where the exception occurs (the last entry point for fetching instructions) points into memory nirvana (e.g. 0x496e6372, something around 0x5..., 0x6...). I guess this has to be the "First non-completed instruction" the manual is talking about. However, the line in my source code is always the same: calling a member function on an element of a data array via pointer.
if (mSomeArray[i])
{
    mSomeArray[i]->someFunction(); // <-- crash
}
The thing is: adding or deleting unrelated source code makes the event disappear and reappear.
What I have thought about: something is corrupting my memory (mapping). What kinds of errors could cause this?
A buffer overflow?
The SDRAM controller could be turned off, losing data. That is not impossible, but rather improbable.
The stack is big enough; I have already checked this with a watermark.
The Data Bus Rate and AVR clock are set correctly
How to solve this: more asserts? Unfortunately I cannot debug this with AVR Studio. Does anyone have a hint or an idea? Or am I missing something obvious?
Edit:
Approaches mentioned by users:
Check the addresses of the function pointer and the array entries
Overwrite of a stack array
Improperly written interrupt handlers
Uninitialized pointers
Check the array access via i in the crash case
Use the exception handler address to trap illegal memory accesses
Use snprintf instead of sprintf
Late appendix to the thread: the issue was a wrong array access (a wrong index) in an old software module that had nothing to do with my modules. I found it by accident; it was curious that it hadn't appeared earlier, and it took me quite a while to find the line of code. I am marking the only given answer as the correct solution.
Thank you all for your input.
Take care (of your software ;))
Here are some ideas:
Check 'i' to make sure it is within the array bounds.
Check the address of the function pointer that is about to be called. It should lie within the SDRAM; see the sketch after this list.
See if the chip has an exception handler address it will jump to when it accesses illegal memory. Once you are there, output some debug data.
If your debugger allows it, set a data breakpoint (watchpoint) on the array entry so execution stops when the pointer is written. This would catch whatever other code is overwriting the function pointer.
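For the second point, a crude sanity check can be bolted onto the snippet from the question; the SDRAM bounds come from the memory map given above, and mSomeArray/someFunction are the question's own names:

#include <assert.h>
#include <stdint.h>

#define SDRAM_START 0xD0000000u
#define SDRAM_LEN   0x00400000u

/* ... inside the loop from the question ... */
if (mSomeArray[i])
{
    /* A corrupted entry will usually point outside the SDRAM window. */
    uintptr_t p = (uintptr_t)mSomeArray[i];
    assert(p >= SDRAM_START && p < SDRAM_START + SDRAM_LEN);
    mSomeArray[i]->someFunction();
}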
In one of our first CS lectures on security, we were walked through C's problem of not checking alleged buffer lengths, with examples of the different ways this vulnerability can be exploited.
In this case, it looks like it was a malicious read operation, where the application simply read out however many bytes of memory were asked for.
Am I correct in asserting that the Heartbleed bug is a manifestation of the C buffer length checking issue?
Why didn't the malicious use cause a segmentation fault when it tried to read another application's memory?
Would simply zero-ing the memory before writing to it (and then subsequently reading from it) have caused a segmentation fault? Or does this vary between operating systems? Or between some other environmental factor?
Apparently exploitations of the bug cannot be identified. Is that because the heartbeat function does not log when called? Otherwise surely any request for a ~64k string is likely to be malicious?
Am I correct in asserting that the Heartbleed bug is a manifestation of the C buffer length checking issue?
Yes.
Is the heartbleed bug a manifestation of the classic buffer overflow exploit in C?
No. The "classic" buffer overflow is one where you write more data into a stack-allocated buffer than it can hold, where the data written is provided by the hostile agent. The hostile data overflows the buffer and overwrites the return address of the current method. When the method ends it then returns to an address containing code of the attacker's choice and starts executing it.
The heartbleed defect by contrast does not overwrite a buffer and does not execute arbitrary code, it just reads out of bounds in code that is highly likely to have sensitive data nearby in memory.
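In outline, the vulnerable pattern looks something like the sketch below. This is a simplification to show the shape of the bug, not the actual OpenSSL source; the names are invented:

#include <stdlib.h>
#include <string.h>

/* data: the heartbeat message as received; actual_len: bytes really
   received. Error handling elided for brevity. */
unsigned char *heartbeat_reply(const unsigned char *data, size_t actual_len,
                               size_t *reply_len)
{
    /* The message itself claims a payload length... */
    size_t claimed_len = ((size_t)data[0] << 8) | data[1];

    unsigned char *reply = malloc(claimed_len);

    /* ...and the bug is trusting it: claimed_len is never checked
       against actual_len, so if it is larger, this memcpy reads past
       the request buffer into adjacent heap memory - keys, cookies,
       whatever happens to be there. */
    memcpy(reply, data + 2, claimed_len);

    *reply_len = claimed_len;
    return reply;
}

The fix was precisely the missing bounds check: discard the message when the claimed length exceeds the number of bytes actually received.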
Why didn't the malicious use cause a segmentation fault when it tried to read another application's memory?
It did not try to read another application's memory. The exploit reads memory of the current process, not another process.
Why didn't the malicious use cause a segmentation fault when it tried to read memory out of bounds of the buffer?
This is a duplicate of this question:
Why does this not give a segmentation violation fault?
A segmentation fault means that you touched a page that the operating system memory manager has not allocated to you. The bug here is that you touched data on a valid page that the heap manager has not allocated to you. As long as the page is valid, you won't get a segfault. Typically the heap manager asks the OS for a big hunk of memory, and then divides that up amongst different allocations. All those allocations are then on valid pages of memory as far as the operating system is concerned.
Dereferencing null is a segfault simply because the operating system never makes the page that contains the zero pointer a valid page.
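A tiny demonstration of the difference; both accesses are undefined behaviour, but only the second reliably faults on typical operating systems:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = malloc(16);

    /* Reads heap memory the allocator never gave us. The page is almost
       certainly valid, so no segfault - we just print garbage. */
    printf("%d\n", p[100]);

    /* The page containing address zero is never mapped, so this faults. */
    char *q = NULL;
    printf("%d\n", *q);

    free(p);
    return 0;
}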
More generally: the compiler and runtime are not required to ensure that undefined behaviour results in a segfault; UB can result in any behaviour whatsoever, and that includes doing nothing. For more thoughts on this matter see:
Can a local variable's memory be accessed outside its scope?
For my complaint that UB should always be the equivalent of a segfault in security-critical code, as well as some pointers to a discussion on static analysis of the vulnerability, see today's blog article:
http://ericlippert.com/2014/04/15/heartbleed-and-static-analysis/
Would simply zero-ing the memory before writing to it (and then subsequently reading from it) have caused a segmentation fault?
Unlikely. If reading out of bounds doesn't cause a segfault then writing out of bounds is unlikely to. It is possible that a page of memory is read-only, but in this case it seems unlikely.
Of course, the later consequences of zeroing out all kinds of memory that you should not are seg faults all over the show. If there's a pointer in that zeroed out memory that you later dereference, that's dereferencing null which will produce a segfault.
does this vary between operating systems?
The question is vague. Let me rephrase it.
Do different operating systems and different C/C++ runtime libraries provide differing strategies for allocating virtual memory, allocating heap memory, and identifying when memory access goes out of bounds?
Yes; different things are different.
Or between some other environmental factor?
Such as?
Apparently exploitations of the bug cannot be identified. Is that because the heartbeat function does not log when called?
Correct.
surely any request for a ~64k string is likely to be malicious?
I'm not following your train of thought. What makes the request likely malicious is a mismatch between bytes sent and bytes requested to be echoed, not the size of the data asked to be echoed.
A segmentation fault does not occur because the data accessed is that immediately adjacent to the data requested, and is generally within the memory of the same process. It might cause an exception if the request were sufficiently large I suppose, but doing that is not in the exploiter's interest, since crashing the process would prevent them obtaining the data.
For a clear explanation, the XKCD comic on Heartbleed (https://xkcd.com/1354/) is hard to better.
Earlier I encountered a problem with dynamic memory in C (Visual Studio).
I had a more or less working program that threw a run-time error when freeing one of the buffers. It was clearly memory corruption: the program wrote over the end of the buffer.
My problem is that it was very time-consuming to track down. The error was thrown well after the corruption, and I had to manually step through the entire run to find where the end of the buffer was overwritten.
Is there any tool/way to assist in tracking down this issue? If the program had crashed immediately, I would have found the problem a lot faster...
An example of the issue:
int *pNum = malloc(10 * sizeof(int));

for (int i = 0; i < 13; i++)   /* bug: writes 3 ints past the end of the buffer */
{
    pNum[i] = 3;
}

free(pNum);   /* the run-time error fires here, long after the corruption */
I use "data breakpoints" for that. In your case, when the program crashes, it might first complain like this:
Heap block at 00397848 modified at 0039789C past requested size of 4c
Then start your program again and set a data breakpoint at address 0x0039789C. When the code writes to that address, execution will stop. I often find the bug immediately at this point.
If your program allocates and deallocates memory repeatedly, and other allocations happen to reuse this exact address, just disable actual deallocation so addresses are not reused:
_CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_DELAY_FREE_MEM_DF);
I use PageHeap. This is a tool from Microsoft that changes how the heap allocator works: with PageHeap enabled, each allocation is rounded up to the nearest page (a block of memory), and an additional no-read/no-write page of virtual memory is placed right after it. The memory you allocate is aligned so that the end of your buffer sits just before that guard page. This way, if you run over the end of your buffer, often by a single byte, the access faults immediately and the debugger catches it easily.
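If memory serves, full page heap is switched on per executable with the gflags utility from the Debugging Tools for Windows (myapp.exe is a placeholder):

gflags /p /enable myapp.exe /full
gflags /p /disable myapp.exe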
Is there any tool\ way to assist in tracking down this issue?
Yes; that's precisely the type of error that static code analysers try to locate, e.g. splint or PC-Lint.
Here's a list of such tools:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
Edit: trying out splint on your code snippet, I get the following warning:
main.c:9:2: Possible out-of-bounds store: pnum[i]
Presumably this warning would have assisted you.
Our CheckPointer tool can help find memory management errors. It works with GCC 3/4 and Microsoft dialects of C.
Many dynamic checkers only catch accesses outside of an object, and then only if the object is heap-allocated. CheckPointer will also find memory access errors inside a heap-allocated object: it is illegal to access off the end of a field in a struct regardless of the field type, and most dynamic checkers cannot detect such errors. It will also find accesses off the edge of locals (see the example below).
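A hypothetical example of that class of error; the overrun stays inside the enclosing allocation, so allocator-level and page-level tools see nothing wrong:

#include <stdlib.h>
#include <string.h>

struct record {
    char name[8];
    int  id;         /* sits right after name in the same allocation */
};

int main(void)
{
    struct record *r = malloc(sizeof *r);
    r->id = 42;

    /* 11 characters plus the NUL is 12 bytes written into an 8-byte
       field: the write never leaves the malloc'd block, so the heap is
       never "corrupted" as far as the allocator knows - but it silently
       tramples r->id. */
    strcpy(r->name, "hello world");

    free(r);
    return 0;
}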
Let's state the conditions under which sqlcxt() can cause a segmentation fault. I am working on Unix, using Pro*C for database connections to an Oracle database.
My program crashes, and the core file shows that the crash is due to the sqlcxt() function:
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
...
dbx: warning: Some symbolic information might be incorrect.
...
t@null (l@1) program terminated by signal SEGV
(no mapping at the fault address)0xffffffffffffffff:
<bad address 0xffffffffffffffff>
Current function is dbMatchConsortium
442       sqlcxt((void **)0, &sqlctx, &sqlstm, &sqlfpn);
There is a decent chance that the problem you are having is some sort of pointer error / memory allocation error in your C code. These things are never easy to find. Some things that you might try:
See if you can comment out (or #ifdef out) sections of your program and check whether the problem disappears. If so, you can close in on the bad section.
Run your program in a debugger.
Do a code review with somebody else - this will often lead to finding more than one problem (it usually works on my code).
I hope that this helps. Please add more details, and I will check back on this question to see if I can help further.
It's probably an allocation error in your program; when I have seen this kind of behaviour, it was always my own fault. I develop on Solaris/SPARC with Oracle 10g. Once it was a double free (I freed the same pointer twice); another time I got a core in the Oracle part of the program because I freed a pointer that was not an allocated memory block.
If you're on Solaris, you can try the libumem allocation library (google it for details) to see if the behaviour changes.
A solution that worked for me: delete the C files generated by Pro*C and recompile (make).
Pro*C files (*.pc) are precompiled into C files, and sometimes errors occur during that step (in my case, there was no space left on the device); even if the build succeeds, you can get a SIGSEGV in sqlcxt in libclntsh.so when running the result.
pstack and gdb can help you debug it if that's not the case.