While debugging a crash using GDB, I found the program crashes in ASSERT(). The odd thing is that the pointer contains 0x0 which points to valid data.
Sample code:
#define MAX_NUM 10;
...
...
assert(x->y != NULL);
assert(x->y->z < MAX_NUM); <-- Crashes here
I can see that 'x' points to a valid address. When I do:
(gdb) print x
$16 = 0x841eda3
(gdb) print x->y
$17 = 0x0
(gdb) print *x->y
$18 = {
...
...
z = 1;
...
}
How is this possible? Shouldn't I get "Cannot access memory at address 0x0" error from GDB?
What version of GDB?
Core dumps don't always tell the truth. They can be messed up in many ways. I've had core dumps that look valid and are not. I think you simply got lucky that it looks right.
Related
I'm trying to migrate a small c program from hpux to linux. The project compiles fine but crashes at runtime showing me a segmentation fault. I've already tried to see behind the mirror using strace and gdb but still don't understand. The relevant (truncated) parts:
tts_send_2.c
Contains a method
int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
TS_TEL_TAB tel_tab_S01;
int n;
# truncated
}
which is called from within that file like this:
. . .
. . .
switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
/* kritischer Fehler */
case -1:
. . .
. . .
when calling that method I'm presented a segmentation fault (gdb output):
Program received signal SIGSEGV, Segmentation fault.
0x0000000000403226 in sequenznummernabgleich (sockfd=<error reading variable: Cannot access memory at address 0x7fffff62f94c>,
snd_id=<error reading variable: Cannot access memory at address 0x7fffff62f940>, rec_id=<error reading variable: Cannot access memory at address 0x7fffff62f938>,
timeout_quit=<error reading variable: Cannot access memory at address 0x7fffff62f934>) at tts_snd_2.c:498
498 int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
which I just don't understand. When I'm stepping to the line where the method is called using gdb, all the variables are looking fine:
1008 switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
(gdb) p sockfd
$9 = 8
(gdb) p &sockfd
$10 = (int *) 0x611024 <sockfd>
(gdb) p c_snd_id
$11 = "KR", '\000' <repeats 253 times>
(gdb) p &c_snd_id
$12 = (char (*)[256]) 0xfde220 <c_snd_id>
(gdb) p c_rec_id
$13 = "CO", '\000' <repeats 253 times>
(gdb) p &c_rec_id
$14 = (char (*)[256]) 0xfde560 <c_rec_id>
(gdb) p c_timeout_quit
$15 = 20
(gdb) p &c_timeout_quit
$16 = (int *) 0xfde660 <c_timeout_quit>
I've also created an strace output. Here's the last part concerning the code shown above:
strace output
Any ideas ? I've searched the web and of course stackoverflow for hours without finding a really similar case.
Thanks
Kriz
I haven't used an HP/UX in eons but do hazily remember enough for the following suggestions:
Make sure you're initializing variables / struts correctly. Use calloc instead of malloc.
Also don't assume a specific bit pattern order: eg low byte then high byte. Ska endian-ness of the machine. There are usually macros in the compiler that will handle the appropriate ordering for you.
Update 15.10.16
After debugging for even more hours I found the real Problem. On the first line of the Method "sequenznummernabgleich" is a declaration of a struct
TS_TEL_TAB tel_tab_S01;
This is defined as following:
typedef struct {
TS_BOF_REC bof;
TS_REM_REC rem;
TS_EOF_REC eof;
int bof_len;
int rem_len;
int eof_len;
int cnt;
char teltyp[LEN_TELTYP+1];
TS_TEL_ENTRY entries[MAX_TEL];
} TS_TEL_TAB;
and it's embedded struct TS_TEL_ENTRY
typedef struct {
int len;
char tel[MAX_TEL_LEN];
} TS_TEL_ENTRY;
The problem is that the value for MAX_TEL_LEN had been changed from 512 to 1024 and thus the struct almost doubled in size what lead to that the STACK SIZE was not big enough anymore.
SOLUTION
Simply set the stack size from 8Mb to 64Mb. This can be achieved using ulimit command (under linux).
List current stack size: ulimit -s
Set stack size to 64Mb: ulimit -s 65535
Note: Values for stack size are in kB.
For a good short ref on ulimit command have a look # ss64
I am running an application through gdb and I want to set a breakpoint for any time a specific variable is accessed / changed. Is there a good method for doing this? I would also be interested in other ways to monitor a variable in C/C++ to see if/when it changes.
watch only breaks on write, rwatch let you break on read, and awatch let you break on read/write.
You can set read watchpoints on memory locations:
gdb$ rwatch *0xfeedface
Hardware read watchpoint 2: *0xfeedface
but one limitation applies to the rwatch and awatch commands; you can't use gdb variables
in expressions:
gdb$ rwatch $ebx+0xec1a04f
Expression cannot be implemented with read/access watchpoint.
So you have to expand them yourself:
gdb$ print $ebx
$13 = 0x135700
gdb$ rwatch *0x135700+0xec1a04f
Hardware read watchpoint 3: *0x135700 + 0xec1a04f
gdb$ c
Hardware read watchpoint 3: *0x135700 + 0xec1a04f
Value = 0xec34daf
0x9527d6e7 in objc_msgSend ()
Edit: Oh, and by the way. You need either hardware or software support. Software is obviously much slower. To find out if your OS supports hardware watchpoints you can see the can-use-hw-watchpoints environment setting.
gdb$ show can-use-hw-watchpoints
Debugger's willingness to use watchpoint hardware is 1.
What you're looking for is called a watchpoint.
Usage
(gdb) watch foo: watch the value of variable foo
(gdb) watch *(int*)0x12345678: watch the value pointed by an address, casted to whatever type you want
(gdb) watch a*b + c/d: watch an arbitrarily complex expression, valid in the program's native language
Watchpoints are of three kinds:
watch: gdb will break when a write occurs
rwatch: gdb will break wnen a read occurs
awatch: gdb will break in both cases
You may choose the more appropriate for your needs.
For more information, check this out.
Assuming the first answer is referring to the C-like syntax (char *)(0x135700 +0xec1a04f) then the answer to do rwatch *0x135700+0xec1a04f is incorrect. The correct syntax is rwatch *(0x135700+0xec1a04f).
The lack of ()s there caused me a great deal of pain trying to use watchpoints myself.
I just tried the following:
$ cat gdbtest.c
int abc = 43;
int main()
{
abc = 10;
}
$ gcc -g -o gdbtest gdbtest.c
$ gdb gdbtest
...
(gdb) watch abc
Hardware watchpoint 1: abc
(gdb) r
Starting program: /home/mweerden/gdbtest
...
Old value = 43
New value = 10
main () at gdbtest.c:6
6 }
(gdb) quit
So it seems possible, but you do appear to need some hardware support.
Use watch to see when a variable is written to, rwatch when it is read and awatch when it is read/written from/to, as noted above. However, please note that to use this command, you must break the program, and the variable must be in scope when you've broken the program:
Use the watch command. The argument to the watch command is an
expression that is evaluated. This implies that the variabel you want
to set a watchpoint on must be in the current scope. So, to set a
watchpoint on a non-global variable, you must have set a breakpoint
that will stop your program when the variable is in scope. You set the
watchpoint after the program breaks.
In addition to what has already been answered/commented by asksol and Paolo M
I didn't at first read understand, why do we need to cast the results. Though I read this: https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html, yet it wasn't intuitive to me..
So I did an experiment to make the result clearer:
Code: (Let's say that int main() is at Line 3; int i=0 is at Line 5 and other code.. is from Line 10)
int main()
{
int i = 0;
int j;
i = 3840 // binary 1100 0000 0000 to take into account endianness
other code..
}
then i started gdb with the executable file
in my first attempt, i set the breakpoint on the location of variable without casting, following were the results displayed
Thread 1 "testing2" h
Breakpoint 2 at 0x10040109b: file testing2.c, line 10.
(gdb) s
7 i = 3840;
(gdb) p i
$1 = 0
(gdb) p &i
$2 = (int *) 0xffffcbfc
(gdb) watch *0xffffcbfc
Hardware watchpoint 3: *0xffffcbfc
(gdb) s
[New Thread 13168.0xa74]
Thread 1 "testing2" hit Breakpoint 2, main () at testing2.c:10
10 b = a;
(gdb) p i
$3 = 3840
(gdb) p *0xffffcbfc
$4 = 3840
(gdb) p/t *0xffffcbfc
$5 = 111100000000
as we could see breakpoint was hit for line 10 which was set by me. gdb didn't break because although variable i underwent change yet the location being watched didn't change (due to endianness, since it continued to remain all 0's)
in my second attempt, i did the casting on the address of the variable to watch for all the sizeof(int) bytes. this time:
(gdb) p &i
$6 = (int *) 0xffffcbfc
(gdb) p i
$7 = 0
(gdb) watch *(int *) 0xffffcbfc
Hardware watchpoint 6: *(int *) 0xffffcbfc
(gdb) b 10
Breakpoint 7 at 0x10040109b: file testing2.c, line 10.
(gdb) i b
Num Type Disp Enb Address What
6 hw watchpoint keep y *(int *) 0xffffcbfc
7 breakpoint keep y 0x000000010040109b in main at testing2.c:10
(gdb) n
[New Thread 21508.0x3c30]
Thread 1 "testing2" hit Hardware watchpoint 6: *(int *) 0xffffcbfc
Old value = 0
New value = 3840
Thread 1 "testing2" hit Breakpoint 7, main () at testing2.c:10
10 b = a;
gdb break since it detected the value has changed.
I have an issue with my pointer to a structure variable. I just started using GDB to debug the issue. The application stops when it hits on the line of code below due to segmentation fault. ptr_var is a pointer to a structure
ptr_var->page = 0;
I discovered that ptr_var is set to an invalid memory 0x0 after a series of function calls which caused the segmentation fault when assigning the value "0" to struct member "page". The series of function calls does not have a reference to ptr_var. The old address that used to be assigned to ptr_var is still in memory. I can still still print the values of members from the struct ptr_var using the old address. GDB session below shows that I am printing a string member of the struct ptr_var using its address
(gdb) x /s *0x7e11c0
0x7e0810: "Sample String"
I couldn't tell when the variable ptr_var gets assigned an invalid address 0x0. I'm a newbie to GDB and an average C programmer. Your assistance in this matter is greatly appreciated. Thank you.
What you want to do is set a watchpoint, GDB will then stop execution every time a member of a struct is modified.
With the following example code
typedef struct {
int val;
} Foo;
int main(void) {
Foo foo;
foo.val = 5;
foo.val = 10;
}
Drop a breakpoint at the creation of the struct and execute watch -l foo.val Then every time that member is changed you will get a break. The following is my GDB session, with my input
(gdb) break test.c:8
Breakpoint 3 at 0x4006f9: file test.c, line 8.
(gdb) run
Starting program: /usr/home/sean/a.out
Breakpoint 3, main () at test.c:9
9 foo.val = 5;
(gdb) watch -l foo.val
Hardware watchpoint 4: -location foo.val
(gdb) cont
Continuing.
Hardware watchpoint 4: -location foo.val
Old value = 0
New value = 5
main () at test.c:10
(gdb) cont
Continuing.
Hardware watchpoint 4: -location foo.val
Old value = 5
New value = 10
main () at test.c:11
(gdb) cont
If you can rerun, then break at a point where ptr_var is correct you can set a watch point on ptr_var like this: (gdb) watch ptr_var. Now when you continue every time ptr_var is modified gdb should stop.
Here's an example. This does contain undefined behaviour, as I'm trying to reproduce a bug, but hopefully it should be good enough to show you what I'm suggesting:
#include <stdio.h>
#include <stdint.h>
int target1;
int target2;
void
bad_func (int **bar)
{
/* Set contents of bar. */
uintptr_t ptr = (uintptr_t) bar;
printf ("Should clear %p\n", (void *) ptr);
ptr += sizeof (int *);
printf ("Will clear %p\n", (void *) ptr);
/* Bad! We just corrupted foo (maybe). */
*((int **) ptr) = NULL;
}
int
main ()
{
int *foo = &target1;
int *bar = &target2;
printf ("&foo = %p\n", (void *) &foo);
printf ("&boo = %p\n", (void *) &bar);
bad_func (&bar);
return *foo;
}
And here's a gdb session:
(gdb) break bad_func
Breakpoint 1 at 0x400542: file watch.c, line 11.
(gdb) r
&foo = 0x7fffffffdb88
&boo = 0x7fffffffdb80
Breakpoint 1, bad_func (bar=0x7fffffffdb80) at watch.c:11
11 uintptr_t ptr = (uintptr_t) bar;
(gdb) up
#1 0x00000000004005d9 in main () at watch.c:27
27 bad_func (&bar);
(gdb) watch foo
Hardware watchpoint 2: foo
(gdb) c
Continuing.
Should clear 0x7fffffffdb80
Will clear 0x7fffffffdb88
Hardware watchpoint 2: foo
Old value = (int *) 0x60103c <target1>
New value = (int *) 0x0
bad_func (bar=0x7fffffffdb80) at watch.c:18
18 }
(gdb)
For some reason the watchpoint appears to trigger on the line after the change was made, even though I compiled this with -O0, which is a bit of a shame. Still, it's usually close enough to help identify the problem.
For such kind of problems I often use the old electric fence library, it can be used to find bug in "software that overruns the boundaries of a malloc() memory allocation". You will find all the instructions and basic usage at this page:
http://elinux.org/Electric_Fence
(At the very end of the page linked above you will find the download link)
My program after running for around few hours randomly crashes because of a segmentation fault. My environment is Ubuntu (Linux)
When I try to print the data structure thats being accessed when it crashed the pointer is always pointing to invalid memory.
(gdb) p *xxx_info[8]
**Cannot access memory at address 0x7fd200000000**
(gdb)
In order to detect data corruption I add two fence variables that were hardcoded with a well defined value so that I can detect a memory corruption. I see that my fence variables for the linked list node which caused my process to crash had been violated from my earlier logs.
(gdb) p xxx_info
$2 = {0x7fd248000c30, 0x7fd248001050, 0x7fd248000b30, 0x7fd248000d40, 0x7fd248000f50, 0x0,
0x7fd248001160, 0x7fd248001280, 0x7fd200000000, 0x7fd2480008c0,
0x7fd248003000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
(gdb) p xxx_info[8]
$3 = (xxx_info_t *) 0x7fd200000000
(gdb) p *xxx_info[8]
**Cannot access memory at address 0x7fd200000000**
(gdb)
I added the two fence variables at the start and the end of the data structures.
(gdb) pt xxx_info_t
type = struct xxx_info_t {
uint32_t begin_fence;
char *str;
int cluster_id;
uint32_t end_fence;
}
Under normal circumstances the fence variables MUST always be 0xdeaddead as shown below:
(gdb) p/x *xxx_info[7]
$4 = {**begin_fence= 0xdeaddead**, str= 0x7fd248002f10, cluster_id= 0x1f2, **end_fence= 0xdeaddead**}
(gdb)
Whenever I access the array of pointers xxx_info I check for the fence values for each index. I noticed that I would get error messages saying that the fence variables for index 8 look corrupted. Later the code crashed when accessing index 8.
This means that somewhere I am overwriting the memory address pointed to by the index 8 of the array xxx_info .
My question is:
How can I debug these errors? Can I dynamically set some breakpoints so that whenever somebody overwrites that memory address I complain and assert. If this were a global variable that somebody were corrupting I could have set a HW breakpoint. Since this is a list created on heap (i malloc) the addresses that I will get will be dynamic and hence I can't use memory breakpoints/watchpoints.
Any ideas on what I can do?
Code snippets from two C source files:
A.c
Channel *testChannelGet()
{
Channel *ch = channelGet (parser,parserCh);
return ch;
}
B.c
Channel *channelGet(UINT8 parser, UINT16 parserCh)
{
chnl.player = &solPlayer;
return((Channel *)&chnl);
}
I compile both files and create a static and a shared library. Now I call testChannelGet from a sample program. When I link it against the static library, it works perfectly. But if I link it against the shared library, its SEGFAULTing. Debugging tells me that the pointer returned from channelGet is changing the moment it returns. GDB output below.
174 Channel *ch = channelGet (parser,parserCh);
(gdb) s
channelGet (parser=1 '\001', parserCh=1) at B.c:15174
15174 chnl.player = &solPlayer;
(gdb) n
15175 return((Channel *)&chnl);
(gdb) p ((Channel *)&chnl)
$1 = (Channel *) 0x7ffff7fed1a0
(gdb) n
15176 }
(gdb) n
testChannelGet at A.c:175
175 return ch;
(gdb) p ch
$2 = (Channel *) 0xfffffffff7fed1a0
It seems the address value points to a different offset now - 0xfffffffff7fed1a0 vs 0x7ffff7fed1a0 . The last bytes in both addresses are the same.
Any hints? I have tried the -fPIC option to no avail.
Is there a prototype in scope for channelGet() in A.c?
If not, the results you're seeing could be explained as follows:
channelGet() is assumed to return int (due to lack of prototype), so the result is truncated to 0xf7fed1a0
then it is cast to a 64-bit pointer, so gets sign-extended to 0xfffffffff7fed1a0
(You should get complaints about this if you compile with warnings enabled, of course...)
Run your program under valgrind. Find and fix any errors it reports.