Map Var to Declaration Using DWARF DebugInfo and Source Code

Given the line number of a variable access (not declaration), how can I determine its type (or its declaration DIE in the .info tree)?
Look at the following code:
void foo()
{
{
struct A *b;
}
{
struct B *b;
b = malloc(sizeof(struct B));
}
}
Suppose that I have this source code and it is compiled with debug information in DWARF format. How can I determine that variable b is of type struct B * using the source code and debug information?
How can I automate this offline? The problem is that the .info section of DWARF contains no mapping between source code (e.g., line numbers) and scope information. In the example above, using debug information, we can determine that there is a variable of type struct A * which is a child of foo() and a variable of type struct B * which is the other child of foo(). Parsing the source code can help to determine the nesting level at which the access has occurred, but there is no way to map the accessed variable to its type, because there are two declarations of b at the same level at which b is accessed.
If there is a way to force the compiler to include more information in the debug information, the problem can be solved. For example, adding DW_AT_high_pc and DW_AT_low_pc to the debug information of DIEs of type DW_TAG_lexical_block will help.

You have already answered almost all of your own question; there are only two things missing.
Firstly, the relationship between file name/line number and program counter is encoded in .debug_line, not .debug_info.
Secondly, the variables are not children of foo(): each is a child of a lexical block. The relevant portion of the program structure will look like
DW_TAG_compile_unit
DW_TAG_subprogram
DW_TAG_lexical_block
DW_TAG_variable
DW_TAG_lexical_block
DW_TAG_variable
The lexical block should be associated with an address range but this might be encoded using DW_AT_ranges instead of DW_AT_low_pc/DW_AT_high_pc; if that's the case then you'll need to interpret .debug_ranges.
To illustrate the case in hand I compiled the following with cc -g (gcc 4.8.5 on Oracle Linux)...
1 #include <stdlib.h>
2
3 struct A { int a; };
4 struct B { int b; };
5
6 void foo()
7 {
8 {
9 struct A *b;
10 }
11
12 {
13 struct B *b;
14 b = malloc(sizeof (struct B));
15 }
16 }
...and used 'readelf -w' to decode the DWARF. Line 14 appears here in the line number table:
[0x00000032] Special opcode 124: advance Address by 8 to 0x8 and Line by 7 to 14
meaning that we're interested in address 0x8. The DIE hierarchy includes
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<1><96>: Abbrev Number: 6 (DW_TAG_subprogram)
<9d> DW_AT_low_pc : 0x0
<a5> DW_AT_high_pc : 0x18
<2><b3>: Abbrev Number: 7 (DW_TAG_lexical_block)
<b4> DW_AT_low_pc : 0x8
<bc> DW_AT_high_pc : 0xe
<3><c4>: Abbrev Number: 8 (DW_TAG_variable)
<c5> DW_AT_name : b
<c7> DW_AT_decl_file : 1
<c8> DW_AT_decl_line : 13
<c9> DW_AT_type : <0xd2>
The DIE at 0xb3 does not contain any further lexical blocks, so it represents the tightest scope at address 0x8. Hence, at this address, the name "b" must refer to that DIE's child at 0xc4. This variable's type is given by
<1><d2>: Abbrev Number: 9 (DW_TAG_pointer_type)
<d3> DW_AT_byte_size : 8
<d4> DW_AT_type : <0x81>
<1><81>: Abbrev Number: 4 (DW_TAG_structure_type)
<82> DW_AT_name : B
<84> DW_AT_byte_size : 4
<2><8b>: Abbrev Number: 5 (DW_TAG_member)
<8c> DW_AT_name : b
<90> DW_AT_type : <0x34>
<94> DW_AT_data_member_location: 0
<1><34>: Abbrev Number: 3 (DW_TAG_base_type)
<35> DW_AT_byte_size : 4
<36> DW_AT_encoding : 5 (signed)
<37> DW_AT_name : int
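The lookup described above (find the innermost scope whose address range contains the address, then search outward for the name) can be sketched as follows. This is a minimal illustration over a hand-built tree, not a DWARF parser: struct var, struct scope and resolve() are hypothetical names, and the sketch assumes each range has already been normalised to an absolute start and an exclusive end address (some producers encode DW_AT_high_pc as an offset from DW_AT_low_pc, and others use DW_AT_ranges).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a DW_TAG_variable DIE. */
struct var {
    const char *name;        /* DW_AT_name */
    uint32_t    die_offset;  /* offset of the DIE in .debug_info */
    uint32_t    type_offset; /* value of DW_AT_type */
};

/* Hypothetical stand-in for a DW_TAG_subprogram or DW_TAG_lexical_block DIE. */
struct scope {
    uint64_t            low_pc;    /* first address covered by the scope */
    uint64_t            high_pc;   /* first address past the scope (exclusive) */
    const struct var   *vars;      /* variables declared directly in this scope */
    size_t              nvars;
    const struct scope *children;  /* nested lexical blocks */
    size_t              nchildren;
};

/* Return the variable that `name` refers to at address `pc`, or NULL. */
const struct var *resolve(const struct scope *s, const char *name, uint64_t pc)
{
    if (pc < s->low_pc || pc >= s->high_pc)
        return NULL;                      /* pc is outside this scope */

    /* Inner scopes shadow outer ones, so try the nested blocks first. */
    for (size_t i = 0; i < s->nchildren; i++) {
        const struct var *v = resolve(&s->children[i], name, pc);
        if (v != NULL)
            return v;
    }

    /* Not found deeper down: look at this scope's own variables. */
    for (size_t i = 0; i < s->nvars; i++) {
        if (strcmp(s->vars[i].name, name) == 0)
            return &s->vars[i];
    }
    return NULL;
}
```

Given a tree built from the dump above, resolve(&foo, "b", 0x8) would return the entry for the DIE at 0xc4, whose type_offset (0xd2) is the starting point for walking the type chain.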
EDIT:
In your own answer you've given a counter-example for mplayer in which there are lexical blocks without corresponding address ranges. Such DWARF does not conform to the standard: §3.4 of DWARF 2 states that a lexical block entry has DW_AT_low_pc and DW_AT_high_pc attributes and makes no suggestion that these are optional. A likely candidate for this bug, assuming you're using gcc, is "DWARF debug info for inlined lexical blocks missing range". The default mplayer configuration includes -O2 optimisation, which turns on inlining; you will see this reflected in the parent DW_TAG_subprogram for draw_vertices(), from which the example code is taken. A workaround for the bug is to add -fno-inline to the compiler options; this does not seem to suppress all inlining so you may wish to disable optimisation altogether.

Here is the output of objdump --dwarf=info mplayer for an MPlayer-1.3.0 compiled using -gdwarf-2 option.
<2><4000e>: Abbrev Number: 43 (DW_TAG_lexical_block)
<3><4000f>: Abbrev Number: 37 (DW_TAG_variable)
<40010> DW_AT_name : px
<40013> DW_AT_decl_file : 1
<40014> DW_AT_decl_line : 2079
<40016> DW_AT_type : <0x38aed>
<3><4001a>: Abbrev Number: 37 (DW_TAG_variable)
<4001b> DW_AT_name : py
<4001e> DW_AT_decl_file : 1
<4001f> DW_AT_decl_line : 2080
<40021> DW_AT_type : <0x38aed>
<3><40025>: Abbrev Number: 0
<2><40026>: Abbrev Number: 0
As you can see at offset 0x4000e, there is a lexical block with no attributes. The corresponding source code is located in libvo/gl_common.c:2078:
for (i = 0; i < 4; i++) {
int px = 2*i;
int py = 2*i + 1;
mpglTexCoord2f(texcoords[px], texcoords[py]);
if (is_yv12) {
mpglMultiTexCoord2f(GL_TEXTURE1, texcoords2[px], texcoords2[py]);
mpglMultiTexCoord2f(GL_TEXTURE2, texcoords2[px], texcoords2[py]);
}
if (use_stipple)
mpglMultiTexCoord2f(GL_TEXTURE3, texcoords3[px], texcoords3[py]);
mpglVertex2f(vertices[px], vertices[py]);
}
The block is a for block. There are many more similar lexical_block instances.
My solution consists of two parts:
1) Source code analysis:
Find the scope (surrounding left and right braces) where the target variable is accessed. In fact we only need to store the line number of the left brace.
Find the level of the scope in the tree of scopes (a tree that shows parent/child relationships similar to what can be found in .info).
At this point we have the start line of the scope corresponding to a variable access and the level of the scope in the tree of scopes (e.g., line 12 and level 2 in the code depicted in the original question).
2) DebugInfo analysis:
Now, we can analyze the appropriate CU and look for the declarations of that target variable. The important point is that only the declarations with a line number smaller than the line number of the access point are valid. Considering this, we can search the global scope, and continue with deeper levels, in order.
Declarations with scopes deeper than the scope of the access are invalid. Declarations with the same scope as the target variable are only valid if their line number is between the start line of the target scope and the line number of the variable access.
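The filtering rules above can be sketched as a small selection function. This is only an illustration of the heuristic as described: struct decl and pick_decl() are hypothetical names, the depth numbering (0 = global, 1 = function body, 2 = first nested block) is an assumption, and the preference for the deepest and then latest valid declaration is my reading of "continue with deeper levels, in order". Like the heuristic itself, it cannot tell apart candidates that sit in unrelated enclosing blocks at a shallower depth.

```c
#include <stddef.h>

/* Hypothetical record for one candidate declaration of the target name. */
struct decl {
    int depth;      /* nesting level of the declaring scope in the DIE tree */
    int decl_line;  /* DW_AT_decl_line */
};

/*
 * Return the index of the declaration that the access most plausibly
 * refers to, or -1 if no candidate is valid.
 *   scope_start  line of the opening brace of the scope containing the access
 *   scope_depth  nesting level of that scope
 *   access_line  line on which the variable is accessed
 */
int pick_decl(const struct decl *c, size_t n,
              int scope_start, int scope_depth, int access_line)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (c[i].decl_line > access_line)
            continue;   /* declared after the access */
        if (c[i].depth > scope_depth)
            continue;   /* declared in a deeper scope than the access */
        if (c[i].depth == scope_depth && c[i].decl_line < scope_start)
            continue;   /* same depth, but in an earlier sibling block */

        /* Prefer the deepest valid scope, then the latest declaration. */
        if (best < 0 ||
            c[i].depth > c[best].depth ||
            (c[i].depth == c[best].depth &&
             c[i].decl_line > c[best].decl_line))
            best = (int)i;
    }
    return best;
}
```

For the code in the original question (declarations at lines 9 and 13, both at level 2, access at line 14 in the scope opened at line 12), the first candidate is rejected as belonging to a sibling block and the second is selected.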

Related

WinDbg evaluate ebp+12

I am trying to understand how the stack pointer and base pointer work. Because most teaching material does not come with practical examples, I tried to reproduce this: https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
Here is my very simple code:
#include <stdio.h>
int function1(int a, int b); /* declare before use */
int main()
{
function1(1, 2);
}
int function1(int a, int b)
{
int c = a + b;
return c;
}
I use WinDbg to execute the program, set the breakpoint with bm CallStackPractice!function1, type g to hit the breakpoint and p to step into the function.
With ebp+8 we should get the first parameter. I did that in WinDbg:
0:000> ? poi(ebp+8)
Evaluate expression: 1 = 00000001
Good. Now we want our second parameter, which should be at ebp+12.
0:000> ? poi(ebp+12)
Evaluate expression: 270729434 = 102300da
We don't get 2 = 00000002. I opened the memory window in WinDbg and it shows me the correct value but why does my command not work?
Thank you!
That's a common mistake. 12 means 0x12 by default.
If you want a decimal number 12, use 0n12 or 0xC or change the default number format using n 10 (I don't know anyone who does that, actually).
0:000> ? 12
Evaluate expression: 18 = 00000000`00000012
0:000> n 10
base is 10
0:000> ? 12
Evaluate expression: 12 = 00000000`0000000c
Back at base 16:
1:005:x86> n 16
base is 16
1:005:x86> ? poi(ebp+8)
Evaluate expression: 1 = 00000001
1:005:x86> ? poi(ebp+c)
Evaluate expression: 2 = 00000002
If you get weird errors like
1:005:x86> ?poi(ebp +c)
Memory access error at ')'
that's because you're still at base 10.
You might also want to take a look at the stack with dps like so:
1:005:x86> dps ebp-c L7
008ff60c cccccccc <-- magic number (4x INT 3 breakpoint)
008ff610 00000003
008ff614 cccccccc
008ff618 008ff6f4
008ff61c 00fb189a DebugESPEBP!main+0x2a [C:\...\DebugESPEBP.cpp # 13]
008ff620 00000001 <-- a
008ff624 00000002 <-- b
As you can see, dps gives you the return address as a symbol with a line number. You'll also see that the memory layout in debug mode contains magic numbers that are helpful for debugging.
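To make the arithmetic explicit, here is a small sketch of the offset calculation and of the radix pitfall. It assumes a 32-bit frame with 4-byte stack arguments, as in the dps output above (saved EBP at [ebp], return address at [ebp+4], first argument at [ebp+8]). param_offset() and parse_offset() are hypothetical helpers; strtol only mimics a default radix and says nothing about how WinDbg itself parses expressions.

```c
#include <stdlib.h>

/* Offset from EBP of the n-th (0-based) 4-byte stack argument. */
unsigned param_offset(unsigned n)
{
    return 8u + 4u * n;   /* skip saved EBP (4 bytes) and return address (4 bytes) */
}

/* Interpret an unprefixed number under the given default radix. */
long parse_offset(const char *text, int radix)
{
    return strtol(text, NULL, radix);
}
```

The second argument is at param_offset(1), which is 12 decimal. Written as "12" under a default radix of 16 it becomes 18, so the debugger reads the wrong stack slot; written as "c" (or 0n12) it is the intended offset.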

How to interpret `p_size` in ELF files?

Recently, I have been playing around with ELF files, and I tried to solve this problem:
Given the eip, print the name of the function in the ELF executable file
I can do this with the symbol table and string table. Since I only need to deal with symbols whose type is STT_FUNC, I wrote the following program:
for (i = 0; i < nr_symtab_entry; ++i) {
if ((symtab[i].st_info == STT_FUNC) &&
eip < symtab[i].st_value + symtab[i].st_size &&
eip >= symtab[i].st_value) {
strcpy(funcName, strtab + symtab[i].st_name);
}
}
where symtab is the symbol table, strtab is the string table.
But after several tests, I realized that the program above is wrong. After several trials, I changed it into this:
for (i = 0; i < nr_symtab_entry; ++i) {
if ((symtab[i].st_info & STT_FUNC) &&
eip < symtab[i].st_value + symtab[i].st_size &&
eip >= symtab[i].st_value) {
strcpy(funcName, strtab + symtab[i].st_name);
}
}
Then it worked! But when I man elf, the manual told me:
st_info This member specifies the symbol’s type and binding attributes
It doesn't mention whether it is a bit flag or not. Then I ran into another problem, where I need to check whether a segment is PT_LOAD, and the manual, again, does not specify whether it is a bit flag or not. So I am asking here: is PT_LOAD also a bit flag? Is every symbolic constant in an ELF file a bit flag?
It seems that st_info can be interpreted with specific macros. But what about p_type?
Use:
if (ELF32_ST_TYPE(symtab[i].st_info) == STT_FUNC && ...
as, for example, the Linux kernel does in kernel/module.c.
ELF32_ST_TYPE extracts the type of a symbol from st_info. I couldn't find a list of the possible symbol types anywhere, but from #define ELF32_ST_TYPE(info) ((info) & 0xf) and the definitions in elf.h I am fairly sure that ELF32_ST_TYPE(st_info) equals one of the following macros:
#define STT_NOTYPE 0
#define STT_OBJECT 1
#define STT_FUNC 2
#define STT_SECTION 3
#define STT_FILE 4
#define STT_COMMON 5
#define STT_TLS 6
man elf does document the macros:
There are macros for packing and unpacking the binding and
type fields:
ELF32_ST_BIND(info), ELF64_ST_BIND(info)
Extract a binding from an st_info value.
ELF32_ST_TYPE(info), ELF64_ST_TYPE(info)
Extract a type from an st_info value.
ELF32_ST_INFO(bind, type), ELF64_ST_INFO(bind, type)
Convert a binding and a type into an st_info value.
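Putting this together, here is a minimal sketch of the lookup using the unpacking macro, assuming a Linux-style <elf.h> and a 32-bit ELF. find_function() and is_loadable() are hypothetical helper names. The sketch also covers the p_type part of the question: p_type holds a single enumerated value (PT_LOAD is 1, PT_DYNAMIC is 2, PT_INTERP is 3), so it is compared with ==, not tested with &.

```c
#include <elf.h>
#include <stddef.h>

/* Return the name of the STT_FUNC symbol whose range contains eip, or NULL. */
const char *find_function(const Elf32_Sym *symtab, size_t n,
                          const char *strtab, Elf32_Addr eip)
{
    for (size_t i = 0; i < n; i++) {
        /* st_info packs binding (high 4 bits) and type (low 4 bits). */
        if (ELF32_ST_TYPE(symtab[i].st_info) == STT_FUNC &&
            eip >= symtab[i].st_value &&
            eip - symtab[i].st_value < symtab[i].st_size)
            return strtab + symtab[i].st_name;
    }
    return NULL;
}

/* p_type is an enumeration, not a set of flags: compare for equality. */
int is_loadable(const Elf32_Phdr *ph)
{
    return ph->p_type == PT_LOAD;
}
```

This also suggests why st_info & STT_FUNC appeared to work: a global function has st_info 0x12 (binding 1, type 2), which is not equal to 2 but does have bit 1 set. The same test would wrongly accept STT_SECTION (3), just as p_type & PT_LOAD would wrongly accept PT_INTERP (3).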

Why some of the local variables are not listed in the corresponding stack frame when inspected using GDB?

I have a piece of code in C as shown below-
In a .c file-
1 custom_data_type2 myFunction1(custom_data_type1 a, custom_data_type2 b)
2 {
3 int c=foo();
4 custom_data_type3 t;
5 check_for_ir_path();
6 ...
7 ...
8 }
9
10 custom_data_type4 myFunction2(custom_data_type3 c, const void* d)
11 {
12 custom_data_type4 e;
13 struct custom_data_type5 f;
14 check_for_ir_path();
15 ...
16 temp = myFunction1(...);
17 return temp;
18 }
In a header file-
1 void CRASH_DUMP(int *i)
2 __attribute__((noinline));
3
4 #define INTRPT_FORCE_DUMMY_STACK 3
5
6 #define check_for_ir_path() { \
7 if (checkfunc1() && !checkfunc2()) { \
8 int sv = INTRPT_FORCE_DUMMY_STACK; \
9 ...
10 CRASH_DUMP(&sv);\
11 }\
12 }\
In an unknown scenario, there is a crash.
After processing the core dump using GDB, we get the call stack like -
#0 0x00007ffa589d9619 in myFunction1 [...]
(custom_data_type1=0x8080808080808080, custom_data_type2=0x7ff9d77f76b8) at ../xxx/yyy/zzz.c:5
sv = 32761
t = <optimized out>
#1 0x00007ffa589d8f91 in myFunction2 [...]
(custom_data_type3=<optimized out>, d=0x7ff9d77f7748) at ../xxx/yyy/zzz.c:16
sv = 167937677
f = {
...
}
If you look at myFunction1, there are three local variables: c, t and sv (defined as part of the macro definition). However, in the backtrace, frame 0 shows only two local variables, t and sv. I don't see the variable c listed.
The same is the case in myFunction2, which has three local variables: e, f and sv (defined as part of the macro definition). However, in the backtrace, frame 1 shows only two local variables, f and sv. I don't see the variable e listed.
Why does it behave like this?
Any non-static variable declared inside the function should be put on the call stack during execution and should be listed in backtrace full, shouldn't it? However, some of the local variables are missing from the backtrace. Could someone provide an explanation?
Objects local to a C function often do not appear on the stack because optimization during compilation often makes it unnecessary to store objects on the stack. In general, while an implementation of the C abstract machine may be viewed as storing objects local to a function on the stack, the actual implementation on a real processor after compilation and optimization may be very different. In particular:
An object local to a function may be created and used only inside a processor register. When there are enough processor registers to hold a function’s local objects, or some of them, there is no point in writing them to memory, so optimized code will not do so.
Optimization may eliminate a local object completely or fold it into other values. For example, given void foo(int x) { int t = 10; bar(x+2*t); … }, the compiler may merely generate code that adds an immediate value of 20 to x, with the result that neither 10 nor any other instantiation of t ever appears on stack, in a register, or even in the immediate operand of an instruction. It simply does not exist in the generated code because there was no need for it.
An object local to a function may appear on the stack at one point during a function’s code but not at others. And the places it appears may differ from place to place in the code. For example, with { int t = x*x; … bar(t); … t = x/3; … bar(t); … }, the compiler may decide to stash the first value of t in one place on the stack. But the second value assigned to t is effectively a separate lifetime, and the compiler may stash it in another place on the stack (or not at all, per the above). In a good implementation, the debugger may be aware of these different places and display the stored value of t while the program counter is in a matching section of code. And, while the program counter is not in a matching section of code, t may effectively not exist, and the debugger could report it is optimized out at that point.

VxWorks 5.5 not filling stack with 0xEEEEEEEE

From taskSpawn VxWorks 5.5 documentation :
"The only resource allocated to a spawned task is a stack of a specified size stackSize, which is allocated from the system memory partition. Stack size should be an even integer. A task control block (TCB) is carved from the stack, as well as any memory required by the task name. The remaining memory is the task's stack and every byte is filled with the value 0xEE for the checkStack( ) facility. See the manual entry for checkStack( ) for stack-size checking aids. "
However, when I tried to scan the stack by spawning a brand new task:
int scan_the_stack(...)
{
printf("Going to scan the stack forward\n");
int i = 0;
int* stack_addr = &i;
for (int i = 0; i < 100; i++)
{
printf("%d : %X\n", i, *stack_addr);
stack_addr++;
}
return 0;
}
void spawn_scan_stack()
{
taskSpawn("tScanner", /* name of new task (stored at pStackBase) */
150, /* priority of new task */
VX_FP_TASK, /* task option word */
10000, /* size (bytes) of stack needed plus name */
scan_the_stack, /* entry point of new task */
0, /* 1st of 10 req'd args to pass to entryPt */
0,0,0,0,0,0,0,0,0);
}
Instead of getting expected consecutive 'EEEEEEEE' I got some 'EE' intermixed with other values:
-> spawn_scan_stack
value = 80735920 = 0x4cfeeb0
-> Going to scan the stack forward
0 : 0
1 : 4CFEE1C
2 : 2
3 : EEEEEEEE
4 : EEEEEEEE
5 : EEEEEEEE
6 : EEEEEEEE
7 : 0
8 : 0
9 : 0
10 : 4CFEE70
11 : 2951F4
12 : 0
13 : 0
14 : EEEEEEEE
15 : EEEEEEEE
16 : EEEEEEEE
17 : EEEEEEEE
18 : EEEEEEEE
19 : 0
20 : 0
21 : 0
22 : 0
23 : 0
24 : EEEEEEEE
25 : EEEEEEEE
26 : EEEEEEEE
27 : EEEEEEEE
28 : 0
29 : 0
30 : 0
31 : 0
32 : 0
33 : 0
34 : 0
35 : 0
36 : 0
37 : 0
38 : 0
39 : 0
40 : 96
41 : FF630
42 : 20
43 : 11000001
44 : 19BDD /*...*/
The question is: why isn't the stack filled with 0xEEEEEEEE? (checkStack still seems to be working.)
Try 'stack_addr--;' - bet you're on Intel where the stacks grow downwards. You are looking up at valid stack data - return addresses and local vars, some of which are uninitialised.
My initial assumption was that the task had been spawned with VX_NO_STACK_FILL, which tells vxworks not to initialise the stack to 0xEE. But, looking at your code, you just use VX_FP_TASK (for floating point support). So the stack should be correctly initialised.
That really leaves two possibilities. The first (and less likely) is that something else is writing where it shouldn't be, but then you would probably see strange behaviour elsewhere (and I would expect checkStack to show that something has been smashed).
The second, as already suggested by others, is that you are on one of the architectures (such as Intel) where the stack grows downwards. The VxWorks Architecture Supplement should tell you which direction the stack grows for your architecture.
You might also be able to tell at compile time by including vxArch.h and testing the value of _STACK_DIR for _STACK_GROWS_DOWN or _STACK_GROWS_UP.
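To illustrate why the scan direction matters, here is a small simulation of fill-pattern checking on a plain buffer. It is a sketch of the general idea only, not VxWorks code: unused_stack_bytes() is a hypothetical helper, and the buffer stands in for a downward-growing stack whose highest addresses are used first.

```c
#include <stddef.h>

#define FILL_BYTE 0xEE  /* fill value named in the taskSpawn documentation */

/*
 * For a stack that grows downwards and occupies [lowest, lowest + size),
 * the untouched part is at the low end. Count fill bytes from the lowest
 * address upwards until the first byte that has been overwritten.
 */
size_t unused_stack_bytes(const unsigned char *lowest, size_t size)
{
    size_t n = 0;

    while (n < size && lowest[n] == FILL_BYTE)
        n++;
    return n;
}
```

In the question's scan, stack_addr starts at a local variable and is incremented, so on a downward-growing stack it walks towards the already-used high end (saved registers, return addresses, locals of the callers) rather than towards the untouched, 0xEE-filled low end.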

Executable Packer (decompression/decryption stub)

I am working on an executable packer and have finished the compression and encryption parts. Now I have to store the decompression/decryption stub in the compressed file. Does this stub have to be written as hex code, or can I place the assembly instructions directly? If the latter is possible, how?
Creating a working packed binary requires two steps: modifying the PE geometry and inserting your code.
Depending on your code size, you might want to use section padding, or to add your own section.
Then, to insert your code - as you seem to prefer direct ASM insertion - my suggestion would be to make the decryption code EIP-independent, then assemble it with something like YASM as a flat binary (-f bin), and include the assembled binary directly.
I wrote several mini-packers that might help as a starting reference, as they also 'insert' assembled code.
You need a section whose characteristics include "readable", "writable", "contains code" and "executable". For comparison, here are the section headers of a UPX-packed file:
Address of Entry Point: 0x00019860
Section Header #1
Name: UPX0
Virtual Size: 0x00010000 (65536)
Virtual Address: 0x00001000
Size of Raw Data: 0x00000000 (0)
File Pointer to Raw Data: 0x00000400
File Pointer to Relocation Table: 0x00000000
File Pointer to Line Numbers: 0x00000000
Number of Relocations: 0
Number of Line Numbers: 0
Characteristics: 0xE0000080
Section contains uninitialized data.
Section is executable.
Section is readable.
Section is writeable.
Section Header #2
Name: UPX1
Virtual Size: 0x00009000 (36864)
Virtual Address: 0x00011000
Size of Raw Data: 0x00008A00 (35328)
File Pointer to Raw Data: 0x00000400
File Pointer to Relocation Table: 0x00000000
File Pointer to Line Numbers: 0x00000000
Number of Relocations: 0
Number of Line Numbers: 0
Characteristics: 0xE0000040
Section contains initialized data.
Section is executable.
Section is readable.
Section is writeable.
Section Header #3
Name: .rsrc
Virtual Size: 0x00001000 (4096)
Virtual Address: 0x0001A000
Size of Raw Data: 0x00000800 (2048)
File Pointer to Raw Data: 0x00008E00
File Pointer to Relocation Table: 0x00000000
File Pointer to Line Numbers: 0x00000000
Number of Relocations: 0
Number of Line Numbers: 0
Characteristics: 0xC0000040
Section contains initialized data.
Section is readable.
Section is writeable.
In short, UPX generates one section that contains the compressed code and the decompressor routine, and a second section that is uninitialized but is allowed to be both writable and executable. The decompressor routine decompresses the code into the uninitialized section and then continues execution at the original entry point.
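The Characteristics values in the dump can be decoded with a few bit masks. The sketch below defines the flag values locally (they mirror the IMAGE_SCN_* constants of the PE/COFF format, with shortened names chosen here) so that it builds without Windows headers. can_unpack_into() is a hypothetical helper that checks whether a section is readable, writable and executable at the same time.

```c
#include <stdint.h>

/* Section characteristic bits, defined locally for illustration. */
#define SCN_CNT_CODE               0x00000020u
#define SCN_CNT_INITIALIZED_DATA   0x00000040u
#define SCN_CNT_UNINITIALIZED_DATA 0x00000080u
#define SCN_MEM_EXECUTE            0x20000000u
#define SCN_MEM_READ               0x40000000u
#define SCN_MEM_WRITE              0x80000000u

/* Non-zero if code can be written into the section and then executed there. */
int can_unpack_into(uint32_t characteristics)
{
    const uint32_t need = SCN_MEM_EXECUTE | SCN_MEM_READ | SCN_MEM_WRITE;

    return (characteristics & need) == need;
}
```

Applied to the dump, UPX0 (0xE0000080) and UPX1 (0xE0000040) both pass the check, while .rsrc (0xC0000040) does not because it lacks the execute bit. Neither UPX section sets the "contains code" bit (0x20) in this listing.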
