using extern to link array with pointer - c

Suppose I have two files:
file1.c- contains global definition of an int array of size 10 named "array[10]".
file2.c- contains an int pointer named "extern int *array", here I am trying to link this pointer to array.
But when I check the address of array in file1.c and pointer value in file2.c, they are both different. Why it is happening?

That doesn't work, in file2.c, you need
extern int array[];
since arrays and pointers are not the same thing. Both declarations must have the compatible types, and int* is not compatible with int[N].
What actually happens is not specified, the programme is ill-formed with extern int *array;, but probably, the first sizeof(int*) bytes of the array are interpreted as an address.

extern1.c
#include <stdio.h>
extern int *array;
int test();
int main(int argc, char *argv[])
{
printf ("in main: array address = %x\n", array);
test();
return 0;
}
extern2.c
int array[10] = {1, 2, 3};
int test()
{
printf ("in test: array address = %x\n", array);
return 0;
}
The output:
in main: array address = 1
in test: array address = 804a040
And the assemble code:
08048404 <main>:
8048404: 55 push %ebp
8048405: 89 e5 mov %esp,%ebp
8048407: 83 e4 f0 and $0xfffffff0,%esp
804840a: 83 ec 10 sub $0x10,%esp
804840d: 8b 15 40 a0 04 08 mov 0x804a040,%edx <--------- this (1)
8048413: b8 20 85 04 08 mov $0x8048520,%eax
8048418: 89 54 24 04 mov %edx,0x4(%esp)
804841c: 89 04 24 mov %eax,(%esp)
804841f: e8 dc fe ff ff call 8048300 <printf#plt>
8048424: e8 07 00 00 00 call 8048430 <test>
8048429: b8 00 00 00 00 mov $0x0,%eax
804842e: c9 leave
804842f: c3 ret
08048430 <test>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 83 ec 18 sub $0x18,%esp
8048436: c7 44 24 04 40 a0 04 movl $0x804a040,0x4(%esp) <------- this (2)
804843d: 08
804843e: c7 04 24 3d 85 04 08 movl $0x804853d,(%esp)
8048445: e8 b6 fe ff ff call 8048300 <printf#plt>
804844a: b8 00 00 00 00 mov $0x0,%eax
804844f: c9 leave
8048450: c3 ret
Pay attention to the <------- in the assemble code. You can see in main function the array is array[0] and in test function the array is the address.

Related

Why is a returned stack-pointer replaced by a null-pointer by gcc?

I've created the following function in c as a demonstration/small riddle about how the stack works in c:
#include "stdio.h"
int* func(int i)
{
int j = 3;
j += i;
return &j;
}
int main()
{
int *tmp = func(4);
printf("%d\n", *tmp);
func(5);
printf("%d\n", *tmp);
}
It's obviously undefined behavior and the compiler also produces a warning about that. However unfortunately the compilation didn't quite work out. For some reason gcc replaces the returned pointer by NULL (see line 6d6).
00000000000006aa <func>:
6aa: 55 push %rbp
6ab: 48 89 e5 mov %rsp,%rbp
6ae: 48 83 ec 20 sub $0x20,%rsp
6b2: 89 7d ec mov %edi,-0x14(%rbp)
6b5: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
6bc: 00 00
6be: 48 89 45 f8 mov %rax,-0x8(%rbp)
6c2: 31 c0 xor %eax,%eax
6c4: c7 45 f4 03 00 00 00 movl $0x3,-0xc(%rbp)
6cb: 8b 55 f4 mov -0xc(%rbp),%edx
6ce: 8b 45 ec mov -0x14(%rbp),%eax
6d1: 01 d0 add %edx,%eax
6d3: 89 45 f4 mov %eax,-0xc(%rbp)
6d6: b8 00 00 00 00 mov $0x0,%eax
6db: 48 8b 4d f8 mov -0x8(%rbp),%rcx
6df: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
6e6: 00 00
6e8: 74 05 je 6ef <func+0x45>
6ea: e8 81 fe ff ff callq 570 <__stack_chk_fail#plt>
6ef: c9 leaveq
6f0: c3 retq
This is the disassembly of the binary compiled with gcc version 7.5.0 and the -O0-flag; no other flags were used. This behavior makes the entire code pointless, since it's supposed to show how the stack behaves across function-calls. Is there any way to achieve a more literal compilation of this code with a at least somewhat up-to-date version of gcc?
And just for the sake of curiosity: what's the point of changing the behavior of the code like this in the first place?
Putting the return value in a pointer variable seems to change the behavior of the compiler and it generates the assembly code that returns a pointer to stack:
int* func(int i) {
int j = 3;
j += i;
int *p = &j;
return p;
}

Variable length bitfield

tl;dr: How can I properly generate a variable length bitfield in C?
I have a structure containing a variable number of elements for instance:
struct s{
int v1;
int v2;
/* ... */
int vn;
}
I'd like to create an anonymous bitfield of the size of the struct. Since i want low coupling between my two structs I do like this:
struct x{
unsigned long a:sz(struct s);
};
This code works but leads to issues:
What can I do if the struct has a size above 64 bits?
I can use several ints to store my bitfield but that gets messy and overkill.
I could also use a pointer to a void* but I suspect performance loss
By design my bitfield will generate AND for each operation over it deteriorate performances as seen in the following code
int main() {
560: 41 54 push %r12
}
__fortify_function int
printf (const char *__restrict __fmt, ...)
{
return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
562: 4c 8d 25 cb 01 00 00 lea 0x1cb(%rip),%r12 # 734 <_IO_stdin_used+0x4>
569: 55 push %rbp
56a: bd 64 00 00 00 mov $0x64,%ebp
56f: 53 push %rbx
570: bb 04 00 00 00 mov $0x4,%ebx
575: 0f 1f 00 nopl (%rax)
struct x x = { .a = 3, };
for(int i=0;i <100; ++i){
++x.a;
printf("%u\n", x.a);
578: 0f b6 d3 movzbl %bl,%edx
57b: 31 c0 xor %eax,%eax
57d: 83 c3 01 add $0x1,%ebx
580: 4c 89 e6 mov %r12,%rsi
583: bf 01 00 00 00 mov $0x1,%edi
588: 83 e3 0f and $0xf,%ebx
58b: e8 b0 ff ff ff callq 540 <__printf_chk#plt>
for(int i=0;i <100; ++i){
590: 83 ed 01 sub $0x1,%ebp
593: 75 e3 jne 578 <main+0x18>
}
return 0;
}
595: 5b pop %rbx
596: 31 c0 xor %eax,%eax
598: 5d pop %rbp
599: 41 5c pop %r12
59b: c3 retq
59c: 0f 1f 40 00 nopl 0x0(%rax)
Ideally I'd like to do something similar to the following pseudo-code (but this is obviously invalid in C).
#define varint(sz) (sz > 32) ? \
((sz>64) ? "#error size too huge" : int64_t) : \
((sz>16) ? int32_t: int16_t)
varint(sizeof(s)/sizeof(int)) mybitfield = 0;
How can I properly generate a variable length bitfield in C?

C - accessing parameter inside the function

I have a main.c file
int boyut(const char* string);
char greeting[6] = {"Helle"};
int main(){
greeting[5] = 0x00;
int a = boyut(greeting);
return 0;
}
int boyut(const char* string){
int len=0;
while(string[len]){
len++;
}
return len;
}
I compile it with GCC command gcc -Wall -m32 -nostdlib main.c -o main.o
When I check disassembly, I see the variable greeting is placed in .data segment. And before calling boyut it's not pushed into stack. Inside the boyut function, it acts like variable greeting is in stack segment. So that variable actually not being accessed inside the function. Why is it generating a code like this? How can I correct this?
Disassembly of section .text:
080480f8 <main>:
80480f8: 55 push ebp
80480f9: 89 e5 mov ebp,esp
80480fb: 83 ec 18 sub esp,0x18
80480fe: c6 05 05 a0 04 08 00 mov BYTE PTR ds:0x804a005,0x0
8048105: 83 ec 0c sub esp,0xc
8048108: 68 00 a0 04 08 push 0x804a000
804810d: e8 0d 00 00 00 call 804811f <boyut>
8048112: 83 c4 10 add esp,0x10
8048115: 89 45 f4 mov DWORD PTR [ebp-0xc],eax
8048118: b8 00 00 00 00 mov eax,0x0
804811d: c9 leave
804811e: c3 ret
0804811f <boyut>:
804811f: 55 push ebp
8048120: 89 e5 mov ebp,esp
8048122: 83 ec 10 sub esp,0x10
8048125: c7 45 fc 00 00 00 00 mov DWORD PTR [ebp-0x4],0x0
804812c: eb 04 jmp 8048132 <boyut+0x13>
804812e: 83 45 fc 01 add DWORD PTR [ebp-0x4],0x1
8048132: 8b 55 fc mov edx,DWORD PTR [ebp-0x4]
8048135: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
8048138: 01 d0 add eax,edx
804813a: 0f b6 00 movzx eax,BYTE PTR [eax]
804813d: 84 c0 test al,al
804813f: 75 ed jne 804812e <boyut+0xf>
8048141: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
8048144: c9 leave
8048145: c3 ret
main.o: file format elf32-i386
Contents of section .data:
804a000 48656c6c 6500 Helle.
The function boyut is declared like this:
int boyut(const char* string);
That means: boyut takes a pointer to char and returns an int. And indeed, the compiler pushes a point to char on the stack. This pointer points to the beginning of greeting. This happens, because in C, an array is implicitly converted to a pointer to its first element under most circumstances.
If you want to pass an array to a function so it is copied to the function, you have to wrap the array into a structure and pass that.

Editing ELF binary call instruction

I am playing around with manipulating a binary's call functions. I have the below code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void myfunc2(char *str2, char *str1) {
// enter code here
}
void myfunc(char *str2, char *str1)
{
memcpy(str2 + strlen(str2), str1, strlen(str1));
}
int main(int argc, char **argv)
{
char str1[4] = "tim";
char str2[10] = "hello ";
myfunc((char *)&str2, (char *)&str1);
printf("%s\n", str2);
myfunc2((char *)&str2, (char *)&str1);
printf("%s\n", str2);
return 0;
}
void myfunc2(char *str2, char *str1)
{
memcpy(str2, str1, strlen(str1));
}
I have compiled the binary and using readelf or objdump I can see that my two functions reside at:
46: 000000000040072c 52 FUNC GLOBAL DEFAULT 13 myfunc2**
54: 000000000040064d 77 FUNC GLOBAL DEFAULT 13 myfunc**
Using the command objdump -D test (my binaries name), I can see that main has two callq functions. I tried to edit the first one to point to myfunc2 using the above address 72c, but that does not work; causes the binary to fail.
000000000040069a <main>:
40069a: 55 push %rbp
40069b: 48 89 e5 mov %rsp,%rbp
40069e: 48 83 ec 40 sub $0x40,%rsp
4006a2: 89 7d cc mov %edi,-0x34(%rbp)
4006a5: 48 89 75 c0 mov %rsi,-0x40(%rbp)
4006a9: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
4006b0: 00 00
4006b2: 48 89 45 f8 mov %rax,-0x8(%rbp)
4006b6: 31 c0 xor %eax,%eax
4006b8: c7 45 d0 74 69 6d 00 movl $0x6d6974,-0x30(%rbp)
4006bf: 48 b8 68 65 6c 6c 6f movabs $0x206f6c6c6568,%rax
4006c6: 20 00 00
4006c9: 48 89 45 e0 mov %rax,-0x20(%rbp)
4006cd: 66 c7 45 e8 00 00 movw $0x0,-0x18(%rbp)
4006d3: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006d7: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006db: 48 89 d6 mov %rdx,%rsi
4006de: 48 89 c7 mov %rax,%rdi
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006ea: 48 89 c7 mov %rax,%rdi
4006ed: e8 0e fe ff ff callq 400500 <puts#plt>
4006f2: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006f6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006fa: 48 89 d6 mov %rdx,%rsi
4006fd: 48 89 c7 mov %rax,%rdi
400700: e8 27 00 00 00 callq 40072c <myfunc2>
400705: 48 8d 45 e0 lea -0x20(%rbp),%rax
400709: 48 89 c7 mov %rax,%rdi
40070c: e8 ef fd ff ff callq 400500 <puts#plt>
400711: b8 00 00 00 00 mov $0x0,%eax
400716: 48 8b 4d f8 mov -0x8(%rbp),%rcx
40071a: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
400721: 00 00
400723: 74 05 je 40072a <main+0x90>
400725: e8 f6 fd ff ff callq 400520 <__stack_chk_fail#plt>
40072a: c9 leaveq
40072b: c3 retq
I suspect I need to do something with calculating the address information through relative location or using the lea/mov instructions.
Any assistance to learn how to modify the call function would be greatly appreciated - please no pointers on editing strings like the howtos all over most of the internet...
In order to rewrite the address, you have to know the exact way the callq instructions are encoded.
Let's take the disassembly output of the first call:
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: ...
You can clearly see that the instruction is encoded with 5 bytes. The e8 byte is the instruction opcode, and 67 ff ff ff is the address to jump to. At this point, one would ask the question, what has 67 ff ff ff to do with 0x40064d?
Well, the answer is that e8 encodes a so-called "relative call" and the jump is performed relative to the location of the next instruction. You have to calculate the distance between 4006e6 and the called function in order to rewrite the address. Had the call been absolute (ff), you could just put the function address in these 4 bytes.
To prove that this is the case, consider the following arithmetic:
0x004006e6 + 0xffffff67 == 0x10040064d

Why does this code prevent gcc & llvm from tail-call optimization?

I have tried the following code on gcc 4.4.5 on Linux and gcc-llvm on Mac OSX(Xcode 4.2.1) and this. The below are the source and the generated disassembly of the relevant functions. (Added: compiled with gcc -O2 main.c)
#include <stdio.h>
__attribute__((noinline))
static void g(long num)
{
long m, n;
printf("%p %ld\n", &m, n);
return g(num-1);
}
__attribute__((noinline))
static void h(long num)
{
long m, n;
printf("%ld %ld\n", m, n);
return h(num-1);
}
__attribute__((noinline))
static void f(long * num)
{
scanf("%ld", num);
g(*num);
h(*num);
return f(num);
}
int main(void)
{
printf("int:%lu long:%lu unsigned:%lu\n", sizeof(int), sizeof(long), sizeof(unsigned));
long num;
f(&num);
return 0;
}
08048430 <g>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 53 push %ebx
8048434: 89 c3 mov %eax,%ebx
8048436: 83 ec 24 sub $0x24,%esp
8048439: 8d 45 f4 lea -0xc(%ebp),%eax
804843c: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048443: 00
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 d0 85 04 08 movl $0x80485d0,(%esp)
804844f: e8 f0 fe ff ff call 8048344 <printf#plt>
8048454: 8d 43 ff lea -0x1(%ebx),%eax
8048457: e8 d4 ff ff ff call 8048430 <g>
804845c: 83 c4 24 add $0x24,%esp
804845f: 5b pop %ebx
8048460: 5d pop %ebp
8048461: c3 ret
8048462: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
8048469: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
08048470 <h>:
8048470: 55 push %ebp
8048471: 89 e5 mov %esp,%ebp
8048473: 83 ec 18 sub $0x18,%esp
8048476: 66 90 xchg %ax,%ax
8048478: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
804847f: 00
8048480: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
8048487: 00
8048488: c7 04 24 d8 85 04 08 movl $0x80485d8,(%esp)
804848f: e8 b0 fe ff ff call 8048344 <printf#plt>
8048494: eb e2 jmp 8048478 <h+0x8>
8048496: 8d 76 00 lea 0x0(%esi),%esi
8048499: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
080484a0 <f>:
80484a0: 55 push %ebp
80484a1: 89 e5 mov %esp,%ebp
80484a3: 53 push %ebx
80484a4: 89 c3 mov %eax,%ebx
80484a6: 83 ec 14 sub $0x14,%esp
80484a9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
80484b0: 89 5c 24 04 mov %ebx,0x4(%esp)
80484b4: c7 04 24 e1 85 04 08 movl $0x80485e1,(%esp)
80484bb: e8 94 fe ff ff call 8048354 <__isoc99_scanf#plt>
80484c0: 8b 03 mov (%ebx),%eax
80484c2: e8 69 ff ff ff call 8048430 <g>
80484c7: 8b 03 mov (%ebx),%eax
80484c9: e8 a2 ff ff ff call 8048470 <h>
80484ce: eb e0 jmp 80484b0 <f+0x10>
We can see that g() and h() are mostly identical except the & (address of) operator beside the argument m of printf()(and the irrelevant %ld and %p).
However, h() is tail-call optimized and g() is not. Why?
In g(), you're taking the address of a local variable and passing it to a function. A "sufficiently smart compiler" should realize that printf does not store that pointer. Instead, gcc and llvm assume that printf might store the pointer somewhere, so the call frame containing m might need to be "live" further down in the recursion. Therefore, no TCO.
It's the & that does it. It tells the compiler that m should be stored on the stack. Even though it is passed to printf, the compiler has to assume that it might be accessed by somebody else and thus must the cleaned from the stack after the call to g.
In this particular case, as printf is known by the compiler (and it knows that it does not save pointers), it could probably be taught to perform this optimization.
For more info on this, look up 'escape anlysis'.

Resources