How to iterate over a C array in inline assembly? [duplicate] - arrays

This question already has answers here:
Can rip be used with another register with RIP-relative addressing?
(1 answer)
Referencing the contents of a memory location. (x86 addressing modes)
(2 answers)
Looping over arrays with inline assembly
(3 answers)
Closed 2 years ago.
I just started learning assembly for a contest and I am using intel-syntax inline assembly.
Recently I learned how to access global C variables in assembly using the <type> ptr[rip + <variable name>] syntax (for example, in a mov instruction).
I tried the same thing with arrays: by adding + <n * size of one element> after the variable name, I can access the nth element of the C array.
Then I decided to write a program that iterates over an array and calculates the sum.
Here is the code:
#include <stdio.h>
unsigned long long array[10];
int main()
{
array[0] = 1;
array[1] = 1;
array[2] = 1;
array[3] = 1;
array[4] = 1;
array[5] = 1;
array[6] = 1;
array[7] = 1;
array[8] = 1;
array[9] = 1;
unsigned long long element_len = sizeof(*array);
unsigned long long len = sizeof(array);
unsigned long long sum = 0;
asm(R"(
.intel_syntax noprefix;
mov rcx, 0;
mov r8, 0;
loop:
add rcx, qword ptr[rip + array + r8];
add r8, rax;
cmp r8, rbx;
jb loop;
.att_syntax noprefix;
)"
: "=c" (sum) // outputs
: "a" (element_len), "b" (len) // inputs
: "r8" // clobbers
);
printf("%llu\n", sum);
return 0;
}
But because of that + r8, the assembler gives me this error (it works fine when there is a constant value instead of r8):
$ gcc sum.c
sum.c: Assembler messages:
sum.c:61: Error: `qword ptr[rip+array+r8]' is not a valid base/index expression
I am using Kubuntu (technically Ubuntu) 20.04 on a 64-bit Intel processor and compiling the program using gcc 9.3.0.
EDIT:
Fixed by @Jester's comment.
asm(R"(
.intel_syntax noprefix;
mov rcx, 0;
mov r8, 0;
lea rdx, [rip+array];
loop:
add rcx, [rdx+r8];
add r8, rax;
cmp r8, rbx;
jb loop;
.att_syntax noprefix;
)"
: "=c" (sum) // outputs
: "a" (element_len), "b" (len) // inputs
: "r8", "rdx", "cc", "memory" // clobbers ("memory" because the asm reads array, "cc" because add/cmp change the flags)
);
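As a variation on the fix above, here is a minimal sketch that passes the array pointer and the element count to the asm statement as operands, so no RIP-relative symbol reference is needed at all. It is not from the original post: the function name sum_u64, the use of AT&T syntax with named operands, and the plain C fallback for non-x86-64 targets are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): sums n 64-bit elements. */
static unsigned long long sum_u64(const unsigned long long *p, size_t n)
{
    unsigned long long sum = 0;

    if (n == 0)
        return 0;                /* the asm loop below always runs at least once */

#if defined(__x86_64__) && defined(__GNUC__)
    size_t i = 0;
    __asm__ volatile(
        "1:\n\t"
        "addq (%[base],%[idx],8), %[sum]\n\t"  /* sum += p[i], index scaled by 8 */
        "incq %[idx]\n\t"                      /* ++i */
        "cmpq %[cnt], %[idx]\n\t"              /* compare i with n */
        "jb 1b"                                /* loop while i < n (unsigned) */
        : [sum] "+r"(sum), [idx] "+r"(i)
        : [base] "r"(p), [cnt] "r"(n)
        : "cc", "memory");
#else
    /* Assumed fallback for other targets: the same loop in plain C. */
    for (size_t i = 0; i < n; ++i)
        sum += p[i];
#endif
    return sum;
}
```

Calling sum_u64(array, 10) on the array from the question should give the same result as the rdx/r8 loop. Letting the compiler pick the registers through "r" constraints also removes the need to list r8 and rdx as clobbers.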

Related

gcc: "Boolean or"-optimization [duplicate]

This question already has answers here:
Why does Clang generate different code for reference and non-null pointer arguments?
(2 answers)
Why is gcc allowed to speculatively load from a struct?
(6 answers)
Boolean values as 8 bit in compilers. Are operations on them inefficient?
(3 answers)
Closed 6 months ago.
Let's assume there's the following code:
#include <stdbool.h>
typedef struct {
bool a;
bool b;
} MyStruct2;
bool g(MyStruct2 *s) {
return s->a || s->b;
}
bool g2(MyStruct2 *s) {
return s->a | s->b;
}
int main() {
return 0;
}
Which compiles into this:
g:
movzx eax, BYTE PTR [rdi]
test al, al
jne .L1
movzx eax, BYTE PTR [rdi+1]
.L1:
ret
g2:
movzx eax, BYTE PTR [rdi]
or al, BYTE PTR [rdi+1]
ret
main:
xor eax, eax
ret
g2 seems to be shorter and it does not include any jump. So why does gcc not optimize g to the same code as g2? None of the members of MyStruct2 is volatile (or otherwise special), so it should be safe to evaluate s->b in g in all cases (even if s->a is true and it would not be required to evaluate s->b).
Why doesn't gcc produce the shorter code without a jump?
Thanks

Count brackets in string - Assembly [duplicate]

This question already has answers here:
How does argument passing and returning values work in C/C++ on x86 at the assembly level? [closed]
(1 answer)
Passing Arguments C -> NASM -> C
(2 answers)
Passing various parameters from C to Assembler
(2 answers)
Closed 1 year ago.
Hi, I have already written a program in 64-bit assembly, linked with C, that counts the number of left and right brackets. Here it is:
bits 64
section .data
extern g_left, g_right, g_str
section .text
global count
count:
enter 0,0
mov eax, 0
mov ebx, 0
mov ecx, 0
.back:
cmp [g_str + eax], byte 0
je .out
cmp [g_str + eax], byte '['
jne .right
inc ebx
.right:
cmp [g_str + eax], byte ']'
jne .skip
inc ecx
.skip:
inc eax
jmp .back
.out:
mov [g_left], ebx
mov [g_right], ecx
leave
ret
C code:
#include <stdio.h>
void count();
char g_str[] = "[[[]]]][[32423]][234dsfsdf";
int g_left, g_right;
int main()
{
count();
printf("left = %d and right = %d\n", g_left, g_right);
}
What I want is to reuse this assembly code but change it a bit, so that it becomes a function that is called from C with a string as input and just prints the number of brackets. Also, I want it in 32-bit mode this time. It should look like this:
int brackets( char *t_str );
I'm new to assembly and confused about how to change my code. Please help me.
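For checking an assembly version against known-good results, a plain C reference of the requested behavior may help. This is only a sketch: the helper count_brackets and the choice to return the total number of brackets are assumptions, since the question only gives the prototype. In a 32-bit cdecl assembly version, the t_str argument would be read from the stack (at [ebp+8] after a standard prologue) instead of from the global g_str.

```c
#include <stdio.h>

/* Hypothetical helper: counts '[' and ']' in a NUL-terminated string. */
static void count_brackets(const char *s, int *left, int *right)
{
    *left = 0;
    *right = 0;
    for (; *s != '\0'; ++s) {
        if (*s == '[')
            ++*left;
        else if (*s == ']')
            ++*right;
    }
}

/* Same prototype as in the question; returning the total is an assumption. */
int brackets(char *t_str)
{
    int left, right;

    count_brackets(t_str, &left, &right);
    printf("left = %d and right = %d\n", left, right);
    return left + right;
}
```

The loop mirrors the .back/.right/.skip structure of the assembly: stop at the terminating zero byte, otherwise test the current character against each bracket and advance.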

Strange assembly output when optimizing string hashing

I was trying to create a compile-time hash macro. It worked, but it had its problems. I thought that if the strings are known at compile time (which they are), the whole hashing should be optimized away to a constant.
This gcc C99 code with optimization level -O3 enabled:
#include <stdio.h>
int main(void)
{
char const *const string = "hello";
int hash = 0;
for (unsigned char i=0; i < sizeof string; ++i)
{
hash += string[i]; //reeaally simple hash :)
}
printf("%i", hash);
return 0;
}
produced the following assembly code:
.LC0:
.string "hello"
.LC1:
.string "%i"
main:
sub rsp, 8
movsx eax, BYTE PTR .LC0[rip+6]
movsx edx, BYTE PTR .LC0[rip+7]
mov edi, OFFSET FLAT:.LC1
lea esi, [rax+532+rdx]
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret
whereas the same code with "hello" changed to "hello w" produces this assembly code, in which the hashing is completely optimized away:
.LC0:
.string "%i"
main:
sub rsp, 8
mov esi, 683
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret
Try it yourself
What is the reason? Does this mean I can't use this way of hashing, because the overhead might not get optimized out? How can I make sure there won't be any overhead, and what are the alternatives?
EDIT 1:
I have played around a bit, and it seems that if the number of chars in the string is 6, the hashing won't get optimized away, but if the number of chars is 7, it will.
sizeof is wrong here. It returns the size of the char pointer, not the length of the string.
In your case this is undefined behavior: the loop reads outside the bounds of the string literal, so the compiler cannot optimize it out. It is a clang bug, not a feature.
If you do it properly, gcc will optimize it as well:
int main(void)
{
char const string[] = "hello";
int hash = 0;
for (unsigned char i=0; i < sizeof(string); ++i)
{
hash += string[i]; //reeaally simple hash :)
}
printf("%i", hash);
return 0;
}
https://godbolt.org/z/YCCNCt
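The difference between the two sizeof results can be checked in isolation. The sketch below is illustrative only: the helper names simple_hash, hash_hello and pointer_size are made up for this example and do not appear in the original code.

```c
#include <stddef.h>

/* Sums n bytes starting at s (the same "really simple hash" as above). */
static int simple_hash(const char *s, size_t n)
{
    int hash = 0;
    for (size_t i = 0; i < n; ++i)
        hash += s[i];
    return hash;
}

/* sizeof applied to an array gives its size in bytes, including the
   terminating NUL, so the loop stays inside the object. */
static int hash_hello(void)
{
    static const char string[] = "hello";
    return simple_hash(string, sizeof string);   /* 6 bytes */
}

/* sizeof applied to a pointer gives the size of the pointer itself,
   whatever string it points to. */
static size_t pointer_size(void)
{
    const char *p = "hello";
    return sizeof p;
}
```

On a typical 64-bit target pointer_size() is 8. That is why "hello w" (7 characters plus the NUL, 8 bytes) happened to be read exactly, while "hello" (6 bytes) was read two bytes past its end.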

How to print EIP address in C? [duplicate]

This question already has an answer here:
Save CPU registers to variables in GCC
(1 answer)
Closed 4 years ago.
This is my C program ... I was trying to print out ESP, EBP and EIP.
#include <stdio.h>
int main() {
register int i asm("esp");
printf("%#010x <= $ESP\n", i);
int a = 1;
int b = 2;
char c[] = "A";
char d[] = "B";
printf("%p d = %s \n", &d, d);
printf("%p c = %s \n", &c, c);
printf("%p b = %d \n", &b, b);
printf("%p a = %d \n", &a, a);
register int j asm("ebp");
printf("%#010x <= $EBP\n", j);
//register int k asm("eip");
//printf("%#010x <= $EIP\n", k);
return 0;
}
I don't have problem with ESP and EBP.
user@linux:~# ./memoryAddress
0xbffff650 <= $ESP
0xbffff654 d = B
0xbffff656 c = A
0xbffff658 b = 2
0xbffff65c a = 1
0xbffff668 <= $EBP
user@linux:~#
But when I try to put EIP code, I'm getting the following error when compiling it.
user@linux:~# gcc memoryAddress.c -o memoryAddress -g
memoryAddress.c: In function ‘main’:
memoryAddress.c:20:15: error: invalid register name for ‘k’
register int k asm("eip");
^
user@linux:~#
What's wrong with this code?
register int k asm("eip");
printf("%#010x <= $EIP\n", k);
Is it possible to print out EIP value via C programming?
If yes, please let me know how to do it.
Update
I've tested the code here ...
user@linux:~/c$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
Thanks @Antti Haapala and others for your help. The code works. However, when I load it into GDB, the EIP value is different.
(gdb) b 31
Breakpoint 1 at 0x68f: file eip.c, line 31.
(gdb) i r $eip $esp $ebp
The program has no registers now.
(gdb) r
Starting program: /home/user/c/a.out
0x00000000 <= Low Memory Address
0x40055d <= main() function
0x4005a5 <= $EIP 72 bytes from main() function (start)
0xbffff600 <= $ESP (Top of the Stack)
0xbffff600 d = B
0xbffff602 c = A
0xbffff604 b = 2
0xbffff608 a = 1
0xbffff618 <= $EBP (Bottom of the Stack)
0xffffffff <= High Memory Address
Breakpoint 1, main () at eip.c:31
31 return 0;
(gdb) i r $eip $esp $ebp
eip 0x40068f 0x40068f <main+306>
esp 0xbffff600 0xbffff600
ebp 0xbffff618 0xbffff618
(gdb)
Here is the new code
#include <stdio.h>
#include <inttypes.h>
int main() {
register int i asm("esp");
printf("0x00000000 <= Low Memory Address\n");
printf("%p <= main() function\n", &main);
uint32_t eip;
asm volatile("1: lea 1b, %0;": "=a"(eip));
printf("0x%" PRIx32 " <= $EIP %" PRIu32 " bytes from main() function (start)\n",
eip, eip - (uint32_t)main);
int a = 1;
int b = 2;
char c[] = "A";
char d[] = "B";
printf("%#010x <= $ESP (Top of the Stack)\n", i);
printf("%p d = %s \n", &d, d);
printf("%p c = %s \n", &c, c);
printf("%p b = %d \n", &b, b);
printf("%p a = %d \n", &a, a);
register int j asm("ebp");
printf("%#010x <= $EBP (Bottom of the Stack)\n", j);
printf("0xffffffff <= High Memory Address\n");
return 0;
}
Please first read the Q&A "Reading program counter directly". From there we can see that there is no mov instruction that accesses EIP/RIP directly, so you cannot use register asm to get at it. Instead, you can use one of those tricks at any point. It is easiest in 64-bit mode; use
uint64_t rip;
asm volatile("1: lea 1b(%%rip), %0;": "=a"(rip));
to get the 64-bit instruction pointer (thanks to Michael Petch for pointing out that a label works with lea here).
Demonstration:
#include <stdio.h>
#include <inttypes.h>
int main(void) {
uint64_t rip;
asm volatile("1: lea 1b(%%rip), %0;": "=a"(rip));
printf("%" PRIx64 "; %" PRIu64 " bytes from main start\n",
rip, rip - (uint64_t)main);
}
Then
% gcc -m64 rip.c -o rip; ./rip
55b7bf9e8659; 8 bytes from main start
Proof that it is correct:
% gdb -batch -ex 'file ./rip' -ex 'disassemble main'
Dump of assembler code for function main:
0x000000000000064a <+0>: push %rbp
0x000000000000064b <+1>: mov %rsp,%rbp
0x000000000000064e <+4>: sub $0x10,%rsp
0x0000000000000652 <+8>: lea -0x7(%rip),%rax # 0x652 <main+8>
For 32-bit code it seems you can use lea with a bare label; the same form, without (%rip), didn't work for 64-bit code though.
#include <stdio.h>
#include <inttypes.h>
int main(void) {
uint32_t eip;
asm volatile("1: lea 1b, %0;": "=a"(eip));
printf("%" PRIx32 "; %" PRIu32 " bytes from main start\n",
eip, eip - (uint32_t)main);
}
Then
% gcc -m32 eip.c -o eip; ./eip
5663754a; 29 bytes from main start
Proof that it is correct:
% gdb -batch -ex 'file ./eip' -ex 'disassemble main'
Dump of assembler code for function main:
0x0000052d <+0>: lea 0x4(%esp),%ecx
0x00000531 <+4>: and $0xfffffff0,%esp
0x00000534 <+7>: pushl -0x4(%ecx)
0x00000537 <+10>: push %ebp
0x00000538 <+11>: mov %esp,%ebp
0x0000053a <+13>: push %ebx
0x0000053b <+14>: push %ecx
0x0000053c <+15>: sub $0x10,%esp
0x0000053f <+18>: call 0x529 <__x86.get_pc_thunk.dx>
0x00000544 <+23>: add $0x1a94,%edx
0x0000054a <+29>: lea 0x54a,%eax
(in the 32-bit version there are many more lea commands, but this one is the "load my constant address here", which then will be corrected by the dynamic linker when it loads the exe).
EIP can't be read directly. RIP can, with lea 0(%rip), %rax, but it's not a general-purpose register.
Instead of reading an address from a register you can just use a code address directly.
void print_own_address() {
printf("%p\n", print_own_address);
}
If you compile this as PIC (position-independent code), the compiler will get the run-time address of the function by reading EIP or RIP for you. You don't need inline asm for this.
Or for addresses other than functions, GNU C allows labels as values.
void print_label_address() {
for (int i=0 ; i<1000; i++) {
volatile int sink = i;
}
mylabel:
for (int i=0 ; i<1000; i++) {
volatile int sink2 = i;
}
printf("%p\n", &&mylabel); // Take the label address with && GNU C syntax.
}
Compiled on the Godbolt compiler explorer with and without -fPIE to generate position-independent code, we get:
# PIE version:
xor eax, eax # i=0
.L4: # do {
mov DWORD PTR -16[rsp], eax # sink=i
add eax, 1
cmp eax, 1000
jne .L4 # } while(i!=1000);
xor eax, eax # i=0
.L5: # do {
mov DWORD PTR -12[rsp], eax # sink2 = i
add eax, 1
cmp eax, 1000
jne .L5 # }while(i != 1000);
lea rsi, .L5[rip] # address of .L5 = mylabel
lea rdi, .LC0[rip] # format string
xor eax, eax # 0 FP args in XMM regs for a variadic function
jmp printf@PLT # tailcall printf
Without -fPIE, the addresses are link-time constants (and fit in a 32-bit constant), so we get
mov esi, OFFSET FLAT:.L5
mov edi, OFFSET FLAT:.LC0
xor eax, eax
jmp printf
Whether you get a meaningful address from your label depends on how aggressively the compiler optimized the code where you put it. Putting a label somewhere may inhibit optimization (like auto-vectorization) even if you only take the label's address, but I don't know. Maybe it would only hurt if you actually had a goto to it.
You can read rip with another small hack, if you are interested. Here is your full code, extended to read rip too:
#include <stdio.h>
#include <inttypes.h>
int main()
{
register uint64_t i asm("rsp");
printf("%" PRIx64 " <= $RSP\n", i);
int a = 1;
int b = 2;
char c[] = "A";
char d[] = "B";
printf("%p d = %s \n", &d, d);
printf("%p c = %s \n", &c, c);
printf("%p b = %d \n", &b, b);
printf("%p a = %d \n", &a, a);
register uint64_t j asm("rbp");
printf("%" PRIx64 " <= $RBP\n", j);
uint64_t rip = 0;
asm volatile ("call here2\n\t"
"here2:\n\t"
"pop %0"
: "=m" (rip));
printf("%" PRIx64 " <= $RIP\n", rip);
return 0;
}
The hack here is a fun one: you just call the next assembly line. Because the return address, which is the value of rip, is now on the stack, you can retrieve it with a pop instruction. :)
Update:
The main reason for this approach is data injection. See the following code:
#include <stdio.h>
#include <inttypes.h>
int main()
{
uint64_t rip = 0;
asm volatile ("call here2\n\t"
".byte 0x41\n\t" // A
".byte 0x42\n\t" // B
".byte 0x43\n\t" // C
".byte 0x0\n\t" // \0
"here2:\n\t"
"pop %0"
: "=m" (rip));
printf("%" PRIx64 " <= $RIP\n", rip);
printf("injected data:%s\n", (char*)rip);
return 0;
}
This approach can inject data inside the code segment (which can be useful for code injection). If you compile and run it, you see the following output:
400542 <= $RIP
injected data:ABC
You have used rip as a placeholder for your data. I personally like this approach, but it can have efficiency impacts, as mentioned in the comments.
I have tested both programs in 64-bit Ubuntu bash for Windows (Windows Subsystem for Linux), and both work.
Update 2:
Please make sure to read the comments about red zones. Many thanks to Michael for mentioning this problem and providing an example. :)
If you need to use this code without the red zone problem, you need to write it as follows (from Michael's sample):
asm volatile ("sub $128, %%rsp\n\t"
"call 1f\n\t"
".byte 0x41\n\t" // A
".byte 0x42\n\t" // B
".byte 0x43\n\t" // C
".byte 0x0\n\t" // \0
"1:\n\t"
"pop %0\n\t"
"add $128, %%rsp"
: "=r" (rip));

Pseudo registers in MSVC

Borland C has pseudo-Registers _AX,_BX, _FLAGS etc that could be used in 'C' code to save the registers to temp variables.
Is there any MSVC equivalent? I tried @AX, @BX, etc., but the compiler (MSVC 1.5) gave an error ('40' unrecognized symbol).
I'm developing a 16-bit pre-boot app and can't use .
Thanks.
You don't need pseudo-registers if you only move values between registers and variables. Example:
int a = 4;
int b = 999;
__asm
{
mov eax, a; // eax equals to 4
mov b, eax; // b equals to eax
}
// b equals to 4 now
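Edit: to copy the low byte of the flags into a variable and back again, you can use the LAHF and SAHF instructions, which transfer it through AH. Example:
int flags = 0;
__asm
{
lahf; // AH = SF:ZF:0:AF:0:PF:1:CF (low byte of FLAGS)
movzx eax, ah; // LAHF writes AH only, not the whole of EAX
mov flags, eax;
}
flags |= (1 << 0); // set CF (bit 0); bits 1, 3 and 5 are reserved and ignored by SAHF
__asm
{
mov eax, flags;
mov ah, al; // SAHF reads its input from AH
sahf;
// the carry flag is set now
}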
