Displaying PSW content - masm

I'm beginner with asm, so I've been researching for my question for a while but answears were unsatisfactory. I'm wondering how to display PSW content on standard output. Other thing, how to display Instruction Pointer value ? I would be very gratefull if ypu could give me a hint (or better a scratch of code). It may be masm or 8086 as well (actually I don't know wthat is the difference :) )

The instruction pointer is not directly accessible on the x86 family, however, it is quite straightforward to retrieve its value - it will never be accurate though.
Since a subroutine call places the return address on the stack, you just need to copy it from there and violá! You have the address of the opcode following the call instruction:
proc getInstructionPointer
push bp
mov bp,sp
mov ax,[word ptr ss:bp + 2]
mov sp,bp
pop bp
ret
endp getInstructionPointer
The PSW on the x86 is called the Flags register. There are two operations that explicitly reference it: pushf and popf. As you might have guessed, you can simply push the Flags onto the stack and load it to any general purpose register you like:
pushf
pop ax
Displaying these values consists of converting their values to ASCII and writing them onto the screen. There are several ways of doing this - search for "string output assembly", I bet you find the answer.
To dispel a minor confusion: 8086 is the CPU itself, whereas MASM is the assembler. The syntax is assembler-specific; MASM assembly is x86 assembly. TASM assembly is x86 assembly as well, just like NASM assembly.
When one says "x86 Assembly", he/she is referencing any of these (or others), talking about the instruction set, not the dialect.
Note that the above examples are 16bit, indtended for 8086 and won't work on 80386+ in 32bit mode

Related

Variadic Function 64-Bit Windows

I am trying to get the parameters of a 64-Bit __fastcall function, but I am having a couple of issues/questions.
1) I checked the registers in the debugger and when I have 3 32-bit parameters and a void function, the second one goes into RDX, the third one into R8 and the first one I cannot see at all and assume is on the stack.
I did not check every possible combination but this goes against what MSDN's documentation on 64-bit __fastcall says. ...Or am I missing something?
-- Regarding 1 I just realized I think it says that if I pass a 32-bit value into a 64-bit register it's not 0 extended so I probably just missed it due to gibberish data that was in the RCX register.
Due to VS not support 64-bit inline assembly or any useful intrinsics (At least that I can find), I wrote a shellcode to get all of the parameters from RCX, RDX, R8, R9, XMM0-3.
The issue here is that in order to prepare the shellcode I have to allocate memory, copy memory then set the EIP to my shellcode or calling it, etc. which screws up the thread's context. Is there any way to cleanly do this?

Multiplication of corresponding values in an array

I want to write an x86 program that multiplies corresponding elements of 2 arrays (array1[0]*array2[0] and so on till 5 elements) and stores the results in a third array. I don't even know where to start. Any help is greatly appreciated.
First thing you'll want to get is an assembler, I'm personally a big fan of NASM in my opinion it has a very clean and concise syntax, it's also what I started on so that's what I'll use for this answer.
Other than NASM you have:
GAS
This is the GNU assembler, unlike NASM there are versions for many architectures so the directives and way of working will be about the same other than the instructions if you switch architectures. GAS does however have the unfortunate downside of being somewhat unfriendly for people who want to use the Intel syntax.
FASM
This is the Flat Assembler, it is an assembler written in Assembly. Like NASM it's unfriendly to people who want to use AT&T syntax. It has a few rough edges but some people seem to prefer it for DOS applications (especially because there's a DOS port of it) and bare metal work.
Now you might be reading 'AT&T syntax' and 'Intel syntax' and wondering what's meant by that. These are dialects of x86 assembly, they both assemble to the same machine code but reflect slightly different ways of thinking about each instruction. AT&T syntax tends to be more verbose whereas Intel syntax tends to be more minimal, however certain parts of AT&T syntax have nicer operand orderings tahn Intel syntax, a good demonstration of the difference is the mov instruction:
AT&T syntax:
movl (0x10), %eax
This means get the long value (1 dword, aka 4 bytes) and put it in the register eax. Take note of the fact that:
The mov is suffixed with the operand length.
The memory address is surrounded in parenthesis (you can think of them like a pointer dereference in C)
The register is prefixed with %
The instruction moves the left operand into the right operand
Intel Syntax:
mov eax, [0x10]
Take note of the fact that:
We do not need to suffix the instruction with the operand size, the assembler infers it, there are situations where it can't, in which case we specify the size next to the address.
The register is not prefixed
Square brackets are used to address memory
The second operand is moved into the first operand
I will be using Intel syntax for this answer.
Once you've installed NASM on your machine you'll want a simple build script (when you start writing bigger programs use a Makefile or some other proper build system, but for now this will do):
nasm -f elf arrays.asm
ld -o arrays arrays.o -melf_i386
rm arrays.o
echo
echo " Done building, the file 'arrays' is your executable"
Remember to chmod +x the script or you won't be able to execute it.
Now for the code along with some comments explaining what everything means:
global _start ; The linker will be looking for this entrypoint, so we need to make it public
section .data ; We're going on to describe our data here
array_length equ 5 ; This is effectively a macro and isn't actually being stored in memory
array1 dd 1,4,1,5,9 ; dd means declare dwords
array2 dd 2,6,5,3,5
sys_exit equ 1
section .bss ; Data that isn't initialised with any particular value
array3 resd 5 ; Leave us 5 dword sized spaces
section .text
_start:
xor ecx,ecx ; index = 0 to start
; In a Linux static executable, registers are initialized to 0 so you could leave this out if you're never going to link this as a dynamic executable.
_multiply_loop:
mov eax, [array1+ecx*4] ; move the value at the given memory address into eax
; We calculate the address we need by first taking ecx (which tells us which
; item we want) multiplying it by 4 (i.e: 4 bytes/1 dword) and then adding it
; to our array's start address to determine the address of the given item
imul eax, dword [array2+ecx*4] ; This performs a 32-bit integer multiply
mov dword [array3+ecx*4], eax ; Move our result to array3
inc ecx ; Increment ecx
; While ecx is a general purpose register the convention is to use it for
; counting hence the 'c'
cmp ecx, array_length ; Compare the value in ecx with our array_length
jb _multiply_loop ; Restart the loop unless we've exceeded the array length
; If the loop has concluded the instruction pointer will continue
_exit:
mov eax, sys_exit ; The system call we want
; ebx is already equal to 0, ebx contains the exit status
mov ebp, esp ; Prepare the stack before jumping into the system
sysenter ; Call the Linux kernel and tell it that our program has concluded
If you wanted the full 64-bit result of the 32-bit multiply, use one-operand mul. But normally you only want a result that's the same width as the inputs, in which case imul is most efficient and easiest to use. See links in the x86 tag wiki for docs and tutorials.
You'll notice that this program has no output. I'm not going to cover writing the algorithm to print numbers because we'd be here all day, that's an exercise for the reader (or see this Q&A)
However in the meantime we can run our program in gdbtui and inspect the data, use your build script to build then open your program with the command gdbtui arrays. You'll want to enter these commands:
layout asm
break _exit
run
print (int[5])array3
And GDB will display the results.

Indexed addressing in x86 assembly - is something like mov array[ebx],eax a valid instruction?

My understanding is that indexed addressing essentially produces an address whose offset is the number in the brackets. Is this understanding correct? But I also understand that this address is dereferenced somehow. I don't understand exactly how this works. The book Assembly Language for Intel-Based Computers shows a lot of instructions of the form mov reg,array[reg] that move data from an indexed array element into a register. But I need a way to move data back from the register into the array. How can I do this? Do I use the opposite of that, which would be mov array[reg],reg? Or would this dereference array[reg] and move the data into the address given by the value stored in that array element?
For example, suppose the array index is 3 and it's stored in register EBX, and I want to move the value stored in EAX into that array element. The value currently stored in that array element is 500 hex. If I use the instruction mov array[ebx],eax, will this instruction move the value in EAX into array[3], or will it move it into memory location 500 hex? And if it's the latter case, what instruction can I use to avoid this effect and do what I actually want to do, which is move the data into array[3]?
Note: The syntax I am using is for MASM. I do not have MASM installed on my machine, and it's not really an option since I'm using Ubuntu. But the book I'm reading is written for MASM, so I'm learning MASM first, just to get a feel for how x86 assembly language works. I'm not assembling any programs, but I'd like to understand them.

What is the gcc ARM equivalent to this x64 assembly

This is the X64 code (I don't know much about assembly, and this code can be compiled by visual studio, don't know which format it is):
.code
extern mProcs:QWORD
myfunc proc
jmp mProcs[1*8]
myfunc endp
The mProcs is an array defined in C code, and the function myfunc simply jmp to the second element in the array. If viewed from C, it is jumping to *(mProcs+1) (1*8 because in x64 a pointer is 8 bytes).
In GCC ARM version, I tried to do this:
.extern mProcs
.global myfunc
myfunc:
b mProcs+4
(here mProcs+4 because a pointer is 4 bytes)
But this code seems no working. In C does it means jump to *(mProcs+1) or jump to mProcs+1?
How can I make it *(mProcs+1)?
============================================================================
After discussion in the comments with Michael, I understood that I need to do the calculation on the register and then use the bx instruction to jump to the target function.
However, the problem comes here.
Since I'm implementing a thunk (I'm intercepting the function call and do something in the middle), I have no idea how the target routine uses the register.
1. I need to keep the callee save registers before I jmp to the target, else the target function will be preserving the wrong value.
2. I need to keep the argument registers intact before I perform a jump, so that the target function will have the correct arguments.
3. The above 2 points means I can only use caller save but non-argument registers.
r0-r3 is the argument register, and r4-r12 are callee save register, r13 onwards are special registers.
Which means none of the registers can be used without restoring the value.
If the bx instruction can only operate on a register, then there is no chance I can restore that register even if that register is temporally saved on the stack.
Any solutions? Or just arm binaries can't be hooked.

How to use relative position in c/assembly?

It's said Position Independent Code only uses relative position instead of absolute positions, how's this implemented in c and assembly respectively?
Let's take char test[] = "string"; as an example, how to reference it by relative address?
In C, position-independent code is a detail of the compiler's implementation. See your compiler manual to determine whether it is supported and how.
In assembly, position-independent code is a detail of the instruction set architecture. See your CPU manual to find out how to read the PC (program counter) register, how efficient that is, and what the recommended best practices are in translating a code address to a data address.
Position-relative data is less popular now that code and data are separated into different pages on most modern operating systems. It is a good way to implement self-contained executable modules, but the most common such things nowadays are viruses.
On x86, position-independent code in principle looks like this:
call 1f
1: popl %ebx
followed by use of ebx as a base pointer with a displacement equal to the distance between the data to be accessed and the address of the popl instruction.
In reality it's often more complicated, and typically a tiny thunk function might be used to load the PIC register like this:
load_ebx:
movl 4(%esp),%ebx
addl $some_offset,%ebx
ret
where the offset is chosen such that, when the thunk returns, ebx contains a pointer to a designated special point in the program/library (usually the start of the global offset table), and then all subsequent ebx-relative accesses can simply use the distance between the desired data and the designated special point as the offset.
On other archs everything is similar in principle, but there may be easier ways to load the program counter. Many simply let you use the pc or ip register as an ordinary register in relative addressing modes.
In pseudo code it could look like:
lea str1(pc), r0 ; load address of string relative to the pc (assuming constant strings, maybe)
st r0, test ; save the address in test (test could also be PIC, in which case it could be relative
; to some register)
A lot depends on your compiler and CPU architecture, as the previous answer stated. One way to find out would be to compile with the appropriate flags (-PIC -S for gcc) and look at the assembly language you get.

Resources