How to change the qword memory offset in Hopper Assembler v3? - disassembly

So I got the following (as an example):
0x00000001000022c4 db "Apple", 0
0x0000000100002347 db "Ducks", 0
In a procedure it refers to Apple as such:
lea rcx, qword [ds:0x1000022c4] ; "Apple"
Now I would like this instruction to reference Ducks instead, so I tried modifying the assembly instruction to:
lea rcx, qword [ds:0x100002347]
However, when I apply the change, Hopper shows something like:
lea rcx, qword [ds:0x2ace]
Why does it do that?
I was able to fix it by going into the hex editor, finding the hex value, working out how far off the offset was, and correcting it. But that felt cumbersome.

Hopper Disassembler v3 is a great tool for reverse engineering, and I ran into the same problem. Here is my solution. My demo architecture is x86_64:
00000001000174a6 mov rsi, qword [ds:0x1004b3040] ; #selector(setAlignment:)
Seeing this does not mean you can simply change the address (0x1004b3040) to whatever you want.
The instruction that is actually encoded is RIP-relative:
00000001000174a6 movq 0x49bb93(%rip), %rsi ## Objc selector ref: setAlignment:
So what you have to change is the displacement 0x49bb93, not the absolute address. The displacement is measured from the address of the next instruction, that is, the instruction address plus the instruction length (7 bytes here):
0x1004b3040 - 0x1000174a6 - 7 = 0x49bb93
So if you want to point the instruction at 0x100002347 ("Ducks"), apply the same formula using the address and byte length of your own instruction; mine is 7 bytes long.
In my demo I want to change #selector(setAlignment:) to #selector(setHidden:), so the new displacement is:
0x1004b2238 - 0x1000174a6 - 7 = 0x49ad8b
So I change the instruction bytes to 48 8b 35 8b ad 49 00 (the displacement is stored little-endian). Press Command + Shift + H to show the hex editor in Hopper.
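The displacement arithmetic above can be sketched in Python. The helper names rip_relative_disp32 and disp32_bytes are hypothetical (they are not part of Hopper), and the sketch assumes the 32-bit displacement is the last four bytes of the instruction, as it is for the 7-byte mov shown here.

```python
import struct

def rip_relative_disp32(insn_addr, insn_len, target_addr):
    """Return the signed 32-bit displacement for a RIP-relative operand.

    RIP-relative addressing is measured from the address of the *next*
    instruction, i.e. insn_addr + insn_len.
    """
    disp = target_addr - (insn_addr + insn_len)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of range for a 32-bit displacement")
    return disp

def disp32_bytes(disp):
    """Encode the displacement as 4 little-endian bytes, as stored in the file."""
    return struct.pack("<i", disp)

# Values from the answer above: the mov at 0x1000174a6 is 7 bytes long.
old = rip_relative_disp32(0x1000174A6, 7, 0x1004B3040)
new = rip_relative_disp32(0x1000174A6, 7, 0x1004B2238)
print(hex(old), hex(new))
print(disp32_bytes(new).hex())
```

For the question's lea, the same helper would need the address and length of that lea instruction, which the question does not show.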
[Screenshots: before and after my change]
My English is not very good, so replies are welcome.

What is the beginning and the end of this disassembled array?

In a DLL disassembled with IDA, I came across an array that is annotated as an array of int (though it may really be an array of bytes):
.rdata:000000018003CC00 ; int boxA[264]
.rdata:000000018003CC00 boxA dd 0 ; DATA XREF: BlockPrepXOR+5FC↑r
.rdata:000000018003CC04 db 0Eh
.rdata:000000018003CC05 db 0Bh
.rdata:000000018003CC06 db 0Dh
.rdata:000000018003CC07 db 9
.rdata:000000018003CC08 db 1Ch
.rdata:000000018003CC09 db 16h
.rdata:000000018003CC0A db 1Ah
.rdata:000000018003CC0B db 12h
.rdata:000000018003CC0C db 12h
.rdata:000000018003CC0D db 1Dh
.rdata:000000018003CC0E db 17h
.rdata:000000018003CC0F db 1Bh
Can I interpret the data as
{000000h, E0B0D09h, 1C161A12h, ..} or
{0, 90D0B0Eh, 121A161Ch, ...} or
{00h,00h,00h,00h, 0Eh, 0Bh, ..} ?
From the comment (from IDA), can you confirm that the array ends at CC00h + 264*4 - 1 = D01Fh? I have another array starting at D020h:
.rdata:000000018003D01D db 0F9h ; ù
.rdata:000000018003D01E db 0A2h ; ¢
.rdata:000000018003D01F db 3Fh ; ?
.rdata:000000018003D020 array4_1248 db 1 ; DATA XREF: BlockPrepXOR+39A↑o
.rdata:000000018003D021 db 2
.rdata:000000018003D022 db 4
.rdata:000000018003D023 db 8
That's just the AES decryption's T8 matrix as described in this paper.
You can easily identify it by searching for the DWORD values on Google (e.g. this is one of the results).
So that's just data for an AES decryption function.
Note also that the interpretation of a sequence of bytes as a sequence of multi-byte data (WORDs, DWORDs, QWORDs, and so on) depends on the architecture.
For x86, only the little-endian interpretation is correct (this is your case 2) but data may undergo arbitrary manipulations (e.g. it can be bswapped) so, when looking on Google, always use both the little and the big-endian versions of the data.
It's also worth noting that IDA can interpret the bytes as DWORDs (press d twice or use the context menu), showing the correct value based on the architecture of the disassembled binary.
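To make the two readings concrete, here is a small Python sketch that unpacks the first twelve bytes of boxA from the listing both ways. It also computes where an int[264] starting at 0x18003CC00 would end, assuming 4-byte elements as IDA's comment suggests.

```python
import struct

# First 12 bytes of boxA as listed by IDA (the "dd 0" followed by the db lines).
raw = bytes([0x00, 0x00, 0x00, 0x00,
             0x0E, 0x0B, 0x0D, 0x09,
             0x1C, 0x16, 0x1A, 0x12])

little = struct.unpack("<3I", raw)  # how an x86 CPU reads the DWORDs
big = struct.unpack(">3I", raw)     # byte-swapped reading, useful when searching

print([f"{v:08X}" for v in little])
print([f"{v:08X}" for v in big])

# Extent of int boxA[264]: 264 elements of 4 bytes each.
start = 0x18003CC00
end_exclusive = start + 264 * 4
print(hex(end_exclusive))
```

The little-endian tuple corresponds to case 2 in the question, and the computed end address is the first byte after the array, which matches the next array starting at D020h.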

How to accomplish this byte munging in perl?

Background:
I'm trying to use the perl script from here to decrypt an android backup. Unfortunately, the checksum validation fails.
After playing around with this (Python) script, the problem seems to be that I need to do some additional munging of the master key (n.b. masterKeyJavaConversion in the Python script).
Problem:
I need to take a bag of bytes and perform the following conversion steps:
Sign-extend from signed char to signed short
Convert the result from UTF16 (BE?) to UTF-8
For example (all bytes are in hex):
3x → 3x
7x → 7x
ax → ef be ax
bx → ef be bx
cx → ef bf 8x
dx → ef bf 9x
ex → ef bf ax
fx → ef bf bx
(The x always remains unchanged.)
More specifically, given a bit sequence 1abc defg, I need to output 1110 1111 1011 111a 10bc defg. (For 0abc defg, the output is just 0abc defg, i.e. unchanged.)
Answers may use UTF conversions or may do the bit twiddling directly; I don't care, as long as it works (this isn't performance-critical). Answers in the form of a subroutine are ideal. (My main problem is I know just enough Perl to be dangerous. If this was C/C++, I wouldn't need help, but it would be a major undertaking to rewrite the entire script in another language, or to modify the Python script to not need to read the entire input into memory.)
1110 1111 1011 111a 10bc defg would be a valid UTF-8 encoding.
++++-------------------------- Start of three byte sequence
|||| ++------------------- Continuation byte
|||| || ++---------- Continuation byte
|||| || ||
11101111 1011111a 10bcdefg
|||| |||||| ||||||
++++---++++++---++++++---- 1111 1111 1abc defg
That's just the extension of an 8-bit signed number to 16 bits, cast to unsigned, and treated as a Unicode Code Point.
So, without looking at the code, I think you want
sub encode_utf8 {
my ($s) = @_;
utf8::encode($s);
return $s;
}
sub munge {
return
encode_utf8 # "\x30\x70\xEF\xBE\xA0..."
pack 'W*', # "\x{0030}\x{0070}\x{FFA0}..."
unpack 'S*', # 0x0030, 0x0070, 0xFFA0, ...
pack 's*', # "\x30\x00\x70\x00\xA0\xFF..." (on a LE machine)
unpack 'c*', # 48, 112, -96, ...
$_[0]; # "\x30\x70\xA0..."
}
my $s = "\x30\x70\xA0\xB0\xC0\xD0\xE0\xF0";
my $munged = munge($s);
If you remove the comments, you get the following:
sub munge {
my $s = pack 'W*', unpack 'S*', pack 's*', unpack 'c*', $_[0];
utf8::encode($s);
return $s;
}
Here's a much faster solution:
my @map = (
( map chr($_), 0x00..0x7F ),
( map "\xEF\xBE".chr($_), 0x80..0xBF ), # inputs 0x80..0xBF
( map "\xEF\xBF".chr($_), 0x80..0xBF ), # inputs 0xC0..0xFF
);
sub munge { join '', @map[ unpack 'C*', $_[0] ] }
This may not be as elegant as ikegami's answer, but it worked:
sub munge_mk
{
my $out;
foreach(unpack('C*', $_[0])) {
if($_ < 128) {
$out .= chr($_);
} else {
my $hi = 0xbc | (($_ & 0xc0) >> 6);
my $lo = 0x80 | ($_ & 0x3f);
$out .= chr(0xef) . chr($hi) . chr($lo);
}
}
return $out;
}
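As a cross-check of the mapping table in the question, the same conversion can be sketched in Python (the language of the other script the question mentions). This is only an illustration of the sign-extend-then-UTF-8 idea, not the code from either script; the function name munge simply mirrors the Perl answers.

```python
def munge(data: bytes) -> bytes:
    """Sign-extend each byte to 16 bits, treat it as a code point, encode as UTF-8."""
    chars = []
    for b in data:
        if b < 0x80:
            code_point = b           # non-negative as a signed char: unchanged
        else:
            code_point = 0xFF00 | b  # negative signed char, sign-extended to 16 bits
        chars.append(chr(code_point))
    return "".join(chars).encode("utf-8")

sample = b"\x30\x70\xa0\xb0\xc0\xd0\xe0\xf0"
print(munge(sample).hex())
```

Bytes below 0x80 pass through unchanged, while every byte from 0x80 upward becomes a three-byte sequence starting with ef be or ef bf, as in the table above.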

ARM NEON to aarch64

I have code for ARM NEON armv7-a:
vst2.u8 {d1,d3}, [%1]!
I ported it to aarch64 like this:
st2 {v1.8b,v3.8b},[%1],#16
and got an error: Error: invalid register list at operand 1 -- `st2 {v1.8b,v3.8b},[x1],#16'
According to the documentation, this form is valid:
ST2 {Vt.<T>, Vt+2.<T>}, vaddr
I can't figure out the problem.
P.S. If I change it to
st2 {v1.8b,v2.8b},[%1],#16
the assembler no longer reports an error.
I am referring to the ARM A64 instruction set architecture here, which was last updated in 2018.
The first link in your comment is only about the AArch32 instruction set. The second link is about the AArch64 instruction set, but the PDF is titled as interim and was published in 2011. The format
ST2 { <Vt>.<T>, <Vt+2>.<T> }, vaddr
is mentioned there (page 89), but it is not included in the current version.
Encoding of ST2
In the current version, ST2 (multiple structures) is encoded as follows (see page 1085):
┌───┬───┬────────┬───┬────┬───────┬──────┬────┬───────┬───────┐
│ 0 │ Q │ 001100 │ I │ 00 │ mmmmm │ 1000 │ ss │ nnnnn │ ttttt │
└───┴───┴────────┴───┴────┴───────┴──────┴────┴───────┴───────┘
                             Rm           size   Rn      Rt
There are three types of offset the instruction can be used with:
No offset (Rm == 00000 and I == 0):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Immediate offset (Rm == 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
Register offset (Rm != 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
<imm> is #16 or #32 here, depending on Q. Only the first register's index t is stored in the encoding. The second register's index is always calculated as (t+1) mod 32.
That's why you got the error: the registers must be consecutive. There is simply not enough room to encode the second register separately; the base and offset registers (Rn and Rm) already take up a lot of the available bits.
Consideration
Wouldn't it be possible to encode the second register? In the case I == 0, Rm is set to 00000, but that is only a convention. This field could be used for that purpose, but only when no immediate or register offset is specified.
I can also see why the format with <Vt+2> was not adopted from the draft: it could only be encoded for this special case. Supporting it would make the chip more complex and would simply not be worthwhile.
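To illustrate why only consecutive registers can be expressed, here is a rough Python sketch of the post-index immediate form. The helper names are hypothetical, and the field positions are my reading of the Arm Architecture Reference Manual (bits 29-24 = 001100, bit 23 = post-index flag, Rm in bits 20-16, opcode 1000 in bits 15-12), so treat them as an assumption to check against the manual.

```python
def st2_second_register(t):
    """The second register is implied by the first: it is never encoded."""
    return (t + 1) % 32

def encode_st2_post_imm(t, n, q=0, size=0):
    """Encode ST2 {Vt.<T>, Vt2.<T>}, [Xn], #imm (multiple structures, post-index).

    t: first vector register, n: base register, q: 0 = 64-bit / 1 = 128-bit
    vectors, size: element size field (0 = bytes). Assumed field layout.
    """
    for name, value, limit in (("t", t, 32), ("n", n, 32), ("q", q, 2), ("size", size, 4)):
        if not 0 <= value < limit:
            raise ValueError(f"{name} out of range")
    return ((q << 30) | (0b001100 << 24) | (1 << 23)   # fixed bits, post-index flag
            | (0b11111 << 16)                          # Rm = 11111: immediate offset
            | (0b1000 << 12) | (size << 10)            # opcode for ST2, element size
            | (n << 5) | t)                            # base register, first register

# st2 {v1.8b, v2.8b}, [x1], #16
word = encode_st2_post_imm(t=1, n=1)
print(hex(word), "second register: v%d" % st2_second_register(1))
```

Note that the function has no parameter for the second register at all, which is the point: {v1.8b, v3.8b} has no representation in this format.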

Simple 8086 Assembly String Array and Printing

I have to write a program for my Assembly class that lets the user enter his or her full name; the program then uses an array to store the characters and prints them in the following order: "LastName", "Middle" (optional, and there can be more than one middle name), "First". My program almost does this, except that it prints a '$' at the start of the last name. I have tried incrementing the index (bx), but then I get garbled output. I am fairly new to assembly, so please bear with me. I suspect my macros might be interfering with the output when I increment the index. Please also bear with my formatting; I can never transfer code correctly. Thanks in advance!
Here is my code:
;This program reads in a user's full name and prints out the result in the format
;'Lastname', 'Middle'(Optional), First using an array.
include pcmac.inc
.MODEL SMALL
.586 ;Allows Pentium instructions. Must come after .MODEL
.STACK 100h
.DATA
MAXBUF EQU 100
GetBuf DB MAXBUF
GetCnt DB ?
CharStr DB MAXBUF DUP (?)
Message DB 'Enter your full name',10,13,'$'
Message2 DB 'Here is your name in the correct format', 10,13,'$'
Count DB 0
.CODE
Array PROC
_Begin
_PutStr Message
_GetStr GetBuf
mov bl, GetCnt
sub bh,bh
FindLast:
cmp [CharStr+bx],32
je SeperateLast
dec bx
inc Count ;Counter to record how long the lastname is
jmp FindLast
SeperateLast:
mov [CharStr+bx],'$'
_PutStr Message2
jmp Printlast
FirstName:
_PutCh ',',32 ;Add comma and space for readability
_PutStr CharStr ;Print up to the inputted dollar sign
_PutCh 10,13
jmp Done
Printlast:
cmp Count,0
je FirstName
_PutCh[CharStr+bx] ;Print Last Name Character by Character
inc bx
dec Count
jmp Printlast
Done:
_Exit 0
Array ENDP
END Array
From what I can see in your code, you seem to be properly finding the last name. Since the space before the last name represents the end of the First and Middle name it makes sense to replace the space with the $ sign just before the last name. Since BX is the offset of the dollar sign you just put in, you should increment BX by one to skip over it. You then need to decrement the Count variable by 1 as well.
This code:
mov [CharStr+bx],'$'
_PutStr Message2
jmp Printlast
Should probably be something like:
mov [CharStr+bx],'$'
inc bx
dec Count
_PutStr Message2
jmp Printlast
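The index bookkeeping can be traced with a small Python simulation of the loop. It assumes that _GetStr behaves like DOS buffered input, so the byte at CharStr+GetCnt is the carriage return that ends the line, and that the name contains at least one space and no '$'. The function name reorder_name is only for illustration.

```python
def reorder_name(full_name: str) -> str:
    buf = list(full_name) + ["\r"]  # assumed: a CR follows the typed text
    bx = len(full_name)             # like GetCnt: index of the trailing CR
    count = 0
    while buf[bx] != " ":           # FindLast: walk back to the last space
        bx -= 1
        count += 1
    buf[bx] = "$"                   # SeperateLast: terminate first/middle names
    bx += 1                         # skip the '$' that was just written
    count -= 1                      # the count included the trailing CR
    last = "".join(buf[bx:bx + count])            # Printlast loop
    first_middle = "".join(buf[:buf.index("$")])  # _PutStr stops at '$'
    return f"{last}, {first_middle}"

print(reorder_name("John Q Public"))
```

Without the two adjustments the slice would start at the '$', which is the stray character described in the question. Incrementing bx without decrementing Count would instead run one character past the last name into the CR, which would explain the garbled output.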

FindFirstFile and FindNextFile crash my program in assembly. Why?

I am writing a program that looks for files with the extension ".fly" and displays the names of the files it finds, but when I try to use FindFirstFile and FindNextFile, my program crashes. I have also tried FindFirstFileA (the ANSI version of the function), but my code still crashes. Please give me a code example if possible.
Thanks for your answers. Here's my source code, written in FASM assembly:
format pe console 4.0
include 'C:\fasm\INCLUDE\WIN32AX.INC'
; Declare a macro to make buffers
macro dup label,lcount
{ forward
label#:
common
repeat lcount
forward
db 0
common
end repeat }
.code
main:
explore_directoy:
invoke FindFirstFile,file_extension, FIND_STRUCT ; find the first *.fly
mov dword [find_handle], eax
find_more_files:
cmp eax, 0
je exit
call show_file
findnextfile:
invoke FindNextFile, dword [find_handle], FIND_STRUCT
jmp find_more_files
show_file:
invoke MessageBox, NULL, addr msg, addr msg_caption, MB_OK ; Easy way of outputing the text
;invoke MessageBox, NULL, addr cFileName, addr msg_caption, MB_OK
ret
exit:
invoke ExitProcess, 0
datazone:
end_msg db 'End of program', 0
msg db 'File founded', 0
msg_caption db 'Assembly program...', 0
file_extension db '*.fly', 0
find_handle dd 0 ; handles/other stuff..
FIND_STRUCT: ; find structure used with searching
dwFileAttributes dd 0
ftCreationTime dd 0,0
ftLastAccessTime dd 0,0
ftLastWriteTime dd 0,0
nFileSizeHigh dd 0
nFileSizeLow dd 0
dwReserved0 dd 0
dwReserved1 dd 0
dup cFileName, 256 ; found file buffer
dup cAlternate, 14
.end main
Your .text section isn't writable. Change
.code
to
section '.text' readable writable executable
Data should go into the .data section, not the .code section. If your assembler supports uninitialized data (all values defined as ?), then it should go into the .data? section (this is the equivalent of a bss (block started by symbol) section used in some other assemblers).
