ARM NEON to aarch64 - arm

I have code for ARM NEON armv7-a:
vst2.u8 {d1,d3}, [%1]!
I ported it to AArch64 like this:
st2 {v1.8b,v3.8b},[%1],#16
and got an error: Error: invalid register list at operand 1 -- `st2 {v1.8b,v3.8b},[x1],#16'
According to the documentation, this form is valid:
ST2 {Vt.<T>, Vt+2.<T>}, vaddr
I can't figure out the problem.
P.S. If I change it to
st2 {v1.8b,v2.8b},[%1],#16
the assembler accepts it without an error message.

I am referring to the ARM A64 instruction set architecture here, which was last updated in 2018.
The first link in your comment was only about the AArch32 instruction set. The second link was about the AArch64 instruction set, but the PDF is titled as interim and was published in 2011. The format
ST2 { <Vt>.<T>, <Vt+2>.<T> }, vaddr
is mentioned there (page 89), but this is not included in the current version.
Encoding of ST2
In the current version, ST2 (multiple structures) is encoded as follows (see page 1085):
┌───┬───┬──────────┬───┬───────┬──────┬────┬───────┬───────┐
│ 0 │ Q │ 00110010 │ I │ mmmmm │ 1000 │ ss │ nnnnn │ ttttt │
└───┴───┴──────────┴───┴───────┴──────┴────┴───────┴───────┘
                         Rm            size  Rn      Rt
There are three types of offset the instruction can be used with:
No offset (Rm == 00000 and I == 0):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Immediate offset (Rm == 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
Register offset (Rm != 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
<imm> is #16 or #32 here, depending on Q. Only the index t of the first register is stored in the encoding. The index of the second register is always calculated as (t+1) mod 32.
That's why you got the error: the registers must follow one another. There is simply not enough space to encode the second register separately. The two general-purpose register fields (Rn and Rm) already take up too much space.
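The consecutive-register rule can be illustrated with a small sketch. The function names below are hypothetical and only model the rule described above (second index = (t+1) mod 32); this is not an assembler implementation.

```python
def st2_second_register(t):
    """Index of the second register implied by the first one (wraps at 32)."""
    return (t + 1) % 32


def is_encodable_st2_list(first, second):
    """True if {v<first>, v<second>} can be expressed by the Rt field alone."""
    return 0 <= first < 32 and second == st2_second_register(first)


if __name__ == "__main__":
    print(is_encodable_st2_list(1, 3))   # the register list from the question
    print(is_encodable_st2_list(1, 2))   # the variant the assembler accepted
    print(is_encodable_st2_list(31, 0))  # wrap-around case
```

Under this model, {v1, v3} is rejected while {v1, v2} and {v31, v0} are accepted.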
Consideration
Wouldn't it be possible to encode the second register? In the case I == 0, Rm is set to 00000, but that is only a convention. This field could be used for our purpose, but only when no immediate or register offset is specified.
I can also see why the format with <Vt+2> was not adopted from the draft: it can only be encoded for this special case. Supporting it would make the chip more complex and is simply not worthwhile.


U-boot Script Bad Header CRC

I have a "flashing" script being loaded into U-Boot, on an iMX6, from a host PC via sdp. The script has been run through mkimage, so it has an image header. Here's the mkimage command:
mkimage -A arm -O linux -T script -C none -a 0 -e 0 -n "U-Boot script" -d $files_dir$flash_txt $files_dir$flash_scr
I can parse the header with binwalk:
$ binwalk -B flash.scr
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 uImage header, header size: 64 bytes, header CRC: 0x315E128A, created: 2021-09-13 14:52:46, image size: 2406 bytes, Data Address: 0x0, Entry Point: 0x0, data CRC: 0x7B909BAE, OS: Linux, CPU: ARM, image type: Script file, compression type: none, image name: "U-Boot script"
It fails to execute, with U-Boot reporting "Bad Header CRC". So, I'm trying to at least force it to work, while on the device, by correcting the CRC on the spot, using the built-in crc32 tool. It still won't work.
Here are uboot commands I'm trying (after the script is already in memory at address 0x10100000):
# display the script bytes
md 0x10100000
# reset the crc
mw 0x10100004 0x0 1
# verify the change
md 0x10100000
# recalc the crc
crc32 0x10100000 64
# write the crc
#mw 0x10100004 <VALUE> 1
# example:
mw 0x10100004 0x3a833d99 1
# run the script
source 0x10100000
When I do this, I still get the dreaded "Bad Header CRC" error!
I've found numerous sources which reveal / indicate which bytes contain the "header CRC". I've seen multiple sources saying that one needs to "reset" the checksum prior to attempting to compute it. This all looks to be valid logic. What am I doing wrong?
Update
The CORRECT uboot input / results
Per help from sawdust
...
Bad header crc
CTRL+C - Operation aborted.
SDP ended
uboot# md 0x10100000
10100000: 56190527 229a2e76 f4f64161 66090000 '..Vv.."aA.....f
10100010: 00000000 00000000 0533a380 00060205 ..........3.....
10100020: 6f422d55 7320746f 70697263 00000074 U-Boot script...
10100030: 00000000 00000000 00000000 00000000 ................
10100040: 5e090000 00000000 20766e65 61666564 ...^....env defa
10100050: 20746c75 2d20662d 65730a61 766e6574 ult -f -a.setenv
10100060: 7a697320 79625f65 685f6574 725f7865 size_byte_hex_r
10100070: 66746f6f 33352073 35453937 65730a41 ootfs 5379E5A.se
10100080: 766e6574 7a697320 6c625f65 5f6b636f tenv size_block_
10100090: 5f786568 746f6f72 32207366 30444239 hex_rootfs 29BD0
101000a0: 7465730a 20766e65 657a6973 7479625f .setenv size_byt
101000b0: 65685f65 62755f78 20746f6f 30433337 e_hex_uboot 73C0
101000c0: 65730a30 766e6574 7a697320 6c625f65 0.setenv size_bl
101000d0: 5f6b636f 5f786568 6f6f6275 39332074 ock_hex_uboot 39
101000e0: 730a0a45 6e657465 6f722076 7366746f E..setenv rootfs
101000f0: 6c5f615f 6c656261 6f722220 7366746f _a_label "rootfs
uboot# mw 0x10100004 0x0 1
uboot# crc32 0x10100000 0x40 0x10100004
crc32 for 10100000 ... 1010003f ==> 94926f83
uboot# md 0x10100000
10100000: 56190527 836f9294 f4f64161 66090000 '..V..o.aA.....f
10100010: 00000000 00000000 0533a380 00060205 ..........3.....
10100020: 6f422d55 7320746f 70697263 00000074 U-Boot script...
10100030: 00000000 00000000 00000000 00000000 ................
10100040: 5e090000 00000000 20766e65 61666564 ...^....env defa
10100050: 20746c75 2d20662d 65730a61 766e6574 ult -f -a.setenv
10100060: 7a697320 79625f65 685f6574 725f7865 size_byte_hex_r
10100070: 66746f6f 33352073 35453937 65730a41 ootfs 5379E5A.se
10100080: 766e6574 7a697320 6c625f65 5f6b636f tenv size_block_
10100090: 5f786568 746f6f72 32207366 30444239 hex_rootfs 29BD0
101000a0: 7465730a 20766e65 657a6973 7479625f .setenv size_byt
101000b0: 65685f65 62755f78 20746f6f 30433337 e_hex_uboot 73C0
101000c0: 65730a30 766e6574 7a697320 6c625f65 0.setenv size_bl
101000d0: 5f6b636f 5f786568 6f6f6275 39332074 ock_hex_uboot 39
101000e0: 730a0a45 6e657465 6f722076 7366746f E..setenv rootfs
101000f0: 6c5f615f 6c656261 6f722220 7366746f _a_label "rootfs
uboot# source 0x10100000
## Executing script at 10100000
... SUCCESS!!!!!
Pre-upload PC Patch
After running mkimage, I fixed the CRC in this way. This could be written in a more elegant fashion I'm sure - but this proved functional at least:
# The mkimage "header CRC" is invalid for some reason, so it needs to be corrected!
# (shell assignments take no leading "$")
files_dir=files/
flash_scr=flash.scr
flash_hdr=flash.hdr
chmod 644 "$files_dir$flash_scr"
# zero the current crc, and copy the 64 byte header to a separate file
printf '\x00\x00\x00\x00' | dd of="$files_dir$flash_scr" bs=1 seek=4 count=4 conv=notrunc
head -c 64 "$files_dir$flash_scr" > "$files_dir$flash_hdr"
# calculate the correct crc (crc32 prints it as 8 hex digits)
crc=$(crc32 "$files_dir$flash_hdr")
# split the hex string into its four bytes, most significant first
b1=$(echo "$crc" | cut -c 1-2)
b2=$(echo "$crc" | cut -c 3-4)
b3=$(echo "$crc" | cut -c 5-6)
b4=$(echo "$crc" | cut -c 7-8)
# update the script, with the correct value
printf "\x${b1}\x${b2}\x${b3}\x${b4}" | dd of="$files_dir$flash_scr" bs=1 seek=4 count=4 conv=notrunc
# clean up
rm "$files_dir$flash_hdr"
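The same patch can be sketched in Python without a temporary file. This is a minimal sketch under two assumptions: the header CRC is the standard CRC-32 (as computed by zlib) over the 64-byte header with the CRC field zeroed, and the field is stored big-endian at offset 4, which matches the memory dumps in this thread. The function names are hypothetical.

```python
import struct
import zlib

HEADER_SIZE = 64   # legacy image header length in bytes
HCRC_OFFSET = 4    # header CRC field follows the 4-byte magic number


def fix_header_crc(image):
    """Return a copy of `image` whose header CRC field has been recomputed."""
    header = bytearray(image[:HEADER_SIZE])
    # Zero the CRC field before computing the checksum over the header.
    header[HCRC_OFFSET:HCRC_OFFSET + 4] = b"\x00" * 4
    crc = zlib.crc32(bytes(header)) & 0xFFFFFFFF
    # Store the result big-endian (assumption, see the lead-in).
    header[HCRC_OFFSET:HCRC_OFFSET + 4] = struct.pack(">I", crc)
    return bytes(header) + image[HEADER_SIZE:]


def header_crc_ok(image):
    """Check the stored header CRC the same way the patch computes it."""
    header = bytearray(image[:HEADER_SIZE])
    (stored,) = struct.unpack_from(">I", header, HCRC_OFFSET)
    header[HCRC_OFFSET:HCRC_OFFSET + 4] = b"\x00" * 4
    return stored == (zlib.crc32(bytes(header)) & 0xFFFFFFFF)
```

To use it on a file, read the bytes of flash.scr, pass them through fix_header_crc, and write the result back.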
The fundamental flaw in mkimage
Ideally, mkimage would do its job correctly in the first place. Why it is malfunctioning is still a mystery...
What am I doing wrong?
U-Boot almost always assumes hexadecimal values for command arguments, so using the 0x... prefix is actually superfluous. AFAIK there is no way to input decimal values.
# recalc the crc
crc32 0x10100000 64
So when you specify the length of the header in the crc32 command as 64, you have specified an incorrect (header) length of 0x64 or 100 decimal.
Reviewing the U-Boot command response confirms your mistake:
=> crc32 0x10100000 64
CRC32 for 10100000 ... 10100063 ==> 9988c6ca
The address range of 0x10100000 through 0x10100063 is a span of 100 bytes.
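The effect of the hexadecimal default can be reproduced with a short sketch. The helper name is hypothetical; it only mimics how the two arguments are interpreted, not U-Boot's actual parser.

```python
def crc32_range(addr_arg, len_arg):
    """Return (first, last) address covered when both arguments are read as hex."""
    start = int(addr_arg, 16)   # base 16 accepts an optional "0x" prefix
    length = int(len_arg, 16)
    return start, start + length - 1


if __name__ == "__main__":
    first, last = crc32_range("0x10100000", "64")
    print(hex(first), hex(last), last - first + 1)
    first, last = crc32_range("0x10100000", "0x40")
    print(hex(first), hex(last), last - first + 1)
```

With "64" the range spans 100 bytes, while "0x40" (or just "40") covers the 64-byte header.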
Addendum (with revisions)
When you install the calculated value, you are probably introducing an endian (byte order) issue.
# write the crc
#mw 0x10100004 <VALUE> 1
The calculated CRC32 value probably needs to be specified in reverse byte order (assuming little-endian mode for the i.MX6) in the mw command.
For example, the mkimage command installs an original header CRC32 value of 3a0ebcd8 in the second word (in little-endian order):
=> md 20000000
20000000: 56190527 3a0ebcd8 a0284161 1f000000 '..V...:aA(.....
20000010: ...
=> imi 20000000
## Checking Image at 20000000 ...
Legacy image found
Image Name: U-Boot script
Image Type: PowerPC Linux Script (uncompressed)
Data Size: ...
After zeroing the second word, the crc32 command produces the expected value, but as a 4-byte string rather than an integer:
=> crc32 20000000 40
crc32 for 20000000 ... 2000003f ==> d8bc0e3a
If you install this value literally using the mw command, then the byte order is not what is required for a 32-bit word value.
=> mw 20000004 d8bc0e3a
=> md 20000000
20000000: 56190527 d8bc0e3a a0284161 1f000000 '..V:...aA(.....
20000010: ...
=> imi 20000000
## Checking Image at 20000000 ...
Legacy image found
Bad Header Checksum
=>
The mw command treats the value to write as an integer value, and therefore a byte reordering of the CRC32 value is required.
Addendum 2
Slight corrections to above in regards to which U-Boot commands are displaying results in little-endian mode.
The md command is displaying word values assuming the stored value is in little-endian mode.
To see the true byte order in memory, use the md.b command.
=> md 20000000 4
20000000: 56190527 3a0ebcd8 a0284161 1f000000 '..V...:aA(.....
=> md.b 20000000 10
20000000: 27 05 19 56 d8 bc 0e 3a 61 41 28 a0 00 00 00 1f '..V...:aA(.....
=>
Therefore, the mkimage command installs an original header CRC32 value of 3a0ebcd8 in the second word (in little-endian order).
The CRC32 command produces the 4-byte value as a byte string (i.e. already in little-endian order).
The mw command is aware that this is a little-endian CPU, and treats the value to write as an integer value.
Since the CRC32 result is a byte string (rather than an integer value), these bytes must be reordered for input using the mw command.
Got that?
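The reordering can be sketched as follows. The function name is hypothetical; the sketch assumes a little-endian CPU and a 32-bit mw, as described above.

```python
import struct


def mw_value_for_crc(crc_hex):
    """Hex value to pass to mw so that memory holds the crc32 output bytes in order."""
    wanted_bytes = bytes.fromhex(crc_hex)           # bytes as they must appear in memory
    value = int.from_bytes(wanted_bytes, "little")  # integer a little-endian store turns into those bytes
    return format(value, "08x")


if __name__ == "__main__":
    value = mw_value_for_crc("d8bc0e3a")
    print(value)
    # A little-endian 32-bit store of that value yields the wanted byte sequence.
    print(struct.pack("<I", int(value, 16)).hex())
```

For the crc32 output d8bc0e3a this gives 3a0ebcd8, the word that md shows for the original mkimage header.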
One possible way to avoid this endian confusion would be to use the automatic write feature of the CRC32 command.
Append the address of the header CRC to your crc32 command, and the calculated value will be stored in the correct order for you, e.g.
crc32 0x10100000 0x40 0x10100004
I still wonder why you don't have a good CRC32 in the first place using mkimage, and why you had to resort to this hack.

What is the beginning and the end of this disassembled array?

In a disassembled dll (by IDA), I reached an array, which is commented as an array of int (but it may be of byte):
.rdata:000000018003CC00 ; int boxA[264]
.rdata:000000018003CC00 boxA dd 0 ; DATA XREF: BlockPrepXOR+5FC↑r
.rdata:000000018003CC04 db 0Eh
.rdata:000000018003CC05 db 0Bh
.rdata:000000018003CC06 db 0Dh
.rdata:000000018003CC07 db 9
.rdata:000000018003CC08 db 1Ch
.rdata:000000018003CC09 db 16h
.rdata:000000018003CC0A db 1Ah
.rdata:000000018003CC0B db 12h
.rdata:000000018003CC0C db 12h
.rdata:000000018003CC0D db 1Dh
.rdata:000000018003CC0E db 17h
.rdata:000000018003CC0F db 1Bh
Can I interpret the data as
{000000h, E0B0D09h, 1C161A12h, ..} or
{0, 90D0B0Eh, 121A161Ch, ...} or
{00h,00h,00h,00h, 0Eh, 0Bh, ..} ?
From the comment (from IDA), can you confirm that the array ends at CC00h + 264*4 - 1 = D01Fh? I have another array starting at D020h:
.rdata:000000018003D01D db 0F9h ; ù
.rdata:000000018003D01E db 0A2h ; ¢
.rdata:000000018003D01F db 3Fh ; ?
.rdata:000000018003D020 array4_1248 db 1 ; DATA XREF: BlockPrepXOR+39A↑o
.rdata:000000018003D021 db 2
.rdata:000000018003D022 db 4
.rdata:000000018003D023 db 8
That's just the AES decryption's T8 matrix as described in this paper.
You can easily identify it by searching for the DWORD values on Google (e.g. this is one of the results).
So that's just data for an AES decryption function.
Note also that the interpretation of a sequence of bytes as a sequence of multi-byte data (WORDs, DWORDs, QWORDs, and so on) depends on the architecture.
For x86, only the little-endian interpretation is correct (this is your case 2), but data may undergo arbitrary manipulations (e.g. it can be bswapped), so when searching on Google, always try both the little-endian and the big-endian versions of the data.
It's also worth noting that IDA can interpret the bytes as DWORDs (type d twice or use the context menu), showing the correct value based on the architecture of the disassembled binary.
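Both interpretations of the bytes listed in the question can be reproduced with a short sketch; the helper name is hypothetical, and the byte values are copied from the disassembly above.

```python
import struct

# First 16 bytes of boxA, as listed in the question.
RAW = bytes([0x00, 0x00, 0x00, 0x00,
             0x0E, 0x0B, 0x0D, 0x09,
             0x1C, 0x16, 0x1A, 0x12,
             0x12, 0x1D, 0x17, 0x1B])


def as_dwords(data, byteorder):
    """Interpret `data` as unsigned 32-bit values in the given byte order."""
    prefix = "<" if byteorder == "little" else ">"
    return list(struct.unpack(prefix + "I" * (len(data) // 4), data))


if __name__ == "__main__":
    print([format(v, "08X") for v in as_dwords(RAW, "little")])  # x86 view (case 2)
    print([format(v, "08X") for v in as_dwords(RAW, "big")])     # byte-swapped view (case 1)
    # Address of the last byte of int boxA[264] starting at 0x18003CC00.
    print(hex(0x18003CC00 + 264 * 4 - 1))
```

Searching for both lists covers the case where the table was stored byte-swapped.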

BPF write fails with 1514 bytes

I'm unable to write 1514 bytes (including the L2 information) via write to /dev/bpf. I can write smaller packets (meaning I think the basic setup is correct), but I see "Message too long" with the full-length packets. This is on Solaris 11.2.
It's as though the write is treating this as the write of an IP packet.
Per the specs, there are 1500 bytes for the IP portion, 14 for the L2 headers (18 if tagging), and 4 bytes for the checksum.
I've set the feature that I thought would prevent the OS from adding its own layer 2 information (yes, I also find it odd that a 1 disables it; pseudo code below):
int hdr_complete = 1;
ioctl(bpf, BIOCSHDRCMPLT, &hdr_complete);
The packets are never larger than 1514 bytes (they're captured via a port span and start with the source and destination MAC addresses; I'm effectively replaying them).
I'm sure I'm missing something basic here, but I'm hitting a dead end. Any pointers would be much appreciated!
Partial Answer: This link was very helpful.
Update 3/20/2017
Code works on Mac OS X, but on Solaris results in repeated "Interrupted system call" (EINTR). I'm starting to read scary things about having to implement signal handling, which I'd rather not do...
Sample code on GitHub based on various code I've found via Google. On most systems you have to run this with root privileges unless you've granted "net_rawaccess" to the user.
Still trying to figure out the EINTR issue. Output from truss:
27158/1: 0.0122 0.0000 write(3, 0x08081DD0, 1514) Err#4 EINTR
27158/1: \0 >E1C09B92 4159E01C694\b\0 E\005DC82E1 #\0 #06F8 xC0A81C\fC0A8
27158/1: 1C eC8EF14 Q nB0BC 4 V #FBDE8010FFFF8313\0\00101\b\n ^F3 W # C E
27158/1: d SDD G14EDEB ~ t sCFADC6 qE3C3B7 ,D9D51D VB0DFB0\b96C4B8EC1C90
27158/1: 12F9D7 &E6C2A4 Z 6 t\bFCE5EBBF9C1798 r 4EF "139F +A9 cE3957F tA7
27158/1: x KCD _0E qB9 DE5C1 #CAACFF gC398D9F787FB\n & &B389\n H\t ~EF81
27158/1: C9BCE0D7 .9A1B13 [ [DE\b [ ECBF31EC3 z19CDA0 #81 ) JC9 2C8B9B491
27158/1: u94 iA3 .84B78AE09592 ;DA ] .F8 A811EE H Q o q9B 8A4 cF1 XF5 g
27158/1: EC ^\n1BE2C1A5C2 V 7FD 094 + (B5D3 :A31B8B128D ' J 18A <897FA3 u
EDIT 7 April 2017
The EINTR problem was the result of a bug in the sample code that I placed on GitHub. The code was not associating the bpf device with the actual interface and Solaris was throwing the EINTR as a result.
Now I'm back to the "message too long" problem that I still haven't resolved.

How to change the qword memory offset in Hopper Assembler v3?

So I got the following (as an example):
0x00000001000022c4 db "Apple", 0
0x0000000100002347 db "Ducks", 0
In a procedure it refers to Apple as such:
lea rcx, qword [ds:0x1000022c4] ; "Apple"
Now I would like this string to say Ducks, so I tried to modify the assembly instruction to:
lea rcx, qword [ds:0x100002347]
However, when I apply the change, it shows something like:
lea rcx, qword [ds:0x2ace]
Why does it do it?
I was able to fix it by going into the hex editor, finding the hex value, working out how far off the offset was, and correcting it. But it felt cumbersome.
Hopper Disassembler V3 is a great tool for reverse engineering. I had the same problem. Here is my solution. My demo architecture is x86_64:
00000001000174a6 mov rsi, qword [ds:0x1004b3040] ; #selector(setAlignment:)
When you see this, it does not mean you can change the address (0x1004b3040) to whatever you want.
The actual assembly code is:
00000001000174a6 movq 0x49bb93(%rip), %rsi ## Objc selector ref: setAlignment:
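That means the instruction stores a RIP-relative displacement (0x49bb93), not the absolute target address, so you have to convert the target address into a displacement.
The formula is: target address - instruction address - instruction length, i.e. 0x1004b3040 - 0x1000174a6 - 7 = 0x49bb93
So if you want to change the address to 0x100002347 ("Ducks"), you should apply this formula using the address and byte length of your own instruction; mine is 7 bytes long.
In my demo I'd like to change #selector(setAlignment:) to #selector(setHidden:), so I have to convert it with the formula below: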
0x1004b2238 - 0x1000174a6 - 7 = 0x49ad8b
So change the instruction bytes to 48 8b 35 8b ad 49 00 (the displacement is stored in little-endian order). Press 'command + shift + H' to show the hex editor in Hopper.
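The formula and the resulting bytes can be checked with a short sketch. The function names are hypothetical; the sketch assumes an instruction whose last four bytes are a signed 32-bit RIP-relative displacement, as in the 7-byte mov above.

```python
import struct


def rip_relative_disp(target, insn_addr, insn_len):
    """Displacement relative to the address of the next instruction."""
    return target - (insn_addr + insn_len)


def patched_bytes(opcode_prefix, target, insn_addr):
    """Opcode bytes followed by the little-endian 32-bit displacement."""
    insn_len = len(opcode_prefix) + 4
    disp = rip_relative_disp(target, insn_addr, insn_len)
    return opcode_prefix + struct.pack("<i", disp)


if __name__ == "__main__":
    # Values from the demo: mov rsi, [rip + disp] at 0x1000174a6, 7 bytes long.
    print(hex(rip_relative_disp(0x1004B3040, 0x1000174A6, 7)))
    print(patched_bytes(bytes.fromhex("488b35"), 0x1004B2238, 0x1000174A6).hex(" "))
```

For the "Ducks" case, the same functions apply once the address and opcode bytes of the lea instruction are known.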
My English is not very good, so feel free to reply with corrections.

Id2sym & symbol.object_id

Using the ruby-hacking-guide site, I've found that fixnum << 1 | 1 is the object_id of any fixnum.
I've tried using similar approach with symbol.
#define ID2SYM(x) ((VALUE)(((long)(x))<<8|SYMBOL_FLAG))
When shifting 8 bits left, x becomes a multiple of 256, that means a
multiple of 4. Then after with a bitwise or (in this case it’s the
same as adding) with 0x0e (14 in decimal)
I tried it with :a (:a.object_id is 175_976 on my 32-bit system):
ASCII number of a is 97.
97 << 8 = 24832
24832 | 14 = 24_846
So it's not even close to :a's object id.
I've checked source of object_id and found this:
* sizeof(RVALUE) is
* 20 if 32-bit, double is 4-byte aligned
* 24 if 32-bit, double is 8-byte aligned
* 40 if 64-bit
*/
if (SYMBOL_P(obj)) {
return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
With this formula I got about 500 000, which is also wrong.
So what am I missing? How is the object_id of a symbol calculated?
The ID value that you calculate from a symbol's object_id doesn’t directly represent the string content of that symbol. It is an index into a table that Ruby maintains containing the string. When you use a symbol in Ruby, and that symbol hasn’t been used before in the current process, it is given the ID value of the next free slot in the symbol table.
This means that a given symbol won’t always have the same ID value. The ID values associated with a Ruby process's symbols depend on the order in which they are created.
You can see this by starting a new Ruby process, creating a new symbol and looking at its object_id, and then repeating with a different symbol name. The object_id should be the same in both cases, since it will be referring to the next free spot in the symbol table. You need to be careful doing this as Ruby defines a lot of symbols itself, so if you use one of these you’ll get different results.
For example, an irb session:
2.1.0 :001 > Symbol.all_symbols.find {|sym| sym.to_s == 'matt' }
=> nil
2.1.0 :002 > :matt.object_id
=> 542248
And another:
2.1.0 :001 > Symbol.all_symbols.find {|sym| sym.to_s == 'banana' }
=> nil
2.1.0 :002 > :banana.object_id
=> 542248
Here we first check to see if the name we are going to use doesn’t already exist as a symbol, then we create the symbol and look at its object_id. In both cases it is the same 542248, corresponding to an ID of 2118, even though they have different names (these values may differ on different systems or Ruby versions).
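The "next free slot" behaviour can be illustrated with a toy model. This is only a sketch of the idea: the class and its method are hypothetical and are not how Ruby actually implements its symbol table or derives object_id from the ID.

```python
class ToySymbolTable:
    """Toy intern table: each new name gets the index of the next free slot."""

    def __init__(self, predefined=()):
        self._ids = {}
        for name in predefined:
            self.intern(name)

    def intern(self, name):
        # A name seen before keeps its ID; a new name takes the next free slot.
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]


if __name__ == "__main__":
    builtin = ["to_s", "inspect", "each"]  # stand-ins for symbols defined at startup
    process_a = ToySymbolTable(builtin)
    process_b = ToySymbolTable(builtin)
    # Two fresh "processes", each creating a different first new symbol.
    print(process_a.intern("matt"), process_b.intern("banana"))
    # Within one process, a second new symbol gets a different ID.
    print(process_a.intern("banana"))
```

In this model the first new symbol gets the same ID in both fresh tables regardless of its name, which mirrors the two irb sessions above.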
