How to accomplish this byte munging in Perl?

Background:
I'm trying to use the perl script from here to decrypt an android backup. Unfortunately, the checksum validation fails.
After playing around with this (Python) script, the problem seems to be that I need to do some additional munging of the master key (n.b. masterKeyJavaConversion in the Python script).
Problem:
I need to take a bag of bytes and perform the following conversion steps:
Sign-extend from signed char to signed short
Convert the result from UTF-16 (BE?) to UTF-8
For example (all bytes are in hex):
3x → 3x
7x → 7x
ax → ef be ax
bx → ef be bx
cx → ef bf 8x
dx → ef bf 9x
ex → ef bf ax
fx → ef bf bx
(The x always remains unchanged.)
More specifically, given a bit sequence 1abc defg, I need to output 1110 1111 1011 111a 10bc defg. (For 0abc defg, the output is just 0abc defg, i.e. unchanged.)
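For concreteness, this is roughly what I would write in C (just a sketch; the function name and the caller-provided output buffer are illustrative):

#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of the conversion described above: bytes 0x00-0x7F
 * pass through unchanged; bytes 0x80-0xFF become the three-byte sequence
 * EF BE/BF xx. The output buffer must hold up to 3 * len bytes.
 * Returns the number of bytes written. */
size_t munge_bytes(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b < 0x80) {
            out[n++] = b;                  /* 0abc defg stays as-is     */
        } else {
            out[n++] = 0xEF;               /* 1110 1111                 */
            out[n++] = 0xBC | (b >> 6);    /* 1011 111a (0xBE or 0xBF)  */
            out[n++] = 0x80 | (b & 0x3F);  /* 10bc defg                 */
        }
    }
    return n;
}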
Answers may use UTF conversions or may do the bit twiddling directly; I don't care, as long as it works (this isn't performance-critical). Answers in the form of a subroutine are ideal. (My main problem is I know just enough Perl to be dangerous. If this was C/C++, I wouldn't need help, but it would be a major undertaking to rewrite the entire script in another language, or to modify the Python script to not need to read the entire input into memory.)

1110 1111 1011 111a 10bc defg would be a valid UTF-8 encoding.
++++----------------------- Start of three byte sequence
||||     ++---------------- Continuation byte
||||     ||       ++------- Continuation byte
||||     ||       ||
11101111 1011111a 10bcdefg
    ||||   ||||||   ||||||
    ++++---++++++---++++++- 1111 1111 1abc defg
That's just the extension of an 8-bit signed number to 16 bits, cast to unsigned, and treated as a Unicode Code Point.
So, without looking at the code, I think you want
sub encode_utf8 {
    my ($s) = @_;
    utf8::encode($s);
    return $s;
}
sub munge {
    return
        encode_utf8    # "\x30\x70\xEF\xBE\xA0..."
        pack 'W*',     # "\x{0030}\x{0070}\x{FFA0}..."
        unpack 'S*',   # 0x0030, 0x0070, 0xFFA0, ...
        pack 's*',     # "\x30\x00\x70\x00\xA0\xFF..." (on a LE machine)
        unpack 'c*',   # 48, 112, -96, ...
        $_[0];         # "\x30\x70\xA0..."
}
my $s = "\x30\x70\xA0\xB0\xC0\xD0\xE0\xF0";
my $munged = munge($s);
If you remove the comments, you get the following:
sub munge {
    my $s = pack 'W*', unpack 'S*', pack 's*', unpack 'c*', $_[0];
    utf8::encode($s);
    return $s;
}
Here's a much faster solution:
my @map = (
    ( map chr($_),                   0x00..0x7F ),
    ( map "\xEF\xBE".chr($_),        0x80..0xBF ),
    ( map "\xEF\xBF".chr($_ - 0x40), 0xC0..0xFF ),
);
sub munge { join '', @map[ unpack 'C*', $_[0] ] }

This may not be as elegant as ikegami's answer, but it worked:
sub munge_mk {
    my $out;
    foreach (unpack('C*', $_[0])) {
        if ($_ < 128) {
            $out .= chr($_);
        } else {
            my $hi = 0xbc | (($_ & 0xc0) >> 6);
            my $lo = 0x80 | ($_ & 0x3f);
            $out .= chr(0xef) . chr($hi) . chr($lo);
        }
    }
    return $out;
}

What is the beginning and the end of this disassembled array?

In a DLL disassembled by IDA, I reached an array that is commented as an array of int (but it may be an array of bytes):
.rdata:000000018003CC00 ; int boxA[264]
.rdata:000000018003CC00 boxA dd 0 ; DATA XREF: BlockPrepXOR+5FC↑r
.rdata:000000018003CC04 db 0Eh
.rdata:000000018003CC05 db 0Bh
.rdata:000000018003CC06 db 0Dh
.rdata:000000018003CC07 db 9
.rdata:000000018003CC08 db 1Ch
.rdata:000000018003CC09 db 16h
.rdata:000000018003CC0A db 1Ah
.rdata:000000018003CC0B db 12h
.rdata:000000018003CC0C db 12h
.rdata:000000018003CC0D db 1Dh
.rdata:000000018003CC0E db 17h
.rdata:000000018003CC0F db 1Bh
Can I interpret the data as
{000000h, E0B0D09h, 1C161A12h, ..} or
{0, 90D0B0Eh, 121A161Ch, ...} or
{00h,00h,00h,00h, 0Eh, 0Bh, ..} ?
From the comment (from IDA), can you confirm that the array ends at CC00h + 264*4 - 1 = D01Fh? I have another array starting at D020h:
.rdata:000000018003D01D db 0F9h ; ù
.rdata:000000018003D01E db 0A2h ; ¢
.rdata:000000018003D01F db 3Fh ; ?
.rdata:000000018003D020 array4_1248 db 1 ; DATA XREF: BlockPrepXOR+39A↑o
.rdata:000000018003D021 db 2
.rdata:000000018003D022 db 4
.rdata:000000018003D023 db 8
That's just the AES decryption's T8 matrix as described in this paper.
You can easily identify it by looking for the DWORD values on Google (e.g. this is one of the results).
So that's just data for an AES decryption function.
Note also that the interpretation of a sequence of bytes as a sequence of multi-byte data (WORDs, DWORDs, QWORDs, and so on) depends on the architecture.
For x86, only the little-endian interpretation is correct (this is your case 2), but the data may undergo arbitrary manipulations (e.g. it can be bswapped), so when searching on Google, always try both the little-endian and the big-endian versions of the data.
It's also worth noting that IDA can interpret the bytes as DWORDs (type d twice or use the context menu), showing the correct value based on the architecture of the disassembled binary.
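As a quick illustration (a C sketch; the byte values are copied from the listing above), reading the raw bytes as little-endian DWORDs on x86 reproduces your case 2:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The eight bytes that follow the leading zero DWORD in the listing */
    uint8_t raw[8] = { 0x0E, 0x0B, 0x0D, 0x09, 0x1C, 0x16, 0x1A, 0x12 };
    uint32_t dwords[2];

    memcpy(dwords, raw, sizeof raw);   /* reinterpret in native byte order */
    /* On a little-endian machine this prints: 090D0B0Eh 121A161Ch */
    printf("%08Xh %08Xh\n", (unsigned)dwords[0], (unsigned)dwords[1]);
    return 0;
}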

Difference between constants defined in Go's syscall and C's stat.h [duplicate]

Update: based on the comment and response so far, I guess I should make it explicit that I understand 0700 is the octal representation of the decimal number 448. My concern here is that when a mode, whether written as an octal literal or as a decimal number reinterpreted as octal, is passed to os.FileMode, the resulting permissions on the file created using WriteFile don't seem to line up in a way that makes sense.
I worked as hard as I could to reduce the question to its essence; maybe I need to go through another round of that.
Update2: after re-re-reading, I think I can more succinctly state my issue. Calling os.FileMode(700) should be the same as calling it with the binary value 1-010-111-100. With those 9 least significant bits there should be permissions of:
--w-rwxr--, or 274 in octal.
Instead, that FileMode results in WriteFile creating the file with:
--w-r-xr-- which is 254 in octal.
When using an internal utility written in Go, there was a file-creation permission bug caused by using decimal 700 instead of octal 0700 when creating the file with ioutil.WriteFile(). That is:
ioutil.WriteFile("decimal.txt", "filecontents", 700) <- wrong!
ioutil.WriteFile("octal.txt", "filecontents", 0700) <- correct!
When using the decimal number (i.e. no leading zero to identify it to Go as an octal number), the file that should have had permissions
0700 -> '-rwx------' instead had 0254 -> '--w-r-xr--'
After it was fixed, I noticed that when I converted 700 decimal to octal, I got “1274” instead of the experimental result of "0254".
When I converted 700 decimal to binary, I got: 1-010-111-100 (I added dashes where the rwx’s are separated). This looks like a permission of "0274" except for that leading bit being set.
I went looking at the go docs for FileMode and saw that under the covers FileMode is a uint32. The nine smallest bits map onto the standard unix file perm structure. The top 12 bits indicate special file features. I think that one leading bit in the tenth position is in unused territory.
I was still confused, so I tried:
package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    content := []byte("temporary file's content")
    modes := map[string]os.FileMode{
        "700":   os.FileMode(700),
        "0700":  os.FileMode(0700),
        "1274":  os.FileMode(1274),
        "01274": os.FileMode(01274)}
    for name, mode := range modes {
        if err := ioutil.WriteFile(name, content, mode); err != nil {
            fmt.Println("error creating ", name, " as ", mode)
        }
        if fi, err := os.Lstat(name); err == nil {
            mode := fi.Mode()
            fmt.Println("file\t", name, "\thas ", mode.String())
        }
    }
}
And now I'm even more confused. The results I got are:
file 700 has --w-r-xr--
file 0700 has -rwx------
file 1274 has --wxr-x---
file 01274 has --w-r-xr--
and was confirmed by looking at the filesystem:
--w-r-xr-- 1 rfagen staff 24 Jan 5 17:43 700
-rwx------ 1 rfagen staff 24 Jan 5 17:43 0700
--wxr-x--- 1 rfagen staff 24 Jan 5 17:43 1274
--w-r-xr-- 1 rfagen staff 24 Jan 5 17:43 01274
The first one is the broken situation that triggered the original bug in the internal application.
The second one is the corrected code working as expected.
The third one is bizarre, as 1274 decimal seems to translate into 0350
The fourth one kind of makes a twisted sort of sense, given that dec(700)->oct(1274) and explicitly asking for 01274 gives the same puzzling 0254 as the first case.
I have a vague suspicion that the extra part of the number larger than 2^9 is somehow messing it up but I can't figure it out, even after looking at the source for FileMode. As far as I can tell, it only ever looks at the 12 MSB and 9 LSB.
os.FileMode only knows about integers; it doesn't care whether the literal representation is octal or not.
The fact that 0700 is interpreted in base 8 comes from the language spec itself:
An integer literal is a sequence of digits representing an integer
constant. An optional prefix sets a non-decimal base: 0 for octal, 0x
or 0X for hexadecimal. In hexadecimal literals, letters a-f and A-F
represent values 10 through 15.
This is a fairly standard way of representing literal octal numbers in programming languages.
So your file mode was changed from the requested 0274 to the actual on-disk 0254. I'll bet that your umask is 0022. Sounds to me like everything is working fine.
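To spell out the arithmetic, here is a small C sketch (illustrative only; C uses the same octal-literal convention quoted above, and the 022 umask is an assumption):

#include <stdio.h>

int main(void)
{
    unsigned decimal = 700;    /* what the buggy call passed   */
    unsigned octal   = 0700;   /* what was intended: rwx------ */

    printf("%o\n", decimal);                   /* 1274: decimal 700 written in octal  */
    printf("%o\n", decimal & 0777);            /*  274: only the nine permission bits */
    printf("%o\n", (decimal & 0777) & ~022u);  /*  254: after applying a 022 umask    */
    printf("%o\n", octal);                     /*  700                                */
    return 0;
}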

Multi byte store and fetch in Forth - how to implement?

When using large arrays it would be nice to be able to adjust the array for a certain number of bytes per number. Mostly I want fast routines to read such adjusted multi-byte numbers into singles on the stack, and conversely to store singles into the array adjusted for a certain number of bytes. On a 64-bit system there is a need for single-number arrays other than one byte (c@ c!) and eight bytes (@ !).
So how to implement
cs@ ( ad b -- n )
cs! ( n ad b -- )
where b is the number of bytes. The word cs! seems to work as
: cs! ( n ad b -- ) >r sp@ cell+ swap r> cmove drop ;
but how about cs@, and how to do it in pure ANS Forth without sp@ or similar words?
The Forth 200x committee has put quite some time into developing a Memory Access wordset that would suit. We have not included it in the standard thus far due to its size.
The standard-compatible way is to use C@ and bitwise operations. To use the same byte order in memory as the Forth system uses, you need to detect endianness and compile the suitable versions of the definitions.
\ These definitions use little-endian format in memory.
\ Assumption: char size and address unit size equal to 1 octet.
: MB! ( x addr u -- )
  ROT >R OVER + SWAP
  BEGIN 2DUP U> WHILE R> DUP 8 RSHIFT >R OVER C! 1+ REPEAT
  2DROP RDROP
;
: MB@ ( addr u -- x )
  0 >R OVER +
  BEGIN 2DUP U< WHILE 1- DUP C@ R> 8 LSHIFT OR >R REPEAT
  2DROP R>
;
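For comparison, the same little-endian store and fetch written as a C sketch (illustrative only; it assumes 8-bit address units, like the definitions above):

#include <stddef.h>
#include <stdint.h>

/* Roughly what the little-endian MB! and MB@ above do (illustrative sketch). */
void mb_store(uint64_t x, unsigned char *addr, size_t u)
{
    for (size_t i = 0; i < u; i++) {   /* low byte first */
        addr[i] = (unsigned char)(x & 0xFF);
        x >>= 8;
    }
}

uint64_t mb_fetch(const unsigned char *addr, size_t u)
{
    uint64_t x = 0;
    while (u--)                        /* high address down to low */
        x = (x << 8) | addr[u];
    return x;
}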
For higher performance it could be better to use implementation-specific features (including W@, T@, Q@, SP@, etc.) or even inline Forth assembler.
Note that a straightforward definition via a DO loop usually has worse performance (it depends on the optimizer; about 10% in SP-Forth/4.21). The code, for reference:
: MB! ( x addr u -- )
  OVER + SWAP ?DO DUP I C! 8 RSHIFT LOOP DROP
;
: MB@ ( addr u -- x )
  DUP 0= IF NIP EXIT THEN
  0 -ROT
  1- OVER + DO 8 LSHIFT I C@ OR -1 +LOOP
;
We can't use ?DO in the second case because the loop index decreases and, per the +LOOP semantics, the loop only exits when the index crosses "the boundary between the loop limit minus one and the loop limit".
\ little-endian (e.g. PC, Android)
: mb! ( n ad i -- ) 2>r here ! here 2r> cmove ;
: mb@ ( ad i -- n ) here 0 over ! swap cmove here @ ;
\ big-endian (e.g. Mac)
: mb! ( n ad i -- ) 2>r here ! here cell + r@ - 2r> cmove ;
: mb@ ( ad i -- n ) here 0 over ! cell + over - swap cmove here @ ;
\ little-endian test
1 here ! here c@ negate .
Of course HERE could be any one cell buffer.
Thanks to ruvim for moving the process forward!

Going from Hex To Dec with PIC18?

I have a little issue trying to combine 4 hex digits to give me the correct decimal value. First let me start with my code.
long firsttwo, secondtwo, combined;
firsttwo = 0x0C;
secondtwo = 0x6C;
The Decimal value of 0C: 12
The Decimal value of 6C: 108
But the Decimal value of all 0C6C: 3180
Now how do I get all the digits into one variable to be able to convert it to decimal correctly? Because if I just convert firsttwo by itself and then secondtwo by itself, I don't get the same final total. Thanks!
You need to shift the most significant byte when combining:
combined = (firsttwo << 8) | secondtwo;
this sets combined to 0x0c6c.
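For completeness, a minimal sketch you could compile on a desktop C compiler to see the combined value (variable names follow the question):

#include <stdio.h>

int main(void)
{
    long firsttwo  = 0x0C;   /* high byte */
    long secondtwo = 0x6C;   /* low byte  */
    long combined  = (firsttwo << 8) | secondtwo;

    printf("%ld\n", combined);   /* prints 3180, i.e. 0x0C6C */
    return 0;
}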

Serial data parsing

I have a probably simple question which I just can't seem to understand.
I am creating a serial parser for a datalogger which sends a serial stream. The documentation for the product states a calculation which I don't understand.
Lateral = Data1 And 0x7F + Data2 / 0x100
If (Data1 And 0x80)=0 Then Lateral = -Lateral
What does Data1 And 0x7F mean? I know that 0x7F is 127, but besides that I don't understand the combination with the And statement.
What would the real formula look like?
Bitwise AND -- a bit in the output is set if and only if the corresponding bit is set in both inputs.
Since your tags indicate that you're working in C, you can perform bitwise AND with the & operator.
(Note that 0x7F is 01111111 and 0x80 is 10000000 in binary, so ANDing with these correspond respectively to extracting the lower seven bits and extracting the upper bit of a byte.)
First line:
Lateral = Data1 And(&) 0x7F + Data2 / 0x100
means: take the magnitude of Data1 (Data1 & 0x7F) and add to it the value of Data2/256.
Second line:
check the sign bit of Data1 and set the sign of Lateral accordingly.
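Putting the two lines together in C might look like this (a sketch; Data1 and Data2 are assumed to be unsigned bytes and Lateral a float):

#include <stdint.h>

/* Sketch of the documented calculation: bit 7 of Data1 is the sign flag,
 * bits 0-6 are the integer magnitude, and Data2/256 is the fraction.
 * The parentheses matter: in C, & binds more loosely than + and /. */
float lateral_from_bytes(uint8_t data1, uint8_t data2)
{
    float lateral = (float)(data1 & 0x7F) + (float)data2 / 0x100;
    if ((data1 & 0x80) == 0)
        lateral = -lateral;
    return lateral;
}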
