Disassembler logic for custom opcodes in C - c

So I'm building a disassembler that will convert a file containing hexadecimal data into assembly language.
So from this format I could convert the hexadecimal data in the file into decimal using uint8_t and store them in an array. Then I decided to bit shift the last number in the array to get number of instructions of the last function; essentially I'm parsing backwards since I don't know how much padding there are at the beginning and the number of ops in a function is given at the end of the function. But then I realised that the operations varies in bit size and aren't in perfect 8 or 16 bit bounds. So then I was stuck since my array, using the example at the top, was essentially this:
uint8_t hex[] = {0x00, 0x03, 0x02, 0x01, 0x42, 0x82, 0x86, 0x04, 0x10, 0x45};
So can anyone help me with the logic in parsing? This is my first time posting so I'm sorry if I'm missing anything and will provide more information or delete if needed

Instead of shifting and masking (which I think would be really complicated) what if you convert the uint8_t array into an array of bits - it uses a lot more memory but you can access individual bits much easier.
Here is a sample program that does this:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
uint8_t value = 0;
*index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[*index+i];
return value;
}
int main()
{
// sample program
uint8_t array[] = {0x00,0x03,0x02,0x01,0x42,0x82,0x86,0x04,0x10,0x45};
// program with zero padding
// uint8_t array[] = {0xE8,0x39,0x06,0xA0,0xC4,0x16,0x82,0x90,0x4A,0x08,0x41};
uint32_t array_size = sizeof(array)/sizeof(*array); // 10 bytes
uint32_t bits_size = 8*array_size; // 80 bytes
uint8_t* bits = malloc(bits_size);
for(uint32_t a=0;a<array_size;a++)
for(uint32_t b=0;b<8;b++)
bits[a*8+b] = (array[a] >> (7-b)) & 1;
puts("Binary program file:");
for(uint32_t i=0;i<bits_size;i++)
printf("%s%d",(i%8?"":" "),bits[i]);
puts("");
enum { MOV, CAL, RET, REF, ADD, PRINT, NOT, EQU};
uint8_t params[] = { 2, 1, 0, 2, 2, 1, 1, 1};
const char *opcodes[] = {"MOV","CAL","RET","REF","ADD","PRINT","NOT","EQU"};
enum { VAL, REG, STK, PTR};
uint8_t value_size[] = { 8, 3, 5, 5};
const char *types[] = {"VAL","REG","STK","PTR"};
uint32_t index = bits_size; // start at end
// minimum program size is function(3) + opcode(3) + size(5)
// if there are less than that number of bits then it must be padding
while(index>10)
{
uint8_t size = getBits(bits,5,&index);
printf("\nsize=%d\n",size);
if (size > 0)
{
for(int o=0; o<size; o++)
{
uint8_t opcode = getBits(bits,3,&index);
printf("opcode=%s",opcodes[opcode]);
for(int p=0; p<params[opcode]; p++)
{
printf("%c ",p?',':':');
uint8_t type = getBits(bits,2,&index);
printf("type=%s ",types[type]);
uint8_t value = getBits(bits,value_size[type],&index);
printf("value=%d",value);
}
puts("");
}
uint8_t function = getBits(bits,3,&index);
printf("function=%d\n",function);
}
}
return 0;
}
Try it at https://onlinegdb.com/S1qVStz8d
How it getBits() works:
You make an array of individual digits from the original value, and then you take bits from it one at a time to make a new value - getBits() is the function I have written for that.
To understand how it works imagine how it works in base 10: 321 is put into the array {3,2,1} and you could turn it back into a value with:
value = 0;
value = value*10 + digits[0];
value = value*10 + digits[1];
value = value*10 + digits[2];
Which gives (((0)*10+3)*10+2)*10+1 which is 321
If 5 (binary 101) is put into the array {1,0,1}, you could turn it back into a value with:
value = 0;
value = value*2 + bits[0];
value = value*2 + bits[1];
value = value*2 + bits[2];
Which gives (((0)*2+1)*2+0)*2+1 which is 5 (binary 101)
And that does work. And a decent compiler would optimize the *2 into <<1 and the + into |, but you could do it yourself (which is what I did):
value = 0;
value = (value<<1) | bits[0];
value = (value<<1) | bits[1];
value = (value<<1) | bits[2];
Which produces that same binary 00000101
It's just a readability thing - with decimal you expect to see value*10+x but with binary you expect to see bit operations like shift/or instead of math operations like multiply/add.
Then, if you use a loop with a size and an index that points to the end of the array, you get:
uint8_t value = 0;
index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[index+i];
But, of course, if it is a function then index needs to be a pointer and you need to dereference it everywhere:
uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
uint8_t value = 0;
*index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[*index+i];
return value;
}

Related

Finding the Occurrence of a Pattern in a Data Block using C

I am trying to figure out finding the occurrence of a set of numbers in a data block without using arrays. Right now my issue is that everytime it finds the first number in the pattern it assumes the whole pattern is right. I am having trouble trying to go through the whole pattern using patternLength in the data block before assuming it is right. For example is the pattern I want to find over and over again is 12 14 3C 48. Everytime is sees the number 12 it says it is the whole pattern.
uint32_t findOccurrencesOfPattern(uint32_t *const pOffsets,
const uint8_t *const blockAddress,
uint32_t blockLength,
const uint8_t *const pPattern,
uint8_t patternLength) {
uint32_t bytesRead = 0;
int count = 0;
int length = 0;
// const char* c = ((const char *)&pPattern);
while (bytesRead < blockLength) {
if (*(blockAddress + bytesRead) == *pPattern) {
*(pOffsets + count) = bytesRead;
count++;
}
patternLength--;
bytesRead++;
}
return count;
}
You will need to turn the if() statement (which presently check only the first byte) into a for() loop to iterate over patternLength-worth of bytes.

What does the last `for` loop in this Radix Sort code do?

I was reading the book Learn C The Hard Way by Zed A. Shaw and I was looking over his implementation of the radix sort algorithm.
This is his code:
#define ByteOf(x, y) (((u_int8_t *)x)[y])
static inline void radix_sort(short offset, uint64_t max,
uint64_t * source, uint64_t * dest)
{
uint64_t count[256] = { 0 };
uint64_t *cp = NULL;
uint64_t *sp = NULL;
uint64_t *end = NULL;
uint64_t s = 0;
uint64_t c = 0;
// Count occurences of every byte value
for (sp = source, end = source + max; sp < end; sp++) {
count[ByteOf(sp, offset)]++;
}
// transform count into index by summing
// elements and storing them into same array.
for (s = 0, cp = count, end = count + 256; cp < end; cp++) {
c = *cp;
*cp = s;
s += c;
}
// fill dest with right values in the right place
for (sp = source, end = source + max; sp < end; sp++) {
cp = count + ByteOf(sp, offset);
printf("dest[%d] = %d\n", *cp, *sp);
dest[*cp] = *sp;
++(*cp);
}
}
The above is just a helper function. His actual radix sort is done here:
void RadixMap_sort(RadixMap * map)
{
uint64_t *source = &map->contents[0].raw;
uint64_t *temp = &map->temp[0].raw;
radix_sort(0, map->end, source, temp);
radix_sort(1, map->end, temp, source);
radix_sort(2, map->end, source, temp);
radix_sort(3, map->end, temp, source);
}
Here's the structures he's defined:
typedef union RMElement {
uint64_t raw;
struct {
uint32_t key;
uint32_t value;
} data;
} RMElement;
typedef struct RadixMap {
size_t max;
size_t end;
uint32_t counter;
RMElement *contents;
RMElement *temp;
} RadixMap;
I can understand the first 2 for loops in the inline function radix_sort. As far as I understand, the first one simply just counts the byte values and the second function basically makes a cumulative frequency table, where each entry is the sum of the previous entries.
I still can't wrap my head around the ByteOf(x, y) macro and the third for loop. I've tried reading the Wikipedia page for Radix-sort and I read another article that used a C++ implementation. However, the code written in each of these articles doesn't match the code that he's written.
I understand how Radix Sort works in principle. Basically, we group it according to each digit, rearranging the groupings for every new digit we encounter. For example, to sort the array [223, 912, 275, 100, 633, 120, 380], you first group them by the ones digit so you get [380, 100, 120], [912], [633, 223], [275]. Then you do the same thing with the tens and hundreds place until you've run out of digits.
Any help explaining his code would be appreciated.
Thanks.
ByteOf(x, y) is the same as:
#define ByteOf(x, y) ((*(x) >> (offset*8)) & 0xff)
That is, it isolates the value of byte #{offset} within a value.
The second loop is a sort of allocator. If the first six counts[] were 1,2,4,0,16,25 after the first loop they would be 0,1,3,7,7,23 after the second. This directs the third loop (over source[]) to layout the destination as:
ByteOf index number of values
0 0 1
1 1 2
2 3 4
3 7 0 -- there are none.
4 7 16
5 23 25
I find it a bit clearer to rewrite the third loop as:
for (i = 0; i < max; i++) {
dest[count[ByteOf((source+i), offset)]++] = source[i];
}
I think it shows the relationship more clearly, that is that the ith’ source element is being copied to an index in dest. The index in dest is at the start of the partition (count[]) previously computed for this digit. Since there is now a number at this location, we increment the start of this partition to prevent over-writing it.
Note that the brackets around (source+i) are necessary to get the right address for the cast in ByteOf.

converting base 4 code to letters

im working on an assmebler project that i have and i need to translate binary machine code that i have to a "weird" 4 base code for example
if i get binary code like this "0000-10-01-00" i should translate it to "aacba"
00=a
01=b
10=c
11=d
i have managed to translate the code to 4 base code but i dont know how to continue from there or if this is the right way to do it,...
adding my code below
void intToBase4 (unsigned int *num)
{
int d[7];
int j,i=0;
double x=0;
while((*num)>0)
{
d[i]=(*num)%4;
i++;
(*num)=(*num)/4;
}
for(x=0,j=i-1; j>=0; j--)
{
x += d[j]*pow(10,j);
}
(*num)=(unsigned int)x;
}
I've included a little 32-bit to num to letter converter for you to grasp the basics. It works a single "32-bit number" at a time. You could use this as a basis for an array based solution like you have half way done in your example, or change the type to be bigger, or whatever. It should show you roughly what you need to do:
void intToBase4 (uint32_t num, char *outString)
{
// There are 16 digits per num in this example
for(int i=0; i<16; i++)
{
// Grab the lowest 2 bits and convert to a letter.
*outString++ = (num & 0x03) + 'a';
// Shift next 2 bits low
num >>= 2;
}
// NUL terminate string.
*outString = '\0';
}
A bit more universal one:
Usage:
value - value to decode, buff - buff where result string will be stored, numofwrds - number of fields to be decoded, ... fields sizes in bits
example "xxxxyyvvzz": - 4 bits, two bits, two bits, two bits
decode(v, buff, 4, 4, 2, 2, 2);
char dictionary[] = "abcdefghijklmnopqrstuwxyz";
char *decode(unsigned int value, char *buff, int numofwrds, ...)
{
va_list vl;
int *fieldsizes = malloc(sizeof(int) * numofwrds);
int bitsize = 0;
char *result = NULL;
if (fieldsizes != NULL)
{
va_start(vl, numofwrds);
for (int i = 0; i < numofwrds; i++)
{
fieldsizes[i] = va_arg(vl, int);
bitsize += fieldsizes[i];
}
va_end(vl);
for (int i = 0; i < numofwrds; i++)
{
unsigned int mask, offset;
mask = (1 << fieldsizes[i]) - 1;
offset = bitsize - fieldsizes[i];
mask <<= offset;
buff[i] = dictionary[(value & mask) >> offset];
bitsize -= fieldsizes[i];
}
free(fieldsizes);
result = buff;
}
buff[numofwrds] = '\0';
return result;
}

figure out why my RC4 Implementation doesent produce the correct result

Ok I am new to C, I have programmed in C# for around 10 years now so still getting used to the whole language, Ive been doing great in learning but im still having a few hickups, currently im trying to write a implementation of RC4 used on the Xbox 360 to encrypt KeyVault/Account data.
However Ive run into a snag, the code works but it is outputting the incorrect data, I have provided the original c# code I am working with that I know works and I have provided the snippet of code from my C project, any help / pointers will be much appreciated :)
Original C# Code :
public struct RC4Session
{
public byte[] Key;
public int SBoxLen;
public byte[] SBox;
public int I;
public int J;
}
public static RC4Session RC4CreateSession(byte[] key)
{
RC4Session session = new RC4Session
{
Key = key,
I = 0,
J = 0,
SBoxLen = 0x100,
SBox = new byte[0x100]
};
for (int i = 0; i < session.SBoxLen; i++)
{
session.SBox[i] = (byte)i;
}
int index = 0;
for (int j = 0; j < session.SBoxLen; j++)
{
index = ((index + session.SBox[j]) + key[j % key.Length]) % session.SBoxLen;
byte num4 = session.SBox[index];
session.SBox[index] = session.SBox[j];
session.SBox[j] = num4;
}
return session;
}
public static void RC4Encrypt(ref RC4Session session, byte[] data, int index, int count)
{
int num = index;
do
{
session.I = (session.I + 1) % 0x100;
session.J = (session.J + session.SBox[session.I]) % 0x100;
byte num2 = session.SBox[session.I];
session.SBox[session.I] = session.SBox[session.J];
session.SBox[session.J] = num2;
byte num3 = data[num];
byte num4 = session.SBox[(session.SBox[session.I] + session.SBox[session.J]) % 0x100];
data[num] = (byte)(num3 ^ num4);
num++;
}
while (num != (index + count));
}
Now Here is my own c version :
typedef struct rc4_state {
int s_box_len;
uint8_t* sbox;
int i;
int j;
} rc4_state_t;
unsigned char* HMAC_SHA1(const char* cpukey, const unsigned char* hmac_key) {
unsigned char* digest = malloc(20);
digest = HMAC(EVP_sha1(), cpukey, 16, hmac_key, 16, NULL, NULL);
return digest;
}
void rc4_init(rc4_state_t* state, const uint8_t *key, int keylen)
{
state->i = 0;
state->j = 0;
state->s_box_len = 0x100;
state->sbox = malloc(0x100);
// Init sbox.
int i = 0, index = 0, j = 0;
uint8_t buf;
while(i < state->s_box_len) {
state->sbox[i] = (uint8_t)i;
i++;
}
while(j < state->s_box_len) {
index = ((index + state->sbox[j]) + key[j % keylen]) % state->s_box_len;
buf = state->sbox[index];
state->sbox[index] = (uint8_t)state->sbox[j];
state->sbox[j] = (uint8_t)buf;
j++;
}
}
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen)
{
int idx = 0;
uint8_t num, num2, num3;
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
while(idx != buflen) {
state->i = (int)(state->i + 1) % 0x100;
state->j = (int)(state->j + state->sbox[state->i]) % 0x100;
num = (uint8_t)state->sbox[state->i];
state->sbox[state->i] = (uint8_t)state->sbox[state->j];
state->sbox[state->j] = (uint8_t)num;
num2 = (uint8_t)inbuf[idx];
num3 = (uint8_t)state->sbox[(state->sbox[state->i] + (uint8_t)state->sbox[state->j]) % 0x100];
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
printf("%02X", (*outbuf)[idx]);
idx++;
}
}
printf("\n");
}
Usage (c#) :
byte[] cpukey = new byte[16]
{
...
};
byte[] hmac_key = new byte[16]
{
...
};
byte[] buf = new System.Security.Cryptography.HMACSHA1(cpukey).ComputeHash(hmac_key);
MessageBox.Show(BitConverter.ToString(buf).Replace("-", ""), "");
Usage(c):
const char cpu_key[16] = { 0xXX, 0xXX, 0xXX };
const unsigned char hmac_key[16] = { ... };
unsigned char* buf = HMAC_SHA1(cpu_key, hmac_key);
uint8_t buf2[20];
uint8_t buf3[8] = { 0x1E, 0xF7, 0x94, 0x48, 0x22, 0x26, 0x89, 0x8E }; // Encrypted Xbox 360 data
uint8_t* buf4;
// Allocated 8 bytes out.
buf4 = malloc(8);
int num = 0;
while(num < 20) {
buf2[num] = (uint8_t)buf[num]; // convert const char
num++;
}
rc4_state_t* rc4 = malloc(sizeof(rc4_state_t));
rc4_init(rc4, buf2, 20);
rc4_crypt(rc4, buf3, &buf4, 8);
Now I have the HMACsha1 figured out, im using openssl for that and I confirm I am getting the correct hmac/decryption key its just the rc4 isnt working, Im trying to decrypt part of the Kyevault that should == "Xbox 360"||"58626F7820333630"
The output is currently : "0000008108020000" I do not get any errors in the compilation, again any help would be great ^.^
Thanks to John's help I was able to fix it, it was a error in the c# version, thanks John !
As I remarked in comments, your main problem appeared to involve how the output buffer is managed. You have since revised the question to fix that, but I describe it anyway here, along with some other alternatives for fixing it. The remaining problem is discussed at the end.
Function rc4_crypt() allocates an output buffer for itself, but it has no mechanism to communicate a pointer to the allocated space back to its caller. Your revised usage furthermore exhibits some inconsistency with rc4_crypt() with respect to how the output buffer is expected to be managed.
There are three main ways to approach the problem.
Function rc4_crypt() presently returns nothing, so you could let it continue to allocate the buffer itself, and modify it to return a pointer to the allocated output buffer.
You could modify the type of the outbuf parameter to uint8_t ** to enable rc4_crypt() to set the caller's pointer value indirectly.
You could rely on the caller to manage the output buffer, and make rc4_crypt() just write the output via the pointer passed to it.
The only one of those that might be tricky for you is #2; it would look something like this:
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen) {
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
// ...
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
// ...
}
}
And you would use it like this:
rc4_crypt(rc4, buf3, &buf4, 8);
... without otherwise allocating any memory for buf4.
The caller in any case has the responsibility for freeing the output buffer when it is no longer needed. This is clearer when it performs the allocation itself; you should document that requirement if rc4_crypt() is going to be responsible for the allocation.
The remaining problem appears to be strictly an output problem. You are apparently relying on print statements in rc4_crypt() to report on the encrypted data. I have no problem whatever with debugging via print statements, but you do need to be careful to print the data you actually want to examine. In this case you do not. You update the joint buffer index idx at the end of the encryption loop before printing a byte from the output buffer. As a result, at each iteration you print not the encrypted byte value you've just computed, but rather an indeterminate value that happens to be in the next position of the output buffer.
Move the idx++ to the very end of the loop to fix this problem, or change it from a while loop to a for loop and increment idx in the third term of the loop control statement. In fact, I strongly recommend for loops over while loops where the former are a good fit to the structure of the code (as here); I daresay you would not have made this mistake if your loop had been structured that way.

Storing bits from an array in an integer

So i have an array of bits, basically 0's and 1's in a character array.
Now what I want to do is store these bits in an integer I have in another array (int array), but I'm not sure how to do this.
Here is my code to get the bits:
char * convertStringToBits(char * string) {
int i;
int stringLength = strlen(string);
int mask = 0x80; /* 10000000 */
char *charArray;
charArray = malloc(8 * stringLength + 1);
if(charArray == NULL) {
printf("An error occured!\n");
return NULL; //error - cant use charArray
}
for(i = 0; i < stringLength; i++) {
mask = 0x80;
char c = string[i];
int x = 0;
while(mask > 0) {
char n = (c & mask) > 0;
printf("%d", n);
charArray[x++] = n;
mask >>= 1; /* move the bit down */
}
printf("\n");
}
return charArray;
}
This gets a series of bits in an array {1, 0, 1, 1, 0, 0, 1} for example. I want to store this in the integers that I have in another array. I've heard about integers having unused space or something.
For Reference: The integer values are red values from the rgb colour scheme.
EDIT:
To use this I would store this string in the integer values, later to be decoded the same way to retrieve the message (steganography).
So you want to do LSB substitution for the integers, the simplest form of steganography.
It isn't that integers have unused space, it's just that changing the LSB changes the value of an integer by 1, at most. So if you're looking at pixels, changing their value by 1 won't be noticeable by the human eye. In that respect, the LSB holds redundant information.
You've played with bitwise operations. You basically want to clear the last bit of an integer and substitute it with the value of one of your bits. Assuming your integers range between 0 and 255, you can do the following.
pixel = (pixel & 0xfe) | my_bit;
Edit: Based on the code snippet from the comments, you can achieve this like so.
int x;
for (x = 0; x < messageLength; x++) {
rgbPixels[x][0] = (rgbPixels[x][0] & 0xfe) | bitArray[x];
}
Decoding is much simpler, in that all you need to do is read the value of the LSB of each pixel. The question here is how will you know how many pixels to read? You have 3 options:
The decoder knows the message length in advance.
The message length is similarly hidden in some known location so that the decoder can extract it. For example, 16 bits representing in binary the message length, which is hidden in the first 16 pixels before bitArray.
You use an end-of-message marker, where you keep extracting bits until you hit a signature sequence that signals you to stop. For example, eight 0s in a row. You must make sure that how long the sequence and whatever it will be, it mustn't be encountered prematurely in your bit array.
So say somehow you have allocated the size for the message length. You can simply get extract your bit array (after allocation) like so.
int x;
for (x = 0; x < messageLength; x++) {
bitArray[x] = rgbPixels[x][0] & 0x01;
}
This converts a string to the equivalent int.
char string[] = "101010101";
int result = 0;
for (int i=0; i<strlen(string); i++)
result = (result<<1) | string[i]=='1';

Resources