Generating machine code from C

Sorry if these are naive questions - I have very little understanding of how C really works at the low level.
So I'm generating machine code to write to some mmap'd memory for execution. I'm confused about the use of hexadecimal literals for generating machine code.
Consider the assembly instruction (AT&T syntax): cmove %edx, %ecx. This has the machine code representation 0x0F44CA.
So, would doing something like:
char opcode[3] = { 0x0F, 0x44, 0xCA };
represent the correct byte sequence 'under the hood'? I suspect it might not, since apparently hexadecimal literals in C are stored as integers. My concern is that, since an int is typically 32 bits, the actual values getting stored are
0x0000000F 0x00000044 0x000000CA
Which is something completely different from what I need.
Another concern I have is, does the type I give to the array affect the value actually being stored? So would
uint8_t opcode[3] = { 0x0F, 0x44, 0xCA };
or
int opcode[3] = { 0x0F, 0x44, 0xCA };
be any different from
char opcode[3] = { 0x0F, 0x44, 0xCA };
under the hood?

uint8_t opcode[3] = { 0x0F, 0x44, 0xCA };
will store your values as 8-bit values (bytes), in the order you gave them.
It is the same as
unsigned char opcode[3] = { 0x0F, 0x44, 0xCA };
But with an int type you would get, as you said,
0000000F 00000044 000000CA
or
0F000000 44000000 CA000000
depending on the endianness of your system (the first layout is big-endian, the second little-endian).
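As a quick illustration of why the byte order matters, here is a minimal sketch of writing such a byte array into mmap'd executable memory and calling it. It assumes x86-64 Linux and that the OS allows a writable+executable mapping; the encoded instructions here are mov eax, 42 followed by ret, not the cmove from the question:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* x86-64 encoding of: mov eax, 42 ; ret */
        uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return 1;

        memcpy(mem, code, sizeof code);       /* the bytes land in memory in exactly this order */

        int (*fn)(void) = (int (*)(void))mem; /* treat the buffer as a function */
        printf("%d\n", fn());                 /* should print 42 */

        munmap(mem, 4096);
        return 0;
    }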

I didn't quite get your actual problem, but I think these two points may help you better understand machine code.
Use objdump and you will get the machine code and the assembly code together, so you can see what is happening.
objdump -d prog.o
Read this article http://csapp.cs.cmu.edu/public/ch3-preview.pdf
I hope this will help you somewhat.

Related

How to prepare data for use with MMX/SSE intrinsics for shifting 16bit values?

No matter what I do with {0,8,16,0} (a 16-bit vector, represented for copying into a big-endian 64-bit value), I am unable to properly bit-shift a test value of { 0x00, 0x01, (...) 0x07 };
The result I get in the debugger is always 0x0.
I tried to convert the value in a couple of different ways, but I am unable to get this right.
Executed on a little-endian machine:
#include <mmintrin.h>
#include <stdint.h>

int main(int argc, char** argv) {
    __m64 input;
    __m64 vectors;
    __m64 output;

    _Alignas(8) uint16_t bit16Vectors[1*4] = {
        0x0000, 0x0008, 0x0010, 0x0000
        // Intent: {0,8,16,0} 16-bit array
        // Convert for copy: {0,16,8,0} as one 64-bit item
        // 8-bit data, bytes need to rotate: {0,8,16,0}
    };
    _Alignas(8) uint8_t in[8] = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
    };

    input   = _m_from_int64(*((long long*)in));
    vectors = _m_from_int64(*((long long*)bit16Vectors));
    output  = _mm_sll_pi16(input, vectors);

    __asm__("int3");
}
I wrote down a simple MMX-only RGB24 plane-separation pseudo-assembly (which processes 8x1 values), but I am unable to convert all the 16/32-bit bit-shift vectors to the "real world", or I am doing something wrong with the intrinsics.
I am unable to pin it down exactly; I just know it fails at the very first bit shift and returns the value 0x0.

C contiguous data arrays

Say I have 2 arrays of data:
const uint8_t data1[] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
const uint8_t data2[] = {0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
I would like to be able to read these individually, but also as one contiguous block of data.
e.g. I could access data1[8] in the same way as data2[0].
Reason: I have various const data definitions in some individual .c files that I'd rather not touch (font bitmaps) but I'd like to append some extra data to them (extra special characters). So I'd like to
#include <original font file>
const uint8_t extrafonts[] = {<more font bitmaps>};
Can this be done?
The only way to guarantee contiguous allocation in C is to use arrays of arrays. In this case it would seem that a const uint8_t [2][8] would solve all your problems, so use that if possible.
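For instance, a minimal sketch of that approach (the name data is just illustrative):

    const uint8_t data[2][8] = {
        {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07},
        {0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f}
    };
    /* data[0] and data[1] behave like the two original arrays,
       and &data[0][0] points at 16 contiguous bytes. */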
Otherwise, more advanced solutions could use structs and unions. These guarantee an order of allocation but come with the disadvantage that the compiler can insert padding anywhere. In this specific case it wouldn't be a problem on any real-world computer, since chunks of 8 bytes are aligned. If you have a C compiler that supports anonymous struct members (standard since C11), you can do this:
#include <inttypes.h>
#include <stdio.h>

typedef union
{
    struct
    {
        uint8_t data1 [8];
        uint8_t data2 [8];
    };
    uint8_t data [16];
} data_t;

int main (void)
{
    const data_t data =
    {
        .data1 = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07},
        .data2 = {0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f},
    };

    for(size_t i=0; i<16; i++)
    {
        printf("%.2"PRIx8" ", data.data[i]);
    }
}
Now you can access the arrays individually through data.data1/data.data2 or as one, with data.data.
In cases where you worry about struct padding, you'll have to add some non-standard #pragma pack(1) or similar compiler-specific instruction.
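For example, with GCC, Clang or MSVC the union above could be wrapped like this (purely illustrative; with plain uint8_t members there is nothing to pad anyway):

    #pragma pack(push, 1)
    typedef union
    {
        struct
        {
            uint8_t data1 [8];
            uint8_t data2 [8];
        };
        uint8_t data [16];
    } packed_data_t;
    #pragma pack(pop)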
There is another alternative that might be applicable to your use case: use external preprocessing.
Use your favorite scripting language with some regular-expression magic to read the original font source file and append the extra font data at the end of the array. Then save it to a new file and use that in the compilation instead.
This might seem like a lot of work at first, but .c files that are generated by tools (which I assume is the case with your font bitmaps) tend to have a predictable format that is not too hard to parse.
You could probably use:
struct ArrayPair
{
    uint8_t data1[8];
    uint8_t data2[8];
};

const struct ArrayPair data =
{
    { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 },
    { 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f }
};
Now you can probably get away with using:
data.data1[8]
It isn't very elegant. Your requirement is not sensible.
You said that these are 2 global arrays in different source files, so I am ignoring any options where you can group them with unions or structures.
Now the bad news: There is absolutely no guarantee that 2 global variables will end up next to each other in memory with standard C rules.
You can do this, but you need to use compiler-specific extensions instead. You need to:
Use manual memory placement to place the variables next to each other.
Disable any optimizations or other compiler features that might break your code, because you are accessing an array out of bounds.
Note that I am assuming you are on a system where resources are limited, for example some embedded system. If that is not the case, then simply create a third array and merge the two arrays into it at program startup.
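A minimal sketch of that fallback, assuming the original arrays are visible as externs (the sizes and the name merged are illustrative):

    #include <stdint.h>
    #include <string.h>

    extern const uint8_t data1[8];   /* from the original font file */
    extern const uint8_t data2[8];   /* the extra characters */

    static uint8_t merged[16];       /* contiguous copy, built once */

    void init_merged(void)
    {
        memcpy(merged,                data1, sizeof data1);
        memcpy(merged + sizeof data1, data2, sizeof data2);
    }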

How to append another HEX value at the end of an existing byte array in C

I have searched on Google and checked my findings on https://www.onlinegdb.com/
But so far I am not satisfied with my trials and errors. Perhaps I didn't know how to ask.
I am sure this is already well known by many people.
Normally I am reading hex values from UART communication and placing them in a buffer array.
But, to keep things simple, here is a code snippet:
uint8_t buffer[20] = {0x7E, 0x00, 0x07, 0xAA, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0xCC};
uint8_t newValue = 0x55;
My goal is to append newValue to buffer, so that the new value appears after the last array value, which is 0xCC in this case.
So, my question is: how do I do that efficiently?
Note: one of my trials (works OK, but not as I wanted):
buffer[11] = newValue;
for (int i = 0; i < sizeof(buffer); i++)
    printf("%02x", buffer[i]);
But then I need to know the position of the last value and increase it by one, which is 11 (0-based counting) in this case.
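One common pattern, sketched here under the assumption that you can track how many bytes are currently in use (the counter name used is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t buffer[20] = {0x7E, 0x00, 0x07, 0xAA, 0x01, 0x02,
                              0x03, 0x04, 0x05, 0x06, 0xCC};
        size_t used = 11;               /* bytes currently filled, kept up to date as data arrives */
        uint8_t newValue = 0x55;

        if (used < sizeof buffer)       /* append only if there is room */
            buffer[used++] = newValue;

        for (size_t i = 0; i < used; i++)
            printf("%02x", buffer[i]);
        printf("\n");
        return 0;
    }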

AES Plaintext to Ciphertext Encryption

I am currently working through Practical Cryptography and am trying to do the exercises at the end of chapter 4. There are a couple of questions that ask you to decrypt a bit of ciphertext, given in hex, with a key also given in hex. I am using the OpenSSL crypto libraries to try to accomplish this.
I am having trouble knowing whether I have done this correctly, as I have done it a number of ways and I get different answers. Also, because the values provided are in hex, I am having a hard time getting them into a usable value.
I am just using printf to put the hex values into a string. This seems correct to me, as I can read them out again and they are correct. The key is harder: I am trying to store the numbers directly into the key structure provided by OpenSSL, but I don't know how the OpenSSL implementation uses the key structure.
My code is below. If I run it in this format I get the following output.
In : 0x53:0x9b:0x33:0x3b:0x39:0x70:0x6d:0x14:0x90:0x28:0xcf:0xe1:0xd9:0xd4:0xa4:0x7
Out: 0xea:0xb4:0x1b:0xfe:0x47:0x4c:0xb3:0x2e:0xa8:0xe7:0x31:0xf6:0xdb:0x98:0x4e:0xe2
My questions are:
Does my method of storing the key look correct?
Does my overall method look correct?
Does anyone know what the actual answer should be?
Code below
#include <stdio.h>
#include <openssl/aes.h>

int main( void )
{
    unsigned char InStr [100];
    unsigned char OutStr[100];
    char KeyStr[100];
    AES_KEY Key;

    Key.rd_key[0] = 0x8000000000000000;
    Key.rd_key[1] = 0x0000000000000000;
    Key.rd_key[2] = 0x0000000000000000;
    Key.rd_key[3] = 0x0000000000000001;
    Key.rounds = 14;

    sprintf( InStr, "%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c\0",
             0x53, 0x9B, 0x33, 0x3B, 0x39, 0x70, 0x6D, 0x14,
             0x90, 0x28, 0xCF, 0xE1, 0xD9, 0xD4, 0xA4, 0x07 );

    AES_decrypt( InStr, OutStr, &Key );

    printf( "In : %#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx\n",
            InStr[0], InStr[1], InStr[2], InStr[3], InStr[4], InStr[5], InStr[6], InStr[7],
            InStr[8], InStr[9], InStr[10], InStr[11], InStr[12], InStr[13], InStr[14], InStr[15] );
    printf( "Out: %#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx:%#hx\n",
            OutStr[0], OutStr[1], OutStr[2], OutStr[3], OutStr[4], OutStr[5], OutStr[6], OutStr[7],
            OutStr[8], OutStr[9], OutStr[10], OutStr[11], OutStr[12], OutStr[13], OutStr[14], OutStr[15] );
    return 0;
}
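For comparison, OpenSSL normally builds the round keys itself via AES_set_decrypt_key rather than having rd_key filled in by hand. A minimal sketch, assuming the key being aimed for is the 256-bit value 0x80 followed by zero bytes and ending in 0x01 (that is what the rd_key assignments above seem to intend), and reusing the ciphertext bytes from the question:

    #include <openssl/aes.h>
    #include <stdio.h>

    int main(void)
    {
        /* assumed 256-bit key: 0x80, then zeros, ending in 0x01 */
        unsigned char key_bytes[32] = {
            0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01
        };
        unsigned char in[16] = {
            0x53, 0x9B, 0x33, 0x3B, 0x39, 0x70, 0x6D, 0x14,
            0x90, 0x28, 0xCF, 0xE1, 0xD9, 0xD4, 0xA4, 0x07
        };
        unsigned char out[16];

        AES_KEY key;
        AES_set_decrypt_key(key_bytes, 256, &key);  /* expands the raw key into round keys */
        AES_decrypt(in, out, &key);

        for (int i = 0; i < 16; i++)
            printf("%02x", out[i]);
        printf("\n");
        return 0;
    }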

CUDA kernels throw different results on 2 different GPUs(GeForce 8600M GT vs Quadro FX 770M)

I've been working on an AES CUDA application, and I have a kernel which performs ECB encryption on the GPU. To ensure that the logic of the algorithm is not modified when running in parallel, I send a known input test vector provided by NIST and then, from host code, compare the output with the known test vector output provided by NIST, using an assert.
I have run this test on my NVIDIA GPU, which is an 8600M GT, running under Windows 7 with driver version 3.0. Under this scenario everything works perfectly and the assert succeeds.
Now, when the application is run on a Quadro FX 770M, the same application is launched and the same test vectors are sent, but the result obtained is incorrect and the assert fails. This runs on Linux with the same driver version.
The kernels are executed by 256 threads. Within the kernels, precomputed lookup tables of 256 elements are used to avoid arithmetic. These tables are originally loaded in global memory; each of the 256 threads launching the kernel collaborates by loading one element of the lookup table and moving it into a new lookup table in shared memory, so access latency is decreased.
Originally, I thought about synchronization problems due to clock-speed differences between the GPUs. Maybe threads were using values not yet loaded into shared memory, or values which somehow had not yet been processed, messing up the output and finally making it incorrect.
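The cooperative load described above usually looks something like this sketch (the kernel signature and names here are illustrative, not the actual code): each thread copies one entry, and a barrier separates the load from the first use.

    __global__ void AES_encrypt_sketch(const unsigned int *gTable)
    {
        __shared__ unsigned int sTable[256];

        // Flatten the (THREAD_X=4, THREAD_Y=64) block into indices 0..255
        int tid = threadIdx.y * blockDim.x + threadIdx.x;

        // Each of the 256 threads copies one lookup-table entry to shared memory
        sTable[tid] = gTable[tid];

        // Without this barrier, some threads could read entries that are not loaded yet
        __syncthreads();

        // ... AES rounds using sTable[...] would follow here ...
    }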
Here the known test vectors are declared; they are sent to AES_set_encrption, which is in charge of setting up the kernel:
void test_vectors ()
{
    // NIST SP 800-38A AES-256 ECB test vectors (32-byte key)
    unsigned char testPlainText[]  = {0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
                                      0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a};
    unsigned char testKeyText[]    = {0x60, 0x3d, 0xeb, 0x10, 0x15, 0xca, 0x71, 0xbe,
                                      0x2b, 0x73, 0xae, 0xf0, 0x85, 0x7d, 0x77, 0x81,
                                      0x1f, 0x35, 0x2c, 0x07, 0x3b, 0x61, 0x08, 0xd7,
                                      0x2d, 0x98, 0x10, 0xa3, 0x09, 0x14, 0xdf, 0xf4};
    unsigned char testCipherText[] = {0xf3, 0xee, 0xd1, 0xbd, 0xb5, 0xd2, 0xa0, 0x3c,
                                      0x06, 0x4b, 0x5a, 0x7e, 0x3d, 0xb1, 0x81, 0xf8};
    unsigned char out[16] = {0x0};

    //AES Encryption
    AES_set_encrption( testPlainText, out, 16, (u32*)testKeyText);

    //Display encrypted data
    printf("\n GPU Encryption: ");
    for (int i = 0; i < AES_BLOCK_SIZE; i++)
        printf("%x", out[i]);

    //Assert that the encrypted output is the same as the NIST testCipherText vector
    assert (memcmp (out, testCipherText, 16) == 0);
}
Here the setup function is in charge of allocating memory, calling the kernel and sending the results back to the host. Notice I synchronize before copying back to the host, so at that point everything should be finished, which makes me think the problem is within the kernel.
__host__ double AES_set_encrption (... *input_data, ...*output_data, .. input_length, ... ckey )
{
    //Allocate memory in the device and copy the input buffer from the host to the GPU
    CUDA_SAFE_CALL( cudaMalloc( (void **) &d_input_data, input_length ) );
    CUDA_SAFE_CALL( cudaMemcpy( (void*)d_input_data, (void*)input_data, input_length, cudaMemcpyHostToDevice ) );

    dim3 dimGrid(1);
    dim3 dimBlock(THREAD_X, THREAD_Y); // THREAD_X = 4 & THREAD_Y = 64
    AES_encrypt<<<dimGrid,dimBlock>>>(d_input_data);
    cudaThreadSynchronize();

    //Copy the data processed by the GPU back to the host
    cudaMemcpy(output_data, d_input_data, input_length, cudaMemcpyDeviceToHost);

    //Free CUDA resources
    CUDA_SAFE_CALL( cudaFree(d_input_data) );
}
And finally, within the kernel I have a set of AES rounds computed. Since I was thinking that the synchronization problem was within the kernel, I put __syncthreads(); after each round or computational operation to make sure all threads were moving together, so no uncomputed values might be evaluated. But still, that did not solve the problem.
Here is the output when I use the 8600M GT GPU which works fine:
AES 256 Bit key
NIST Test Vectors:
PlaintText: 6bc1bee22e409f96e93d7e117393172a
Key: 603deb1015ca71be2b73aef0857d7781
CipherText: f3eed1bdb5d2a03c64b5a7e3db181f8
GPU Encryption: f3eed1bdb5d2a03c64b5a7e3db181f8
Test status: Passed
And here is the output when I use the Quadro FX 770M, where it fails:
AES 256 Bit key
NIST Test Vectors:
PlaintText: 6bc1bee22e409f96e93d7e117393172a
Key: 603deb1015ca71be2b73aef0857d7781
CipherText: f3eed1bdb5d2a03c64b5a7e3db181f8
GPU Encryption: c837204eb4c1063ed79c77946893b0
Generic assert memcmp (out, testCipherText, 16) == 0 has thrown an error
Test status: Failed
What might be the reason why the two GPUs compute different results, even when they run the same kernels?
I would appreciate any hint or troubleshooting step any of you could give me in order to fix this issue.
Thanks in advance!
disclaimer: I don't know anything about AES encryption.
Do you use double precision? You are probably aware, but just to be sure: I believe that both of the cards you are using are compute capability 1.1, which does not support double precision. Perhaps the cards or the platforms convert to single precision in different ways? Does anyone know? Truthfully, the IEEE floating-point deviations are well specified, so I'd be surprised.
