How do I deal with negative numbers in C?

I'm trying to code a very basic Assembler language in C.
Each instruction has 32 bits, the first 8 being the Opcode and the following 24 containing an immediate value (if the instruction comes with one; otherwise it's just zeros).
The Opcodes are defined as simple numbers. For example PUSHC is defined as 1.
Now I want to test if the Opcode and the immediate value are properly separable.
I wrote the following definitions:
#define SIGN_EXTEND(i) ((i) & 0x00800000 ? (i) | 0xFF000000 : (i)) // Handles negative Immediate Values (i)
#define IMMEDIATE(x) SIGN_EXTEND(x & 0x00FFFFFF) // Returns the Immediate Value of an instruction (x)
#define OPCODE(x) (x >> 24) // Returns the Opcode of an instruction (x)
#define OP(o) ((o) << 24) // An instruction with an Opcode (o)
#define OPI(o, i) (OP(o) | IMMEDIATE(i)) // An instruction with an Opcode (o) and an Immediate Value (i)
In a function void runProg(uint32_t prog[]) {...}, I pass an array that contains the instructions for an assembler program, in order, like so:
uint32_t progTest[] = {OPI(PUSHC, 3), OPI(PUSHC, 4), OP(ADD), OP(HALT)};
runProg(progTest);
Here's what runProg does:
void runProg(uint32_t prog[]) {
    int pc = -1; // the program counter
    uint32_t instruction;
    do {
        pc++;
        instruction = prog[pc];
        printf("Command %d : 0x%08x -> Opcode [%d] Immediate [%d]\n",
               pc, instruction, OPCODE(instruction), IMMEDIATE(instruction));
    } while (prog[pc] != OP(HALT));
}
So it prints out the full command in hexadecimal, followed by just the Opcode, followed by just the immediate value. This works for all instructions I have defined. The test program above gives this output:
Command 0 : 0x01000003 -> Opcode [1] Immediate [3]
Command 1 : 0x01000004 -> Opcode [1] Immediate [4]
Command 2 : 0x02000000 -> Opcode [2] Immediate [0]
Command 3 : 0x00000000 -> Opcode [0] Immediate [0]
Now here's the problem:
The command PUSHC only works with positive values.
Changing the immediate value of the first PUSHC call to -3 produces this result:
Command 0 : 0xfffffffd -> Opcode [255] Immediate [-3]
Command 1 : 0x01000004 -> Opcode [1] Immediate [4]
Command 2 : 0x02000000 -> Opcode [2] Immediate [0]
Command 3 : 0x00000000 -> Opcode [0] Immediate [0]
So as you can see, the immediate value is displayed correctly, however the command is missing the Opcode.
Changing the definition of IMMEDIATE(x)
from
#define IMMEDIATE(x) SIGN_EXTEND(x & 0x00FFFFFF)
to
#define IMMEDIATE(x) (SIGN_EXTEND(x) & 0x00FFFFFF)
produces the exact opposite result, where the Opcode is correctly separated, but the immediate value is wrong:
Command 0 : 0x01fffffd -> Opcode [1] Immediate [16777213]
Command 1 : 0x01000004 -> Opcode [1] Immediate [4]
Command 2 : 0x02000000 -> Opcode [2] Immediate [0]
Command 3 : 0x00000000 -> Opcode [0] Immediate [0]
Therefore, I am relatively sure that my definition of SIGN_EXTEND(i) is flawed. However, I can't seem to put my finger on it.
Any ideas on how to fix this are greatly appreciated!

0x01fffffd is correct; the 8-bit operation code field contains 0x01, and the 24-bit immediate field contains 0xfffffd, which is the 24-bit two’s complement encoding of −3.

When encoding, there is no need for sign extension; the IMMEDIATE macro is passed a (presumably) 32-bit two’s complement value, and its job is merely to reduce it to 24 bits, not to extend it. So it merely needs to be #define IMMEDIATE(x) ((x) & 0xffffff).

When you want to interpret the immediate field, as for printing, then you need to convert from 24 bits to 32. For that, you need a different macro, the way you have different macros to encode an operation code (OPI) and to decode/extract/interpret the operation code (OPCODE).
You can interpret the immediate field with this function:
static int InterpretImmediate(unsigned x)
{
    // Extract the low 24 bits.
    x &= (1u << 24) - 1;

    /* Flip the sign bit. If the sign bit is 0, this adds 2**23.
       If the sign bit is 1, this subtracts 2**23. */
    x ^= 1u << 23;

    /* Convert to int and subtract 2**23. If the sign bit started as 0, this
       negates the 2**23 we added above. If it started as 1, this results in
       a total subtraction of 2**24, which produces the two’s complement of a
       24-bit encoding. */
    return (int) x - (1u << 23);
}

Related

WinDbg evaluate ebp+12

I'm trying to understand how the stack pointer and base pointer work, and since most teaching material isn't combined with practical examples, I tried to reproduce this: https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
Here is a very simple program of mine:
#include <stdio.h>

int function1(int a, int b); // prototype so main doesn't rely on an implicit declaration

int main()
{
    function1(1, 2);
}

int function1(int a, int b)
{
    int c = a + b;
    return c;
}
I use WinDbg to execute the program, set the breakpoint bm CallStackPractice!function1, type g to hit the breakpoint, and p to step into the function.
With ebp+8 we should get the first parameter. I did that in WinDbg:
0:000> ? poi(ebp+8)
Evaluate expression: 1 = 00000001
Good. Now we want our second parameter, which should be at ebp+12.
0:000> ? poi(ebp+12)
Evaluate expression: 270729434 = 102300da
We don't get 2 = 00000002. I opened the memory window in WinDbg and it shows me the correct value but why does my command not work?
Thank you!
That's a common mistake. 12 means 0x12 by default.
If you want a decimal number 12, use 0n12 or 0xC or change the default number format using n 10 (I don't know anyone who does that, actually).
0:000> ? 12
Evaluate expression: 18 = 00000000`00000012
0:000> n 10
base is 10
0:000> ? 12
Evaluate expression: 12 = 00000000`0000000c
Back at base 16:
1:005:x86> n 16
base is 16
1:005:x86> ? poi(ebp+8)
Evaluate expression: 1 = 00000001
1:005:x86> ? poi(ebp+c)
Evaluate expression: 2 = 00000002
If you get weird errors like
1:005:x86> ?poi(ebp +c)
Memory access error at ')'
that's because you're still at base 10.
You might also want to take a look at the stack with dps like so:
1:005:x86> dps ebp-c L7
008ff60c cccccccc <-- magic number (4x INT 3 breakpoint)
008ff610 00000003
008ff614 cccccccc
008ff618 008ff6f4
008ff61c 00fb189a DebugESPEBP!main+0x2a [C:\...\DebugESPEBP.cpp # 13]
008ff620 00000001 <-- a
008ff624 00000002 <-- b
As you can see, dps gives you the return address as a symbol with a line number, and the memory layout in debug mode contains magic numbers that are helpful for debugging.

How to read binary inputs from a file in C

What I need to do is to read binary inputs from a file. The inputs are for example (binary dump),
00000000 00001010 00000100 00000001 10000101 00000001 00101100 00001000 00111000 00000011 10010011 00000101
What I did is,
char* filename = vargs[1];
BYTE buffer; // BYTE is a typedef for unsigned char
FILE *file_ptr = fopen(filename, "rb");
fseek(file_ptr, 0, SEEK_END);
size_t file_length = ftell(file_ptr);
rewind(file_ptr);
for (int i = 0; i < file_length; i++)
{
    fread(&buffer, 1, 1, file_ptr); // read 1 byte
    printf("%d ", (int)buffer);
}
But the problem here is that, I need to divide those binary inputs in some ways so that I can use it as a command (e.g. 101 in the input is to add two numbers)
But when I run the program with the code I wrote, this provides me an output like:
0 0 10 4 1 133 1 44 8 56 3 147 6
which shows in ASCII numbers.
How can I read the inputs as binary numbers, not ASCII numbers?
The inputs should be used in this way:
0 # Padding for the whole file!
0000|0000 # Function 0 with 0 arguments
00000101|00|000|01|000 # MOVE the value 5 to register 0 (000 is MOV function)
00000011|00|001|01|000 # MOVE the value 3 to register 1
000|01|001|01|100 # ADD registers 0 and 1 (100 is ADD function)
000|01|0000011|10|000 # MOVE register 0 to 0x03
0000011|10|010 # POP the value at 0x03
011 # Return from the function
00000110 # 6 instructions in this function
I am trying to implement some sort of like assembly language commands
Can someone please help me out with this problem?
Thanks!
You need to understand the difference between data and its representation. You are correctly reading the data in binary. When you print the data, printf() gives the decimal representation of the binary data. Note that 00001010 in binary is the same as 10 in decimal and 00000100 in binary is 4 in decimal. If you convert each sequence of bits into its decimal value, you will see that the output is exactly correct. You seem to be confusing the representation of the data as it is output with how the data is read and stored in memory. These are two different and distinct things.
The next step to solve your problem is to learn about bitwise operators: |, &, ~, >>, and <<. Then use the appropriate combination of operators to extract the data you need from the stream of bits.
The format you use is not byte-aligned, so you need to read your bits into a circular buffer and parse them with a state machine.
Reading "in binary" or "in text" is really the same thing; the only thing that changes is your interpretation of the data. In your example you are reading a byte and printing the decimal value of that byte. If you want to print the bits of that byte, you just need to use C's bitwise operators.
For example:
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>

struct binary_circle_buffer {
    size_t i;
    unsigned char buffer;
};

bool read_bit(struct binary_circle_buffer *bcn, FILE *file, bool *bit) {
    if (bcn->i == CHAR_BIT) {
        size_t ret = fread(&bcn->buffer, sizeof bcn->buffer, 1, file);
        if (!ret) {
            return false;
        }
        bcn->i = 0;
    }
    *bit = bcn->buffer & ((unsigned char)1 << bcn->i++); // maybe the wrong bit order; test it yourself
    // *bit = bcn->buffer & (((unsigned char)UCHAR_MAX / 2 + 1) >> bcn->i++);
    return true;
}

int main(void)
{
    struct binary_circle_buffer bcn = { .i = CHAR_BIT };
    FILE *file = stdin; // replace with your file
    bool bit;
    size_t i = 0;
    while (read_bit(&bcn, file, &bit)) {
        // here you must code your state machine to parse instructions, gl & hf
        printf(bit ? "1" : "0");
        if (++i >= CHAR_BIT) { // group the output one byte at a time
            i = 0;
            printf(" ");
        }
    }
}
Helping you further would be difficult; you are basically asking for help writing a virtual machine...

Using a bitmask and if statement

I am trying to allow multiple cases to run in a switch statement. I have a bitmask as follows:
#define SHOOT_ROCKET 2 << 16
#define MOVE_FORWARD 3 << 16
Later, I do
switch (int game_action)
and I have
case SHOOT_ROCKET:
result = fire_weapon(rl);
I don't want to break, because I want the possibility of multiple actions. But I am returning a value called result; I store it in a variable and return it at the end. I can tell the other case: statements are running even when they shouldn't, because result keeps getting changed, and it doesn't if I add break;.
What is the best way to deal with this?
Update: I've been told to do if instead.
I changed my << bitmasks so they start at 1 now.
I am experiencing a weird bug
if (game_action->action & SHOOT_ROCKET)
{
game_action->data=5;
}
if (game_action->action & MOVE_FORWARD)
{
game_action->data=64;
}
I am not concerned about game_action being overwritten when I intend for multiple if's to evaluate to true
However: it seems MOVE_FORWARD is happening even if I only try and shoot a rocket!
game_action is a void pointer normally, so this is how it's setup in the function:
game_action = (game_action) *game_action_ptr;
I have verified the bitmask is correct with
printf("%d", game_action->action >> 16) which prints 2.
So why is '3' (the move forward) happening when I am only trying to shoot a rocket?
Please do update your question.
So why is '3' (the move forward) happening when I am only trying to shoot a rocket?
The first thing you want to look at is
#define SHOOT_ROCKET 2 << 16
#define MOVE_FORWARD 3 << 16
and think what that evaluates to. It will evaluate to the following binary numbers (Python is handy for this kind of stuff, 0b is just a prefix that means binary is following):
>>> bin(2 << 16)
'0b100000000000000000'
>>> bin(3 << 16)
'0b110000000000000000'
So you see that you use one bit twice in your #defines (Retired Ninja already pointed this out). This means that if game_action->action is set to anything where the bit 2 << 16 is 1, both of your ifs will evaluate to true, because both #defines have that bit set to 1.
to make them mutually exclusive, should i do 2, 4, 8, instead of 1,2,3,4?
If you want to easily keep track of which bits are used, you can either use powers of two (1,2,4,8,16, etc), e.g. #define MOVE_FORWARD 4 (I'm ignoring the << 16 you have, you can add that if you want), or you can shift a 1 by a variable number of bits, both result in the same binary numbers:
#define MOVE_LEFT 1 << 0
#define MOVE_RIGHT 1 << 1
#define MOVE_UP 1 << 2
#define MOVE_DOWN 1 << 3
There are legitimate cases where bitmasks need to have more than one bit set, for example for checking if any one of several bits are set:
#define MOVE_LEFT ... (as above)
#define SHOOT_ROCKET 1 << 5
#define SHOOT_GUN 1 << 6
//...
#define ANY_MOVEMENT 0xF
#define ANY_WEAPON_USE 0xF << 4
and then check:
if (action & ANY_MOVEMENT) { ... }
if (action & ANY_WEAPON_USE) { ... }
Place parentheses '(' ')' around the value part of the macro, e.g. (2 << 16), so that no errors are introduced by the text replacement.
I.e. this:
if( (game_action->action & MOVE_FORWARD) == MOVE_FORWARD)
becomes
if( (game_action->action & (2 << 16)) == (2 << 16))
Note the posted code is missing a left paren, which I added.
Beware that '==' has higher precedence than '&', so without the inner parentheses, an expression like
if( game_action->action & MOVE_FORWARD == MOVE_FORWARD )
(effectively) becomes
if( game_action->action & (2 << 16 == 2 << 16) )
where the '==' is evaluated before the '&'.

Why is it the convention of some programmers to use long bit shifts instead of directly assigning a value / storing it as a constant

Consider the following code from the Wikipedia article on square root algorithms:
short isqrt(short num) {
    short res = 0;
    short bit = 1 << 14; // The second-to-top bit is set: 1 << 30 for 32 bits

    // "bit" starts at the highest power of four <= the argument.
    while (bit > num)
        bit >>= 2;

    while (bit != 0) {
        if (num >= res + bit) {
            num -= res + bit;
            res = (res >> 1) + bit;
        }
        else
            res >>= 1;
        bit >>= 2;
    }
    return res;
}
Why would someone ever do this:
short bit = 1<< 14;
Instead of just assigning the value directly:
short bit = 0x4000;
Some RISC instruction sets will let you shift by a given amount, which is handy. MIPS lets you supply the shamt parameter for example. Other instruction sets aren't so handy. An MSP430 (16 bit - not the extended instructions) compiler would need to render this as a looped call to the RLA pseudo instruction I suppose.
So in some cases it seems like it does not 'hurt' to do it this way, but in other cases it seems like it could. So is there ever an advantage to having long shifts like that? Because it seems like it would make code less portable in a certain sense.
Do X86 or other CISC machines do something magical with this that I just don't know about? :)
It is just more explicit, and, as it involves only constants, will most certainly be evaluated at compile time, not at run time, so in the end, the resulting code will be exactly the same.
C has integer constant expressions, which lend themselves to evaluation during translation. That is why you can do things like this:
enum {
    foo_bit,
    bar_bit,
    baz_bit,
    foo_flag = 1 << foo_bit,
    bar_flag = 1 << bar_bit,
    baz_flag = 1 << baz_bit
};

static unsigned int some_flags = bar_flag | baz_flag;
Because it has static duration, the initializer for some_flags must be an integer constant expression; computable during translation. What we give it certainly is, even though there are multiple computations involved. This example only has one magic number: 1.
More options to match documentation. Whatever is driving the code, drives the style.
A supporting document may say the 14th bit (of a range 0 to 15) is set and the following code relates directly to that. Note 1 << 14 is signed.
int x = 1 << 14;
Docs may say use mask 0x4000, then the following is better. Note 0x4000 is unsigned.
int mask = 0x4000;
It's more readable, and as #jcaron said, it will all be the same thing in the end. Here is the relevant part when I compile to assembly, not even optimized, just -S:
_isqrt:                                 ## #isqrt
[...]
Ltmp4:
    .cfi_def_cfa_register %rbp
    movw %di, %ax
    movw %ax, -2(%rbp)
    movw $0, -4(%rbp)
    movw $16384, -6(%rbp)               ## imm = 0x4000
Writing 16384, 0x4000 or 1<<14 in this declaration is a matter of preference.
Maybe even if your compiler supports it, why not 0b10000000000000? :-)

need help in understanding Arm processor

I am a mechanical engineering student, currently studying the ARM processor. I just came across a question, but I don't understand how they arrived at these answers. Please also help me understand how to convert from negative decimal to hexadecimal. Thank you.
What is the result of the following calculations performed in ARM? How
are the status flags set? (Write the operands and the result in 32-bit
hexadecimal notation!)
(–1) + (+1)
(0) – (+1)
(2^31 – 1) + (+1)
(–4) + (+5)
The Answers are :
(-1)+(+1):
    -1: 0xFFFFFFFF
     1: 0x00000001
     ----------------
     0: 0x00000000
N=0, Z=1, C=1, V=0

(0)-(+1): subtraction replaced by addition of the negation => (0)+(-1)
     0: 0x00000000
    -1: 0xFFFFFFFF
     ----------------
    -1: 0xFFFFFFFF
N=1, Z=0, C=0, V=0

(2^31-1)+(+1):
2^31-1: 0x7FFFFFFF
     1: 0x00000001
     ----------------
 -2^31: 0x80000000
N=1, Z=0, C=0, V=1

(-4)+(+5):
    -4: 0xFFFFFFFC
     5: 0x00000005
     ----------------
     1: 0x00000001
N=0, Z=0, C=1, V=0
The way negative numbers are represented in binary (and hence in hexadecimal) is called two's complement.
Status flags are:
N : strictly negative result
Z : result is zero
C : carry : if you consider the numbers as unsigned, the operation produces a carry on the most significant bit, on ARM, this means that the result should be 2^32 + the actual result stored in the register.
V : signed oVerflow. This means that the result of your operation doesn't have the sign it should have. For instance, you add two positive integers and get a negative one.

Resources