Analyzing and understanding what this code does - C

Long time lurker, first time poster. I'm a student and haven't touched a programming course in two years. Now that I'm taking courses again, I'm having difficulty reading and understanding other people's code. I have a sample-code here from HMC containing a portion of a simple memory allocator in C.
void free_block(ptr p) {
*p = *p & ~1; /* clear allocated flag */
next = (unsigned*)((char*)p + *p); /* find next blk */
if ((*next & 1) == 0) /* if not allocated... */
*p += *next; /* add to this block */
}
Despite the comments, I'm still having confusion as to what's going on here exactly. I know what the code does but I would never be able to code this myself. If someone could possibly explain the mathy portions of this code, I would be extremely grateful.

Due to byte alignment, the last binary bit is not needed for allocation size,
so instead it is being used as a flag whether that block is allocated.
The size of the allocation is represented by the value at the beginning of the block.
~1 is the bitwise inverse of 1, meaning that instead of 0x01 it is 0xFE (or 0xFFFFFFFE for a 32-bit value).
*p = *p & ~1; /* clear allocated flag */
0xFFFFFFFE
(1111 1111 1111 1111 1111 1111 1111 1110)
A bitwise AND operation with the current value clears the last bit.
The result is the size of the original allocation.
Technically, it dereferences the value at the address at p, and performs a bitwise AND operation with 0xFFFFFFFE effectively keeping all of the value bits but the least significant
(ensuring the value no longer ends with a 1 if it originally did).
next = (unsigned*)((char*)p + *p); /* find next blk */
'next' is a pointer that points to the subsequent location from
p + [result value from the above statement]
if ((*next & 1) == 0) /* if not allocated... */
*p += *next; /* add to this block */
if the binary value at 'next' does not end with a one (bitwise AND operation),
take the value at 'p' and add the value at 'next', and assign the result to the location at 'p'.
So, in other words, if the next block is unallocated, then the original block includes it by adding that block's size to its own (effectively removing its existence).
Good luck. I hope this helps.
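The walkthrough above can be tried out on a tiny hand-built heap. This is only a sketch: the names block_ptr, ALLOC_FLAG, block_size, block_is_allocated and demo_coalesce are made up for illustration, and the heap layout (one header word per block holding the size in bytes, with bit 0 as the allocated flag) is an assumption based on the comments in the question.

```c
#include <stddef.h>

/* Hypothetical names; the original code only shows free_block itself. */
typedef unsigned *block_ptr;
#define ALLOC_FLAG 1u

/* Size in bytes with the flag bit masked off. */
static unsigned block_size(block_ptr p) { return *p & ~ALLOC_FLAG; }

/* Non-zero when bit 0 of the header is set. */
static int block_is_allocated(block_ptr p) { return (*p & ALLOC_FLAG) != 0; }

/* Same steps as free_block in the question, with 'next' declared. */
static void free_block(block_ptr p)
{
    *p &= ~ALLOC_FLAG;                                 /* clear allocated flag */
    block_ptr next = (block_ptr)((char *)p + block_size(p)); /* next header    */
    if (!block_is_allocated(next))
        *p += *next;                                   /* absorb the free block */
}

/* Builds three blocks A, B, C, frees A, and returns A's new header value. */
static unsigned demo_coalesce(void)
{
    unsigned heap[8] = {0};
    const unsigned W = (unsigned)sizeof(unsigned);

    heap[0] = (4 * W) | ALLOC_FLAG;  /* block A: 4 words, allocated */
    heap[4] = (2 * W);               /* block B: 2 words, free      */
    heap[6] = (2 * W) | ALLOC_FLAG;  /* block C: 2 words, allocated */

    free_block(&heap[0]);            /* A becomes free and absorbs B */
    return heap[0];
}
```

After the call, the header of A holds the combined size of A and B (six words) with the flag bit clear, and the stale header of B is simply never visited again.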

The operations are presumably clear, so I'll try to handle the actual interpretation in context.
*p = *p & ~1; /* clear allocated flag */
The ~ operator complements the bits. So, ~1 is all 1 bits except the last, therefore zeroing out the last bit.
We don't know why...yet.
next = (unsigned*)((char*)p + *p); /* find next blk */
Here, we have p cast to a pointer to a char ((char*)p), added to whatever the contents of p happen to be. That sum is treated as a pointer to an unsigned int. More or less, you're treating p as an offset from itself.
I'm pretty sure this is a bad idea without a lot more support code to make sure it's safe, but you're reading the code, not writing it...
if ((*next & 1) == 0) /* if not allocated... */
We now know why the least-significant bit was dropped in the first line. In this scheme, the code uses that bit to mark that the block has been allocated.
*p += *next; /* add to this block */
Here, we add the size of the next block to the size stored at p, so the freed block now extends over the next block as well.
Without knowing what next points at, we can guess that it's moving through some sort of linked list-like structure. It's precarious, though, without knowing how p gets populated and how those pointers get structured.
But the upshot is that the code absorbs the next block into this one, so anything that walks the blocks by size skips past it.

Related

What size data / position is this C code assigning to a value

I am trying to port some C code to another language. Most of the code is working, except working out what this section does. The C code I've been handed is not in a compilable state so I can't do a run time analysis, but will fix that as a last resort.
Buffer is pointer to file's raw contents.
Based on reading the code my expectation was this:
if pos = 4132, then this is trying to read a 16-bit unsigned value from file position (pos << 8) + (pos + 1).
However when I make this calculation for pos = 4132, I get a file position of 1061925, which is way beyond the end of the file.
unsigned short id;
struct file
{
char *buffer;
};
id = (*(file_instance->buffer + pos)<<8) + *(file_instance->buffer+pos+1);
To make the code easier to read, you could transform it using the identities *(A+B) == A[B], and X << 8 == X * 256:
id = (file_instance->buffer[pos] * 256) + file_instance->buffer[pos+1];
If buffer were an unsigned char *, then this code would be a common idiom for reading a 16-bit integer out of memory, where the first octet is the most-significant one. For example if the memory were { 0x01, 0x02 } then you can verify that this equation produces the integer value 0x0102.
However you said buffer is a char *. In C, char may either be a signed or an unsigned type. If it is unsigned on the system you are working on, then everything is fine. Also, everything is fine if the data you are reading never has the MSB set of any byte.
But if char is signed then the code causes undefined behaviour due to left-shifting a negative number, when the value at file_instance->buffer[0] is a negative value (i.e. has the MSB set). Also there may be further unexpected behaviour when adding two negative numbers.
If you are unable to run the code then it may be difficult to work out what the existing behaviour was, since it would be at the mercy of various hardware and compiler optimization details.
If you can run code on the target system then you could try and see what happens with a code snippet like:
#include <stdio.h>
int main(void)
{
    char buf[2] = { (char)0xAA, (char)0xBB }; // put sample data here
    int r = (buf[0] << 8) + buf[1]; // parentheses needed: + binds tighter than <<
    printf("%x\n", (unsigned)r);
    return 0;
}
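If the intent is the usual "most-significant byte first" 16-bit read, one way to sidestep the signed char problem is to convert each byte to unsigned char before shifting. The helper name read_be16 below is hypothetical and this is a sketch of the idiom, not a reconstruction of what the original program actually did.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit big-endian value from buffer[pos] and buffer[pos + 1].
   read_be16 is an illustrative name, not part of the original code. */
static uint16_t read_be16(const char *buffer, size_t pos)
{
    /* Going through unsigned char keeps bytes with the top bit set
       from being treated as negative numbers. */
    unsigned hi = (unsigned char)buffer[pos];
    unsigned lo = (unsigned char)buffer[pos + 1];
    return (uint16_t)((hi << 8) | lo);
}
```

Note that the shift applies to the byte value, not to the position: for pos = 4132 this reads the two bytes at file offsets 4132 and 4133.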

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is pre defined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for(int i = 3; i >=0; i--) {
*(mem+mipsaddr+i) = value;
value = value >> 8;
// in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr+1];
uint8_t test2 = mem[mipsaddr+2];
uint8_t test3 = mem[mipsaddr+3];
uint32_t test4 = (mem[mipsaddr]<<24) + (mem[mipsaddr+1]<<16) +
(mem[mipsaddr+2]<<8) + mem[mipsaddr+3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo= 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, the order in which the bytes are stored depends on the architecture (that is, the CPU). Either the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations, and then merge them to form the original value again, you need to consider the endianness of your particular system.
In your for-loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32 bit (four byte) access. Depending on your architecture, this will either become 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
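For a simulator of a big-endian memory, a pair of helpers that always go through individual bytes gives the same result on any host. The names store_be32, load_be32 and load_native32 are made up for this sketch; only the byte order (most significant first) is taken from the question.

```c
#include <stdint.h>
#include <string.h>

/* Stores value at mem[addr..addr+3], most significant byte first. */
static void store_be32(uint8_t *mem, uint32_t addr, uint32_t value)
{
    mem[addr]     = (uint8_t)(value >> 24);
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;
}

/* Reassembles the word in the same order, independent of the host CPU.
   Each byte is widened to uint32_t before shifting. */
static uint32_t load_be32(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8)  |
            (uint32_t)mem[addr + 3];
}

/* Reads four bytes in whatever order the host uses (what the cast did). */
static uint32_t load_native32(const uint8_t *mem, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v);
    return v;
}
```

With 0x1084 stored, load_be32 returns 0x00001084 everywhere, while load_native32 returns 0x84100000 on a little-endian host and 0x00001084 on a big-endian one.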
It seems to be a problem of endianness, which shows up when you cast (uint8_t *) to (uint32_t *).

Understand the following line

I read this code in a library which is used to display a bitmap (.bmp) to an LCD.
I'm having a really hard time understanding what is happening in the following lines, and how it happens.
Maybe someone can explain this to me.
uint16_t s, w, h;
uint8_t* buffer; // does get malloc'd
s = *((uint16_t*)&buffer[0]);
w = *((uint16_t*)&buffer[18]);
h = *((uint16_t*)&buffer[22]);
I guess it's not that hard for a real C programmer, but I am still learning, so I thought I'd just ask :)
As far as I understand this, it somehow sticks two uint8_t variables together into a uint16_t.
Thanks in advance for your help here!
In the code you've provided, buffer (which is an array of bytes) is read, and values are extracted into s, w and h.
The (uint16_t*)&buffer[n] syntax means that you're taking the address of the nth byte of buffer, and casting it to a uint16_t*. The cast tells the compiler to look at this address as if it points at a uint16_t, i.e. a pair of uint8_ts.
The additional * in the code dereferences the pointer, i.e. extracts the value from this address. Since the address now points at a uint16_t, a uint16_t value is extracted.
As a result:
s gets the value of the first uint16_t, i.e. bytes 0 and 1.
w gets the value of the tenth uint16_t, i.e. bytes 18 and 19.
h gets the value of the twelfth uint16_t, i.e. bytes 22 and 23.
The code:
takes two bytes at positions 0 and 1 in the buffer, sticks them together into an unsigned 16-bit value, and stores the result in s;
it does the same with bytes 18/19, storing the result in w;
ditto for bytes 22/23 and h.
It is worth noting that the code uses the native endianness of the target platform to decide which of the two bytes represents the top 8 bits of the result, and which represents the bottom 8 bits.
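If the byte order should not depend on the platform, the two bytes can be combined explicitly. BMP headers store multi-byte fields least significant byte first, so a sketch for that case could look like the following; read_le16 is a hypothetical helper, not something the LCD library is known to provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Combines buffer[offset] (low byte) and buffer[offset + 1] (high byte)
   into a 16-bit value, regardless of the CPU's own byte order. */
static uint16_t read_le16(const uint8_t *buffer, size_t offset)
{
    return (uint16_t)(buffer[offset] | ((unsigned)buffer[offset + 1] << 8));
}
```

Something like w = read_le16(buffer, 18) would then replace the cast. Reading byte by byte also avoids accessing a uint16_t at an address that may not be suitably aligned for it.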
uint8_t* buffer; // pointer to 8 bit or simply one byte
Buffer points to memory address of bytes -> |byte0|byte1|byte2|....
(uint16_t*)&buffer[0] // &buffer[0] is actually the same as buffer
(uint16_t*)&buffer[0] equals (uint16_t*)buffer; it points to 16 bit or halfword
(uint16_t*)buffer points to memory: |byte0byte1 = halfword0|byte2byte3 = halfword1|....
w = *((uint16_t*)&buffer[18]);
Takes the memory address of byte 18 in buffer, reinterprets this address as the address of a halfword, then gets the halfword at this address;
it's simply w = byte18 and byte19 stuck together forming a halfword
h = *((uint16_t*)&buffer[22]);
h = byte22 and byte23 stuck together
UPD More detailed explanation:
h = *((uint16_t*)&buffer[22]) =>
1) buffer[22] === element 22 of buffer, a uint8_t (a.k.a. byte); let's call it byte22
2) &buffer[22] === &byte22 === address of byte22 in memory; it's of type uint8_t*, the same as buffer; let's call it byte22_address;
3) (uint16_t*)&buffer[22] = (uint16_t*)byte22_address; casts the address of a byte to the address of a halfword (two bytes stuck together) at the same address; let's call it halfword11_address;
4) h = *((uint16_t*)&buffer[22]) === *halfword11_address; the * operator takes the value at that address, that is, halfword 11, or bytes 22 and 23 stuck together;

Large bit arrays in C

Our OS professor mentioned that for assigning a process id to a new process, the kernel incrementally searches for the first zero bit in an array of size equivalent to the maximum number of processes (~32,768 by default), where an allocated process id has 1 stored in it.
As far as I know, there is no bit data type in C. Obviously, there's something I'm missing here.
Is there any such special construct from which we can build up a bit array? How is this done exactly?
More importantly, what are the operations that can be performed on such an array?
Bit arrays are simply byte arrays where you use bitwise operators to read the individual bits.
Suppose you have a 1-byte char variable. This contains 8 bits. You can test if the lowest bit is true by performing a bitwise AND operation with the value 1, e.g.
char a = /*something*/;
if (a & 1) {
/* lowest bit is true */
}
Notice that this is a single ampersand. It is completely different from the logical AND operator &&. This works because a & 1 will "mask out" all bits except the first, and so a & 1 will be nonzero if and only if the lowest bit of a is 1. Similarly, you can check if the second lowest bit is true by ANDing it with 2, and the third by ANDing with 4, etc, for continuing powers of two.
So a 32,768-element bit array would be represented as a 4096-element byte array, where the first byte holds bits 0-7, the second byte holds bits 8-15, etc. To perform the check, the code would select the byte from the array containing the bit that it wanted to check, and then use a bitwise operation to read the bit value from the byte.
As far as what the operations are, like any other data type, you can read values and write values. I explained how to read values above, and I'll explain how to write values below, but if you're really interested in understanding bitwise operations, read the link I provided in the first sentence.
How you write a bit depends on if you want to write a 0 or a 1. To write a 1-bit into a byte a, you perform the opposite of an AND operation: an OR operation, e.g.
char a = /*something*/;
a = a | 1; /* or a |= 1 */
After this, the lowest bit of a will be set to 1 whether it was set before or not. Again, you could write this into the second position by replacing 1 with 2, or into the third with 4, and so on for powers of two.
Finally, to write a zero bit, you AND with the inverse of the position you want to write to, e.g.
char a = /*something*/;
a = a & ~1; /* or a &= ~1 */
Now, the lowest bit of a is set to 0, regardless of its previous value. This works because ~1 will have all bits other than the lowest set to 1, and the lowest set to zero. This "masks out" the lowest bit to zero, and leaves the remaining bits of a alone.
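The same three operations generalise from the lowest bit to any bit position n by using the mask 1 << n. A small sketch, with made-up helper names and unsigned char used so that shifts and masks behave predictably:

```c
/* n is the bit position within the byte, 0 (lowest) to 7 (highest). */

/* Returns 1 if bit n of a is set, 0 otherwise. */
static int bit_is_set(unsigned char a, unsigned n)
{
    return (a >> n) & 1u;
}

/* Returns a with bit n forced to 1 (OR with the mask). */
static unsigned char bit_set(unsigned char a, unsigned n)
{
    return (unsigned char)(a | (1u << n));
}

/* Returns a with bit n forced to 0 (AND with the inverted mask). */
static unsigned char bit_clear(unsigned char a, unsigned n)
{
    return (unsigned char)(a & ~(1u << n));
}
```

For instance, bit_set(0, 2) gives 4, and bit_clear(0xFF, 0) gives 0xFE.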
A struct can assign members bit-sizes, but that's the extent of a "bit-type" in 'C'.
struct int_sized_struct {
int foo:4;
int bar:4;
int baz:24;
};
The rest of it is done with bitwise operations. For example, searching that PID bitmap can be done with:
extern uint32_t *process_bitmap;
uint32_t *p = process_bitmap;
uint32_t bit_offset = 0;
uint32_t bit_test;
uint32_t pid;
/* Scan pid bitmap 32 entries per cycle. (No bounds check: assumes a free slot exists.) */
while ((*p & 0xffffffff) == 0xffffffff) {
    p++;
}
/* Scan the 32-bit int block that has an open slot for the open PID */
bit_test = 0x80000000;
while ((*p & bit_test) == bit_test) {
    bit_test >>= 1;
    bit_offset++;
}
/* p - process_bitmap counts 32-bit words, so each word skipped covers 32 PIDs */
pid = (uint32_t)(p - process_bitmap)*32 + bit_offset;
This is roughly 32x faster than doing a simple for loop scanning an array with one byte per PID. (Actually, greater than 32x since more of the bitmap will stay in CPU cache.)
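A bounded variant of the same scan might look like the sketch below. It is not how any particular kernel does it: find_free_pid is a hypothetical name, and unlike the snippet above it numbers bits from the least significant bit of each word, which is just a different convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first zero bit in bitmap, or -1 if all
   words * 32 bits are set. Bit 0 of word 0 is index 0. */
static long find_free_pid(const uint32_t *bitmap, size_t words)
{
    for (size_t w = 0; w < words; w++) {
        if (bitmap[w] == UINT32_MAX)
            continue;                      /* all 32 slots taken, skip word */
        for (unsigned b = 0; b < 32; b++) {
            if ((bitmap[w] & (UINT32_C(1) << b)) == 0)
                return (long)(w * 32 + b); /* first free slot */
        }
    }
    return -1;                             /* no free slot at all */
}
```

For the 32,768-entry table from the question, the bitmap would be 1024 words long. The word-at-a-time skip is what gives the speedup described above.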
see http://graphics.stanford.edu/~seander/bithacks.html
There is no bit type in C, but bit manipulation is fairly straightforward. Some processors have bit-specific instructions which the code below would nicely optimize for; even without them it should be pretty fast. It may or may not be faster using an array of 32-bit words instead of bytes. Inlining instead of using functions would also help performance.
If you have the memory to burn, just use a whole byte to store one bit (or a whole 32-bit number, etc.), which greatly improves performance at the cost of memory used.
unsigned char data[SIZE];
unsigned char get_bit ( unsigned int offset )
{
//TODO: limit check offset
if(data[offset>>3]&(1<<(offset&7))) return(1);
else return(0);
}
void set_bit ( unsigned int offset, unsigned char bit )
{
//TODO: limit check offset
if(bit) data[offset>>3]|=1<<(offset&7);
else data[offset>>3]&=~(1<<(offset&7));
}

mprotect - how aligning to multiple of pagesize works?

I am not understanding the 'aligning allocated memory' part from the mprotect usage.
I am referring to the code example given on http://linux.die.net/man/2/mprotect
char *p;
char c;
/* Allocate a buffer; it will have the default
protection of PROT_READ|PROT_WRITE. */
p = malloc(1024+PAGESIZE-1);
if (!p) {
perror("Couldn't malloc(1024)");
exit(errno);
}
/* Align to a multiple of PAGESIZE, assumed to be a power of two */
p = (char *)(((int) p + PAGESIZE-1) & ~(PAGESIZE-1));
c = p[666]; /* Read; ok */
p[666] = 42; /* Write; ok */
/* Mark the buffer read-only. */
if (mprotect(p, 1024, PROT_READ)) {
perror("Couldn't mprotect");
exit(errno);
}
For my understanding, I tried using a PAGESIZE of 16, and 0010 as address of p.
I ended up getting 0001 as the result of (((int) p + PAGESIZE-1) & ~(PAGESIZE-1)).
Could you please clarify how this whole 'alignment' works?
Thanks,
Assuming that PAGESIZE is a power of 2 (a requirement), an integral value x can be rounded down to a multiple of PAGESIZE with (x & ~(PAGESIZE-1)). Similarly, ((x + PAGESIZE-1) & ~(PAGESIZE-1)) will result in x rounded up to a multiple of PAGESIZE.
For example, if PAGESIZE is 16, then in binary with a 32-bit word:
00000000000000000000000000010000 PAGESIZE
00000000000000000000000000001111 PAGESIZE-1
11111111111111111111111111110000 ~(PAGESIZE-1)
A bitwise-and (&) with the above value will clear the low 4 bits of the value, making it a multiple of 16.
That said, the code quoted in the description is from an old version of the manual page, and is not good because it wastes memory and does not work on 64-bit systems. It is better to use posix_memalign() or memalign() to obtain memory that is already properly aligned. The example on the current version of the mprotect() manual page uses memalign(). The advantage of posix_memalign() is that it is part of the POSIX standard, and does not have different behavior on different systems like the older non-standard memalign().
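The two rounding formulas can be checked with small numbers. The sketch below uses uintptr_t rather than int so the arithmetic also holds for 64-bit addresses; round_down and round_up are illustrative names, and pagesize is assumed to be a power of two.

```c
#include <stdint.h>

/* Clears the low bits: largest multiple of pagesize that is <= x. */
static uintptr_t round_down(uintptr_t x, uintptr_t pagesize)
{
    return x & ~(pagesize - 1);
}

/* Adds pagesize-1 first, so anything past a boundary moves up to the
   next one: smallest multiple of pagesize that is >= x. */
static uintptr_t round_up(uintptr_t x, uintptr_t pagesize)
{
    return (x + pagesize - 1) & ~(pagesize - 1);
}
```

With a page size of 16 and an address of 2 (binary 0010): 2 + 15 = 17, which is 10001 in binary, and masking off the low four bits leaves 10000, i.e. 16. An address that is already a multiple of 16 is left unchanged.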

Resources