Strict aliasing in flexible array member? - c

I'm writing an Arena Allocator and it works, but I feel like it violates strict aliasing rules. I want to know if I'm right or wrong. Here's the relevant part of the code:
typedef struct ArenaNode ArenaNode;
struct ArenaNode {
ArenaNode *next;
size_t dataSize;
u8 data[];
};
typedef struct {
ArenaNode *head;
ArenaNode *current;
size_t currentIndex;
} Arena;
static ArenaNode *ArenaNodeNew(size_t dataSize, ArenaNode *next)
{
ArenaNode *n = malloc(sizeof(ArenaNode) + dataSize);
n->next = NULL;
n->dataSize = dataSize;
return n;
}
void *ArenaAlloc(Arena *a, size_t size)
{
const size_t maxAlign = alignof(max_align_t);
size_t offset = nextHigherMultiplePow2(offsetof(ArenaNode, data), maxAlign) - offsetof(ArenaNode, data);
size_t dataSize = offset + max(size, ARENA_SIZE);
// first time
void *ptr;
if (a->head == NULL) {
ArenaNode *n = ArenaNodeNew(dataSize, NULL);
a->head = n;
a->current = n;
ptr = n->data + offset;
a->currentIndex = nextHigherMultiplePow2(offset + size, maxAlign);
} else {
// enough space
if (a->currentIndex + size <= a->current->dataSize) {
ptr = &a->current->data[a->currentIndex];
a->currentIndex = nextHigherMultiplePow2(a->currentIndex + size, maxAlign);
} else {
ArenaNode *n = ArenaNodeNew(dataSize, NULL);
a->current->next = n;
a->current = n;
ptr = n->data + offset;
a->currentIndex = nextHigherMultiplePow2(offset + size, maxAlign);
}
}
return ptr;
}
The Arena is a linked list of Nodes and a Node is a header followed by data u8 data[]. u8 is unsigned char.
I maintain the next available index (currentIndex) and advance data by this index and return it as void * (ptr = &a->current->data[a->currentIndex]). Does this violate strict aliasing rule because I'm converting a pointer to u8 to something else and using that?
My confusion comes from the fact that memory returned by malloc has no effective type. But since I'm casting the malloc'd pointer to ArenaNode * and setting its data members (next and dataSize) after allocating it (in ArenaNodeNew), the effective type becomes ArenaNode. Or does it? I didn't set data field of that.
Basically, I think the question can be simplified to this: If I malloc a memory region of say, size 10, cast the pointer to struct {int a;} * (assume 4 bytes int), set it a to something, what happens to the rest of the 6 bytes? Does it have any effective type? Does the presence of flexible array member affect this in any way?

The extra bytes that are part of the flexible array member will have the effective type of that member as you write to them.
You can safely declare ptr as u8 * and define your function to return that type as well.
In your example of allocating 10 bytes and treating the first 4 as a struct of the given type, the remaining bytes have no effective type yet. You can use them for any type, provided the pointer you use is correctly aligned and the object fits: for example, you can point an int * at the following bytes, but not a long long *, because of alignment.
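A minimal sketch of the point above, assuming a hypothetical Node type with an unsigned char flexible array member (node_new, node_payload and node_roundtrip are illustrative names, not from the question). Under the usual reading of the effective type rules, the payload bytes take their effective type from the value stored through the returned pointer, as long as that pointer is suitably aligned:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: a header followed by raw bytes. */
typedef struct Node {
    struct Node *next;
    size_t dataSize;
    unsigned char data[];   /* flexible array member */
} Node;

/* Allocate a node with room for dataSize payload bytes plus the worst-case
   padding needed to reach a max_align_t boundary inside data[]. */
static Node *node_new(size_t dataSize)
{
    Node *n = malloc(sizeof *n + alignof(max_align_t) + dataSize);
    if (n == NULL)
        return NULL;
    n->next = NULL;
    n->dataSize = dataSize;
    return n;
}

/* Return the first max_align_t-aligned address inside data[]. */
static void *node_payload(Node *n)
{
    uintptr_t addr = (uintptr_t)n->data;
    uintptr_t mask = (uintptr_t)alignof(max_align_t) - 1;
    uintptr_t aligned = (addr + mask) & ~mask;
    return n->data + (aligned - addr);
}

/* Store a double in the payload and read it back through the same type. */
static double node_roundtrip(double value)
{
    Node *n = node_new(sizeof(double));
    assert(n != NULL);
    double *d = node_payload(n);  /* void * converts implicitly in C */
    *d = value;                   /* the store sets the effective type of these bytes */
    double result = *d;           /* read back through the same type */
    free(n);
    return result;
}
```

Rounding the address itself, rather than only the offset, keeps the sketch independent of how the header happens to be laid out.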

Does this violate strict aliasing rule because I'm converting a pointer to u8 to something else and using that?
No, you are not violating strict aliasing, but your code might violate the constraints imposed by 7.22.3 Memory management functions, paragraph 1:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated ...
You don't appear to be making sure the memory you use for any object is "suitably aligned" for any object. Given 6.3.2.3 Pointers, paragraph 7's statement:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
you appear to be risking undefined behavior.
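As a rough illustration of what "suitably aligned" means in practice, here is a sketch of two helpers. round_up_pow2 is an assumed stand-in for the question's nextHigherMultiplePow2, whose definition is not shown, and is_aligned is a hypothetical name. Inspecting the low bits of a pointer converted to uintptr_t relies on an implementation-defined mapping, which holds on mainstream platforms:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align; align must be a power of two. */
static size_t round_up_pow2(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* True if p is a multiple of align bytes; align must be a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

An allocator can assert is_aligned(ptr, alignof(max_align_t)) just before returning ptr to catch mistakes in the offset arithmetic during development.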
"Suitably aligned" is extremely platform-dependent.

Related

Type casting struct to short/int for saving value on adress

I would like to know if it's possible to cast a struct to short, but using only 2 bytes at its address, and save a value there. I personally don't even know if it's possible; I just want some ideas on how to do that.
In my project I link a void address of a char array to a struct and then do something similar to malloc but without using malloc, i.e. I'm writing something like my own malloc function.
My struct and its pointer:
typedef struct mem_list {
int size;
struct mem_list *next;
struct mem_list *prev;
}mem_list;
mem_list *start;
my function memory init:
void memory_init(void *ptr, unsigned int size){
mem_list *temp;
temp = (mem_list*)ptr;
if(size <= sizeof(mem_list)){
temp->size = 0;
printf("Failed\n");
return;
}
else
{
temp->size = size - sizeof(mem_list);
temp->next = NULL;
*((unsigned short*)(&temp + size - sizeof(unsigned short))) = 0;
start = temp;
printf("Inicialized was %d bits\n",size-sizeof(mem_list));
return;
}
}
My main:
int main() {
char region[100];
memory_init(region, 60);
//char* pointer = memory_alloc(20);
//printf("adresa %d\n", pointer);
return 0;
}
My problem is in function memory init in this part of code:
*((unsigned short*)(&temp + size - sizeof(unsigned short))) = 0;
What I want to do is move to the end of my initialized memory and save a short-typed zero there, to show me later where the end of my memory is. I would also like to ask how I can access that value later. I know there may be mistakes in my code. I would be happy if you pointed out where and gave me some ideas on how to do that. Thank you :)
In (&temp + size - sizeof(unsigned short)), &temp is the address of the pointer variable temp itself, not of your mem_list, so &temp + xxx is the address of somewhere on the stack :-(
(char*)temp + size points one past the last byte of your memory region, so the last unsigned short in it starts at (char*)temp + size - sizeof(unsigned short).
To be cleaner you could define your struct with a flexible array member:
typedef struct mem_list {
int size;
struct mem_list *next;
struct mem_list *prev;
unsigned short body[];
} mem_list_t ;
Then:
blen = (size + sizeof(unsigned short) - 1) / sizeof(unsigned short) ;
temp->body[blen] = 0 ;
writes 0 to the last unsigned short of the body of the mem_list_t.
Note that this assumes that ptr points to an object which has been allocated with asize bytes:
asize = offsetof(mem_list_t, body[blen+1]) ;
with blen calculated as above. (And ptr needs to be aligned as required for mem_list_t, of course.)
It is not clear whether you can reuse a char buffer to create objects of other types in it(*), but you should at least care about alignment. Some processors require non char types to be correctly aligned, for example that:
the address of an int16_t shall be even
the address of an int32_t or larger shall be a multiple of 4
And even if some other processors do not enforce this rule, accessing mis-aligned data often adds a significant overhead. That is the reason for padding in structs.
So without more precautions, this line:
*((unsigned short*)(&temp + size - sizeof(unsigned short))) = 0;
could break because if size is odd, you are trying to write an unsigned short at an odd address.
(*) For more details, you can read that other post of mine, especially the comments on my own answer.
if its possible to cast struct to short but only 2 bites of its adress and save value in there
No, it isn't possible. *((unsigned short*)(&temp...) invokes undefined behavior. It is a so-called "strict aliasing violation" and can also lead to misalignment issues depending on the system. See "What is the strict aliasing rule?" for background.
The rule of thumb is: never wildly cast between completely different pointer types. You need a lot of detailed knowledge about C in order to do so in a safe manner.
You can do "type punning" by using a union between the struct and an unsigned short, though. Please note that endianness is an issue to consider when doing so.
Other than that, you can safely memcpy the contents of a struct into an allocated unsigned short or vice versa. memcpy is exempt from pointer aliasing rules and will handle alignment safely.
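A minimal sketch of the memcpy approach applied to the end-of-region marker from the question. The helper names write_end_marker and read_end_marker are hypothetical. Because the bytes are copied rather than accessed through an unsigned short *, this works even when size is odd and the marker lands on a misaligned address:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an unsigned short marker into the last bytes of a region of `size` bytes. */
static void write_end_marker(void *region, size_t size, unsigned short marker)
{
    assert(size >= sizeof marker);
    memcpy((char *)region + size - sizeof marker, &marker, sizeof marker);
}

/* Copy the marker back out of the last bytes of the region. */
static unsigned short read_end_marker(const void *region, size_t size)
{
    unsigned short marker;
    assert(size >= sizeof marker);
    memcpy(&marker, (const char *)region + size - sizeof marker, sizeof marker);
    return marker;
}
```

With the question's buffer, write_end_marker(region, 60, 0) would store the terminating zero and read_end_marker(region, 60) would retrieve it later.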

Limitation of converting pointer to one type to pointer to another type

I'm having some trouble understanding the conversion of "pointer to" types. Let me provide some examples:
struct test{
int x;
int y;
};
1.
void *ptr = malloc(sizeof(int));
struct test *test_ptr = ptr; //OK 7.22.3(p1)
int x = test_ptr -> x; //UB 6.2.6.1(p4)
2.
void *ptr = malloc(sizeof(struct test) + 1);
struct test *test_ptr = ptr + 1; //UB 6.3.2.3(p7)
3.
void *ptr = malloc(sizeof(struct test) + 1);
struct test *test_ptr = ptr; //OK 7.22.3(p1)
int x = test_ptr -> x; //Unspecified behavior or also UB?
My understanding of the cases:
In case 1, the pointer conversion of the value returned by malloc is OK by itself, as per 7.22.3(p1):
The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object with a
fundamental alignment requirement
The access, however, is incorrect: test_ptr cannot point to a valid struct test object, because the object allocated with malloc is smaller than struct test, causing UB as explained in 6.2.6.1(p4).
Case 2 is UB since we cannot say anything about the alignment of the ptr + 1 pointer. 6.3.2.3(p7) explains this:
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
How is case 3 explained in the Standard?
It is unspecified in the standard (at least I could not find it) whether it is valid to convert a pointer to an object with no declared type into a pointer to an object whose size is less than that of the allocated object. (I'm not considering array allocation here, like malloc(10 * sizeof(struct test));, which is clearly explained in 7.22.3(p1).) 6.2.6.1(p4) states:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes.
The allocated object does not consist of sizeof(struct test) × CHAR_BIT bits, but of (sizeof(struct test) + 1) × CHAR_BIT bits.
This has to be legal because in C we have flexible array members.
typedef struct flex_s {
int x;
int arr[];
} flex_t;
void *ptr = malloc(sizeof(flex_t) + sizeof(int));
flex_t *flex = ptr;
flex->arr[0]; // legal
So, if you want an answer from the standard, look at its definition of flexible array members and their allocation, and the rule will be given.
You can start by taking a look at example 20 of page 114 of the free draft of C11.
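To make the allocation pattern concrete, here is a small sketch built on the flex_t type from this answer. The helper flex_new is a hypothetical name, and the overflow check is an addition of this sketch rather than something the answer requires:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct flex_s {
    int x;
    int arr[];   /* flexible array member */
} flex_t;

/* Allocate a flex_t with room for n ints in arr, all zeroed.
   Returns NULL on allocation failure or if the size would overflow. */
static flex_t *flex_new(size_t n)
{
    if (n > (SIZE_MAX - sizeof(flex_t)) / sizeof(int))
        return NULL;
    flex_t *f = malloc(sizeof *f + n * sizeof f->arr[0]);
    if (f == NULL)
        return NULL;
    f->x = 0;
    for (size_t i = 0; i < n; ++i)
        f->arr[i] = 0;
    return f;
}
```

The object is deliberately larger than sizeof(flex_t); only indices below n may be accessed through arr.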

Generic array searcher and padding

I've been trying to implement a generic array searcher and came across this answer, which made me think that my implementation is only valid for dynamically allocated arrays.
The implementation looks like this:
void *array_search( void *arr,
size_t elem_size,
size_t len,
arr_search_checker v,
void *match)
{
char *p = arr;
for(unsigned i = 0; i < len; ++i)
{
if(v((void *) p, match))
return p;
p += elem_size;
}
return NULL;
}
The type arr_search_checker:
typedef bool (*arr_search_checker)(void *, void *);
Having a simple structure:
struct test_struct { int i; char c; };
And a check function:
bool test_checker(void *l, void *r)
{
struct test_struct *ls = l, *rs = r;
return ls->i == rs->i && ls->c == rs->c;
}
And array of length len which is an array of struct test_struct one can invoke the searcher:
struct test_struct match = { .i = 5, .c = 'A' };
struct test_struct *res = array_search(array,
sizeof(struct test_struct), len, test_checker, &match);
Is it true that this implementation of array_search is only valid for dynamically allocated arrays, because incrementing the char pointer p by the size of a single array element can be dangerous for padded arrays? Or is it totally invalid?
Please state your question in the question topic.
The function array_search is valid for any array (I don't know why dynamically allocated arrays would be special in any way). The char pointer p is incremented by elem_size. elem_size is assigned the value of sizeof(struct test_struct) in your example and that's perfectly OK. Padding has nothing to do with it. Imagine struct test_struct has some padding bytes added to it (anywhere, at the end of the structure or between any of its members). Then sizeof(struct test_struct) will be the size of the test_struct structure including the padding bytes, and p will still be incremented correctly.
You may convert any object pointer to void* or to char* without breaking the strict aliasing rule. You cannot do arithmetic on void* pointers, which is why it gets converted to a char* pointer. elem_size represents the size of a single array element in bytes and char represents one byte in memory, so by doing p += elem_size; you add elem_size bytes to p (I like the form p = &p[elem_size];). The smallest addressable size in C is one byte (remember that a byte may not be 8 bits) and the size of every structure or type in C must be an integral value (i.e. sizeof(struct test_struct) cannot return 1.5).
For more, look at bsearch and qsort functions from standard C library. They have a very similar declaration to array_search and work with any array types, just like array_search here.
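For comparison, a short sketch of the same test_struct handled with the standard qsort and bsearch. The comparison function compare_test_struct and the wrapper sort_and_find are hypothetical names introduced for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct test_struct { int i; char c; };

/* Comparison callback in the shape qsort and bsearch expect:
   order by i first, then by c. */
static int compare_test_struct(const void *l, const void *r)
{
    const struct test_struct *ls = l, *rs = r;
    if (ls->i != rs->i)
        return (ls->i > rs->i) - (ls->i < rs->i);
    return (ls->c > rs->c) - (ls->c < rs->c);
}

/* Sort the array in place, then binary-search it for key.
   Returns a pointer to the matching element, or NULL if absent. */
static struct test_struct *sort_and_find(struct test_struct *arr, size_t len,
                                         const struct test_struct *key)
{
    qsort(arr, len, sizeof arr[0], compare_test_struct);
    return bsearch(key, arr, len, sizeof arr[0], compare_test_struct);
}
```

Both library functions take the element size as a parameter and step through the array in bytes, just as array_search does, so padding inside the struct is already accounted for by sizeof.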

C - Structure's pointer

I'm having trouble understanding pointers in general I think.
I can't seem to follow the logic of this code:
typedef struct StackRecord
{
int Capacity;
int TopOfStack;
int* Array;
}*Stack;
In the structure above, Stack was declared via typedef as a pointer type that receives addresses of the StackRecord structure type, so one can simply write Stack.
BUT the code below returns another variable that receives addresses of the StackRecord structure type. Why isn't it returning the address? Why does it instead return the same type of pointer?
Stack CreateStack(int MaxElements)
{
Stack S;
if (MaxElements < MinStackSize)
{
printf("Error : Stack size is too small");
return 0;
}
S = (Stack)malloc(sizeof(struct StackRecord));
if (S == NULL)
{
printf("FatalError : Out of Space!!!");
return 0;
}
S->Array = (int*)malloc(sizeof(char)* MaxElements);
if (S->Array == NULL)
{
printf("FatalError : Out of Space!!!");
return 0;
}
S->Capacity = MaxElements;
MakeEmpty(S);
return S;
}
Getting rid of the typedef may make things a little clearer, believe it or not:
struct StackRecord
{
int Capacity;
int TopOfStack;
int* Array;
};
/**
* Return a pointer to a new, dynamically allocated instance
* of struct StackRecord
*/
struct StackRecord *CreateStack(int MaxElements)
{
struct StackRecord *S;
if (MaxElements < MinStackSize)
{
printf("Error : Stack size is too small");
return 0;
}
S = malloc(sizeof *S); // no need for cast, sizeof *S is same as sizeof (struct StackRecord)
if (S == NULL)
{
printf("FatalError : Out of Space!!!");
return 0;
}
/**
* Allocate the memory for the Array member of
* the new stack record instance.
*/
S->Array = malloc( sizeof *S->Array * MaxElements );
if (S->Array == NULL)
{
printf("FatalError : Out of Space!!!");
return 0;
}
S->Capacity = MaxElements;
MakeEmpty(S);
return S;
}
In the code you posted, Stack is basically a synonym for struct StackRecord *. The function creates a new instance of struct StackRecord using malloc, initializes the contents of that record, and returns a pointer to that new instance.
A note on the malloc calls - in C, you do not need to cast the result of malloc, and doing so is generally considered bad practice [1]. Also, the operand to sizeof doesn't have to be a type name - it can be an expression of the type you want to allocate. IOW, given a declaration like
T *p;
both sizeof (T) and sizeof *p do the same thing - the expression *p has type T. So the general form of a malloc call can be written as
T *p = malloc( sizeof *p * N );
or
T *p;
...
p = malloc( sizeof *p * N );
That's simpler to write and easier to maintain than
p = (T *) malloc( sizeof (T) * N );
<rant>
Hiding the pointer-ness of a type behind a typedef is bad juju, especially when the user of that type has to be aware that he or she is dealing with a pointer type. Assigning the result of malloc to S means that S must be a pointer type. Using the -> to access members of S means that S must be a pointer to a struct or union type. Since you have to be aware that S is a pointer, it makes no sense to hide that pointerness behind the typedef. Similarly, if the user has to be aware of the struct-ness of the type, you shouldn't hide that struct-ness behind a typedef either.
Abstraction is a powerful tool, but partial (leaky) abstractions like the original code just make life more confusing for everyone (as you have discovered for yourself).
</rant>
[1] This is not true for C++, because C++ doesn't allow implicit conversions between void * and other pointer types the way C does. But, if you're writing C++, you shouldn't be using malloc anyway.
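The following is a compilable sketch of the same function using the sizeof *p idiom. MinStackSize and MakeEmpty are not shown in the question, so the definitions here are assumptions, and DisposeStack is a hypothetical helper. One deliberate difference from the code above: the record is freed if the second allocation fails, so that path does not leak.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MinStackSize 5   /* assumed value; not shown in the question */

struct StackRecord {
    int Capacity;
    int TopOfStack;
    int *Array;
};

/* Assumed behaviour: an empty stack has TopOfStack == -1. */
static void MakeEmpty(struct StackRecord *S)
{
    S->TopOfStack = -1;
}

/* Return a pointer to a new, dynamically allocated stack, or NULL on error. */
static struct StackRecord *CreateStack(int MaxElements)
{
    if (MaxElements < MinStackSize) {
        fprintf(stderr, "Error: stack size is too small\n");
        return NULL;
    }
    struct StackRecord *S = malloc(sizeof *S);
    if (S == NULL) {
        fprintf(stderr, "Fatal error: out of space\n");
        return NULL;
    }
    S->Array = malloc(sizeof *S->Array * (size_t)MaxElements);
    if (S->Array == NULL) {
        fprintf(stderr, "Fatal error: out of space\n");
        free(S);   /* release the record so this path does not leak */
        return NULL;
    }
    S->Capacity = MaxElements;
    MakeEmpty(S);
    return S;
}

/* Release both allocations; accepts NULL. */
static void DisposeStack(struct StackRecord *S)
{
    if (S != NULL) {
        free(S->Array);
        free(S);
    }
}
```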
In the typedef, the type identifier Stack is a pointer to a struct. The function prototype for CreateStack() specifies a return value of type Stack, which is a pointer to a StackRecord struct. S is declared to be of type Stack in the function body, so the function does return a pointer to a StackRecord struct.
In comments on @DavidBowling's answer you express this apparent misconception:
Stack is a pointer to StackRecord which means pointer must contain another address to which it is pointing to.
The typedef declares the identifier Stack to be an alias for the type struct StackRecord *. That would perhaps be clearer if it were rewritten in this wholly equivalent form:
struct StackRecord
{
int Capacity;
int TopOfStack;
int* Array;
};
typedef struct StackRecord *Stack;
No object of type struct StackRecord is declared, only that type itself and type Stack.
When function CreateStack() allocates memory sufficient for a struct StackRecord ...
malloc(sizeof(struct StackRecord));
... it is perfectly reasonable to convert the resulting pointer to type struct StackRecord *. Indeed, type Stack is exactly the same type as struct StackRecord *, so that's precisely what the code in fact does. The converted pointer still points to the same memory, and when that pointer is returned, the return value also points to the same memory.

(C) How to write to/read from memory address returned by mmap?

I've read some of the pages regarding how to ask questions, so I hope this is up to standard.
Our professor wants us to build a custom malloc and free, one that uses buddy allocation. Instead of messing with the heap, he wants us to just use mmap to request 1 GiB of space from the OS:
MAX_MEM = 1 << 30.
void * base = mmap(NULL, MAX_MEM, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, 0, 0);
Each chunk of memory should have a header, and if the memory is empty, pointers to the next and previous free chunks via linked list.
I don't know how to say "I want to put this specific data in this specific place." I would imagine a free chunk to look like this in the memory:
[Occupancy (1 bit)][Size (7 bits)][prev pointer (8 bytes)][next pointer (8bytes)][junk]
So let's say that the whole 1 GiB is free. Pseudo Code:
Occupancy = 0; // 0 if empty, 1 if allocated
Size = 0011110; // where size in bytes = 2^Size
next = NULL;
prev = NULL; //note that these are part of a struct called mallocList
How would I create these variables at the address I want them in?
I tried this,
int MAX_MEM = 1 << 30;
base = mmap(NULL, MAX_MEM, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, 0, 0);
*((unsigned char*) base) = 0x1E;
struct mallocList* temp;
temp->prev = NULL;
temp->next = NULL;
void* tempaddr = base + 1;
*((struct mallocList*) tempaddr) = *temp;
munmap(base, 1 <<30);
which compiled and ran without issue, but I realized trying to access the values,
printf("%c", *base); //line 37
struct mallocList* two;
two->prev = NULL;
two->next = NULL;
tempaddr->next = *two; //line 41
the compiler says,
3.c:37: warning: dereferencing ‘void *’ pointer
3.c:37: error: invalid use of void expression
3.c:41: warning: dereferencing ‘void *’ pointer
3.c:41: error: request for member ‘next’ in something not a structure or union
So I figure something's either wrong with my method of storing the data or retrieving it, and I'd greatly appreciate any help that could be offered.
Here's a header file mymalloc.h:
void *my_buddy_malloc(int size);
void my_free(void *ptr);
struct mallocList
{
struct mallocList *prev;
struct mallocList *next;
} mallocList;
Your compiler error explains the main problem: you can't dereference a void*. Cast the pointer to char* and store whatever bytes you want, or cast it to a struct yourstruct * and store to struct fields with p->field.
/* You need to tell gcc to pack the struct without padding,
* because you want the pointers stored starting with the second byte, i.e. unaligned.
* That's actually fine in *this* case, since they won't cross a cache-line boundary.
* They'll be at the beginning of a page, from mmap, and thus the beginning of a cache line.
* Modern CPUs are fast at handling misaligned loads within a cache line.
*/
struct __attribute__ ((__packed__)) mem_block {
unsigned int occupied:1;
unsigned int size:7; // bitfields. Not necessarily a good idea. Just using a signed int yourself might be better. positive for allocated, negative for free.
struct mallocList { // nested definition. You can do this differently
struct mallocList *prev, *next;
} pointers;
}; // don't need the type-name here. That would declare a variable of the struct type.
int MAX_MEM = 1 << 30;
void *base = mmap(NULL, MAX_MEM, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); // fd should be -1 for anonymous mappings; check for MAP_FAILED before use
char *cp = base;
cp[0] = size << 1 | 1; // pack the size and occupied bits into a byte
struct mallocList *mlp = (struct mallocList*)(cp+1); // This avoids needing a compiler-specific way to pack your struct.
// or
struct mem_block *mbp = base;
mbp->occupied = 1;
mbp->size=whatever;
mbp->pointers.prev = NULL;
mbp->pointers.next = NULL;
This might not compile, sorry, but the basic idea about casting pointers is solid.
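Here is a version of the struct-pointer idea that should compile, as a sketch under a few assumptions: the header uses naturally aligned fields instead of a packed layout, and the names chunk_header, chunk_init_free and chunk_size are hypothetical. The function accepts any suitably aligned void *. A base address returned by mmap is page-aligned and so qualifies.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-chunk header with naturally aligned fields,
   rather than pointers packed at byte offset 1. */
struct chunk_header {
    unsigned char occupied;      /* 0 if free, 1 if allocated */
    unsigned char size_log2;     /* chunk size in bytes is 2^size_log2 */
    struct chunk_header *prev;
    struct chunk_header *next;
};

/* Write a free-chunk header at the start of the region `base`,
   which must be aligned for struct chunk_header. */
static struct chunk_header *chunk_init_free(void *base, unsigned size_log2)
{
    assert(base != NULL);
    assert(size_log2 < sizeof(size_t) * CHAR_BIT);
    struct chunk_header *h = base;   /* void * converts implicitly in C */
    h->occupied = 0;
    h->size_log2 = (unsigned char)size_log2;
    h->prev = NULL;
    h->next = NULL;
    return h;
}

/* Size of the chunk in bytes, decoded from the header. */
static size_t chunk_size(const struct chunk_header *h)
{
    return (size_t)1 << h->size_log2;
}
```

For the 1 GiB region in the question, the call would be chunk_init_free(base, 30) once mmap has been checked against MAP_FAILED. Reading the fields back later is simply a matter of converting the same address to struct chunk_header * again.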
