Pointer alignment issue - c

I have the content of a file already loaded in memory, and I want to map the data from the file onto a convenient set of structs without allocating new memory.
So I have a pointer to where the file's data starts in memory. From there I walk this pointer forward, assigning values to different structs, but at some point the program crashes.
//_pack_dynamic is the pointer to the data in memory
us *l_all_indexes = (us *) _pack_dynamic; //us is an unsigned short
printf("Index 0:%d", l_all_indexes[0]); //here is where the program crashes
_pack_dynamic += sizeof(us) * m_number_of_indexes;
The data, at least for the first element, is there; I can get it out like so:
us temp;
memcpy(&temp, _pack_dynamic, sizeof(us));
Any idea how I could extract all the indexes (m_number_of_indexes) from _pack_dynamic and assign them to l_all_indexes without allocating new memory?

Accessing _pack_dynamic as if it contained us object(s) has undefined behaviour unless it actually does contain such objects (this is a slight simplification, but a good rule of thumb. An array of char certainly cannot be interpreted as short).
Copying the bytes with memcpy into a proper us object is the only standard way to interpret memory as an object. Another approach for integers is to read the data char by char and combine the bytes with shifts, masks and ORs. This approach lets you assume a particular byte order (endianness) instead of the native one.
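A minimal sketch of both approaches, assuming the indexes are stored back to back from the start of the buffer and, for the byte-by-byte version, that the file stores each index as two bytes, least significant byte first (an assumption; the question does not say). The function names read_us_native and read_u16_le are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short us; /* as in the question */

/* Standard way: copy the bytes of element i into a real `us` object
   instead of dereferencing a possibly misaligned us pointer.
   The result uses the machine's native byte order. */
us read_us_native(const unsigned char *base, size_t i)
{
    us value;
    assert(base != NULL);
    memcpy(&value, base + i * sizeof value, sizeof value);
    return value;
}

/* Shift-mask-or alternative. Assumes (hypothetically) that the file stores
   each index as 2 bytes, least significant byte first, so the result is
   the same on little- and big-endian hosts. */
unsigned read_u16_le(const unsigned char *base, size_t i)
{
    const unsigned char *p;
    assert(base != NULL);
    p = base + 2 * i;
    return (p[0] & 0xFFu) | ((p[1] & 0xFFu) << 8);
}
```

Applied to the question's loop, and assuming _pack_dynamic is a char * or unsigned char *, index i would be read as read_us_native((const unsigned char *)_pack_dynamic, i) for i from 0 to m_number_of_indexes - 1. No buffer is allocated; only one us temporary exists at a time.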
A system-dependent way that might work is to make sure that _pack_dynamic is aligned to the boundary required by us. But even then, the standard gives you no guarantees about the behaviour.
"Allocating" an automatic variable has hardly any runtime overhead. Allocating a few bytes for a short is usually insignificant.

Related

Memory Management with pointers

I am developing a C library which has three functions. The init function takes a void * pointer to a memory chunk. I need to develop functions to allocate and deallocate memory blocks from that chunk. This means I have to keep track of which parts of the memory chunk are allocated and which parts are free. The problem is that the structure I use to track the memory also has to live inside that memory chunk. I am not allowed to allocate new memory for my management structure.
And I have no idea how to do that.
Currently, I am planning to designate the first few hundred bytes as header space and divide the rest into frames of equal size. I will use the header space to create an array which keeps track of which frames are allocated. To do that, I need a way to convert memory addresses into long ints so I can save them into the array, and my search so far has yielded nothing.
Is there any way to accomplish that?
Failing that, is there any other way to implement a management structure in this situation?
Generally, you do not need to convert memory addresses to an integer type merely to keep track of them. Options include:
Work with pointers within the memory chunk you are given, using char * to perform arithmetic.
Subtract the base address of the memory (again with char *) from pointers within it to get offsets of type ptrdiff_t (defined in <stddef.h>) and use those.
Convert the addresses to the integer type uintptr_t (defined in <stdint.h>). Unlike the other options, this has implementation-dependent behavior. In common C implementations, the result of conversion will be a simple memory address that you can perform arithmetic on as expected. But, in some C implementations, the result will be more complicated, so the code will not be fully portable.
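A minimal sketch of the scheme from the question built on the second option: the used-flags array lives at the start of the chunk, and pointers are turned into frame indexes by subtracting char * addresses rather than converting them to long. All names (pool_header, pool_init, pool_alloc, pool_free) are hypothetical, the chunk is assumed to be aligned like malloc memory, and a first-fit scan over one flag byte per frame is just one simple choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout inside the caller's chunk:
   [pool_header | one used-flag per frame | padding | frame 0 | frame 1 | ...]
   Nothing is malloc'd; the bookkeeping lives in the chunk itself. */
typedef struct {
    size_t frame_size;    /* bytes per frame, rounded up for alignment */
    size_t frame_count;   /* number of frames that fit */
    size_t frames_off;    /* offset of frame 0 from the chunk start */
    unsigned char used[]; /* flexible array member: 1 = allocated */
} pool_header;

#define POOL_ALIGN _Alignof(max_align_t)

static size_t round_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

/* Sets up the pool at the start of `chunk`, which is assumed to be
   suitably aligned (e.g. from malloc). Returns NULL if nothing fits. */
pool_header *pool_init(void *chunk, size_t chunk_size, size_t frame_size)
{
    pool_header *h = chunk;
    size_t n;
    assert(chunk != NULL && frame_size > 0);
    frame_size = round_up(frame_size, POOL_ALIGN);
    /* Largest n such that header + n flags + padding + n frames fit. */
    for (n = chunk_size / frame_size; n > 0; n--) {
        size_t off = round_up(sizeof *h + n, POOL_ALIGN);
        if (off <= chunk_size && n <= (chunk_size - off) / frame_size) {
            h->frame_size = frame_size;
            h->frame_count = n;
            h->frames_off = off;
            for (size_t i = 0; i < n; i++)
                h->used[i] = 0;
            return h;
        }
    }
    return NULL;
}

/* First fit: returns a free frame, or NULL if all are in use. */
void *pool_alloc(pool_header *h)
{
    for (size_t i = 0; i < h->frame_count; i++) {
        if (!h->used[i]) {
            h->used[i] = 1;
            return (char *)h + h->frames_off + i * h->frame_size;
        }
    }
    return NULL;
}

/* Pointer -> ptrdiff_t offset (char * subtraction) -> frame index. */
void pool_free(pool_header *h, void *p)
{
    ptrdiff_t off = (char *)p - ((char *)h + h->frames_off);
    size_t i;
    assert(off >= 0 && (size_t)off % h->frame_size == 0);
    i = (size_t)off / h->frame_size;
    assert(i < h->frame_count && h->used[i]);
    h->used[i] = 0;
}
```

A bitmap (one bit per frame) would shrink the header further; the byte-per-frame array keeps the sketch short.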

Segfault when dereferencing a custom mem address (C)

I want to declare a pointer, have it hold a custom address and then assign a value to it:
#include <stdio.h>

int main(void)
{
char *ptr;
ptr = (char *)0x123123; //the assignment works perfectly with a cast
printf("%p\n", (void *)ptr); //and the pointer indeed holds the address it's supposed to
*ptr = 'a'; //but this breaks
puts("2");
return 0;
}
Initially I thought the reason was that I'm trying to dereference uninitialized memory. But I doubt that this is actually the case, since some_type *some_ptr = &some_variable; works flawlessly, so the problem must be the address I assign to it.
Then I thought that, in the same way that 3, 'a' or "alpine" are constants, (char *) 0x123123 must be a constant too, and constants can't be modified in C. But that can't be it either, because an attempt to modify a const object will not compile.
3rd assumption would be that such an address must be unavailable, but this doesn't make sense either, because the assignment and the printf always work, no matter the address I give or the type of the pointer.
3rd assumption would be that such an address must be unavailable,
That is correct: on modern OSes (which all have memory protection) you can't write to an arbitrary memory address.
It used to be possible to access any memory on OSes that didn't use virtual memory (such as MS-DOS), but allowing that is generally a very bad idea: it allowed any program to corrupt OS state, and required very frequent reboots.
but this doesn't make sense either, because the assignment and the printf always work, no matter the address I give or the type of the pointer.
You confuse two distinct operations: printing an address (allowed no matter what that address is) and dereferencing an address, i.e. reading or modifying the value stored at the address (only allowed for valid addresses).
The distinction is similar to "can you print an address?" (e.g. "123 Main Street, SomeTown, SomeCountry"), and "can you enter a house at that address?" (not possible for above address because there is no "SomeCountry" on Earth). Even if the address is valid, e.g. "1600 Pennsylvania Ave NW, Washington, DC 20500", you may still not be allowed to enter it.
The OP clarified elsewhere that this is actually an XY problem.
The X problem: reading/writing to arbitrary memory locations.
The Y problem: implementing a linked list that uses consecutive memory.
Of course, the answer to that is: you have to implement your own complete memory management system to get there.
As in: first, you use malloc() to acquire a large block of contiguous memory. Then you can use arbitrary pointers within that block. But of course, your code has to track which addresses are already in use, and correctly "free" them when list nodes get deleted.
The tricky part is about handling the corner cases, such as: what happens when your last "pointer" gets used up? Do you malloc() a larger area, and move all data in memory?
Finally: consider managing not a raw block of memory but a single array. (Linked list implementations are often based on arrays, as that makes some things much easier.)
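A minimal sketch of the array-based variant, with hypothetical names and a fixed capacity: links are array indexes instead of pointers, and unused slots are chained into a free list, so "allocating" and "freeing" a node just moves an index between two lists. It illustrates the idea only and is not the OP's code.

```c
#include <assert.h>

#define LIST_CAP 8   /* hypothetical fixed capacity */
#define NIL (-1)     /* "null" index */

/* Nodes live in one array; links are indexes rather than pointers. */
typedef struct {
    int value;
    int next;
} node;

typedef struct {
    node nodes[LIST_CAP];
    int head;       /* first node of the list, or NIL */
    int free_head;  /* first unused slot, or NIL */
} array_list;

void list_init(array_list *l)
{
    l->head = NIL;
    /* Chain every slot into the free list: 0 -> 1 -> ... -> NIL. */
    for (int i = 0; i < LIST_CAP; i++)
        l->nodes[i].next = (i + 1 < LIST_CAP) ? i + 1 : NIL;
    l->free_head = 0;
}

/* Pushes at the front; returns 0 on success, -1 when the array is full. */
int list_push_front(array_list *l, int value)
{
    int i = l->free_head;
    if (i == NIL)
        return -1;                 /* the "last pointer used up" corner case */
    l->free_head = l->nodes[i].next;
    l->nodes[i].value = value;
    l->nodes[i].next = l->head;
    l->head = i;
    return 0;
}

/* Removes the front node and returns its slot to the free list. */
int list_pop_front(array_list *l, int *out)
{
    int i = l->head;
    if (i == NIL)
        return -1;
    assert(i >= 0 && i < LIST_CAP);
    *out = l->nodes[i].value;
    l->head = l->nodes[i].next;
    l->nodes[i].next = l->free_head;
    l->free_head = i;
    return 0;
}
```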
Writing to an arbitrary memory address is dangerous and not allowed by modern operating systems; it is better to allocate a block of memory and write to that.
For example, using malloc():
ptr = malloc(32); // needs <stdlib.h>; now you can legally write to this block
if (ptr != NULL)
*ptr = 'a';
free(ptr);

C malloc/free corruption general questions

This question is similar to c malloc questions (mem corruption) but I ask it again because I want more specific information than what was provided.
So I have a program with one malloc, followed by some complex code, followed by one free. Somewhere in the complex code, memory gets corrupted by either a double-free or an out-of-bounds write (both in separate regions of memory from the original malloc). This causes the original free to fail. This behaviour is fairly deterministic.
My questions are several:
What are the minimal conditions for a memory corruption to affect a separate memory region like this?
Are there any proactive steps that can be taken to prevent this cross-corruption?
Is it defined behaviour with respect to the standard to use pointer arithmetic to jump back and forth within contiguously allocated memory?
/* 3 example */
void *m = malloc(sizeof(header_struct) + sizeof(body_struct));
body_struct *b = (body_struct*) (((header_struct*)m)+1);
header_struct *h = (header_struct*) (((header_struct*)b)-1);
Good questions.
Q1. The minimal conditions under the standard are anything that triggers Undefined Behaviour. Unfortunately, that's a rather long list and not actionable. In practice the list comes down to 4 common scenarios: object underflow, object overflow, dangling reference or wild store.
Object underflow happens when you write to bytes just before the allocated block. It is very common for those bytes to contain critical block links and the damage is usually severe.
Object overflow happens when you write to bytes just after the allocated block. There is usually a small amount of padding at the end, so a byte or two will usually do no serious damage. If you keep writing you will eventually hit something important, as per underflow.
Dangling reference means writing via a pointer that used to be valid. It could be a pointer to a local variable that has gone out of scope, or to a block that has been freed. These are nasty.
Wild store means writing to an address way outside an allocated block. That could be a small positive address (say pointer value of 0x20) and in a debug environment these areas can often be protected, or it can be random garbage because the pointer itself got damaged. These are less common, but very hard to find and fix.
Q2. A debug heap is your first level of protection. It will check the links, write special patterns into unused space, and generally help you find and fix problems. If you are using a debug heap, free() usually triggers some diagnostic checks, but you can usually find some other call that does the same thing on demand. On Windows that is HeapValidate().
You can do more by implementing your own heap with guards/sentinels (look that up) and your own heap checking functions. Beyond that (in C at least), you just have to get better at writing the code. You can add assertions so at least the code fails fast.
Then you can use external tools. One is Valgrind, but it is not always feasible to use. In one case we wrote a complete heap logging system that tracked every allocation, to find problems like these.
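Q3. Your example does not guarantee correct alignment of body_struct in its second line (the one that computes b). According to the C standard (N1570 §7.22.3), memory returned by malloc() is suitably aligned so that it may be assigned to a pointer to any type of object, and the compiler lays out structs under this assumption.
However, that guarantee only covers the start of the block. ((header_struct*)m)+1 points sizeof(header_struct) bytes further on, and the compiler pads a struct only enough to keep the elements of an array of that same struct aligned (as with the trailing padding after c in ss[2] below). If body_struct has a stricter alignment requirement than header_struct, b can end up misaligned.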
struct s {
double d;
char c;
} ss[2];
With this in mind, your code is valid C, but may have implementation-defined or Undefined Behaviour, depending on alignment requirements. It is certainly not recommended.
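A sketch of one way to compute a correctly aligned body position by hand, assuming C11 _Alignof and hypothetical member layouts for header_struct and body_struct (the question does not show them). The header size is rounded up to body_struct's alignment instead of stepping by one header_struct; with these layouts, on a typical platform where double needs 8-byte alignment, the naive ((header_struct*)m)+1 would land only 3 bytes into the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layouts: body_struct needs stricter alignment than header_struct. */
typedef struct { char tag[3]; } header_struct;
typedef struct { double x; } body_struct;

/* Offset of the body: header size rounded up to body_struct's alignment. */
#define BODY_OFFSET \
    ((sizeof(header_struct) + _Alignof(body_struct) - 1) \
     / _Alignof(body_struct) * _Alignof(body_struct))

/* Allocates header + body in one block; *body receives an aligned pointer. */
header_struct *alloc_header_body(body_struct **body)
{
    char *m = malloc(BODY_OFFSET + sizeof(body_struct));
    if (m == NULL)
        return NULL;
    *body = (body_struct *)(m + BODY_OFFSET);
    return (header_struct *)m;
}

/* Going back from the body to the header uses the same byte offset. */
header_struct *header_of(body_struct *body)
{
    assert(body != NULL);
    return (header_struct *)((char *)body - BODY_OFFSET);
}
```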
(1) Any undefined behaviour can cause such memory corruption, including but not restricted to writing to any memory location that is not part of an object.
(2) Write your code carefully :-(
(3) The second assignment is not portable and can lead to all kinds of problems due to alignment. To make this correct and portable, define a struct that contains both parts; if the number of bodies varies, use a flexible array member. If you always allocate one header and one body, define a new struct
typedef struct {
header_struct header;
body_struct body;
} body_plus_header_struct;
If you allocate one header and a variable number of bodies, write
typedef struct {
header_struct header;
body_struct bodies [];
} body_plus_header_struct;
Here, body_plus_header_struct has a size that is guaranteed to be rounded up so that the address of the bodies array has the correct alignment. To allocate a struct for n bodies, allocate
body_plus_header_struct* p = malloc (sizeof (*p) + n * sizeof (p->bodies [0]));
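A small self-contained version of the flexible-array-member approach, using hypothetical member layouts for header_struct and body_struct and a hypothetical alloc_bodies() helper. As in the answer, sizeof *p covers the header plus any padding needed before bodies, so every p->bodies[i] is correctly aligned. A production version would also check that n * sizeof p->bodies[0] cannot overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical member layouts; the question does not show them. */
typedef struct { int count; } header_struct;
typedef struct { double x, y; } body_struct;

typedef struct {
    header_struct header;
    body_struct bodies[];   /* flexible array member: must be last */
} body_plus_header_struct;

/* Allocates one header plus n bodies in a single block. */
body_plus_header_struct *alloc_bodies(size_t n)
{
    body_plus_header_struct *p = malloc(sizeof *p + n * sizeof p->bodies[0]);
    assert(n <= (size_t)-1 / 2); /* crude sanity check on the count */
    if (p != NULL)
        p->header.count = (int)n;
    return p;
}
```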

structures containing structures vs. structures containing pointers

The following question is in regards to C programming. I am using Microchip C30 compiler (because I know someone will ask)
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures? Does one make for faster code execution than the other? Does one technique use more or less memory? Does the memory get allocated at the same time in both cases?
If I use the following code, does memory automatically get allocated for the subStruct?
// Header file...
typedef struct{
int c;
int d;
}subStruct; // must be declared before mainStruct uses it
typedef struct{
int a;
subStruct * b;
} mainStruct;
extern mainStruct myMainStruct;
// source file...
mainStruct myMainStruct;
int main(void)
{
//...
}
If you use a pointer, you have to allocate the memory yourself. If you use a substructure, you can allocate the entire thing in one go, either using malloc or on the stack.
What you need depends on your use case:
Pointers will give you smaller structs
Substructures provide better locality of reference
A pointer may point to either a single struct or the first member in an array of them, while substructures are self-documenting: there's always one of them unless you use an array explicitly
Pointers take up some space, for the pointer itself + overhead from extra memory allocations
And no, it doesn't matter which compiler you use :)
Memory for pointed-to data doesn't get allocated automatically, but when your struct contains a whole structure, it does.
Also, with pointers you are likely to have fragmented memory: each pointed-to part of the structure could be in a different part of memory.
But with pointers you can share the same substructures across many structs (although this makes changing and deleting them later harder).
Memory for a pointer wouldn't be automatically allocated. You would need to run:
myMainStruct.b=malloc(sizeof(*myMainStruct.b));
In terms of performance, there is likely a small hit to going from one structure to another via the pointer.
As far as speed goes, it varies. Generally, including structs, rather than pointers, will be faster, because the CPU doesn't have to dereference the pointer for every member access. However, if some of the members aren't used very often, and the sub-struct's size is massive, the structure might not fit in the cache and this can slow down your code quite a bit.
Using pointers will use /slightly/ more memory (but only the size of the pointers themselves) than the direct approach.
Usually with pointers to sub-structs you'll allocate the sub-structs separately, but you can write an initialization function that hides all the allocation so that it happens "at the same time." In your code, myMainStruct is a file-scope object, so it has static storage duration and its b member is initialized to a null pointer. You need to call malloc to allocate heap memory for b, or point myMainStruct.b at an existing subStruct object that lives at least as long as myMainStruct.
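A sketch of the two ways to give b something to point at, plus the embedded alternative for comparison, using the question's types (declared in the corrected order). attach_heap_sub, attach_static_sub, staticSub and mainStructEmbedded are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int c;
    int d;
} subStruct;

typedef struct {
    int a;
    subStruct *b;        /* only a pointer is stored here */
} mainStruct;

typedef struct {         /* hypothetical embedded variant, for comparison */
    int a;
    subStruct b;         /* the whole sub-struct is stored here */
} mainStructEmbedded;

mainStruct myMainStruct; /* file scope: static storage, b starts as NULL */

static subStruct staticSub; /* hypothetical existing object b can point at */

/* Option 1: allocate the sub-struct on the heap (caller must free it). */
int attach_heap_sub(mainStruct *m)
{
    assert(m != NULL);
    m->b = malloc(sizeof *m->b);
    return m->b != NULL ? 0 : -1;
}

/* Option 2: point at an object that outlives m; no malloc, nothing to free. */
void attach_static_sub(mainStruct *m)
{
    assert(m != NULL);
    m->b = &staticSub;
}
```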
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures?
In the first case what you have is essentially one big structure in contiguous memory. In the "pointers to structures" case your master structure just contains the addresses to the sub-structures which are allocated separately.
Does one make for faster code execution than the other?
The difference should be negligible, but the pointer method will be slightly slower, because you must dereference the pointer on each access to the substructure.
Does one technique use more or less memory?
The pointer method uses number_of_pointers * sizeof(void*) more memory, plus any per-allocation overhead. sizeof(void*) is typically 4 on 32-bit platforms and 8 on 64-bit ones.
Does the memory get allocated at the same time in both cases?
No, you need to go through each pointer in your master struct and allocate memory for the sub-structs via malloc().
Conclusion
The pointers add a layer of indirection to the code, which is useful for switching out the sub-structs or having more than one pointer point to the same sub-struct. Having different master-structs pointing to common sub-structs in particular could save quite a bit of memory and allocation time.

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#include <stddef.h>
#define DECK_SIZE 52
void play(int *deck, size_t n); /* hypothetical; defined elsewhere */
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
#include <stdlib.h>
void play(int *deck, size_t n); /* hypothetical; defined elsewhere */
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied when the program is being run), in the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, whereas malloc'd memory can grow (with realloc()) when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
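A minimal sketch of memory that grows on demand, assuming a hypothetical int_vec type: the capacity doubles with realloc() whenever it runs out, something a plain array cannot do. The doubling factor and the initial capacity of 4 are arbitrary choices.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable int array. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Appends x, doubling the capacity with realloc() when full.
   Returns 0 on success, -1 if allocation fails (v is left unchanged). */
int vec_push(int_vec *v, int x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p); /* realloc(NULL, n) acts like malloc(n) */
        if (p == NULL)
            return -1;      /* the old block is still valid and still owned by v */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Because realloc() may move the block, any pointer into the old data must be refreshed after a push.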
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses and should be used when you can: static allocation helps make programs more stable, and is sometimes a necessity when you must guarantee that memory leaks can't happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array name, for example, can be used much like a pointer to the beginning of a malloc'd block.
But imagine trying to do matrix math using pointer manipulation rather than vec[x] * vec[y]. It would be very prone to hard-to-find errors.
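To make the matrix point concrete, here is a hypothetical 3x3 product written both ways. Both compute the same result; the pointer version has to spell out the row stride by hand, which is exactly where hard-to-find mistakes creep in. The names and the fixed size N are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define N 3 /* hypothetical matrix size */

/* Subscript version: a[i][k] states the intent directly. */
void matmul_subscript(int a[N][N], int b[N][N], int out[N][N])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            int sum = 0;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            out[i][j] = sum;
        }
}

/* Same product on flat row-major arrays of N*N ints, written with explicit
   pointer arithmetic. Forgetting a "* N" stride still compiles and silently
   reads the wrong element. */
void matmul_pointer(const int *a, const int *b, int *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            int sum = 0;
            for (size_t k = 0; k < N; k++)
                sum += *(a + i * N + k) * *(b + k * N + j);
            *(out + i * N + j) = sum;
        }
}
```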
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
It's also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (i.e. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you step too far through memory because you scaled an offset by the wrong element size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words: if name above were just a pointer, every instance of that struct would contain an additional pointer, destroying the exact mapping of it onto an external object (like a directory entry).
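A small illustration of the rule Ritchie describes, using his directory-entry struct with a hypothetical tag name: name is 14 bytes stored inside the struct, and a pointer to it appears only when name is used in an expression. It requires C11 for _Static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Ritchie's directory-entry struct; the tag name is hypothetical. */
struct dir_entry {
    int inumber;
    char name[14];   /* 14 bytes stored inline; no hidden pointer */
};

/* name is laid out after inumber, inside the struct itself. */
_Static_assert(offsetof(struct dir_entry, name) >= sizeof(int),
               "name follows inumber");

/* In an expression, d->name converts to a pointer to its first
   character; that pointer is computed, never stored in the struct. */
const char *entry_name(const struct dir_entry *d)
{
    assert(d != NULL);
    return d->name;
}
```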
