Binary resources embedded in .exe and memory management when loaded - c

I'm working on a small C program and I need to embed binary data in the exe file. The method I'm using is to convert the binary data into a char[] array, but I'm not including that array directly as a global variable; instead, I declare it inside a function (LoadResource) that allocates a buffer on the heap and copies the original data into it. This is what I mean:
char *dataPntr;

void LoadResource()
{
    char data[2048] = {/*my binary data */};
    dataPntr = malloc(2048);
    for (int i = 0; i < 2048; i++) dataPntr[i] = data[i];
}
That way, if my understanding is correct, when LoadResource() is called data[] will be placed on the stack, copied to the heap, and finally deallocated from the stack automatically; the heap copy must be deallocated manually with free().
I'm doing it this way because the resource is only used in some situations, not always... and I prefer to avoid a large global variable.
My questions:
When the program runs, is the data[] array placed somewhere in memory (the text segment, maybe), or is it only loaded onto the stack when LoadResource() is called?
Is my solution the proper one (in terms of memory management) or would it be better to just declare a global data array?
Thanks for your answers!

Generally it is a good idea to avoid global variables. I won't say you never need them, but they can be a pain to debug. The problem is that it can be difficult to follow who changed it last. And if you ever do any multi-threading then you will never want to see a global again!
I include your char *dataPntr in those comments - why is that global? It might be better to return the pointer instead.
Not sure why you are using an array on the stack (data), my guess is so that you can use the {...} initialisation syntax. Can you avoid that? It might not be a big deal, 2k is not a large overhead, but maybe it might grow?
Personally I would copy the data using memcpy()
You have a couple of "magic numbers" in your code, 2048 and 2018. Maybe one is a typo? To avoid this kind of issue, most will use a pre-processor macro. For example:
#include <stdlib.h> /* for malloc() */
#include <string.h> /* for memcpy() */

#define DATA_SIZE 2048

char *LoadResource(void)
{
    char data[DATA_SIZE] = {/*my binary data */};
    char *dataPntr = malloc(DATA_SIZE);
    if (dataPntr)
        memcpy(dataPntr, data, DATA_SIZE);
    return dataPntr;
}
By the way, notice the void in the prototype for LoadResource. In C (not C++) an empty parameter list means no parameter checking, not no parameters. Also note that I check the value returned by malloc, so the function returns NULL on error.
Another strategy might be to make the data array static instead; however, exactly when that gets initialised is compiler dependent, and you might find that you incur the memory overhead even if you don't use it.

While I agree with @cdarke in general, it sounds to me as though you are creating a constant array (i.e. one that is never modified at run-time). If this is true, I would not hesitate to make it a global const array. Most compilers will simply place the array in text memory at link time and there will not be any run-time overhead for initialization.
If, on the other hand, you need to modify the data at run-time, I'd follow @cdarke's example, except that I'd make your data array static const. This way, again, most compilers will place the preinitialized array in the text segment and you will avoid the run-time overhead of initializing the data array.
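As a rough sketch of the second suggestion, the array can be declared static const at file scope and copied into heap memory only when a writable copy is wanted. The names resource_data, RESOURCE_SIZE and load_resource_copy, and the four placeholder bytes, are made up for illustration; where the array actually ends up is still up to the compiler and linker.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder bytes standing in for the real embedded resource (assumption). */
static const unsigned char resource_data[] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Size in bytes, derived from the array so it cannot drift out of sync. */
#define RESOURCE_SIZE (sizeof resource_data)

/* Returns a writable heap copy of the resource, or NULL if malloc fails.
   The caller owns the result and must free() it. */
unsigned char *load_resource_copy(void)
{
    unsigned char *copy = malloc(RESOURCE_SIZE);
    if (copy != NULL)
        memcpy(copy, resource_data, RESOURCE_SIZE);
    return copy;
}
```

If the data is never modified, the copy is unnecessary and callers can read resource_data directly.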

Related

Use Global Array instead of Pointer to array for transfer a string from Function

As you can see from the code, to extract a string from a function I used two methods, the first one is to use a pointer and the second one to use a global array.
Both give the same result, but I wanted to know whether it is a mistake to use one or the other, since I was advised not to use globals without being given an explanation.
Forgive my ignorance; unfortunately I have no one to ask, and in the search engines I have not found anyone who deals with this topic.
#include <stdio.h>
#include <string.h>

char MyArr[18] = {'\0'}; // Global Array //

// This function use a *Pointer //
const char * MyWordWP( void )
{
    return "Hello with pointer";
}

// This function use a global array //
void MyWordArr( void )
{
    strcpy(MyArr, "Hello With Array");
}

int main( void )
{
    // Read pointer //
    const char *MyW = MyWordWP( );
    printf( "With Pointer = %s \n" , MyW );
    //Set the global //
    MyWordArr();
    printf( "With Array = %s \n", MyArr);
    return 0;
}
Thank you for your support.
A few reasons not to use global objects and arrays too extensively are, for example:
First, functions which use global objects are not very reusable. The identifiers are hard-coded in the source code, which ties the function to one specific object; that is generally not the best solution. In contrast, parameters let you pass values from objects with different identifiers and, in the case of pointers, the addresses of arrays of different sizes.
The same goes for the opposite process of returning values and pointers from functions, as in your case. Note that inside a function an object must be declared with the storage class specifier static to remain in memory after the function's execution ends, which matters when returning a pointer to a function-local object.
The string literal "Hello With Array" exists beyond the execution of the function MyWordArr() because string literals reside in memory until the program finishes.
Second, your code may become hard to read and maintain, for others but also for yourself, when the source code is very large: after a few hundred function calls you may have lost the overview and ask yourself, "What is this function actually doing?" By contrast, passed parameters and return values describe the intention and context of a function call pretty well.
Third, the object can be accessed from almost anywhere in the code, making it hard to find the cause of an unintentional modification in a large program.
Fourth, global objects are not thread-safe.
So as conclusion:
It depends on the use case. In your example a global might fit well, as the program is small and focused, but generally you should try (if possible) to avoid global objects, because production code is in most cases very large and tends to be brain-smashing on its own.
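The first point (parameters instead of a hard-coded global) could look roughly like the following. The function name copy_greeting and its return convention are assumptions made for this sketch, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Copies the greeting into a caller-supplied buffer of dst_size bytes.
   Returns 0 on success, -1 if the buffer is missing or too small. */
int copy_greeting(char *dst, size_t dst_size)
{
    static const char greeting[] = "Hello with parameter";

    if (dst == NULL || dst_size < sizeof greeting)
        return -1;
    memcpy(dst, greeting, sizeof greeting); /* sizeof includes the '\0' */
    return 0;
}
```

Because the destination is a parameter, the same function works with any sufficiently large buffer, and the size check catches a buffer that is too small instead of overflowing it.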
Maybe a good reason to use a global variable, either array or pointer, is when there are one or more functions that always use that (global) variable, and this is coded into the logic. In such case, you would end up with a program full of calls to "func(myvar);" when the logic is that func() always should read or update that variable "myvar".
An example could be a text console driver or library, where a cache of the screen is kept in memory. The application calls setcursorto(x,y), display("hello"), color(yellow) and so on; all these routines update the cache in memory: well, that cache should be a global variable and there is no need to pass that global variable to every function operating on it.
Regarding the visibility, not all the global variables have to be visible to all the program. A C source file can have static global variables not visible outside.
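A minimal sketch of that console example, assuming an 80x25 text screen: the cache and cursor are file-scope static variables, so every routine in this source file can use them without passing them around, while other source files cannot name them. setcursorto and display come from the description above; cached_char_at and the screen dimensions are invented for illustration.

```c
#include <string.h>

#define SCREEN_COLS 80
#define SCREEN_ROWS 25

/* File-scope state: 'static' keeps these names private to this source file. */
static char screen_cache[SCREEN_ROWS][SCREEN_COLS];
static int cursor_x, cursor_y;

/* Moves the cursor; out-of-range positions are ignored. */
void setcursorto(int x, int y)
{
    if (x >= 0 && x < SCREEN_COLS && y >= 0 && y < SCREEN_ROWS) {
        cursor_x = x;
        cursor_y = y;
    }
}

/* Writes text into the cache at the cursor, clipping at the end of the row. */
void display(const char *text)
{
    size_t room = (size_t)(SCREEN_COLS - cursor_x);
    size_t len = strlen(text);

    if (len > room)
        len = room;
    memcpy(&screen_cache[cursor_y][cursor_x], text, len);
    cursor_x += (int)len;
}

/* Read access for the rest of the program goes through a function.
   Returns '\0' for positions outside the screen. */
char cached_char_at(int x, int y)
{
    if (x < 0 || x >= SCREEN_COLS || y < 0 || y >= SCREEN_ROWS)
        return '\0';
    return screen_cache[y][x];
}
```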

Security in returning arrays

Suppose the following function:
float *dosomething(const float *src, const int N)
{
    float *dst = (float *)malloc(sizeof(float) * N);
    if(!dst)
    {
        printf("Cannot allocate memory\n");
        exit(EXIT_FAILURE);
    }
    for(int i = 0; i < N; i++)
        dst[i] = src[i] * 2;
    return dst;
}
In this case we don't need to allocate memory beforehand if we want to use it, right?
Now, just another case:
void dosomething(float *dst, const float *src, const int N)
{
    for(int i = 0; i < N; i++)
        dst[i] = src[i] * 2;
}
In the last case we need to allocate memory beforehand. So I'm sharing both and wondering which is the best method for returning an array. Which of them provides more security to a user of the library or class? Which method is most recommended, and why?
What's better practice or a better idea depends on what you're actually trying to do.
A function like char *strdup(const char *s) (POSIX) is implemented like the first case, it takes a string as an argument, allocates memory for another of the same length and then copies the source to the new piece of memory. It's convenient and saves you from manually doing the common action of allocating a buffer for the copy of the string. You could assume this is simply like a call to malloc and then strcpy/memcpy.
Then you've got a function like char *strcpy(char *dest, const char *src), which is like the second case, where you have control of where the string is going to be copied to. This way you're not forced into having the string copied into a dynamically allocated, not of your choice, piece of memory.
The first way might come in handy if you needed to create and initialise some sort of dynamic structure (list, tree, etc), but then again the second way also suffices and gives you control of what piece of memory is being used; you can use dynamically allocated memory on the heap, or local variables on the stack, etc.
Personally, I would usually go the second way, because I have more control of what variable's being initialised, and I'm not forced into having to use a newly malloc'd piece of memory (what if I wanted my local variable to be initialised?). You could always then write a wrapper function that makes a call to malloc and then to your function using the newly allocated memory as the destination.
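The wrapper idea from the last sentence might look like this. The names double_into and double_alloc are invented here to keep the two variants apart (the question calls both dosomething), and the wrapper returns NULL on failure instead of exiting.

```c
#include <stdlib.h>

/* Second style from the question: the caller supplies the destination. */
void double_into(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2;
}

/* Convenience wrapper in the first style: allocates, then delegates.
   Returns NULL if malloc fails; the caller must free() the result. */
float *double_alloc(const float *src, size_t n)
{
    float *dst = malloc(n * sizeof *dst);
    if (dst != NULL)
        double_into(dst, src, n);
    return dst;
}
```

A caller with a local array uses double_into directly; a caller that wants a fresh heap block uses double_alloc and frees it afterwards.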
It's really up to you and your design and what you're trying to achieve, there are no right and wrong ways and as long as you remember the allocated memory you shouldn't have any problems. I wouldn't say either of the two is more "secure."
There is no RIGHT answer.
C is inherently insecure: you can only make data secure by making a copy and returning the copy, thus hiding the real location of the original from the caller.
What matters more is how the de-allocation of shared memory is handled; that usually dictates which approach is more correct.
In the example you cite, the only data being accessed is the data the caller has already passed (and already owns). So the fact that you allocate memory, do something with the data and return the allocated memory to the caller is just fine. Just document that this is how the function works (as with strdup() on C strings, the caller is responsible for calling free() on any returned non-NULL pointer).
FWIW you don't "share" the data. The caller invokes the function to do work on the data on its behalf; once the function returns, no more access occurs. If the function retained a memory pointer (or other data), it would be correct to describe the situation as sharing data, since at some point in the future that retained pointer (or other data) may be used in some way.
There is no definite "this is better than the other". I never actually think about these things, and just do whatever comes to mind. Which is likely to be the more "natural" solution for the problem at hand. And if it turns out to be "bad" along the way... well, luckily we are not programming by engraving on stone tablets.
In your case, without knowing anything about the software at all, nothing "feels" better. That's actually quite common; almost everything you do in programming can be done in different ways, and often there's no actual difference other than personal preference or just random "that's what I came up with first".
For example, your second solution lets the caller copy to existing memory, which might be part of a larger object. On the other hand, he has to provide the destination memory every time. Although this could also mean saving allocations by using just one memory block for multiple calls. The first solution seems slightly more convenient for the simple case, but 'locks' the user in that case: there's always a fresh memory block allocated.

In what situation am I supposed to use malloc? I'm confused by when dynamic allocation should be used in C

If I want to create a 2D array with dimensions specified by user input, can't I just do this sequentially in the main function? Once I have the dimensions from scanf, I create an array with those dimensions. From what I understood, malloc is supposed to be used when the space required is not known until runtime. I wouldn't have known the space required until runtime, but I didn't have to allocate the memory dynamically, and it would work anyway, right? Perhaps I'm completely misunderstanding something.
Generally there are three reasons to use dynamic allocation in C:
The size is not known until runtime (another alternative is VLAs, but that's C99 and potentially dangerous, see reason 2).
The size is (most likely) too big for the stack, risking stack overflow.
The object needs to live on the heap giving it a longer life than "automatic" storage.
malloc is generally used when the space requirements aren't known at compile-time, or when you need the object to persist beyond the scope that it was created in.
In C99 (unlike earlier versions of C), you can also define variable-length 1-dimensional arrays without using malloc. But many people consider them evil, because there's no way to catch an out-of-memory condition.
But if you want something that acts like a multidimensional array (in the sense of being able to index it like x[i][j]) where the dimensions aren't known until runtime, you will need to involve malloc somewhere.
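One common way to get x[i][j] indexing with runtime dimensions is sketched below: one block of row pointers plus one contiguous block of elements. The names grid_create and grid_destroy are made up for this example, and this is only one of several possible layouts.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates a zero-filled rows x cols grid of ints, indexable as grid[i][j].
   Returns NULL on failure or if either dimension is zero. */
int **grid_create(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0 || cols > SIZE_MAX / sizeof(int) / rows)
        return NULL;

    int **grid = malloc(rows * sizeof *grid);
    if (grid == NULL)
        return NULL;

    int *cells = calloc(rows * cols, sizeof *cells);
    if (cells == NULL) {
        free(grid);
        return NULL;
    }

    /* Point each row at its slice of the contiguous element block. */
    for (size_t i = 0; i < rows; i++)
        grid[i] = cells + i * cols;
    return grid;
}

/* Releases a grid obtained from grid_create; NULL is accepted. */
void grid_destroy(int **grid)
{
    if (grid != NULL) {
        free(grid[0]); /* the element block */
        free(grid);    /* the row-pointer block */
    }
}
```

Unlike an automatic array, a failed allocation here is reported through the NULL return value, so the caller can react to it.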
Here's an archetypal example of the need for dynamic allocation: making a dynamic container. Since you don't know the number of elements in advance, each element has to be allocated dynamically. Since you populate the container in a loop, each element must outlive the loop scope. This is the "zero-one-many" rule at its barest, and the "many" part of it immediately entails dynamic allocation:
int value;
node *my_list = NULL;

while (get_one_more_input(&value) == SUCCESS)
{
    node *elem = malloc(sizeof(node));
    if (elem == NULL)
        break; /* out of memory: stop reading input */
    elem->data = value;
    elem->next = my_list;
    my_list = elem;
}
The crux here is that the actual list node is only populated inside the loop which reads the input data, but clearly it must outlive that scope. Thus no automatic object can do, because it would only live to the end of the scope, and also no static element could do, because you cannot know the number of loop iterations beforehand. Dynamic lifetime and storage management is the only solution here, and that's what it is primarily intended for.
In C you will be doing a lot of this by hand, since dynamic data structures are at the very heart of computing. C++ makes a lot of this much easier and safer by wrapping all the dynamic management logic into hidden-away, reusable code that you never need to look at as a consumer (though it's still doing the exact same thing).
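For completeness, here is a self-contained sketch of the list handling done by hand. The node layout is an assumption (the snippet above only shows that it has data and next members), and list_push, list_length and list_free are hypothetical helper names.

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

/* Pushes value onto the front of the list. Returns the new head,
   or NULL if malloc fails (the existing list is left untouched). */
node *list_push(node *head, int value)
{
    node *elem = malloc(sizeof *elem);
    if (elem == NULL)
        return NULL;
    elem->data = value;
    elem->next = head;
    return elem;
}

/* Counts the nodes in the list. */
size_t list_length(const node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Frees every node; the list must not be used afterwards. */
void list_free(node *head)
{
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

Each node outlives the call that created it and stays alive until list_free releases it, which is exactly the lifetime that neither automatic nor static storage can provide.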

Checking if a certain address in memory is allocated

I have a function that receives a pointer to a dynamic array of 100 ints. But instead of 100, only 50 were allocated by malloc or calloc before that.
Is there a way to check whether a given element (the 79th, for example) is allocated, rather than wondering what this SIGSEGV actually means?
My question is purely theoretical and I have no actual code to show.
No, the pointer does not store its size. You may be better off storing the size and the pointer in a struct and passing it instead:
typedef struct
{
    size_t size;
    int *ptr;
} my_data;

void myFunc(my_data *data)
{
    size_t i;
    for(i = 0; i < data->size; i++)
    {
        // data->ptr[i];
    }
}

void myFunc2(my_data *data, size_t index)
{
    if(index < data->size)
    {
        // memory location exists
    }
}
Well, you could do such a thing according to your description, given an array and looking for an index (which is slightly different from "any raw pointer"). And with some more work, it is even possible to do such a thing for any pointer.
The malloc function necessarily stores information about how much was allocated. Unfortunately, there is no standard for how this must be done. Some implementations over-allocate and store the size immediately preceding the allocated data. Others may store addresses in a map, and yet others may do something else; you don't know.
However, most (all?) C libraries and at least one linker that I know of have explicit support for overloading/hooking/replacing allocation functions.
For example, in the GNU C library you can set __malloc_hook (in older versions; the hook was removed in glibc 2.34), and GNU ld lets you do such a thing at link time with --wrap=malloc, which routes calls to __wrap_malloc.
You could thus overload/hook malloc and free with a function that simply calls the real malloc function and stores the information how much was allocated yourself somewhere (e.g. by over-allocating and using the first word, or whatever you like).
Then write a function which takes a base pointer and an index. That function looks at the allocation info (now you know where to find it!), and can trivially check whether the index is in range. This does not work for "just any pointer".
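A minimal sketch of the over-allocating variant, written as standalone functions rather than as an actual malloc hook. The names tracked_malloc, tracked_free and index_in_range are invented for illustration, max_align_t assumes a C11 compiler, and the check is only valid for base pointers that came from tracked_malloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block. The union pads the header to the
   strictest alignment so the payload behind it stays suitably aligned. */
typedef union {
    size_t size;       /* payload size in bytes */
    max_align_t align; /* never read; present only for alignment */
} block_header;

/* Like malloc, but remembers the requested size in front of the payload. */
void *tracked_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1; /* payload starts right after the header */
}

/* Frees a block obtained from tracked_malloc; NULL is accepted. */
void tracked_free(void *ptr)
{
    if (ptr != NULL)
        free((block_header *)ptr - 1);
}

/* Returns nonzero if element 'index' of elem_size bytes lies inside the
   block. 'base' must be a pointer returned by tracked_malloc. */
int index_in_range(const void *base, size_t index, size_t elem_size)
{
    const block_header *h = (const block_header *)base - 1;
    return elem_size != 0 && index < h->size / elem_size;
}
```

With 50 ints allocated through tracked_malloc, asking about index 79 returns zero instead of touching memory that was never allocated.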
An alternative solution which works for "just any pointer" would be to write an allocator that satisfies allocations from separate arenas rather than simply wrapping the real malloc. All allocations coming from the same arena have the same allocation size. Given any pointer, you would then only need to iterate over all your arenas and look whether the address is within the arena's start and end address.
However, one should normally be quite sure how much one has allocated, this should not be guesswork, or random luck, or something to figure out at runtime.
Also, given the presence of ready-to-use memory debuggers, I doubt it is really worth investing time in doing such a thing application-side. Just use something like valgrind, no need to write any code at all.
No, there's no portable and reliable way to check this from within the code.
There exist tools -- such as valgrind -- that may help diagnose certain types of memory bugs.
No, there isn't.
This is when you break out your dynamic analysis tool (e.g. valgrind), or use a real container that keeps information about its size.
Some years ago I used a library whose name I have forgotten. With it, you could create a try-catch block and try to access unknown data, e.g. x[79], in the try block; if that memory was not allocated, an exception was generated.

Is it a common practice to re-use the same buffer name for various things in C?

For example, suppose I have a buffer called char journal_name[25] which I use to store the journal name. Now suppose a few lines later in the code I want to store someone's name into a buffer. Should I go char person_name[25] or just reuse the journal_name[25]?
The trouble is that everyone who reads the code (and me too after a few weeks) has to understand journal_name is now actually person_name.
But the counter argument is that having two buffers increases space usage. So better to use one.
What do you think about this problem?
Thanks, Boda Cydo.
The way to solve this in the C way, if you really don't want to waste memory, is to use blocks to scope the buffers:
int main()
{
    {
        char journal_name[26];
        // use journal name
    }
    {
        char person_name[26];
        // use person name
    }
}
The compiler will reuse the same memory location for both, while giving you a perfectly legible name.
As an alternative, call it name and use it for both <.<
Some code is really in order here. However a couple of points to note:
Keep the identifier decoupled from your objects. Call it scratchpad or anything. Also, by the looks of it, this character array is not dynamically allocated. Which means you have to allocate a large enough scratch-pad to be able to reuse them.
An even better approach is to probably make your functions shorter: One function should ideally do one thing at a time. See if you can break up and still face the issue.
As an alternative to the previous (good) answers, what about
char buffer[25];
char* journal_name = buffer;
then later
char* person_name = buffer;
Would it be OK ?
If both buffers are automatic, why not use this?
Most compilers will handle this correctly and reuse the memory.
But you keep readability.
{
    char journal_name[25];
    /*
    Your code which uses journal_name..
    */
}
{
    char person_name[25];
    /*
    Your code which uses person_name...
    */
}
By the way, even if your compiler is stupid and you are very low on memory, you can use a union but keep different names for readability. Using the same variable for both is the worst way.
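The union idea could be sketched as follows. The type name name_buffer, the NAME_LEN constant and the set_name helper are assumptions made for this example; the point is only that both members share one 25-byte region while keeping their own names.

```c
#include <string.h>

#define NAME_LEN 25

/* Both members occupy the same storage; only one of them holds a
   meaningful value at any given time. */
union name_buffer {
    char journal_name[NAME_LEN];
    char person_name[NAME_LEN];
};

/* Copies src into a NAME_LEN-byte buffer, truncating if necessary and
   always leaving the result '\0'-terminated. */
void set_name(char *dst, const char *src)
{
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}
```

Writing person_name overwrites whatever journal_name held, so the code still has to finish with one before starting on the other; the union only makes each use readable.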
Please use person_name[25]. Nobody likes hard-to-read code. It's not going to do much, if anything at all, for your program in terms of memory. Please, just do it the readable way.
You should always (well unless you're very tight on memory) go for readability and maintainability when writing code for the very reason you mentioned in your question.
25 characters (unless this is just an example) isn't going to "break the bank", but if memory is at a premium you could dynamically allocate the storage for journal_name and then free it when you've finished with it before dynamically allocating the storage for person_name. Though there is the "overhead" of the pointer to the array.
Another way would be to use local scoping on the arrays:
void myMethod()
{
    /* ... some code */
    {
        char journal_name[25];
        /* ... some more code */
    }
    /* ... even more code */
    {
        char person_name[25];
        /* ... yet more code */
    }
}
Though even with this pseudo code the method is getting quite long and would benefit from refactoring into subroutines which wouldn't have this problem.
If you are worried about memory (though I doubt 25 bytes will be an issue), you can just use malloc and free; then you only have an extra 4-8 bytes in use for the pointer.
But, as others mentioned, readability is important, and you may want to decompose your function so that the two buffers are used in functions that actually give more indication as to their uses.
UPDATE:
I have had a buffer called buffer that I used for reading from a file, for example, together with a function pointer, passed in as an argument, that parses the results. That way one function reads the file and handles the contents appropriately, instead of the buffer being filled in and me having to remember that it shouldn't be overwritten yet.
So, yes, reusing a buffer can be useful when reading from sockets or files, but you want to localize its usage, otherwise you may have race conditions.
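A rough sketch of that pattern, with the buffer kept local to the reading function and the parsing done by a callback. The names line_handler, for_each_line and add_length, and the 256-byte buffer size, are assumptions for illustration; lines longer than the buffer are delivered in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Callback type: receives one chunk of input plus a caller-defined context. */
typedef void (*line_handler)(const char *line, void *ctx);

/* Reads the stream piece by piece into a buffer local to this function and
   passes each piece to the handler before the buffer is reused.
   Returns the number of pieces handled. */
int for_each_line(FILE *in, line_handler handle, void *ctx)
{
    char buffer[256];
    int count = 0;

    while (fgets(buffer, sizeof buffer, in) != NULL) {
        handle(buffer, ctx);
        count++;
    }
    return count;
}

/* Example handler (hypothetical): adds each piece's length to a size_t total. */
void add_length(const char *line, void *ctx)
{
    *(size_t *)ctx += strlen(line);
}
```

Because the buffer never leaves for_each_line, no other code can overwrite it between the read and the parse.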

Resources