Char array size when using certain library functions

When using some library functions (e.g. strftime(), strcpy(), MultiByteToWideChar()) that deal with character arrays (instead of std::string) one has two options:
use a fixed size array (e.g. char buffer[256];) which is obviously bad because of the string length limit
use new to allocate required size which is also bad when one wants to create a utility function like this:
char * fun(void)
{
    char * array = new char[exact_required_size];
    some_function(array);
    return array;
}
because the user of such function has to delete the array.
And the 2nd option isn't even always possible if one can't know the exact array size/length before using the problematic function (when one can't predict how long a string the function will return).
The perfect way would be to use std::string since it has variable length and its destructor takes care of deallocating memory but many library functions just don't support std::string (whether they should is another question).
Ok, so what's the problem? Well - how should I use these functions? Use a fixed size array or use new and make the user of my function worry about deallocating memory? Or maybe there actually is a smooth solution I didn't think of?

You can use std::string's data() method to get a pointer to a character array with the same sequence of characters currently contained in the string object. Before C++17 the returned pointer is to const char, so the array must not be modified through it; since C++17 there is also a non-const overload, which lets a library function write into the string's buffer provided the string has first been resized to the required length. You don't need to worry about deallocating the memory referenced by this pointer as the string object's destructor will do so automatically.
But as to your original question: depends on how you want the function to work. If you're modifying a character array that you create within the function, it sounds like you'll need to allocate memory on the heap and return a pointer to it. The user would have to deallocate the memory themselves - there are plenty of standard library functions that work this way.
Alternatively, you could require the user to pass in a character pointer as a parameter, which ensures they have already created the array and know that they will need to deallocate the memory themselves. That method is used even more often and is probably preferable.
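A minimal C sketch of the caller-provided-buffer approach, using strftime(). The helper name format_date and its return convention are assumptions made for illustration; the point is only that the caller owns the storage and the function reports when that storage is too small.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: writes "YYYY-MM-DD" into a buffer owned by the caller.
 * Returns 0 on success, -1 if the arguments are invalid or the buffer
 * cannot hold the result. */
int format_date(char *out, size_t out_size, const struct tm *when)
{
    if (out == NULL || out_size == 0 || when == NULL)
        return -1;
    /* strftime() returns 0 when the result plus terminator does not fit. */
    if (strftime(out, out_size, "%Y-%m-%d", when) == 0)
        return -1;
    return 0;
}
```

A caller would typically declare something like char buf[16]; and pass sizeof buf, so nothing has to be freed afterwards.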

Related

Assign pointer contents to variable

Given a large struct pointer, say, large_ptr, and I want to assign it to a global var of the same type, let's call it g_large, then I have 2 options:
The first one using memcpy:
memcpy(&g_large, large_ptr, sizeof(g_large));
The second one using assignment:
g_large = *large_ptr;
Due to lack of memory and stack size in an embedded software I would like to know, does the second way behave like memcpy, or does it create a tmp var to do the assignment? Is there any standard for this?
If it behaves like memcpy then I'd prefer it for being shorter. But if it creates a temporary var, it might be a problem for the stack.
Your experience & knowledge will be appreciated!
Edit
A few mentioned I need to compile and view the assembly.
This is a process I need to learn, since it's a cross compiler, which generates asm files that are binary and need to be parsed. Not a simple task. I could do that, but it will take time.
I may have misunderstood, but in your memcpy call sizeof(g_large) looked to me like the size of a pointer (8 bytes on a typical 64-bit platform) rather than the size of the struct; you cannot recover the size of an array or struct from a pointer that merely addresses it.
[edit: yes, I misunderstood: g_large is the struct itself, so sizeof(g_large) is the struct size. The following section is still recommended.]
What I would do:
dynamically allocate memory in the main function
pass the allocated memory pointer to the local function where you want to work with your struct
extend the allocated memory if needed while working with your struct in the designated memory space
at the end of the local function you will already have your struct stored in the main function's memory space, without any copy function needed
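The steps above might look like this in C. The struct layout and the function names are made up for illustration; the caller allocates once and the worker fills the struct through the pointer, so no copy is needed afterwards.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the large struct in the question. */
struct large {
    int values[256];
    int count;
};

/* Works directly on caller-owned storage; nothing is copied or returned. */
void fill_large(struct large *dst)
{
    for (int i = 0; i < 256; i++)
        dst->values[i] = i * 2;
    dst->count = 256;
}

/* Caller side: allocate once, let the worker fill it in place, then free.
 * Returns the sum of the filled values, or -1 if allocation failed. */
int sum_of_filled(void)
{
    struct large *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    fill_large(p);
    int sum = 0;
    for (int i = 0; i < p->count; i++)
        sum += p->values[i];
    free(p);
    return sum;
}
```

On a target without a heap, the same fill_large() could just as well be handed the address of a global such as g_large.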

C Design: Pass memory address or return

In some functions (such as the *scanf variants) there is an argument that takes a memory space for the result. You could also write the code so that it returns an address instead. What are the advantages, and why design the function in such a weird way?
Example
void process_settings(char* data)
{
    .... // open file and put the contents in the data memory
    return;
}
vs
char* process_settings()
{
    char* data = malloc(some_size);
    .... // open file and load it into data memory
    return data;
}
The benefit is that you can reserve the return value of the function for error checking, status indicators, etc, and actually send back data using the output parameter. In fact, with this pattern, you can send back any amount of data along with the return value, which could be immensely useful. And, of course, with multiple calls to the function (for example, calling scanf in a loop to validate user input), you don't have to malloc every time.
One of the best examples of this pattern being used effectively is the function strtol, which converts a string to a long.
The function accepts a pointer to a character pointer (char **) as one of its parameters. It's common to declare a char * locally as endptr and pass its address to the function. The function returns the converted number if it was able to convert one; if not, it returns 0 and sets endptr to the start of the input. In either case endptr ends up pointing at the first character that was not part of the number, so a non-digit character that stopped the conversion can be identified.
You can then report that the conversion failed on that particular character.
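As a rough illustration of that pattern, here is a small wrapper around strtol. The name parse_long and its return convention are assumptions for this sketch, not part of the standard library; the return value carries the status while the result and the stopping position come back through output parameters.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 0 and stores the value in *out on success.
 * On failure returns -1 and, if bad is non-NULL, sets *bad to the position
 * where strtol() stopped parsing. */
int parse_long(const char *text, long *out, const char **bad)
{
    char *endptr;

    errno = 0;
    long value = strtol(text, &endptr, 10);

    /* No digits at all, trailing junk, or a value out of range for long. */
    if (endptr == text || *endptr != '\0' || errno == ERANGE) {
        if (bad != NULL)
            *bad = endptr;
        return -1;
    }
    *out = value;
    return 0;
}
```

Given the input "12x", the call fails and *bad points at the 'x', which is the character a caller could mention in an error message.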
This is better design than using global error indicators; consider multithreaded programs. It likely isn't reasonable to use global error indicators if you'll be calling functions that could fail in several threads.
You mention that a function should be responsible for its own memory. Well, scanf doesn't exist to create the memory to store the scanned value. It exists to scan a value from an input buffer. The responsibilities of that function are very clear and don't include allocating the space.
It's also not unreasonable to return a malloc'd pointer. The programmer should be prudent, though, and free the returned pointer when they're done using it.
The decision of using one method instead of another depends on what you intend to do.
Example
If you want to modify an array inside a function and keep the modification in the original array, you should use your first example.
If you are creating your own data structure, you have to deal with all the operations. And if you want to create a new struct you should allocate memory inside the function and return the pointer. The second example.
If you want to "return" two values from a function, like a vector and the length of the vector, and you don't want to create a struct for this, you could return the pointer of the vector and pass an int pointer as an argument of the function. That way you could modify the value of the int inside the function and you use it outside too.
char* return_vector_and_length(int* length);
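One possible body for that prototype, as a sketch only: the contents of the vector are arbitrary placeholder data, and the convention that the caller frees the result is an assumption.

```c
#include <stdlib.h>

/* The pointer is the return value; the element count comes back through
 * *length. The contents ("abc") are placeholders. The caller must free()
 * the returned vector. */
char *return_vector_and_length(int *length)
{
    const int n = 3;
    char *vec = malloc((size_t)n);

    if (vec == NULL) {
        *length = 0;
        return NULL;
    }
    for (int i = 0; i < n; i++)
        vec[i] = (char)('a' + i);
    *length = n;
    return vec;
}
```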
Let’s say that, for example, you wanted to store process settings in a specific place in memory. With the first version, you can write this as process_settings(output_buffer + offset);. How would you do it if you only had the second version? What would happen to performance if it were a really big array? Or what if, let’s say, you’re writing a multithreaded application where having all the threads call malloc() all the time would make them fight over the heap and serialize the program, so you want to preallocate all your buffers?
Your intuition is correct in some cases, though: on modern OSes that can memory-map files, it does turn out to be more efficient to return a pointer to the file contents than the way the standard library was historically written, and this is how glib does it. Sometimes allocating all your buffers on the heap helps avoid buffer overflows that smash the stack.
An important point is that, if you have the first version, you can trivially get the second one by calling malloc and then passing the buffer as the dest argument. But, if you have only the second, you can’t implement the first without copying the whole array.

Binary resources embedded in .exe and memory management when loaded

I'm working in a small C program and I need to embed binary data into an exe file. The method I'm using is converting that binary data into a char[] array... but I'm not including directly that array as a global variable; instead, I copy that array inside a function (LoadResource) that dynamically creates an array on heap, where I copy my original data. That's what I mean:
char *dataPntr;
void LoadResource()
{
    char data[2048] = {/*my binary data */};
    dataPntr = malloc(2048);
    for (int i = 0; i < 2048; i++) dataPntr[i] = data[i];
}
That way, if my understanding is correct, when calling LoadResource() data[] will be placed in stack, copied to heap and finally data[] will be automatically deallocated from stack; heap copy should be manually deallocated with free().
I'm doing it this way because the resource is only used in some situations, not always... and I prefer to avoid a large global variable.
My questions:
When running the program, is data[] array placed somewhere in memory? text segment maybe? or is it just loaded into stack when calling LoadResource()?
Is my solution the proper one (in terms of memory management) or would it be better to just declare a global data array?
Thanks for your answers!
Generally it is a good idea to avoid global variables. I won't say you never need them, but they can be a pain to debug. The problem is that it can be difficult to follow who changed it last. And if you ever do any multi-threading then you will never want to see a global again!
I include your char *dataPntr in those comments - why is that global? It might be better to return the pointer instead.
Not sure why you are using an array on the stack (data), my guess is so that you can use the {...} initialisation syntax. Can you avoid that? It might not be a big deal, 2k is not a large overhead, but maybe it might grow?
Personally I would copy the data using memcpy()
You have a couple of "magic numbers" in your code, 2048 and 2018. Maybe one is a typo? To avoid this kind of issue, most will use a pre-processor macro. For example:
#include <stdlib.h> /* for malloc() */
#include <string.h> /* for memcpy() */
#define DATA_SIZE 2048
char * LoadResource(void)
{
    char data[DATA_SIZE] = {/*my binary data */};
    char * dataPntr = malloc(DATA_SIZE);
    if (dataPntr)
        memcpy(dataPntr, data, DATA_SIZE);
    return dataPntr; /* NULL if malloc() failed */
}
By the way, notice that LoadResource's parameter list is declared as void. In C before C23 (unlike C++), an empty parameter list means no parameter checking, not no parameters. Also note that I check the returned value from malloc. This means that the function will return NULL on error.
Another strategy might be to make the data array static instead; however, exactly when that gets initialised is compiler dependent, and you might find that you incur the memory overhead even if you don't use it.
While I agree with @cdarke in general, it sounds to me that you are creating a constant array (i.e. one that is never modified at run-time). If this is true, I would not hesitate to make it a global const array. Most compilers will simply place the array in text memory at link time and there will not be any run-time overhead for initialization.
If, on the other hand, you need to modify the data at run-time, I'd follow @cdarke's example, except to make your data array static const. This way, again, most compilers will place the preinitialized array in the text segment and you will avoid the run-time overhead for initializing the data array.
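A sketch of how the two suggestions might combine: the data lives in a static const array, and a writable heap copy is made only when one is actually needed. The four bytes and the function name are placeholders, and where the array ends up (typically a read-only section) depends on the toolchain.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder bytes standing in for the real binary resource. */
static const unsigned char resource_data[] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Returns a writable heap copy of the resource, or NULL if allocation
 * fails. If size is non-NULL it receives the number of bytes copied.
 * The caller owns the copy and releases it with free(). */
unsigned char *load_resource_copy(size_t *size)
{
    unsigned char *copy = malloc(sizeof resource_data);

    if (copy != NULL) {
        memcpy(copy, resource_data, sizeof resource_data);
        if (size != NULL)
            *size = sizeof resource_data;
    }
    return copy;
}
```

Because the array is never placed on the stack, no 2 KB local is initialised on every call; code that only reads the resource could use resource_data directly and skip the copy.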

Best practice for allocating memory for use by a function — malloc inside or outside?

During my experience with C coding, I've seen 2 ways of passing arguments for functions:
malloc before calling functions
malloc inside functions (variable is not initialized before calling function)
I particularly prefer the second form. As long as I'm the only one working on my program, I know which convention is in use, but someone else might not, and could end up calling malloc twice and leaking memory.
So, my question is: What's the best practice for this?
Allocating memory in the caller is more flexible, because it allows the caller to use static or automatic storage instead of dynamic allocation, and eliminates the need to handle the case of allocation failure in the callee. On the other hand, having the caller provide the storage requires the caller to know the size in advance. If the size is compiled into the caller as a constant and the callee is in a library that's later updated to use a larger structure, things will break horribly. You can avoid this, of course, by providing a second function (or external variable in the library) for retrieving the necessary size.
When in doubt, you can always make two functions:
The main function that uses caller-provided storage.
A wrapper function which allocates the right amount of storage, calls the main function above using it, and returns the pointer to the caller.
Then the caller is free to choose whichever method is more appropriate for the particular usage case.
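A minimal sketch of that two-function layout, together with the size-query function mentioned earlier. All three names and the greeting format are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Size query: bytes needed for "Hello, <name>!" including the terminator. */
size_t greeting_size(const char *name)
{
    return strlen("Hello, ") + strlen(name) + strlen("!") + 1;
}

/* Main function: writes into caller-provided storage.
 * Returns 0 on success, -1 if the buffer is too small. */
int greeting_fill(char *dst, size_t dst_size, const char *name)
{
    if (dst_size < greeting_size(name))
        return -1;
    strcpy(dst, "Hello, ");
    strcat(dst, name);
    strcat(dst, "!");
    return 0;
}

/* Wrapper: allocates the right amount, then calls the main function.
 * Returns NULL on failure; the caller must free() the result. */
char *greeting_alloc(const char *name)
{
    size_t n = greeting_size(name);
    char *buf = malloc(n);

    if (buf != NULL && greeting_fill(buf, n, name) != 0) {
        free(buf);
        buf = NULL;
    }
    return buf;
}
```

A caller with a stack or static buffer uses greeting_fill(); one that prefers a heap result uses greeting_alloc() and frees it afterwards.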
I personally strongly favor your first proposition (whenever it is possible) for orthogonality. Take the following example:
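#include <stdlib.h> /* for malloc() and free() */
extern void bar(int *p, int n);
void foo(int n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return; /* allocation failed, nothing to free */
    // fill array object
    bar(p, n);
    // work with array elements
    /* ... */
    // array no longer needed, free object
    free(p);
}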
This is orthogonal. malloc and free are called in the same lexical scope, which is clean and readable. Another advantage is that you can pass the bar function an array with a different storage duration, for example an array with automatic or static storage duration. You let the bar function focus only on the work it has to do and let another function manage the array allocation.
Note that this is also how most Standard C library functions work: they write into storage supplied by the caller rather than returning malloc'd memory.
The criteria I'd use for deciding are:
If the code outside the called function can know how much memory to allocate, then it is better to have the calling code allocate the memory.
If the code outside the called function cannot know how much memory to allocate, then the called function must do the memory allocation. It is likely then that there will be a second function available to release the memory returned by the first function (the 'called' function), unless it is just a single free() that's needed. The function documentation should make this clear.
For example, if the called function is reading a complete tree structure from a file, the function will have to allocate the memory. But, there will also be a companion function for releasing the memory (since the called code knows how to do it and the calling code shouldn't need to know).
On the other hand, if the called function is reading a simple list of integer and floating point values into a fixed size structure, it is far better to make the calling function allocate the memory. Note that I skipped 'strings'! If the strings are of a fixed size in the structure, then the calling function can do the allocation, but if the strings are of variable size, then probably the called function does the allocation.
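The tree case described above could be sketched as follows. The node layout and both function names are hypothetical; a real reader would build the nodes from file contents rather than from arguments.

```c
#include <stdlib.h>

/* Hypothetical tree node. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* The called code allocates, because only it knows the shape of the data.
 * Returns NULL if allocation fails. */
struct node *node_create(int value, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);

    if (n != NULL) {
        n->value = value;
        n->left = left;
        n->right = right;
    }
    return n;
}

/* Companion function: releases the whole tree so the caller never needs
 * to know how it was put together. Returns the number of nodes freed. */
int tree_free(struct node *root)
{
    if (root == NULL)
        return 0;
    int freed = tree_free(root->left) + tree_free(root->right);
    free(root);
    return freed + 1;
}
```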
The Standard C Library has functions like fgets() which expect the calling code to allocate the memory to be used. The calling sequence tells fgets() how much space is available. You run into problems if you didn't provide enough memory. (The problem with fgets() is that you may only get the start of a line of text, not the whole line of text.)
The POSIX 2008 Library provides getline() which will allocate enough space for the line.
The asprintf() and related functions (see TR24731-2) allocate memory as required. The snprintf() function does not — it is told how much space there is available, it uses no more than that, and says how much it really needed, and it is up to you to note if you didn't provide enough space and do something about it (allocate more space and try again, or blithely ignore the truncated value and continue as if nothing went wrong).
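The "allocate more space and try again" route is often written as a measuring call followed by the real one. A sketch, assuming a C99-conforming snprintf(); the helper name format_pair and its output format are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "<name>=<value>" into freshly allocated
 * memory sized by a first, measuring call to snprintf().
 * Returns NULL on failure; the caller must free() the result. */
char *format_pair(const char *name, int value)
{
    /* With a NULL buffer and size 0, snprintf() only reports the length. */
    int needed = snprintf(NULL, 0, "%s=%d", name, value);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    /* Second call does the actual formatting into the right-sized buffer. */
    snprintf(buf, (size_t)needed + 1, "%s=%d", name, value);
    return buf;
}
```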
The principle of information hiding suggests that allocating memory is best done within a function.
If you look at how stdio.h works:
FILE *myFile;
myFile = fopen("input.txt", "r");
if (!myFile) {
    fprintf(stderr, "Error opening input.txt for reading.\n");
    // other exit handling close
}
else {
    // code to read from file
    fclose(myFile);
}
the library call allocates memory that holds information about the file you are working with, and it returns a pointer to that structure. The caller is responsible for later on freeing that memory (with a call to fclose).
This pattern is repeated throughout the Standard C library.
There are at least two disadvantages to requiring the caller to allocate and free memory:
Extra code would be required on the calling side.
The calling code would need to be recompiled (at a minimum) or changed if the size of structure being allocated ever changed.

How to know whether an array is initialized in C

How can I tell whether an array is initialized in C? Functions like strlen() are not helping me, as I don't want to know whether the array is empty or not.
There's no way to test that at runtime -- an uninitialized array looks just like one that has been initialized with garbage.
Depending on what you're doing, you need either to make sure the array is actually initialized or explicitly pass around a flag that tells you whether the values in the array are meaningful yet.
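One way to carry such a flag is to keep it in the same struct as the array. This is only a sketch; the struct, its size and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_COUNT 4

/* Hypothetical wrapper: the flag travels with the array it describes. */
struct samples {
    int values[SAMPLE_COUNT];
    bool initialized;
};

/* Gives every element a defined value and records that fact. */
void samples_init(struct samples *s)
{
    for (size_t i = 0; i < SAMPLE_COUNT; i++)
        s->values[i] = 0;
    s->initialized = true;
}

/* Returns true and stores the sum only if the array holds meaningful
 * values; otherwise returns false and leaves *sum untouched. */
bool samples_sum(const struct samples *s, int *sum)
{
    if (!s->initialized)
        return false;
    *sum = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++)
        *sum += s->values[i];
    return true;
}
```

The flag itself has to start out false, for example by making the object static or by declaring it as struct samples s = {0};, so in practice it records whether real data has been loaded yet.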
Also note that "whether the array is empty" is not a very meaningful concept in C. The array is just there, and it always contains whatever number of bits are necessary to represent the elements it's declared to have. Those bits may not have meaningful values, but they're always there.
You can't by using programmatic language features.
But you can by design and discipline. At declaration time set your array as a pointer to NULL.
Then write a function that allocates memory for your pointer and assigns its values, and a corresponding freeing function that destroys it when it is no longer needed and sets it to NULL again. Then make every function that processes the array check for NULL as an error condition.
To do bounds recognition, set the last element to NULL.
Example:
char **myArray = NULL; /* array of string pointers */
/* other code */
myArray = createMyArray(n_elements);
memset(myArray, 0, sizeof *myArray * n_elements); /* Set every pointer to all-bits-zero, i.e. NULL on common platforms */
/* other code */
myArray[0] = functionReturningAString();
myArray[1] = functionReturningAnotherString();
/* myArray[n_elements-1] stays NULL and marks the end */
/* other code */
/* Processing */
char **incr = myArray;
while (*incr != NULL) {
    processArray(*incr);
    incr++; /* Advances by the size of a char pointer to the next pointer */
}
free_Array(&myArray); /* this function calls free() and sets myArray to NULL */
This is usable, when you need a lot of efficiency. Otherwise you should either create your own arraylist or use an existing library which provides it.
You need too much discipline to keep track of every possible error condition, so it can be tiresome.
Usually it is better to just use a library which provides array lists, linked lists, hash sets, etc. For C I use a lot of GLib functions for this.

Resources