How to know whether an array is initialized in C - c

How can I know whether an array is initialized in C? Functions like strlen() are not helping me, as I don't want to know whether the array is empty or not.

There's no way to test that at runtime -- an uninitialized array looks just like one that has been initialized with garbage.
Depending on what you're doing, you need either to make sure the array is actually initialized or explicitly pass around a flag that tells you whether the values in the array are meaningful yet.
Also note that "whether the array is empty" is not a very meaningful concept in C. The array is just there, and it always contains however many bits are necessary to represent the elements it's declared to have. Those bits may not have meaningful values, but they're always there.
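The "pass around a flag" idea could be sketched roughly as below. The names tracked_buffer, fill_buffer and sum_buffer are made up for illustration, and the fixed length of 8 is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_LEN 8   /* arbitrary length chosen for this sketch */

/* Hypothetical wrapper: the array travels together with a flag that
   records whether its contents are meaningful yet. */
struct tracked_buffer {
    int  values[BUF_LEN];
    bool initialized;      /* false until fill_buffer() has run */
};

/* Fill every element and mark the buffer as usable. */
void fill_buffer(struct tracked_buffer *buf, int value)
{
    for (size_t i = 0; i < BUF_LEN; i++)
        buf->values[i] = value;
    buf->initialized = true;
}

/* Sum the elements. Returns false (and leaves *sum alone) if the
   buffer was never filled. */
bool sum_buffer(const struct tracked_buffer *buf, long *sum)
{
    if (!buf->initialized)
        return false;
    long total = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        total += buf->values[i];
    *sum = total;
    return true;
}
```

The flag itself still has to start out in a known state, for example by declaring the variable as struct tracked_buffer b = { .initialized = false };.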

You can't do it using language features.
But you can do it by design and discipline. At declaration time, set your array pointer to NULL.
Then write a function that allocates memory and assigns values to the pointer, and a corresponding freeing function that destroys the array when it is no longer needed and sets the pointer back to NULL. Have every function that processes the array check for NULL as an error condition.
To mark the end of the array, set the last element to NULL.
Example:
char **myArray = NULL; /* array of strings; NULL means "not created yet" */
/* other code */
myArray = createMyArray(n_elements + 1); /* one extra slot for the NULL sentinel */
memset(myArray, 0, sizeof(char *) * (n_elements + 1)); /* Set the allocated memory to zero */
/* other code */
myArray[0] = functionReturningAString();
/* ... fill every slot in between ... */
myArray[n_elements - 1] = functionReturningAnotherString();
/* other code */
/* Processing */
char **incr = myArray;
while (*incr != NULL) { /* stops at the first NULL slot */
    processArray(*incr);
    incr++; /* Increments by size of char pointer to the next pointer */
}
free_Array(&myArray); /* this function calls free() and sets myArray to NULL */
This approach is worthwhile when you need a lot of efficiency. Otherwise you should either create your own array list or use an existing library that provides one.
It takes a lot of discipline to keep track of every possible error condition, so it can be tiresome.
Usually it is better to use a library that provides array lists, linked lists, hash sets, and so on. For C, I use GLib a lot for this.

Related

C - Newly declared array contaminated with values from other variables

I'm alarmed to see that a newly declared array is being contaminated with some random values and some partial values from other variables within my C program.
Here's the source code of my function. I'm basically writing some pseudo code in preparation for doing some complex XML parsing and file manipulation (think similar to a mail merge). Anyway I'm concerned if there are random values in my newly declared array. Why isn't it empty of values when I first declare it?
Do I really need to traverse my entire array to set its elements to blank values before I begin assigning values, or is it likely that there's something wrong with other variable declarations in my code?
Thank you for your help.
Regards,
Chris
void ShowArray(void)
{
char aryString[5][5][255];
sprintf(aryString[1][1],"AAAAA");
sprintf(aryString[1][2],"BBBBB");
sprintf(aryString[1][3],"CCCCC");
sprintf(aryString[1][4],"DDDDD");
sprintf(aryString[1][5],"EEEEE");
sprintf(aryString[2][1],"A2");
sprintf(aryString[2][2],"B2");
int numRow;
int numCol;
for (numRow=1;numRow < 6;numRow++)
{
for (numCol=1;numCol < 6;numCol++)
printf("%d,%d:%s\n", numRow, numCol,aryString[numRow][numCol]);
}
}
Unfortunately you have to initialise the values of every element in an array.
Having random values in your array and variables when you first declare them is normal. This is because when your computer frees up memory, it doesn't reset it to zero. Your computer just allows other code to overwrite the values in those newly freed memory locations.
Those uninitialized values are just leftovers from other functions.
A local variable in a function will have an initially undefined value. This is, in fact, what you want, since the alternative would be for the compiler to force an initialization that in most cases you don't want, unavoidably slowing your function. It is your responsibility to ensure that any variable has been properly defined before trying to use its value. I have never found this to be a problem.
You are also writing to the [1][5]th string in your code with sprintf. Your aryString variable is of dimensions [5][5][255]. Remember that array indexing in C is 0-based. You should not go beyond the [1][4]th element. You might want to delete that line and try again, because you will end up corrupting your own data by yourself.
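For illustration, a 0-based version of the loops might look like the sketch below. The function name show_array, the generated cell contents and the return value are assumptions made for the example, not part of the original code.

```c
#include <stdio.h>

#define ROWS 5
#define COLS 5
#define LEN  255

/* Fill and print a ROWS x COLS table of strings using 0-based indices,
   so every access stays inside the array.
   Returns the number of cells printed. */
int show_array(void)
{
    char table[ROWS][COLS][LEN];
    int printed = 0;

    /* Valid indices run from 0 to ROWS-1 and from 0 to COLS-1. */
    for (int row = 0; row < ROWS; row++)
        for (int col = 0; col < COLS; col++)
            snprintf(table[row][col], LEN, "%c%d", 'A' + col, row);

    for (int row = 0; row < ROWS; row++)
        for (int col = 0; col < COLS; col++) {
            printf("%d,%d:%s\n", row, col, table[row][col]);
            printed++;
        }
    return printed;
}
```

Because every cell is written before it is read, no garbage can show up in the output.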
Yes, every automatic variable (as opposed to a static one, which must be declared as such explicitly) that you declare in a function needs manual initialization. The compiler won't initialize it automatically because it doesn't know what you want written to that memory. To make it write the default value, which is zero, to every element, write char aryString[5][5][255] = {0};. The empty-brace form char aryString[5][5][255] = {}; also works in C++ and C23, but is not valid in earlier C standards.
Also, the value an uninitialized variable contains is not only garbage; for some types it may be a trap representation, in which case merely reading it causes undefined behavior.
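As a quick check of the {0} form, here is a minimal sketch that inspects every byte of a zero-initialized local array. The function name is made up for the example.

```c
#include <stddef.h>

#define ROWS 5
#define COLS 5
#define LEN  255

/* Returns 1 if every byte of a {0}-initialized local array is zero,
   0 otherwise. */
int zero_initialized_is_all_zero(void)
{
    /* The first element is set to 0 explicitly; all remaining
       elements are zero-initialized as well. */
    char table[ROWS][COLS][LEN] = {0};
    const unsigned char *bytes = (const unsigned char *)table;

    for (size_t i = 0; i < sizeof table; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

With this initializer, every table[row][col] starts out as an empty string.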

C Design: Pass memory address or return

In some functions (such as the *scanf variants) there is an argument that takes a memory location for the result. You could also write the code so that it returns an address. What are the advantages, and why design the function in such a weird way?
Example
void process_settings(char* data)
{
.... // open file and put the contents in the data memory
return;
}
vs
char* process_settings()
{
char* data = malloc(some_size);
.... // open file and load it into data memory
return data;
}
The benefit is that you can reserve the return value of the function for error checking, status indicators, etc, and actually send back data using the output parameter. In fact, with this pattern, you can send back any amount of data along with the return value, which could be immensely useful. And, of course, with multiple calls to the function (for example, calling scanf in a loop to validate user input), you don't have to malloc every time.
One of the best examples of this pattern being used effectively is the function strtol, which converts a string to a long.
The function accepts a pointer to a character pointer (char **) as one of its parameters. It's common to declare a char * locally as endptr and pass its address to the function. The function returns the converted number if it was able to convert one; if no conversion could be performed, it returns 0. Either way, it sets the passed-in character pointer to the first character it could not convert, such as the non-digit character that stopped the conversion.
You can then report that the conversion failed on that particular character.
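A minimal sketch of that usage, wrapped in a hypothetical helper named parse_long. Overflow detection through errno is left out to keep the focus on endptr.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long from text.
   Returns 0 on success and stores the value in *out.
   Returns -1 on failure and, if bad is non-NULL, points *bad at the
   position where strtol() stopped converting. */
int parse_long(const char *text, long *out, const char **bad)
{
    char *endptr;
    long value = strtol(text, &endptr, 10);

    /* endptr == text: nothing was converted at all.
       *endptr != '\0': trailing characters were left unconverted. */
    if (endptr == text || *endptr != '\0') {
        if (bad != NULL)
            *bad = endptr;
        return -1;
    }
    *out = value;
    return 0;
}
```

The return value carries only success or failure, while the number and the offending position come back through the output parameters.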
This is better design than using global error indicators; consider multithreaded programs. It likely isn't reasonable to use global error indicators if you'll be calling functions that could fail in several threads.
You mention that a function should be responsible for its own memory. Well, scanf doesn't exist to create the memory to store the scanned value. It exists to scan a value from an input buffer. The responsibilities of that function are very clear and don't include allocating the space.
It's also not unreasonable to return a malloc'd pointer. The programmer should be prudent, though, and free the returned pointer when they're done using it.
The decision of using one method instead of another depends on what you intend to do.
Example
If you want to modify an array inside a function and keep the modification in the original array, you should use your first example.
If you are creating your own data structure, you have to deal with all the operations. And if you want to create a new struct you should allocate memory inside the function and return the pointer. The second example.
If you want to "return" two values from a function, like a vector and the length of the vector, and you don't want to create a struct for this, you could return the pointer of the vector and pass an int pointer as an argument of the function. That way you could modify the value of the int inside the function and you use it outside too.
char* return_vector_and_length(int* length);
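One possible body for that declaration is sketched below. The fixed "hello" contents are an assumption made purely so the example has something to return.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of a fixed message and stores its
   length in *length. Returns NULL and sets *length to 0 if the
   allocation fails. The caller must free() the result. */
char *return_vector_and_length(int *length)
{
    static const char message[] = "hello";   /* placeholder data */
    size_t n = sizeof message - 1;            /* length without '\0' */

    char *vector = malloc(n + 1);
    if (vector == NULL) {
        *length = 0;
        return NULL;
    }
    memcpy(vector, message, n + 1);           /* copies the '\0' too */
    *length = (int)n;
    return vector;
}
```

A caller would declare int len; and call char *v = return_vector_and_length(&len);, then free(v) when done.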
Let’s say that, for example, you wanted to store process settings in a specific place in memory. With the first version, you can write this as process_settings(output_buffer + offset);. How would you do it if you only had the second version? What would happen to performance if it were a really big array? Or what if, let’s say, you’re writing a multithreaded application where having all the threads call malloc() all the time would make them fight over the heap and serialize the program, so you want to preallocate all your buffers?
Your intuition is correct in some cases, though: on modern OSes that can memory-map files, it does turn out to be more efficient to return a pointer to the file contents than the way the standard library was historically written, and this is how glib does it. Sometimes allocating all your buffers on the heap helps avoid buffer overflows that smash the stack.
An important point is that, if you have the first version, you can trivially get the second one by calling malloc and then passing the buffer as the dest argument. But, if you have only the second, you can’t implement the first without copying the whole array.
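That last point could look roughly like this. The names process_settings_into and process_settings_alloc, the size parameter, the SETTINGS_SIZE constant and the placeholder contents are all assumptions; the real function would read a file.

```c
#include <stdio.h>
#include <stdlib.h>

#define SETTINGS_SIZE 64   /* hypothetical buffer size */

/* First version: the caller supplies the memory. The body is a
   placeholder that writes a fixed string instead of reading a file. */
void process_settings_into(char *data, size_t size)
{
    snprintf(data, size, "%s", "verbose=1");
}

/* Second version, built on the first: allocate, then delegate.
   Returns NULL if the allocation fails. The caller must free(). */
char *process_settings_alloc(void)
{
    char *data = malloc(SETTINGS_SIZE);
    if (data != NULL)
        process_settings_into(data, SETTINGS_SIZE);
    return data;
}
```

Going the other way would require calling process_settings_alloc() and then copying its result into the caller's buffer.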

Sentinel vs. passing around a count

I have a structure in C which contains an array on which I perform stack operations.
If the stack is full, I need to prevent pushing an element past the end of the array, and return an error condition.
Is it better style to include the size of the stack as an element of the structure, and pass that number of elements to the stack_push() function, or should I have a sentinel element at the end of the stack array?
How are you going to implement your stack_push() function?
If it requires scanning to the end of the array looking for an empty slot to insert the pushed element into, then you need a sentinel value anyway (e.g., NULL, if the array contains pointer elements). But note that the algorithm is going to be O(N).
On the other hand, keeping track of the number of active elements within the array allows your algorithm to be O(1) for pushes (and also for pops). It also saves you the trouble of allocating one extra element in the array, which may be significant if it's an array of structs.
Generally speaking, most stack data structures are implemented using an array and a counter.
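A minimal sketch of the array-plus-counter layout, assuming a fixed capacity of 4 ints. The struct layout and the stack_pop name are illustrative, not taken from the question.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 4   /* arbitrary capacity for this sketch */

struct stack {
    int    items[STACK_CAPACITY];
    size_t count;          /* number of elements currently stored */
};

/* O(1) push: returns false instead of writing past the end. */
bool stack_push(struct stack *s, int value)
{
    if (s->count == STACK_CAPACITY)
        return false;
    s->items[s->count++] = value;
    return true;
}

/* O(1) pop: returns false if the stack is empty. */
bool stack_pop(struct stack *s, int *value)
{
    if (s->count == 0)
        return false;
    *value = s->items[--s->count];
    return true;
}
```

Since the counter lives inside the structure, callers never pass the size separately, and no array slot is spent on a sentinel.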
What sentinel value would you use? Of course, you have to be sure that the caller of this function will definitely never use this value. It could be very confusing to debug if your function is stopping prematurely at a sentinel value that seems like a reasonable input.
For things like strings, it is easy to use a NUL character ('\0') as the terminator because a string should never contain a zero byte. However, if you start using sentinels in some places and not in others, it can get very confusing for developers who are trying to use your code.
I would say to use a size argument unless there is a VERY clear and obvious choice of sentinel, and probably not even then.

Char array size when using certain library functions

When using some library functions (e.g. strftime(), strcpy(), MultiByteToWideChar()) that deal with character arrays (instead of std::string's) one has 2 options:
use a fixed size array (e.g. char buffer[256];) which is obviously bad because of the string length limit
use new to allocate required size which is also bad when one wants to create a utility function like this:
char * fun(void)
{
char * array = new char[exact_required_size];
some_function(array);
return array;
}
because the user of such function has to delete the array.
And the 2nd option isn't even always possible if one can't know the exact array size/length before using the problematic function (when one can't predict how long a string the function will return).
The perfect way would be to use std::string since it has variable length and its destructor takes care of deallocating memory but many library functions just don't support std::string (whether they should is another question).
Ok, so what's the problem? Well - how should I use these functions? Use a fixed size array or use new and make the user of my function worry about deallocating memory? Or maybe there actually is a smooth solution I didn't think of?
You can use std::string's data() method to get a pointer to a character array with the same sequence of characters currently contained in the string object. The character pointer returned points to a constant, non-modifiable character array located somewhere in internal memory. You don't need to worry about deallocating the memory referenced by this pointer as the string object's destructor will do so automatically.
But as to your original question: depends on how you want the function to work. If you're modifying a character array that you create within the function, it sounds like you'll need to allocate memory on the heap and return a pointer to it. The user would have to deallocate the memory themselves - there are plenty of standard library functions that work this way.
Alternatively, you could force the user to pass in a character pointer as a parameter, which would ensure they've already created the array and know that they will need to deallocate the memory themselves. That method is used even more often and is probably preferable.
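As a rough sketch of the caller-supplied-buffer approach, here is a small wrapper around strftime(), written in plain C so it can be called from C++ as well. The name format_date and the fixed YYYY-MM-DD format are assumptions for the example.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a date as YYYY-MM-DD into a
   caller-supplied buffer. Mirrors strftime(): returns the number of
   characters written (excluding the terminator), or 0 if the buffer
   is too small. */
size_t format_date(char *buffer, size_t size, const struct tm *when)
{
    return strftime(buffer, size, "%Y-%m-%d", when);
}
```

The caller decides where the buffer lives (stack, heap, or inside a larger object), and a return value of 0 tells them to retry with more room.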

How to check if a pointer variable is junk during runtime?

I use valgrind to validate my code and it reports "Conditional jump or move depends on uninitialised value(s)" in one of my functions, which takes an array of pointers as argument.
Now, how do I check if an array contains junk values (might be using conditional break point) during run-time? Say, I don't access the pointer and hence the program doesn't break.
What is the condition to be checked for to identify a junk pointer?
While the other answers are correct, you can also get valgrind to help you identify which entry or entries in the array exactly are causing the problem.
What you need to do is to add code to your program which loops over the array (you may already have such a loop of course) and then include valgrind/memcheck.h and add something like this to the loop:
if (VALGRIND_CHECK_VALUE_IS_DEFINED(entry)) {
printf("index %d undefined\n", index);
}
where entry is the actual value from the array and index is the index of that value in the array.
You can't differentiate between a valid pointer and a junk (uninitialized) pointer; they are all just numbers.
The fact that you are dealing with a "junk" pointer at some point in your code indicates that there's a problem before that point.
You don't test for junk, you put non-junk values in the array at some point between the time you create the array, and the first time you consider using the values. Usually you do it when the array is created:
const char* strings[] = {0, "junk", "here"};
int some_values[10] = { 0 };
Valgrind uses its own tricks to identify what it thinks is junk, but those tricks are outside the scope of the standard, and regular C code can't use them (or anyway shouldn't try). Even if you could somehow hook into what valgrind does, you'd end up with code that doesn't work on all implementations, or that only works when run under valgrind.
You need to systematically initialize all your pointers to NULL.
When you deallocate memory reset your pointer to NULL as well.
This can be done using "constructor/destructor" functions wrapping malloc/free for instance.
Only then can you test for a NULL-valued pointer to see if something went wrong.
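Such wrapper functions might look like the following sketch. The names int_array_create and int_array_destroy are hypothetical, and an array of int is assumed just to keep the example small.

```c
#include <stdlib.h>

/* Hypothetical "constructor": allocates a zeroed array of count ints.
   Returns NULL if the allocation fails. */
int *int_array_create(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Hypothetical "destructor": frees the array and resets the caller's
   pointer to NULL so that later NULL checks detect it is gone.
   Safe to call more than once on the same pointer. */
void int_array_destroy(int **array)
{
    if (array != NULL) {
        free(*array);      /* free(NULL) is a no-op */
        *array = NULL;
    }
}
```

The pointer is declared as int *a = NULL;, created with a = int_array_create(n);, and released with int_array_destroy(&a);, so at any moment it is either NULL or valid.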
