How to split a generic void array into parts in C

As a beginner in C, I am struggling with an obscure problem, and because I couldn't find a solution to it, I want to ask the following:
Currently I am trying to understand void pointers and their arithmetic operations. I attempted to write a generic function, which accepts a void pointer to an array, the length of the array and size of one element and splits the given array into two different parts (list1 and list2):
void split(void *array, int arrlen, int objsize)
{
// divide the arrlen and save the values
int len_list1 = arrlen / 2;
int len_list2 = arrlen - len_list1;
// Allocate memory for two temporary lists
void *list1 = (void*)malloc(objsize * len_list1);
void *list2 = (void*)malloc(objsize * len_list2);
if (list1 == NULL || list2 == NULL)
{
printf("[ERROR]!\n");
exit(-1);
}
// list1 gets initialized properly with int and char elements
memmove(list1, array, len_list1*objsize);
printf("list1:");
print_arr(list1, len_list1);
// memmove((char*)list2, (char*)array+list_1_length, list_2_length*objsize); (*)
memmove(list2, (int*)array+len_list1, len_list2*objsize);
printf("list2:");
print_arr(list2, len_list2);
}
My problem is the following:
If I give this function an int array it works fine, but if I call split() with a char array as an argument, I have to switch to the variant marked (*), that is...
memmove((char*)list2, (char*)array+list_1_length, list_2_length*objsize);
//memmove((int*)list2, (char*)array+list_1_length, list_2_length*objsize);
...in order to get the same results. A solution certainly could be to write an if-else condition and test the objsize:
if (objsize == sizeof(int))
// move memory as in the 1st code snippet
else
// move memory with the (int*) cast
But with this solution I would also have to check other data types, so it would be very kind of you to give me a hint.
Thanks!
-matzui

memmove(list2, (int*)array+len_list1, len_list2*objsize);
Here you typecast array to an int *, and add len_list1 to it. But adding something to a pointer, means it will be multiplied with the size of one element of the datatype of that pointer. So if an int is 4 bytes, and you add 5 to an int * variable, it will move 20 bytes.
Because you know exactly how many bytes you want to move the pointer, you can cast it to char * (char = 1 byte), and add the number of bytes to it.
So instead of (int*)array+len_list1, you can use (char*)array+(len_list1*objsize)
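To make the byte-offset idea concrete, here is a minimal sketch of a type-agnostic split. The name split_copy and its out-parameters are made up for illustration (they are not from the question), and it assumes at least two elements and a non-zero element size:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the two halves of `array` into freshly
 * allocated buffers. Returns 0 on success, -1 on bad input or allocation
 * failure. The caller owns *first and *second and must free() them. */
int split_copy(const void *array, size_t arrlen, size_t objsize,
               void **first, size_t *first_len,
               void **second, size_t *second_len)
{
    if (array == NULL || objsize == 0 || arrlen < 2)
        return -1;

    const char *bytes = array;      /* char * so that offsets count bytes */
    size_t len1 = arrlen / 2;
    size_t len2 = arrlen - len1;

    void *a = malloc(len1 * objsize);
    void *b = malloc(len2 * objsize);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    memcpy(a, bytes, len1 * objsize);
    /* The second half starts len1 elements, i.e. len1 * objsize bytes, in. */
    memcpy(b, bytes + len1 * objsize, len2 * objsize);

    *first = a;
    *first_len = len1;
    *second = b;
    *second_len = len2;
    return 0;
}
```

Because the offset is always computed in bytes, the same function should work for int, char, or struct elements without testing objsize.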

A void pointer is a generic pointer that implies no particular data type, so it cannot be dereferenced directly, and standard C does not define arithmetic on it (GCC allows it only as an extension). To do what you're trying to do, declare an appropriately typed pointer in your function, such as a char *, and then set its value equal to that of the parameter void pointer.

Related

What is the correct way to temporarily cast void* for arithmetic?

I am a C novice but have been a programmer for some years, so I am trying to learn C by following along with Stanford's course from 2008 and doing Assignment 3 on Vectors in C.
It's just a generic array basically, so the data is held inside a struct as a void *. The compiler flag -Wpointer-arith is turned on so I can't do arithmetic (and I understand the reasons why).
The struct around the data must not know what type the data is, so that it is generic for the caller.
To simplify things I am trying out the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
void *data;
int aindex;
int elemSize;
} trial;
void init(trial *vector, int elemSize)
{
vector->aindex = 0;
vector->elemSize = elemSize;
vector->data = malloc(10 * elemSize);
}
void add(trial *vector, const void *elemAddr)
{
if (vector->aindex != 0)
vector->data = (char *)vector->data + vector->elemSize;
vector->aindex++;
memcpy(vector->data, elemAddr, sizeof(int));
}
int main()
{
trial vector;
init(&vector, sizeof(int));
for (int i = 0; i < 8; i++)
{add(&vector, &i);}
vector.data = (char *)vector.data - ( 5 * vector.elemSize);
printf("%d\n", *(int *)vector.data);
printf("%s\n", "done..");
free(vector.data);
return 0;
}
However I get an error at free with free(): invalid pointer. So I ran valgrind on it and received the following:
==21006== Address 0x51f0048 is 8 bytes inside a block of size 40 alloc'd
==21006== at 0x4C2CEDF: malloc (vg_replace_malloc.c:299)
==21006== by 0x1087AA: init (pointer_arithm.c:13)
==21006== by 0x108826: main (pointer_arithm.c:29)
At this point my guess is I am either not doing the char* correctly, or maybe using memcpy incorrectly
This happens because you add eight elements to the vector, which advances vector->data seven times, and then "roll back" the pointer by only five steps before attempting a free, so it still points 8 bytes past the start of the block. You can easily fix that by using vector->aindex to decide by how much the pointer is to be rolled back.
The root cause of the problem, however, is that you modify vector->data. You should avoid modifying it in the first place, relying on a temporary pointer inside of your add function instead:
void add(trial *vector, const void *elemAddr, size_t sz) {
char *base = vector->data;
memcpy(base + vector->aindex*sz, elemAddr, sz);
vector->aindex++;
}
Note the use of sz: you need to pass sizeof(int) to it (or use vector->elemSize, which already stores that size).
Another problem in your code is when you print by casting vector.data to int*. This would probably work, but a better approach would be to write a similar read function to extract the data.
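A read function along those lines might look like the sketch below. The struct vec and the name vec_get are assumptions for illustration, not part of the original code; the point is only that the offset is computed from an untouched base pointer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic vector; `data` always points at the start of the
 * buffer and is never advanced. */
typedef struct {
    void  *data;      /* start of the element buffer */
    size_t count;     /* number of elements stored */
    size_t elemSize;  /* size of one element in bytes */
} vec;

/* Copies element `index` into `out`.
 * Returns 0 on success, -1 if the index is out of range. */
int vec_get(const vec *v, size_t index, void *out)
{
    if (index >= v->count)
        return -1;
    const char *base = v->data;   /* temporary pointer for byte arithmetic */
    memcpy(out, base + index * v->elemSize, v->elemSize);
    return 0;
}
```

The caller passes the address of a variable of the element type, so no cast of the vector's data pointer is needed at the call site.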
If you don't know the array's data type beforehand, you must assume a certain amount of memory when you first initialize it, for example, 32 bytes or 100 bytes. Then if you run out of memory, you can expand it with realloc, which copies your previous data to the new block if the block has to move. Common C++ std::vector implementations grow by a factor of 1.5 or 2 on each reallocation.
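As a rough sketch of that growth strategy, assuming a doubling factor and hypothetical names (dynvec, dynvec_init, dynvec_push):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable vector that tracks its capacity. */
typedef struct {
    void  *data;      /* start of the buffer */
    size_t count;     /* elements in use */
    size_t capacity;  /* elements the buffer can hold */
    size_t elemSize;  /* bytes per element */
} dynvec;

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
int dynvec_init(dynvec *v, size_t elemSize, size_t initialCapacity)
{
    if (elemSize == 0 || initialCapacity == 0)
        return -1;
    v->data = malloc(initialCapacity * elemSize);
    if (v->data == NULL)
        return -1;
    v->count = 0;
    v->capacity = initialCapacity;
    v->elemSize = elemSize;
    return 0;
}

/* Appends one element, doubling the buffer when it is full. */
int dynvec_push(dynvec *v, const void *elem)
{
    if (v->count == v->capacity) {
        size_t newCap = v->capacity * 2;
        void *grown = realloc(v->data, newCap * v->elemSize);
        if (grown == NULL)
            return -1;            /* the old buffer is still valid */
        v->data = grown;          /* realloc already moved the old contents */
        v->capacity = newCap;
    }
    memcpy((char *)v->data + v->count * v->elemSize, elem, v->elemSize);
    v->count++;
    return 0;
}
```

Assigning the result of realloc to a temporary first avoids losing the original block if the reallocation fails.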
Next up is your free. There's a big thing you must know here. What if the user were to store an object that owns memory of its own, for example a char * that they allocated previously? If you simply free the data member of your vector, that won't be enough. You need to ask for a function pointer, along with your input to add, in case the data type is something that requires special attention.
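One way this could look, purely as a sketch: the vector stores an optional per-element cleanup callback and calls it on every element before releasing its own buffer. The names owning_vec, elem_free_fn, and free_string_elem are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical cleanup callback: receives the address of one element. */
typedef void (*elem_free_fn)(void *elemAddr);

typedef struct {
    void        *data;      /* start of the element buffer */
    size_t       count;     /* number of elements stored */
    size_t       elemSize;  /* bytes per element */
    elem_free_fn freeFn;    /* may be NULL for plain data such as int */
} owning_vec;

/* Runs the callback on each element, then frees the buffer itself. */
void owning_vec_dispose(owning_vec *v)
{
    if (v->freeFn != NULL) {
        char *base = v->data;
        for (size_t i = 0; i < v->count; i++)
            v->freeFn(base + i * v->elemSize);
    }
    free(v->data);
    v->data = NULL;
    v->count = 0;
}

/* Example callback for a vector whose elements are malloc'd strings:
 * each element is a char *, so elemAddr is really a char **. */
void free_string_elem(void *elemAddr)
{
    free(*(char **)elemAddr);
}
```

For a vector of int the callback would simply be NULL, and only the buffer is freed.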
Lastly, you are making a big mistake at this line here:
if (vector->aindex != 0)
vector->data = (char *)vector->data + vector->elemSize;
You are modifying your pointer address! Your initial address is lost here. You must never do this. Use a temporary char* that starts at your initial data address and manipulate that instead.
Your code is somewhat confusing, there's probably a mis-understanding or two hiding in there.
A few observations:
You can't change a pointer returned by malloc() and then pass the new value to free(). Every value passed to free() must be the exact same value returned by one of the allocation functions.
As you've guessed, the copying is best done by memcpy() and you have to cast to char * for the arithmetic.
The function to append a value could be:
void add(trial *vector, const void *element)
{
memcpy((char *) vector->data + vector->aindex * vector->elemSize, element, vector->elemSize);
++vector->aindex;
}
Of course this doesn't handle overflowing the vector, since the length is not stored (I didn't want to assume it was hard-coded at 10).
Changing the data value in vector for each object is very odd, and makes things more confusing. Just add the required offset when you need to access the element; that's super cheap and very straightforward.

Why does this example of pointer dereferencing work?

I have some code, and it works, and I don't understand why. Here:
// This structure keeps the array and its bookkeeping details together.
typedef struct {
void** headOfArray;
size_t numberUsed;
size_t currentSize;
} GrowingArray;
// This function malloc()'s an empty array and returns a struct containing it and its bookkeeping details.
GrowingArray createGrowingArray(int startingSize) { ... }
// Self-explanatory
void appendToGrowingArray(GrowingArray* growingArray, void* itemToAppend) { ... }
// This function realloc()'s an array, causing it to double in size.
void growGrowingArray(GrowingArray* arrayToGrow) { ... }
int main(int argc, char* argv[]) {
GrowingArray testArray = createGrowingArray(5);
int* testInteger = (int*) malloc(1);
*testInteger = 4;
int* anotherInteger = (int*) malloc(1);
*anotherInteger = 6;
appendToGrowingArray(&testArray, &testInteger);
appendToGrowingArray(&testArray, &anotherInteger);
printf("%llx\n", **(int**)(testArray.headOfArray[1]));
return 0;
}
So far, everything works exactly as I intend. The part that confuses me is this line:
printf("%llx\n", **(int**)(testArray.headOfArray[1]));
By my understanding, the second argument to printf() doesn't make sense. I got to it mostly by trial and error. It reads to me as though I'm saying that the second element of the array of pointers in the struct is a pointer to a pointer to an int. It's not. It's just a pointer to an int.
What does make sense to me is this:
*(int*)(testArray.headOfArray[1])
It's my understanding that the second element of the array of pointers contained in the struct will be fetched by the last parenthetical, and that I then cast it as a pointer to an integer and then dereference that pointer.
What's wrong with my understanding? How is the compiler interpreting this?
My best guess is that your appendToGrowingArray looks something like this:
void appendToGrowingArray(GrowingArray* growingArray, void* itemToAppend) {
growingArray->headOfArray[growingArray->numberUsed++] = itemToAppend;
}
though obviously with additional logic to actually grow the array. However the point is that the itemToAppend is stored in the array pointed to by headOfArray.
But, if you look at your appendToGrowingArray calls, you are passing the addresses of testInteger and anotherInteger -- these are already pointers to integers, so you are storing pointers to pointers to integers in your headOfArray when you really intend to store pointers to integers.
So, when you consider testArray.headOfArray[1], its value is the address on main's stack of the variable anotherInteger. When you dereference it the first time, you get the address of the buffer returned by the second malloc call, which you stored in anotherInteger. So, it's only when you dereference it a second time that you get to the contents of that buffer, namely the number 6.
You probably want to write:
appendToGrowingArray(&testArray, testInteger);
appendToGrowingArray(&testArray, anotherInteger);
instead.
(As noted in a comment, you also should fix your mallocs; you need more than 1 byte to store an integer these days!)
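To see the two cases side by side, here is a small sketch that uses a plain array of void * as a stand-in for headOfArray (the function names are invented for illustration). Storing the pointer itself needs one dereference; storing the address of the pointer variable needs two:

```c
#include <stdlib.h>

/* Stores the int * itself in the slot: one dereference reaches the int. */
int read_direct(void)
{
    int *value = malloc(sizeof *value);   /* room for a whole int */
    if (value == NULL)
        return -1;
    *value = 6;

    void *slots[1];
    slots[0] = value;                     /* like append(&arr, anotherInteger) */
    int result = *(int *)slots[0];

    free(value);
    return result;
}

/* Stores the address of the pointer variable: two dereferences are needed. */
int read_indirect(void)
{
    int *value = malloc(sizeof *value);
    if (value == NULL)
        return -1;
    *value = 6;

    void *slots[1];
    slots[0] = &value;                    /* like append(&arr, &anotherInteger) */
    int result = **(int **)slots[0];

    free(value);
    return result;
}
```

Note that in the second form the slot points at a local variable, so it would dangle as soon as that variable goes out of scope.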

How to return a char** in C

I've been trying for a while now and I can not seem to get this working:
char** fetch (char *lat, char*lon){
char emps[10][50];
//char** array = emps;
int cnt = -1;
while (row = mysql_fetch_row(result))
{
char emp_det[3][20];
char temp_emp[50] = "";
for (int i = 0; i < 4; i++){
strcpy(emp_det[i], row[i]);
}
if ( (strncmp(emp_det[1], lat, 7) == 0) && (strncmp(emp_det[2], lon, 8) == 0) ) {
cnt++;
for (int i = 0; i < 4; i++){
strcat(temp_emp, emp_det[i]);
if(i < 3) {
strcat(temp_emp, " ");
}
}
strcpy(emps[cnt], temp_emp);
}
}
mysql_free_result(result);
mysql_close(connection);
return array;
}
Yes, I know array = emps is commented out, but without it commented, it tells me that the pointer types are incompatible. This, in case I forgot to mention, is in a char** type function and I want it to return emps[10][50] or the next best thing. How can I go about doing that? Thank you!
An array expression of type T [N][M] does not decay to T ** - it decays to type T (*)[M] (pointer to M-element array).
Secondly, you're trying to return the address of an array that's local to the function; once the function exits, the emps array no longer exists, and any pointer to it becomes invalid.
You'd probably be better off passing the target array as a parameter to the function and have the function write to it, rather than creating a new array within the function and returning it. You could dynamically allocate the array, but then you're doing a memory management dance, and the best way to avoid problems with memory management is to avoid doing memory management.
So your function definition would look like
void fetch( char *lat, char *lon, char emps[][50], size_t rows ) { ... }
and your function call would look like
char my_emps[10][50];
...
fetch( lat, lon, my_emps, 10 );
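As a self-contained illustration of the caller-provided-array pattern, here is a sketch with the database part replaced by a fixed list of made-up names. fill_rows and its sample data are hypothetical and only stand in for whatever fetch would really produce:

```c
#include <stdio.h>
#include <stddef.h>

/* Writes up to `rows` strings into the caller's array and returns how many
 * rows were filled. The row width of 50 is part of the parameter type. */
size_t fill_rows(char emps[][50], size_t rows)
{
    /* Placeholder data standing in for database results. */
    static const char *const names[] = { "alice", "bob", "carol" };
    size_t available = sizeof names / sizeof names[0];
    size_t n = rows < available ? rows : available;

    for (size_t i = 0; i < n; i++) {
        /* emps[i] has type char[50], so sizeof gives the row width;
         * snprintf truncates instead of overflowing. */
        snprintf(emps[i], sizeof emps[i], "%s", names[i]);
    }
    return n;
}
```

The caller declares char my_emps[10][50]; and calls fill_rows(my_emps, 10); the array lives in the caller's frame, so nothing dangles when the function returns.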
What you're attempting won't work, even if you attempt to cast, because you'll be returning the address of a local variable. When the function returns, that variable goes out of scope and the memory it was using is no longer valid. Attempting to dereference that address will result in undefined behavior.
What you need is to use dynamic memory allocation to create the data structure you want to return:
char **emps;
emps = malloc(10 * sizeof(char *));
for (int i=0; i<10; i++) {
emps[i] = malloc(50);
}
....
return emps;
The calling function will need to free the memory created by this function. It also needs to know how many allocations were done so it knows how many times to call free.
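A matching pair of helpers could look like the sketch below. The names alloc_rows and free_rows are made up; the idea is that the caller passes the same row count to the free routine that it used for allocation:

```c
#include <stdlib.h>

/* Frees `count` rows and then the pointer array itself. */
void free_rows(char **rows, size_t count)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(rows[i]);
    free(rows);
}

/* Allocates `count` zero-filled rows of `width` bytes each (width > 0).
 * Returns NULL on failure, after releasing anything already allocated. */
char **alloc_rows(size_t count, size_t width)
{
    char **rows = malloc(count * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        rows[i] = calloc(width, 1);
        if (rows[i] == NULL) {
            free_rows(rows, i);   /* only the first i rows exist */
            return NULL;
        }
    }
    return rows;
}
```

With these, the function from the question could return alloc_rows(10, 50) after filling it, and the caller would eventually call free_rows(emps, 10).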
If you found a way to cast char emps[10][50]; into a char * or char **, you wouldn't be able to properly map the data (dimensions, etc.). Multi-dimensional char arrays are not char **. They're just contiguous memory with index calculation, so a char * is a better fit, by the way.
But the biggest problem would be that emps would go out of scope, and its automatic storage would be reused for some other variable, destroying the data.
There's a way to do it, though, if your dimensions are really fixed:
You can create a function that takes a char[10][50] as an in/out parameter (you cannot return an array, because the language does not allow it; you could return a struct containing an array, but that wouldn't be efficient)
Example:
void myfunc(char emp[10][50])
{
emp[4][5] = 'a'; // update emp in the function
}
int main()
{
char x[10][50];
myfunc(x);
// ...
}
The main program is responsible for the memory of x, which is passed as modifiable to the myfunc routine: it is safe and fast (no memory copy)
Good practice: define a type like this typedef char matrix10_50[10][50]; it makes declarations more logical.
The main drawback here is that dimensions are fixed. If you want to use myfunc for another dimension set, you have to copy/paste it or use macros to define both (like a poor man's template).
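EDIT: a fine comment points out that C99 supports variable-length arrays (optional since C11), so some compilers accept array parameters whose dimensions are given at run time.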
So you could pass dimensions alongside your unconstrained array:
void myfunc(int rows, int cols, char emp[rows][cols])
Tested, works with gcc 4.9 (probably on earlier versions too) only on C code, not C++ and not in .cpp files containing plain C (but still beats cumbersome malloc/free calls)
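A complete function using that signature might look like this sketch (C99 or later, compiled as C, and assuming the compiler supports variably modified parameter types; the name fill_matrix is invented). The dimensions must come before the array in the parameter list so they are in scope when the array type is declared:

```c
#include <stddef.h>

/* Fills every cell of a rows-by-cols char matrix with `value`.
 * The array parameter's dimensions are taken from the preceding arguments. */
void fill_matrix(size_t rows, size_t cols, char m[rows][cols], char value)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i][j] = value;
}
```

The same function can then be called with char a[2][3] as fill_matrix(2, 3, a, 'a') and with char b[4][5] as fill_matrix(4, 5, b, 'b'), without copy/paste or macros.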
In order to understand why you can't do that, you need to understand how matrices work in C.
A matrix, let's say your char emps[10][50], is a contiguous block of storage capable of storing 10*50=500 chars (imagine an array of 500 elements). When you access emps[i][j], it accesses the element at index 50*i + j in that "array" (pick up a piece of paper and a pen to understand why). The problem is that the 50 in that formula is the number of columns in the matrix, which is known at compile time from the data type itself. When you have a char** the compiler has no way of knowing how to access a random element in the matrix.
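The index formula can be checked with a short sketch. The helper names are made up, and the flat view goes through a char pointer to the whole matrix so that it simply walks the underlying bytes:

```c
#include <stddef.h>

enum { ROWS = 10, COLS = 50 };

/* Normal two-dimensional access: the compiler uses COLS from the type. */
char read_2d(char m[ROWS][COLS], size_t i, size_t j)
{
    return m[i][j];
}

/* The same access done by hand on the underlying bytes. */
char read_flat(char m[ROWS][COLS], size_t i, size_t j)
{
    const char *flat = (const char *)m;   /* view the matrix as 500 chars */
    return flat[COLS * i + j];
}
```

Both functions return the same element for any valid i and j, which is exactly what a char ** cannot provide, since it carries no COLS in its type.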
A way of building the matrix such that it is a char** is to create an array of pointers to char and then allocate each of those pointers:
char **emps = malloc(10 * sizeof(char*)); // create an array of 10 pointers to char
for (int i = 0; i < 10; i++)
emps[i] = malloc(50 * sizeof(char)); // create 10 arrays of 50 chars each
The point is, you can't convert a matrix to a double pointer in a similar way you convert an array to a pointer.
Another problem: Returning a 2D matrix as 'char**' is only meaningful if the matrix is implemented using an array of pointers, each pointer pointing to an array of characters. As explained previously, a 2D matrix in C is just a flat array of characters. The most you can return is a pointer to the [0][0] entry, a 'char*'. There's a mismatch in the number of indirections.

Dynamic arrays of arrays

If I have:
typedef char pos[2]; /*btw I now know no one should do this*/
void someFunction(void) {
pos *s = malloc(sizeof(pos) * 2);
}
In cases like this, how does s work? What is it? Arrays are like pointers except when you use sizeof on them you will get the "correct" size. So in this case, does the following mean that s is going to point to a block of sizeof(char)*4 bytes? But the type of s is a pointer to a pointer, which means that you can't use s as a one-dimensional array (or a pointer which points to chars) because you "still need to go through one level/layer of indirection/pointer". Or am I wrong?
How can I use s? As a 2 dimensional array or as a one dimensional one?
(If you are interested: I need this bc I want to return two pos from a function. Is there a better way? (despite fixing this and using a struct for storing position data instead of a 2-sized array))
This typedef construct is just equivalent to:
#include <stdlib.h>
void someFunction(void) {
char (*pos)[2];
pos = malloc(sizeof(*pos) * 2);
pos[0][0] = 1;
}
int main(void) {
someFunction();
return 0;
}
That means that pos is a pointer to a two-element array of char. You can use it just like a two-dimensional array with a fixed column count of two. The number of rows is controlled by the malloc() call; in your case it happens to be two as well.
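For the original goal of returning two positions from a function, a sketch using the typedef from the question could look like this (make_two_positions is a hypothetical name, and the stored values are arbitrary):

```c
#include <stdlib.h>

typedef char pos[2];   /* the typedef from the question */

/* Returns a heap block holding two pos values, or NULL on failure.
 * The caller must free() the result. */
pos *make_two_positions(void)
{
    pos *s = malloc(2 * sizeof *s);   /* sizeof *s is sizeof(pos), i.e. 2 */
    if (s == NULL)
        return NULL;
    s[0][0] = 1;   /* first position */
    s[0][1] = 2;
    s[1][0] = 3;   /* second position */
    s[1][1] = 4;
    return s;
}
```

Here s[k] selects the k-th pos and s[k][n] selects a char inside it, so there is only one pointer involved, not a pointer to a pointer.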

C Pointer help: Array/pointer equivalence

In this toy code example:
int MAX = 5;
void fillArray(int** someArray, int* blah) {
int i;
for (i=0; i<MAX; i++)
(*someArray)[i] = blah[i]; // segfault happens here
}
int main() {
int someArray[MAX];
int blah[] = {1, 2, 3, 4, 5};
fillArray(&someArray, blah);
return 0;
}
... I want to fill the array someArray, and have the changes persist outside the function.
This is part of a very large homework assignment, and this question addresses the issue without allowing me to copy the solution. I am given a function signature that accepts an int** as a parameter, and I'm supposed to code the logic to fill that array. I was under the impression that dereferencing &someArray within the fillArray() function would give me the required array (a pointer to the first element), and that using bracketed array element access on that array would give me the necessary position that needs to be assigned. However, I cannot figure out why I'm getting a segfault.
Many thanks!
I want to fill the array someArray, and have the changes persist outside the function.
Just pass the array to the function as it decays to a pointer to the first element:
void fillArray(int* someArray, int* blah) {
int i;
for (i=0; i<MAX; i++)
someArray[i] = blah[i];
}
and invoked:
fillArray(someArray, blah);
The changes to the elements will be visible outside of the function.
If the actual code was to allocate an array within fillArray() then an int** would be required:
void fillArray(int** someArray, int* blah) {
int i;
*someArray = malloc(sizeof(int) * MAX);
if (*someArray)
{
for (i=0; i<MAX; i++) /* or memcpy() instead of loop */
(*someArray)[i] = blah[i];
}
}
and invoked:
int* someArray = NULL;
fillArray(&someArray, blah);
free(someArray);
When you create an array, such as int myArray[10][20], a guaranteed contiguous block of memory is allocated from the stack, and normal array arithmetic is used to find any given element in the array.
If you want to allocate that 2D "array" from the heap, you use malloc() and get some memory back. That memory is "dumb". It's just a chunk of memory, which should be thought of as a vector. None of the navigational logic attendant with an array comes with that, which means you must find another way to navigate your desired 2D array.
Since your call to malloc() returns a pointer, the first variable you need is a pointer to hold the vector of int * values that will in turn point to the actual integer data, i.e.:
int *pArray;
...but this still isn't the storage you want to store integers. What you have is an array of pointers, currently pointing to nothing. To get storage for your data, you need to call malloc() 10 times, with each malloc() allocating space for 20 integers on each call, whose return pointers will be stored in the *pArray vector of pointers. This means that
int *pArray
needs to be changed to
int **pArray
to correctly indicate that it is a pointer to the base of a vector of pointers.
The first subscript, pArray[i], lands you somewhere in an array of int pointers, and the second subscript, pArray[i][j], lands you somewhere inside an array of ints, pointed to by the int pointer in pArray[i].
IE: you have a cloud of integer vectors scattered all over the heap, pointed to by an array of pointers keeping track of their locations. Not at all similar to Array[10][20] allocated statically from the stack, which is all contiguous storage, and doesn't have a single pointer in it anywhere.
As others have alluded to, the pointer-based heap method doesn't seem to have a lot going for it at first glance, but turns out to be massively superior.
First and foremost, you can free() or realloc() to resize heap memory whenever you want, and it doesn't go out of scope when the function returns. More importantly, experienced C coders arrange their functions to operate on vectors where possible, where one level of indirection is removed in the function call. Finally, for large arrays, relative to available memory, and especially on large, shared machines, large chunks of contiguous memory are often not available, and are not friendly to other programs that need memory to operate. Code with large static arrays allocated on the stack is a maintenance nightmare.
Here you can see that the table is just a shell collecting vector pointers returned from vector operations, where everything interesting happens at the vector level, or element level. In this particular case, the vector code in VecRand() is calloc()ing its own storage and returning calloc()'s return pointer to TblRand(), but TblRand has the flexibility to allocate VecRand()'s storage as well, just by replacing the NULL argument to VecRand() with a call to calloc():
/*-------------------------------------------------------------------------------------*/
dbl **TblRand(dbl **TblPtr, int rows, int cols)
{
int i=0;
if ( NULL == TblPtr ){
if (NULL == (TblPtr=(dbl **)calloc(rows, sizeof(dbl*)))){
printf("\nCalloc for pointer array in TblRand failed");
return NULL; /* nothing to fill if the pointer array could not be allocated */
}
}
for (; i!=rows; i++){
TblPtr[i] = VecRand(NULL, cols);
}
return TblPtr;
}
/*-------------------------------------------------------------------------------------*/
dbl *VecRand(dbl *VecPtr, int cols)
{
if ( NULL == VecPtr ){
if (NULL == (VecPtr=(dbl *)calloc(cols, sizeof(dbl)))){
printf("\nCalloc for random number vector in VecRand failed");
return NULL; /* do not let GenRand write through a null pointer */
}
}
Randx = GenRand(VecPtr, cols, Randx);
return VecPtr;
}
/*--------------------------------------------------------------------------------------*/
static long GenRand(dbl *VecPtr, int cols, long RandSeed)
{
dbl r=0, Denom=2147483647.0;
while ( cols-- )
{
RandSeed= (314159269 * RandSeed) & 0x7FFFFFFF;
r = sqrt(-2.0 * log((dbl)(RandSeed/Denom)));
RandSeed= (314159269 * RandSeed) & 0x7FFFFFFF;
*VecPtr = r * sin(TWOPI * (dbl)(RandSeed/Denom));
VecPtr++;
}
return RandSeed;
}
There is no "array/pointer" equivalence, and arrays and pointers are very different. Never confuse them. someArray is an array. &someArray is a pointer to an array, and has type int (*)[MAX]. The function takes a pointer to a pointer, i.e. int **, which needs to point to a pointer variable somewhere in memory. There is no pointer variable anywhere in your code. What could it possibly point to?
An array value can implicitly decay into a pointer rvalue for its first element in certain expressions. Something that requires an lvalue, like taking the address (&), obviously does not work this way. Here are some differences between array types and pointer types:
Array types cannot be assigned or passed. Pointer types can
Pointer to array and pointer to pointer are different types
Array of arrays and array of pointers are different types
The sizeof of an array type is the length times the size of the component type; the sizeof of a pointer is just the size of a pointer
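The sizeof difference in the last point is easy to observe with a small sketch (the function names are made up). Inside the first function the name refers to a real array; in the second, the parameter is only a pointer, whatever the caller passed:

```c
#include <stddef.h>

/* `values` is an array here, so sizeof covers all ten ints. */
size_t size_seen_as_array(void)
{
    int values[10];
    return sizeof values;
}

/* `values` is a pointer here, so sizeof is just the size of an int *. */
size_t size_seen_as_pointer(int *values)
{
    return sizeof values;
}
```

Passing an int[10] to the second function still yields sizeof(int *), because the array decays to a pointer to its first element at the call.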
