How is memory allocated to multi-nested structs in C?

A couple of days ago, I asked this question. I duplicated 90% of the code given in the answer to my previous question. However, when I used Valgrind to do memcheck, it told me that there were memory leaks. But I don't think it was that 10% difference that caused the memory leaks. In addition to the memory leak issue, I have a couple of other questions.
A brief summary of my previous post:
I have a multi-nested struct. I need to correctly allocate memory to it and free the memory later on. The structure of the entire struct should look like this:
College Sys
| | | ... |
ColleA ColleB ColleC ... ColleX
| | | | | | | | | | | | ... | | | |
sA sB sC sD sA sB sC sD sA sB sC sD ... sA sB sC sD
| | | | | | | | | | | | ... | | | |
fam fam ...
// Colle is short for college
// s is short for stu (which is short for student)
There could be an arbitrary number of colleges and students, which is controlled by #define MAX_NUM _num_.
As per the previous answer, I should allocate memory in the order of "outermost to innermost" and free the memory "innermost to outermost". I basically understand the logic behind the pattern. The following questions are extensions to it.
1) Does
CollegeSys -> Colle = malloc(sizeof(*(CollegeSys -> Colle)) * MAX_NUM);
CollegeSys -> Colle -> stu = malloc(sizeof(*(CollegeSys -> Colle -> stu)) * MAX_NUM);
CollegeSys -> Colle -> stu -> fam = malloc(sizeof(*(CollegeSys -> Colle -> stu -> fam)));
mean " there are MAX_NUM colleges under the college system, each of which has MAX_NUM students — each of which has one family"?
1.a)If yes, do I still need for loops to initialize every single value contained in this huge struct?
For example, the possibly correct way:
for (int i = 0; i < MAX_NUM; i++) {
strcpy(CollegeSys -> Colle[i].name, "collestr");
for (int n = 0; n < MAX_NUM; n++) {
strcpy(CollegeSys -> Colle[i].stu[n].name, "stustr");
...
}
}
the possibly incorrect way:
strcpy(CollegeSys -> Colle -> name, "collestr");
strcpy(CollegeSys -> Colle -> stu -> name, "stustr");
I tried the "possibly incorrect way". There was no syntax error, but it would only initialize CollegeSys -> Colle[0].name and ... -> stu[0].name. So, the second approach is very likely to be incorrect if I want to initialize every single attribute.
2) If I modularize this whole process, separating the process into several functions that return corresponding struct pointers — newSystem(void), newCollege(void), newStudent(void) (arguments might not necessarily be void; we might also pass a str as the name to the functions; besides, there might be a series of addStu(), etc... to assign those returned pointers to the corresponding part of CollegeSys). When I create a new CollegeSys in newSystem(), is it correct to malloc memory to every nested struct once and for all within newSystem()?
2.a) If I allocate memory to all parts of the struct in newSystem(), the possible consequence I can think of so far is there would be memory leaks. Since we've allocated memory to all parts when creating the system, we inevitably have to create a new struct pointer and allocate adequate memory to it in the other two functions too. For instance,
struct Student* newStudent(void) {
struct Student* newStu = malloc(sizeof(struct Student));
newStu -> fam = malloc(sizeof(*(newStu -> fam)));
// I'm not sure if I'm supposed to also allocate memory to fam struct
...
return newStu;
}
If so, we actually allocate the same amount of memory to an instance at least twice — one in the newSystem(void), the other in newStudent(void). If I'm correct so far, this is definitely memory leak. And the memory we allocate to the pointer newStu in newStudent(void) can never be freed (I think so). Then, what is the correct way to allocate memory to the whole structure when we separate the whole memory allocation process into several small steps?
3) Do we have to use sizeof(*(layer1 -> layer2 -> ...)) when malloc'ing memory to a struct nested in a struct? Can we directly specify the type of it? For example, doing this
CollegeSys -> Colle = malloc(sizeof(struct College) * MAX_NUM);
// instead of
// CollegeSys -> Colle = malloc(sizeof(*(CollegeSys -> Colle)) * MAX_NUM);
4) It seems even if we allocate a certain amount of memory to a pointer, we still can't prevent segfault. For example, we code
// this is not completely correct C code, just to show what I mean
struct* ptr = malloc(sizeof(struct type) * 3);
We still could call ptr[3], ptr[4] and so on, and the compiler will print out nonsense. Sometimes, the compiler may throw an error but sometimes may not. So, essentially, we can't rely on malloc (or calloc and so forth) to avoid the appearance of segfault?
I'm sorry about writing such a long text. Thanks for your patience.
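For reference, here is a minimal sketch of the "allocate outermost to innermost, free innermost to outermost" pattern the question describes. The struct definitions are not shown in the post, so everything below is an assumption made for illustration: the field layout, the members field, the value of MAX_NUM, and the helper names college_sys_new and college_sys_free. The point it tries to show is that every college needs its own stu array and every student its own fam, which takes loops; a single CollegeSys -> Colle -> stu = malloc(...) only sets Colle[0].stu.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_NUM 3   /* assumed value; the post only says it is a #define */

/* Hypothetical layout: the original struct definitions are not shown. */
struct Family     { int members; };
struct Student    { char name[32]; struct Family *fam; };
struct College    { char name[32]; struct Student *stu; };
struct CollegeSys { struct College *Colle; };

/* Free innermost to outermost. Works on a partially built system:
   calloc zero-fills (a null pointer on mainstream platforms) and
   free(NULL) is a no-op. */
void college_sys_free(struct CollegeSys *sys)
{
    if (sys == NULL)
        return;
    if (sys->Colle != NULL) {
        for (int i = 0; i < MAX_NUM; i++) {
            struct Student *stu = sys->Colle[i].stu;
            if (stu != NULL) {
                for (int n = 0; n < MAX_NUM; n++)
                    free(stu[n].fam);
                free(stu);
            }
        }
        free(sys->Colle);
    }
    free(sys);
}

/* Allocate outermost to innermost; each level needs its own loop. */
struct CollegeSys *college_sys_new(void)
{
    struct CollegeSys *sys = calloc(1, sizeof *sys);
    if (sys == NULL)
        return NULL;
    sys->Colle = calloc(MAX_NUM, sizeof *sys->Colle);
    if (sys->Colle == NULL)
        goto fail;
    for (int i = 0; i < MAX_NUM; i++) {
        struct College *c = &sys->Colle[i];
        snprintf(c->name, sizeof c->name, "college%d", i);
        c->stu = calloc(MAX_NUM, sizeof *c->stu);
        if (c->stu == NULL)
            goto fail;
        for (int n = 0; n < MAX_NUM; n++) {
            snprintf(c->stu[n].name, sizeof c->stu[n].name, "stu%d", n);
            c->stu[n].fam = calloc(1, sizeof *c->stu[n].fam);
            if (c->stu[n].fam == NULL)
                goto fail;
        }
    }
    return sys;
fail:
    college_sys_free(sys);
    return NULL;
}
```

A caller would pair the two, for example struct CollegeSys *sys = college_sys_new(); followed later by college_sys_free(sys);. Regarding question 3, sizeof *sys->Colle and sizeof(struct College) give the same size; the first form simply keeps working if the pointer's type changes later.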

Related

Why is the following code susceptible to heap overflow attack

I'm new to cyber security, and I am trying to understand why the following code is susceptible to a heap overflow attack...
struct data {
char name[128];
};
struct fp {
int (*fp)();
};
void printName() {
printf("Printing function...\n");
}
int main(int argc, char **argv) {
struct data *d;
struct fp *f;
d = malloc(sizeof(struct data));
f = malloc(sizeof(struct fp));
f->fp = printName;
read(stdin,d->name,256);
f->fp();
}
Is it because of the read(stdin, d->name, 256) as it is reading more than the allocated buffer size of 128 for char name in struct data?
Any help would be great

A heap overflow attack is similar to a buffer overflow attack, except instead of overwriting values in the stack, the attacker tramples data in the heap.
Notice in your code that there are two dynamically allocated values:
d = malloc(sizeof(struct data));
f = malloc(sizeof(struct fp));
So d now holds the address of a 128-byte chunk of memory in the heap, while f holds the address of an 8-byte (assuming a 64-bit machine) chunk of memory. Theoretically, these two addresses could be nowhere near each other, but since both requests are small and made back to back, it's likely that the allocator carved them out of the same contiguous region and gave you pointers that are next to each other.
So once you run f->fp = printName;, your heap looks something like this:
Note: Each row is 8 bytes wide
     |                        |
     +------------------------+
f -> | <Address of printName> |
     +------------------------+
     |           ▲            |
     |      11 more rows      |
     |       not shown        |
     |                        |
d -> |  <Uninitialized data>  |
     +------------------------+
     |                        |
Your initial assessment of where the vulnerability comes from is correct. d points to 128 bytes of memory, but you let the user write 256 bytes to that area. C has no mechanism for bounds checking, so the compiler is perfectly happy to let you write past the edge of the d memory. If f is right next to d, you'll fall over the edge of d and into f. Now, an attacker has the ability to modify the contents of f just by writing to d.
To exploit this vulnerability, an attacker feeds the address of some code that they've written to d by repeating it for all 256 bytes of input. If the attacker has stored some malicious code at address 0xbadc0de, they feed in 0xbadc0de to stdin 32 times (256 bytes) so that the heap gets overwritten.
     |  0xbadc0de  |
     +-------------+
f -> |  0xbadc0de  |
     +-------------+
     |     ...     |
     |  0xbadc0de  |
     |  0xbadc0de  |
d -> |  0xbadc0de  |
     +-------------+
     |             |
Then, your code reaches the line
f->fp();
which is a function call using the address stored in f. The machine goes to memory location f and retrieves the value stored there, which is now the address of the attacker's malicious code. Since we're calling it as a function, the machine now jumps to that address and begins executing the code stored there, and now you've got a lovely arbitrary code execution attack vector on your hands.
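One possible way to close the hole is to bound the read by the size of the destination buffer instead of a hard-coded 256. The sketch below assumes the input is text read through stdio; read_name is a hypothetical helper and is not part of the original program.

```c
#include <stdio.h>
#include <string.h>

struct data {
    char name[128];
};

/* Read at most sizeof d->name - 1 characters from `in` and always
   NUL-terminate. Returns the number of characters stored, or -1 on
   end-of-file or a read error. */
int read_name(FILE *in, struct data *d)
{
    if (fgets(d->name, sizeof d->name, in) == NULL)
        return -1;
    d->name[strcspn(d->name, "\n")] = '\0';   /* drop trailing newline, if any */
    return (int)strlen(d->name);
}
```

With this, read_name(stdin, d) can never write past d->name, so the neighbouring f chunk stays intact no matter how much input arrives. If the POSIX read call is kept instead, the same idea applies: pass a file descriptor such as STDIN_FILENO and use sizeof d->name as the byte count.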

Freeing 2D array - Heap Corruption Detected

EDIT: Sorry guys, I forgot to mention that this is coded in VS2013.
I have a globally declared struct:
typedef struct data //Struct for storing search & sort run-time statistics.
{
int **a_collision;
} data;
data data1;
I then allocate my memory:
data1.a_collision = (int**)malloc(sizeof(int)*2); //Declaring outer array size - value/key index.
for (int i = 0; i < HASH_TABLE_SIZE; i++)
data1.a_collision[i] = (int*)malloc(sizeof(int)*HASH_TABLE_SIZE); //Declaring inner array size.
I then initialize all the elements:
//Initializing 2D collision data array.
for (int i = 0; i < 2; i++)
for (int j = 0; j < HASH_TABLE_SIZE; j++)
data1.a_collision[i][j] = NULL;
And lastly, I wish to free the memory (which FAILS). I have unsuccessfully tried following some of the answers given on SO already.
free(data1.a_collision);
for (int i = 0; i < HASH_TABLE_SIZE; i++)
free(data1.a_collision[i]);
A heap corruption detected error is given at the first free statement. Any suggestions?

There are multiple mistakes in your code: the logic for allocating a two-dimensional array is wrong, and there are some typos as well.
From the comment in your code, "outer array size - value/key index", it looks like you want to allocate memory for a "2 * HASH_TABLE_SIZE" 2D array, whereas the loop condition "i < HASH_TABLE_SIZE;" suggests you want to create an array of size "HASH_TABLE_SIZE * 2".
Allocate memory:
Let's assume you want to allocate memory for "2 * HASH_TABLE_SIZE"; you can apply the same concept to other dimensions.
The dimension "2 * HASH_TABLE_SIZE" means two rows and HASH_TABLE_SIZE columns. The correct allocation steps are as follows:
Step 1: First create an array of int pointers whose length equals the number of rows.
data1.a_collision = malloc(2 * sizeof(int*));
//                  2 rows ^             ^ you are missing `*`
This creates an array of two int pointers (int*). In your outer-array allocation you allocated memory for two int objects, 2 * sizeof(int), whereas you need memory to store addresses. The total number of bytes to allocate should be 2 * sizeof(int*) (this is a simple typo).
You can picture the above allocation as:
                       343  347
                     +----+----+
data1.a_collision---►| ?  | ?  |
                     +----+----+
? - means garbage value; malloc doesn't initialize the allocated memory
It has allocated two memory cells, each of which can store the address of an int
In the picture I have assumed that the size of int* is 4 bytes.
Additionally, notice that I didn't cast the address returned by malloc: void* is a generic pointer type and converts implicitly to any other object pointer type (in fact, in C we should avoid the cast; read more in "Do I cast the result of malloc?").
Now step 2: Allocate memory for each row as an array whose length is the number of columns you need, that is, HASH_TABLE_SIZE. So you need to loop over the number of rows (not HASH_TABLE_SIZE) to allocate an array for each row, as below:
for(int i = 0; i < 2; i++)
//             ^^^^ notice
    data1.a_collision[i] = malloc(HASH_TABLE_SIZE * sizeof(int));
//                                                        ^^^^^
Each row stores an array of ints of length HASH_TABLE_SIZE, so the number of bytes you need per row is HASH_TABLE_SIZE * sizeof(int). You can picture it as:
Diagram
data1.a_collision = 342
        |
        ▼                 201   205   209   213
    +--------+          +-----+-----+-----+-----+
343 |        |          |  ?  |  ?  |  ?  |  ?  |   //for i = 0
    |        |-------|  +-----+-----+-----+-----+
    |  201   |       +-----------▲
    +--------+            502   506   510   514
    |        |          +-----+-----+-----+-----+
347 |        |          |  ?  |  ?  |  ?  |  ?  |   //for i = 1
    |  502   |-------|  +-----+-----+-----+-----+
    +--------+       +-----------▲
data1.a_collision[0] = 201
data1.a_collision[1] = 502
In the picture I am assuming HASH_TABLE_SIZE = 4 and size of int = 4 bytes; note the address values.
These are the correct allocation steps.
Deallocate memory:
Besides the allocation, your deallocation steps are also wrong!
Remember that once you have called free on a pointer, you can't access the memory it pointed to (through that pointer or any other). Doing so invokes undefined behavior: it is an illegal memory access that may be detected at runtime and may cause a segmentation fault or a "Heap Corruption Detected" error.
The correct deallocation steps are the reverse of the allocation, as below:
for(int i = 0; i < 2; i++)
free(data1.a_collision[i]); // free memory for each rows
free(data1.a_collision); //free for address of rows.
Furthermore, this is only one way to allocate memory for a two-dimensional array, similar to what you were trying to do. A better way is to allocate the memory for the complete 2D array contiguously; for this, read "Allocate memory 2d array in function C" (in that linked answer I have also given links on how to allocate memory for 3D arrays).
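A rough sketch of the contiguous approach, for illustration: one block holds all the cells and a second, small block holds the row pointers into it, so grid[i][j] still works but there are only two allocations to free. The names alloc_grid and free_grid are made up here and are not taken from the linked answer.

```c
#include <stdlib.h>

/* Allocate a rows x cols grid of zeroed ints backed by one contiguous
   block of cells. Returns NULL on failure or if either size is zero. */
int **alloc_grid(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;
    int **grid = malloc(rows * sizeof *grid);      /* row pointers */
    if (grid == NULL)
        return NULL;
    int *cells = calloc(rows * cols, sizeof *cells);
    if (cells == NULL) {
        free(grid);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++)
        grid[i] = cells + i * cols;                /* row i starts here */
    return grid;
}

/* grid[0] is the start of the cell block, so two frees release everything. */
void free_grid(int **grid)
{
    if (grid != NULL) {
        free(grid[0]);
        free(grid);
    }
}
```

For the question's case this would be data1.a_collision = alloc_grid(2, HASH_TABLE_SIZE); and later free_grid(data1.a_collision);, with no per-row loop in the cleanup.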

Here is a start:
Your "outer array" has space for two integers, not two pointers to integer.
Is HASH_TABLE_SIZE equal to 2? Otherwise, your first for loop will write outside the array you just allocated.

There are several issues :
The first allocation is not correct, you should alloc an array of (int *) :
#define DIM_I 2
#define DIM_J HASH_TABLE_SIZE
data1.a_collision = (int**)malloc(sizeof(int*)*DIM_I);
The second allocation loop must then run over the outer dimension (DIM_I), not HASH_TABLE_SIZE :
for (int i = 0; i < DIM_I; i++)
data1.a_collision[i] = (int*)malloc(sizeof(int)*DIM_J);
When you free memory, you have to free in last-in, first-out order (the rows first, then the array of row pointers):
for (int i = 0; i < DIM_I; i++)
free(data1.a_collision[i]);
free(data1.a_collision);

Seg fault when using structure pointers to access struct members in C

What is wrong with my program? I get a segfault when I try to print the values.
My aim is to assign some values in sample_function,
and in the main function I want to copy the structure to another structure.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef struct
{
char *name;
char *class;
char *rollno;
} test;
test *
sample_function ()
{
test *abc;
abc = (test *)malloc(sizeof(test));
strcpy(abc->name,"Microsoft");
abc->class = "MD5";
abc->rollno = "12345";
printf("%s %s %s\n",abc->name,abc->class,abc->rollno);
return abc;
}
int main(){
test *digest_abc = NULL;
test *abc = NULL;
abc = sample_function();
digest_abc = abc;
printf(" %s %s %s \n",digest_abc->name,digest_abc->class,digest_abc->rollno);
return 1;
}
Pointer has always been a nightmare for me, I never understood it.

test * sample_function ()
{
test *abc;
strcpy(abc->name,"Surya");
What do you think abc points to, here? The answer is, it doesn't really point to anything. You need to initialize it to something, which in this case means allocating some memory.
So, let's fix that first issue:
test * sample_function ()
{
test *abc = malloc(sizeof(*abc));
strcpy(abc->name,"Surya");
Now, abc points to something, and we can store stuff in there!
But ... abc->name is a pointer too, and what do you think that points to? Again, it doesn't really point to anything, and you certainly can't assume it points somewhere you can store your string.
So, let's fix your second issue:
test * sample_function ()
{
test *abc = malloc(sizeof(*abc));
abc->name = strdup("Surya");
/* ... the rest is ok ... */
return abc;
}
Now, there's one last issue: you never release the memory you just allocated (this probably isn't an issue here, but it'd be a bug in a full-sized program).
So, at the end of main, you should have something like
free(abc->name);
free(abc);
return 1;
}
The final issue is a design one: you have three pointers in your structure, and only convention to help you remember which is dynamically allocated (and must be freed) and which point to string literals (and must not be freed).
That's fine, so long as this convention is followed everywhere. As soon as you dynamically allocate class or rollno, you have a memory leak. As soon as you point name at a string literal, you'll have a crash and/or heap damage.
As japreiss points out in a comment, a good way to enforce your convention is to write dedicated functions, like:
void initialize_test(test *obj, const char *name, char *class, char *rollno) {
obj->name = strdup(name);
...
}
void destroy_test(test *obj) {
free(obj->name);
}
test *malloc_test(const char *name, ...) {
test *obj = malloc(sizeof(*obj));
initialize_test(obj, name, ...);
return obj;
}
void free_test(test *obj) {
destroy_test(obj);
free(obj);
}
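To make the convention concrete, here is one way the elided parts could be filled in, under the assumption that all three fields are always heap-allocated copies. The names test_new, test_free and dup_string are illustrative only; dup_string stands in for strdup, which is POSIX rather than standard C before C23.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    char *class;
    char *rollno;
} test;

/* Portable stand-in for strdup(): returns a malloc'd copy, or NULL. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Release everything a test owns; accepts NULL and half-built objects. */
void test_free(test *obj)
{
    if (obj == NULL)
        return;
    free(obj->name);
    free(obj->class);
    free(obj->rollno);
    free(obj);
}

/* Every field is a heap copy, so test_free() can free all three without
   having to know which ones originally came from string literals. */
test *test_new(const char *name, const char *class, const char *rollno)
{
    test *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->name = dup_string(name);
    obj->class = dup_string(class);
    obj->rollno = dup_string(rollno);
    if (obj->name == NULL || obj->class == NULL || obj->rollno == NULL) {
        test_free(obj);
        return NULL;
    }
    return obj;
}
```

The trade-off is a few extra small allocations in exchange for a single ownership rule: whatever test_new returns is released by exactly one test_free call.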

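In your function sample_function you return the pointer abc. Returning a pointer by value is fine as long as it points to memory that outlives the call, such as a malloc'd block. What you cannot do in C is return the address of a local variable, because of the way Activation Records are organized.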
An Activation Record is a data structure that contains all the relevant information for a function call, parameters, return address, addresses of local variables, etc...
When you call a function a new Activation Record gets pushed onto the stack it could look something like this.
// Record for some function f(a, b)
| local variable 1 | <- stack pointer (abc in your case)
| local variable 2 |
| old stack pointer | <- base pointer
| return address |
| parameter 1 |
| parameter 2 |
---------------------
| caller activation |
| record |
When you return from a function this same activation record gets popped off of the stack but what happens if you returned the address of a variable that was on the old record ?
// popped record
| local variable 1 | <- address of abc #
| local variable 2 | #
| old stack pointer | # Unallocated memory, any new function
| return address | # call could overwrite this
| parameter 1 | #
| parameter 2 | #
--------------------- <- stack pointer
| caller activation |
| record |
If abc held the address of a variable in that popped record, using it would be undefined behavior: the program may crash, or it may silently read whatever a later function call wrote there.
You also have problems with allocation, but other answers have already covered that.

In sample_function you declare abc as a pointer to a test structure, but you never initialize it. It's just pointing off into the weeds somewhere. Then you try to dereference it to store values - BOOM.
Your program doesn't need any pointers at all; structures can be passed by value in C.
If you do want to keep similar interfaces to what you have now, you're going to have to add some dynamic allocations (malloc/free calls) to make sure your structures are actually allocated and that your pointers actually point to them.
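As a sketch of the pointer-free variant mentioned above: the struct is returned and copied by value, and its members point at string literals, which exist for the whole run of the program. The helper print_test is an invented name, and making the members const char * is an assumption so that the literals cannot be modified by accident.

```c
#include <stdio.h>

typedef struct {
    const char *name;
    const char *class;
    const char *rollno;
} test;

/* Returns the struct itself, not a pointer to it. The members point at
   string literals, which are never freed. */
test sample_function(void)
{
    test abc = { "Microsoft", "MD5", "12345" };
    return abc;
}

/* Takes the struct by value as well; the callee works on its own copy. */
void print_test(test t)
{
    printf("%s %s %s\n", t.name, t.class, t.rollno);
}
```

In main this becomes test abc = sample_function(); test digest_abc = abc; print_test(digest_abc);, where the plain assignment copies all three members and nothing needs to be freed.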

One element array in struct

Why do some structs use a single-element array, as follows:
typedef struct Bitmapset
{
int nwords;
uint32 words[1];
} Bitmapset;
To make it convenient for later dynamic allocation?

In a word, yes.
Basically, the C99 way to do it is with a flexible array member:
uint32 words[];
Some pre-C99 compilers let you get away with:
uint32 words[0];
But the way to guarantee it to work across all compilers is:
uint32 words[1];
And then, no matter how it's declared, you can allocate the object with:
Bitmapset *allocate(int n)
{
Bitmapset *p = malloc(offsetof(Bitmapset, words) + n * sizeof(p->words[0]));
p->nwords = n;
return p;
}
Though for best results you should use size_t instead of int.
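A sketch of what that might look like with size_t and the C99 flexible array member. It assumes uint32_t from <stdint.h> in place of the uint32 typedef in the question, and bitmapset_alloc is a made-up name; the overflow check and the zero-filling are additions, not something the original code does.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Bitmapset {
    size_t   nwords;
    uint32_t words[];   /* C99 flexible array member */
} Bitmapset;

/* Allocate a Bitmapset with room for n zeroed words, or return NULL. */
Bitmapset *bitmapset_alloc(size_t n)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (n > (SIZE_MAX - sizeof(Bitmapset)) / sizeof(uint32_t))
        return NULL;
    Bitmapset *p = calloc(1, sizeof *p + n * sizeof p->words[0]);
    if (p != NULL)
        p->nwords = n;
    return p;
}
```

A single free(p) releases the header and the words together, since they live in one allocation.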

This is usually to allow idiomatic access to variable-sized struct instances. Considering your example, at runtime, you may have a Bitmapset that is laid out in memory like this:
-----------------
| nwords | 3 |
| words[0] | 10 |
| words[1] | 20 |
| words[2] | 30 |
-----------------
So you end up with a runtime-variable number of uint32 "hanging off" the end of your struct, but accessible as if they're defined inline in the struct. This is basically (ab)using the fact that C does no runtime array-bounds checking to allow you to write code like:
for (int i = 0; i < myset.nwords; i++) {
printf("%u\n", (unsigned)myset.words[i]);
}

Initialize 2-D array of unknown size

I have a 2-D array of characters, e.g. char aList[numStrings][maxLength]. Ideally, during program execution I want to be able to modify the contents of aList, i.e. add, amend or delete entries. Since aList will be subject to change, I don't want to have to recompile my program after every such change to modify aList. So I want to write aList out to a text file at program end and then read it back into aList at the commencement of the next program run.
However, I don't know at program start what is the value of numStrings. (I am not using C99 so I can't use a VLA, and pick up a count of previous strings from an external file.) I could, of course, set numStrings to an artificially high value but that grates!
Is there a way to populate aList without knowing the value of numStrings? I don't think there is (I have looked at related questions) but is there another way of achieving what I need?

If you really want to be able to remove items from the middle of the grid (your question isn't clear on this), you'll need some kind of multiply linked structure. These are often used to implement sparse arrays, so you can probably find one pre-made.
I'm talking about something like this:
+---+
| A |
+-|\+
| \
| \
| \
| \
| +----+----+----+
| | C0 | C1 | C2 | ...
| +--|-+----+--|-+
| | |
| | |
+-V--+ +--V-+ | +----+
| R0 |->|a0,0|-------+>|a0,3|--> ...
+----+ +--|-+ +--V-+----+
| R1 |-----+----->|a1,2|--> ...
+----+ | +--|-+
... V |
... V
...
Where A is the root node of the object, C is an array of column pointers, R is an array of row pointers, and each cell points to its next neighbor along both its row and column. All cells not explicitly represented are assumed to have some default value (usually NULL or 0).
It is a simple idea, but a fairly picky implementation, with lots of chances to mess up, so use a debugged library if you can.

You could use a dynamically allocated array. Use malloc() to make one, realloc() to change the size of one, and free() when you're done with it. But this has already been covered by another answer.
Another alternative is to use a linked list. That way you don't have to realloc() every time you want to extend your array - realloc() can be rather expensive if it has to copy the entire array to a new location.
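A minimal sketch of the malloc()/realloc()/free() route, assuming the entries are NUL-terminated strings. The struct and function names (strlist, strlist_push, strlist_free) are invented for this example. Doubling the capacity keeps the number of realloc() calls small, which softens the copying cost mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of strings; all names are illustrative. */
struct strlist {
    char  **items;
    size_t  count;
    size_t  capacity;
};

/* Append a copy of s. Returns 0 on success, -1 if memory runs out
   (in which case no string is added). */
int strlist_push(struct strlist *list, const char *s)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        /* Keep the old pointer until realloc succeeds, so nothing leaks. */
        char **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_cap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len);
    list->items[list->count++] = copy;
    return 0;
}

/* Free every string, then the pointer array itself. */
void strlist_free(struct strlist *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Starting from struct strlist list = {0};, each line read from the file can be passed to strlist_push(&list, line), so numStrings never has to be known in advance.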

The situation you've described is precisely what malloc is for -- allocating a variable-length block of memory.

If your plan is to populate while reading in the file you could do one of two things.
Either store the number of strings as the first element in the file, then the suggestion by jgottula would work well.
Or, must you use an array? You could read them directly into a linked list, and then when finished reading, move them into an array, and free up the linked list.

2D C-style arrays in general are, pardon the term, kind of wacky... they look simple and useful on paper, but implementing dynamic memory management (handling allocation failures and cleanups/resizes) is often quite difficult in detail.
What you can do is something like:
/*
* Start with an array that can hold INITIAL_NUM elements of (char*).
*/
char **aList = calloc(INITIAL_NUM, sizeof(*aList));
int curIdx = 0, curListSz = INITIAL_NUM;
while (more_stuff_to_append) {
/*
* Still space in the existing list ? If not - resize
*/
if (curIdx >= curListSz) {
curListSz += ALLOC_INCREMENT_FOR_ALIST;
if ((aList = realloc(aList, curListSz * sizeof(*aList))) == NULL)
error_and_yucky_cleanup("can't resize list, out of memory");
}
/*
* Allocate a new element.
* Note that if it's _known_ in advance that all elements
* are the same size, then malloc'ing a big block and slicing
* that into pieces is more efficient.
*/
if ((aList[curIdx] = calloc(new_elem_size, sizeof(char))) == NULL)
error_and_yucky_cleanup("out of memory");
/*
* put the contents into the new buffer, however that's done.
*/
populate_new_entry(aList[curIdx]);
curIdx++;
}
The big problem with these approaches is usually that cleanup is messy. One needs to go through the array and call free() on every element, plus the additional final one to clean up aList itself.
If you know all sizes in advance, one can allocate a single memory block that holds both aList and all the elements. This works via something like:
#define LISTSZ(lst) (NUMSTRINGS_MAX * sizeof(*(lst)))
#define ELEMSZ(lst) (STRINGSIZE_MAX * sizeof(**(lst)))
char **aList = malloc(LISTSZ(aList) + NUMSTRINGS_MAX * ELEMSZ(aList));
char *curElem = ((char*)aList) + LISTSZ(aList);
int i;
for (i = 0; i < NUMSTRINGS_MAX; i++) {
aList[i] = curElem;
curElem += ELEMSZ(aList);
}
The advantage of this is that cleanup is trivial - just call free((char*)aList); and the whole thing is gone. But you can't realloc() it anymore as that wouldn't insert new space at the beginning of the memory block (where aList[] is stored).
These things make up really good reasons for using C++ vectors; at least C++ does the cleanup (e.g. on out of memory exceptions) automatically.

You can dynamically allocate the array:
char **aList;
int i;
aList = malloc(sizeof(char *) * numStrings);
for (i = 0; i < numStrings; i++)
{
aList[i] = malloc(maxLength);
}
If, by any chance, you can use C++ instead of C, you could always use a C++ vector:
std::vector<std::vector<char> > aList;
