Why is realloc eating tons of memory?

Why is realloc eating tons of memory? - c

This question is a bit long due the source code, which I tried to simplify as much as possible. Please bear with me and thanks for reading along.
I have an application with a loop that runs potentially millions of times. Instead of several thousands to millions of malloc/free calls within that loop, I would like to do one malloc up front and then several thousands to millions of realloc calls.
But I'm running into a problem where my application consumes several GB of memory and kills itself, when I am using realloc. If I use malloc, my memory usage is fine.
If I run on smaller test data sets with valgrind's memtest, it reports no memory leaks with either malloc or realloc.
I have verified that I am matching every malloc-ed (and then realloc-ed) object with a corresponding free.
So, in theory, I am not leaking memory, it is just that using realloc seems to consume all of my available RAM, and I'd like to know why and what I can do to fix this.
What I have initially is something like this, which uses malloc and works properly:
Malloc code
void A () {
do {
B();
} while (someConditionThatIsTrueForMillionInstances);
}
void B () {
char *firstString = NULL;
char *secondString = NULL;
char *someOtherString;
/* populate someOtherString with data from stream, for example */
C((const char *)someOtherString, &firstString, &secondString);
fprintf(stderr, "first: [%s] | second: [%s]\n", firstString, secondString);
if (firstString)
free(firstString);
if (secondString)
free(secondString);
}
void C (const char *someOtherString, char **firstString, char **secondString) {
char firstBuffer[BUFLENGTH];
char secondBuffer[BUFLENGTH];
/* populate buffers with some data from tokenizing someOtherString in a special way */
*firstString = malloc(strlen(firstBuffer)+1);
strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);
*secondString = malloc(strlen(secondBuffer)+1);
strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}
This works fine. But I want something faster.
Now I test a realloc arrangement, which malloc-s only once:
Realloc code
void A () {
char *firstString = NULL;
char *secondString = NULL;
do {
B(&firstString, &secondString);
} while (someConditionThatIsTrueForMillionInstances);
if (firstString)
free(firstString);
if (secondString)
free(secondString);
}
void B (char **firstString, char **secondString) {
char *someOtherString;
/* populate someOtherString with data from stream, for example */
C((const char *)someOtherString, &(*firstString), &(*secondString));
fprintf(stderr, "first: [%s] | second: [%s]\n", *firstString, *secondString);
}
void C (const char *someOtherString, char **firstString, char **secondString) {
char firstBuffer[BUFLENGTH];
char secondBuffer[BUFLENGTH];
/* populate buffers with some data from tokenizing someOtherString in a special way */
/* realloc should act as malloc on first pass through */
*firstString = realloc(*firstString, strlen(firstBuffer)+1);
strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);
*secondString = realloc(*secondString, strlen(secondBuffer)+1);
strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}
If I look at the output of free -m on the command-line while I run this realloc-based test with a large data set that causes the million-loop condition, my memory goes from 4 GB down to 0 and the app crashes.
What am I missing about using realloc that is causing this? Sorry if this is a dumb question, and thanks in advance for your advice.

realloc has to copy the contents from the old buffer to the new buffer if the resizing operation cannot be done in place. A malloc/free pair can be better than a realloc if you don't need to keep around the original memory.
That's why realloc can temporarily require more memory than a malloc/free pair. You are also encouraging fragmentation by continuously interleaving reallocs. I.e., you are basically doing:
malloc(A);
malloc(B);
while (...)
{
malloc(A_temp);
free(A);
A= A_temp;
malloc(B_temp);
free(B);
B= B_temp;
}
Whereas the original code does:
while (...)
{
malloc(A);
malloc(B);
free(A);
free(B);
}
At the end of each of the second loop you have cleaned up all the memory you used; that's more likely to return the global memory heap to a clean state than by interleaving memory allocations without completely freeing all of them.

Using realloc when you don't want to preserve the existing contents of the memory block is a very very bad idea. If nothing else, you'll waste lots of time duplicating data you're about to overwrite. In practice, the way you're using it, the resized blocks will not fit in the old space, so they get located at progressively higher and higher addresses on the heap, causing the heap to grow ridiculously.
Memory management is not easy. Bad allocation strategies lead to fragmentation, atrocious performance, etc. The best you can do is avoid introducing any more constraints than you absolutely have to (like using realloc when it's not needed), free as much memory as possible when you're done with it, and allocate large blocks of associated data together in a single allocation rather than in small pieces.

You are expecting &(*firstString) to be the same as firstString, but in fact it is taking the address of the argument to your function rather than passing through the address of the pointers in A. Thus every time you call you make a copy of NULL, realloc new memory, lose the pointer to the new memory, and repeat. You can easily verify this by seeing that at the end of A the original pointers are still null.
EDIT: Well, it's an awesome theory, but I seem to be wrong on the compilers I have available to me to test.

Related

Using realloc to reduce the size of a memory block

A little more than 20 years ago I had some grasp of writing something small in C , but even at that time, I probably didn't really do things right all the time. Now I'm trying to learn C again, so I'm really a newbie.
Based on this article:
Using realloc to shrink the allocated memory
, I made this test, which works, but troubles me:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int test (char *param) {
char *s = malloc(strlen(param));
strcpy(s, param);
printf("original string : [%4d] %s \n", strlen(s), s);
// reduce size
char *tmp = realloc(s, 5);
if (tmp == NULL) {
printf("Failed\n");
free(s);
exit(1);
} else {
tmp[4] = 0;
}
s = tmp;
printf("the reduced string : [%4d] %s\n", strlen(s), s );
free(s);
}
void main(void){
test("This is a string with a certain length!");
}
If I leave out "tmp[4] = 0", then I still get back the whole string. Does this mean the rest of the string is still in memory, but not allocated anymore?
how does c free memory anyway? Does it keep track of memory by itself or is it something that is handled by the OS?
I free the s string "free(s)", do I also need to free the tmp str (it does point to the same memory block, yet the (same) address it holds is probably stored on another memory location?
These are most likely just basics, but none of what I have read so far has given me a clear answer (including mentioned article).

If I leave out "tmp[4] = 0", then I still get back the whole string.
You've invoked undefined behavior. All the string operations require the argument to be a null-terminated array of characters. If you reduce the size of the allocation so it doesn't include the null terminator, you're accessing outside the allocation when it tries to find it.
Does this mean the rest of the string is still in memory, but not allocated anymore?
In practice, many implementations don't actually re-allocate anything when you shrink the size. They simply update the bookkeeping information to say that the allocated length is shorter, and return the original pointer. So the remainder of the string stays the same unless you do another allocation that happens to use that memory.
This can even happen when you grow the size. Some designs always allocate memory in specific granularities (e.g. powers of 2), so if you grow the allocation but it doesn't exceed the granularity, it doesn't need to copy the data.
how does c free memory anyway? Does it keep track of memory by itself or is it something that is handled by the OS?
Heap management is part of the C runtime library. It can use a variety of strategies.
I free the s string "free(s)", do I also need to free the tmp str (it does point to the same memory block, yet the (same) address it holds is probably stored on another memory location?
After s = tmp;, both s and tmp point to the same allocated memory block. You only need to free one of them.
BTW, the initial allocation should be:
char *s = malloc(strlen(param)+1);
You need to add 1 for the null terminator, since strlen() doesn't count this.

Why does malloc need to be used for dynamic memory allocation in C?

I have been reading that malloc is used for dynamic memory allocation. But if the following code works...
int main(void) {
int i, n;
printf("Enter the number of integers: ");
scanf("%d", &n);
// Dynamic allocation of memory?
int int_arr[n];
// Testing
for (int i = 0; i < n; i++) {
int_arr[i] = i * 10;
}
for (int i = 0; i < n; i++) {
printf("%d ", int_arr[i]);
}
printf("\n");
}
... what is the point of malloc? Isn't the code above just a simpler-to-read way to allocate memory dynamically?
I read on another Stack Overflow answer that if some sort of flag is set to "pedantic", then the code above would produce a compile error. But that doesn't really explain why malloc might be a better solution for dynamic memory allocation.

Look up the concepts for stack and heap; there's a lot of subtleties around the different types of memory. Local variables inside a function live in the stack and only exist within the function.
In your example, int_array only exists while execution of the function it is defined in has not ended, you couldn't pass it around between functions. You couldn't return int_array and expect it to work.
malloc() is used when you want to create a chunk of memory which exists on the heap. malloc returns a pointer to this memory. This pointer can be passed around as a variable (eg returned) from functions and can be used anywhere in your program to access your allocated chunk of memory until you free() it.
Example:
'''C
int main(int argc, char **argv){
int length = 10;
int *built_array = make_array(length); //malloc memory and pass heap pointer
int *array = make_array_wrong(length); //will not work. Array in function was in stack and no longer exists when function has returned.
built_array[3] = 5; //ok
array[3] = 5; //bad
free(built_array)
return 0;
}
int *make_array(int length){
int *my_pointer = malloc( length * sizeof int);
//do some error checking for real implementation
return my_pointer;
}
int *make_array_wrong(int length){
int array[length];
return array;
}
'''
Note:
There are plenty of ways to avoid having to use malloc at all, by pre-allocating sufficient memory in the callers, etc. This is recommended for embedded and safety critical programs where you want to be sure you'll never run out of memory.

Just because something looks prettier does not make it a better choice.
VLAs have a long list of problems, not the least of which they are not a sufficient replacement for heap-allocated memory.
The primary -- and most significant -- reason is that VLAs are not persistent dynamic data. That is, once your function terminates, the data is reclaimed (it exists on the stack, of all places!), meaning any other code still hanging on to it are SOL.
Your example code doesn't run into this problem because you aren't using it outside of the local context. Go ahead and try to use a VLA to build a binary tree, then add a node, then create a new tree and try to print them both.
The next issue is that the stack is not an appropriate place to allocate large amounts of dynamic data -- it is for function frames, which have a limited space to begin with. The global memory pool, OTOH, is specifically designed and optimized for this kind of usage.
It is good to ask questions and try to understand things. Just be careful that you don't believe yourself smarter than the many, many people who took what now is nearly 80 years of experience to design and implement systems that quite literally run the known universe. Such an obvious flaw would have been immediately recognized long, long ago and removed before either of us were born.
VLAs have their place, but it is, alas, small.

Declaring local variables takes the memory from the stack. This has two ramifications.
That memory is destroyed once the function returns.
Stack memory is limited, and is used for all local variables, as well as function return addresses. If you allocate large amounts of memory, you'll run into problems. Only use it for small amounts of memory.

When you have the following in your function code:
int int_arr[n];
It means you allocated space on the function stack, once the function will return this stack will cease to exist.
Image a use case where you need to return a data structure to a caller, for example:
Car* create_car(string model, string make)
{
Car* new_car = malloc(sizeof(*car));
...
return new_car;
}
Now, once the function will finish you will still have your car object, because it was allocated on the heap.

The memory allocated by int int_arr[n] is reserved only until execution of the routine ends (when it returns or is otherwise terminated, as by setjmp). That means you cannot allocate things in one order and free them in another. You cannot allocate a temporary work buffer, use it while computing some data, then allocate another buffer for the results, and free the temporary work buffer. To free the work buffer, you have to return from the function, and then the result buffer will be freed to.
With automatic allocations, you cannot read from a file, allocate records for each of the things read from the file, and then delete some of the records out of order. You simply have no dynamic control over the memory allocated; automatic allocations are forced into a strictly last-in first-out (LIFO) order.
You cannot write subroutines that allocate memory, initialize it and/or do other computations, and return the allocated memory to their callers.
(Some people may also point out that the stack memory commonly used for automatic objects is commonly limited to 1-8 mebibytes while the memory used for dynamic allocation is generally much larger. However, this is an artifact of settings selected for common use and can be changed; it is not inherent to the nature of automatic versus dynamic allocation.)

If the allocated memory is small and used only inside the function, malloc is indeed unnecessary.
If the memory amount is extremely large (usually MB or more), the above example may cause stack overflow.
If the memory is still used after the function returned, you need malloc or global variable (static allocation).
Note that the dynamic allocation through local variables as above may not be supported in some compiler.

Weird situation when returning char *

I am pretty new to C programming and I have several functions returning type char *
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.

A way not working to return a chunk of char data is to return it in memory temporaryly allocated on the stack during the execution of your function and (most probably) already used for another purpose after it returned.
A working alternative would be to allocate the chunk of memory ont the heap. Make sure you read up about and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free up memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b)) * sizeof(char)) for the second example given. Again don't forget to free once the memory isn't used any more.
As MByD correctly pointed out, it is not forbidden in general to use memory allocated on the stack to pass chunks of data in and out of functions. As long as the chunk is not allocated on the stack of the function returning this is also quite well.
In the scenario below function b will work on a chunk of memory allocated on the stackframe created, when function a entered and living until a returns. So everything will be pretty fine even though no memory allocated on the heap is involved.
void b(char input[]){
/* do something useful here */
}
void a(){
char buf[BUFFER_SIZE];
b(buf)
/* use data filled in by b here */
}
As still another option you may choose to leave memory allocation on the heap to the compiler, using a global variable. I'd count at least this option to the last resort category, as not handled properly, global variables are the main culprits in raising problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.

Why is not freeing memory bad practice?

int a = 0;
int *b = malloc (sizeof(int));
b = malloc (sizeof(int));
The above code is bad because it allocates memory on the heap and then doesn't free it, meaning you lose access to it. But you also created 'a' and never used it, so you also allocated memory on the stack, which isn't freed until the scope ends.
So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Note: I know that memory on the stack can't be freed, I want to know why its not considered bad.

The stack memory will get released automatically when the scope ends. The memory allocated on the heap will remain occupied unless you release it explicitly. As an example:
void foo(void) {
int a = 0;
void *b = malloc(1000);
}
for (int i=0; i<1000; i++) {
foo();
}
Running this code will decrease the available memory by 1000*1000 bytes required by b, whereas the memory required by a will always get released automatically when you return from the foo call.

Simple: Because you'll leak memory. And memory leaks are bad. Leaks: bad, free: good.
When calling malloc or calloc, or indeed any *alloc function, you're claiming a chunk of memory (the size of which is defined by the arguments passed to the allocating function).
Unlike stack variables, which reside in a portion of memory the program has, sort of, free reign over, the same rules don't apply to heap memory. You may need to allocate heap memory for any number of reasons: the stack isn't big enough, you need an array of pointers, but have no way of knowing how big this array will need to be at compile time, you need to share some chunk of memory (threading nightmares), a struct that requires the members to be set at various places (functions) in your program...
Some of these reasons, by their very nature, imply that the memory can't be freed as soon as pointer to that memory goes out of scope. Another pointer might still be around, in another scope, that points to the same block of memory.
There is, though, as mentioned in one of the comments, a slight drawback to this: heap memory requires not just more awareness on the programmers part, but it's also more expensive, and slower than working on the stack.
So some rules of thumb are:
You claimed the memory, so you take care of it... you make sure it's freed when you're done playing around with it.
Don't use heap memory without a valid reason. Avoiding stack overflow, for example, is a valid reason.
Anyway,
Some examples:
Stack overflow:
#include <stdio.h>
int main()
{
int foo[2000000000];//stack overflow, array is too large!
return 0;
}
So, here we've depleted the stack, we need to allocate the memory on the heap:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *foo= malloc(2000000000*sizeof(int));//heap is bigger
if (foo == NULL)
{
fprintf(stderr, "But not big enough\n");
}
free(foo);//free claimed memory
return 0;
}
Or, an example of an array, whose length depends on user input:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *arr = NULL;//null pointer
int arrLen;
scanf("%d", &arrLen);
arr = malloc(arrLen * sizeof(int));
if (arr == NULL)
{
fprintf(stderr, "Not enough heap-mem for %d ints\n", arrLen);
exit ( EXIT_FAILURE);
}
//do stuff
free(arr);
return 0;
}
And so the list goes on... Another case where malloc or calloc is useful: An array of strings, that all might vary in size. Compare:
char str_array[20][100];
In this case str_array is an array of 20 char arrays (or strings), each 100 chars long. But what if 100 chars is the maximum you'll ever need, and on average, you'll only ever use 25 chars, or less?
You're writing in C, because it's fast and your program won't use any more resources than it actually needs? Then this isn't what you actually want to be doing. More likely, you want:
char *str_array[20];
for (int i=0;i<20;++i) str_array[i] = malloc((someInt+i)*sizeof(int));
Now each element in the str_array has exactly the amount of memory I need allocated too it. That's just way more clean. However, in this case calling free(str_array) won't cut it. Another rule of thumb is: Each alloc call has to have a free call to match it, so deallocating this memory looks like this:
for (i=0;i<20;++i) free(str_array[i]);
Note:
Dynamically allocated memory isn't the only cause for mem-leaks. It has to be said. If you read a file, opening a file pointer using fopen, but failing to close that file (fclose) will cause a leak, too:
int main()
{//LEAK!!
FILE *fp = fopen("some_file.txt", "w");
if (fp == NULL) exit(EXIT_FAILURE);
fwritef(fp, "%s\n", "I was written in a buggy program");
return 0;
}
Will compile and run just fine, but it will contain a leak, that is easily plugged (and it should be plugged) by adding just one line:
int main()
{//OK
FILE *fp = fopen("some_file.txt", "w");
if (fp == NULL) exit(EXIT_FAILURE);
fwritef(fp, "%s\n", "I was written in a bug-free(?) program");
fclose(fp);
return 0;
}
As an asside: if the scope is really long, chances are you're trying to cram too much into a single function. Even so, if you're not: you can free up claimed memory at any point, it needn't be the end of the current scope:
_Bool some_long_f()
{
int *foo = malloc(2000000000*sizeof(int));
if (foo == NULL) exit(EXIT_FAILURE);
//do stuff with foo
free(foo);
//do more stuff
//and some more
//...
//and more
return true;
}

Because stack and heap, mentioned many times in the other answers, are sometimes misunderstood terms, even amongst C programmers, Here is a great conversation discussing that topic....
So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Memory on the stack, such as memory allocated to automatic variables, will be automatically freed upon exiting the scope in which they were created.
whether scope means global file, or function, or within a block ( {...} ) within a function.
But memory on the heap, such as that created using malloc(), calloc(), or even fopen() allocate memory resources that will not be made available to any other purpose until you explicity free them using free(), or fclose()
To illustrate why it is bad practice to allocate memory without freeing it, consider what would happen if an application were designed to run autonomously for very long time, say that application was used in the PID loop controlling the cruise control on your car. And, in that application there was un-freed memory, and that after 3 hours of running, the memory available in the microprocessor is exhausted, causing the PID to suddenly rail. "Ah!", you say, "This will never happen!" Yes, it does. (look here). (not exactly the same problem, but you get the idea)
If that word picture doesn't do the trick, then observe what happens when you run this application (with memory leaks) on your own PC. (at least view the graphic below to see what it did on mine)
Your computer will exhibit increasingly sluggish behavior until it eventually stops working. Likely, you will be required to re-boot to restore normal behavior.
(I would not recommend running it)
#include <ansi_c.h>
char *buf=0;
int main(void)
{
long long i;
char text[]="a;lskdddddddd;js;'";
buf = malloc(1000000);
strcat(buf, "a;lskdddddddd;js;dlkag;lkjsda;gkl;sdfja;klagj;aglkjaf;d");
i=1;
while(strlen(buf) < i*1000000)
{
strcat(buf,text);
if(strlen(buf) > (i*10000) -10)
{
i++;
buf = realloc(buf, 10000000*i);
}
}
return 0;
}
Memory usage after just 30 seconds of running this memory pig:

I guess that has to do with scope 'ending' really often (at the end of a function) meaning if you return from that function creating a and allocating b, you will have freed in a sense the memory taken by a, and lost for the remainder of the execution memory used by b
Try calling that function a a handful of times, and you'll soon exhaust all of your memory. This never happens with stack variables (except in the case of a defectuous recursion)

Memory for local variables automatically is reclaimed when the function is left (by resetting the frame pointer).

The problem is that memory you allocate on the heap never gets freed until your program ends, unless you explicitly free it. That means every time you allocate more heap memory, you reduce available memory more and more, until eventually your program runs out (in theory).
Stack memory is different because it's laid-out and used in a predictable pattern, as determined by the compiler. It expands as needed for a given block, then contracts when the block ends.

So why is it bad practice to not free memory on the heap but okay for memory on the stack to not be freed (until the scope ends)?
Imagine the following:
while ( some_condition() )
{
int x;
char *foo = malloc( sizeof *foo * N );
// do something interesting with x and foo
}
Both x and foo are auto ("stack") variables. Logically speaking, a new instance for each is created and destroyed in each loop iteration1; no matter how many times this loop runs, the program will only allocate enough memory for a single instance of each.
However, each time through the loop, N bytes are allocated from the heap, and the address of those bytes is written to foo. Even though the variable foo ceases to exist at the end of the loop, that heap memory remains allocated, and now you can't free it because you've lost the reference to it. So each time the loop runs, another N bytes of heap memory is allocated. Over time, you run out of heap memory, which may cause your code to crash, or even cause a kernel panic depending on the platform. Even before then, you may see degraded performance in your code or other processes running on the same machine.
For long-running processes like Web servers, this is deadly. You always want to make sure you clean up after yourself. Stack-based variables are cleaned up for you, but you're responsible for cleaning up the heap after you're done.
1. In practice, this (usually) isn't the case; if you look at the generated machine code, you'll (usually) see the stack space allocated for x and foo at function entry. Usually, space for all local variables (regardless of their scope within the function) is allocated at once.

C Memory Overflow (v2)

EDIT: Updated code with new Pastebin link but it's still stopping at the info->citizens[x]->name while loop. Added realloc to loops and tidied up the code. Any more comments would be greatly appreciated
I'm having a few problems with memory allocation overflowing
http://pastebin.com/vukRGkq9 (v2)
No matter what I try, simply not enough memory is being allocated for info->citizens and gdb is often saying that it cannot access info->citizens[x]->name.
On occasion, I'll even get KERN_INVALID_ADDRESS errors directly after printf statements for strlen (Strlen is not used in the code at the point where gdb halts due to the error, but I'm assuming printf uses strlen in some way). I think it's something to do with how the structure is being allocated memory. So I was wondering if anyone could take a look?

You shouldn't do malloc(sizeof(PEOPLE*)), because it allocates exactly amount of bytes for pointer (4 bytes on 32bit arch).
Seems the thing you want to do is malloc(sizeof(PEOPLE) * N) where N is the max. number of PEOPLE you want to put into that memory chunk.

Clearly the problem lies with:
info->citizens = malloc(sizeof(PEOPLE *));
info->citizens[0] = malloc(sizeof(PEOPLE *));
info->citizens[1] = malloc(sizeof(PEOPLE *));
Think about it logically what you are trying to do here.

Your structs should almost certainly not contains members such as:
time_t *modtimes;
mode_t *modes;
bool *exists;
Instead you should simply use:
time_t modtimes;
mode_t modes;
bool exists;
In that way you do not need to dynamically allocate them, or subsequently release them. The reasons are that a) they're small and b) their size is known in advance. You would use:
char *name;
for a string field because it's not small and you don't know in advance how large it is.
Elsewhere in the code, you have the folllowing:
if(top)
{
PEOPLE *info;
info = malloc(sizeof(PEOPLE *));
}
If top is true then this code allocates a pointer and then immediately leaks it -- the scope of the second info is limited to the if statement so you can neither use it later nor can you release it later. You would need to do something like this:
PEOPLE *process(PEOPLE *info, ...)
{
if (top)
{
info = malloc(sizeof(PEOPLE));
}
info->name = strdup("Henry James");
info->exists = true;
return info;
}

It seems you have one too many levels of indirection. Why are you using **citizens instead of *?
Also, apart from the fact that you are allocating the space for a pointer, not the struct, there are a couple of weird things, such as the local variable info on line 31 means the initial allocation is out of scope once the block closes at line 34.
You need to think more clearly about what data is where.

Lots of memory allocation issues with this code. Those mentioned above plus numerous others, for example:
info->citizens[masterX]->name = malloc(sizeof(char)*strlen(dp->d_name)+1);
info->citizens[masterX]->name = dp->d_name;
You cannot copy strings in C through assignment (using =). You can write this as:
info->citizens[masterX]->name = malloc(strlen(dp->d_name)+1);
strcpy(info->citizens[masterX]->name, dp->d_name);
Or you could condense the whole allocate & copy as follows:
info->citizens[masterX]->name = strdup(dp->d_name);
Similarly at lines 143/147 (except in that case you have also allocated one byte too few in your malloc call).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight