I have a question about dynamic memory allocation.
Context: I'm writing a program that reads a text file of words and counts the frequency with which each word occurs (one word per line).
This particular function reads the file, counts the lines and characters, then dynamically allocates memory for the array of string pointers, for an array storing the character count of each line, and for the strings themselves. (The other parts are less directly relevant to my question.)
Question: How often should I reallocate memory if I run out of space? I set a constant ("memstart") for the initial memory allocation size. In the code snippet below I realloc for every line beyond "memstart". Would the program run faster if I reallocated a larger block of memory instead of increasing the space by one element each time?
What would be best practice for something like this?
Code Snip:
int read_alloc(FILE* fin, FILE *tmp, char **wdp, int *sz){
    int line_cnt= 0, chr, let=1;
    do{
        chr=getc(fin);
        let++;
        //count characters
        if(chr!=EOF){
            chr=tolower(chr);
            fputc(chr, tmp);
        }
        //convert to lcase and write to temp file
        if ('\n' == chr || chr==EOF){
            sz[(line_cnt)]=((let)*sizeof(char)); //save size needed to store string in array
            *(wdp+(line_cnt))=malloc((let)*sizeof(char)); //allocate space for the string
            if ((line_cnt-1) >= memstart){
                realloc(wdp, (sizeof(wdp)*(memstart+line_cnt))); //if more space needed increase size
                realloc(sz, (sizeof(sz)*(memstart+line_cnt)));
            }
            line_cnt++;
            let=1;
        }
    } while (EOF != chr);
    return (line_cnt);
}
While the question is about how often realloc should be called, looking at OP's code, I think it's better to start with how safely it should be done.
The C11 standard states (n1570 draft, § 7.22.3.5, The realloc function, emphasis mine):
Synopsis
#include <stdlib.h>
void *realloc(void *ptr, size_t size);
Description
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. (...). If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.
Returns
The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
Now let's consider this snippet from the question, where sz is declared as int* sz;
realloc(sz, (sizeof(sz)*(memstart+line_cnt)));
The return value is discarded, so we can't know whether the call succeeded, and if it did, sz is invalidated. Moreover, sizeof(sz) is the size of the pointer, not of the pointed-to type (int).
A safer (and correct) pattern would be:
size_t new_size = /* Whatever, let's say */ size + SOME_CONSTANT + size / 2;
void *tmp = realloc(ptr, new_size * sizeof *ptr);
if ( tmp == NULL ) {
    /* Deal with the error, e.g. log a message with perror, return NULL
       (if this is in a function) or just give up, but remember that
       realloc neither invalidates nor frees 'ptr' on failure */
    exit(EXIT_FAILURE);
}
ptr = tmp; // <- on success, realloc invalidated ptr
size = new_size;
Now, to answer the question: realloc should be called only when needed, because it potentially involves expensive system calls. So either allocate a big chunk up front or choose a growth strategy such as doubling (or multiplying by 1.5) the size every time.
It's worth noting that if possible, the OS could perform the reallocation without copying any element of the original array.
The classic answer is to double each time, but a factor of 1.5 might be better. The important bit is that you multiply your array size by some factor each time, rather than adding additional space each time.
Each re-allocation might need to copy the previous array into a new one. We'd like to minimize these copies. If we will be adding n items, and we start with an array of size a, increase by a factor of r each re-allocation, to end with a value of n, the sequence of (re-)allocations will be a, ar, ar^2, ar^3, ..., n. The sum of that sequence is (nr-a)/(r-1). Thus the total space is of order O(n).
Suppose instead we start with a, and this time add r each time. The sequence is a, a+r, a+2r, a+3r, ..., n. The sum of that sequence will be 0.5*((n^2-a^2)/r + a + n). In this case the total space is of order O(n^2). Much worse!
With a constant factor of 2, the array will in the worst case be half empty. That's probably OK. You can always shrink the allocation when you're done and know the final size.
As pointed out in another answer, there are several bugs in the manner in which you call realloc(), but that wasn't the question.
Related
I have this code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int i=1;
    char **m=malloc(sizeof(char *)*i);
    printf("%zu\n",sizeof *m);
    m[0]=malloc(strlen("hello")+1);
    strcpy(m[0],"hello");
    printf("%s\n", m[0]);
    i=2;
    m=(char **)realloc(m,sizeof (char *)*i);
    m[1]=malloc(strlen("hi")+1);
    strcpy(m[1],"hi");
    printf("%s %s \n",m[0],m[1] );
    // TODO: write proper cleanup code just for good habits.
    return 0;
}
This is how I allocate char **m, with space for a single 8-byte char pointer:
int i=1;
char **m=malloc(sizeof(char *)*i);
And this is how I allocate the area of memory whose address will be kept in m[0]:
m[0]=malloc(strlen("hello")+1);
strcpy(m[0],"hello");
printf("%s\n", m[0]);
I would like to know whether this is normally how it's done. I mean allocating space for the pointers and then allocating the memory whose address each pointer will hold.
Is m[0]=malloc(strlen("hello")+1); the same as *(m+0)=malloc(strlen("hello")+1);? And is m[1]=malloc(strlen("hi")+1); the same as *(m+1)=malloc(strlen("hi")+1);?
And I increase the number of pointers with this reallocation, m=(char **)realloc(m,sizeof (char *)*i);, before m[1]=malloc(strlen("hi")+1);
Is there anything wrong with the above code? I have seen similar code in this question: Dynamic memory/realloc string array
Can anyone please explain: with the statement char **m=malloc(sizeof(char *)*i); I allocate 8 bytes for a single char pointer, so with the statement m=(char **)realloc(m,sizeof (char *)*i); why am I not getting a "stack smashing detected" error? How exactly does realloc work? Can anyone give me a link to the realloc documentation or explain it a bit, please?
I would like to know whether this is normally how it's done. I mean allocating space for the pointers and then allocating the memory whose address each pointer will hold.
It depends on what you are trying to achieve. If you wish to allocate an unspecified amount of strings with individual lengths, then your code is pretty much the correct way to do it.
If you wish to have a fixed amount of strings with individual lengths, you could just do char* arr [n]; and then only malloc each arr[i].
Or if you wish to have a fixed amount of strings with a fixed maximum length, you could use a 2D array of characters, char arr [x][y];, and no malloc at all.
Is m[0]=malloc(strlen("hello")+1); the same as *(m+0)=malloc(strlen("hello")+1);?
Yes, m[0] is 100% equivalent to *((m)+(0)). See Do pointers support "array style indexing"?
Is there anything wrong with the above code?
Not really, except stylistic and performance issues. It could optionally be rewritten like this:
char** m = malloc(sizeof(*m) * i); // subjective style change
m[0]=malloc(sizeof("hello")); // compile-time calculation, better performance
Why am I not getting a "stack smashing detected" error?
Why would you get that? The only thing stored on the stack here is the char** itself. The rest is stored on the heap.
How exactly does realloc work? Can anyone give me a link to the realloc documentation or explain it a bit, please?
It works pretty much as you've used it, though pedantically you should not store the result in the same pointer as the one passed, in case realloc fails and you wish to continue using the old data. That's a very minor remark though, since if realloc fails, it either means that you made an unrealistic request for memory, or that the RAM on your system is toast and you are unlikely to be able to continue execution anyway.
The canonical documentation for realloc would be the C standard C17 7.22.3.5:
#include <stdlib.h>
void *realloc(void *ptr, size_t size);
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free or realloc function, the behavior is undefined. If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.
Returns
The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
Notably there is no guarantee that the returned pointer always has the same value as the old pointer, so correct use would be:
char* tmp = realloc(arr, size);
if(tmp == NULL)
{
    /* error handling */
}
arr = tmp;
(Where tmp has the same type as arr.)
Your code looks fine to me. Yes, if you are storing an array of strings, and you don't know how many strings will be in the array in advance, then it is perfectly fine to allocate space for an array of pointers with malloc. You also need to somehow get memory for the strings themselves, and it is perfectly fine for each string to be allocated with its own malloc call.
The line you wrote to use realloc is fine; it expands the memory area you've allocated for pointers so that it now has the capacity to hold 2 pointers, instead of just 1. When the realloc function does this, it might need to move the memory allocation to a different address, so that is why you have to overwrite m as you did. There is no stack smashing going on here. Also, please note that pointers are not 8 bytes on every platform; that's why it was wise of you to write sizeof(char *) instead of 8.
To find more documentation about realloc, you can look in the C++ standard, or the POSIX standard, but perhaps the most appropriate place for this question is the C standard, which documents realloc on page 314.
I was wondering what happens to the memory when you realloc your array to one element smaller. From everything I've read about realloc, I suppose the pointer still points to the same place in memory (there's no need for the function to look for another block of memory, since the current one is available and sufficient); tell me if I'm wrong. My question is: is the removed piece of the array deallocated (as with free()), or do the values stay untouched while that piece of memory becomes available for future malloc, calloc, etc. calls?
EDIT:
I have one more question. Does this function work properly? It should delete an element of the array by overwriting it with the next element, and so on through the rest of the array, so that the last element ends up equal to the one before it; the last one is then removed. PicCounter is the number of pictures already loaded into the program. Check this out:
int DeletePicture(struct Picture **tab, int *PicCounter)
{
    int PicToDelete;
    printf("Enter the number of pic to delete ");
    scanf("%d", &PicToDelete);
    for (int i = PicToDelete - 1; i < (*PicCounter) - 2; i++)
    {
        (*tab)[i] = (*tab)[i + 1];
    }
    struct Picture *temp;
    temp = realloc(*tab, ((*PicCounter)-1) * sizeof(*temp));
    if (temp != NULL)
    {
        *tab = temp;
        //That doesn't delete the element, because in main I can still print it
        //like e.g. tab[lastelement].
        (*PicCounter)--;
        printf("Picture has been deleted\n");
        return 0;
    }
    else
    {
        printf("Memory reallocation error\n");
        return 1;
    }
}
Regarding void *realloc(void *ptr, size_t size), the C standard says in C 2018 7.22.3.5 paragraphs 2 and 3:
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by a memory management function, or if the space has been deallocated by a call to the free or realloc function, the behavior is undefined. If size is nonzero and memory for the new object is not allocated, the old object is not deallocated. If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated. If the old object is not deallocated, its value shall be unchanged.
What this means when you ask to reduce the size of a previously allocated object is:
The returned pointer might or might not be the same as the original pointer. (See discussion below.)
The C standard permits the portion of memory that is released to be reused for other allocations. Whether or not it is reused is up to the C implementation.
Whether the values in the released portion of memory are immediately overwritten or not is not specified by the C standard. Certainly the user of realloc may not rely on any behavior regarding that memory.
Discussion
When a memory allocation is reduced, it certainly seems “easy” for the memory allocation routines to simply return the same pointer while remembering that the released memory is free. However, memory allocation systems are fairly complex, so other factors may be involved. For example, hypothetically:
To support many small allocations without much overhead, a memory allocation system might create a pool of memory for one-to-four byte allocations, another pool for five-to-eight, another pool for nine-to-16, and a general pool for larger sizes. For the larger sizes, it might remember each allocation individually, customizing its size and managing them all with various data structures. For the smaller sizes, it might keep little more than a bitmap for each pool, with each bit indicating whether or not its corresponding four-byte (or eight or 16) region is allocated. In such a system, if you release eight bytes of a 16-byte allocation, the memory allocation software might move the data to something in the eight-byte pool.
In any memory allocation system, if you release just a few bytes at the end of an allocation, it might not be enough bytes to take advantage of—the data structures required to track the few bytes you released might be bigger than the few bytes. So it is not worthwhile to make them available for reuse. The memory allocation system just keeps them with the block, although it may remember the data in the block is actually a bit smaller than the space reserved for it.
I'm trying to code a buffer for an input file. The Buffer should always contain a defined amount of data. If a few bytes of the data were used, the buffer should read data from the file until it has the defined size again.
const int bufsize = 10;
int *field = malloc(bufsize*sizeof(int)); //allocate the amount of memory the buffer should contain
for(i=0;i<bufsize;++i) //initialize memory with something
    *(field+i) = i*2;
field += 4; //Move pointer 4 units because the first 4 units were used and are no longer needed
field= realloc(field,bufsize*sizeof(int)); //resize the now smaller buffer to its original size
//...some more code where the new memory (field[6]-field[9]) is filled again...
Here is a short example of how I'm trying to do it at the moment (without files, because this is the part that's not working), but realloc() always returns NULL. In this example, the first 4 units were used, so the pointer should move forward and the missing space at the end of the memory (so that it again holds 10 elements) should be allocated. What am I doing wrong?
I would be very thankful if someone could help me
You need memmove() instead
memmove(field, field + 4, (bufsize - 4) * sizeof(*field));
You don't need realloc() because you are not changing the size of the buffer; just think about it.
If you do this
field += 4;
Now you have lost the reference to the beginning of field, so you can't even call free() on it, nor realloc() of course. Read WhozCraig's comment, for instance.
Doing realloc() for the same size doesn't make that much sense.
Using realloc() the way you do causes other problems too; for instance, when it fails you run into the same problem: you lose the reference to the original pointer.
So the recommended method is
void *pointer;
pointer = realloc(oldPointer, oldSize + nonZeroNewSize);
if (pointer == NULL)
    handleFailure_PerhapsFree_oldPointer(); /* must not fall through to the assignment */
else
    oldPointer = pointer;
So the title of your question contains the answer: what you need is to move the data at offset 4 * sizeof(int) bytes to the beginning of the buffer, for which memmove() is the perfect tool. Notice that you might also think of using memcpy(), but memcpy() cannot handle overlapping regions, which is your case.
Your problem is known as a cyclic (circular) buffer.
You should call malloc() just once when you open the file and free() once when you close it.
You don't need to call realloc() at all. All that is necessary is advancing the pointer by the amount of data read, wrapping its value around the size of the buffer, and replacing old data with new data from the file.
Your problem with realloc(): you must pass it the same pointer that was previously returned by malloc() or realloc(), without offsetting it!
Please look at the following code
char *line = (char *) malloc(100);
char *newline,*source = line;
int size=100;
newline = realloc ( line , size +=size);
// assuming that newline has been successfully assigned the demanded memory and line is freed
Now my question is: can I later write an expression like
source = newline +( line - source );
I have doubts because I am using the line pointer, which is freed after the successful realloc(), yet my program (this is only a snippet from it) is still working. So is it safe to use the line pointer after realloc() has done its work?
No, it's not safe to use the line pointer after realloc is done. realloc changes the size of the block pointed to by line. If the size increases, the old location may not have enough contiguous space to accommodate the new larger block. So the location of the block in the memory changes. Use the pointer returned by realloc.
If the old location does not have enough contiguous space for the larger block requested by the user, realloc tries to find a new block of the required size (like malloc), copies the elements from the old block and frees the old block.
If realloc fails, the old pointer is valid and must be freed. If realloc succeeds, the old pointer is considered invalid and should not be freed.
Also, the fact that your program is working is not always a good way to check if something is correct. For example, if you declare int a[10] and you access a[10] or a[11] it may not fail most of the time but it's still undefined behaviour.
So is it safe to use the line pointer after realloc() has done its work?
No, because realloc() may free your input buffer while creating the new one (if the new one is larger than the previous size). realloc copies the previous user data from the old buffer into the new buffer, so you should use the new buffer, which contains all the old information, not the old one.
This is about realloc()
void *realloc(void *p, size_t size)
realloc changes the size of the object pointed to by p to size. The contents will be unchanged up to the minimum of the old and new sizes. If the new size is larger, the new space is uninitialized. realloc returns a pointer to the new space, or NULL if the request cannot be satisfied, in which case *p is unchanged.
Could someone explain what the second parameter of realloc really is? I can't find a way to test it.
So suppose that we have something like this:
int *p = malloc(sizeof(int)); //we have just enough space to store single int value
Now, if I want to store 2 int values in p, do I need to pass realloc the new size of the block, 2 * sizeof(int), as the second parameter, or sizeof(int), since it needs to extend the memory by the size of an int?
If I should pass realloc the total size of the new block, in this case 2 * sizeof(int), what will it do if I pass just sizeof(int)? Just return the original pointer and do nothing with the memory, or something else?
The second parameter is the new size of the memory block (total new size) in bytes. If you want room for 2 ints, you'd use 2 * sizeof(int).
Just return the original pointer and do nothing with the memory, or something else?
This is not guaranteed by the specifications. Many implementations will just return the original pointer unchanged, but they could just as easily move it. You will not get a new memory allocation large enough for 2 ints, however, merely the original data values in a memory location large enough for sizeof(int).
You want to tell realloc the new total size of the allocation. So from your example, 2 * sizeof(int).
Your second question isn't entirely clear, so I'll try to wrap all those pieces into one sentence. If you call realloc with the same size value as the original call to malloc, it is up to the implementation whether to return the original pointer or to move the data to a (implementation-defined) more convenient place; if you then try to store two ints in the space for which you only requested one int, you've triggered undefined behavior. It could clobber other allocated memory (causing wrong results in calculations) or, more likely, it could clobber malloc's own bookkeeping data (likely your program will abort with a segfault somewhat later on, when malloc actually takes a look at that data).